Open Source Infrastructure Engineer

An Open Source Infrastructure Engineer (OSIE) focuses on infrastructure that supports interactive computing. The role overlaps with job titles such as “dev-ops engineer”, “site reliability engineer”, “software engineer”, and “cloud engineer”.

Outcomes

Ensure the reliable operation of the 2i2c infrastructure by deploying and taking responsibility for the tools we run (such as Kubernetes, Dask, JupyterHub, and Prometheus/Grafana), subject to SLOs and SLAs where they exist.

Develop a scalable system that minimizes operational toil. That includes making the platform itself as automated and push-button as possible, and continuously improving our systems of work to reduce manual labor and complexity.

Key responsibilities

  • Ensure the reliable operation of the 2i2c infrastructure by deploying and taking ongoing responsibility for tools such as JupyterHub, Kubernetes, Grafana, Prometheus, BinderHub, and Dask

  • Proactively monitor for issues in our infrastructure and participate in processes that resolve them before they become emergencies, and resolve operational issues that are surfaced by our support team

  • Explore and learn emerging technologies as we identify needs within our communities - this role may co-create new infrastructure alongside research and education user communities

  • Participate in the upstream open source communities we rely on (such as JupyterHub, BinderHub, Dask, and Kubernetes), advocating for our communities’ common operational needs

  • Collaborate with members of the Product group on education and outreach around cloud computing; advise and support community members’ use of 2i2c’s services

  • Work in fast, flexible, agile ways as part of a highly collaborative product & services team

  • Work with a distributed, global team. Team members are given a lot of autonomy and are expected to communicate proactively and to work with others to allocate effort where it will maximize our impact

Necessary qualities

  • Experience with deploying applications on cloud infrastructure.

  • Experience deploying and developing with Linux container-based technologies, such as Docker and Kubernetes.

  • Experience with continuous integration services (e.g. CircleCI, GitHub Actions).

  • Experience developing tools in a general-purpose programming language (e.g. Python).

  • Experience collaborating and coordinating work via online platforms such as GitHub, GitLab, or Bitbucket, and with distributed revision control.

  • Experience working with distributed service teams that use asynchronous methods of communication.

Useful qualities

  • Experience with major cloud providers.

  • Experience in programming and software engineering with a track record of leadership in open, collaborative projects with broad community adoption.

  • Experience working on geographically distributed open-source projects.

  • Experience with the Jupyter ecosystem and other tools for interactive computing.

  • Evidence of existing connections and relationships in the worldwide ecosystem of open source software for data-intensive research, and the ability to establish new ones.

  • Experience with common data science methods, platforms, workflows, and infrastructures; with data management systems, practices, and standards; and the capacity to gain familiarity with new related topics.

  • Experience engaging with highly technical researchers across a variety of methodological fields, research domains, and computational platforms.

  • Experience building and maintaining continuous deployment pipelines.

  • Interpersonal skills to work with researchers and students, including the ability to communicate complex information clearly and concisely, both verbally and in writing.