Open Source Infrastructure Engineer

An Open Source Infrastructure Engineer (OSIE) focuses on infrastructure that supports interactive computing. The role intersects job titles such as “dev-ops engineer”, “site reliability engineer”, “software engineer”, and “cloud engineer”.

Outcomes

Ensure the reliable operation of the 2i2c infrastructure by deploying and taking responsibility for the tools we run (such as Kubernetes, Dask, JupyterHub, and Prometheus/Grafana), subject to SLOs and SLAs where they exist.

Develop a scalable system that minimizes operational toil. That includes making the platform itself as automated and push-button as possible, and continuously improving our systems of work to reduce manual labor and complexity.

Key responsibilities

Ensure the reliable operation of the 2i2c infrastructure by deploying and taking ongoing responsibility for tools such as JupyterHub, Kubernetes, Grafana, Prometheus, BinderHub, and Dask
Proactively monitor our infrastructure for issues, participate in processes that resolve them before they become emergencies, and resolve operational issues surfaced by our support team
Explore and learn emerging technologies as we identify needs within our communities; this role may co-create new infrastructure alongside research and education user communities
Participate in the upstream open source communities we rely on (such as JupyterHub, BinderHub, Dask, and Kubernetes), advocating for our communities’ common operational needs
Collaborate with members of the Product group on education and outreach around cloud computing; advise and support community members’ use of 2i2c’s services
Work in fast, flexible, agile ways as part of a highly collaborative product & services team
Work with a distributed and global team; team members are given a lot of autonomy and are expected to be proactive in communicating with one another and in working with others to allocate effort where it will maximize our impact

Necessary qualities

Experience with deploying applications on cloud infrastructure.
Experience deploying and developing with Linux container-based technologies, such as Docker and Kubernetes.
Experience with continuous integration services (e.g. CircleCI, GitHub Actions).
Experience developing tools in a general-purpose programming language (e.g. Python).
Experience collaborating and coordinating work via online platforms, such as GitHub, GitLab, or Bitbucket, and distributed revision control.
Experience working with distributed service teams that use asynchronous methods of communication.

Useful qualities

Experience with major cloud providers.
Experience in programming and software engineering with a track record of leadership in open, collaborative projects with broad community adoption.
Experience working on geographically distributed open-source projects.
Experience with the Jupyter ecosystem and other tools for interactive computing.
Evidence of existing connections and relationships in the worldwide ecosystem of open source software for data-intensive research, and the ability to establish new ones.
Experience with common data science methods, platforms, workflows, and infrastructures; with data management systems, practices, and standards; and the capacity to gain familiarity with new related topics.
Experience engaging with highly technical researchers across a variety of methodological fields, research domains, and computational platforms.
Experience building and maintaining continuous deployment pipelines.
Interpersonal skills to work with researchers and students, including the ability to communicate complex information clearly and concisely, both verbally and in writing.