CircleCI is seeking a Staff Site Reliability Engineer to work closely with our Software Engineers to deliver and manage the high-performance, scalable infrastructure underlying our multi-tenant Cloud offering as well as our Server-installed, on-premises solution. You will not only automate and optimize infrastructure by building appropriate tooling, but also help software engineers during the design phase to optimize their services for scale in our production environment.
The CircleCI SRE team is globally distributed and remote-friendly. We take advantage of multiple timezones to manage a platform for our global customer base.
CircleCI is the best platform for software teams looking to rapidly build quality projects, at scale. Our intelligent continuous integration and delivery tools are simple yet powerful. Our aim is to provide the wisdom of a connected development ecosystem to every team member making technology decisions.
We run 7M+ builds a month on our platform for companies like Spotify, Kickstarter, Sony, and Coinbase. Over 25,000 organizations and 300,000 developers actively build, test, and deploy on CircleCI. We’ve raised $59.5M in venture capital from Industry Ventures, Top Tier Capital, Scale Venture Partners, DFJ, Harrison Metal Capital, and Baseline Ventures.
What will make you successful:
- Experience managing a container-based microservice architecture, including orchestration, service discovery, monitoring, and debugging
- Understanding of standard networking protocols and components such as TCP/IP, HTTP, DNS, ICMP, the OSI model, subnetting, and load balancing
- In-depth knowledge of operating systems (processes, threads, IPC, concurrency, locks, mutexes, semaphores, etc.)
- Proficiency in one or more of: C, C++, Java, Python, Go
- Comprehensive knowledge of the internal workings of at least one of Postgres, Mongo, Redis
- Systematic problem-solving approach, coupled with a strong sense of ownership and drive
- Track record of working cooperatively with software engineering teams
- Focus on security in the delivery of all levels of a system
- Passion for modern software development and operation, including agile, CI/CD, and infrastructure-as-code
- Desire to learn and grow
- 6+ years of experience
What you will do:
- Design and deliver solutions to improve the availability, scalability, latency, and efficiency of CircleCI’s services
- Engage in service capacity planning and demand forecasting, anticipating performance bottlenecks
- Diagnose and resolve production issues in conjunction with software engineering teams
- Architect and implement shared infrastructure used by all services within the CircleCI platform, for both SaaS and on-prem configurations
- Support and advise software engineering teams in the design of scalable services
- Build and maintain tools for deployment, monitoring, and debugging
- Plan and execute disaster recovery drills
- Participate in rotating on-call duties, including incident management
If you’re interested in joining the team at CircleCI, please send a resumé and let us know why you’d be a great fit for our team. If you contribute to an open source project, write a blog, or have a presence on the web (Twitter, GitHub, LinkedIn, etc.) we would love to hear about it.
We care deeply about diversity and inclusivity. We’re hiring at all experience levels, and we seek talented teammates from a wide variety of backgrounds and experiences who are equally committed to cultivating a work environment of respect and kindness. We carefully consider every applicant who takes the time to apply.