Site Reliability Engineer, Remote Job, October 2024

Mural, the leading visual work platform for the enterprise, makes teamwork feel like less work. Our intuitive visual workspace enables teams to easily work together and collaborate better using proven design-thinking techniques. Built for enterprise teams, Mural meets the most stringent of IT and regulatory requirements. Industry leaders, including IBM, Microsoft, SAP, and Abercrombie & Fitch, choose Mural to help their teams accelerate innovation and problem solving at scale. Whether your team is fully remote, distributed, in the office, or still figuring it out, Mural brings teams across the enterprise together to do the work that matters most.

ABOUT THE TEAM

Our small but highly skilled team of site reliability engineers is dedicated to ensuring the stability and reliability of Mural's services. We thrive on teamwork, adapting quickly to challenges, and taking ownership to ensure our systems perform at their best. Our work is essential to Mural's success, as we maintain the backbone of the platform, enabling seamless user experiences and fostering innovation across the company.

YOUR MISSION

In this role, you will design, implement, and maintain our cloud-based infrastructure on Azure, playing a key part in ensuring the scalability and reliability of our systems through automation. You will actively participate in an on-call rotation, practicing sustainable incident response and contributing to continuous improvement through postmortems. Collaboration with your team and cross-functional partners will be crucial as you work to enhance the resilience and efficiency of our services.

WHAT YOU'LL DO

Design and maintain cloud-based infrastructure on Azure, ensuring high availability and performance.
Automate infrastructure and application deployment processes using Terraform and CI/CD pipelines, enhancing system scalability and reliability.
Collaborate with development and operations teams to ensure seamless integration and delivery of new features, maintaining a focus on reliability and scalability.
Participate in on-call rotations and respond to incidents promptly, conducting postmortems to improve system resilience.
Optimize and monitor infrastructure to proactively identify issues and enhance the overall stability and security of the platform.

WHAT YOU'LL BRING

2 years of experience working as a software engineer.
Experience in building reliable, scalable, and maintainable systems.
Good at finding and solving problems.
Ready to handle production issues and operational tasks.
Familiar with Terraform for infrastructure as code.
Basic knowledge of Kubernetes.
Good communication skills.
Basic experience with Docker containers.
Familiar with cloud platforms and monitoring tools.
Basic Linux administration skills.
Understanding of basic security principles.
Strong understanding of network technologies and protocols, routing protocols, load balancing techniques, firewall configurations, and network security best practices.

Equal Opportunity

We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.
