The Staff Site Reliability Engineer (SRE) will be responsible for ensuring the reliability, availability, and performance of our company's cybersecurity products and services. The SRE will work closely with development teams to design, implement, and maintain systems that are highly available and scalable.
- Design and implement systems to improve the availability, scalability, and performance of our products and services
- Identify and troubleshoot issues that arise in production environments
- Develop and maintain automation for provisioning, deploying, and monitoring our products and services on EKS
- Collaborate with development teams to design and implement new features and services
- Work with other teams to ensure compliance with security standards and best practices
- Continuously improve monitoring and alerting systems to ensure early detection of issues
- Participate in an on-call rotation for incident response and problem resolution
Requirements:
- Strong understanding of Linux and experience with scripting languages (e.g. Python, Bash)
- Experience with containerization and orchestration (e.g. Docker, Kubernetes), including running workloads on EKS
- Experience with cloud infrastructure (e.g. AWS, GCP, Azure)
- Knowledge of security best practices and compliance standards
- Experience with monitoring and alerting tools (e.g. Prometheus, Grafana, Nagios)
- Strong problem-solving and troubleshooting skills
- Strong communication and collaboration skills
- Experience with cybersecurity is a plus
We are looking for a highly motivated, experienced Site Reliability Engineer who is passionate about building and maintaining highly available, scalable systems on EKS. If you are a problem-solver with a strong understanding of Linux and cloud infrastructure, we want to hear from you!
Job Title: Site Reliability Engineer
Company: Leading Series D Cybersecurity Company
Contract Length: 12 months minimum, with potential to go permanent