Site Reliability Engineer (Middle) ID38916 
 AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries.
We rank among the leaders in areas like application development and AI/ML, and our people‑first culture has earned us multiple Best Place to Work awards.
 WHY JOIN US 
 If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you! 
 WHAT YOU WILL DO 
  - Shift: Monday – Thursday 8AM – 7PM PST (11AM – 10PM EST) with rotating on‑call; 
- On‑call rotation: every 6 weeks, one week as primary responder followed by one week as secondary; 
- Manage alerts daily, check systems, and escalate issues as needed; 
- Be part of a team that provides 24×7 on‑call support for critical SaaS events; 
- Be available in case of emergencies when team members are not available or need help; 
- Document issues and remediation steps; 
- Proactively create appropriate monitors in the EKS/K8S ecosystem; 
- Deploy to EKS/K8s cluster using Terraform and Helm; 
- Learn and maintain existing infrastructure running under Docker Swarm; 
- Improve existing infrastructure health by implementing checks and scripts to correct known issues; 
- Maintain and develop deployment code; 
- Automate manual tasks; 
- Implement/integrate new technologies in our Cloud Infrastructure; 
- Collaborate with other teams and departments to provide the highest level of support and assistance; 
- Apply a genuine customer focus when planning deployments and updates: keep the customer front of mind and consider the impact on them before making changes; 
- Work closely with the Support, Customer Success, Migration, and Professional Services teams on solutions that give our customers a best‑in‑class SaaS experience; 
- Perform RCA and take necessary corrective actions to prevent the recurrence of issues; 
- Create and assign alert‑related actions to the appropriate team after the investigation; 
- Handle support requests for environment‑specific actions; 
- Identify and provide automation requirements to improve RCA.
 
 
MUST HAVES 
  - 2+ years of professional experience; 
- Experience working with Datadog; 
- Hands‑on experience as an AWS Cloud Engineer; 
- Working knowledge of EKS/Terraform/Helm; 
- Working experience with Docker and Docker Swarm; 
- Good understanding of AWS IAM roles and policies; 
- Experience logging and monitoring AWS resources using CloudWatch logs; 
- Experience working in a Linux environment; 
- Proficient in Bash and/or Python scripting; 
- A strong understanding of web technologies such as REST APIs; 
- Working experience with monitoring solutions such as Grafana and Prometheus; 
- Excellent oral and written communication skills; 
- Customer‑facing communication skills to effectively explain issues and RCAs to customers; 
- Experience in Product/Application Support for SaaS‑based products; 
- Understanding of APIs, Databases, Systems Architecture, and Design; 
- Experience designing, implementing, and operating in a DevSecOps environment; 
- Ability to work independently as well as within a collaborative environment; 
- A technical aptitude with the desire to learn new and evolving technologies; 
- Upper‑Intermediate English level.
 
 
NICE TO HAVES 
  - Experience with GCP or Azure; 
- Certifications: AWS Certified DevOps Engineer – Professional or AWS Certified Advanced Networking – Specialty.
 
 
PERKS AND BENEFITS 
  - Professional growth: Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps.
- Competitive compensation: We match your ever‑growing skills, talent, and contributions with competitive USD‑based compensation and budgets for education, fitness, and team activities.
- A selection of exciting projects: Join projects with modern solutions development and top‑tier clients that include Fortune 500 enterprises and leading product brands.
- Flextime: Tailor your schedule for an optimal work‑life balance, with the option of working from home or going to the office, whichever makes you happiest and most productive.
 
 