Overview
Site Reliability Engineer (Middle) ID38916 – AgileEngine.
AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries.
We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.
WHY JOIN US
If you’re looking for a place to grow, make an impact, and work with people who care, we’d love to meet you!
What you will do
- Shift: Monday – Thursday 8AM – 7PM PST (11AM – 10PM EST) with rotating on-call.
- On-call shifts: every 6 weeks, one week as primary responder and the following week as secondary.
- Manage alerts daily, check systems, and escalate issues as needed.
- Be part of a team that provides 24×7 on-call support for critical SaaS events.
- Be available in case of emergencies when team members are not available or need help.
- Document issues and remediation steps.
- Proactively create appropriate monitors in the EKS/K8s ecosystem.
- Deploy to EKS/K8s cluster using Terraform and Helm.
- Learn and maintain existing infrastructure running under Docker Swarm.
- Improve existing infrastructure health by implementing checks and scripts to correct known issues.
- Maintain and develop deployment code.
- Automate manual tasks.
- Implement/integrate new technologies in our Cloud Infrastructure.
- Collaborate with other teams and departments to provide the highest level of support and assistance.
- Keep the customer at the forefront when planning deployments and updates, and consider the impact on them before making changes.
- Work closely on solutions with the Support, Customer Success, Migration, and Professional Services teams to provide best-in-class SaaS service to our customers.
- Perform root cause analysis (RCA) and take the necessary corrective actions to prevent issues from recurring.
- Create and assign alert-related actions to the appropriate team after the investigation.
- Handle support requests for environment-specific actions.
- Identify and provide automation requirements to improve RCA.
Must Haves
- 2+ years of professional experience.
- Experience working with Datadog.
- Hands-on experience as an AWS Cloud Engineer.
- Working knowledge of EKS/Terraform/Helm.
- Working experience with Docker and Docker Swarm.
- Good understanding of AWS IAM roles and policies.
- Experience logging and monitoring AWS resources using CloudWatch Logs.
- Experience working in a Linux environment.
- Proficient in Bash and/or Python scripting.
- Strong understanding of web technologies such as REST APIs.
- Working experience with monitoring solutions, such as Grafana and Prometheus.
- Excellent oral and written communication skills; customer-facing communication skills to effectively explain issues and RCAs.
- Experience in Product/Application Support for SaaS-based products.
- Understanding of APIs, Databases, Systems Architecture, and Design.
- Experience designing, implementing, and operating in a DevSecOps environment.
- Ability to work independently as well as within a collaborative environment.
- A technical aptitude with the desire to learn new and evolving technologies.
- Upper-Intermediate English level.
Nice to Have
- Experience with GCP or Azure.
- Certifications: AWS Certified DevOps Engineer – Professional or AWS Certified Advanced Networking – Specialty.
Perks and Benefits
- Professional growth: Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps.
- Competitive compensation: We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities.
- A selection of exciting projects: Join projects with modern solutions development and top-tier clients that include Fortune 500 enterprises and leading product brands.
- Flextime: Tailor your schedule for an optimal work-life balance, with the option of working from home or going to the office – whichever makes you happiest and most productive.
Employment details
- Seniority level: Mid-Senior level
- Employment type: Full-time
- Job function: IT Services and IT Consulting