EPAM is a leading global provider of digital platform engineering and development services.
We are committed to having a positive impact on our customers, our employees, and our communities.
We embrace a dynamic and inclusive culture.
Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow.
No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
We are seeking a talented and experienced Senior Site Reliability Engineer (SRE) to join our dynamic team.
As a Senior SRE, you will play a critical role in designing, developing, and maintaining highly reliable systems and processes to ensure optimal performance and scalability of applications and infrastructure across diverse environments.
Responsibilities
- Build and containerize applications and deploy them using open-source container management tools such as Docker or Podman
- Design and maintain Kubernetes resource manifests, deploying them into clusters on platforms like AKS or GKE
- Configure and deploy Prometheus agents to monitor infrastructure and application behaviors, raising alerts when necessary
- Create and manage continuous deployment pipelines using tools like Helm and Argo CD
- Optimize observability by implementing monitoring, logging, and tracing solutions
- Maintain and manage CI/CD processes within Azure DevOps or similar environments
- Develop and implement solutions on cloud platforms, leveraging expertise in at least one provider (e.g., Microsoft Azure, GCP, AWS)
- Troubleshoot infrastructure and application issues, using logs and traces to isolate the events involved
Requirements
- At least 3 years of programming experience, preferably in Go
- Hands-on experience with at least one scripting language (e.g., Bash or Python)
- Proficiency with Kubernetes, with at least 3 years of practical expertise
- Fundamental knowledge of observability tools, with a focus on Prometheus or similar monitoring platforms
- Experience configuring and managing CI/CD pipelines in Azure DevOps, or GitOps-style continuous deployment with tools like Helm and Argo CD
- Background in cloud platforms with competency in at least one provider (e.g., Microsoft Azure, Google Cloud, AWS)
- Ability to containerize applications and manage their runtime environments with open-source tools like Docker or Podman
Nice to have
- Familiarity with multiple cloud providers, including AWS and GCP alongside Azure
- Expertise in GitOps packaging and deployment tools like Argo CD and Helm
- Understanding of service meshes like Istio for Kubernetes-based microservices architectures
- Competency in infrastructure-as-code tools such as Terraform
- Background in software development with experience across multiple domains
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
Job function
- Engineering, Information Technology, and Business Development
Industries
- Software Development, IT Services and IT Consulting, and Nanotechnology Research