As a Lead Azure SRE, you will be responsible for driving the reliability, performance, and scalability of cloud-based applications and services. Your expertise in Kubernetes, scripting, troubleshooting, and observability will be instrumental in ensuring a seamless and efficient cloud operations environment.
Take ownership of managing Kubernetes clusters, ensuring their reliability, scalability, and performance. Implement best practices for deploying, monitoring, and optimizing containerized applications in a cloud environment.
Utilize scripting skills in Python, Bash, and PowerShell to develop automation tools and streamline repetitive tasks. Automate infrastructure provisioning, deployment, and maintenance to achieve operational efficiency.
Demonstrate expertise in troubleshooting cloud environments, diagnosing and resolving issues to maintain high availability and performance. Implement proactive monitoring and alerting solutions to identify and address potential problems before they escalate.
Integrate with Azure DevOps to optimize the CI/CD pipeline, enabling continuous delivery and deployment of applications. Collaborate with development teams to streamline the release process and ensure smooth deployments.
Implement and maintain a modern observability stack, including tools such as Grafana, Prometheus, and Loki. Use these tools to monitor the health and performance of systems and applications, enabling quick identification and resolution of incidents.
Requirements
Kubernetes
Scripting (Python, Bash, PowerShell… in that order of preference)
Troubleshooting in cloud environments
Azure DevOps
Good understanding of the modern observability stack, i.e., tools like Grafana, Prometheus, Loki, etc.
Nice to Have
Experience working with Windows
Knowledge of CI/CD (especially Azure DevOps)
Knowledge of Istio
Knowledge of GitOps tools (like ArgoCD)
We Offer
Career plan and real growth opportunities
Unlimited access to LinkedIn Learning solutions
International Mobility Plan within 25 countries
Constant training, mentoring, online corporate courses, eLearning and more
English classes with a certified teacher
Support for employees' initiatives (Algorithms club, Toastmasters, Agile club, and more)
Enjoyable working environment (gaming room, napping area, amenities, events, sports teams, and more)
Flexible work schedule and dress code
Collaborate in a multicultural environment and share best practices from around the globe
Hired directly by EPAM & 100% under payroll
Statutory benefits (IMSS, INFONAVIT, 25% vacation bonus)
Insurance: life and major medical expenses, with dental and vision coverage (for the employee and direct family members)
13% employee savings fund, capped at the legal limit
Grocery coupons
30-day December bonus
Employee Stock Purchase Plan
12 vacation days plus 3 floating days
Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Good Friday, November 2nd, December 24th & 31st)
Relocation bonus: transportation, 2 weeks of accommodation for you and your family, and more
Monthly non-taxable amount for the electricity and internet bills