Systems Reliability Engineer Opportunity

Eeze

Systems Reliability Engineer in Malta

Visa sponsorship & relocation

Posted 1 month ago

Company Overview:


Eeze is a leading innovator in the iGaming industry, specialising in the design, development, and delivery of live casino games. As we expand our operations in the European market, we are committed to pushing the boundaries of gaming experiences, offering cutting-edge, immersive live casino games that are tailored to the unique needs of our clients and their audiences. Our work culture promotes collaboration, creativity, and innovation.


Relocation packages are available for this role. Please apply and ask for more details.


Responsibilities:


  • Enhance and maintain observability tools: Develop and optimise monitoring solutions using tools like Grafana, Prometheus, ELK (Elasticsearch, Logstash, and Kibana), OpenTelemetry, and others to ensure comprehensive visibility into system health and performance. Continuously improve logging, metrics, and tracing setups to detect anomalies early and maintain high service availability.
  • Automation and infrastructure management: Design and build robust automation scripts to streamline critical operations such as component restarts and system recovery using tools like Jenkins, Ansible, and other automation frameworks. The goal is to minimise manual intervention and reduce system recovery time.
  • Proactive system monitoring and incident management: Regularly monitor system performance to identify potential bottlenecks or failures before they impact users. Quickly diagnose issues, implement preventive measures, and collaborate with teams to optimise system health. Mentor and guide Technical Support Engineers in best practices for monitoring and automation, improving their understanding and skills in these areas.
  • Platform knowledge and operational support: Gain in-depth knowledge of the live casino platform's architecture, infrastructure, and operations to support deployment processes, troubleshoot issues, and assist with business-as-usual (BAU) tasks. Use this expertise to contribute to platform reliability and provide technical input to related teams.
  • 24/7 shift rotation: Participate in a rotating shift schedule (day, night, rest, off) to ensure round-the-clock coverage and support of live systems. This ensures consistent availability to handle incidents and maintain platform stability.
  • Stakeholder communication and collaboration: Serve as a key point of contact for both internal and external stakeholders, including DevOps teams, studio technicians, Corporate IT, and Customer Account Management. Ensure clear and timely communication during incidents and ongoing projects to align expectations and provide technical updates.
  • Incident response and resolution: Take ownership of incident management, working to resolve issues quickly and efficiently while minimising downtime. Investigate root causes of incidents and implement solutions to prevent recurrence, maintaining system stability and reliability.
  • Cross-functional collaboration: Work closely with other IT departments to ensure seamless integration and performance of new systems and services. Participate in cross-team projects to improve platform reliability and scalability.
  • SRE tools evaluation and adoption: Stay informed about the latest tools and technologies in the SRE domain. Take part in evaluating new tools and approaches for improving observability, automation, and overall system resilience. Contribute to adopting cutting-edge solutions that enhance system performance and reduce operational overhead.


Requirements:


  • Experience: 2+ years of hands-on experience in a Site Reliability Engineer (SRE) role, focusing on improving system reliability, automation, and performance monitoring.
  • Operating systems: Strong understanding of Linux/Unix operating systems, with the ability to troubleshoot and optimise their performance in production environments.
  • Scripting skills: Proficiency in scripting languages like Python or Bash to build automation scripts and streamline manual processes. Knowledge of version control systems like Git is a plus.
  • Automation tools: Experience with automation tools such as Ansible, Terraform, or similar technologies to automate configuration management, infrastructure provisioning, and deployment pipelines.
  • CI/CD: Familiarity with Continuous Integration/Continuous Deployment (CI/CD) concepts and tools such as Jenkins, GitLab CI/CD, or similar platforms to automate software delivery and reduce deployment times.
  • Troubleshooting and problem-solving: Strong analytical skills to identify, diagnose, and resolve complex technical issues, particularly in large-scale, distributed systems. Experience with hyperconverged systems and hypervisors (e.g., VMware) is preferred.
  • Communication and collaboration: Excellent communication skills, with the ability to work collaboratively in a fast-paced, cross-functional environment. A proactive and open approach to solving challenges as part of a team.
  • Learning mindset: Eager to learn and adapt to new technologies, frameworks, and approaches. Demonstrated interest in continuous improvement and staying current with industry trends.
  • Industry passion: A strong passion for the iGaming industry, with an understanding of its unique technical challenges, regulatory requirements, and opportunities for innovation.


This role offers a challenging and rewarding opportunity to play a key part in ensuring the reliability, scalability, and success of a live casino platform. You will be at the forefront of resolving critical issues, automating operations, and improving system observability to deliver world-class gaming experiences to customers worldwide.


Benefits:


  • €400 Wellness Allowance
  • In-house Gym
  • Annual Health Insurance
  • Company Events
  • Snacks at the office
  • Daily lunches at the office
  • Shift Allowance