Observability Operations Engineer

bridge351 • Portugal
Remote

Job Description


Location

Remote (Europe) with occasional travel to Germany

Language Requirements

  • German: C1 or higher (mandatory)
  • English: C1 or higher (mandatory)

About the Role

We are looking for an experienced Observability Operations Engineer to support and operate enterprise-scale platform environments. The successful candidate will be responsible for ensuring the reliability, performance, and observability of critical systems running in Kubernetes-based environments.

You will work closely with platform, infrastructure, and development teams to improve monitoring capabilities, operational excellence, and service reliability across complex enterprise environments.

Key Responsibilities

  • Operate and support Kubernetes-based production environments.
  • Manage and optimize observability platforms and monitoring solutions.
  • Configure and maintain logging, metrics, and tracing solutions.
  • Support incident, problem, and change management processes.
  • Define and monitor SLIs, SLOs, and SLAs.
  • Create and maintain operational runbooks and documentation.
  • Collaborate with engineering teams to improve platform reliability and performance.
  • Contribute to automation and continuous improvement initiatives.

Required Skills & Experience

  • Minimum 3 years of experience operating Kubernetes environments in production.
  • Strong experience with observability and monitoring platforms such as:
    • Prometheus
    • Grafana
    • Datadog
    • Loki
    • Mimir
    • OpenTelemetry
  • Strong understanding of networking concepts, load balancing, and security principles.
  • Experience with CI/CD tools and processes:
    • GitLab
    • Jenkins
    • ArgoCD
    • Tekton
    • Argo Workflows
  • Knowledge of ITSM processes:
    • Incident Management
    • Change Management
    • Problem Management
  • Understanding of Site Reliability Engineering (SRE) practices.
  • Experience documenting operational procedures and maintaining runbooks.

Nice to Have

  • Experience in enterprise-scale environments.
  • Cloud-native platform experience.
  • Infrastructure automation knowledge.
  • Experience working in regulated industries.

What We Offer

  • Fully remote work model.
  • International projects for the German market.
  • Modern cloud-native technology stack.
  • Long-term opportunities.
  • Collaborative and highly skilled engineering teams.

What can you expect from us?

Mind-blowing workplace culture. You will be integrated into a professional, dynamic and collaborative team.

100% Remote opportunities

We want you to have the flexibility to work where you feel most comfortable and productive.

International Career

  • You can expect professional growth and connections around the world.
  • We are represented in Portugal, Belgium, Luxembourg, and Denmark.
  • We also run projects in many countries, including the Netherlands, Luxembourg, Singapore, and the United States of America (with many more to come).

Extra Benefits & Perks

If you would like to work with us and you are based outside the European Union, good news: we are a Tech Visa Company, and we will help!

In addition, we provide Health and Life Insurance.
