Senior Site Reliability Engineer (SRE) - AI Infrastructure & Kubernetes

DensityAI, United States
Visa Sponsorship
AI Summary

Own and scale the infrastructure powering DensityAI's AI accelerator program, including Kubernetes clusters, CI/CD pipelines, and hybrid on-prem/cloud platforms. Drive automation and observability for chip-design and ML workloads using AI-assisted tooling. Requires 5+ years of SRE experience with deep Kubernetes, IaC, and observability expertise.

Key Highlights
Own infrastructure for Kubernetes, CI/CD, on-prem/cloud sync, observability, and high-availability platforms.
Support chip-design and ML workloads from first silicon through scale-out.
Use and develop AI-assisted tool flows for infrastructure automation and incident response.
Technical Skills Required
Kubernetes, Terraform, Ansible, GitHub Actions, Bazel, Buildkite, AWS, GCP, Proxmox, VMware, Prometheus, Grafana, OpenTelemetry, Loki
Benefits & Perks
Base salary $180k – $320k USD per year
Equity grant per company guidelines
Medical / dental / vision insurance
401(k)
Standard PTO
Nice to Have
GitHub Enterprise administration
Bazel build systems
ML-platform infrastructure (training / inference)
RAG / knowledge-platform operations

Job Description


About The Role

Own the infrastructure that engineering depends on — Kubernetes clusters, CI/CD pipelines, on-prem ↔ cloud sync, observability, and high-availability platforms for chip-design and ML workloads. Work with chip-design and software teams driving DensityAI's AI accelerator program from first silicon through scale-out.

What you'll do

  • Own the infrastructure that engineering depends on — Kubernetes clusters, CI/CD pipelines, on-prem ↔ cloud sync, observability, and high-availability platforms for chip-design and ML workloads.
  • Use and develop AI-assisted tool flows to accelerate infra automation and incident response.

What we're looking for

  • Exceptional abilities in Kubernetes operations, infrastructure-as-code (Terraform / Ansible), and CI/CD platforms (GitHub Actions, Bazel, Buildkite, or equivalent)
  • 5+ years of SRE / infrastructure engineering experience supporting engineering or ML teams at scale
  • Hands-on with hybrid on-prem ↔ cloud architectures (AWS / GCP plus virtualization platforms like Proxmox or VMware)
  • Strong fluency in observability stacks (Prometheus, Grafana, OpenTelemetry, Loki, or equivalent) and on-call practices
  • (Optional) GitHub Enterprise administration, Bazel build systems, ML-platform infrastructure (training / inference), or RAG / knowledge-platform operations

Compensation

Base salary: $180k – $320k USD per year. Final offers depend on level, location, experience, qualifications, and skills relevant to the role. Additional compensation: equity grant per company guidelines; medical / dental / vision insurance; 401(k); standard PTO. These are discussed in detail during the interview.

Visa sponsorship

DensityAI sponsors qualified candidates for H-1B, O-1, TN, E-3, and other employment-based visas, and we welcome applicants on F-1 OPT and STEM-OPT. Work authorization is required at start; we provide immigration support to secure or transfer status.

Equal Opportunity

DensityAI is an Equal Opportunity Employer. We do not discriminate on the basis of race, color, religious creed, national origin, ancestry, physical or mental disability, medical condition, genetic information, marital status, sex, gender, gender identity, gender expression, age (40+), sexual orientation, military or veteran status, pregnancy, or any other status protected by law. We comply with the California CROWN Act and provide reasonable accommodations on request.
