Head of Data Center Operations

Blue Signal Search • United States

Remote

This job is no longer active. This position is no longer accepting applications.

Job Description

Head of Data Center Operations

Location: Remote (United States) – Preference for candidates based in the greater Bay Area

Our client is a well-funded, high-growth innovator delivering large-scale GPU compute for cutting-edge AI workloads. As demand accelerates, they are scaling multiple next-generation data-center clusters across the country. They are seeking a strategic, hands-on Head of Data Center Operations to safeguard the uptime, performance, and growth of this mission-critical infrastructure. If you thrive in hyper-scalable environments and enjoy shaping world-class operational teams, this role offers an unmatched opportunity to define the gold standard for GPU data-center reliability.

This Role Offers

Executive-level influence over a rapidly expanding GPU cloud platform.
Remote-first culture with high ownership, technical depth, and autonomy.
Direct impact on reliability engineering strategy during multi-megawatt capacity growth.
Competitive base salary, performance-based equity, and comprehensive benefits.
Chance to lead real-time operations at the forefront of AI infrastructure innovation.

Key Responsibilities

Direct the 24×7 operations of geographically distributed, high-density GPU data centers totaling tens of megawatts of compute capacity.
Establish and continuously improve monitoring, incident response, and change-management processes to ensure industry-leading uptime and performance.
Drive adoption of reliability-engineering best practices, creating playbooks, automation, and tooling that scale with rapid capacity growth.
Partner with hardware, facilities, and platform-engineering teams to optimize resource utilization, thermal efficiency, and service quality.
Manage vendor and colocation relationships, negotiating SLAs for power, cooling, and network connectivity.
Lead and mentor a global team of site-reliability engineers, NOC staff, and systems operators.
Oversee compliance programs covering security, disaster recovery, business continuity, and environmental regulations.
Analyze incidents and performance trends to identify systemic risks and implement preventive solutions.

Skill Set & Qualifications

10+ years in data-center or large-scale infrastructure operations, including hyperscale, GPU, or HPC environments.
Proven track record operating live production workloads at 20 MW or greater total capacity.
Expert knowledge of observability, telemetry, and alerting systems for distributed infrastructure.
Familiarity with GPU workloads, thermal dynamics, and high-density rack design.
Exceptional incident-management and root-cause-analysis skills.
Demonstrated success building and scaling remote, globally distributed operations teams.
Startup or high-growth environment experience strongly preferred.

Ready to lead the next leap in AI infrastructure reliability? Apply today to explore how your experience can power the future of large-scale GPU compute.

About Blue Signal:

Blue Signal is an award-winning executive search firm that recruits across a range of specialties. Our recruiters have a proven track record of placing top-tier talent across industry verticals, with deep expertise in numerous professional services. Learn more at bit.ly/46Gs4yS

Job Overview

Posted Date: Oct 11, 2025

Employment Type: Full-time

Experience Level: Mid-Senior level

Location: United States

Category: Programming

Company: Blue Signal Search
