HPC Infrastructure Engineer

spellbrush • United States

Visa Sponsorship

AI Summary

Lead HPC infrastructure engineer responsible for bringup, administration, and operations of a large anime AI training cluster. Requires experience with modern HPC software landscape, Linux sysadmin skills, and physical computer knowledge. Comfortable working on small, fast-paced teams.

Key Highlights

Lead HPC infrastructure engineer

Bringup, administration, and operations of a large anime AI training cluster

Experience with modern HPC software landscape

Key Responsibilities

Bring up and manage cluster

Ensure SLURM jobs are running

Manage parallel filesystems

Troubleshoot network issues

Work with researchers to train anime models

Technical Skills Required

SLURM, Slinky, K8s, warewulf, MAAS, ansible, WEKA, VAST, Ceph, tailscale, Grafana, Prometheus, Linux, LDAP, dmesg, HGX-based nodes

Benefits & Perks

Visa sponsorship available

On-site collaboration in Tokyo or San Francisco

Physical hardware in the Bay Area

Job Description

We’re looking for an experienced HPC infrastructure engineer to lead bringup, administration, and operations on what is probably the largest anime AI training cluster in the world. You’ll serve as the bridge between our researchers and the bare GPU machines, helping to make sure that SLURM jobs are running, parallel filesystems are serving, the network is transmitting, and the anime models are training.

You may be a good fit if:

You love anime and the anime aesthetic.

This is probably one of the only jobs in the world where you will get to combine your love of anime and large-scale GPU systems.

You’re familiar with the modern HPC software landscape

Once upon a time, our team could install SLURM on a few bare metal nodes and get away with it. Now the landscape has become unbelievably complex, with SLURM deploys through Slinky on K8s, provisioning through warewulf/MAAS/ansible, filesystems through WEKA/VAST/Ceph, VPN and access through tailscale, and monitoring via the Grafana/Prometheus stack. We’re looking for someone with relevant experience up and down the stack (and maybe a papercut or two to show for it!).

As well as the traditional sysadmin landscape

Bringing up and managing a cluster still requires good old Linux sysadmin skills, including wrangling LDAP, triaging dmesg, and setting sticky bits on directories for misbehaving users and tools.

You're not afraid of physical computers

We’re building out edge datacenters and our CEO is still personally racking, stacking, and provisioning HGX-based nodes in our living room. Also his VLAN design sucks and he’s bad at fiber routing. Please send help.

And you're comfortable working on small, fast-paced teams.

We currently have a very tiny research team, and you’ll be directly helping some of the best AI researchers in the world train the best anime image model in the world.

We also believe in the unmatched speed of in-person teams, and prefer on-site collaboration in either our primary research office in Tokyo (downtown Akihabara) or San Francisco (Dogpatch!). The Bay Area is strongly preferred, as that is where our physical hardware is. Visa sponsorships are available.

Job Overview

Posted Date: Jun 15, 2026

Employment Type: Full-time

Experience Level: Not Applicable

Location: United States

Category: Devops

Company: spellbrush
