Distributed Systems Engineer Opportunity

Company: Acceler8 Talent


Distributed Systems Engineer in the San Francisco Bay Area


Introduction: Join our mission-driven team as a Distributed Systems Engineer. This role focuses on building data and coordination systems that enable ultra-long-context inference and training on our GPU clusters. We aim to create safe AGI that accelerates progress on critical global issues by automating research and code generation.


About Us: We are dedicated to developing safe AGI by leveraging frontier-scale pre-training, domain-specific reinforcement learning, ultra-long context, and test-time compute. Our approach aims to automate and improve models, solving alignment challenges more reliably than human efforts alone. We value integrity, hands-on collaboration, teamwork, focus, and delivering high-quality results.


About the Role: As a Distributed Systems Engineer, you will be responsible for developing high-performance storage and caching systems to support long-context inference and training. You will work on the internals of deep learning frameworks in distributed settings, automate fault detection and recovery systems, and troubleshoot complex issues across GPUs, networks, storage, OS, and cloud environments.


What We Can Offer You:

  • Competitive annual salary
  • Significant equity as part of total compensation
  • 401(k) plan with 6% salary matching
  • Comprehensive health, dental, and vision insurance for you and your dependents
  • Unlimited paid time off
  • Option to work in person in SF or remotely
  • Visa sponsorship and relocation stipend to SF


Key Responsibilities:

  • Build and maintain high-performance storage and caching systems
  • Develop and optimize the internals of deep learning frameworks in a distributed environment
  • Automate fault detection and recovery systems for high availability
  • Troubleshoot and resolve complex issues across GPUs, networks, storage, OS, and cloud environments
  • Collaborate with a small, focused team to achieve our mission


Relevant Keywords: AGI, GPU Clusters, Python, TypeScript, Go, Rust, C++, LLM, Large Language Model, Scalable Software Design, TensorFlow, Artificial Intelligence, Deep Learning, Statistical Modeling, Algorithms