Acceler8 Talent company
Distributed Systems Engineer
Introduction: Join our mission-driven team as a Distributed Systems Engineer. This role focuses on building the data and coordination systems that enable ultra-long-context inference and training on our GPU clusters. Our goal is to build safe AGI that accelerates progress on critical global problems by automating research and code generation.
About Us: We are dedicated to developing safe AGI by combining frontier-scale pre-training, domain-specific reinforcement learning, ultra-long context, and test-time compute. Our approach aims to automate the improvement of models and to solve alignment challenges more reliably than human effort alone. We value integrity, hands-on collaboration, teamwork, focus, and high-quality results.
About the Role: As a Distributed Systems Engineer, you will develop high-performance storage and caching systems that support long-context inference and training. You will work on the internals of deep learning frameworks in distributed settings, build automated fault detection and recovery systems, and troubleshoot complex issues spanning GPUs, networks, storage, operating systems, and cloud environments.
Relevant Keywords: AGI, GPU Clusters, Python, TypeScript, Go, Rust, C++, LLM, Large Language Model, Scalable Software Design, TensorFlow, Artificial Intelligence, Deep Learning, Statistical Modeling, Algorithms