Senior Software Engineer, Distributed Scheduling

GTN Technical Staffing • Dallas-Fort Worth Metroplex

Relocation


AI Summary

Design, build, and maintain large-scale scheduling software for HPC, AI, and production workloads. Develop scalable backend services, APIs, and distributed systems primarily in Go. Ensure platform reliability, performance, and maintainability across cloud and Kubernetes environments.

Key Highlights

Work on Armada, an open-source scheduling platform

Develop distributed systems for HPC and AI infrastructure

Primary language is Go; cloud and Kubernetes experience expected

Hands-on production engineering with observability tools


Technical Skills Required

Go, backend services, distributed systems, Kubernetes, Linux, PostgreSQL, Prometheus, Grafana, AWS, GCP, Azure, CI/CD pipelines, automated testing, observability, monitoring, logging, metrics, alerting, performance optimization, reliability engineering

Benefits & Perks

100% company-paid benefits

Relocation available for non-local candidates

Performance bonus

Nice to Have

HPC infrastructure experience

AI infrastructure experience

Batch scheduling

Workload orchestration

Large-scale compute platforms

Open-source project contributions

Non-relational databases

Message queues

Event-driven systems

High-throughput platforms

Performance optimization

Reliability engineering

Production platform operations

Job Description

Senior Software Engineer, Distributed Scheduling

Location: Dallas, TX | Hybrid

Type: Direct Hire

Relocation: Available for non-local candidates

Compensation

Base salary: $170,000 – $250,000 + performance bonus

Benefits: 100% company-paid benefits

Overview

GTN is seeking a Senior Software Engineer, Distributed Scheduling to help design, build, and maintain large-scale scheduling software supporting demanding HPC, AI, research, and production workloads.

This role sits on a highly technical engineering team responsible for developing distributed systems, backend services, APIs, tooling, and automation that keep a high-scale compute platform reliable, performant, and maintainable.

Much of the work centers on Armada, an open-source scheduling platform, along with internal services and platform tooling written primarily in Go. This is a hands-on engineering role focused on writing clean, well-tested code, reviewing designs, solving complex distributed systems problems, and owning production-quality software.

The ideal candidate is a strong software engineer with excellent coding fundamentals, backend or distributed systems experience, and a practical understanding of how software runs in cloud, Linux, Kubernetes, and production infrastructure environments.

Key Responsibilities

Software Engineering & Platform Development

• Design, write, test, and review high-quality production code, primarily in Go

• Build and maintain scalable backend services, APIs, and distributed systems supporting high-demand workloads

• Contribute to Armada and related internal scheduling, orchestration, and platform services

• Develop tooling and automation that improves platform reliability, developer productivity, and operational efficiency

• Apply strong software architecture principles to ensure systems are maintainable, correct, and scalable


Distributed Systems & Infrastructure

• Build services that operate reliably across large-scale HPC and AI infrastructure environments

• Work with Kubernetes-based orchestration, containerized services, and modern deployment workflows

• Develop and debug software in Linux environments using command-line and system-level tooling

• Apply networking fundamentals to troubleshoot, optimize, and improve platform connectivity and performance

• Independently diagnose and resolve complex issues across software and infrastructure layers

Reliability, Data & Operations

• Manage and optimize data interactions across relational and non-relational data stores, with emphasis on PostgreSQL

• Contribute to CI/CD pipelines, automated testing, observability, and engineering best practices

• Use monitoring, logging, and runtime tools such as Prometheus, Grafana, or similar platforms

• Think critically about correctness, edge cases, performance, and failure modes

• Stay current with emerging technologies and apply new approaches where they improve platform outcomes

Required Qualifications

• Strong software engineering fundamentals, including data structures, algorithms, system design, and maintainable code practices

• Proficiency in Go or another statically typed language, with the ability to ramp quickly into Go-based codebases

• Experience building backend services, APIs, distributed systems, or infrastructure software in production environments

• Hands-on experience with cloud environments such as AWS, GCP, or Azure

• Experience with Linux-based development, debugging, and command-line tooling

• Experience with Kubernetes, containers, or modern deployment pipelines


• Experience with PostgreSQL or similar relational databases

• Understanding of observability practices, including monitoring, logging, metrics, and alerting

• Strong testing mindset with focus on correctness, reliability, and failure scenarios

• Ability to work independently, review code thoughtfully, and contribute within a collaborative engineering team

Preferred Qualifications

• Experience with HPC, AI infrastructure, batch scheduling, workload orchestration, or large-scale compute platforms

• Hands-on experience with Kubernetes scheduling, multi-cluster systems, or distributed job orchestration

• Contributions to open-source projects or experience working in open-source engineering environments

• Experience with non-relational databases, message queues, event-driven systems, or high-throughput platforms

• Familiarity with performance optimization, reliability engineering, or production platform operations

Ideal Profile

The ideal candidate is a hands-on software engineer who enjoys building distributed scheduling and infrastructure software that operates at scale. They write clean, tested code, understand distributed systems tradeoffs, and are comfortable working close to production infrastructure.

They do not need to come directly from an HPC background, but they should have strong backend engineering fundamentals and an interest in solving complex scheduling, orchestration, and platform reliability challenges.

Why This Role

• Work on high-scale HPC and AI infrastructure supporting demanding production workloads

• Contribute to Armada, an open-source scheduling platform

• Join a senior, collaborative engineering team with real ownership over technical direction

• Build software that directly impacts platform reliability, performance, and scalability

• Competitive compensation, performance bonus, relocation support, and 100% company-paid benefits

Job Overview

Posted Date: Jun 05, 2026

Employment Type: Full-time

Experience Level: Mid-Senior level

Location: Dallas-Fort Worth Metroplex

Annual Salary: 170,000 - 250,000 USD

Category: Programming

Company: GTN Technical Staffing
