Inference Runtime Engineer

Inferact • United States

Remote, Visa Sponsorship

AI Summary

Inferact is seeking an Inference Runtime Engineer to optimize the execution of large language models across diverse hardware and architectures. The ideal candidate will have a deep understanding of transformer architectures and experience with LLM inference systems. The role requires strong programming skills in Python and the ability to contribute performant and maintainable code.

Key Highlights

Optimize LLM execution across diverse hardware and architectures

Work on the core of vLLM

Impact how the world runs AI inference

Technical Skills Required

Python, PyTorch, Transformer architectures, LLM inference systems

Benefits & Perks

Generous health, dental, and vision benefits

401(k) company match

Equity

Nice to Have

Deep understanding of KV-cache memory management

Familiarity with RL frameworks and algorithms for LLMs

Contributions to open-source ML or system infrastructure projects

Job Description

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware—a position that took years to build.

About The Role

We're looking for an inference runtime engineer to push the boundaries of what's possible in LLM and diffusion model serving. Models grow larger. Architectures shift: mixture-of-experts, multimodal, agentic. Every breakthrough demands innovations on the inference engine itself. You'll work at the core of vLLM, optimizing how models execute across diverse hardware and architectures. Your work will directly impact how the world runs AI inference.

Skills And Qualifications

Minimum qualifications:

Bachelor's degree or equivalent experience in computer science, engineering, or similar.
Deep understanding of transformer architectures and their variants.
Strong programming skills in Python with experience in PyTorch internals.

Experience with LLM inference systems (vLLM, TensorRT-LLM, SGLang, TGI).
Ability to read and implement model architectures and inference techniques from research papers.
Demonstrated ability to contribute performant, maintainable code and to debug complex ML codebases.

Preferred qualifications:

Deep understanding of KV-cache memory management, prefix caching, and hybrid model serving.
Familiarity with RL frameworks and algorithms for LLMs.

Experience with multimodal inference (audio/image/video/text).
Contributions to open-source ML or system infrastructure projects.

Bonus points if you have:

Implemented core features in vLLM or other inference engine projects.
Contributed to vLLM integrations (verl, OpenRLHF, Unsloth, LlamaFactory, etc.).

Written widely shared technical blog posts or built side projects on vLLM or LLM inference.

Logistics

Location: This role is based in San Francisco, California. We will consider remote work within the US for exceptional candidates.
Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $200,000 - $400,000 USD + equity.
Visa sponsorship: We sponsor visas on a case-by-case basis.
Benefits: Inferact offers generous health, dental, and vision benefits as well as 401(k) company match.

Compensation Range: $200K - $400K

Job Overview

Posted Date: Jun 19, 2026

Employment Type: Full-time

Experience Level: Not Applicable

Location: United States

Annual Salary: $200,000 - $400,000 USD

Category: Programming

Company: Inferact
