Senior Software Engineer - LLM Compression Infrastructure

Ora Computing · Austria
Visa sponsorship · Relocation
AI Summary

Design and structure the software stack for compressing large language models by 80%+ without retraining. Build clean, well-scoped libraries and internal tooling, and integrate compression output with inference engines. Seeking an experienced software engineer with strong code design skills, GPU experience, and production-grade Python expertise.

Key Highlights
Own how the software stack is built
Design a compression pipeline that is fully automated
Set the engineering bar for the team as they hire
Key Responsibilities
Design and refactor core libraries (pruning, quantization, retraining, evaluation)
Build internal tooling (CI, benchmarks, reproducible runs)
Integrate compression output with inference engines
Technical Skills Required
GPU experience, production-grade Python, CI/CD, benchmarking, reproducible runs
Benefits & Perks
Visa sponsorship
Relocation support
Competitive salary (€70–120k base + equity)
Hybrid or fully remote work options
Nice to Have
Open-source contributions to ML infrastructure
CUDA, Triton, or kernel-level work
Experience designing a library from scratch
Familiarity with model serving and inference optimization

Job Description


Ora Computing · Vienna · Full-time


We compress large language models (LLMs). Our information-theoretic structural pruning and quantization algorithm shrinks model footprints by over 80% without retraining, in hours rather than weeks.

We closed a €3.5M seed round in 2026 (Constructor, Greencode Ventures, XISTA) and are working with customers in automotive, edge inference, and cloud.


The role


You'll own how our software stack is built. Today the codebase reflects four people moving fast: it works, but it needs structure. Your job is to give it that structure through well-designed libraries and robust packages and environments, building the kind of codebase that scales as we grow the team and ship more to customers.

This is not a glue-code role. You'll work between the algorithm and inference layers, designing a compression pipeline that is fully automated and takes target runtimes into account. You'll design the abstractions our compression pipeline runs on and make them fast.


What you'll work on


  • Designing and refactoring our core libraries — pruning, quantization, retraining, evaluation — into clean, well-scoped packages
  • Building the internal tooling that lets the team move quickly without breaking things — CI, benchmarks, reproducible runs
  • Integrating our compression output with inference engines (vLLM, TensorRT-LLM, llama.cpp) and customer deployment targets
  • Setting the engineering bar for the team as we hire


What we're looking for


  • Bachelor's/Master's in computer science or equivalent, plus 2+ years of professional software engineering
  • Strong opinions about code design. You know what a well-structured library looks like and why
  • GPU experience — memory hierarchy, kernels, what bottlenecks performance — even if you don't write CUDA daily
  • Production-grade Python. You write code others can read, extend, and trust
  • You finish things and you care about the codebase you leave behind


Bonus


  • Open-source contributions to ML infrastructure (vLLM, llama.cpp, transformers, TensorRT-LLM, PyTorch internals)
  • CUDA, Triton, or kernel-level work
  • Experience designing a library from scratch that other engineers ended up using
  • Familiarity with model serving and inference optimization


Practical


  • Vienna-based. Hybrid or fully remote
  • Working language is English
  • We sponsor visas and support relocation
  • Compensation: €70–120k base + equity. Minimum under the applicable Austrian collective agreement (Kollektivvertrag): €45,738/year
  • You'll set the engineering standards we hire against next


How to apply


Send CV, a code sample you're proud of, and any open-source links to [email protected]. Tell us in two paragraphs what you'd want to build at Ora and why. We respond within a week.