Luxoft company
Would you like to become part of the Luxoft team? If so, this opportunity might be for you!
What do we offer our employees?
Online recruitment process and onboarding training
LuxMed health & dental care, life insurance
MyBenefit program (sports card, well-being program etc.)
Relocation support with family coverage
Equipment such as a laptop and monitor
Corrective glasses reimbursement
Special offer on banking services
Preferential car leasing offer
Paid referrals
LuxTalent platform (webinars, training, courses with certificates)
Project Description:
Our R&D team is focused on creating the most effective engine for deploying generative AI models, with efforts ranging from fine-grained GPU kernel tuning to system-wide optimizations.
We're looking for an expert-level engineer with a strong background in CUDA, ROCm, or Triton kernel optimization. You will lead substantial improvements in GPU performance and play a key role in pioneering AI and machine learning initiatives.
Responsibilities:
● Explore and analyze performance bottlenecks in ML training and inference.
● Develop and optimize high-performance computing kernels in Triton, CUDA, and/or ROCm.
● Implement programming solutions in C/C++ and Python.
● Deep dive into GPU performance optimizations to maximize efficiency and speed.
● Collaborate with the team to extend and improve existing machine learning compilers or frameworks such as MLIR, PyTorch, TensorFlow, ONNX Runtime, and TensorRT (optional but beneficial).
Mandatory Skills Description:
● Bachelor's, Master's, or PhD degree in Computer Science, Electrical Engineering, or a related field.
● Strong programming skills in C/C++.
● Deep understanding and experience in GPU performance optimizations.
● Proven experience with kernel optimizations on CUDA, ROCm, or other accelerators.
Nice-to-Have Skills Description:
● General experience with the training and deployment of ML models.
● Experience with distributed systems development or distributed ML workloads.
● Good programming skills in Python.
● Experience with innovative OSS projects like FlashAttention, mlc-llm, and vLLM.
● Experience with machine learning compilers or frameworks such as TVM, MLIR, PyTorch, TensorFlow, ONNX Runtime, and TensorRT.
Languages:
English: C1 Advanced