Senior Kernel Engineer - AI Accelerator

DensityAI • United States

Visa Sponsorship

AI Summary

Write, evaluate, and profile specialized compute kernels for a custom AI accelerator. Develop profiling infrastructure and define kernel programming models. Collaborate with architecture and compiler teams to optimize tensor operations and memory hierarchies.

Key Highlights

Custom AI accelerator kernel development

Performance profiling and optimization

Kernel DSL design and MLIR dialect integration

Tensor operations: GEMM, convolution, attention

C/C++ and CUDA experience required

Key Responsibilities

Write and optimize compute kernels for a custom AI accelerator

Develop and maintain profiling infrastructure to measure kernel performance

Define and document shuffle patterns for ML kernel primitives

Drive kernel DSL design decisions

Enable end-to-end kernel execution on the architectural simulator

Collaborate with the compiler team on the MLIR dialect

Create onboarding documentation and kernel writing guides

Technical Skills Required

C/C++

CUDA

Computer architecture

Performance profiling

Tensor operations

Python

Benefits & Perks

Base salary: $200k - $360k USD per year

Equity grant

Medical / dental / vision

401(k)

Standard PTO

Nice to Have

RISC-V, x86, or ARM64 ISA experience

MLIR or LLVM compiler infrastructure

HPC or scientific computing background

FPGA or Verilog/SystemVerilog experience

Familiarity with CUTLASS, Triton, or similar kernel libraries

Job Description

About The Role

You will write, evaluate, and profile specialized compute kernels that run on a custom AI accelerator. This is the critical interface between high-level ML workloads and silicon — your code directly determines how effectively the hardware performs. You'll work closely with the architecture and compiler teams to define the kernel programming model, implement core tensor operations, and drive the performance profiling workflow that validates silicon design decisions.

What you'll do

Write and optimize compute kernels for a custom AI accelerator — tensor operations, data movement patterns, memory hierarchy exploitation
Develop and maintain profiling infrastructure to measure kernel performance against architectural targets
Define and document shuffle patterns for ML kernel primitives across CPU-like control, tensor cores, and CUTLASS-style operations
Drive kernel DSL design decisions — thread spawn mechanisms, register passing conventions, and memory management strategies
Enable end-to-end kernel execution on the architectural simulator
Collaborate with the compiler team on the MLIR dialect — your kernels are the primary validation target
Create onboarding documentation and kernel writing guides for the broader team

What we're looking for

C/C++ — production-grade systems code, not scripted glue. You'll write performance-critical kernels.
CUDA or equivalent accelerator programming — deep experience writing GPU kernels, understanding warp/wavefront execution, memory coalescing, shared memory optimization. The mental model transfers directly.
Computer architecture — you need to reason about pipelines, memory hierarchies, data movement costs, and how software maps to hardware.
Performance profiling and optimization — you live in profilers. Identifying bottlenecks, measuring throughput, and iterating until kernels meet targets is the core loop.
Tensor operations — practical understanding of GEMM, convolution, attention, reduction, and scatter/gather as they map to hardware.
Python — for scripting, DSL integration, and profiling automation.
(Optional) RISC-V, x86, or ARM64 ISA experience
(Optional) MLIR or LLVM compiler infrastructure
(Optional) HPC or scientific computing background (large-scale parallel compute intuition)
(Optional) FPGA or Verilog/SystemVerilog (ability to read RTL and reason about the hardware you're targeting)
(Optional) Familiarity with CUTLASS, Triton, or similar kernel libraries

Compensation

Base salary: $200k – $360k USD per year, depending on experience and qualifications. Final offers depend on level, location, and skills relevant to the role. Additional compensation: equity grant per company guidelines; medical / dental / vision; 401(k); standard PTO. Discussed in detail during the interview.

Visa sponsorship

DensityAI sponsors qualified candidates for H-1B, O-1, TN, E-3, and other employment-based visas, and we welcome applicants on F-1 OPT and STEM-OPT. Work authorization is required at start; we provide immigration support to secure or transfer status.

Equal Opportunity

DensityAI is an Equal Opportunity Employer. We do not discriminate on the basis of race, color, religious creed, national origin, ancestry, physical or mental disability, medical condition, genetic information, marital status, sex, gender, gender identity, gender expression, age (40+), sexual orientation, military or veteran status, pregnancy, or any other status protected by law. We comply with the California CROWN Act and provide reasonable accommodations on request.

Job Overview

Posted Date Jun 04, 2026

Employment Type Full-time

Experience Level Not Applicable

Location United States

Annual Salary $200k - $360k USD

Category Programming

Company densityai
