Senior ML Systems Engineer
PrimaLabs
5 - 10 years
Gurugram
Posted: 15/03/2026
Job Description
PrimaLabs builds systems that help enterprises run large-scale AI workloads efficiently on real hardware. Our focus is on optimizing inference performance, cost, and reliability across modern accelerator platforms.
We work directly with enterprise customers deploying frontier models on next-generation GPUs and AI accelerators. Our platform continuously discovers optimal runtime configurations to maximize throughput, reduce latency, and improve cost efficiency.
Role Overview
PrimaLabs is hiring a Senior ML Systems Engineer to own the optimization engine that runs on real customer hardware.
You will work on tuning and benchmarking inference systems across accelerators such as the NVIDIA H200 and B200 Tensor Core GPUs and the AMD Instinct MI300X.
Your work will power PrimaLabs' automated optimization stack, including runtime tuning, benchmarking pipelines, and integration with large-scale hyperparameter search frameworks such as DeepHyper.
You will also work directly with customers during deployments, ensuring our system delivers measurable performance gains on real production infrastructure.
Key Responsibilities
Inference Runtime Optimization
- Tune and optimize inference systems using vLLM and SGLang
- Profile model performance across different hardware and runtime configurations
- Identify and eliminate performance bottlenecks (memory bandwidth, kernel inefficiencies, batching behavior)
Benchmarking & Performance Analysis
- Design and execute benchmark suites for real customer workloads
- Measure throughput, latency, memory utilization, and cost efficiency
- Build standardized benchmarking frameworks for new models and hardware
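The metrics listed above can be derived from per-request traces. As a rough illustration (the trace format, percentile method, and $/GPU-hour figure are all hypothetical, not part of PrimaLabs' actual framework):

```python
def summarize(traces, wall_time_s, gpu_cost_per_hour=4.0):
    """Summarize a benchmark run.

    traces: list of dicts with 'latency_s' and 'output_tokens' per request
    wall_time_s: wall-clock duration of the whole run, in seconds
    gpu_cost_per_hour: hypothetical $/GPU-hour rate used for cost efficiency
    """
    latencies = sorted(t["latency_s"] for t in traces)
    total_tokens = sum(t["output_tokens"] for t in traces)

    def pct(p):  # nearest-rank percentile over the sorted latencies
        return latencies[min(len(latencies) - 1, int(p / 100 * len(latencies)))]

    return {
        "throughput_tok_s": total_tokens / wall_time_s,
        "p50_latency_s": pct(50),
        "p99_latency_s": pct(99),
        "cost_per_1m_tokens": gpu_cost_per_hour / 3600 * wall_time_s
                              / total_tokens * 1e6,
    }

# Example: 4 requests completed over a 2-second window
traces = [
    {"latency_s": 0.8, "output_tokens": 100},
    {"latency_s": 1.0, "output_tokens": 120},
    {"latency_s": 1.2, "output_tokens": 110},
    {"latency_s": 1.5, "output_tokens": 130},
]
print(summarize(traces, wall_time_s=2.0))
```

In practice these summaries would be computed per configuration and fed back into the tuning loop.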
Optimization Infrastructure
- Build systems for large-scale configuration sweeps and automated tuning
- Integrate runtime parameters, hardware constraints, and workload characteristics into search pipelines
- Maintain and extend the DeepHyper-based optimization pipeline
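In the actual stack, DeepHyper drives the search; the general shape of such a sweep can be sketched with a plain random search over runtime knobs. The knob names mirror common inference-server parameters, but the search space, objective model, and scores below are invented for illustration:

```python
import random

# Hypothetical search space; a real sweep would map these to runtime flags.
SEARCH_SPACE = {
    "max_num_seqs": [64, 128, 256, 512],
    "max_num_batched_tokens": [2048, 4096, 8192],
    "gpu_memory_utilization": [0.80, 0.85, 0.90, 0.95],
}

def objective(cfg):
    """Stand-in for a real benchmark run.

    A real pipeline would launch the runtime with cfg and measure tokens/sec;
    this toy model just rewards larger batches with enough token budget.
    """
    score = cfg["max_num_seqs"] * cfg["gpu_memory_utilization"]
    if cfg["max_num_batched_tokens"] < cfg["max_num_seqs"] * 16:
        score *= 0.5  # penalize an under-provisioned batched-token budget
    return score

def random_search(n_trials=20, seed=0):
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best_cfg, best_score = random_search()
print(best_cfg, best_score)
```

Frameworks like DeepHyper replace the random sampler with a model-guided search, which matters when each trial is an expensive benchmark run on real hardware.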
Customer Deployments
- Work directly on enterprise deployments running on modern AI accelerators
- Support benchmarking and optimization during customer onboarding
- Deliver performance improvements tailored to customer hardware environments
Hardware-Aware Systems Engineering
- Optimize workloads across GPUs including:
- NVIDIA H200 Tensor Core GPU
- NVIDIA B200 Tensor Core GPU
- AMD Instinct MI300X
- Understand memory hierarchy, GPU scheduling, and model parallelism strategies
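One reason memory hierarchy matters on these parts: batch-1 LLM decoding is typically bound by HBM bandwidth, since every generated token streams the full set of weights from memory. A back-of-envelope ceiling (bandwidth figures are approximate public specs; the 70B/FP8 workload is illustrative):

```python
# Approximate peak HBM bandwidth, TB/s (public spec figures, rounded).
HBM_BANDWIDTH_TB_S = {
    "NVIDIA H200": 4.8,
    "NVIDIA B200": 8.0,
    "AMD Instinct MI300X": 5.3,
}

def decode_ceiling_tok_s(model_params_b, bytes_per_param, bandwidth_tb_s):
    """Upper bound on batch-1 decode tokens/sec for a weight-streaming model:
    tokens/sec <= bandwidth / total weight bytes."""
    weight_bytes = model_params_b * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / weight_bytes

# A 70B-parameter model in FP8 (1 byte/param) on each accelerator
for name, bw in HBM_BANDWIDTH_TB_S.items():
    print(f"{name}: ~{decode_ceiling_tok_s(70, 1, bw):.0f} tok/s ceiling")
```

Estimates like this help decide whether a bottleneck calls for quantization, larger batches, or tensor parallelism before any profiling begins.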
Required Background
- 5+ years of experience in ML infrastructure or high-performance ML systems
- Deep experience with LLM inference runtimes
- Strong skills in:
- Performance profiling
- GPU utilization optimization
- Systems debugging
- Hands-on experience with:
- vLLM, SGLang, or similar inference runtimes
- GPU profiling tools
- Python + systems-level debugging
Nice to Have
- Experience working with large-scale inference serving systems
- Familiarity with GPU kernel profiling tools (Nsight, ROCm profiler)
- Experience with distributed inference or model parallelism
- Exposure to hyperparameter optimization frameworks such as DeepHyper
- Previous work with cutting-edge AI hardware deployments
What Makes This Role Unique
- Work directly with next-generation AI hardware
- Solve real performance problems on enterprise deployments
- Build the core optimization engine of PrimaLabs
- Close collaboration with founders and direct impact on customer success