Senior ML Systems Engineer
PrimaLabs
5 - 10 years
Gurugram
Posted: 15/03/2026
Job Description
PrimaLabs builds systems that help enterprises run large-scale AI workloads efficiently on real hardware. Our focus is on optimizing inference performance, cost, and reliability across modern accelerator platforms.
We work directly with enterprise customers deploying frontier models on next-generation GPUs and AI accelerators. Our platform continuously discovers optimal runtime configurations to maximize throughput, reduce latency, and improve cost efficiency.
Role Overview
PrimaLabs is hiring a Senior ML Systems Engineer to own the optimization engine that runs on real customer hardware.
You will work on tuning and benchmarking inference systems across accelerators such as the NVIDIA H200 and B200 Tensor Core GPUs and the AMD Instinct MI300X.
Your work will power PrimaLabs' automated optimization stack, including runtime tuning, benchmarking pipelines, and integration with large-scale hyperparameter search frameworks such as DeepHyper.
You will also work directly with customers during deployments, ensuring our system delivers measurable performance gains on real production infrastructure.
Key Responsibilities
Inference Runtime Optimization
- Tune and optimize inference systems using vLLM and SGLang
- Profile model performance across different hardware and runtime configurations
- Identify and eliminate performance bottlenecks (memory bandwidth, kernel inefficiencies, batching behavior)
Benchmarking & Performance Analysis
- Design and execute benchmark suites for real customer workloads
- Measure throughput, latency, memory utilization, and cost efficiency
- Build standardized benchmarking frameworks for new models and hardware
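The metrics listed above can be derived from per-request traces. As a rough illustration (the trace format, percentile method, and $/GPU-hour figure are all hypothetical, not part of PrimaLabs' actual framework):

```python
def summarize(traces, wall_time_s, gpu_cost_per_hour=4.0):
    """Summarize a benchmark run.

    traces: list of dicts with 'latency_s' and 'output_tokens' per request
    wall_time_s: wall-clock duration of the whole run, in seconds
    gpu_cost_per_hour: hypothetical $/GPU-hour rate used for cost efficiency
    """
    latencies = sorted(t["latency_s"] for t in traces)
    total_tokens = sum(t["output_tokens"] for t in traces)

    def pct(p):  # nearest-rank percentile over the sorted latencies
        return latencies[min(len(latencies) - 1, int(p / 100 * len(latencies)))]

    return {
        "throughput_tok_s": total_tokens / wall_time_s,
        "p50_latency_s": pct(50),
        "p99_latency_s": pct(99),
        "cost_per_1m_tokens": gpu_cost_per_hour / 3600 * wall_time_s
                              / total_tokens * 1e6,
    }

# Example: 4 requests completed over a 2-second window
traces = [
    {"latency_s": 0.8, "output_tokens": 100},
    {"latency_s": 1.0, "output_tokens": 120},
    {"latency_s": 1.2, "output_tokens": 110},
    {"latency_s": 1.5, "output_tokens": 130},
]
print(summarize(traces, wall_time_s=2.0))
```

In practice these summaries would be computed per configuration and fed back into the tuning loop.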
Optimization Infrastructure
- Build systems for large-scale configuration sweeps and automated tuning
- Integrate runtime parameters, hardware constraints, and workload characteristics into search pipelines
- Maintain and extend the DeepHyper-based optimization pipeline
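In the actual stack, DeepHyper drives the search; the general shape of such a sweep can be sketched with a plain random search over runtime knobs. The knob names mirror common inference-server parameters, but the search space, objective model, and scores below are invented for illustration:

```python
import random

# Hypothetical search space; a real sweep would map these to runtime flags.
SEARCH_SPACE = {
    "max_num_seqs": [64, 128, 256, 512],
    "max_num_batched_tokens": [2048, 4096, 8192],
    "gpu_memory_utilization": [0.80, 0.85, 0.90, 0.95],
}

def objective(cfg):
    """Stand-in for a real benchmark run.

    A real pipeline would launch the runtime with cfg and measure tokens/sec;
    this toy model just rewards larger batches with enough token budget.
    """
    score = cfg["max_num_seqs"] * cfg["gpu_memory_utilization"]
    if cfg["max_num_batched_tokens"] < cfg["max_num_seqs"] * 16:
        score *= 0.5  # penalize an under-provisioned batched-token budget
    return score

def random_search(n_trials=20, seed=0):
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best_cfg, best_score = random_search()
print(best_cfg, best_score)
```

Frameworks like DeepHyper replace the random sampler with a model-guided search, which matters when each trial is an expensive benchmark run on real hardware.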
Customer Deployments
- Work directly on enterprise deployments running on modern AI accelerators
- Support benchmarking and optimization during customer onboarding
- Deliver performance improvements tailored to customer hardware environments
Hardware-Aware Systems Engineering
- Optimize workloads across GPUs including:
- NVIDIA H200 Tensor Core GPU
- NVIDIA B200 Tensor Core GPU
- AMD Instinct MI300X
- Understand memory hierarchy, GPU scheduling, and model parallelism strategies
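One reason memory hierarchy matters on these parts: batch-1 LLM decoding is typically bound by HBM bandwidth, since every generated token streams the full set of weights from memory. A back-of-envelope ceiling (bandwidth figures are approximate public specs; the 70B/FP8 workload is illustrative):

```python
# Approximate peak HBM bandwidth, TB/s (public spec figures, rounded).
HBM_BANDWIDTH_TB_S = {
    "NVIDIA H200": 4.8,
    "NVIDIA B200": 8.0,
    "AMD Instinct MI300X": 5.3,
}

def decode_ceiling_tok_s(model_params_b, bytes_per_param, bandwidth_tb_s):
    """Upper bound on batch-1 decode tokens/sec for a weight-streaming model:
    tokens/sec <= bandwidth / total weight bytes."""
    weight_bytes = model_params_b * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / weight_bytes

# A 70B-parameter model in FP8 (1 byte/param) on each accelerator
for name, bw in HBM_BANDWIDTH_TB_S.items():
    print(f"{name}: ~{decode_ceiling_tok_s(70, 1, bw):.0f} tok/s ceiling")
```

Estimates like this help decide whether a bottleneck calls for quantization, larger batches, or tensor parallelism before any profiling begins.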
Required Background
- 5+ years of experience in ML infrastructure or high-performance ML systems
- Deep experience with LLM inference runtimes
- Strong skills in:
- Performance profiling
- GPU utilization optimization
- Systems debugging
- Hands-on experience with:
- vLLM, SGLang, or similar inference runtimes
- GPU profiling tools
- Python + systems-level debugging
Nice to Have
- Experience working with large-scale inference serving systems
- Familiarity with GPU kernel profiling tools (Nsight, ROCm profiler)
- Experience with distributed inference or model parallelism
- Exposure to hyperparameter optimization frameworks such as DeepHyper
- Previous work with cutting-edge AI hardware deployments
What Makes This Role Unique
- Work directly with next-generation AI hardware
- Solve real performance problems on enterprise deployments
- Build the core optimization engine of PrimaLabs
- Close collaboration with founders and direct impact on customer success