Artificial Intelligence Engineer

Shunya Labs

5 - 10 years

Gurugram

Posted: 12/02/2026

Job Description

About Us

Shunya Labs is building the Voice AI Infrastructure Layer for enterprises, powering speech intelligence, conversational agents, and domain-specific voice applications across industries. Born from deep work in mental-health AI and built for global enterprise scale, our stack combines state-of-the-art ASR/TTS models with an open-weights philosophy, driving accuracy, privacy, and scalability.


About the Role

We're seeking an AI Systems Engineer with 5-10 years of experience who thrives at the intersection of AI model optimization, infrastructure engineering, and applied research.


You will evaluate, host, and optimize a wide range of AI models spanning ASR, LLMs, and multimodal systems, and build the orchestration layer that powers scalable, low-latency deployments.

This is a role for someone who's comfortable navigating ambiguity, researching emerging AI methods, and translating client requirements into robust, production-ready solutions.

You'll work across the full stack, from GPU inference tuning to React-based control dashboards, building a resilient and scalable AI delivery platform.


Key Responsibilities

AI Model Evaluation & Optimization

  • Evaluate, benchmark, and optimize AI models (speech, text, vision, multimodal) for latency, throughput, and accuracy.
  • Implement advanced inference optimizations using ONNX Runtime, TensorRT, quantization, and GPU batching.
  • Continuously research and experiment with the latest AI runtimes, serving frameworks, and model architectures.
  • Develop efficient caching and model loading strategies for multi-tenant serving.


AI Infrastructure & Orchestration

  • Design and develop a central orchestration layer to manage multi-model inference, load balancing, and intelligent routing.
  • Build scalable, fault-tolerant deployments using AWS ECS/EKS, Lambda, and Terraform.
  • Use Kubernetes autoscaling and GPU node optimization to minimize latency under dynamic load.
  • Implement observability and monitoring (Prometheus, Grafana, CloudWatch) across the model-serving ecosystem.


DevOps, CI/CD & Automation

  • Build and maintain CI/CD pipelines for model integration, updates, and deployment (GitHub Actions, CodePipeline, etc.).
  • Manage Dockerized environments, version control, and GPU-enabled build pipelines.
  • Ensure reproducibility and resilience through infrastructure-as-code and automated testing.


Frontend & Developer Tools

  • Create React/Next.js-based dashboards for performance visualization, latency tracking, and configuration control.
  • Build intuitive internal tools for model comparison, experiment management, and deployment control.
  • Utilize Cursor, VS Code, and other AI-powered development tools to accelerate iteration.


Client Interaction & Solutioning

  • Work closely with clients and internal stakeholders to gather functional and performance requirements.
  • Translate abstract business needs into deployable AI systems with measurable KPIs.
  • Prototype quickly, iterate with feedback, and deliver robust production systems.


Research & Continuous Innovation

  • Stay on top of the latest AI research and model releases (OpenAI, Anthropic, Hugging Face, Meta, etc.).
  • Evaluate emerging frameworks for model serving, fine-tuning, and retrieval (LangChain, LlamaIndex, GraphRAG, etc.).
  • Proactively identify and implement performance or cost improvements in the model serving stack.
  • Share learnings and contribute to the internal AI knowledge base.


Ambiguous Problem Solving

  • Work effectively in undefined problem spaces, identifying optimal paths forward through experimentation.
  • Break down high-level goals into actionable technical strategies.
  • Balance trade-offs between accuracy, latency, and cost while innovating under uncertainty.


Required Skills

  • Strong proficiency in Python, TypeScript/JavaScript, Bash, and modern software development practices.
  • Deep understanding of Docker, Kubernetes, Terraform, and AWS (ECS, Lambda, S3, CloudFront).
  • Experience with inference optimization (ONNX, TensorRT, quantization, batching).
  • Proven ability to design and scale real-time inference pipelines.
  • Experience building and maintaining CI/CD pipelines and monitoring systems.
  • Hands-on experience with React/Next.js or similar frameworks for dashboard/UI development.
  • Strong grasp of API design, load balancing, and GPU resource management.


Nice to Have

  • Experience with LangChain, LlamaIndex, GraphRAG, or vector databases (FAISS, Neo4j).
  • Familiarity with speech processing models (Whisper, Silero, NeMo, etc.).
  • Prior work with serverless inference or edge AI architectures.
  • Knowledge of data pipelines, model versioning, and MLOps best practices.


Soft Skills

  • Excellent problem-solving in ambiguous, evolving environments.
  • Strong ability to research, self-learn, and prototype emerging AI technologies.
  • Confident communicator who can translate technical findings to business impact.
  • Ownership mindset with a collaborative, solution-oriented approach.
