Principal Machine Learning Engineer - Multimodal AI & Inference

Mulya Technologies

2 - 5 years

Bengaluru

Posted: 16/12/2025

Job Description

Founded in 2023 by industry veterans and headquartered in California, US.

  • We are revolutionizing sustainable AI compute through intuitive software with composable silicon.

Overview:

You will design, optimize, and deploy large multimodal models (language, vision, audio, video) to run efficiently on a compact, high-performance AI appliance capable of supporting 100B+ parameter models at real-time speeds. Your mission is to deliver state-of-the-art multimodal inference locally through advanced model optimization, quantization, and system-level integration.

Key Responsibilities:

1. Model Integration & Porting

  • Optimize large-scale foundation models (e.g., Llama, gpt-oss, Whisper, HiDream, Qwen, Wan) for on-device inference.
  • Adapt pre-trained models for multimodal tasks (text, image, audio, video, or cross-modal reasoning).
  • Ensure seamless interoperability between modalities, e.g., enabling the system to see, hear, and talk naturally.

2. Model Optimization for Edge Hardware

  • Quantize and compress large models (4-bit or mixed precision) while maintaining high accuracy and low latency.
  • Implement and benchmark inference runtimes using frameworks such as llama.cpp, Ollama, vLLM, and ONNX Runtime.
  • Collaborate with hardware engineers to co-design model architectures optimized for the appliance's compute fabric.

3. Inference Pipeline Development

  • Build and maintain scalable, high-throughput inference pipelines capable of handling concurrent multimodal requests (text, audio, image, video).
  • Implement token streaming, caching, and scheduling strategies for real-time responses.
  • Develop APIs for low-latency local inference accessible via a web interface.

4. Evaluation & Benchmarking

  • Profile and benchmark performance (throughput, latency, energy efficiency) of deployed models.
  • Run regression tests to validate numerical accuracy after quantization or pruning.
  • Define KPIs for multimodal model performance under real-world usage.

5. Research & Prototyping

  • Investigate emerging multimodal architectures and lightweight model variants for local deployment.
  • Prototype hybrid models that combine LLMs, diffusion models, and ASR/TTS pipelines for advanced multimodal applications.
  • Stay current on state-of-the-art inference frameworks, compression techniques, and multimodal learning trends.

Required Qualifications:

  • Strong background in deep learning and model deployment, with hands-on experience in PyTorch and/or TensorFlow.
  • Expertise in model optimization: quantization, pruning, distillation, or mixed-precision inference.
  • Practical knowledge of inference engines (vLLM, llama.cpp, ONNX Runtime or similar).
  • Experience deploying large models locally or on edge devices with limited memory and compute.
  • Familiarity with multimodal model architectures, e.g., CLIP, Flamingo, LLaVA, or AudioGPT-style systems.
  • Strong software engineering skills (Python, C++, CUDA) and experience integrating models into production systems.
  • Understanding of GPU/accelerator utilization, memory bandwidth optimization, and distributed inference.

Preferred Qualifications:

  • Experience: 10+ years.
  • Experience with model-parallel or tensor-parallel inference at scale.
  • Contributions to open-source inference frameworks or model serving systems.
  • Familiarity with hardware-aware training or co-optimization of neural networks and hardware.
  • Background in speech, vision, or multimodal ML research.
  • Track record of deploying models that run entirely offline or on embedded/edge systems.


Contact:

Uday

Mulya Technologies

"Mining The Knowledge Community"
