Artificial Intelligence Engineer
Recro
2 - 5 years
Bengaluru
Posted: 08/01/2026
Job Description
The Role
You'll be the architect and owner of Neo's AI infrastructure. This means training custom models for our unique use cases, building production ML pipelines, and creating the reasoning systems that make Neo intelligent. You'll work across the full ML lifecycle - from data pipelines to model training to production deployment.
What You'll Own
1. Custom Model Development & Training
Build specialized models that foundation models can't provide. Train speaker diarization for Indian accents, fine-tune embedding models for conversational memory, develop custom NER for Hindi-English code-mixing, and optimize models for edge deployment.
Key Challenges :
Train speaker diarization models on Indian multi-speaker conversations with code-mixing
Fine-tune embedding models for semantic search across temporal context
Build custom NER/entity linking for Hindi-English mixed conversations
Optimize transformer models for mobile deployment with <100ms latency
Handle class imbalance in emotion detection and intent classification
Tech Stack : PyTorch/TensorFlow for model training, Hugging Face for fine-tuning, ONNX/TensorRT for optimization
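To give a flavor of the class-imbalance challenge above: one common baseline is inverse-frequency class weighting, where rare classes (say, "sad" in emotion detection) get a larger loss weight. The sketch below is pure Python and illustrative only; the label names are hypothetical, and in practice the resulting weights would typically feed something like PyTorch's `CrossEntropyLoss(weight=...)`.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency.

    Weights are scaled so that the weighted sample count equals the raw
    sample count, keeping the overall loss magnitude comparable to an
    unweighted run.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {c: total / (n_classes * counts[c]) for c in counts}

# Hypothetical emotion-detection label set, heavily skewed toward "neutral"
labels = ["neutral"] * 80 + ["angry"] * 15 + ["sad"] * 5
weights = inverse_frequency_weights(labels)
```

Rarer classes receive proportionally larger weights, so their gradient contribution is no longer drowned out by the majority class.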
2. Memory Architecture & ML Pipeline
Build the brain that remembers everything. Design temporal knowledge graphs that ingest conversations, extract entities and relationships using custom-trained models, and enable longitudinal pattern detection. Own the full ML pipeline from data ingestion to model inference to graph updates.
Key Challenges :
Bi-temporal data models with real-time updates
Entity linking across noisy conversational transcripts
Relationship extraction using fine-tuned sequence models
Pattern detection with unsupervised learning (clustering, anomaly detection)
Privacy-preserving embeddings and federated learning
Tech Stack : PyTorch for custom models, Neo4j/graph databases, vector databases (Qdrant), streaming pipelines
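At the core of the memory search described above is embedding similarity. A minimal sketch of cosine-similarity retrieval over stored conversation vectors follows; it is pure Python with toy two-dimensional vectors (a vector database like Qdrant would handle this at scale with approximate nearest-neighbor indexes, and real embeddings would come from a trained model).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec, memory, top_k=2):
    """Rank stored (conversation_id, embedding) pairs by similarity to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in memory]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]
```

The same ranking logic applies whether the memory holds three toy vectors or millions of conversation embeddings; only the index structure changes.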
3. Audio Processing & Speech ML
Own the end-to-end speech pipeline. Train/fine-tune ASR models for Indian languages, build speaker diarization systems, develop audio quality assessment models, and optimize for edge deployment. Handle the unique challenges of Indian conversational speech.
Key Challenges :
Fine-tune Whisper/wav2vec2 for 15+ Indian languages with code-mixing
Train speaker diarization models handling overlapping speech
Build voice activity detection for noisy environments
Develop audio quality assessment using CNNs
Optimize models for real-time mobile inference (quantization, pruning)
Tech Stack : PyTorch, TorchAudio, Kaldi, ESPnet, model compression techniques
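To illustrate the quantization technique named above: symmetric int8 quantization maps each float weight to an integer in [-127, 127] via a single per-tensor scale. The pure-Python sketch below shows the arithmetic only; production work would use a framework's quantization toolkit (per-channel scales, calibration, quantization-aware training).

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~ scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float weights from int8 values and the scale."""
    return [scale * v for v in q]
```

The round-trip error per weight is bounded by half the scale, which is the basic accuracy/size trade-off that quantization-aware fine-tuning then recovers.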
4. Intelligence & Reasoning Layer
Create the query understanding and reasoning system. Build hybrid retrieval combining dense embeddings with graph traversal, train ranking models for result quality, develop proactive insight detection, and fine-tune LLMs for conversational queries.
Key Challenges :
Train re-ranking models for temporal query results
Fine-tune LLMs for Hindi-English conversational queries
Build classification models for query intent and temporal scope
Develop anomaly detection for proactive insights
Handle distribution shift as user behavior evolves
Tech Stack : PyTorch, sentence-transformers, LLM fine-tuning (LoRA, QLoRA), scikit-learn
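One simple way to combine dense-embedding results with graph-traversal results, as the hybrid retrieval above requires, is reciprocal rank fusion (RRF). The sketch below is a minimal pure-Python version with hypothetical conversation IDs; a trained re-ranker would typically sit on top of this fused candidate list.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists: score(d) = sum over lists of 1 / (k + rank).

    `rankings` is a list of ranked doc-id lists (e.g. one from dense search,
    one from graph traversal). k=60 is the conventional smoothing constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear near the top of both retrievers outrank documents that only one retriever surfaces, without requiring the two score scales to be comparable.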
5. Multi-Agent Systems & Orchestration
Design agent orchestration where specialized AI agents collaborate. Train classifier models for routing queries, build reward models for agent evaluation, develop action prediction models, and create meta-learning systems that improve over time.
Key Challenges :
Train intent classification for agent routing
Build RL-based systems for multi-step action planning
Develop evaluation models for agent output quality
Create meta-learning pipelines for continuous improvement
Handle conflicting agent recommendations with trained arbitration models
Tech Stack : PyTorch, Ray for distributed training, custom RL implementations
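Before a trained arbitration model exists, a reasonable baseline for resolving conflicting agent recommendations is a confidence-weighted vote. The sketch below is pure Python with hypothetical agent and action names; a learned arbitration model would replace the raw confidence sum with calibrated, per-agent learned weights.

```python
def arbitrate(recommendations):
    """Pick among conflicting agent outputs by summing confidence per action.

    `recommendations` is a list of (agent_name, action, confidence) triples.
    Ties and calibration are ignored here; a trained model would handle both.
    """
    totals = {}
    for _agent, action, conf in recommendations:
        totals[action] = totals.get(action, 0.0) + conf
    return max(totals, key=totals.get)
```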
6. NeoCore SDK & ML Infrastructure
Build enterprise ML APIs with custom model serving. Design multi-tenant architecture with model versioning, build A/B testing infrastructure, implement model monitoring and drift detection, and create auto-scaling inference pipelines.
Key Challenges :
Sub-100ms inference at scale with model optimization
Multi-tenant model serving with resource isolation
A/B testing infrastructure for model experiments
Automated retraining pipelines on concept drift
Custom domain fine-tuning for enterprise clients
Tech Stack : FastAPI, model serving (TorchServe, TensorFlow Serving), MLOps tools, Docker/K8s
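For the drift-detection challenge above, a common monitoring statistic is the Population Stability Index (PSI) between a baseline score distribution and the live one. A minimal pure-Python sketch follows; the alert thresholds are the conventional rule of thumb and would need tuning per model.

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline ('expected') and a live ('actual') distribution.

    Rule of thumb (an assumption, tune per model): PSI < 0.1 stable,
    0.1-0.25 moderate shift, > 0.25 drift worth a retraining trigger.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def fractions(data):
        counts = [0] * bins
        for x in data:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        n = len(data)
        # small floor avoids log(0) for empty bins
        return [max(c / n, 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Wired into the retraining pipeline, a PSI check on model inputs or output scores is a cheap first signal that the live distribution has moved away from training data.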
Technical Stack You'll Master
ML/DL Frameworks : PyTorch (primary), TensorFlow/Keras, JAX
Model Training : Distributed training, mixed precision, gradient accumulation, hyperparameter tuning
Model Optimization : Quantization, pruning, distillation, ONNX, TensorRT
MLOps : Experiment tracking (Weights & Biases, MLflow), model versioning, CI/CD for ML
Speech/NLP : Transformers, wav2vec2, Whisper, BERT variants, custom architectures
Traditional ML : Scikit-learn, XGBoost, clustering, dimensionality reduction
Infrastructure : Python async, distributed systems, GPU optimization, streaming pipelines
Data : Graph databases, vector databases, real-time analytics
What Success Looks Like
3 Months :
Custom speaker diarization model in production with >85% accuracy
Fine-tuned embedding model powering memory search
ML pipeline processing 10K+ conversations daily with <500ms latency
First enterprise deployments live
6 Months :
Edge-optimized models reducing cloud inference costs by 60%
Proactive insight detection using unsupervised learning
Multi-agent workflows with trained routing and arbitration
A/B testing infrastructure validating model improvements
12 Months :
Automated retraining pipelines maintaining model quality
You've built an ML engineering team
Core AI systems are defensible competitive moats
Models outperform generic foundation models on domain tasks
Who You Are
Must-Have:
2-5 years building and deploying ML/DL models in production serving real users at scale
Strong PyTorch or TensorFlow expertise : training, optimization, debugging, deployment
End-to-end ML ownership : data pipeline → model training → production → monitoring → iteration
Deep learning fundamentals : architectures (CNNs, RNNs, Transformers), optimization, regularization
Production ML systems : model serving, A/B testing, monitoring, retraining pipelines
Python expert : async programming, optimization, profiling, debugging
System design : distributed systems, high throughput, low latency, GPU optimization
Pragmatic builder : ship fast, validate with data, iterate based on metrics
Strong Plus:
Speech processing (ASR, diarization, TTS) or NLP (NER, embeddings, generation)
Knowledge graphs and graph neural networks
Model compression and edge deployment (quantization, pruning, distillation)
LLM fine-tuning (LoRA, RLHF, prompt engineering)
Multi-agent systems and reinforcement learning
Indian language experience (Hindi, Tamil, Telugu, etc.)
Open-source ML contributions or research publications
Experience with Hugging Face ecosystem
Why This Role is Special
Greenfield ML Problems : Train models for problems that don't have pre-trained solutions - Indian accent diarization, Hindi-English entity linking, temporal conversation understanding. Build from first principles.
Own the Full Stack : Not just calling APIs. Train models, build data pipelines, optimize for edge, deploy at scale, monitor quality, iterate based on metrics.
Founding Team Equity : Meaningful equity in a fast-growing startup defining a new category.
Exceptional Team : Work with technical founders (IIT Madras AI background) who understand ML deeply. Small team, high autonomy, first-principles thinking.
Real Impact : Your models power how families stay connected, professionals manage relationships, and enterprises build conversation intelligence.
Market Timing : Ambient computing is nascent. The models you build will set standards for conversational AI infrastructure.
What We Offer
Location : Bangalore (Onsite - we ship hardware, need to be hands-on)
Culture : High autonomy, ship-focused, weekly demos, direct feedback
Perks : Learning budget, conference passes, MacBook Pro + GPU workstation, full ML experimentation budget
Equity : Meaningful ownership in a fast-growing startup
How We Work
Ship weekly : Models reach production every week, not quarters
First principles : Question assumptions, validate with ablation studies
Deep work : Protected focus blocks for training runs, batched meetings
Direct communication : No corporate BS, honest technical feedback
AI-assisted development : Leverage Claude/Copilot for 3-4x productivity
Experiment rigorously : Track everything, A/B test model changes, data-driven decisions
Interview Stages
1. Initial Screening (30 min) : Chat about your ML background and approach to a real Neo problem
2. Technical Deep Dive (2 hours) :
ML fundamentals discussion (architectures, optimization, debugging)
System design for ML at scale
Coding: implement a model component in PyTorch
Live model debugging/optimization exercise
3. Founder Chat (1 hour) : Team meet, vision alignment, compensation discussion
Real Problems You'll Solve (Examples)
1. Train a speaker diarization model that handles 4+ speakers in Hindi-English code-mixed conversations with background noise
2. Fine-tune an embedding model for semantic search where "What did Sarah say about the budget?" retrieves conversations from 3 months ago
3. Build a temporal NER system that links "my manager" mentioned today to "Priya" from last week's conversation
4. Optimize a Transformer model from 200ms to <50ms latency for mobile deployment without accuracy loss
5. Design an RL system where agents learn to proactively remind users of forgotten commitments
These aren't interview questions. These are Tuesday problems.
