Artificial Intelligence Engineer
Recro
2 - 5 years
Bengaluru
Posted: 08/01/2026
Job Description
The Role
You'll be the architect and owner of Neo's AI infrastructure. This means training custom models for our unique use cases, building production ML pipelines, and creating the reasoning systems that make Neo intelligent. You'll work across the full ML lifecycle - from data pipelines to model training to production deployment.
What You'll Own
1. Custom Model Development & Training
Build specialized models that foundation models can't provide. Train speaker diarization for Indian accents, fine-tune embedding models for conversational memory, develop custom NER for Hindi-English code-mixing, and optimize models for edge deployment.
Key Challenges :
Train speaker diarization models on Indian multi-speaker conversations with code-mixing
Fine-tune embedding models for semantic search across temporal context
Build custom NER/entity linking for Hindi-English mixed conversations
Optimize transformer models for mobile deployment with <100ms latency
Handle class imbalance in emotion detection and intent classification
Tech Stack : PyTorch/TensorFlow for model training, Hugging Face for fine-tuning, ONNX/TensorRT for optimization
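To give a flavor of the class-imbalance challenge above: one common baseline is inverse-frequency class weighting, where rare classes (say, "sad" in emotion detection) get a larger loss weight. The sketch below is pure Python and illustrative only; the label names are hypothetical, and in practice the resulting weights would typically feed something like PyTorch's `CrossEntropyLoss(weight=...)`.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency.

    Weights are scaled so that the weighted sample count equals the raw
    sample count, keeping the overall loss magnitude comparable to an
    unweighted run.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {c: total / (n_classes * counts[c]) for c in counts}

# Hypothetical emotion-detection label set, heavily skewed toward "neutral"
labels = ["neutral"] * 80 + ["angry"] * 15 + ["sad"] * 5
weights = inverse_frequency_weights(labels)
```

Rarer classes receive proportionally larger weights, so their gradient contribution is no longer drowned out by the majority class.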
2. Memory Architecture & ML Pipeline
Build the brain that remembers everything. Design temporal knowledge graphs that ingest conversations, extract entities and relationships using custom-trained models, and enable longitudinal pattern detection. Own the full ML pipeline from data ingestion to model inference to graph updates.
Key Challenges :
Bi-temporal data models with real-time updates
Entity linking across noisy conversational transcripts
Relationship extraction using fine-tuned sequence models
Pattern detection with unsupervised learning (clustering, anomaly detection)
Privacy-preserving embeddings and federated learning
Tech Stack : PyTorch for custom models, Neo4j/graph databases, vector databases (Qdrant), streaming pipelines
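At the core of the memory search described above is embedding similarity. A minimal sketch of cosine-similarity retrieval over stored conversation vectors follows; it is pure Python with toy two-dimensional vectors (a vector database like Qdrant would handle this at scale with approximate nearest-neighbor indexes, and real embeddings would come from a trained model).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec, memory, top_k=2):
    """Rank stored (conversation_id, embedding) pairs by similarity to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in memory]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]
```

The same ranking logic applies whether the memory holds three toy vectors or millions of conversation embeddings; only the index structure changes.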
3. Audio Processing & Speech ML
Own the end-to-end speech pipeline. Train/fine-tune ASR models for Indian languages, build speaker diarization systems, develop audio quality assessment models, and optimize for edge deployment. Handle the unique challenges of Indian conversational speech.
Key Challenges :
Fine-tune Whisper/wav2vec2 for 15+ Indian languages with code-mixing
Train speaker diarization models handling overlapping speech
Build voice activity detection for noisy environments
Develop audio quality assessment using CNNs
Optimize models for real-time mobile inference (quantization, pruning)
Tech Stack : PyTorch, TorchAudio, Kaldi, ESPnet, model compression techniques
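To illustrate the quantization technique named above: symmetric int8 quantization maps each float weight to an integer in [-127, 127] via a single per-tensor scale. The pure-Python sketch below shows the arithmetic only; production work would use a framework's quantization toolkit (per-channel scales, calibration, quantization-aware training).

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~ scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float weights from int8 values and the scale."""
    return [scale * v for v in q]
```

The round-trip error per weight is bounded by half the scale, which is the basic accuracy/size trade-off that quantization-aware fine-tuning then recovers.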
4. Intelligence & Reasoning Layer
Create the query understanding and reasoning system. Build hybrid retrieval combining dense embeddings with graph traversal, train ranking models for result quality, develop proactive insight detection, and fine-tune LLMs for conversational queries.
Key Challenges :
Train re-ranking models for temporal query results
Fine-tune LLMs for Hindi-English conversational queries
Build classification models for query intent and temporal scope
Develop anomaly detection for proactive insights
Handle distribution shift as user behavior evolves
Tech Stack : PyTorch, sentence-transformers, LLM fine-tuning (LoRA, QLoRA), scikit-learn
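One simple way to combine dense-embedding results with graph-traversal results, as the hybrid retrieval above requires, is reciprocal rank fusion (RRF). The sketch below is a minimal pure-Python version with hypothetical conversation IDs; a trained re-ranker would typically sit on top of this fused candidate list.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists: score(d) = sum over lists of 1 / (k + rank).

    `rankings` is a list of ranked doc-id lists (e.g. one from dense search,
    one from graph traversal). k=60 is the conventional smoothing constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear near the top of both retrievers outrank documents that only one retriever surfaces, without requiring the two score scales to be comparable.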
5. Multi-Agent Systems & Orchestration
Design agent orchestration where specialized AI agents collaborate. Train classifier models for routing queries, build reward models for agent evaluation, develop action prediction models, and create meta-learning systems that improve over time.
Key Challenges :
Train intent classification for agent routing
Build RL-based systems for multi-step action planning
Develop evaluation models for agent output quality
Create meta-learning pipelines for continuous improvement
Handle conflicting agent recommendations with trained arbitration models
Tech Stack : PyTorch, Ray for distributed training, custom RL implementations
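Before a trained arbitration model exists, a reasonable baseline for resolving conflicting agent recommendations is a confidence-weighted vote. The sketch below is pure Python with hypothetical agent and action names; a learned arbitration model would replace the raw confidence sum with calibrated, per-agent learned weights.

```python
def arbitrate(recommendations):
    """Pick among conflicting agent outputs by summing confidence per action.

    `recommendations` is a list of (agent_name, action, confidence) triples.
    Ties and calibration are ignored here; a trained model would handle both.
    """
    totals = {}
    for _agent, action, conf in recommendations:
        totals[action] = totals.get(action, 0.0) + conf
    return max(totals, key=totals.get)
```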
6. NeoCore SDK & ML Infrastructure
Build enterprise ML APIs with custom model serving. Design multi-tenant architecture with model versioning, build A/B testing infrastructure, implement model monitoring and drift detection, and create auto-scaling inference pipelines.
Key Challenges :
Sub-100ms inference at scale with model optimization
Multi-tenant model serving with resource isolation
A/B testing infrastructure for model experiments
Automated retraining pipelines on concept drift
Custom domain fine-tuning for enterprise clients
Tech Stack : FastAPI, model serving (TorchServe, TensorFlow Serving), MLOps tools, Docker/K8s
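For the drift-detection challenge above, a common monitoring statistic is the Population Stability Index (PSI) between a baseline score distribution and the live one. A minimal pure-Python sketch follows; the alert thresholds are the conventional rule of thumb and would need tuning per model.

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline ('expected') and a live ('actual') distribution.

    Rule of thumb (an assumption, tune per model): PSI < 0.1 stable,
    0.1-0.25 moderate shift, > 0.25 drift worth a retraining trigger.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def fractions(data):
        counts = [0] * bins
        for x in data:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        n = len(data)
        # small floor avoids log(0) for empty bins
        return [max(c / n, 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Wired into the retraining pipeline, a PSI check on model inputs or output scores is a cheap first signal that the live distribution has moved away from training data.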
Technical Stack You'll Master
ML/DL Frameworks : PyTorch (primary), TensorFlow/Keras, JAX
Model Training : Distributed training, mixed precision, gradient accumulation, hyperparameter tuning
Model Optimization : Quantization, pruning, distillation, ONNX, TensorRT
MLOps : Experiment tracking (Weights & Biases, MLflow), model versioning, CI/CD for ML
Speech/NLP : Transformers, wav2vec2, Whisper, BERT variants, custom architectures
Traditional ML : Scikit-learn, XGBoost, clustering, dimensionality reduction
Infrastructure : Python async, distributed systems, GPU optimization, streaming pipelines
Data : Graph databases, vector databases, real-time analytics
What Success Looks Like
3 Months :
Custom speaker diarization model in production with >85% accuracy
Fine-tuned embedding model powering memory search
ML pipeline processing 10K+ conversations daily with <500ms latency
First enterprise deployments live
6 Months :
Edge-optimized models reducing cloud inference costs by 60%
Proactive insight detection using unsupervised learning
Multi-agent workflows with trained routing and arbitration
A/B testing infrastructure validating model improvements
12 Months :
Automated retraining pipelines maintaining model quality
You've built an ML engineering team
Core AI systems are defensible competitive moats
Models outperform generic foundation models on domain tasks
Who You Are
Must-Have:
2-5 years building and deploying ML/DL models in production serving real users at scale
Strong PyTorch or TensorFlow expertise : training, optimization, debugging, deployment
End-to-end ML ownership : data pipeline → model training → production → monitoring → iteration
Deep learning fundamentals : architectures (CNNs, RNNs, Transformers), optimization, regularization
Production ML systems : model serving, A/B testing, monitoring, retraining pipelines
Python expert : async programming, optimization, profiling, debugging
System design : distributed systems, high throughput, low latency, GPU optimization
Pragmatic builder : ship fast, validate with data, iterate based on metrics
Strong Plus:
Speech processing (ASR, diarization, TTS) or NLP (NER, embeddings, generation)
Knowledge graphs and graph neural networks
Model compression and edge deployment (quantization, pruning, distillation)
LLM fine-tuning (LoRA, RLHF, prompt engineering)
Multi-agent systems and reinforcement learning
Indian language experience (Hindi, Tamil, Telugu, etc.)
Open-source ML contributions or research publications
Experience with Hugging Face ecosystem
Why This Role is Special
Greenfield ML Problems : Train models for problems that don't have pre-trained solutions - Indian accent diarization, Hindi-English entity linking, temporal conversation understanding. Build from first principles.
Own the Full Stack : Not just calling APIs. Train models, build data pipelines, optimize for edge, deploy at scale, monitor quality, iterate based on metrics.
Founding Team Equity : Meaningful equity in a fast-growing startup defining a new category.
Exceptional Team : Work with technical founders (IIT Madras AI background) who understand ML deeply. Small team, high autonomy, first-principles thinking.
Real Impact : Your models power how families stay connected, professionals manage relationships, and enterprises build conversation intelligence.
Market Timing : Ambient computing is nascent. The models you build will set standards for conversational AI infrastructure.
What We Offer
Location : Bangalore (Onsite - we ship hardware, need to be hands-on)
Culture : High autonomy, ship-focused, weekly demos, direct feedback
Perks : Learning budget, conference passes, MacBook Pro + GPU workstation, full ML experimentation budget
Equity : Meaningful ownership in a fast-growing startup
How We Work
Ship weekly : Models reach production every week, not quarters
First principles : Question assumptions, validate with ablation studies
Deep work : Protected focus blocks for training runs, batched meetings
Direct communication : No corporate BS, honest technical feedback
AI-assisted development : Leverage Claude/Copilot for 3-4x productivity
Experiment rigorously : Track everything, A/B test model changes, data-driven decisions
Interview Stages
1. Initial Screening (30 min) : Chat about your ML background and approach to a real Neo problem
2. Technical Deep Dive (2 hours) :
ML fundamentals discussion (architectures, optimization, debugging)
System design for ML at scale
Coding: implement a model component in PyTorch
Live model debugging/optimization exercise
3. Founder Chat (1 hour) : Team meet, vision alignment, compensation discussion
Real Problems You'll Solve (Examples)
1. Train a speaker diarization model that handles 4+ speakers in Hindi-English code-mixed conversations with background noise
2. Fine-tune an embedding model for semantic search where "What did Sarah say about the budget?" retrieves conversations from 3 months ago
3. Build a temporal NER system that links "my manager" mentioned today to "Priya" from last week's conversation
4. Optimize a Transformer model from 200ms to <50ms latency for mobile deployment without accuracy loss
5. Design an RL system where agents learn to proactively remind users of forgotten commitments
These aren't interview questions. These are Tuesday problems.
