Senior AI Engineer (LLM Systems & Evaluation)
BambooBox
5 - 10 years
Bengaluru
Posted: 05/03/2026
Job Description
BambooBox is a 5-year-old B2B SaaS start-up offering a platform for growth marketers. With its advanced AI and ML-driven platform, BambooBox helps companies deliver untapped marketing intelligence to business teams and increase marketing's contribution to overall revenue. We work with clients like Airtel Business, Darwinbox, Acalvio, and others. We are backed by investors such as Peak XV (earlier Sequoia Surge), Emergent Ventures, and Arc180. We are currently scaling the organization and are in a high-growth phase.
Job Location: Bangalore
Job Type: Work from office
Role Overview:
We are looking for a Senior AI Engineer who can design, build, evaluate, and scale LLM-powered systems in production. This role goes beyond prompt engineering: you will work on model orchestration, evaluation frameworks, retrieval pipelines, and the reliability of AI systems.
You will collaborate closely with product, backend, and data teams to ship high-impact AI features that are measurable, reliable, and continuously improving.
Key Responsibilities
LLM & AI System Development
Design and implement LLM-powered applications using APIs from OpenAI, Anthropic, Gemini, or similar
Build agentic workflows, tool-calling pipelines, and multi-step reasoning systems
Integrate LLMs with RAG pipelines (vector databases, embeddings, chunking strategies, retrieval tuning)
Optimize prompts, system instructions, and context windows for accuracy, cost, and latency
Evaluation & Quality
Design robust evaluation frameworks for LLM outputs (accuracy, relevance, faithfulness, toxicity, hallucination)
Build and maintain offline and online evals (gold datasets, synthetic evals, regression tests)
Define success metrics and guardrails for AI features
Analyze failure cases and iteratively improve system performance
Production & Reliability
Ship AI systems to production with monitoring, logging, and alerting
Handle model drift, prompt regressions, and API changes
Implement fallback strategies, retries, and safety controls
Optimize cost vs quality trade-offs across models and workflows
Collaboration & Ownership
Work closely with Product to translate business problems into AI system designs
Review code, mentor junior engineers, and contribute to architectural decisions
Document design decisions, evaluation methodology, and learnings
Required Skills & Experience
Core Requirements
3-6 years of software engineering or ML/AI engineering experience
Hands-on experience with LLM APIs (OpenAI, Anthropic, Gemini, etc.)
Strong experience building production-grade AI systems
Proven experience designing LLM evaluation frameworks
Solid understanding of:
Prompt engineering patterns
RAG architectures
Embeddings and vector databases
Token limits, latency, and cost optimization
Programming & Systems
Strong proficiency in Python
Experience with async systems, APIs, and microservices
Familiarity with REST, event-driven systems, or distributed systems
Experience deploying AI services (Docker, Kubernetes, or cloud services)
Good to Have (Strong Plus)
Experience with LLM eval tools (custom evals, LangSmith, Ragas, OpenAI Evals, etc.)
Experience building agent frameworks or multi-agent systems
Knowledge of retrieval tuning, re-ranking, and hybrid search
Experience with observability for AI systems
Exposure to security, privacy, and safety considerations for AI
Experience in B2B SaaS or data-heavy systems
What We Value
Strong engineering discipline: tests, evals, metrics, and reliability
Comfort with ambiguity and fast iteration
Systems thinking over demo-driven AI
Bias for ownership and measurable outcomes
Curiosity to keep up with rapidly evolving LLM capabilities
