Artificial Intelligence Engineer
TerraGiG
2 - 5 years
Pune
Posted: 31/01/2026
Job Description
Role: GenAI Engineer
Experience: 5+ years
Work Mode: Onsite
The client is looking for a GenAI Engineer who can design and deploy end-to-end solutions. Key focus areas include Vector Embeddings, RAG Pipelines, and FastAPI integration. Please review the full Job Description below and focus your preparation on the "Deep-Dive" topics provided.
Full Job Description
Role Overview
We are seeking a GenAI Engineer to design, develop, and deploy Generative AI solutions that enhance business workflows and user experiences. The ideal candidate will have strong expertise in LLMs (Large Language Models), prompt engineering, and integration of AI services into scalable applications.
Key Responsibilities
Model Integration: Implement/fine-tune LLMs; build APIs/microservices for GenAI features.
Prompt Engineering: Design, optimize, and evaluate prompts for safety and accuracy.
RAG (Retrieval-Augmented Generation): Develop pipelines for document ingestion, vector embeddings, and semantic search.
App Dev: Integrate GenAI into web/mobile apps using FastAPI, Streamlit, or React.
Optimization: Monitor token usage, latency, and inference costs.
Safety: Implement moderation, bias detection, and responsible AI guidelines.
Required Skills
Python (FastAPI, Flask, Django), LLM APIs (OpenAI, Azure), Vector DBs (Pinecone, Weaviate, FAISS).
Cloud (AWS/Azure/GCP), Docker/K8s, ML fundamentals (embeddings, tokenization).
Real-time AI (SSE/WebSockets).
Preferred Skills
LangChain, LlamaIndex, Image models (Stable Diffusion), MLOps, CI/CD.
Technical Deep-Dive: Vector Embeddings
Since the JD specifically asks for knowledge of embeddings and vector databases, your engineers should be prepared to answer the following:
1. Conceptual Understanding
What are they? They are high-dimensional numerical representations of data (text, images, audio). Unlike keyword search, embeddings capture semantic meaning.
Dimensionality: Be familiar with common sizes (e.g., OpenAI's text-embedding-3-small is 1536-dimensional).
Distance Metrics: Know when to use Cosine Similarity (directional similarity) vs. Euclidean Distance (magnitude-based) vs. Dot Product.
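To make the distinction between these metrics concrete, here is a minimal stdlib-only sketch (the example vectors are hypothetical, chosen so the two vectors point the same way but differ in magnitude):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def cosine_similarity(a, b):
    # Directional similarity: ignores magnitude, range [-1, 1].
    return dot(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    # Magnitude-sensitive: 0 only when the vectors are identical.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Same direction, different magnitudes:
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]

print(cosine_similarity(a, b))   # 1.0 -- identical direction
print(euclidean_distance(a, b))  # nonzero -- magnitudes differ
print(dot(a, b))                 # 28.0 -- grows with both magnitudes
```

Cosine judges the two vectors identical while Euclidean distance does not, which is why cosine is the default for comparing text embeddings where magnitude carries little meaning.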
2. Implementation Challenges
Chunking: How to break a 100-page PDF into chunks so the embedding captures context without losing detail.
Normalization: Why we normalize vectors to unit length before storing them (on unit vectors, a cheap dot product is equivalent to Cosine Similarity).
Matryoshka Embeddings: (Advanced 2026 topic) Be able to explain how to shorten vectors (e.g., from 3072 to 256 dimensions) without a significant loss of accuracy, reducing storage costs.
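The normalization and truncation steps above can be sketched in a few lines. This is an illustration, not a production pipeline: the input vector is made up, and real Matryoshka-trained models (e.g., OpenAI's text-embedding-3 family) are what make truncation safe in practice.

```python
import math

def normalize(v):
    # Scale to unit length so a dot product equals cosine similarity.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def shorten(v, dims):
    # Matryoshka-style truncation: keep the first `dims` components,
    # then renormalize so cosine comparisons remain valid.
    return normalize(v[:dims])

vec = normalize([0.9, 0.3, 0.1, 0.05, 0.02, 0.01])
short = shorten(vec, 3)
print(len(short))                              # 3
print(sum(x * x for x in short))               # ~1.0 (unit length again)
```

The key point to articulate in an interview: truncation only preserves accuracy when the model was trained to front-load information into the leading dimensions, and the shortened vector must be renormalized before storage.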
Suggested Preparation Topics
Pillar 1: The RAG Pipeline
Indexing: The flow from Document -> Chunking -> Embedding -> Vector DB.
Retrieval: Explain Top-K retrieval and how to use "Re-ranking" models (like Cohere Rerank) to improve the quality of the top results.
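The indexing and retrieval flow above can be demonstrated end to end in miniature. This sketch uses a toy bag-of-words `embed` function as a stand-in for a real embedding model and a plain list as the "vector DB"; the document text is invented for illustration.

```python
import math

def embed(text, vocab):
    # Toy stand-in for a real embedding model: bag-of-words counts.
    words = text.lower().replace(".", "").split()
    return [float(words.count(w)) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# Indexing: Document -> Chunking -> Embedding -> "Vector DB"
doc = "FastAPI serves APIs. Pinecone stores vectors. Embeddings capture meaning."
chunks = [c.strip() + "." for c in doc.split(".") if c.strip()]
vocab = sorted({w for c in chunks for w in c.lower().replace(".", "").split()})
index = [(c, embed(c, vocab)) for c in chunks]

# Retrieval: rank every stored chunk by similarity, keep the Top-K.
def top_k(query, k=2):
    q = embed(query, vocab)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

print(top_k("where are vectors stored"))
```

In a real pipeline each stage is swapped for production tooling (a chunking library, an embedding API, Pinecone/Weaviate/FAISS), and a re-ranker such as Cohere Rerank would then re-score the Top-K candidates.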
Pillar 2: Engineering (The "Developer" part)
FastAPI: Be ready to code a basic endpoint that takes a user query and returns a streamed response using StreamingResponse.
Streaming (SSE): Explain why we use SSE for LLMs (to reduce "perceived latency" for the user).
Pillar 3: Evaluation & Operations
LLM-as-a-Judge: Using a stronger model (e.g., GPT-4o) to grade the outputs of a smaller model.
Token Management: How to implement a "sliding window" or "summary-based memory" to keep context without hitting token limits or high costs.
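A sliding window can be sketched as below. Assumptions are flagged in comments: the word-count tokenizer is a rough stand-in for a real one (e.g., tiktoken), and the message history is invented.

```python
def sliding_window(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    # Keep the most recent messages whose combined token cost fits the
    # budget. count_tokens is a word-count stand-in for a real tokenizer.
    window, total = [], 0
    for msg in reversed(messages):           # walk newest -> oldest
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break                            # older messages are dropped
        window.append(msg)
        total += cost
    return list(reversed(window))            # restore chronological order

history = [
    "user: hello there",
    "assistant: hi, how can I help?",
    "user: summarize our chat so far please",
]
print(sliding_window(history, max_tokens=12))  # keeps only the newest turn
```

Summary-based memory extends this idea: instead of silently dropping the older messages, they are compressed into a single summary message that stays in the window.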