Research Engineer - Video Generation
Pocket FM
2 - 5 years
Bengaluru
Posted: 17/02/2026
Job Description
About Pocket FM
Pocket FM is on a mission to deliver personalised and immersive audio experiences to listeners worldwide. We are revolutionising the audio entertainment industry through long-form storytelling, supported by our cutting-edge platform that serves millions of listeners and generates billions of minutes of engagement monthly. We leverage Generative AI in producing content and streamlining operations, developing innovative solutions for cutting-edge challenges in the AI landscape across all modalities: text, audio, and images. With strong backing and rapid user base growth, Pocket FM is an exciting and dynamic place to join.
The Role: What You'll Build and Own
- Design and implement an agentic orchestration framework that selects optimal video generation models per scene, constructs and refines prompts dynamically, decomposes episode-level goals into scene-level tasks, and manages generation, validation, and refinement loops
- Build a multi-agent system that can translate high-level episode briefs into structured scripts, break scripts into scenes, shots, and animation beats, select visual style, pacing, and emotional tone parameters, and trigger the appropriate video models and pipelines
- Develop automated prompt engineering strategies, model selection heuristics (or learned selection policies), self-refinement and critique loops, and quality control mechanisms (LLM- or vision-based evaluators)
- Create orchestration logic for scene continuity (character consistency, environment persistence), style preservation across the episode, temporal coherence, and budget / compute optimisation
- Build evaluation frameworks that measure narrative coherence, visual consistency, style fidelity, emotional alignment, and anime-specific quality metrics
- Optimise for minimal human intervention, scalable production, robust failure recovery, and reproducibility
The Ideal Candidate: Who You Are
You are someone who is experienced in building:
- An agent-based orchestration engine
- Automated prompt generation and refinement modules
- Model selection and routing layer
- Episode planner (hierarchical decomposition system)
- Feedback-driven improvement loops
- Evaluation and scoring system
- Production-ready pipeline for end-to-end anime episode generation
Your Technical Toolkit:
- Master's or PhD in Computer Science, AI, ML, or a related field
- Strong experience with Large Language Models (LLMs), multimodal generative models, prompt engineering and prompt optimisation, Python, and production ML systems
- Hands-on experience building agentic systems (e.g., ReAct, AutoGPT-style, planning agents), tool-using LLM systems, and orchestration pipelines
- Deep understanding of video generation models, model evaluation and benchmarking, and experimentation frameworks
Preferred Qualifications:
- Experience with video diffusion or text-to-video systems, character consistency techniques (LoRA, embeddings, adapters), scene planning or hierarchical generation, reinforcement learning or policy learning, and automated content evaluation systems
- Familiarity with anime production workflows, storyboarding, shot composition and pacing, diffusion models, and narrative structure
- Experience deploying distributed ML systems, GPU-accelerated pipelines, and cloud-based ML infrastructure
