About Pocket FM

Pocket FM is on a mission to deliver personalised and immersive audio experiences to listeners worldwide. We are revolutionising the audio entertainment industry through long-form storytelling, supported by our cutting-edge platform that serves millions of listeners and generates billions of minutes of engagement monthly. We leverage Generative AI in producing content and streamlining operations, developing innovative solutions for challenges in the AI landscape across all modalities: text, audio, and images. With strong backing and rapid user base growth, Pocket FM is an exciting and dynamic place to join.

The Role: What You'll Build and Own

Design and implement an agentic orchestration framework that selects optimal video generation models per scene, constructs and refines prompts dynamically, decomposes episode-level goals into scene-level tasks, and manages generation, validation, and refinement loops
Build a multi-agent system that can translate high-level episode briefs into structured scripts, break scripts into scenes, shots, and animation beats, select visual style, pacing, and emotional tone parameters, and trigger the appropriate video models and pipelines
Develop automated prompt engineering strategies, model selection heuristics (or learned selection policies), self-refinement and critique loops, and quality control mechanisms (LLM- or vision-based evaluators)
Create orchestration logic for scene continuity (character consistency, environment persistence), style preservation across the episode, temporal coherence, and budget/compute optimisation
Build evaluation frameworks that measure narrative coherence, visual consistency, style fidelity, emotional alignment, and anime-specific quality metrics
Optimise for minimal human intervention, scalable production, robust failure recovery, and reproducibility

The Ideal Candidate: Who You Are

You are someone who is experienced in building:

An agent-based orchestration engine
Automated prompt generation and refinement modules
Model selection and routing layer
Episode planner (hierarchical decomposition system)
Feedback-driven improvement loops
Evaluation and scoring system
Production-ready pipeline for end-to-end anime episode generation

Your Technical Toolkit:

Master's or PhD in Computer Science, AI, ML, or a related field
Strong experience with Large Language Models (LLMs), multimodal generative models, prompt engineering and prompt optimisation, Python, and production ML systems
Hands-on experience building agentic systems (e.g., ReAct, AutoGPT-style, planning agents), tool-using LLM systems, and orchestration pipelines
Deep understanding of video generation models, model evaluation and benchmarking, and experimentation frameworks

Preferred Qualifications:

Experience with video diffusion or text-to-video systems, character consistency techniques (LoRA, embeddings, adapters), scene planning or hierarchical generation, reinforcement learning or policy learning, and automated content evaluation systems
Familiarity with anime production workflows, storyboarding, shot composition and pacing, diffusion models, and narrative structure
Experience deploying distributed ML systems, GPU-accelerated pipelines, and cloud-based ML infrastructure

Research Engineer - Video Generation

Pocket FM

