Key Responsibilities

Core Video Analysis: Implement and optimize fundamental video processing algorithms, including Optical Flow, Motion Estimation, and Background Subtraction, to handle dynamic sports footage.
Spatio-Temporal Modeling: Develop models that understand time as well as space, utilizing 3D CNNs, RNNs/LSTMs, or Temporal Transformers to recognize actions and events over sequences of frames.
Object Tracking & Re-Identification: Build robust tracking pipelines using industry-standard algorithms (e.g., Kalman Filters, SORT, DeepSORT) to maintain the identities of players and objects across occlusions.
Advanced Architectures: Research and integrate state-of-the-art models, including Vision Transformers (ViT) and Attention Mechanisms, to improve accuracy beyond traditional CNN limits.
Agentic AI Workflows: Assist in designing Agentic AI systems where autonomous agents plan multi-step video analysis tasks (e.g., deciding when to focus on specific game events) with minimal human intervention.
Data & Pipeline Strategy: Manage video datasets and collaborate with the engineering team to deploy efficient inference pipelines.

Required Skills & Qualifications

Education: Bachelor's or Master's degree in Computer Science, AI, Data Science, or a related field.
Core Computer Vision: Strong understanding of traditional CV concepts:
  Image Geometry & Camera Calibration
  Feature Extraction (SIFT, SURF, ORB)
  Image Filtering & Edge Detection
Deep Learning for Video: In-depth knowledge of neural network architectures tailored for video:
  CNNs (ResNet, EfficientNet) for spatial features.
  Sequence Models (RNN, LSTM, GRU) for temporal dependencies.
  3D CNNs (C3D, I3D, X3D) for spatiotemporal feature learning.
Transformers & Attention: Understanding of Self-Attention mechanisms, Vision Transformers (ViT), and how they differ from convolutional approaches.
Programming: Proficiency in Python with libraries like OpenCV, NumPy, Pandas, and Scikit-learn.
Frameworks: Hands-on experience with PyTorch or TensorFlow.

Good to Have (Bonus)

Experience with Agentic AI frameworks (e.g., LangChain) applied to visual tasks.
Knowledge of Multimodal AI (Video + Audio/Text).
Familiarity with model optimisation tools (TensorRT, ONNX) for real-time video inference.

Application Link:

Computer Vision Engineer