Data Architect
GMG
2 - 5 years
Gurugram
Posted: 17/02/2026
Job Description
What we do:
GMG is a global well-being company retailing, distributing and manufacturing a portfolio of leading international and home-grown brands across sport, everyday goods, health and beauty, properties and logistics sectors. Under the ownership and management of the Baker family for over 45 years, GMG is a valued partner of choice for the world's most successful and respected brands in the well-being sector. Working across the Middle East, North Africa, and Asia, GMG has introduced more than 120 brands across 12 countries. These include notable home-grown brands such as Sun & Sand Sports, Dropkick, Supercare Pharmacy, Farm Fresh, Klassic, and international brands like Nike, Columbia, Converse, Timberland, Vans, Mama Sita's, and McCain.
What will you do:
We are hiring a Data Architect to own the end-to-end architecture and engineering standards of our data and AI platform. This is a hands-on individual contributor role with leadership responsibility for two engineers. You will design, implement, and operate scalable, secure, and cost-effective data infrastructure across Databricks on AWS, enabling analytics/BI, classical ML, and GenAI/Agentic AI workloads.
Role Summary:
- Own the data platform architecture (ingestion → lake/warehouse → serving) and its operating model.
- Lead implementation of infrastructure, orchestration, CI/CD, observability, quality, lineage, and governance.
- Architect and enable BI, MLOps, and Agentic AI platform capabilities.
- Evaluate and introduce fit-for-purpose tools (open-source preferred) to solve team challenges.
- Set engineering best practices and manage delivery through a small team.
Responsibilities:
Data platform & infrastructure ownership:
- Own platform architecture on AWS + Databricks, ensuring scalability, security, reliability, and cost efficiency.
- Define the target architecture across batch pipelines, streaming patterns, storage formats, and compute policies.
- Implement infrastructure-as-code using Terraform, including environments, networking dependencies (as needed), and platform configuration.
Architecture for BI, ML, and Agentic AI:
- Design architecture patterns for:
  - BI data serving and exports to downstream BI stacks (e.g., Fabric) through governed, performant datasets.
  - MLOps foundations: training/inference patterns (batch-first), model registry/versioning approach, monitoring integration.
  - Agentic AI infrastructure: secure retrieval patterns, tool access boundaries, prompt/tool governance, and audit logs (platform-level enablers, not use-case specifics).
- Ensure architectural decisions support both experimentation and production-grade operation.
Data engineering best practices & SDLC:
- Establish engineering standards: branching strategy, PR reviews, release/versioning, code quality gates, and documentation.
- Implement CI/CD for data pipelines and infrastructure; enforce Git-based workflows and environment promotion.
- Promote modular, reusable pipeline patterns and templates for the team.
Data quality, lineage, and governance:
- Implement quality frameworks: freshness/completeness/validity checks, anomaly detection on key measures.
- Establish lineage and metadata management; define how datasets are documented and discoverable.
- Own data classification (PII/sensitive), retention policies, and secure access patterns (RBAC/ABAC).
Tooling strategy (open-source preferred):
- Evaluate and introduce fit-for-purpose tools in areas such as:
  - Observability/monitoring
  - Data quality and testing
  - Lineage/catalog
  - Orchestration enhancements
  - Secrets management and policy enforcement
- Make pragmatic build-vs-buy decisions with clear TCO and operational fit.
Data modeling (added advantage):
- Guide and review modeling patterns (dimensional/entity models) to ensure consistent, reusable datasets for reporting, analytics and ML.
What does success look like:
- A stable, scalable platform with clear architectural standards and high engineering quality.
- Pipelines are reliable with defined SLAs/SLOs, strong observability, and reduced incident frequency.
- CI/CD and Git-based SDLC are adopted; changes are predictable, versioned, and easy to roll back.
- BI/ML/GenAI platform foundations are in place and are enabling faster delivery across teams.
- Measurable cost/performance improvements (job runtimes, compute spend, data freshness reliability).
- Your two engineers operate with clarity, quality, and autonomy under your guidance.
Technical Competencies:
- 10+ years in data engineering / data platform / data architecture roles with hands-on delivery.
- Proven ownership of end-to-end data platforms (lake/warehouse + orchestration + governance).
- Experience leading small teams and driving engineering standards and change management.
- Strong stakeholder management and ability to balance speed, quality, and control.
Required technical skills:
Mandatory:
- Databricks on AWS platform understanding (workloads, jobs, cluster policies, Delta/Lakehouse concepts).
- Strong Terraform (IaC) for cloud/platform infrastructure.
- Containerization & runtime: Docker, Kubernetes (deployment patterns, environment management).
- Orchestration: Airflow (DAG design, retries, backfills, SLAs).
- Data transformation practices (dbt familiarity preferred; tool-agnostic standards accepted).
- CI/CD implementation, Git workflows, branching/release strategy.
- Strong understanding of data platform concerns: ingestion, streaming concepts, outbound patterns, quality, lineage, retention, and classification.
- Security fundamentals: IAM/RBAC, secrets management, auditability, PII handling.
Good to have:
- Deep dbt experience (macros, tests, docs, environment promotion).
- Depth in Databricks Workflows / experience with Lakeflow Jobs.
- Experience with open-source tools in:
  - Data quality (e.g., Great Expectations / Soda)
  - Lineage/catalog (e.g., OpenLineage / DataHub / Amundsen)
  - Observability (e.g., Prometheus/Grafana stack)
- Strong data modeling background (dimensional + metrics layer thinking).
- Experience with ML platform patterns and LLM/RAG platform guardrails.
Qualification & Experience:
- Bachelor's or Master's degree in Statistics, Mathematics, Computer Science, or equivalent.
- 10+ years in data engineering / data platform / data architecture roles with hands-on delivery.