Senior Technical Architect
HashedIn by Deloitte
5–10 years
Bengaluru
Posted: 26/02/2026
Job Description
Job Title: Senior Data Architect (Enterprise Data Platforms & GenAI/ML Enablement)
ABOUT HASHEDIN
We are software engineers who solve business problems with a Product Mindset for leading global organizations.
By combining engineering talent with business insight, we build software and products that can create new enterprise value.
The secret to our success is a fast-paced learning environment, an extreme ownership spirit, and a fun culture.
WHY SHOULD YOU JOIN US?
With the agility of a start-up and the opportunities of an enterprise, every day at HashedIn, your work will make an impact that matters.
So, if you are a problem solver looking to thrive in a dynamic, fun culture of inclusion, collaboration, and high performance, HashedIn is the place to be!
From learning to leadership, this is your chance to take your software engineering career to the next level.
Role Summary
The Senior Data Architect is responsible for defining and delivering enterprise-grade data architectures and reference implementations across complex, regulated environments. You will design scalable, secure, and cost-efficient data platforms and products spanning batch and real-time processing, lakehouse/warehouse, governance, lineage, and master/reference data. You will also enable GenAI/ML outcomes on governed enterprise data (RAG, semantic search, AI-driven insights, and agentic automation), partnering with Engineering, Security, Infrastructure, BI, and Data Science teams to translate business needs into target-state architectures, standards, and execution roadmaps.
Key Responsibilities
1) Architecture, Standards & Roadmaps
- Define target-state enterprise data architecture and multi-year modernization roadmaps (cloud/hybrid), including migration waves and operating model impacts.
- Produce high-quality architecture artifacts: logical/physical models, integration patterns, data flows, domain/data product designs, and non-functional requirements.
- Own the architecture blueprint end-to-end (ingestion → storage → transformation → serving) for analytics and AI.
- Set engineering standards and define reusable reference architectures, frameworks, templates, and accelerators:
- Medallion patterns (Bronze/Silver/Gold) or equivalent layered curation
- Data products, domain boundaries/ownership, and contract-first delivery
- Data contracts, schema evolution/versioning, and deprecation/backward compatibility
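To make the contract-first bullets concrete, here is a minimal, framework-agnostic sketch of a backward-compatibility check for data contracts; the `DataContract` shape, field names, and the additive-change rule are illustrative assumptions, not a mandated tool or schema registry.

```python
from dataclasses import dataclass

# Hypothetical minimal data contract; field names/types are illustrative only.
@dataclass(frozen=True)
class DataContract:
    name: str
    version: int
    fields: dict  # column name -> type string, e.g. {"order_id": "string"}

def is_backward_compatible(old: DataContract, new: DataContract) -> bool:
    """A new contract version is backward compatible if it keeps every
    existing column with an unchanged type; additive columns are allowed."""
    return all(new.fields.get(col) == typ for col, typ in old.fields.items())

v1 = DataContract("orders", 1, {"order_id": "string", "amount": "double"})
v2 = DataContract("orders", 2, {**v1.fields, "currency": "string"})
assert is_backward_compatible(v1, v2)  # additive change: safe to promote
```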
2) Ingestion, Integration, CDC & Streaming
- Standardize ingestion/integration patterns across API, batch, file, and CDC (where applicable).
- Architect streaming/event-driven platforms at scale: Kafka/Event Hubs/Kinesis + Flink/Spark Structured Streaming/Kafka Streams (or equivalents).
- Define reliability patterns: idempotency, dedupe, watermarking/late data handling, replay/backfill, and operational runbooks.
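As one illustration of the reliability patterns above, the following PySpark Structured Streaming sketch pairs a watermark (late-data handling) with `dropDuplicatesWithinWatermark` (idempotent consumption; requires Spark 3.5+). Broker, topic, column names, and paths are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dedupe-demo").getOrCreate()

# Hypothetical Kafka source; broker and topic names are placeholders.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
    .select(
        F.col("key").cast("string").alias("event_id"),
        F.col("timestamp").alias("event_time"),
        F.col("value").cast("string").alias("payload"),
    )
)

# Late-data handling + dedupe: tolerate events up to 15 minutes late and
# drop repeated event_ids seen within that watermark (Spark 3.5+).
deduped = (
    events
    .withWatermark("event_time", "15 minutes")
    .dropDuplicatesWithinWatermark(["event_id"])
)

query = (
    deduped.writeStream
    .format("console")  # swap for a real sink (Delta, Kafka, ...)
    .option("checkpointLocation", "/tmp/checkpoints/orders")  # enables restart/replay
    .outputMode("append")
    .start()
)
```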
3) Modeling, Curation & Serving (BI + AI)
- Model and curate data for consumption: curated datasets, dimensional marts, and semantic alignment for BI; fit-for-purpose datasets for ML/GenAI.
- Define serving patterns: governed SQL, semantic layers, APIs, and activation/reverse ETL (as needed).
- Establish performance/cost standards: partitioning/clustering, compaction, file sizing, workload isolation, and unit-cost/FinOps guardrails.
- Standardize storage/table tech: Parquet/ORC and Delta/Iceberg/Hudi.
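A minimal sketch of the partitioning and compaction standards above, assuming a Delta Lake table (Iceberg and Hudi offer equivalent mechanisms); all paths and column names are placeholders.

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("gold-serving")
    # Delta Lake session configs; requires the delta-spark package.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

silver = spark.read.format("delta").load("/lake/silver/orders")  # placeholder path

# Gold-layer curation: partition by event date so BI queries prune files.
(
    silver
    .withColumn("event_date", F.to_date("event_time"))
    .write.format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save("/lake/gold/orders_daily")
)

# Periodic compaction keeps file sizes healthy (Delta OPTIMIZE; path is a placeholder).
spark.sql("OPTIMIZE delta.`/lake/gold/orders_daily`")
```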
4) GenAI, RAG & Agentic Solutioning
- Solution agentic workflows for enterprise automation using Google ADK and/or AutoGen:
- Tool/function calling, planning vs execution, memory patterns, and human-in-the-loop approvals
- Guardrails: prompt-injection defenses, least-privilege tool access, grounding/provenance patterns, safe fallbacks
- Define scalable patterns for RAG/semantic search (a minimal retrieval sketch follows this list):
- Document ingestion (normalize/classify/dedupe), chunking strategy, metadata enrichment/ACL tagging
- Embeddings pipelines, vector index/retrieval, and secure context delivery
- Leverage platform-native AI where appropriate:
- Databricks (e.g., Mosaic AI, model serving; agent accelerators such as Agent Bricks where applicable)
- Snowflake (Cortex, Snowpark integration, governance-aligned AI consumption)
- Gemini Enterprise / Google AI suites (where applicable to client standards)
- Knowledge graph / semantic modeling implementation (entity resolution, taxonomy/ontology alignment; hybrid graph + vector retrieval).
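The retrieval sketch referenced above: a deliberately library-free outline of chunking, embedding, and top-k cosine retrieval. The chunk sizes are arbitrary, `embed` is a stand-in for a real encoder (an embeddings API or platform-native model), and a production pipeline would add metadata enrichment, ACL filtering, and a real vector index.

```python
import numpy as np

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size character chunking with overlap; real pipelines split on
    structure (headings, sentences) and attach ACL metadata per chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder embedding model returning unit vectors; swap in a real
    encoder before use."""
    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(len(texts), 384))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# Index step: embed chunks once, store alongside source/ACL metadata.
docs = ["...enterprise policy text...", "...architecture handbook text..."]
chunks = [c for d in docs for c in chunk(d)]
index = embed(chunks)

# Retrieval step: cosine similarity is a dot product of unit vectors; the
# top-k chunks become the grounded context handed to the LLM.
query_vec = embed(["What is the data retention policy?"])[0]
top_k = np.argsort(index @ query_vec)[::-1][:3]
context = [chunks[i] for i in top_k]
```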
5) MLOps / LLMOps / AgentOps Enablement
- Establish production practices: reproducible training/serving datasets, registry integration, CI/CD for data + ML/GenAI pipelines, promotion across environments.
- Define evaluation + release gating: groundedness/quality metrics, safety checks, regression tests, monitoring/drift signals, and cost/performance baselines.
- Support model hosting/inference patterns (batch + real-time) and operational monitoring.
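One way to express the evaluation and release gating described above is as an explicit promotion check; the metric names and thresholds below are assumptions to be tuned per engagement, not any specific LLMOps product's API.

```python
from dataclasses import dataclass

@dataclass
class EvalReport:
    groundedness: float        # fraction of answers supported by retrieved context
    regression_pass_rate: float
    p95_latency_ms: float
    cost_per_1k_requests: float

def release_gate(candidate: EvalReport, baseline: EvalReport) -> list[str]:
    """Returns the list of failed gates; an empty list means safe to promote."""
    failures = []
    if candidate.groundedness < 0.9:
        failures.append("groundedness below 0.90")
    if candidate.regression_pass_rate < baseline.regression_pass_rate:
        failures.append("regression suite worse than baseline")
    if candidate.p95_latency_ms > 1.2 * baseline.p95_latency_ms:
        failures.append("p95 latency regressed by >20%")
    if candidate.cost_per_1k_requests > 1.1 * baseline.cost_per_1k_requests:
        failures.append("unit cost regressed by >10%")
    return failures
```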
6) Governance, Security, Compliance & Run-Ready Ops
- Define/enforce standards for modeling, quality SLAs/SLOs (freshness/latency/completeness), metadata/catalog/lineage, and auditability.
- Implement compliance-by-design controls: RBAC/ABAC concepts, row/column security, masking/tokenization (sketched after this list), encryption, retention, and private connectivity patterns.
- Extend governance to AI usage: approved datasets for AI, access-controlled retrieval, and auditable context usage (per policy).
- Improve run readiness: observability (freshness/latency/failures), alerting, incident response, DR/backup/restore, and RPO/RTO awareness/testing approach.
- Architect and guide implementation of governance platforms (catalog, lineage, stewardship workflows) and ensure alignment with regulations (GDPR, CCPA, HIPAA as applicable).
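To ground the masking/tokenization controls above, a small sketch of deterministic tokenization with role-based column masking; the column names, roles, and policy rule are illustrative assumptions rather than a governance product's API.

```python
import hashlib

PII_COLUMNS = {"email", "ssn"}  # illustrative classification

def mask_value(value: str) -> str:
    """Deterministic tokenization: the same input always yields the same
    token, so joins still work on masked data without exposing raw values."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def apply_column_masking(row: dict, reader_roles: set[str]) -> dict:
    """RBAC-style column security: only privileged roles see raw PII."""
    if "pii_reader" in reader_roles:
        return row
    return {k: mask_value(str(v)) if k in PII_COLUMNS else v
            for k, v in row.items()}

row = {"user_id": 42, "email": "a@example.com", "amount": 10.0}
print(apply_column_masking(row, reader_roles={"analyst"}))  # email is tokenized
```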
7) Master & Reference Data (MDM)
- Architect MDM and reference data solutions including domain ownership, golden record strategies, survivorship rules, and integration patterns.
- Define how master/reference data is published/consumed across analytical and operational systems.
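A compact illustration of golden-record survivorship; the source trust ranking and attributes are assumptions, and real MDM implementations typically merge attribute-by-attribute rather than selecting one whole record as done here.

```python
from datetime import datetime

# Hypothetical survivorship inputs: trust ranking per source system.
SOURCE_TRUST = {"crm": 3, "erp": 2, "web_signup": 1}

records = [
    {"customer_id": "C1", "email": "old@x.com", "source": "web_signup",
     "updated_at": datetime(2024, 1, 1)},
    {"customer_id": "C1", "email": "new@x.com", "source": "crm",
     "updated_at": datetime(2024, 6, 1)},
]

def golden_record(matches: list[dict]) -> dict:
    """Survivorship rule: prefer the most trusted source, then the most
    recent update, collapsing matched records into one golden record."""
    return max(matches, key=lambda r: (SOURCE_TRUST[r["source"]], r["updated_at"]))

print(golden_record(records))  # the CRM record survives
```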
8) Stakeholder Partnership & Technical Leadership
- Act as a primary technical advisor; lead architecture reviews, mentor senior engineers, and drive cross-team alignment and decision-making.
- Partner with Security/Platform/BI/Product to ensure coherent enterprise-wide solutions.
Required Skills & Experience
- 10+ years in data engineering/architecture delivering enterprise-scale data platforms (lakehouse/warehouse).
- Strong architecture depth across batch + streaming/event-driven patterns (Spark, Kafka or equivalents).
- Lakehouse/warehouse experience: Snowflake, Databricks/Delta Lake, BigQuery, Redshift, Synapse (or similar).
- Advanced SQL + data modeling for analytics and AI consumption; performance/cost optimization at scale.
- Strong knowledge of open formats and table tech: Parquet/ORC; Delta/Iceberg/Hudi.
- Pipeline/orchestration experience (hands-on or leading teams): Airflow; ADF/Glue/Dataflow/NiFi/Snowpipe (or equivalents).
- Governance and security implementation: Purview/Collibra/DataZone (or equivalents); fine-grained access controls and audit logging.
- Cloud architecture fundamentals (AWS/Azure/GCP): networking, IAM, KMS, private connectivity, and production hardening.
- Executive-ready communication and the ability to drive architecture trade-offs.
GenAI/Agentic (Required)
- Experience enabling GenAI/ML on enterprise data, including at least one of:
- RAG/semantic search pipelines (docs → embeddings → retrieval)
- ML dataset/feature foundations + MLOps/LLMOps integration
- Platform-native AI (e.g., Databricks Mosaic AI or Snowflake Cortex)
- Familiarity with agent frameworks/SDKs (e.g., Google ADK, AutoGen) and guardrail/evaluation patterns.
Preferred / Differentiators
- Hands-on depth in Databricks (Unity Catalog, DLT, Workflows, Mosaic AI) and/or Snowflake (Cortex, Snowpark) and/or Iceberg-based lakehouse stacks.
- Data observability/quality frameworks (Great Expectations/Deequ), DataOps CI/CD, and policy-as-code patterns.
- Consulting/engagement skills: solution design, estimation, roadmapping, and multi-team delivery.