Senior SRE – Observability & Datadog Migration
HireAlpha
5 - 10 years
Bengaluru
Posted: 31/01/2026
Job Description
Role: SRE / DevOps Engineer (Prometheus/Grafana to Datadog Migration)
Location: Bangalore (Work From Office)
Experience Required: 5+ Years
Employment Type: Contractual
Contract Duration:
6 months (Extendable based on performance and management decision)
Project Start Date:
1st March, 2026 (Immediate to 15 days joiners preferred)
Interview Process:
Technical Screening + Technical Assessment
Experience Required:
Must Have:
- At least 5 years of relevant experience working on the Observability stack as defined above.
- Has managed and operated the Datadog platform.
- Strong communication skills to interact with global teams.
- Fundamental knowledge of working and operating on AWS using IaC practices.
Beginning in March, we are starting a new project to migrate our Observability infrastructure stack from self-hosted AWS (Prometheus/Grafana, Loki, Mimir) to the Datadog SaaS solution.
The selected engineers will focus on the engineering deliverables set by the organization's SRE team for the migration.
SKILLS:
1. Working Knowledge of Prometheus and PromQL:
- Ability to read, understand, and modify existing PromQL queries, dashboards, and alerting rules, including common aggregations and label usage.
2. Grafana and Alertmanager Familiarity:
- Experience navigating Grafana dashboards and Alertmanager configurations to understand intent, thresholds, and alert routing.
3. Datadog Dashboarding and Monitors
- Hands-on experience creating Datadog dashboards and monitors based on defined requirements, using existing patterns and guidance.
4. Query and Alert Semantics Translation
- Ability to accurately map PromQL queries and Alertmanager rules to Datadog equivalents, recognising non-1:1 translations, validating statistical correctness, and documenting functional differences where exact parity is not possible.
5. Observability Concepts
- Understanding of metrics vs logs vs traces, alert thresholds, and standard monitoring practices in production environments.
6. Team Collaboration
- Ability to work with engineering teams to validate migrated dashboards and alerts, following structured validation checklists.
7. Clear Execution and Documentation
- Documenting migrated assets, assumptions, and validation outcomes in a consistent, predefined format.
8. Automation Skills
- Proficient in building tooling using Python to reduce engineering toil in these migration activities.
Nice to Have:
- AWS Administrator Certifications.
