Senior SRE – Observability & Datadog Migration

HireAlpha

5 - 10 years

Bengaluru

Posted: 31/01/2026

Job Description

Role: SRE / DevOps Engineer (Prometheus/Grafana to Datadog Migration)

Location: Bangalore (Work From Office)

Experience Required: 5+ Years

Employment Type: Contractual

Contract Duration:

6 months (Extendable based on performance and management decision)

Project Start Date:

1st March, 2026 (candidates who can join immediately or within 15 days preferred)

Interview Process:

Technical Screening + Technical Assessment


Experience Required:

Must Have:

- At least 5 years of relevant experience working on the observability stack described in this posting.

- Has managed and operated the Datadog platform.

- Strong communication skills to interact with global teams.

- Fundamental knowledge of working and operating on AWS using Infrastructure as Code (IaC) practices.



Beginning in March, we are starting a new project to migrate our observability infrastructure stack from self-hosted AWS (Prometheus, Grafana, Loki, Mimir) to Datadog (SaaS).


We are looking for strong engineers who will focus on the engineering deliverables set by the organization's SRE team for this migration.



SKILLS:

1. Working Knowledge of Prometheus and PromQL

- Ability to read, understand, and modify existing PromQL queries, dashboards, and alerting rules, including common aggregations and label usage.


2. Grafana and Alertmanager Familiarity

- Experience navigating Grafana dashboards and Alertmanager configurations to understand intent, thresholds, and alert routing.


3. Datadog Dashboarding and Monitors

- Hands-on experience creating Datadog dashboards and monitors based on defined requirements, using existing patterns and guidance.


4. Query and Alert Semantics Translation

- Ability to accurately map PromQL queries and Alertmanager rules to Datadog equivalents, recognising non-1:1 translations, validating statistical correctness, and documenting functional differences where exact parity is not possible.


5. Observability Concepts

- Understanding of metrics vs logs vs traces, alert thresholds, and standard monitoring practices in production environments.

6. Team Collaboration

- Ability to work with engineering teams to validate migrated dashboards and alerts, following structured validation checklists.


7. Clear Execution and Documentation

- Documenting migrated assets, assumptions, and validation outcomes in a consistent, predefined format.


8. Automation Skills

- Proficient in building tooling using Python to reduce engineering toil in these migration activities.


Nice to Have:

- AWS Administrator Certifications.
