Level 3 AWS Infrastructure Support Engineer

Electronikmedia (EM)

2 - 5 years

Kochi

Posted: 23/12/2025

Job Description

Role Overview

As a Level 3 AWS Infrastructure Support Engineer, you will own overnight monitoring and response for Electronikmedia's clients' AWS-based production environment. You will:

  • Monitor system health using Datadog and AWS-native tools
  • Investigate alerts and anomalies using established runbooks
  • Resolve production incidents when possible
  • Escalate complex issues quickly and accurately
  • Maintain clean, auditable incident documentation

This role is ideal for someone who thrives in high-trust, high-impact operational environments.

Key Responsibilities

On-Call & Incident Response
  • Provide initial response within 15 minutes for all high-priority production alerts
  • Investigate, mitigate, and resolve production outages when feasible
  • Escalate unresolved or complex issues using the defined escalation matrix
  • Act as the owner of production system stability
Monitoring, Alerting & Observability
  • Analyze and respond to Datadog monitor alerts across infrastructure and application layers
  • Identify abnormal patterns, trend-line deviations, and early indicators of systemic risk
  • Proactively notify stakeholders of significant performance or stability concerns
  • Contribute insights for preventive and corrective actions
Root Cause & Trend Analysis
  • Track recurring alerts and incidents
  • Provide analysis and recommendations to reduce alert noise and improve system resilience
  • Participate in weekly validation of Datadog alert configurations and thresholds
Communication & Documentation
  • Maintain clear, concise, and timely communication during incidents
  • Document all incidents, alarms, and observations in Jira during each shift
  • Ensure handoff notes are complete and actionable for daytime engineering teams

Technical Environment

Core AWS Services
  • ECS (Fargate)
  • RDS
  • ElastiCache
  • EC2
  • Lambda
  • API Gateway
  • S3
Tooling
  • Datadog (monitoring, alerts, dashboards)
  • Jira (incident tracking and documentation)

Qualifications

Experience
  • 5+ years of hands-on AWS infrastructure administration and support
  • Proven experience supporting production-grade, high-availability systems
  • Strong background in incident response within enterprise or scale-up environments
Skills
  • Deep operational knowledge of AWS services and distributed systems
  • Strong troubleshooting and root-cause analysis skills under tight SLAs
  • Ability to follow runbooks while also knowing when to think beyond them
  • Calm, structured decision-making during production incidents
Certifications (Preferred)
  • AWS Certified Solutions Architect Associate or Professional
  • AWS Certified DevOps Engineer Professional (Nice to Have)
Service Level Expectations
  • Alert Escalation SLA: 15 minutes for high-priority alarms
  • Availability: Consistent overnight coverage (IST Day Shift)
  • Reliability: Zero missed critical alerts during assigned coverage windows
Deliverables
  • Monthly Service Performance Report, including:
      • Alerts monitored
      • Incidents resolved
      • Escalations
      • SLA adherence metrics
  • Weekly Datadog Validation, ensuring alert accuracy and functionality

