Level 3 AWS Infrastructure Support Engineer
Electronikmedia (EM)
2 - 5 years
Kochi
Posted: 23/12/2025
Job Description
As a Level 3 AWS Infrastructure Support Engineer, you will own overnight monitoring and incident response for the AWS-based production environment of Electronikmedia's clients. You will:
- Monitor system health using Datadog and AWS-native tools
- Investigate alerts and anomalies using established runbooks
- Resolve production incidents when possible
- Escalate complex issues quickly and accurately
- Maintain clean, auditable incident documentation
This role is ideal for someone who thrives in high-trust, high-impact operational environments.
Key Responsibilities
On-Call & Incident Response
- Provide an initial response within 15 minutes to all high-priority production alerts
- Investigate, mitigate, and resolve production outages when feasible
- Escalate unresolved or complex issues using the defined escalation matrix
- Act as the owner of production system stability
Monitoring & Analysis
- Analyze and respond to Datadog monitor alerts across infrastructure and application layers
- Identify abnormal patterns, trend-line deviations, and early indicators of systemic risk
- Proactively notify stakeholders of significant performance or stability concerns
- Contribute insights for preventive and corrective actions
Alert Optimization
- Track recurring alerts and incidents
- Provide analysis and recommendations to reduce alert noise and improve system resilience
- Participate in weekly validation of Datadog alert configurations and thresholds
Communication & Documentation
- Maintain clear, concise, and timely communication during incidents
- Document all incidents, alarms, and observations in Jira during each shift
- Ensure handoff notes are complete and actionable for daytime engineering teams
Technology Stack
- ECS (Fargate)
- RDS
- ElastiCache
- EC2
- Lambda
- API Gateway
- S3
- Datadog (monitoring, alerts, dashboards)
- Jira (incident tracking and documentation)
Required Experience
- 5+ years of hands-on AWS infrastructure administration and support
- Proven experience supporting production-grade, high-availability systems
- Strong background in incident response within enterprise or scale-up environments
- Deep operational knowledge of AWS services and distributed systems
- Strong troubleshooting and root-cause analysis skills under tight SLAs
- Ability to follow runbooks while also knowing when to think beyond them
- Calm, structured decision-making during production incidents
Certifications
- AWS Certified Solutions Architect (Associate or Professional)
- AWS Certified DevOps Engineer - Professional (nice to have)
Performance Expectations
- Alert Escalation SLA: 15 minutes for high-priority alarms
- Availability: Consistent overnight coverage (IST day shift)
- Reliability: Zero missed critical alerts during assigned coverage windows
Deliverables
- Monthly Service Performance Report, including:
  - Alerts monitored
  - Incidents resolved
  - Escalations
  - SLA adherence metrics
- Weekly Datadog Validation, ensuring alert accuracy and functionality
