Manager - Projects

Cognizant

8 - 12 years

Chennai

Posted: 21/01/2025

Job Description



Job Summary

  • Incident Management: Lead and manage high-priority incident responses with urgency and efficiency. Analyze, troubleshoot, and resolve complex system issues spanning multiple technology stacks.
  • FMEA (Failure Mode and Effects Analysis): Perform hot-spot analysis and service-map analysis to identify potential risks and fault vulnerabilities.
  • NFRs and Quality Gates: Refine and drive non-functional requirements (NFRs) and quality gates in releases, including safe release patterns and reliability and resiliency requirements.

Responsibilities

  • Full-Stack Expertise: Use extensive knowledge of both front-end and back-end technologies to understand and debug system issues quickly. Implement solutions that encompass all layers of the application and infrastructure stack.
  • Enterprise Systems Knowledge: Use deep understanding of enterprise-level network and middleware technologies to find root causes of incidents and provide sustainable solutions.
  • Problem Management: Drive continuous improvement initiatives by analyzing incident trends, identifying recurring issues, and implementing proactive measures to enhance system reliability and performance. Review and refine SRE standards and processes, focusing on incident response and toil reduction.
  • Collaboration: Work closely with development, operations, and other IT teams to ensure cohesive and effective incident management. Facilitate post-incident reviews and share learnings across the organization.
  • Automation and Tooling: Develop and implement automation tools and scripts to streamline diagnostic packaging, incident response, and resolution processes. Provide feedback on enhancing monitoring and alerting systems to detect issues proactively.
  • Documentation: Keep detailed documentation of incidents, resolutions, and system changes to ensure knowledge sharing and compliance with IT governance standards.
  • Observability & Self-Healing: Provide leading indicators and drive observability maturity. Drive the development of self-healing capabilities with various teams.
  • Assess and measure the performance, resiliency, and reliability of applications, with a focus on observability and monitoring practices such as SLAs and SLOs.

  • Experience with Dynatrace.
  • Configure monitoring and logging of systems to obtain better visibility.
  • Help design processes that automatically evaluate system SLAs.
  • Be proactive: identify and remediate issues before SLAs are violated.
  • Be tool-agnostic and approach-centric.

Required Knowledge/Skills, Education, and Experience

  • 8-12 years in the software industry, with 4+ years in an SRE or DevOps role
  • Profound knowledge of full-stack technologies, legacy servers, middleware, cloud platforms (AWS), containerization technologies (e.g., Docker, Kubernetes), and databases (SQL, NoSQL)
  • Experience with container management and infrastructure monitoring tools
  • Expertise in enterprise network architectures, protocols, middleware technologies, and API management tools and processes
  • Programming skills in high-level languages such as Python, Java, Ruby, or JavaScript
  • Automation experience with scripting and API development (e.g., Ansible, Terraform, Shell, Python)
  • 2+ years with observability tools and containerization

Preferred Knowledge/Skills, Education, and Experience

  • Experience with AWS, Terraform, CloudFormation, and incident tracking tools
  • Certifications in AWS and in observability and monitoring tools
  • Experience with log management tools
  • Ensure system reliability by getting systems back to a steady state as quickly as possible

About Company

Cognizant is a global leader in technology and consulting services, helping businesses transform their operations through digital solutions. Specializing in IT services, including software development, business process outsourcing, and consulting, Cognizant supports clients across industries such as healthcare, financial services, manufacturing, and retail. With a focus on innovation, Cognizant assists organizations in modernizing their technology, improving operational efficiency, and enhancing customer experiences. Headquartered in the U.S., it is consistently ranked among the most admired companies in the world and is a member of the NASDAQ-100.
