Architect

Birlasoft

10 - 12 years

Hyderabad

Posted: 23/11/2025

Job Description

Area(s) of responsibility

Job Description: Reliability Architect – 6A

Reliability Architect with over 10 years of experience in proactive monitoring, automation, and observability. Skilled in AIOps/MLOps, infrastructure management, and performance optimization using modern tools and practices. Adept at leading incident response, mentoring support teams, and driving cross-functional collaboration to ensure system reliability and scalability.

 

Key Responsibilities:

  • Monitoring and Automation
    Proactively monitor software systems to prevent incidents and automate routine operational tasks.
  • Effective Monitoring
    Design monitoring systems that trigger alerts based on symptoms rather than outages, ensuring early detection and resolution.
  • Application Performance Monitoring (APM)
    Implement and manage APM tools like New Relic or Dynatrace to track application performance, identify bottlenecks, and optimize resource usage.
  • Log Analysis with Splunk
    Use Splunk to analyze logs for troubleshooting, anomaly detection, and improving system reliability.
  • Dashboard Preparation
    Build intuitive dashboards to visualize system health, performance metrics, and operational KPIs.
  • Alert Setup
    Configure intelligent alerts based on thresholds and anomalies to ensure timely incident response.
  • Report Scheduling
    Automate regular reporting to provide insights into system performance, reliability, and trends.
  • Reliability Metrics
    Define and track metrics such as SLOs, SLIs, and error budgets to measure and maintain system reliability.
  • Observability Skills
    Apply observability practices including distributed tracing, logging, and metrics collection to gain deep insights into system behavior.
  • AI-Driven Monitoring & Automation
    Utilize AIOps techniques to proactively detect anomalies, automate incident response, and enable self-healing systems through intelligent alerting and predictive analytics.
  • Observability & ML Integration
    Integrate machine learning models with observability tools to enhance system insights, optimize performance, and ensure reliability of AI-powered services in production.
  • Cross-Team Collaboration
    Work closely with development and support teams to enhance service reliability through rigorous testing and release procedures.
  • Capacity Planning
    Participate in system design reviews and capacity planning to ensure scalability and performance.
  • Debugging and Incident Response
    Lead incident response efforts, analyze debugging information, and manage rollbacks of faulty software deployments.
  • Mentoring Support Teams
    Guide and mentor L1/L2 support teams to establish best practices in monitoring and observability.
  • Infrastructure Management
    Manage infrastructure using tools like Chef, Ansible, Terraform, GitLab CI/CD, and Kubernetes.
  • Documentation
    Maintain comprehensive documentation of processes and procedures to ensure operational consistency and reduce redundancy.
  • Proactive Mindset
    Approach challenges with enthusiasm, ownership, and a continuous improvement mindset.

 

About Company

Birlasoft is a global IT services and consulting company that is part of the CK Birla Group. It specializes in digital transformation, enterprise application services, and IT modernization for industries such as manufacturing, life sciences, BFSI, and energy. Birlasoft is known for its strong capabilities in SAP, Oracle, cloud, and analytics, helping clients drive innovation, reduce costs, and improve agility.
