AIOps Engineer

SysTechCorp Inc

2 - 5 years

Bengaluru

Posted: 30/04/2026

Job Description

Role: AIOps Engineer

Experience Level: 6+ years

Location: Bangalore

Working Mode: Hybrid

Package: max 10 LPA


We are building a next-generation, AI-driven observability and outage management platform that serves 700+ enterprise customers at scale. The platform uses metrics, logs, traces, and events to provide deep system insights, enabling automated signal correlation, root cause analysis, and proactive failure detection. It also draws on historical knowledge and conversational AI to deliver a "Talk to Monitoring" experience.

We are seeking engineers with an AI-first mindset who can design and build intelligent, data-driven solutions on top of observability platforms, with a focus on scalable and reliable production systems.


Key Responsibilities:

  • Design and develop AI-driven backend systems for observability and outage management
  • Build intelligent services for event correlation, noise reduction, root cause analysis, anomaly detection, and prediction
  • Develop capabilities for incident summarization, knowledge retrieval, and operational insights
  • Design and optimize data pipelines for large-scale telemetry data (logs, metrics, traces, events)
  • Implement LLM-powered features, including conversational interfaces, RAG pipelines, and automated insights
  • Integrate AI/ML models into production systems, ensuring scalability and reliability
  • Work with OpenTelemetry and observability platforms to process and analyze system signals
  • Collaborate with engineering, SRE, and DevOps teams to build cloud-native solutions on OCI (Oracle Cloud Infrastructure)
  • Contribute to system design, code reviews, and platform evolution


Primary Skills & Experience:

AI / Machine Learning & Data Engineering (Primary):

  • Strong proficiency in Python for AI/ML and data engineering
  • Experience designing and deploying AI/ML applications in production
  • Hands-on experience with LLMs and APIs (OCI Generative AI, OpenAI, or similar)
  • Experience with prompt engineering, evaluation frameworks, and RAG pipelines
  • Understanding of anomaly detection, pattern recognition, and time-series analysis
  • Experience with vector databases / similarity search systems

Observability, Backend & Distributed Systems (Core):

  • Strong understanding of observability principles (metrics, logs, traces, events)
  • Experience with distributed systems debugging and reliability engineering
  • Hands-on experience with OpenTelemetry and monitoring tools (Prometheus, Grafana, OCI Monitoring)
  • Strong backend development experience with Python, APIs, and microservices
  • Familiarity with event-driven architectures and streaming platforms (Kafka, OCI Streaming)
  • Understanding of scalable, fault-tolerant system design
  • Experience with monitoring, alerting, dashboards, and search platforms (Elasticsearch/OpenSearch)


Qualifications:

  • Bachelor's or Master's degree in computer science or a related field
  • Experience with AI-powered observability or AIOps systems preferred
  • Knowledge of incident management, root cause analysis, and SLO/SLA frameworks
  • Experience with multi-tenant, large-scale distributed systems
  • Strong communication and collaboration skills in an agile environment
