Role: AIOps Engineer

Experience Level: 6+ years

Location: Bangalore

Working Mode: Hybrid

Package: Up to 10 LPA

We are building a next-generation, AI-driven observability and outage management platform that operates at scale across 700+ enterprise customers. The platform uses metrics, logs, traces, and events to provide deep system insights, enabling automated signal correlation, root cause analysis, and proactive failure detection. It also combines historical knowledge with conversational AI to deliver a "Talk to Monitoring" experience.

We are seeking engineers with an AI-first mindset who can design and build intelligent, data-driven solutions on top of observability platforms, with a focus on scalable and reliable production systems.

Key Responsibilities:

Design and develop AI-driven backend systems for observability and outage management
Build intelligent services for event correlation, noise reduction, root cause analysis, anomaly detection, and prediction
Develop capabilities for incident summarization, knowledge retrieval, and operational insights
Design and optimize data pipelines for large-scale telemetry data (logs, metrics, traces, events)
Implement LLM-powered features, including conversational interfaces, RAG pipelines, and automated insights
Integrate AI/ML models into production systems, ensuring scalability and reliability
Work with OpenTelemetry and observability platforms to process and analyze system signals
Collaborate with engineering, SRE, and DevOps teams to build cloud-native solutions on Oracle Cloud Infrastructure (OCI)
Contribute to system design, code reviews, and platform evolution

Primary Skills & Experience:

AI/Machine Learning & Data Engineering (Primary)
Strong proficiency in Python for AI/ML and data engineering
Experience designing and deploying AI/ML applications in production
Hands-on experience with LLMs and LLM APIs (OCI Generative AI, OpenAI, or similar)
Experience with prompt engineering, evaluation frameworks, and RAG pipelines
Understanding of anomaly detection, pattern recognition, and time-series analysis
Experience with vector databases / similarity search systems

Observability, Backend & Distributed Systems (Core)
Strong understanding of observability principles (metrics, logs, traces, events)
Experience with distributed systems debugging and reliability engineering
Hands-on experience with OpenTelemetry and monitoring tools (Prometheus, Grafana, OCI Monitoring)
Strong backend development experience with Python, APIs, and microservices
Familiarity with event-driven architectures and streaming platforms (Kafka, OCI Streaming)
Understanding of scalable, fault-tolerant system design
Experience with monitoring, alerting, dashboards, and search platforms (Elasticsearch/OpenSearch)

Qualifications:

Bachelor's or Master's degree in Computer Science or a related field
Experience with AI-powered observability or AIOps systems preferred
Knowledge of incident management, root cause analysis, and SLO/SLA frameworks
Experience with multi-tenant, large-scale distributed systems
Strong communication and collaboration skills in an agile environment

Company: SysTechCorp Inc