AIOps Engineer
SysTechCorp Inc
2 - 5 years
Bengaluru
Posted: 30/04/2026
Job Description
Role: AIOps Engineer
Experience Level: 6+ years
Location: Bangalore
Working Mode: Hybrid
Package: up to 10 LPA
We are building a next-generation, AI-driven observability and outage management platform that operates at scale across 700+ enterprise customers. The platform uses metrics, logs, traces, and events to provide deep system insights, enabling automated signal correlation, root cause analysis, and proactive failure detection. It also incorporates historical knowledge and conversational AI to deliver a "Talk to Monitoring" experience.
We are seeking engineers with an AI-first mindset who can design and build intelligent, data-driven solutions on top of observability platforms, with a focus on scalable and reliable production systems.
Key Responsibilities:
- Design and develop AI-driven backend systems for observability and outage management
- Build intelligent services for event correlation, noise reduction, root cause analysis, anomaly detection, and prediction
- Develop capabilities for incident summarization, knowledge retrieval, and operational insights
- Design and optimize data pipelines for large-scale telemetry data (logs, metrics, traces, events)
- Implement LLM-powered features, including conversational interfaces, RAG pipelines, and automated insights
- Integrate AI/ML models into production systems, ensuring scalability and reliability
- Work with OpenTelemetry and observability platforms to process and analyze system signals
- Collaborate with engineering, SRE, and DevOps teams to build cloud-native solutions on OCI
- Contribute to system design, code reviews, and platform evolution
Primary Skills & Experience:
AI / Machine Learning & Data Engineering (Primary):
- Strong proficiency in Python for AI/ML and data engineering
- Experience designing and deploying AI/ML applications in production
- Hands-on experience with LLMs and APIs (OCI Generative AI, OpenAI, or similar)
- Experience with prompt engineering, evaluation frameworks, and RAG pipelines
- Understanding of anomaly detection, pattern recognition, and time-series analysis
- Experience with vector databases / similarity search systems
Observability, Backend & Distributed Systems (Core):
- Strong understanding of observability principles (metrics, logs, traces, events)
- Experience with distributed systems debugging and reliability engineering
- Hands-on experience with OpenTelemetry and monitoring tools (Prometheus, Grafana, OCI Monitoring)
- Strong backend development experience with Python, APIs, and microservices
- Familiarity with event-driven architectures and streaming platforms (Kafka, OCI Streaming)
- Understanding of scalable, fault-tolerant system design
- Experience with monitoring, alerting, dashboards, and search platforms (Elasticsearch/OpenSearch)
Qualifications:
- Bachelor's or Master's degree in computer science or a related field
- Experience with AI-powered observability or AIOps systems preferred
- Knowledge of incident management, root cause analysis, and SLO/SLA frameworks
- Experience with multi-tenant, large-scale distributed systems
- Strong communication and collaboration skills in an agile environment
