Site Reliability Engineer
ITMC Systems, Inc
2 - 5 years
Pune
Posted: 18/03/2026
Job Description
Open Role || SRE with OpenShift || Hyderabad/Pune
Role: OpenShift & Site Reliability Engineering (SRE)
Job Description:
Advise clients on setting up the OpenShift platform (on-premises, cloud, or PaaS), including resource sizing, based on their requirements.
Configure OCP after it has been deployed by the client or the DevOps team.
Ensure the OCP cluster is resilient against single points of failure (SPoF).
Perform Day 2 configurations, such as installing the CSI Blob driver on an ARO cluster to consume Azure Blob storage.
Deploy ODF and customize or create Storage Classes based on reclaim policy and volume binding mode requirements.
Integrate VMware with OCP so that the cluster can leverage the underlying hardware for cluster autoscaling and storage consumption.
Deploy Thanos to retain cluster and workload metrics for longer durations, as ARO monitoring has limitations.
Configure monitoring rules and Alertmanager to send notifications of cluster and application failures via email or ticketing tools.
Configure the OADP backup tool and test application backup/restore to meet RTO and RPO targets.
Ensure clusters comply with security standards and are free from vulnerabilities.
Perform cluster upgrades at regular intervals based on OCP EOL dates and application compatibility.
Automate OCP Day 2 configurations using Ansible playbooks.
Maximo Application:
Support IBM MAS deployment, configuration, troubleshooting, and backup and restore validation.
Configure TLS certificates
Amend network policies on application namespaces to allow communication with the Kubernetes API, so that app services can be stopped/started using oc commands via CronJob.
Troubleshoot issues during Tekton pipeline execution for IBM MAS application install/upgrade activities.
Azure Platform:
Azure storage management: create and manage Blob/File storage based on application requirements and high availability needs (LRS or ZRS); enable cleanup policies to enforce data retention for logs/metrics on Blob storage; enable Azure Backup for File storage.
Integrate ARO with Azure Arc to leverage Azure Monitor and Log Analytics for alerting.
Rotate Azure service principal credentials for ARO clusters before they expire.
Responsibilities:
Deploy and manage OpenShift (RHOCP 4.x) environments from scratch on bare metal using Ansible scripts.
Update inventories and deployment methods in Ansible playbooks according to the deployment environment.
Build and manage Tanzu Kubernetes Grid on a bare-metal VMware platform.
Multi-cluster administration: management cluster for FCAPS components and resource cluster for application components.
OpenShift Cluster backup/restore using Trilio
Fix OCP compliance findings based on the Compliance Operator and Prisma security tools.
Prepare a mirror repository for OCP 4 deployment on restricted networks.
Onboard 4G and 5G CU applications using Helm charts on both RHOCP and VMware TKG.
Cluster tuning for application onboarding, such as ODF/OCS, Performance Addon, Multus, SR-IOV, NMState, KubeVirt, Quay and GitLab installation.
FCAPS installation for logging (Elasticsearch, Fluentd, Kibana), certificate management, and authentication (RH IDM).
Work with AWS services: EC2, S3, Route 53, EBS, IAM, ELB, CloudWatch, Auto Scaling, VPC.
Prepare HLDs for OCP/TKG single-node and HA setups.
