Cloud Platform Engineer
HCLSoftware
2 - 5 years
Noida, Agra
Posted: 13/06/2026
Job Description
Please share your CV with monica_sharma@hcl-software.com along with the following details:
Total experience:
Current CTC:
Expected CTC:
Notice period:
Location: Noida, Pune, Bangalore, Hyderabad
Required experience: 3 to 8 years
Role: Cloud Platform Engineer, Kubernetes and Container Platforms
Position Overview
We are seeking a highly skilled Platform Engineer with deep expertise in Kubernetes and container orchestration platforms to join the AI & Intelligent Operations LOB. The successful candidate will own the end-to-end lifecycle of container platforms, from architecture and installation design through production operations, with a strong focus on enterprise-grade High Availability (HA), Disaster Recovery (DR), and security. The role spans managed Kubernetes services on AWS (EKS), Azure (AKS), and Google Cloud (GKE), as well as on-premises OpenShift deployments, and requires hands-on proficiency with Helm, Kubernetes-native configurations, cloud-native services, and Infrastructure-as-Code (IaC) toolchains.
Required Qualifications
Education
- Bachelor's or Master's degree in Computer Science or Information Technology.
- Equivalent practical experience will be considered for exceptionally strong candidates.
Experience
- 3 to 8 years of overall IT / software engineering experience, with a minimum of 2 years of hands-on Kubernetes platform engineering in production environments.
- Demonstrable experience deploying and operating at least two of: EKS, AKS, GKE, and OpenShift in enterprise settings.
- Proven track record of designing and implementing HA/DR for containerized workloads at scale, and of developing Helm charts.
Required Technical Skills
- Amazon EKS
- Azure AKS
- Kubernetes (upstream/vanilla)
- Red Hat OpenShift (OCP)
- Helm (Chart authoring & management)
- Docker / Podman / Buildah / Kaniko
- Observability: Prometheus, Grafana, Loki, OpenTelemetry
- Bash / Python scripting
Preferred Qualifications
- Certified Kubernetes Administrator (CKA) strongly preferred.
- Certified Kubernetes Application Developer (CKAD) advantageous.
- Certified Kubernetes Security Specialist (CKS) highly desirable for senior profiles.
- Red Hat Certified OpenShift Administrator (EX280) preferred for OpenShift-heavy roles.
- Experience with multi-cluster management platforms: Rancher, Red Hat ACM, ArgoCD ApplicationSets, Cluster API (CAPI).
- Familiarity with eBPF-based observability and networking tools (Cilium, Hubble, Pixie).
- Contributions to open-source Kubernetes ecosystem projects or published Helm charts.
- Experience in regulated industries (BFSI, Healthcare, Government) with compliance frameworks: SOC 2, PCI-DSS, HIPAA, ISO 27001.
Key Responsibilities
1. Platform Architecture & Design
- Design scalable, highly available Kubernetes cluster architectures for EKS, AKS, GKE, and OpenShift environments aligned with enterprise workload requirements.
- Architect multi-region and multi-cluster topologies with active-active and active-passive HA patterns, including cross-cluster service discovery and traffic management.
- Define Disaster Recovery strategies: RTO/RPO target setting, cluster backup (Velero / OADP), etcd backup & restore, and regional failover runbooks.
- Produce Low-Level Design (LLD) and High-Level Design (HLD) documents, architecture decision records (ADRs), and capacity planning models.
- Design network topology: VPC/VNet design, CNI selection (Calico, Cilium, Flannel, OVN-Kubernetes), Network Policies, Service Mesh integration (Istio / Linkerd).
- Define storage architecture: persistent volume strategies using CSI drivers, StorageClass selection, RWX/RWO provisioning across EBS, EFS, Azure Disk, Azure Files, GCP Persistent Disk.
2. Kubernetes Platform Installation & Configuration
- Install and configure production-grade Kubernetes clusters using kubeadm, kops, Rancher, or cloud provider managed services (EKS, AKS, GKE).
- Deploy and configure Red Hat OpenShift Container Platform (OCP 4.x) using IPI (Installer Provisioned Infrastructure) and UPI (User Provisioned Infrastructure) methods.
- Configure enterprise authentication integrations: LDAP/AD integration via OIDC (Dex, Keycloak), AWS IAM IRSA, Azure AD Workload Identity, GCP Workload Identity Federation.
- Implement Role-Based Access Control (RBAC) hierarchies (ClusterRoles, Roles, RoleBindings) aligned with the principle of least privilege and organizational IAM structures.
- Configure Kubernetes Admission Controllers, Pod Security Admission (PSA, the replacement for PodSecurityPolicy), and OPA/Gatekeeper or Kyverno policy engines for compliance enforcement.
- Set up cluster add-ons: CoreDNS, metrics-server, cluster-autoscaler, Karpenter, node-problem-detector, external-dns, cert-manager, and ingress controllers (NGINX, Traefik, AWS ALB, Azure Application Gateway Ingress Controller).
3. Application Deployment on Kubernetes
- Own deployment pipelines for HCL Software product suites onto Kubernetes/OpenShift environments, including stateful and stateless applications.
- Author, maintain, and publish Helm charts with parameterized values, environment-specific overrides, and lifecycle hooks for complex application topologies.
- Implement GitOps deployment workflows using ArgoCD or Flux CD: manage ApplicationSets, multi-cluster deployments, and progressive delivery strategies (canary, blue-green).
- Configure Kubernetes workload resources: Deployments, StatefulSets, DaemonSets, Jobs, CronJobs, and HorizontalPodAutoscalers (HPA) / VerticalPodAutoscalers (VPA).
- Define and enforce resource requests/limits, namespace quotas (ResourceQuota, LimitRange), and QoS classes to optimize cluster utilization and stability.
- Implement Pod Disruption Budgets (PDBs), topology spread constraints, affinity/anti-affinity rules, and taints/tolerations for workload placement and resilience.
- Manage ConfigMaps, Secrets (with External Secrets Operator / Vault integration), and environment variable injection patterns following 12-factor application principles.
4. Enterprise HA, DR & Resiliency
- Design and implement control plane HA: multi-master etcd clusters with quorum management, etcd compaction, defragmentation, and backup automation.
- Configure node-level HA: node groups across multiple Availability Zones, managed node groups vs. self-managed nodes, spot/preemptible instance strategies with fallback.
- Implement load balancer HA patterns: NLB/ALB for EKS, Azure Load Balancer + Application Gateway for AKS, Cloud Load Balancing for GKE.
- Establish cluster-level DR procedures: namespace-scoped Velero backups to S3/Azure Blob/GCS, application-consistent snapshots, tested restore runbooks.
- Design and document failover playbooks covering DNS cutover, PV data replication (Rook/Ceph, Portworx, Longhorn), and stateful application quorum management.
- Conduct Game Day exercises and DR drills; measure and report against SLO/SLA commitments.
5. Security Engineering
- Harden Kubernetes clusters per CIS Kubernetes Benchmark and NSA/CISA Kubernetes Hardening Guide; track compliance using kube-bench.
- Implement network segmentation with Kubernetes Network Policies and service mesh mTLS; enforce zero-trust network access within clusters.
- Integrate container image scanning (Trivy, Snyk, Aqua) into CI/CD pipelines; enforce registry policies to block vulnerable or unsigned images.
- Configure runtime threat detection using Falco; define and tune rule sets for anomalous syscall detection and container escape attempts.
- Manage PKI: configure cert-manager with internal/external CAs, automate TLS certificate provisioning and rotation for ingress and internal service communication.
- Implement secrets lifecycle management with HashiCorp Vault (Vault Agent, Vault Secrets Operator), AWS Secrets Manager, or Azure Key Vault CSI driver.
- Enforce image signing and supply chain security (cosign / Sigstore, Notary) for all production workloads.
- Conduct security reviews for new platform features; participate in penetration testing activities and remediate findings within SLA.
6. Cloud Infrastructure & IaC
- Write and maintain Terraform / OpenTofu modules for provisioning cloud infrastructure: VPCs, subnets, security groups, IAM roles, EKS/AKS/GKE clusters, node groups, managed database services, and DNS.
- Manage Terraform state backends (S3 + DynamoDB, Azure Blob, GCS) and implement workspace strategies for multi-environment (dev/staging/prod) provisioning.
- Use Ansible for post-provisioning configuration management: OS hardening, prerequisite installation, cluster bootstrapping, and day-2 operations.
- Implement cloud cost optimization strategies: right-sizing, Spot/Preemptible adoption, cluster autoscaler tuning, resource tagging governance.
- Maintain parity across AWS, Azure, and GCP deployments; abstract cloud differences using Terraform modules and Helm chart conditional logic.
7. Observability & Platform Operations
- Deploy and maintain observability stacks: Prometheus + Alertmanager, Grafana dashboards, Loki for log aggregation, Jaeger/Tempo for distributed tracing.
- Define and configure Service Level Indicators (SLIs) and alert thresholds; build runbooks for alert response and escalation.
- Integrate OpenTelemetry collectors and auto-instrumentation for HCL product workloads.
- Manage minor and patch cluster upgrades using rolling upgrade strategies that meet zero-downtime requirements; validate version compatibility matrices.
- Participate in on-call rotation; investigate and resolve production incidents; conduct post-incident reviews (PIR/RCA) and track corrective actions.
