Cloud Platform Engineer

HCLSoftware

2 - 5 years

Noida, Agra

Posted: 13/06/2026

Job Description

Please share CV to monica_sharma@hcl-software.com with the below details:

Total experience-

Current CTC-

Expected CTC-

Notice period-



Location- Noida, Pune, Bangalore, Hyderabad

Required experience- 3 to 8 years

Role- Cloud Platform Engineer, Kubernetes and Container Platforms


Position Overview

We are seeking a highly skilled Platform Engineer with deep expertise in Kubernetes and container orchestration platforms to join the AI & Intelligent Operations LOB. The successful candidate will own the end-to-end lifecycle of container platforms, from architecture and installation design through production operations, with a strong focus on enterprise-grade High Availability (HA), Disaster Recovery (DR), and security. The role spans managed Kubernetes services on AWS (EKS), Azure (AKS), and Google Cloud (GKE) as well as on-premises OpenShift deployments, and requires hands-on proficiency with Helm, Kubernetes-native configurations, cloud-native services, and Infrastructure-as-Code (IaC) toolchains.


Required Qualifications

Education

  • Bachelor's or Master's degree in Computer Science or Information Technology
  • Equivalent practical experience considered for exceptionally strong candidates.


Experience

  • 3 to 8 years of overall IT / software engineering experience, with a minimum of 2 years of hands-on Kubernetes platform engineering in production environments.
  • Demonstrable experience deploying and operating at least two of: EKS, AKS, GKE, and OpenShift in enterprise settings.
  • Proven track record of designing and implementing HA/DR for containerized workloads at scale, and of Helm chart development.


Required Technical Skills:

  1. Amazon EKS
  2. Azure AKS
  3. Kubernetes (upstream/vanilla)
  4. Red Hat OpenShift (OCP)
  5. Helm (Chart authoring & management)
  6. Docker / Podman / Buildah / Kaniko
  7. Observability: Prometheus, Grafana, Loki, OpenTelemetry
  8. Bash / Python scripting


Preferred Qualifications

  • Certified Kubernetes Administrator (CKA), strongly preferred.
  • Certified Kubernetes Application Developer (CKAD), advantageous.
  • Certified Kubernetes Security Specialist (CKS), highly desirable for senior profiles.
  • Red Hat Certified OpenShift Administrator (EX280), preferred for OpenShift-heavy roles.
  • Experience with multi-cluster management platforms: Rancher, Red Hat ACM, ArgoCD ApplicationSets, Cluster API (CAPI).
  • Familiarity with eBPF-based observability and networking tools (Cilium, Hubble, Pixie).
  • Contributions to open-source Kubernetes ecosystem projects or published Helm charts.
  • Experience in regulated industries (BFSI, Healthcare, Government) with compliance frameworks: SOC 2, PCI-DSS, HIPAA, ISO 27001.



Key Responsibilities

1. Platform Architecture & Design

  • Design scalable, highly available Kubernetes cluster architectures for EKS, AKS, GKE, and OpenShift environments aligned with enterprise workload requirements.
  • Architect multi-region and multi-cluster topologies with active-active and active-passive HA patterns, including cross-cluster service discovery and traffic management.
  • Define Disaster Recovery strategies: RTO/RPO target setting, cluster backup (Velero / OADP), etcd backup & restore, and regional failover runbooks.
  • Produce Low-Level Design (LLD) and High-Level Design (HLD) documents, architecture decision records (ADRs), and capacity planning models.
  • Design network topology: VPC/VNet design, CNI selection (Calico, Cilium, Flannel, OVN-Kubernetes), Network Policies, Service Mesh integration (Istio / Linkerd).
  • Define storage architecture: persistent volume strategies using CSI drivers, StorageClass selection, RWX/RWO provisioning across EBS, EFS, Azure Disk, Azure Files, GCP Persistent Disk.


2. Kubernetes Platform Installation & Configuration

  • Install and configure production-grade Kubernetes clusters using kubeadm, kops, Rancher, or cloud provider managed services (EKS, AKS, GKE).
  • Deploy and configure Red Hat OpenShift Container Platform (OCP 4.x) using IPI (Installer Provisioned Infrastructure) and UPI (User Provisioned Infrastructure) methods.
  • Configure enterprise authentication integrations: LDAP/AD integration via OIDC (Dex, Keycloak), AWS IAM IRSA, Azure AD Workload Identity, GCP Workload Identity Federation.
  • Implement Role-Based Access Control (RBAC) hierarchies (ClusterRoles, Roles, RoleBindings) aligned with the principle of least privilege and organizational IAM structures.
  • Configure Kubernetes Admission Controllers, Pod Security Admission (PSA/PSP replacement), and OPA/Gatekeeper or Kyverno policy engines for compliance enforcement.
  • Set up cluster add-ons: CoreDNS, metrics-server, cluster-autoscaler, Karpenter, node-problem-detector, external-dns, cert-manager, and ingress controllers (NGINX, Traefik, AWS ALB, Azure Application Gateway Ingress).


3. Application Deployment on Kubernetes

  • Own deployment pipelines for HCL Software product suites onto Kubernetes/OpenShift environments, including stateful and stateless applications.
  • Author, maintain, and publish Helm charts with parameterized values, environment-specific overrides, and lifecycle hooks for complex application topologies.
  • Implement GitOps deployment workflows using ArgoCD or Flux CD: manage ApplicationSets, multi-cluster deployments, and progressive delivery strategies (canary, blue-green).
  • Configure Kubernetes workload resources: Deployments, StatefulSets, DaemonSets, Jobs, CronJobs, and HorizontalPodAutoscalers (HPA) / VerticalPodAutoscalers (VPA).
  • Define and enforce resource requests/limits, namespace quotas (ResourceQuota, LimitRange), and QoS classes to optimize cluster utilization and stability.
  • Implement Pod Disruption Budgets (PDBs), topology spread constraints, affinity/anti-affinity rules, and taints/tolerations for workload placement and resilience.
  • Manage ConfigMaps, Secrets (with External Secrets Operator / Vault integration), and environment variable injection patterns following 12-factor application principles.


4. Enterprise HA, DR & Resiliency

  • Design and implement control plane HA: multi-master etcd clusters with quorum management, etcd compaction, defragmentation, and backup automation.
  • Configure node-level HA: node groups across multiple Availability Zones, managed node groups vs. self-managed nodes, spot/preemptible instance strategies with fallback.
  • Implement load balancer HA patterns: NLB/ALB for EKS, Azure Load Balancer + Application Gateway for AKS, Cloud Load Balancing for GKE.
  • Establish cluster-level DR procedures: namespace-scoped Velero backups to S3/Azure Blob/GCS, application-consistent snapshots, tested restore runbooks.
  • Design and document failover playbooks covering DNS cutover, PV data replication (Rook/Ceph, Portworx, Longhorn), and stateful application quorum management.
  • Conduct Game Day exercises and DR drills; measure and report against SLO/SLA commitments.


5. Security Engineering

  • Harden Kubernetes clusters per CIS Kubernetes Benchmark and NSA/CISA Kubernetes Hardening Guide; track compliance using kube-bench.
  • Implement network segmentation with Kubernetes Network Policies and service mesh mTLS; enforce zero-trust network access within clusters.
  • Integrate container image scanning (Trivy, Snyk, Aqua) into CI/CD pipelines; enforce registry policies to block vulnerable or unsigned images.
  • Configure runtime threat detection using Falco; define and tune rule sets for anomalous syscall detection and container escape attempts.
  • Manage PKI: configure cert-manager with internal/external CAs, automate TLS certificate provisioning and rotation for ingress and internal service communication.
  • Implement secrets lifecycle management with HashiCorp Vault (Vault Agent, Vault Secrets Operator), AWS Secrets Manager, or Azure Key Vault CSI driver.
  • Enforce image signing and supply chain security (cosign / Sigstore, Notary) for all production workloads.
  • Conduct security reviews for new platform features; participate in penetration testing activities and remediate findings within SLA.


6. Cloud Infrastructure & IaC

  • Write and maintain Terraform / OpenTofu modules for provisioning cloud infrastructure: VPCs, subnets, security groups, IAM roles, EKS/AKS/GKE clusters, node groups, managed database services, and DNS.
  • Manage Terraform state backends (S3 + DynamoDB, Azure Blob, GCS) and implement workspace strategies for multi-environment (dev/staging/prod) provisioning.
  • Use Ansible for post-provisioning configuration management: OS hardening, prerequisite installation, cluster bootstrapping, and day-2 operations.
  • Implement cloud cost optimization strategies: right-sizing, Spot/Preemptible adoption, cluster autoscaler tuning, and resource tagging governance.
  • Maintain parity across AWS, Azure, and GCP deployments; abstract cloud differences using Terraform modules and Helm chart conditional logic.


7. Observability & Platform Operations

  • Deploy and maintain observability stacks: Prometheus + Alertmanager, Grafana dashboards, Loki for log aggregation, Jaeger/Tempo for distributed tracing.
  • Define and configure Service Level Indicators (SLIs) and alert thresholds; build runbooks for alert response and escalation.
  • Integrate OpenTelemetry collectors and auto-instrumentation for HCL product workloads.
  • Manage cluster upgrades (minor and patch) using rolling upgrade strategies with zero-downtime requirements; validate compatibility matrices.
  • Participate in on-call rotation; investigate and resolve production incidents; conduct post-incident reviews (PIR/RCA) and track corrective actions.
