HCI Administrator

Tata Consultancy Services

2 - 5 years

Chennai

Posted: 29/04/2026

Job Description

Key Responsibilities

1) Incident & Problem Management (L2 Ownership)

  • Lead L2 triage, diagnosis, and restoration for vSAN and vSphere incidents, including performance issues, resync operations, object health, latency, and host failures, ensuring rapid service recovery in line with SLAs. Conduct post-incident reviews and drive problem records through to an identified root cause and a permanent fix.
  • Implement structured Incident and Major Incident practices, prioritizing incidents based on impact and urgency, utilizing defined escalation paths, and assigning clear roles during high-severity events.
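
As an illustration of prioritizing by impact and urgency, the sketch below uses a common ITIL-style 3x3 matrix. The level names, the P1 to P5 labels, and the additive mapping are assumptions made for this example; the actual matrix would be defined by the team's incident process.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale used for both impact and urgency."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Assumed mapping: the sum of impact and urgency (2..6) selects the priority.
_PRIORITY_BY_SCORE = {2: "P1", 3: "P2", 4: "P3", 5: "P4", 6: "P5"}


def priority(impact: Level, urgency: Level) -> str:
    """Return the priority label for an incident under the assumed matrix."""
    return _PRIORITY_BY_SCORE[int(impact) + int(urgency)]


if __name__ == "__main__":
    # Example: a cluster-wide outage that blocks production work right now.
    print(priority(Level.HIGH, Level.HIGH))
    # Example: a single degraded host with no immediate user impact.
    print(priority(Level.LOW, Level.MEDIUM))
```

Under this assumed scheme, a P1 result would typically trigger the Major Incident path with its defined escalation and role assignments.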

2) Health, Performance & Capacity Operations

  • Use vSAN Skyline Health to monitor cluster health, including hardware compatibility, network health, and storage objects. Apply health scoring and diagnostics to prioritize remediation actions and track operational trends.
  • Leverage Aria Operations for monitoring vSAN performance, capacity, and configuration. Utilize dashboards, alerts, and recommendations to anticipate and prevent potential issues.
  • Analyze resync operations, I/O paths, and advanced statistics using tools such as vsantop and I/O Trip Analyzer to optimize workloads and eliminate performance bottlenecks.

3) Configuration, Policy & Resiliency

  • Develop and maintain SPBM policies, including FTT/RAID configurations, stripes per object, IOPS limits, and space-efficiency settings, ensuring alignment with workload SLAs and continuous policy compliance. After failures, reconfigure policies and objects as needed to restore compliance.
  • Administer vSphere HA and DRS with vSAN for automatic failover and balanced recovery following events. Manage HA admission control, VM-host affinity and anti-affinity rules, and DPM interactions.
  • Design and maintain fault domains to protect against rack and chassis failures, validating latency and placement rules for replicas and witness objects.
  • Operate stretched clusters across two sites with a witness, configure storage policies for site affinity, manage failure scenarios, and verify HA/DRS behavior across sites.
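
To make the FTT/RAID trade-off concrete, here is a minimal sketch of the raw capacity and minimum host count implied by each policy choice. It assumes the standard figures for the vSAN Original Storage Architecture (mirroring for RAID-1, 3+1 for RAID-5, 4+2 for RAID-6) and ignores slack space, deduplication, compression, and stretched-cluster site mirroring. The function and table names are hypothetical.

```python
from fractions import Fraction

# (RAID level, FTT) -> (raw-capacity multiplier, minimum hosts).
# Assumed values for vSAN OSA; other architectures may differ.
LAYOUTS = {
    ("RAID-1", 1): (Fraction(2), 3),
    ("RAID-1", 2): (Fraction(3), 5),
    ("RAID-1", 3): (Fraction(4), 7),
    ("RAID-5", 1): (Fraction(4, 3), 4),
    ("RAID-6", 2): (Fraction(3, 2), 6),
}


def raw_capacity_gb(usable_gb: int, raid: str, ftt: int) -> float:
    """Raw capacity consumed by `usable_gb` of VM data under a policy."""
    multiplier, _ = LAYOUTS[(raid, ftt)]
    return float(usable_gb * multiplier)


def min_hosts(raid: str, ftt: int) -> int:
    """Minimum number of hosts (or fault domains) the policy requires."""
    _, hosts = LAYOUTS[(raid, ftt)]
    return hosts


if __name__ == "__main__":
    for (raid, ftt) in LAYOUTS:
        print(raid, f"FTT={ftt}",
              f"{raw_capacity_gb(300, raid, ftt):.0f} GB raw for 300 GB usable,",
              f"min hosts: {min_hosts(raid, ftt)}")
```

A table like this can help when checking whether a cluster or fault-domain layout has enough hosts to keep a given policy compliant after a failure.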

4) Security & Compliance

  • Enable and manage vSAN data-at-rest encryption using AES-256, including the KEK/DEK workflow and integration with KMS or Native Key Provider. Ensure key persistence with TPM, perform rekey operations, and maintain secure cluster practices.
  • Validate data-in-transit encryption where applicable, and enforce role-based access controls for all encryption operations.

5) Lifecycle & Hardware Compatibility

  • Maintain vSphere Lifecycle Manager (vLCM) compliance for vSAN clusters, orchestrating ESXi images, vendor add-ons, drivers, and firmware, and performing hardware compatibility checks against the vSAN HCL. Coordinate with the OEM Hardware Support Manager for full-stack remediation.
  • Apply vSAN build recommendations, including release catalog updates, critical patches, and baseline groups. Remediate clusters and monitor catalog currency through health checks.

6) Change, Release & Knowledge

  • Plan and execute changes, such as patching, driver and firmware updates, and policy adjustments, within designated maintenance windows. Maintain runbooks and knowledge bases for common faults and recovery procedures.
  • Mentor L1 and L2 staff, establish operational checklists, and conduct pre-flight validations, including network MTU/NIOC, capacity slack space, and hardware balance.
