Infrastructure Architect

Luxoft India

2 - 5 years

Bengaluru

Posted: 28/02/2026

Job Description

Project Description:

We are seeking an experienced Infrastructure Architect with deep expertise in designing and delivering end-to-end on-premises AI and datacenter infrastructure. In this role, you will architect and guide implementations spanning GPU compute clusters, high-performance storage, datacenter networking fabrics, and automation frameworks, ensuring the environment is secure, resilient, and optimized for AI/ML workloads.

This position requires both high-level architectural vision and hands-on design across compute, network, storage, and control-plane platforms.


Responsibilities:

1. End-to-End Infrastructure Architecture

Own and evolve the reference architecture for on-prem AI compute ecosystems, including GPU servers, accelerators, and DPUs.

Design GPU clustering strategies and partitioning models (MIG, MPS) for multi-tenant training and inference workloads.

Define rack-to-control-plane architecture, aligning hardware, storage, network fabric, and Kubernetes/OpenShift environments.


2. Data Center Physical & Logical Design

Develop hardware BOMs, rack elevations, cabling schematics, and power/cooling envelopes.

Ensure alignment with modern data center design, including hot/cold aisle strategy, airflow optimization, and liquid-cooling readiness.


3. High-Performance Networking

Architect high-performance datacenter fabrics such as spine-leaf topologies, RoCEv2/InfiniBand, and high-speed Ethernet (400G/800G).

Define network segmentation, QoS, and isolation strategies for multi-tenant AI infrastructures.


4. Storage Architecture

Design scalable, high-throughput storage solutions, including PowerScale, NVMe tiering, and object storage systems for AI/ML workloads.


5. Control Plane & Orchestration

Architect and harden Kubernetes/OpenShift control-plane environments with HA topologies and GPU scheduling, ensuring Day 0/1/2 operational readiness.


6. Capacity & Performance Engineering

Build capacity models covering GPU/CPU utilization, memory, storage I/O throughput, and network bandwidth aligned with model sizes and data-ingestion patterns.


Mandatory Skills Description:

8+ years in infrastructure architecture across compute, network, and storage domains.

Deep knowledge of:

GPU compute platforms, clustering, and partitioning (MIG, MPS).

High-performance datacenter fabrics: spine-leaf, RoCEv2/InfiniBand, 400G/800G Ethernet.

Scale-out storage systems (PowerScale, NVMe, object storage).

Kubernetes/OpenShift control-plane design and HA patterns.

Experience with datacenter physical design (power, cooling, cabling, thermal).

Strong automation background (PowerShell, Terraform, Ansible).

Expertise in capacity planning, performance engineering, and resilience design.
