Data Engineer – Data & AI – Institutional Equities
Kotak Securities
2 - 5 years
Mumbai
Posted: 08/01/2026
Job Description
Role Summary:
The Data Engineer is responsible for building and maintaining scalable, reliable, and high-performance data platforms. The role is hands-on, with a strong focus on engineering solutions for data storage, real-time processing, and platform integrations. Collaboration with Data Architects and Cloud Engineers is key to operationalizing and optimizing core data infrastructure components.
Key Responsibilities:
- Design, build, and optimize data pipelines for batch and real-time processing using Spark, Python, and related technologies (see the sketch after this list).
- Set up, configure, and manage databases such as Postgres, ClickHouse, MongoDB, DynamoDB, and other analytical or NoSQL systems.
- Develop and maintain data models, indexing strategies, partitioning, and schema management to support scalable data solutions.
- Engineer and manage data storage formats and lakehouse table systems such as Delta Lake, Iceberg, and Hudi for efficient data access and analytics.
- Integrate databases with cloud components (AWS services, Databricks, internal microservices) to enable seamless data flow across platforms.
- Work with real-time platforms such as Kafka and Flink for streaming ingestion, event processing, and low-latency data delivery.
- Collaborate with Cloud Engineers to ensure infrastructure provisioning, networking connectivity, containerization, and access controls are aligned with data engineering needs.
- Troubleshoot and optimize data pipeline performance, including slow queries, write amplification, compaction issues, indexing strategies, and cluster configurations.
- Support platform observability by installing, configuring, and maintaining monitoring systems such as Prometheus and Grafana.
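For illustration, a minimal sketch of the kind of batch pipeline described in this list: a PySpark job that reads raw CSV drops, derives a date column, and writes a partitioned Delta Lake table. All paths, column names, the schema, and the job name are hypothetical, and the sketch assumes the delta-spark package is available to the Spark session.

# Minimal sketch of a batch pipeline writing a partitioned Delta Lake table.
# All paths, columns, and the job name are hypothetical; assumes delta-spark
# is installed and registered with the Spark session.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("trades-batch-ingest")  # hypothetical job name
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Read raw CSV drops from a landing zone (hypothetical S3 path and layout).
raw = (
    spark.read
    .option("header", True)
    .csv("s3://example-bucket/landing/trades/")
)

# Light transformation: type the timestamp and derive a date partition column.
trades = (
    raw.withColumn("trade_ts", F.to_timestamp("trade_ts"))
       .withColumn("trade_date", F.to_date("trade_ts"))
)

# Write as a Delta table, partitioned by date for efficient pruning.
(
    trades.write
    .format("delta")
    .mode("append")
    .partitionBy("trade_date")
    .save("s3://example-bucket/lake/trades/")
)

Partitioning by a date column is a common starting point; the actual partitioning and compaction strategy would follow the platform's data volumes and query patterns.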
Required Skills:
- Strong Python skills.
- Experience with distributed table formats (Delta Lake, Iceberg, Hudi).
- Competency in Kafka (consumer groups, offsets, partitions) and Flink for stream processing (see the sketch after this list).
- Experience with PySpark for data ingestion and transformation workflows.
- Deep knowledge of Postgres (indexing, replication, partitioning, optimization).
- Hands-on experience with ClickHouse (setup, tuning, materialized views, TTLs).
- Familiarity with NoSQL (MongoDB, DynamoDB) schema design and access patterns.
- Familiarity with AWS (EC2, S3, VPC, IAM, Glue, Lambda).
- Understanding of database security, encryption, role management, and backup strategies.
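As a concrete, hypothetical illustration of the Kafka competencies listed above, a minimal consumer sketch using the kafka-python package: it joins a consumer group, polls a topic, and commits offsets manually after processing. The broker address, topic name, and group id are placeholders.

# Minimal Kafka consumer sketch using kafka-python (assumed installed).
# Broker address, topic name, and group id below are placeholders.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "trades",                          # hypothetical topic
    bootstrap_servers="localhost:9092",
    group_id="trades-etl",             # consumer group: partitions are balanced across members
    enable_auto_commit=False,          # commit offsets explicitly after processing
    auto_offset_reset="earliest",      # start from the oldest offset for a new group
)

for message in consumer:
    # Each record carries topic, partition, offset, key, and value (raw bytes).
    print(message.topic, message.partition, message.offset, message.value)
    # Commit the offset only once the record has been handled, so a crash
    # before this point causes a replay rather than data loss.
    consumer.commit()

Committing per message keeps the example simple; a real pipeline would typically batch commits or lean on Flink's checkpointing for stronger delivery guarantees.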
Good-to-Have Skills:
- Experience with Java frameworks such as Spring Batch or Hibernate.
- Experience with Databricks workflows, catalog integration, or table ingestion patterns.
- Exposure to containerization (Docker) for database sandboxing or API deployments.
- Knowledge of infrastructure orchestration (Terraform) for database provisioning.
- Ability to contribute to datastore benchmarking and performance testing (a toy sketch follows below).
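Below, a toy sketch of the kind of datastore benchmarking mentioned above: timing a parameterized Postgres query over repeated runs with psycopg2. The connection string, table, and query are placeholders, not a real environment.

# Toy benchmark harness: repeatedly time a Postgres query with psycopg2.
# The DSN, table, and query below are placeholders.
import time
import statistics
import psycopg2

DSN = "dbname=market user=etl host=localhost"   # hypothetical connection string
QUERY = "SELECT count(*) FROM trades WHERE trade_date = %s"

def time_query(cur, params, runs=20):
    """Execute the query `runs` times and return per-run latencies in ms."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        cur.execute(QUERY, params)
        cur.fetchall()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        lat = time_query(cur, ("2026-01-08",))
        print(f"p50={statistics.median(lat):.2f} ms  "
              f"max={max(lat):.2f} ms over {len(lat)} runs")

A harness like this only measures round-trip latency from the client; a fuller benchmark would also watch server-side metrics (for example via Prometheus, as noted in the responsibilities).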
