Big Data Engineer - Hybrid - Officer

Citi Bank

2 - 5 years

Chennai

Posted: 16/09/2025

Job Description

Discover your future at Citi

Working at Citi is far more than just a job. A career with us means joining a team of more than 230,000 dedicated people from around the globe. At Citi, you’ll have the opportunity to grow your career, give back to your community and make a real impact.

Job Overview

Responsible for designing, developing, and optimizing data processing solutions using a combination of Big Data technologies. Focus on building scalable and efficient data pipelines for handling large datasets and enabling batch & real-time data streaming and processing.

Responsibilities:
> Develop and optimize Spark applications using Scala or Python (PySpark): Process large datasets and implement data transformations, aggregations, and analysis.
> Develop and maintain Kafka-based data pipelines: This includes designing Kafka Streams, setting up Kafka clusters, and ensuring efficient data flow.
> Integrate Kafka with Spark for real-time processing: Build systems that ingest real-time data from Kafka and process it using Spark Streaming or Structured Streaming.
> Collaborate with data teams: Work with data engineers, data scientists, and DevOps to design and implement data solutions.
> Tune and optimize Spark and Kafka clusters: Ensuring high performance, scalability, and efficiency of data processing workflows.
> Write clean, functional, and optimized code: Adhering to coding standards and best practices.
> Troubleshoot and resolve issues: Identifying and addressing any problems related to Kafka and Spark applications.
> Maintain documentation: Creating and maintaining documentation for Kafka configurations, Spark jobs, and other processes.
> Stay updated on technology trends: Continuously learning and applying new advancements in functional programming, big data, and related technologies.

Proficiency in:
Hadoop ecosystem big data tech stack (HDFS, YARN, MapReduce, Hive, Impala).
Spark (Scala, Python) for data processing and analysis.
Kafka for real-time data ingestion and processing.
ETL processes and data ingestion tools.
Deep hands-on expertise in PySpark, Scala, and Kafka.

Programming Languages:
Scala, Python, or Java for developing Spark applications.
SQL for data querying and analysis.

Other Skills:
Data warehousing concepts.
Linux/Unix operating systems.
Problem-solving and analytical skills.
Version control systems.

About Company

Citi Bank, officially known as Citibank, is a global financial institution and the consumer division of Citigroup, a leading multinational banking corporation. Established in 1812, Citibank provides a wide range of financial services, including retail banking, credit cards, personal loans, wealth management, and investment banking. With a strong presence in over 100 countries, it serves millions of customers worldwide, offering both individual and business banking solutions. Citibank is known for its digital banking innovations, global reach, and commitment to financial inclusion and economic growth.