Big Data Engineer - (Scala | Spark | Databricks | Cloud)

Citi Bank

4 - 6 years

Chennai

Posted: 03/05/2026

Job Description

Discover your future at Citi

Working at Citi is far more than just a job. A career with us means joining a team of more than 230,000 dedicated people from around the globe. At Citi, you’ll have the opportunity to grow your career, give back to your community and make a real impact.

Job Overview

We are seeking a talented and experienced Big Data Hadoop Developer to join our growing data engineering team. The ideal candidate will have 4-6 years of hands-on experience designing, developing, and optimizing big data solutions using the Hadoop ecosystem, with a strong focus on Apache Spark. You will be responsible for building and maintaining scalable data pipelines, processing large datasets, and collaborating with data scientists and analysts to deliver insights.

Responsibilities:

Design, develop, and maintain robust and scalable ETL processes and data pipelines using Apache Hadoop and Apache Spark.
Write efficient, clear, and well-documented code primarily in Scala, Python, or PySpark for big data processing.
Implement data ingestion, transformation, and loading routines from various sources into Hadoop Distributed File System (HDFS) and other big data stores.
Optimize existing Spark jobs and Hadoop ecosystem components for performance and scalability.
Collaborate with data architects, data scientists, and other stakeholders to understand data requirements and translate them into technical solutions.
Ensure data quality, integrity, and security across all big data platforms.
Participate in code reviews, testing, and deployment of big data applications.
Troubleshoot and resolve issues in big data environments.
Stay up to date with the latest trends and technologies in the big data ecosystem.

Qualifications:

Bachelor's or Master's degree in Computer Science, Engineering, or a related quantitative field.
4-6 years of professional experience in Big Data development.
Proven experience with the Hadoop ecosystem, including HDFS, YARN, Hive, and other related technologies.
Hands-on experience with SQL and shell scripting.
Strong expertise in Apache Spark for data processing and analysis.
Proficiency in at least one of the following programming languages: Scala or Python (including PySpark).
Experience with building and optimizing large-scale data pipelines.
Familiarity with data warehousing concepts and ETL methodologies.
Solid understanding of distributed computing principles.
Excellent problem-solving skills and attention to detail.
Ability to work independently and as part of a collaborative team.

Preferred Qualifications:

Experience with cloud-based big data services (e.g., AWS EMR, Azure HDInsight, Google Cloud Dataproc).
Experience with the Databricks platform.
Knowledge of other big data tools like Kafka, HBase, Flink, or Presto.
Experience with SQL and NoSQL databases.
Familiarity with CI/CD practices and tools (e.g., Git, Jenkins).
Understanding of machine learning concepts and how they apply to big data.

Education:

Bachelor’s degree/University degree or equivalent experience

About Company

Citi Bank, officially known as Citibank, is a global financial institution and the consumer division of Citigroup, a leading multinational banking corporation. Established in 1812, Citibank provides a wide range of financial services, including retail banking, credit cards, personal loans, wealth management, and investment banking. With a strong presence in over 100 countries, it serves millions of customers worldwide, offering both individual and business banking solutions. Citibank is known for its digital banking innovations, global reach, and commitment to financial inclusion and economic growth.