Senior Big Data Engineer - Assistant Vice President
Citi Bank
5 - 10 years
Pune
Posted: 24/08/2025
Job Description
Discover your future at Citi
Working at Citi is far more than just a job. A career with us means joining a team of more than 230,000 dedicated people from around the globe. At Citi, you’ll have the opportunity to grow your career, give back to your community and make a real impact.
Job Overview
Big Data Engineer (PySpark & Apache Airflow)
Role Overview:
We are seeking a highly skilled Big Data Engineer specializing in PySpark and Apache Airflow to enhance our data platform capabilities. This critical role involves designing, developing, and orchestrating complex data pipelines that underpin our advanced analytics and machine learning initiatives. You will use PySpark for efficient data processing and Apache Airflow for robust workflow management, ensuring data quality, reliability, and scalability across our large-scale datasets.
Key Responsibilities:
- Design, develop, and maintain robust, scalable, and efficient big data pipelines primarily using PySpark for data ingestion, transformation, and processing.
- Implement and manage data workflows using Apache Airflow, including designing DAGs (Directed Acyclic Graphs), configuring operators, and optimizing task dependencies for reliable, scheduled pipeline execution.
- Optimize PySpark jobs and data workflows for performance, cost-efficiency, and resource utilization across distributed computing environments.
- Collaborate closely with data scientists, AI/ML engineers, and other stakeholders to translate analytical and machine learning requirements into highly performant and automated data solutions.
- Develop and implement data quality checks, validation rules, and monitoring mechanisms within PySpark jobs and Airflow DAGs to ensure data integrity and consistency.
- Troubleshoot, debug, and resolve PySpark code issues and Airflow pipeline failures, ensuring high availability and reliability of data assets.
- Contribute to the architecture and evolution of our data platform, advocating for best practices in data engineering, automation, and operational excellence.
- Ensure data security, privacy, and compliance throughout the data lifecycle within the pipelines.
Required Skills and Qualifications:
- 7+ years of experience, with expert-level proficiency in PySpark for building and optimizing large-scale data processing applications.
- Strong hands-on experience with Apache Airflow, including DAG development, custom operators/sensors, connections, and deployment strategies.
- Proven experience in designing, building, and operating production-grade distributed data pipelines.
- Solid understanding of big data architectures, distributed computing principles, and data warehousing concepts.
- Proficiency in data modeling, schema design, and various data storage formats (e.g., Parquet, ORC, Delta Lake).
- Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP), specifically their big data services (e.g., EMR, Databricks, HDInsight, Dataflow) and object storage (S3, ADLS, GCS).
- Demonstrated experience with version control systems, particularly Git.
- Excellent problem-solving, analytical, and debugging skills.
- Ability to work effectively both independently and as part of a collaborative, agile team.
Desired (Plus) Skills:
- Experience with containerization technologies (e.g., Docker, Kubernetes) for deploying PySpark applications or Airflow.
- Familiarity with CI/CD practices for data pipelines.
- Understanding of machine learning concepts and experience with data preparation for AI/ML models.
- Knowledge of other orchestration tools or workflow managers.
Education:
- Bachelor’s degree/University degree or equivalent experience
About Company
Citi Bank, officially known as Citibank, is a global financial institution and the consumer division of Citigroup, a leading multinational banking corporation. Established in 1812, Citibank provides a wide range of financial services, including retail banking, credit cards, personal loans, wealth management, and investment banking. With a strong presence in over 100 countries, it serves millions of customers worldwide, offering both individual and business banking solutions. Citibank is known for its digital banking innovations, global reach, and commitment to financial inclusion and economic growth.