Scala Data Engineer
VMC Soft Technologies, Inc
2 - 5 years
Bengaluru
Posted: 21/02/2026
Job Description
Job Title: Scala Data Engineer
Location: Bengaluru
Experience: 8+ years of IT experience
Job Summary:
We are seeking a highly skilled and experienced Senior Scala Data Engineer to join our
dynamic data team. In this role, you will be instrumental in designing, developing, and
maintaining our next-generation data pipelines and platforms using Scala, Apache Spark,
and cloud-native technologies. You will work on challenging problems involving large-scale
data ingestion, transformation, and processing, contributing directly to our analytical
capabilities and product features.
Note: We are looking for immediate joiners only. More than 6-7 years of hands-on Scala experience is mandatory; candidates with less experience will not be considered.
Key Responsibilities:
Design & Development: Architect, build, and optimize robust, scalable, and
efficient data pipelines using Scala and Apache Spark (Spark Core, Spark SQL, Spark
Streaming).
Data Ingestion: Develop solutions for ingesting high-volume, high-velocity data
from various sources (e.g., relational databases, NoSQL databases, APIs, message
queues like Kafka, log files) into our data lake/warehouse.
Data Transformation: Implement complex data transformations, aggregations, and
feature engineering logic to prepare data for analytics, machine learning models,
and operational systems.
Performance Optimization: Identify and resolve performance bottlenecks in Spark
jobs and data pipelines, ensuring optimal resource utilization and execution times.
Data Quality & Governance: Implement data validation, monitoring, and alerting
mechanisms to ensure data accuracy, completeness, and consistency. Contribute to
data governance best practices.
Cloud Infrastructure: Leverage and optimize cloud services (e.g., AWS EMR/Glue,
Azure Databricks/Synapse, GCP DataProc/BigQuery) for data processing and
storage.
Automation & Orchestration: Design and implement automated workflows for
data pipelines using tools like Apache Airflow, AWS Step Functions, or similar.
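To give a flavor of the transformation and aggregation work described above, here is a minimal sketch in Scala. The record type, field names, and data are illustrative only, and plain Scala collections stand in for a Spark Dataset/DataFrame; in a real pipeline the same step would be a Spark `groupBy`/`agg`.

```scala
// Illustrative sketch only: Event and its fields are hypothetical,
// and standard collections stand in for a Spark Dataset.
case class Event(userId: String, action: String, bytes: Long)

object PipelineSketch {
  // Aggregate total bytes per user, mirroring a Spark step such as
  // df.groupBy("userId").agg(sum("bytes")).
  def bytesPerUser(events: Seq[Event]): Map[String, Long] =
    events.groupBy(_.userId).map { case (user, es) =>
      user -> es.map(_.bytes).sum
    }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      Event("u1", "view", 100L),
      Event("u1", "click", 50L),
      Event("u2", "view", 200L)
    )
    println(bytesPerUser(sample))
  }
}
```

In Spark the same logic runs distributed across executors, which is where the partitioning and shuffle-tuning skills listed in this posting come into play.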
Required Qualifications:
Experience: 5+ years of professional experience in data engineering, with a strong
focus on building large-scale data solutions.
Scala Expertise: Proven advanced proficiency in the Scala programming language.
Apache Spark: Deep hands-on experience with Apache Spark (Core, SQL,
Streaming) for batch and real-time data processing.
Cloud Platforms: Extensive experience with at least one major cloud provider
(AWS, Azure, or GCP) and their relevant data services (e.g., AWS S3, EMR, Glue,
Kinesis; Azure Data Lake, Databricks, Event Hubs; GCP GCS, DataProc, Pub/Sub).
Data Warehousing: Strong understanding of data warehousing concepts,
dimensional modeling (star/snowflake schemas), and ETL/ELT processes.
SQL: Expert-level SQL skills for data querying, manipulation, and optimization.
Distributed Systems: Experience working with distributed systems and
understanding of their challenges (consistency, fault tolerance, concurrency).
Version Control: Proficiency with Git and collaborative development workflows.
Nice-to-Haves:
Streaming Technologies: Experience with real-time streaming platforms like
Apache Kafka, Apache Flink, or Kinesis.
Containerization & Orchestration: Experience with Docker, Kubernetes, and
container orchestration for Spark applications.
Data Orchestration Tools: Hands-on experience with Apache Airflow, Dagster,
Prefect, or similar workflow management tools.
NoSQL Databases: Experience with NoSQL databases such as Cassandra, MongoDB,
DynamoDB, or HBase.
Data Lakehouse/Modern DW: Experience with technologies like Delta Lake,
Apache Iceberg, Snowflake, Redshift, or BigQuery.
MLOps: Familiarity with MLOps principles and supporting data pipelines for
machine learning models.
CI/CD: Experience setting up and maintaining CI/CD pipelines for data engineering
projects.
Performance Tuning: Advanced knowledge of Spark performance tuning
techniques, including memory management, shuffle optimization, and data
partitioning strategies.
Certifications: Relevant cloud (AWS Certified Data Analytics, Azure Data Engineer
Associate, GCP Professional Data Engineer) or Spark certifications.
Thanks & Regards,
Vibha Seth
Technical Recruiter
E-Mail: vibha@vmcsofttech.com
Contact: 9935984975
LinkedIn: linkedin.com/in/vibha-seth-14337b241