AI & Data
In this age of disruption, organizations need to navigate the future with confidence, embracing decision making with clear, data-driven choices that deliver enterprise value in a dynamic business environment.
The AI & Data team leverages the power of data, analytics, robotics, science and cognitive technologies to uncover hidden relationships from vast troves of data, generate insights, and inform decision-making. The offering portfolio helps clients transform their business by architecting organizational intelligence programs and differentiated strategies to win in their chosen markets.
AI & Data will work with our clients to:
Implement large-scale data ecosystems, including data management, governance, and the integration of structured and unstructured data, to generate insights leveraging cloud-based platforms
Leverage automation, cognitive and science-based techniques to manage data, predict scenarios and prescribe actions
Drive operational efficiency by maintaining their data ecosystems, sourcing analytics expertise and providing As-a-Service offerings for continuous insights and improvements
PySpark Consultant
The position is suited to individuals who have a demonstrated ability to work effectively in a fast-paced, high-volume, deadline-driven environment.
Education and Experience
Education:
B.Tech/M.Tech/MCA/MS
Experience:
3-6 years of experience designing and implementing the migration of enterprise legacy systems to a Big Data ecosystem for data warehousing projects.
Required Skills
Excellent knowledge of Apache Spark and Python programming experience is a must
Deep technical understanding of distributed computing and broader awareness of different Spark versions
Strong UNIX operating system concepts and shell scripting knowledge
Hands-on experience using Spark & Python
Deep experience in developing data processing tasks using PySpark, such as reading data from external sources, merging data sets, performing data enrichment, and loading into target data destinations (see the sketch after this list)
Experience deploying and operationalizing code; knowledge of scheduling tools such as Airflow, Control-M, etc. is preferred (see the scheduling sketch after this list)
Working experience with the AWS ecosystem, Google Cloud, BigQuery, etc. is an added advantage
Hands-on experience with AWS S3 filesystem operations
Good knowledge of Hadoop, Hive, and the Cloudera/Hortonworks Data Platform
Exposure to Jenkins or an equivalent CI/CD tool and a Git repository
Experience handling CDC (change data capture) operations for large volumes of data
Understanding of, and working experience with, the Agile delivery model
Experience with Spark-related performance tuning
Familiarity with design documents such as HLD (high-level design), TDD (technical design document), etc.
Familiarity with historical data loads and overall framework concepts
Participation in different kinds of testing, such as unit testing, system testing, user acceptance testing, etc.
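For illustration, below is a minimal PySpark sketch of the kind of processing task described above: read from external sources, merge, enrich, and load into a target destination. The bucket, paths, and column names are hypothetical placeholders, not part of any actual project.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_enrichment").getOrCreate()

# Read raw data from external sources (hypothetical S3 locations).
orders = spark.read.parquet("s3a://example-bucket/raw/orders/")
customers = spark.read.parquet("s3a://example-bucket/raw/customers/")

# Merge: join orders with customer reference data.
merged = orders.join(customers, on="customer_id", how="left")

# Enrich: derive a new column from an existing field.
enriched = merged.withColumn(
    "order_value_band",
    F.when(F.col("order_amount") >= 1000, "high").otherwise("standard"),
)

# Load: write to the target destination, partitioned by order date.
enriched.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3a://example-bucket/curated/orders_enriched/"
)

spark.stop()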
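And a minimal scheduling sketch for operationalizing such a job, assuming Apache Airflow 2.4 or later and a spark-submit binary on the worker; the DAG id, schedule, and script path are hypothetical.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_orders_enrichment",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Submit the PySpark job from the previous sketch as a daily batch run.
    BashOperator(
        task_id="spark_submit_enrichment",
        bash_command="spark-submit /opt/jobs/orders_enrichment.py",
    )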
Preferred Skills
Exposure to PySpark, Cloudera/Hortonworks, Hadoop, and Hive.
Exposure to AWS S3/EC2 and Apache Airflow
Participation in client interactions/meetings is desirable.
Participation in code-tuning is desirable.