In this role, you will be responsible for:
- Designing and developing large-scale data structures and pipelines to organize, collect, and standardize data that generates insights and addresses reporting needs.
- Writing ETL (Extract / Transform / Load) processes, designing database systems, and developing tools for real-time and offline analytic processing, integrating data from a variety of sources and ensuring it adheres to data quality and accessibility standards.
- Programming in Python to build robust data pipelines and dynamic systems.
- Collaborating with client teams to develop and maintain long-term relationships with key stakeholders.
- Engaging deeply and consulting with business teams to understand current and future needs.
- Working with the onshore team to establish design patterns and development standards; conducting code reviews and overseeing unit testing.
- Coordinating between onshore and offshore teams on project planning, code reviews, QA, and deployments.
- Brainstorming with the development team on optimizing existing data flows, tuning quality and performance, and building proofs of concept.
- Leading a team of data engineers.
Qualifications
Minimum qualifications
- Graduate or post-graduate degree in a Computer Science discipline
- Relevant professional experience in data engineering
Preferred skills:
- Subject matter expertise in Big Data and the Hadoop ecosystem.
- Substantial experience designing and developing ETL processes.
- Expertise in Hadoop, Hive, MapReduce, Sqoop, Oozie, Hue, and HCatalog.
- Production experience with the Cloudera Hadoop Distribution.
- Experience with the AWS cloud platform.
- Hands-on experience with DevOps tools (GitHub, Maven, Jenkins, Docker) and implementing CI/CD pipelines.
- Hands-on experience with the Snowflake data warehouse.
- Expert-level programming experience, ideally in Python, PySpark, or shell scripting, and a willingness to learn new programming languages.
Preferred qualifications
- Python
- PySpark
- Unix shell scripting
- Big Data
- Hive
- Hue
- Sqoop
Good to Have:
- Snowflake
- AWS Cloud
- CI/CD: Jenkins, Maven, Docker