Senior / Lead Data Engineer



Company Name



We are looking for a Lead Data Engineer for the Machine Learning (ML) engineering development team. The primary focus will be to gather requirements from ML and Data Science (DS) teams, identify the optimal solution, and then design, implement, monitor and maintain scalable distributed big data pipelines for different ML use cases. You will work with Data Scientists to train, refresh and serve models using these pipelines.


Collaborate with ML engineers and Data Scientists to gather requirements.

Design and implement ETL big data pipelines to train ML models.

Build stream-processing and batch pipelines using UDFs and ML libraries, and load processed data into multiple distributed data sources.

API programming knowledge to train and serve the ML models.

Select and integrate the big data tools and frameworks required for processing.

Responsible for availability, scalability, reliability, and performance of the big data platform.

Skills and Qualifications

Minimum of 6 years of relevant experience.

Proven background in ETL development and large-scale data processing.

Proficiency with the Big Data ecosystem: Spark (PySpark), Hadoop, HDFS, Hive, NoSQL, and modern cloud data lakes (Cloudera Data Platform or Delta Lake).

Strong SQL expertise, including optimizing complex joins, and a solid grasp of database concepts.

Strong development experience in languages such as Python and Java.

Experience building stream-processing systems using Spark Streaming.

Experience with workflow orchestration tools such as Oozie or Airflow.

Experience with Unix/Shell or Python scripting.

Knowledge of AWS is a plus.

Knowledge of AI/ML and MLOps is a plus.
