Engineering

Data Engineer

Hyderabad
Work Type: Full Time
About Quantela
 
We are a technology company that offers outcomes business models. We empower our customers with the right digital infrastructure to deliver greater economic, social, and environmental outcomes for their constituents.

When the company was founded in 2015, we specialized in smart cities technology alone. Today, working with cities and towns, utilities, and public venues, our team of 280+ experts offers a vast array of outcomes business models through technologies like digital advertising, smart lighting, smart traffic, and digitized citizen services.

We pride ourselves on our agility, innovation, and passion to use technology for a higher purpose. Unlike other technology companies, we tailor our offerings (what we can digitize) and the business model (how we partner with our customers to deliver that digitization) to drive a measurable impact where our customers need it most. Over the last several months alone, we have helped customers deliver outcomes such as faster medical response times to save lives, reduced traffic congestion to keep cities moving, and new revenue streams to tackle societal issues like homelessness.

We are headquartered in Billerica, Massachusetts, in the United States, with offices across Europe and Asia.

The company has been recognized with the World Economic Forum’s Technology Pioneers’ Award in 2019 and CRN’s IoT Innovation Award in 2020.
For the latest news and updates, please visit us at www.quantela.com.
 
Overview of the Role
This role focuses on designing and managing scalable data infrastructure that powers predictive models. The ideal candidate will build high-performance Spark pipelines for feature engineering and leverage Apache Airflow to orchestrate the complete ML lifecycle, from data ingestion to automated model retraining and deployment.

Roles and Responsibilities
  • Design and develop scalable ETL/ELT workflows to support large-scale machine learning initiatives.
  • Build optimized PySpark pipelines for feature engineering and large-scale data preprocessing.
  • Ensure point-in-time data consistency for model training and inference.
  • Develop, schedule, and maintain complex Apache Airflow DAGs.
  • Automate model training, validation, retraining, and batch inference workflows.
  • Monitor and troubleshoot pipeline failures to ensure high reliability.
  • Implement model versioning and artifact tracking using tools such as MLflow or DVC.
  • Support CI/CD integration for ML pipelines to streamline testing and deployment.
  • Collaborate with DevOps teams to containerize and deploy ML workloads.
  • Design data architectures such as Medallion Architecture to support analytics and modeling.
  • Build and maintain structured datasets optimized for statistical analysis and ML experimentation.
  • Tune distributed Spark jobs for large-scale data processing.
  • Optimize compute usage to reduce latency and cloud costs while maintaining performance.
  • Implement monitoring solutions for data quality, drift detection, and model performance tracking.
  • Ensure proactive identification of data anomalies that may impact model accuracy.
  • Work closely with Data Scientists to productionize experimental ML code.
  • Translate research prototypes into stable, scalable, and maintainable pipelines.
Desired Skills / Background
  • 3 to 4 years of experience in Data Engineering with exposure to ML or analytics teams.
  • Strong hands-on experience with Apache Spark (PySpark) for complex transformations.
  • Advanced SQL skills for data modeling and analytics use cases.
  • Proven expertise in designing and debugging Apache Airflow workflows.
  • Solid understanding of the ML lifecycle, including training, validation, deployment, and monitoring.
  • Familiarity with frameworks such as Scikit-learn, TensorFlow, or PyTorch.
  • Experience working with cloud platforms such as AWS, GCP, or Azure.
  • Hands-on experience with Docker and Kubernetes for containerization and orchestration.
  • Experience building or maintaining a Feature Store.
  • Understanding of statistical techniques such as regression and hypothesis testing.
  • Exposure to CI/CD practices for ML model deployment.
Notice Period:
Immediate Joiner
 
