Baking Quality into Your Data Pipeline - Ali Khalid

Ensuring data quality is widely identified as one of the most challenging issues in big data. It starts with identifying the scope of the data pipeline at each junction. The next step is to pick the data quality dimensions that matter for the business criticality of the data, and then to build automated checks that provide insight into its quality.
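As a rough illustration of such automated checks, the sketch below scores a batch of records on three commonly cited dimensions (completeness, uniqueness, validity) and compares each score with a threshold. It is a minimal sketch in plain Python under stated assumptions, not the approach demonstrated in the session: the `orders` columns, the thresholds, and all function names are hypothetical, and a real pipeline would usually run equivalent checks inside its processing engine.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical record type: each row is a plain dict, as read from one
# junction of the pipeline (after ingestion, after transformation, ...).
Row = dict[str, Any]


@dataclass
class CheckResult:
    dimension: str
    score: float      # fraction (0.0 to 1.0) of values that pass the check
    threshold: float  # minimum acceptable score, chosen per business criticality

    @property
    def passed(self) -> bool:
        return self.score >= self.threshold


def completeness(rows: list[Row], column: str) -> float:
    """Share of rows where `column` is present and not None."""
    if not rows:
        return 1.0
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)


def uniqueness(rows: list[Row], column: str) -> float:
    """Distinct non-null values divided by all non-null values of `column`."""
    values = [r.get(column) for r in rows if r.get(column) is not None]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


def validity(rows: list[Row], column: str, is_valid: Callable[[Any], bool]) -> float:
    """Share of non-null values of `column` that satisfy a business rule."""
    values = [r.get(column) for r in rows if r.get(column) is not None]
    if not values:
        return 1.0
    return sum(1 for v in values if is_valid(v)) / len(values)


def run_checks(rows: list[Row]) -> list[CheckResult]:
    # Hypothetical columns and thresholds for an "orders" dataset.
    return [
        CheckResult("completeness(customer_id)", completeness(rows, "customer_id"), 0.99),
        CheckResult("uniqueness(order_id)", uniqueness(rows, "order_id"), 1.0),
        CheckResult("validity(amount >= 0)",
                    validity(rows, "amount", lambda v: v >= 0), 0.95),
    ]


if __name__ == "__main__":
    sample = [
        {"order_id": 1, "customer_id": "c1", "amount": 10.0},
        {"order_id": 2, "customer_id": None, "amount": 5.5},
        {"order_id": 2, "customer_id": "c3", "amount": -1.0},
        {"order_id": 4, "customer_id": "c4", "amount": 7.25},
    ]
    for result in run_checks(sample):
        status = "PASS" if result.passed else "FAIL"
        print(f"{status} {result.dimension}: {result.score:.2f} (min {result.threshold})")
```

Expressing each dimension as a score plus a threshold, rather than a hard pass/fail rule, lets the same check be strict for business-critical columns and lenient elsewhere.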

The session gives participants an introduction to how a big data project is structured, how data flows through it, which quality checks are commonly used, and how to automate them. The main sections of the talk are:

  • Differences between big data and conventional data usage
  • Sample technology stack for a big data project
  • Introduction to a data pipeline
  • The kind of tests and automation needed
  • Data quality dimensions: why data quality is important, an explanation of the six dimensions, and a demo of how to test each of them in the sample pipeline
  • Automating data quality checks

What You'll Learn

  • Sample technology stack used in big data projects
  • What a sample data pipeline looks like
  • How to test a data pipeline
  • What data quality dimensions are and how to use them
  • How to go about automating data quality checks
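One common way to test a pipeline is to reconcile the records seen on either side of a junction. The sketch below assumes each junction can produce a list of record keys (the extracts and function names are hypothetical, not taken from the talk) and reports keys that were dropped or that appeared unexpectedly. It is written as a pytest-style test function so it could run in an automated build.

```python
from collections import Counter
from collections.abc import Iterable


def reconcile(source_keys: Iterable[str],
              target_keys: Iterable[str]) -> tuple[Counter[str], Counter[str]]:
    """Compare record keys seen at two junctions of a pipeline.

    Returns (missing_in_target, unexpected_in_target). Counter subtraction
    keeps only positive counts, so a key duplicated on one side but not the
    other is reported as well.
    """
    src, tgt = Counter(source_keys), Counter(target_keys)
    return src - tgt, tgt - src


def test_no_records_lost_between_landing_and_curated():
    # Hypothetical extracts: in practice these would be queried from the
    # raw landing zone and the curated layer of the pipeline.
    landed = ["o-1", "o-2", "o-3"]
    curated = ["o-1", "o-2", "o-3"]
    missing, unexpected = reconcile(landed, curated)
    assert not missing, f"dropped records: {sorted(missing)}"
    assert not unexpected, f"unexpected records: {sorted(unexpected)}"
```

Using `Counter` rather than `set` is a deliberate choice here: a record duplicated by a faulty join shows up as a mismatch instead of being silently collapsed.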

Ali Khalid

Founder @ Quality Spectrum

Ali is an internationally award-winning speaker and a contributor to many tech communities. As a test architect, he leads quality transformation projects across teams working on SaaS, mobile applications, and big data projects running analytics using AI and ML. To transform teams' quality practices, he helps them assess their automation practices, develops testing and automation trainings, builds test strategies and quality road maps, and develops automation frameworks and DevOps enablers. He is passionate about “Redefining Software Quality” (transforming quality practices and skills), helping teams increase product quality and reduce time to market. Learn more about him and his work at www.quality-spectrum.com