Baking Quality into Your Data Pipeline - Ali Khalid

Ensuring data quality is widely regarded as one of the most challenging problems in big data. It starts with identifying the scope of the data pipeline at each junction. The next step is to pick the data quality dimensions that match the business criticality of the data, then build automated checks that give insight into its quality.
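The idea of scoping checks to each junction can be illustrated with a minimal Python sketch. It assumes that each pipeline stage reports how many rows it received and how many it emitted. The `check_junction` helper, the stage names and the loss thresholds are hypothetical and do not come from the talk. A tighter tolerance stands in for a more business-critical feed.

```python
from dataclasses import dataclass


@dataclass
class JunctionCheck:
    """Row-count reconciliation across one pipeline junction (hypothetical)."""
    junction: str
    rows_in: int
    rows_out: int
    max_loss_ratio: float

    @property
    def loss_ratio(self) -> float:
        # Fraction of rows that entered the junction but did not come out.
        if self.rows_in == 0:
            return 0.0
        return (self.rows_in - self.rows_out) / self.rows_in

    @property
    def passed(self) -> bool:
        # The check passes when the observed loss is within the tolerance.
        return self.loss_ratio <= self.max_loss_ratio


def check_junction(junction: str, rows_in: int, rows_out: int,
                   max_loss_ratio: float = 0.0) -> JunctionCheck:
    # Illustrative assumption: 0.0 (no loss allowed) is the default for critical feeds.
    return JunctionCheck(junction, rows_in, rows_out, max_loss_ratio)


if __name__ == "__main__":
    checks = [
        check_junction("ingest -> raw", 1000, 1000),
        check_junction("raw -> cleansed", 1000, 985, max_loss_ratio=0.02),
        check_junction("cleansed -> warehouse", 985, 900, max_loss_ratio=0.01),
    ]
    for c in checks:
        status = "PASS" if c.passed else "FAIL"
        print(f"{status} {c.junction}: loss {c.loss_ratio:.1%}")
```

Counting rows is only the cheapest junction check. Each junction can then carry whichever dimension checks its business criticality justifies.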

The session introduces participants to how a big data project is structured, how data flows through it, which quality checks are commonly used and how to automate them. The main sections of the talk are:

  • Differences between big data and conventional data usage
  • Sample technology stack for a big data project
  • Introduction to a data pipeline
  • The kinds of tests and automation needed
  • Data quality dimensions: why data quality matters, an explanation of the six dimensions, and a demo of how to test each one in the sample pipeline
  • Automating data quality checks
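The abstract does not name the six dimensions. One commonly cited set, from DAMA UK, is completeness, uniqueness, timeliness, validity, accuracy and consistency.

The sketch below is an assumption, not the demo from the session. It scores four of these dimensions on hypothetical in-memory records and compares each score with a threshold. Accuracy and consistency are left out because they need a trusted reference source or a second dataset to compare against. The record fields, thresholds and function names are all illustrative.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical records arriving at one stage of the pipeline.
RECORDS = [
    {"id": 1, "email": "a@example.com", "loaded_at": "2024-01-02T10:00:00+00:00"},
    {"id": 2, "email": None, "loaded_at": "2024-01-02T10:05:00+00:00"},
    {"id": 2, "email": "not-an-email", "loaded_at": "2024-01-02T10:06:00+00:00"},
]

# Deliberately simple email format check, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical minimum acceptable score per check (1.0 means "no defects allowed").
THRESHOLDS = {
    "completeness(email)": 0.95,
    "uniqueness(id)": 1.0,
    "validity(email)": 0.99,
    "timeliness(loaded_at)": 1.0,
}


def completeness(records, field):
    # Share of records where the field is populated.
    return sum(1 for r in records if r.get(field) is not None) / len(records)


def uniqueness(records, key):
    # Share of distinct key values among all records.
    return len({r[key] for r in records}) / len(records)


def validity(records, field, pattern):
    # Share of populated values that match the expected format.
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 1.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


def timeliness(records, field, now, max_age):
    # Share of records whose load timestamp falls within the allowed age.
    fresh = sum(1 for r in records
                if now - datetime.fromisoformat(r[field]) <= max_age)
    return fresh / len(records)


def run_checks(records, now, thresholds):
    # Score every dimension, then collect the checks that fall below their threshold.
    scores = {
        "completeness(email)": completeness(records, "email"),
        "uniqueness(id)": uniqueness(records, "id"),
        "validity(email)": validity(records, "email", EMAIL_RE),
        "timeliness(loaded_at)": timeliness(records, "loaded_at", now,
                                            timedelta(days=1)),
    }
    failures = {name: score for name, score in scores.items()
                if score < thresholds.get(name, 1.0)}
    return scores, failures


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    scores, failures = run_checks(RECORDS, now, THRESHOLDS)
    for name, score in scores.items():
        print(f"{'FAIL' if name in failures else 'PASS'} {name}: {score:.2f}")
```

To automate this, a scheduled pipeline could run such a job after each load. Whenever `failures` is non-empty, it would signal failure, for example with a non-zero exit code, so that the orchestrator can halt downstream steps.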
