Baking Quality into Your Data Pipeline - Ali Khalid
19 Nov 2020
Ensuring data quality is widely identified as one of the most challenging issues in big data. It starts with identifying the scope of the data pipeline at each junction. The next step is to pick the data quality dimensions that matter most for business-critical data, then build automated checks that give insight into the quality of that data.
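The abstract does not show code. To make "scope at each junction" concrete, here is a minimal sketch that reconciles two adjacent pipeline stages by comparing row counts and primary keys. `StageSnapshot`, `reconcile`, and `allowed_drop` are hypothetical names chosen for illustration, not anything from the talk.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StageSnapshot:
    """Hypothetical summary of a dataset as it leaves one pipeline junction."""
    name: str
    row_count: int
    key_ids: set


def reconcile(upstream: StageSnapshot, downstream: StageSnapshot,
              allowed_drop: int = 0) -> list[str]:
    """Compare two adjacent junctions and return human-readable issues."""
    issues = []
    dropped = upstream.row_count - downstream.row_count
    # More rows lost than the stage is allowed to filter out.
    if dropped > allowed_drop:
        issues.append(f"{downstream.name}: lost {dropped} row(s) from "
                      f"{upstream.name} (allowed {allowed_drop})")
    # More rows downstream than upstream often signals duplication.
    if dropped < 0:
        issues.append(f"{downstream.name}: {-dropped} more row(s) than "
                      f"{upstream.name}, possible duplication")
    # Keys that existed upstream but never arrived downstream.
    missing = upstream.key_ids - downstream.key_ids
    if missing:
        issues.append(f"{downstream.name}: {len(missing)} key(s) missing, "
                      f"e.g. {sorted(missing)[:3]}")
    return issues
```

For example, a raw stage with keys `{1, 2, 3, 4, 5}` feeding a cleaned stage with keys `{1, 2, 3, 5}` yields one row-loss issue and one missing-key issue. Passing `allowed_drop=1` leaves only the missing-key issue, which is the kind of per-junction tolerance the scoping step is meant to decide.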
The session is designed to introduce participants to how a big data project is structured, how data flows through it, which quality checks are commonly used, and how to automate them. The main sections of the talk are:
- Differences between big data and conventional data usage
- A sample technology stack for a big data project
- Introduction to a data pipeline
- The kinds of tests and automation needed
- Data quality dimensions: why data quality matters, an explanation of six dimensions, and a demo of how to test each of them in the sample pipeline
- Automating data quality checks
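The talk's own six dimensions are not listed here, so the following is only a sketch assuming three commonly cited ones (completeness, uniqueness, and validity). It scores a batch of records per dimension and turns the scores into an automated pass/fail gate. The field names `customer_id` and `email`, the thresholds, and the `QualityGateError` exception are hypothetical.

```python
import re

# Hypothetical minimum acceptable score (0.0 to 1.0) for each dimension.
THRESHOLDS = {"completeness": 0.95, "uniqueness": 1.0, "validity": 0.98}

# Deliberately simple email shape check, used only to illustrate validity.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class QualityGateError(Exception):
    """Raised when a batch scores below a dimension's threshold."""


def completeness(rows, field):
    """Share of rows where `field` is present and non-empty."""
    if not rows:
        return 1.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, key):
    """Share of key values that are distinct (1.0 means no duplicates)."""
    values = [r.get(key) for r in rows]
    return len(set(values)) / len(values) if values else 1.0


def validity(rows, field, pattern):
    """Share of non-empty values that match the expected format."""
    values = [r[field] for r in rows if r.get(field)]
    if not values:
        return 1.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


def score_batch(rows):
    """Compute one score per dimension for a batch of records."""
    return {
        "completeness": completeness(rows, "email"),
        "uniqueness": uniqueness(rows, "customer_id"),
        "validity": validity(rows, "email", EMAIL),
    }


def enforce_quality(rows):
    """Return the scores, or raise if any dimension misses its threshold."""
    scores = score_batch(rows)
    failures = {d: s for d, s in scores.items() if s < THRESHOLDS[d]}
    if failures:
        raise QualityGateError(f"quality gate failed: {failures}")
    return scores
```

Calling `enforce_quality` at the end of a pipeline stage (for example, as a task in a scheduler) stops bad batches from flowing downstream, while `score_batch` on its own can feed a dashboard that tracks quality over time. Keeping the scoring and the gating separate is a design choice here, not something the talk prescribes.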