Discover Data Science Testing - Laveena Ramchandani

Description:

I would like to share my knowledge about testing a model in a data science team. I appreciate this is a new area for testers to be in, but it has been a great experience to learn from.

I’d like to share how I explored the world of data science as a tester while testing a model, and how we can apply those lessons if we find ourselves in the same situation. I will also cover how, as part of an emerging team, I contributed value in a field I had never tested before.

I have heard from other senior testers that they know of data science teams, but not of testers testing the models. So how do we gain enough confidence that what is produced is good enough? A model is a statistical black box: how do we test it so that we understand its behaviours well enough to test it properly? My main aim is to inspire testers to explore data science models.

I’d like to invite you to my talk, where we will go through my journey of discovering data science model testing. I hope you will find the following takeaways useful not just for testing a data science model but for day-to-day testing too.

1. Some background on what a data science model is and how data plays a role in these models, which draw their understanding from vast amounts of data:

  • structured data
  • unstructured data
  • metadata
  • semi-structured data

2. Understand data pipelines

3. Importance of pairing: testing a model follows an SDLC process that may require more exploratory testing and investigation, so pairing with a data scientist is a good way of working and of understanding the model

4. Pre-testing thoughts:

  • Is the model custom-made or off the shelf?
  • How, as a team, are we training our own model to behave?
  • What is my input and what’s my output?
  • Am I experiencing the right behaviour? (Models do contain some element of randomness, so how will we decide what’s acceptable when testing the results?)

5. As testers, we expect input + model that uses predictive analytics = output, for example 5 + 3 = 8. For data scientists, however, 5 + 3 is not always exactly 8 but may be 8.1, 8.001 or 8.5; in simple words, the output is stochastic. So how will we bring in processes and strategies to make sure we capture the right output results and the consumers still benefit from them? In a nutshell, it is about making sure the model’s quality is good and that we have confidence in what we provide to consumers.

6. Test the areas where we are certain about the behaviour, and for the areas we are uncertain about, set expectations as bounds around averages

7. Exploratory testing to look for edge cases, and regression testing to check that new features are not breaking baseline results

8. Understanding what tests to perform:

  • What is an acceptable test for the model?
  • Have we found anomalies? (Are results too far off the threshold?)
  • How do we know what we produced as results is the right result?
  • How accurate are my results from the model?
  • What is an acceptable deviation?

9. I will share tips that helped me and could help testers who want to explore testing models, so that the quality of a model gives the team enough confidence and helps the business

10. Post testing – Do we have a good understanding of what the model has provided? Is the predictive analytics working as expected? Does the shape of my data look as expected? (Testing the outputs will show whether the values carried through from the data input stage are of the right type.)
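
To make points 5, 6, 8 and 10 a little more concrete, here is a minimal sketch in Python of what a bounds-based check on a stochastic output could look like. Everything in it is an assumption made for illustration: `noisy_add` is a hypothetical stand-in for a real model, and the tolerances `mean_tol` and `max_dev` are invented numbers that a team would need to agree on with its data scientists.

```python
import random
import statistics


def noisy_add(a, b, rng):
    """Hypothetical stand-in for a stochastic model: a + b plus small noise."""
    return a + b + rng.gauss(0, 0.1)


def check_within_bounds(outputs, expected, mean_tol, max_dev):
    """Check that the average is close to the expectation and that no single
    result deviates more than the agreed maximum.

    Returns (ok, anomalies), where anomalies lists the out-of-bounds results.
    """
    mean_ok = abs(statistics.mean(outputs) - expected) <= mean_tol
    anomalies = [x for x in outputs if abs(x - expected) > max_dev]
    return mean_ok and not anomalies, anomalies


# A fixed seed keeps the run repeatable, which helps when comparing
# against baseline results in regression testing.
rng = random.Random(42)
outputs = [noisy_add(5, 3, rng) for _ in range(200)]

# Assumed tolerances: average within 0.05 of 8, no result further than 0.6 away.
ok, anomalies = check_within_bounds(outputs, expected=8, mean_tol=0.05, max_dev=0.6)

# Shape and type checks on the output: right number of results, all floats.
shape_ok = len(outputs) == 200 and all(isinstance(x, float) for x in outputs)

# A result far outside the agreed deviation is reported as an anomaly.
bad_ok, bad_anomalies = check_within_bounds(
    outputs + [9.5], expected=8, mean_tol=0.05, max_dev=0.6
)
```

The idea is that the test does not assert 5 + 3 == 8 exactly. It asserts that the average sits near 8 and that no individual result falls outside the agreed deviation, which is one possible way to turn "acceptable deviation" into something a test can check.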