WebDriver For Journalists: Scraping the Web To Report the Truth

10th October 2023
Talk Description
Did you know that in 2021, a Pulitzer Prize was awarded to a project that had WebDriver code at its core?

The New York Times COVID data-tracking project became the United States' most-watched dashboard for tracking changes in the spread of the pandemic. It worked by aggregating data from municipalities across the nation. These sources ranged in sophistication from sanitized data available for download to bespoke HTML maps.

In this presentation, we'll discuss the role of WebDriver and other web-crawling technologies in that and other journalistic endeavours. 
We'll review how to use selectors to locate the data journalists need, how to clean source data, and the value of agility in deadline-driven workflows.
We'll also explore how the lessons learned in this line of work are applicable to the practice of software testing and beyond.
What you’ll learn

By the end of this talk, you'll be able to:

  • Reflect on WebDriver as a tool for more than application testing
  • Recount the story of the New York Times COVID dashboard: how it works and what it accomplished
  • Consider alternative information-gathering methods for cases when WebDriver is not the best tool in the box
Phil Wells has been a software quality practitioner for over a decade. Now, Phil is a senior software engineer with Epic Games. His team builds shared services to help developers get their systems tested and delivered safely. Phil likes to go beyond writing tests and building infrastructure for delivery. He also acts as a coach for his peers in web development, teaching and advocating for modern test practices and technologies. People have all sorts of funny ideas about what Phil does every day. Phil does not play Fortnite all day. Phil lives in the hills of New Jersey, USA. He has three little kids who also do not play Fortnite all day.
