As the internet expands and technology advances, there is more we can do each day with the information being generated. With big data in play, unusual information sources have cropped up that, when analyzed, can yield insights into human behavior and consumption patterns. This has been a boost to industries such as retail, real estate, hospitality, and travel. Lately, journalists too have begun to harness the power of big data analytics to expand their journalistic reach. This has given rise to data journalism.
Before newer technologies for analyzing vast amounts of data sprang up, there was a limit to the amount of data at a business's disposal, and professionals grappled with what they had. With the advent of faster big data technologies, the sources from which data can be drawn and conclusions formed are virtually limitless. This has expanded the potential of journalism immensely.
From Journalism to Data Journalism
Journalism refers to investigating a topic or an event and then publishing that knowledge in a systematic and organized form. Data journalism, by contrast, involves collecting huge amounts of information and then filtering, processing, and evaluating it to arrive at findings with journalistic value. This is exactly where web scraping becomes the backbone of data journalism: to produce accurate insights of journalistic value, a lot of data needs to be gathered, processed, and analyzed.
Reporters are obligated to maintain quality, and real-time data extraction helps them ensure accurate information and consistency in their work. To this end, customized web scraping tools have grown by leaps and bounds, and the most sophisticated scrapers now deliver striking results within seconds.
In the field of journalism, any mistake can have a long-term effect. It can potentially end careers and leave a lasting mark, so it is very important for reporters to be accurate with their facts and information at all times. As the globe gets closer by the minute, maintaining complete accuracy and transparency is essential. Data scraping tools make this process simpler and more accurate by providing all the vital information for a specific task quickly and efficiently.
Data journalism includes three main components:
- Availability of open resources to bring down the cost of computer-based data analysis and insight generation.
- Open access to data and published content that helps remove the restriction on access.
- The concept of open data, which makes data freely available through channels like the internet, government servers, and trade publications.
Workflow of a typical data journalism project
The best-known workflow model of data journalism was published in 2011 by Paul Bradshaw. Called the 'inverted pyramid of data journalism,' it outlines six phases:
- Find: source the information online.
- Clean: filter the collected data through various sets of logic and variables.
- Visualize: present the transformed data as static or animated visuals showing trends, patterns, etc.
- Publish: join the visuals together to weave a coherent story.
- Distribute: share the piece across various platforms such as social media and email.
- Measure: monitor consumption of the content to track trends and the types of users engaging with it.
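The first phases of this workflow can be sketched as a small pipeline. The sample records, restaurant names, and function names below are all hypothetical, chosen only to illustrate the find, clean, and visualize steps; a real project would pull live data from scraped sources.

```python
# Hypothetical sample data standing in for scraped records.
raw_records = [
    {"restaurant": "Cafe A", "violations": "3"},
    {"restaurant": "Cafe B", "violations": ""},   # missing value
    {"restaurant": "Cafe A", "violations": "3"},  # duplicate entry
]

def find():
    """Find: source the raw records (stubbed with sample data here)."""
    return raw_records

def clean(records):
    """Clean: drop duplicates and rows with missing values, coerce types."""
    seen, cleaned = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if r["violations"] and key not in seen:
            seen.add(key)
            cleaned.append({**r, "violations": int(r["violations"])})
    return cleaned

def visualize(records):
    """Visualize: render a crude text bar chart of violation counts."""
    return [f"{r['restaurant']}: {'#' * r['violations']}" for r in records]

cleaned = clean(find())
for line in visualize(cleaned):
    print(line)
```

The remaining phases (publish, distribute, measure) are editorial rather than computational, which is why they are left out of the sketch.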
How Can Journalism Benefit from Web Data Scraping?
Data scraping services offer a variety of web scraping solutions designed specifically for the needs of the press. Many reporters also build their own scrapers using open-source tools that are simple to install and can be tailored to their needs and preferences. In this way, they have more control over the accuracy and authenticity of the information, and they can have it more readily available than ever.
Let’s look at an example of how data scraping can make things easy for reporters.
For a story in Journal Metro, a reporter used a web scraper to compare the prices of 12,000 products from the Société des alcools du Québec with the prices of 10,000 products from the LCBO in Ontario.
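Once the two price lists have been scraped, the comparison itself is a simple join on product name. The products and prices below are invented for illustration, not taken from the actual story.

```python
# Hypothetical scraped price lists from two retailers (invented values).
saq_prices = {"Wine X": 15.95, "Whisky Y": 42.50, "Gin Z": 28.00}
lcbo_prices = {"Wine X": 14.45, "Whisky Y": 44.95}

def compare(a, b):
    """Return price differences (a minus b) for products found in both lists."""
    return {name: round(a[name] - b[name], 2) for name in a.keys() & b.keys()}

diffs = compare(saq_prices, lcbo_prices)
print(diffs)  # positive values mean the first retailer is more expensive
```

Scaling the same join to 12,000 and 10,000 products changes nothing about the logic, only the size of the dictionaries.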
In another instance, a reporter from Sudbury wanted to investigate restaurant food inspections. Although the results of such inspections are published on the Sudbury Health Unit's website, it is impractical to download them all and manually go through the results for each restaurant. So he coded a bot to extract all the results from the website: it clicked on each result for the 1,600 facilities inspected by the Health Unit, extracted the data, and wrote the information to an Excel file. Done manually, this would have taken a few weeks; the bot did it all in a night.
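The core of such a bot, stripped of the fetching and clicking, is an HTML parser that pulls table cells out of each downloaded page and writes them to a spreadsheet-friendly format. The snippet below is a minimal sketch using only the Python standard library; the page structure, facility names, and results are fabricated stand-ins, not the Health Unit's actual markup.

```python
import csv
import io
from html.parser import HTMLParser

# Fabricated HTML standing in for one downloaded inspection-results page.
SAMPLE_PAGE = """
<table>
  <tr><td>Cafe A</td><td>Pass</td></tr>
  <tr><td>Diner B</td><td>Fail</td></tr>
</table>
"""

class InspectionParser(HTMLParser):
    """Collect the text of each <td> cell, grouped into table rows."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []

    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())

def scrape_to_csv(html):
    """Parse one results page and return its rows as CSV text."""
    parser = InspectionParser()
    parser.feed(html)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["facility", "result"])
    writer.writerows(parser.rows)
    return buf.getvalue()

print(scrape_to_csv(SAMPLE_PAGE))
```

A real bot would loop this over all 1,600 facility pages, which is exactly the repetition that makes manual collection take weeks and a script one night.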
Web data scraping plays an integral role in gathering the information that drives data journalism. The advancement of data-driven journalism can open new ways of getting information to reporters and give rise to fascinating opportunities. Data journalism owes much to web data scraping and will continue to do so.
Data journalism is a new way forward. Interested in being one of the first few? Contact us at Datahut, your big data experts.