June 23, 2025

alainalexanianconsulting

Leggo My Finance

A new framework for web scraping data to ensure its validity for use in marketing studies


Researchers from Erasmus University Rotterdam, Tilburg University, INSEAD, and Oxford University have published a new paper in the Journal of Marketing that proposes a methodological framework for enhancing the validity of web data.

The research is authored by Johannes Boegershausen, Hannes Datta, Abhishek Borah, and Andrew T. Stephen.

The recent Ninth Circuit ruling in HiQ Labs v. LinkedIn underscores the importance of navigating the legal issues involved in using web scraping to collect data for academic research. Although it may be permissible to obtain information from publicly accessible websites, researchers still need to be careful about how they design their extraction software. For example, collecting data from publicly available user profiles may trigger privacy concerns in some jurisdictions, prompting researchers to anonymize their data during collection.
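To make the idea of anonymizing during collection concrete, here is a minimal Python sketch. It is not taken from the paper: the record fields (`user_id`, `display_name`, `rating`, `review_text`), the function names, and the key handling are all hypothetical. It replaces the raw user identifier with a keyed hash and drops fields that are not needed for analysis before anything is stored. Strictly speaking this is pseudonymization, and free text such as a review may still identify a person, so it is only a starting point.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key (an assumption for this sketch). In practice it would
# be generated once, stored apart from the dataset, and never published.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest that stands in for the raw identifier."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Keep only the fields needed for analysis and swap the user ID for a pseudonym."""
    return {
        "user": pseudonymize(record["user_id"], key),
        "rating": record["rating"],
        "review_text": record["review_text"],
    }


# Hypothetical scraped record, made up for illustration only.
raw = {
    "user_id": "jane_doe_42",
    "display_name": "Jane Doe",
    "rating": 4,
    "review_text": "Great value.",
}
clean = anonymize_record(raw)
```

Because the same key always maps the same user to the same pseudonym, reviews by one person can still be linked within the dataset, while the raw identifier and display name are never written to disk.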

While marketing researchers increasingly use web data, the idiosyncratic and sometimes insidious challenges in its collection have received limited attention. How can researchers ensure that the datasets generated via web scraping and APIs are valid? The research team developed a novel framework that highlights how addressing validity concerns requires the joint consideration of idiosyncratic technical and legal/ethical questions.

The authors say that their “framework covers the broad spectrum of validity concerns that arise along the three stages of the automatic collection of web data for academic use: selecting data sources, designing the data collection, and extracting the data. In discussing the methodological framework, we offer a stylized marketing example for illustration. We also provide recommendations for addressing challenges researchers encounter during the collection of web data via web scraping and APIs.”

The article further provides a systematic review of more than 300 articles using web data published in the top five marketing journals. Using this review, the researchers describe how web data has advanced marketing thought. Understanding the richness and versatility of web data is invaluable for scholars curious about integrating it into their research programs.

Interested researchers can access the database developed for this review on the companion website. The website also features additional resources and tutorials for collecting web data via web scraping and APIs.

The researchers add that they use their “methodological framework and typology to unearth new and underexploited ‘fields of gold’ associated with web data. We seek to demystify the use of web scraping and APIs and thereby facilitate broader adoption of web data across the marketing discipline. Our Future Research section highlights novel and creative avenues of using web data that include exploring underutilized sources, creating rich multi-source datasets, and fully exploiting the potential of APIs beyond data extraction.”




More information:
Johannes Boegershausen et al, EXPRESS: Fields of Gold: Scraping Web Data for Marketing Insights, Journal of Marketing (2022). DOI: 10.1177/00222429221100750

Web database: web-scraping.org/

Provided by
American Marketing Association


Citation:
A new framework for web scraping data to ensure its validity for use in marketing studies (2022, June 2)
retrieved 10 June 2022
from https://techxplore.com/news/2022-06-framework-web-validity.html
