A U.S. appeals court has ruled that scraping data that is publicly accessible on the internet does not violate the Computer Fraud and Abuse Act (CFAA), the federal statute that defines what constitutes computer hacking under U.S. law.
This is a major win for archivists, academics, researchers, and journalists who use tools to mass-collect, or scrape, information that is publicly available online but may be at risk of disappearing in the future.
The ruling by the U.S. Court of Appeals for the Ninth Circuit comes in a long-running legal battle between LinkedIn and hiQ Labs, a data analytics company that scrapes data from public LinkedIn profiles. LinkedIn had argued that hiQ was violating the CFAA by accessing and copying this data without authorization. The Ninth Circuit disagreed, finding that hiQ was not “hacking” into LinkedIn’s systems because the data was already available to anyone who visited those profiles on LinkedIn’s site.
This ruling could have far-reaching implications for how access to information on the internet is understood and for what counts as unauthorized access under the CFAA. It also highlights the importance of open-source scraping tools like Scrapy, which can be used to collect public data for research or journalism without running afoul of the CFAA.
What does web scraping mean?
Web scraping is the process of extracting data from websites. It can be done manually but is usually done using automated tools that extract data from web pages and store it in a format that can be easily analyzed, such as a spreadsheet. Web scraping is often used to collect large amounts of data that would be difficult or impossible to collect manually. It can also be used to extract data from sites that do not provide an API or have an API that is difficult to use.
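As a rough illustration of the extract-and-store step described above, here is a minimal sketch that uses only Python's standard library. The `SAMPLE_HTML` string, the `TableParser` class, and the `html_table_to_csv` function are hypothetical names invented for this example. A real scraper would first download the page, and a dedicated tool would handle much messier HTML than this.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would first download this HTML.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows and the same rows rendered as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer).writerows(parser.rows)
    return parser.rows, buffer.getvalue()


rows, csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

The CSV text can be written to a file and opened in any spreadsheet program, which is the "easily analyzed" format the paragraph above refers to.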
Who uses web scraping?
People use web scraping for a number of reasons. A researcher might collect data from a website to build a dataset for an academic study. A journalist might gather data about a particular topic to support an article. A business might scrape sites to compare product prices or track competitors. In general, anyone who needs large amounts of data that is readily available on the web may benefit from web scraping.