
As the amount (and value) of online data continues to grow exponentially, so does the practice of internet data scraping: the harvesting of data from third-party websites for commercial purposes. Because both scraping activity and the efforts to stop it continue apace, it is worth reviewing the principal legal theories that website owners have asserted against scrapers.

Copyright and the DMCA

Internet data often is protected by copyright, leading website owners to contend that scraping constitutes infringement and/or a violation of the Digital Millennium Copyright Act. In recent years, the DMCA, which prohibits circumventing technological measures that effectively control access to a copyrighted work (17 U.S.C. § 1201(a)(1)(A)), has become a more popular enforcement tool; unlike an infringement plaintiff, a DMCA plaintiff need not own or hold an exclusive license to the copyrighted works at issue. To state a DMCA violation, a plaintiff generally must allege that it implemented technological barriers (e.g., password or CAPTCHA protections) on its website and that a scraper evaded those barriers through some technological workaround.

For example, in Ticketmaster v. Prestige Entertainment in the U.S. District Court for the Central District of California earlier this year, a DMCA claim was properly alleged where the defendant used automated software “bots” to bypass CAPTCHA controls and purchase event tickets. By contrast, in CouponCabin v. Savings.com in the U.S. District Court for the Northern District of Indiana in 2016, a DMCA claim was dismissed where access to the website did not require the application of “information or a process or treatment,” such as a password. Under those circumstances, merely blocking the defendant’s servers, which prompted the defendant to use different servers for scraping, did not support a DMCA claim because the website remained available to any server that had not been specifically blocked.
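To put that distinction in purely technical terms, the sketch below (a minimal illustration with hypothetical names and example values, not drawn from either case) contrasts an IP blocklist, which leaves a site open to any server not on the list, with a credential gate that conditions every request on supplying “information or a process,” such as a password or a solved CAPTCHA token.

```typescript
// Minimal sketch (hypothetical names and values) contrasting the two access
// controls discussed above.

const blockedIps = new Set<string>(["203.0.113.7"]);       // previously blocked scraper address
const validApiKeys = new Set<string>(["example-key-123"]); // stand-in for a password or credential

// CouponCabin-style control: the site stays open to any server not on the
// blocklist, so a scraper that simply switches servers regains access without
// supplying any "information or a process or treatment."
function allowByIpBlocklist(clientIp: string): boolean {
  return !blockedIps.has(clientIp);
}

// Ticketmaster-style control: every request must present a credential (a
// password, API key, or solved CAPTCHA token) before content is served.
function allowByCredential(apiKey?: string): boolean {
  return apiKey !== undefined && validApiKeys.has(apiKey);
}

// A blocked scraper that moves to a fresh IP passes the blocklist check but
// still fails the credential check.
console.log(allowByIpBlocklist("198.51.100.9")); // true  (new, unblocked server)
console.log(allowByCredential(undefined));       // false (no credential supplied)
```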

Computer Fraud and Abuse Act

Web content owners also have asserted claims under the CFAA, 18 U.S.C. § 1030(a)(2)(C), which prohibits (i) intentionally accessing a protected computer without authorization or (ii) exceeding authorized access, and thereby obtaining information from the computer; a civil plaintiff also must show resulting damage or loss. CFAA claims generally proceed where a website owner affirmatively rescinded a defendant’s authorization to access its website, but the defendant nevertheless continued scraping the site.

In Facebook v. Power Ventures in the U.S. Court of Appeals for the Ninth Circuit, for example, a CFAA violation was affirmed with respect to the scraping of data after Facebook had sent the defendant a cease-and-desist letter and attempted to block future access to Facebook’s website; the scraping of Facebook’s website prior to the express rescission of permission, however, was not “without authorization” for CFAA purposes.

Other recent cases suggest that the CFAA does not prohibit the scraping of data from publicly available portions of a website (i.e., information for which a password or log-in is not required), because scraping a generally accessible website is merely a particular use of information that users otherwise are entitled to see. In hiQ Labs v. LinkedIn in the U.S. District Court for the Northern District of California in 2017, a data analytics company obtained a preliminary injunction against attempts to block it from accessing and scraping publicly available user data from LinkedIn’s website. In granting the injunction, the court contrasted hiQ’s scraping of publicly available information with the scraping of password-protected data at issue in Facebook.

Trespass to Chattels

Unauthorized access to a computer system also can give rise to a common-law claim for trespass to chattels. Determination of whether access is unauthorized generally is the same as for a CFAA claim, although trespass claims are not limited to the scraping of private, password-protected information. (See hiQ.) Trespass claims often turn on whether the data scraping caused actual damages, such as impairing a website’s functionality. This is a fact-dependent inquiry that often is not resolved at the motion to dismiss stage. (See Fidlar Technologies v. LPS Real Estate Data Solutions (C.D. Ill. 2013).)

Breach of Contract

Because scraping often violates a website’s terms of use, breach of contract is another common claim in this area. Terms of use typically are conveyed to website users through either a “clickwrap” or a “browsewrap” agreement. The former generally is enforceable because the user affirmatively acknowledges assent; the latter’s enforceability often depends upon a fact-specific inquiry into the location and accessibility of the terms and the defendant’s awareness of them. Plaintiffs seeking to enforce a browsewrap agreement typically must demonstrate that the user had actual or constructive knowledge of the terms.
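The practical difference is easy to illustrate. The minimal sketch below (hypothetical names, not drawn from any of the cases discussed) shows the kind of affirmative, recorded assent that a clickwrap flow captures; a browsewrap site, by contrast, would simply link to its terms and later have to prove the user’s actual or constructive knowledge of them.

```typescript
// Minimal sketch (hypothetical names): recording the affirmative assent that
// makes a clickwrap agreement comparatively easy to enforce.

interface AssentRecord {
  userId: string;
  termsVersion: string; // which version of the terms was presented
  acceptedAt: Date;     // when the user affirmatively clicked "I agree"
}

const assentLog: AssentRecord[] = [];

// Clickwrap: the account is not created unless the user checks the "I agree"
// box, and the assent is logged so it can be proved later.
function registerUser(userId: string, agreedToTerms: boolean, termsVersion: string): boolean {
  if (!agreedToTerms) {
    return false; // no affirmative assent, no account
  }
  assentLog.push({ userId, termsVersion, acceptedAt: new Date() });
  return true;
}

console.log(registerUser("user-42", true, "2018-03-01"));  // true: assent recorded
console.log(registerUser("user-43", false, "2018-03-01")); // false: no assent given
```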

At least one court has looked to a data scraper’s practices on its own website to enforce a browsewrap agreement against the scraper. In DHI Group v. Kent in the U.S. District Court for the Southern District of Texas in 2017, a breach of contract claim survived a motion to dismiss because the defendant’s operation of “a similar site with a similar browsewrap agreement” constituted constructive notice of the plaintiff’s terms. This finding of notice was limited, however, to instances where “both parties are sophisticated businesses that use browsewrap agreements.”

Additional Observations

In addition to the claims discussed in this article, we note that statutory claims may be available under applicable state law. (E.g., Texas’ Harmful Access by a Computer Act; California’s Unfair Competition Law.) For their part, some data scrapers are fighting back by challenging attempts to block or restrict their access to websites as tortious interference (Fidlar) or as violations of the antitrust laws (Authenticom v. CDK Global (7th Cir. 2017)).

Ultimately, parties that engage in scraping should ensure that their activities are consistent with website authorizations and terms of use. Website owners, in turn, should adopt terms of use (presented as clickwrap agreements where possible) that clearly notify third parties of prohibited scraping and/or circumvention activities. For an additional layer of protection, website owners should place their most valuable data behind password protections.

Anthony J. Dreyer is a partner, and Andrew Green is an associate, with Skadden, Arps, Slate, Meagher & Flom.