Monday, April 13, 2020

Latest on Legality of Scraping (Profiles)

A recent court case sheds light on the shaky legal status of scraping, crawling, botting, fake profiles, and similar automated processes used to harvest large amounts of data from websites you do not own. To what extent are you allowed to go on a fishing expedition in somebody else's pond?

Currently, this is an unsettled grey area of law. The law does not prohibit using automation to harvest publicly available data from other people's websites, but those websites' Terms of Use do. For example, YouTube's Terms of Service prohibit accessing the service by "any automated means" and harvesting users' identifying information.

Facebook has an entire separate set of Terms for Automated Data Collection, which prohibit it without Facebook's express written permission.

Most Terms of Use nowadays contain similar boilerplate clauses that prohibit automated data harvesting in any form. Check your favorite websites' Terms and you'll find one: LinkedIn, Instagram, everybody. All of the Terms I draft include it as well.


So, the law allows it, but the Terms prohibit it. Users must agree to the Terms before accessing the website; that's contract law. What happens in court when the law clashes with the Terms and platforms sue scrapers for violating them? Lately, scrapers have been winning, although not everybody has been so lucky. In this post, I'll describe the lessons we can learn from professors with fake LinkedIn profiles and other notable harvesters to date.

Latest Case: Fake LinkedIn Research Profiles

A federal court in D.C. has recently ruled that it's not a crime to create fake online profiles for research purposes in violation of a website's terms of use. Computer science professors at Northeastern University want to investigate whether the algorithms of job sites like LinkedIn discriminate against candidates based on race, gender, and other protected classes. To do so, they intended to create fake profiles and then compare the rankings and responses those profiles received depending on race and other characteristics.

The problem is that the terms of use of virtually every professional website prohibit fake profiles and the provision of false information. Furthermore, the Computer Fraud and Abuse Act ("CFAA," the anti-hacking law) makes it a crime to "intentionally access[] a computer without authorization or exceed[] authorized access, and thereby obtain[] . . . information from any protected computer."

Courts Disagree

So, in 2016, the professors asked the court to determine whether their intended research would be a crime under the CFAA. There is no easy answer to that question, because courts disagree and have produced inconsistent rulings.

In 2009, a federal judge in California acquitted a woman who had been criminally charged under the CFAA, on the theory that she violated MySpace's terms of service, for contributing to a MySpace hoax that led to the suicide of a 13-year-old.

In 2014, another federal court in California rejected a CFAA prosecution based on a terms-of-use violation in a case where an employee had used a valid password to access confidential information.

In 2013, a federal court in California ruled that the data-mining company 3Taps may have violated the anti-hacking law by scraping real estate listings from Craigslist after Craigslist had demanded that 3Taps stop doing so.

In 2015, a court threw out the conviction of a police officer who had used a police database to look up information about women he knew personally. That, the judge reasoned, was not a criminal violation of the CFAA.

Yet other courts have been much stricter. For example, in 2010, a court ruled that a Social Security Administration employee had violated the CFAA when he used an SSA database to look up information about people he knew personally. That is contrary to what the court decided in the creepy cop's case above. In 2006, a court ruled that an employee had violated the CFAA when, after quitting his job, he deleted valuable information from his work computer, as well as data that would have revealed his misconduct.

Latest Ruling

In the case at hand, the judge ruled that the professors' fake-profile research is not criminal under the CFAA. It could still give rise to civil liability, because the terms of use are a legally binding contract. But the courts in California aren't clear on that either; there are conflicting rulings. For example, in 2016, a court ruled against a startup for logging in to Facebook using credentials supplied by users, in violation of Facebook's policies. But last year, a court ruled that a company didn't violate the CFAA when it scraped data from LinkedIn in violation of LinkedIn's terms of use. It will be up to the Supreme Court to reconcile this inconsistent case law on the question of what kind of unauthorized access in violation of terms of use constitutes a crime under the anti-hacking law.