Is scraping data part of your game plan?
- 25 August, 2015 07:00
She wasn’t surprised that Google knew, but found it unnerving that Google chose to let her know that it knew by displaying a personalised version of its logo with birthday-like icons and a Happy Birthday Sandra label. Did she provide that information to Google at some point? Probably. Did they scrape it from another online cache? No idea. How would she know?
As an advisor to technology companies we occasionally get asked about the legal risks associated with harvesting data via scraping and similar means. If this is a major part of your game plan, here are the starters you should be thinking about.
Consider the information you are extracting – are you taking information that is personally sensitive (such as personal contact details) or commercially sensitive (such as brand names)? The legal rules around taking personal information, and around using another’s brand for one’s own commercial purposes, are both well-developed and highly protectionist areas of law. Are you taking images or compilations of information that are likely to be seen as proprietary, either because they are highly original or because they would have been labour-intensive to collate (e.g. images or product catalogues)?
Typically, harvested data includes product descriptions and pictures reproduced from other sites. As soon as you are reproducing another person’s text or images you raise legal issues of potential copyright infringement. These risks are lessened if (a) the images and text are not reproduced in whole, (b) the text is not reproduced verbatim but, as your school teacher would say, restated in your own words (taking care not to mislead or misstate any aspect of the goods or services, however), (c) the images are unoriginal, do not reproduce trade marks, are sourced from a different place to the product description, or otherwise are less likely to be the subject of copyright held by the same owner as the other extracted information.
What is often a surprise to would-be scrapers is that they must also consider the collective effect of a swarm of scrapers (a muster of miners? a horde of harvesters?).
Legal cases have focused on the potential for scraping activity by multiple persons to diminish the owner’s available bandwidth or server capacity. Interfering with a computer system owner’s data usage right (distinct from the use of the data itself) has been recognised as theft in New Zealand (Davies v Police), and depriving an owner of bandwidth and server capacity has been held to constitute the old-fashioned tort of trespass to chattels in the US (eBay v Bidder’s Edge).
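For the technically minded, the bandwidth concern can be made concrete. The following is a minimal Python sketch of a scraper-side throttle that enforces a minimum gap between requests to a single site. The class name `Throttle` and the parameter `min_interval_s` are hypothetical names chosen for illustration, and whether any particular request rate avoids the legal risks described here is not something code can settle.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive requests to one site.

    Illustrative only: the class and parameter names are assumptions,
    not part of any library or of the article's source material.
    """

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None        # time the previous request was released

    def wait(self):
        """Block until min_interval_s has passed since the previous call.

        Returns the number of seconds actually slept.
        """
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval_s - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

A scraper would call `throttle.wait()` immediately before each request, so that its own contribution to the site's load stays bounded. It does nothing about the collective effect of many independent scrapers.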
New Zealand’s Crimes Act sets out a number of computer-related crimes including dishonestly accessing computer systems without authorisation (s 252), and accessing any computer system dishonestly, to obtain (or even merely intending to obtain) any property, privilege, service, pecuniary advantage, benefit or valuable consideration (s 249).
The first of these (s 252) expressly does not include accessing a computer system as a permitted user and using it for a non-permitted purpose. Arguably accessing a website which is intended for public use, and scraping for data, even though not permitted, would merely be using that computer system for a non-permitted purpose and would not fall foul of this section.
The scope of the second of these (s 249) is relatively untested; however, all indications are that the threshold is not high. Accessing a computer system can be as simple as sending an email (as occurred in Burt v Police) and would certainly include harvesting data by automated scraping. Dishonesty is no more than an absence of any belief that there was any express or implied consent from the relevant person as to the act carried out (s 217).
If the terms of the website expressly or impliedly prohibit scraping, dishonesty seems hard to argue against. Conversely, if the website terms are silent, dishonesty will be harder to establish. A pecuniary advantage, benefit, etc. is simply anything that enhances the accused’s financial position, according to New Zealand’s Supreme Court (Hayes v R). This would almost certainly cover obtaining data (at little or no cost) so as to increase the scraper’s potential for commercial sales.
So, what can you do? Well, all of the legal issues mentioned here can be overcome if the scraping is authorised by the relevant website or product owner. Can you build a relationship with the website or product owner by demonstrating you can add value to their business in some way?
If, like many of our clients, you don’t want to be bothered with carefully analysing the legal rights and wrongs of your particular process, then the take-home message should be this: although the law is always a step behind technological developments, the sentiment of the legal cases to date in a number of jurisdictions is that the website owner should be able to prevent scrapers from harvesting information without authorisation.
Courts appear willing to mould existing laws to find a legal wrong committed by the scraper. Extracting information from well-resourced companies, extracting sensitive or proprietary information, or using the information in a way which adversely impacts on the company’s bottom line, interferes with its bandwidth usage or weakens its control over its brand and marketing will all put you squarely in the firing line.
The legal rules in this area will only get firmer and tighter. Monitor legal developments. Put in place a Plan B and a rapid-response action plan to adopt Plan B as and when the time comes!
Averill Dickson is a senior lawyer at Simmonds Stewart, a boutique technology law firm providing corporate and commercial legal services focused on the New Zealand technology sector. With almost 20 years in the technology sector, Averill has extensive experience advising on the corporate and commercial aspects of technology businesses and transactions. Find out more at Simmonds Stewart, or follow Averill on Twitter @averilldickson. Simmonds Stewart has free online templates for tech companies, and a blog on IT legal issues.