Data scraping: “everybody else was doing it, so I thought it was ok”

By Angus McLean, Partner, Simmons & Simmons LLP

Published: 30 September 2015

I learnt to my cost as a schoolboy that while there can be considerable merit in taking a risk-based approach to compliance decisions, the “everybody else was doing it” defence tends not to hold much water if you are the unlucky one who gets caught. In no area of my practice have I been reminded of this salutary lesson more frequently in recent years than on the issue of data scraping.

A fast-growing trend

Call it what you will (data mining, web scraping or any of the other commonly used euphemisms), the practice of systematically extracting data from third-party websites without the permission of the website owner is on the rise in the hedge fund industry. This can be done manually or, as is more often the case, by specially developed computer programs. The same legal issues arise in both cases, although it is arguable that manual extraction is marginally less risky because it tends to be harder for a website owner to detect than software-enabled scraping.

The mere fact that data scraping is becoming so ubiquitous seems to be the main cause of the commonly held assumption that it carries no legal risk. However, as the 13 or so European flight price comparison websites that have been the target of Ryanair’s wrath over the last three to four years can vouch, my childhood excuse does not provide much insurance against costly litigation.

Is data scraping illegal?

As things currently stand, many acts of data scraping are potentially illegal under UK law. The exact nature of the illegal activity depends on a variety of factors. Unfortunately, therefore, every situation needs to be analysed on its own facts. However, the two most common claims that can be brought against data scrapers are (a) breach of contract and (b) IP infringement (specifically, database right infringement). Depending on the precise circumstances, it is possible that a data scraper could also infringe copyright or trade mark rights, breach data protection legislation and/or contravene the Computer Misuse Act 1990.

To establish a breach of contract claim, the owner of the website in question has to show that its terms and conditions of use (Ts&Cs) are enforceable and have been breached. The second requirement obviously depends on the wording of the Ts&Cs in question, although it is becoming increasingly common for website Ts&Cs to expressly prohibit data scraping (or equivalent activities). The other issue is whether the data scraper is legally bound by the Ts&Cs in question.

At present there is no clear English case law on this issue. However, it is reasonably safe to assume that any Ts&Cs that a user has had to “click to accept” will be binding. If the Ts&Cs are binding and rule out data scraping, then in the vast majority of cases the website owner will have a valid breach of contract claim.

Determining whether there is also a database right infringement claim is a highly fact-specific exercise. The analysis will depend on:

the type and volume of data that is being extracted;
the frequency with which the data is being extracted; and
the level of investment that was required to develop the database from which the data is being extracted.

If the database required a substantial investment to put together and data is being taken on a systematic basis, database right infringement may also be an issue.

What are the risks in practice?

To date, relatively few European website owners seem to have been sufficiently exercised about third parties extracting data from their sites to pursue full-blown litigation. That said, as the Ryanair cases show, past performance is no guarantee of future results. It is therefore important to understand the potential consequences of a data scraping complaint, so that any risk-based analysis can properly weigh those consequences against the benefits the scraping activities are expected to generate.

Depending on the type of claim that is available to the website owner in question, the key risks faced by a data scraper under UK law are likely to be:

injunctions (including pre-trial injunctions);
financial liability (in the form of damages or, in certain circumstances, an account of profits);
disclosure obligations; and
reputational damage.

Although the final two risks are not really formal legal remedies, in my experience they have just as much of a deterrent effect as the more traditional legal remedies (e.g. injunctions and damages or an account of profits). This is because the prospect of having to disclose the type of investment activities for which the data in question is being used is often seen as the most commercially damaging consequence of a data scraping dispute. Of course, as with the other risks identified above, it may be possible to avoid having to disclose information about the ends to which the data is being applied by settling a potential claim before it escalates into full-blown litigation. However, assuming that this will be possible in every case clearly involves a degree of risk in itself.

The calculation method that will be used to determine any financial liability a fund might incur also plays a big part in the risk analysis. The precise calculation method that applies will depend on the type of claims that are available to the website owner (in particular, whether it has a valid claim for database right infringement as well as breach of contract). If it is limited to a contractual claim, a website owner will generally be able to recover only the loss it has incurred. If it does not license out the data in question, its loss may well be negligible. Even so, in such circumstances the website owner might be able to claim damages based on a notional reasonable royalty set by the court by reference to the licence fees that are charged for similar datasets.

If a website owner also has a valid claim for database right infringement, it may elect to seek an account of the profits the fund has made from its infringing activities. Clearly, such an award could be substantial if the fund generates significant profits directly from the use of the data in question. However, the data often forms just one data point in a model that includes a variety of other factors. In that case, the fund’s liability should be limited to the proportion of its profits that is attributable to the use of that data alone.

This means that it may ultimately be difficult for a website owner to identify any significant profits that are directly attributable to the use of the data in question. Unfortunately, that will not necessarily prevent a sufficiently motivated website owner from trying.
