Legal Issues Papers

Use Case - Reuse of Emergency Calls embedded in TV Shows (official)

In this document, the ELRC legal helpdesk analyzes under which legal conditions audio, video and dialogue subtitles coming from emergency calls embedded in a German TV show can be re-used for developing AI models.

Several legal aspects specific to the German legislation including intellectual property and copyright protection are reviewed. The use and sharing of different types of data (audio, video and transcriptions of the dialogues), and their derivatives, for research and commercial purposes, are also tackled.

What’s new in the Directive on Copyright in the Digital Single Market

[This article was initially published in the ELRC+3 Newsletter on April 24, 2019]

1. Text and data mining exceptions (articles 3 and 4)

As per article 2(2), text and data mining (TDM) is defined as “any automated analytical technique aimed at analyzing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations”. The Directive provides for two exceptions for TDM: one mandatory (will be the same in all EU Member States) and one optional (Member States do not have to implement, or can implement a limited version thereof).

a) Mandatory TDM exception

Research organisations (only non-commercial or public, see article 2(1)) and cultural heritage institutions (libraries, museums and archives) can copy material that they have lawful access to, in order to carry out TDM for scientific research purposes. The copies made in the process can be stored (with an appropriate level of security) for research purposes (including verification of results).

Right-holders can apply technological protection measures to prevent TDM of their content, but only to ensure the security and integrity of their networks and databases.

b) Optional TDM exception

This optional exception has potentially unlimited beneficiaries, and TDM can potentially be performed for any purpose. Reproductions made in the process can potentially be retained for as long as necessary. No sharing is allowed.

This exception only applies to content for which right-holders have not expressly reserved the right to mine (“mineable by default”). In other words, it is enough for the right-holder to indicate that she/he does not allow for her/his content to be mined under this exception to be able to lawfully prevent such acts.

2. Extended collective licensing (article 12)

Member States may allow collective management organisations (e.g. SACEM, VG Wort) to grant licenses (limited to the territory of the Member State) for use of all the works in their sector of activity (especially when seeking permission from individual authors would be too costly and impractical). Such a mechanism already exists in Scandinavian countries, where research institutions use it with a lot of success to negotiate access to data.

3. New exclusive right concerning online uses of press publications (articles 15 & 16)

Publishers of press publications will have a new exclusive right to prevent online uses of their publications by “information society service providers” (such as news aggregators). The right does not affect private or non-commercial uses by individuals. It does not apply to mere hyperlinking, or to the use of individual words or “very short extracts”. It does not in any way affect copyright and other related rights.

This right expires two years after the publication. Scientific and academic periodicals are excluded (article 2(4)).

The impact of this new right on the Language Technology community will be very limited, but it is important to mention it here, as it can easily be misinterpreted as granting free access to newspaper articles two years after their publication.

4. New rules concerning liability of online content-sharing service providers (articles 17 et seq.)

Online content-sharing service providers (OCSSP) are those who for profit-making purposes provide services whose main or one of the main purposes is to store and give the public access to a large amount of content uploaded by their users (article 2(6)). Not-for-profit encyclopedias (Wikipedia), not-for-profit educational and scientific repositories (ArXiv), open source software developing and sharing platforms (GitHub), online marketplaces and some cloud services are excluded.

OCSSP need licenses from right-holders to provide their services (despite the fact that content is uploaded by users). Otherwise, they are liable for copyright infringement, unless they demonstrate that they made their best efforts to obtain a license, prevent access to infringing content and in any event acted promptly to disable access to the content upon receiving a notice from right-holders.

A license given to an OCSSP automatically grants users of the service rights to use the content for non-commercial purposes.

A license is not needed if the uploaded content was generated on the basis of the quotation exception, or the caricature/parody/pastiche exception (no research exception).

“Young” and small OCSSP (active for less than 3 years and with annual turnover of less than 10 million EUR) are subject to less strict obligations.

Right-holders are entitled to “appropriate and proportionate” remuneration for the use of their content by OCSSP. At least once a year, the OCSSP sends them information about the uses of their content, in particular regarding the modes of exploitation and the generated revenues. This information should also cover any uses made by sub-licensees.

The new Directive will enter into force on the twentieth day following its publication in the Official Journal (which will happen any day now). After that, the Member States will have 24 months to implement it in their national laws.

The General Data Protection Regulation (GDPR)

[This article was initially published in the ELRC+3 Newsletter on March 28, 2019]

The General Data Protection Regulation (GDPR) is an EU regulation (2016/679) of 27 April 2016. It became applicable on 25 May 2018 and replaced the Data Protection Directive of 1995 (95/46/EC).

Unlike a directive (which requires transposition into national law), a regulation applies directly and uniformly across all the EU Member States. The shift from a directive to a regulation in the domain of data protection is therefore a very significant step towards unification of national laws, and the establishment of a single European market. In practice, however, numerous articles of the GDPR require national transposition (including some of those relevant to scientific research), so the legal framework remains fragmented.

Contrary to popular belief, the GDPR is far from being revolutionary. Most of the definitions and principles that governed the processing of personal data under the 1995 Directive remain the same. However, the administrative fines are now significantly higher: up to 20 000 000 EUR or 4% of global annual turnover (whichever is higher). The efforts to comply with the GDPR have therefore intensified, and so have the audits carried out by Data Protection Authorities (in France, Google was recently fined 50 000 000 EUR only for some features of their Android operating system).
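The “whichever is higher” rule can be illustrated with a short calculation. The sketch below only reproduces the upper limit quoted above; the function name is hypothetical, and the amount of an actual fine is decided case by case by the competent Data Protection Authority.

```python
# Illustrative only: computes the upper limit of an administrative fine
# as described above (20 000 000 EUR or 4% of global annual turnover,
# whichever is higher). The function name is a hypothetical helper.

FIXED_CAP_EUR = 20_000_000
TURNOVER_PERCENT = 4


def max_gdpr_fine_eur(global_annual_turnover_eur: float) -> float:
    """Return the higher of the fixed cap and 4% of global annual turnover."""
    turnover_based_cap = global_annual_turnover_eur * TURNOVER_PERCENT / 100
    return max(FIXED_CAP_EUR, turnover_based_cap)


if __name__ == "__main__":
    # Turnover of 100 million EUR: 4% is 4 million, so the fixed cap applies.
    print(max_gdpr_fine_eur(100_000_000))
    # Turnover of 1 billion EUR: 4% is 40 million, which exceeds the fixed cap.
    print(max_gdpr_fine_eur(1_000_000_000))
```

For any organisation whose global annual turnover is below 500 000 000 EUR, the fixed amount of 20 000 000 EUR is therefore the relevant upper limit.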

“Personal data” is defined very broadly as “any information relating to an identified or identifiable natural person (data subject)”. The notion covers directly identifying information (name, address, personal e-mail, phone number), but also elements that in combination with others may identify the person that they relate to (a mother of five of Moroccan descent who lives in Paris, works as a nurse and has a collection of 1960’s Jaguars). The ‘public’ and the ‘private’ spheres of life are equally protected. However, information related to legal persons (e.g. companies) as well as the deceased is not concerned (although some Member States may have specific protection for personal data of the deceased).

The process of “breaking the link” between the information and the person it refers to is called anonymization. Anonymized data are no longer personal data and can be processed without restrictions. However, the standard for anonymization is high: it should be irreversible, and the person should be impossible to identify by anyone and ‘by any means reasonably likely to be used’. Anonymization is now a research discipline in its own right: some well-described anonymization techniques include noise addition, k-anonymity and t-closeness.
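To make one of these techniques more concrete, the sketch below checks a small table for k-anonymity: a dataset is k-anonymous with respect to a set of quasi-identifiers if every combination of their values is shared by at least k records. The records, field names and function names are invented for illustration. Passing such a check does not by itself show that the legal standard for anonymization described above is met.

```python
from collections import Counter

# Hypothetical records; "city" and "profession" play the role of
# quasi-identifiers, i.e. elements that may identify a person in combination.
RECORDS = [
    {"city": "Paris", "profession": "nurse", "answer": "yes"},
    {"city": "Paris", "profession": "nurse", "answer": "no"},
    {"city": "Lyon", "profession": "teacher", "answer": "yes"},
]


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    # An empty dataset has no groups; report 0 in that case.
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    return smallest_group_size(records, quasi_identifiers) >= k


if __name__ == "__main__":
    # The single teacher from Lyon forms a group of one, so the table
    # is not 2-anonymous on (city, profession).
    print(is_k_anonymous(RECORDS, ["city", "profession"], 2))
    # Using only "answer" as quasi-identifier, the smallest group is the
    # single "no" record, so the table is only 1-anonymous on that field.
    print(smallest_group_size(RECORDS, ["answer"]))
```

In practice, a table that fails the check is usually generalised (for example, replacing the city by the country) or has records suppressed until every group reaches the chosen size.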

“Processing” is also defined broadly as “any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means”. This includes collection, storage, consultation, transfer, but also deletion.

The person (natural or legal) that determines the purposes and means of processing is referred to as “data controller”. The person that merely processes data on behalf of the controller is called “data processor” (it is important to note that processors are not, contrary to popular belief, completely exempted from liability for processing).

In order to comply with the GDPR, processing has to respect the following principles:

  • lawfulness (see below), fairness and transparency;
  • purpose limitation (data can only be processed for a specified, explicit and legitimate purpose, and not further processed for an incompatible purpose);
  • data minimization (data processed have to be adequate, relevant and limited to what is necessary);
  • accuracy (data have to be accurate and when necessary kept up to date);
  • storage limitation (data cannot be stored for longer than necessary to achieve the purpose of processing);
  • integrity and confidentiality (data have to be stored in a secure environment and protected against unauthorized access or accidental destruction);
  • accountability (the data controller has to be able to demonstrate compliance).

In order to be lawful, processing has to be based on one of the grounds enumerated in article 6 of the GDPR. This is the case when, e.g.:

  • the data subject has given his informed consent to the processing (consent can be withdrawn at any time, but not retroactively); or
  • processing is necessary for the performance of a contract to which the data subject is party; or
  • there is a legal obligation to process the data; or
  • there is a legitimate interest in the processing which overrides the interests of the data subject in the protection of his data.

Apart from the abovementioned principles, data controllers may have to comply with other obligations, such as:

  • keeping a register of data processing operations;
  • implementing “data protection by design and by default”;
  • when necessary (i.e. when the processing may result in a high risk for the rights and freedoms of the data subject), carrying out a Data Protection Impact Assessment prior to the processing.

The data subject has certain rights regarding his data, e.g.:

  • information (some information such as the identity of the controller and the purpose of the processing has to be provided to the data subject by the data controller, even if the data were not collected directly from the data subject; on the Internet, this is typically done via a ‘privacy policy’);
  • access and rectification;
  • erasure (“right to be forgotten”);
  • right to data portability;
  • right not to be subject to automated decision-making.

This strict framework comes with various exceptions, including for research purposes. First of all, research is exempted from the purpose limitation principle, as it is always regarded as a ‘compatible purpose’. Furthermore, the storage limitation principle is tempered, and rights of data subjects may also be limited. All those benefits are available under one condition: appropriate safeguards for the rights and freedoms of data subjects have to be implemented. These may include pseudonymization, increased transparency, carrying out of a Data Protection Impact Assessment, etc. The details are left for national legislators to decide, so it is important to know the national provisions in this respect.

European Commission adopts new standard license for online content

The European Commission has adopted the Attribution 4.0 International (CC BY 4.0) standard license to make information they publish online reusable by the public.[1] But why do we need licenses and how do we use them correctly? What impact could the application of a standard license have for language data sharing? The ELRC consortium has asked Dr. iur. Pawel Kamocki to help us understand the underlying implications.

ELRC: Pawel, thank you for agreeing to this interview. Before we ask you about the CC BY 4.0 license itself, can you explain why we need a license when we want to reuse information that has been made openly available by the European Commission or other entities online?

Dr. iur. Pawel Kamocki: While raw information itself should be, and theoretically is, free, its expression, or the data in which it is embodied, can be protected by intellectual property rights. These rights are in fact similar to traditional property: if you own a physical object, you can prevent others from using it, and sue anyone who does so without your permission. Intellectual property is the same, but – and this is what makes it both complicated and fascinating – it’s about immaterial goods: things of value that exist independently from their material support.

So, as I said, in principle, with a few exceptions, the use of IPR-protected content requires permission from the rightholder. This permission is granted in a document called ‘a license’. Licentia means ‘permission’ in Latin.

ELRC: To whom can such a license be granted?

Dr. iur. Pawel Kamocki: Usually, such a license is granted to a specific person or entity. However, a license can also be granted to the general public, i.e. everybody who has access to the content. The latter type of licenses is called ‘public licenses’ (although I really like the German term Jedermann-Lizenz). They were first developed with software in mind (we have all heard of the GPL, General Public License). At the very beginning of this century, the Creative Commons Foundation developed a series of public licenses for creative works, called… Creative Commons and maybe more commonly known as CC licenses.

The latest version of these licenses, the 4.0 version, covers not only copyright, but also related rights, such as the sui generis database right, which makes them a great tool for licensing of digital datasets in the European Union.

ELRC: Could you explain (in one short sentence) what the sui generis database right entails?

Dr. iur. Pawel Kamocki: It’s an intellectual property right similar to copyright that was created by the Database Directive 1996 to protect investment in producing a database. I explain it in more detail in an article published in one of the recent ELRC newsletters (http://lr-coordination.eu/node/969).

ELRC: Coming back to CC licenses: what is their unique benefit?

Dr. iur. Pawel Kamocki: The idea behind Creative Commons licenses is simple: to grant everyone permission to use the work and thereby shift from the traditional ‘all rights reserved’ logic to ‘some rights reserved’. It is important to keep in mind that CC-licensed content is still under copyright, but a broad permission to use it (at least in a certain manner) is granted up front. The use of the work, however, is still subject to some conditions, the violation of which amounts to copyright infringement.

ELRC: Can you explain the main characteristics of the “Attribution 4.0 International” license (CC BY 4.0)?

Dr. iur. Pawel Kamocki: BY or ‘attribution’ is the fundamental condition of all CC licenses. It is commonly believed that all that it requires is to mention the source. However, the attribution requirement under CC BY 4.0 goes further than this.

In short, the CC BY 4.0 license allows everyone to re-use, share and modify the licensed content, provided that:

  • the creator of the work is identified;
  • any other person or entity designated by the rightholder to receive attribution is identified (e.g. the funder);
  • the copyright notice (if present) is retained;
  • the CC BY 4.0 license is referred to, preferably with a URL;
  • if practicable, a URL to the original work should be retained;
  • if the content was modified, it should be indicated, too.

So, a proper attribution notice should at least look like this:

This work was created by P. Kamocki and is available under a CC BY 4.0 license
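For those who publish many resources, the elements listed above can be assembled automatically. The following is a minimal sketch; the function name, its parameters and the exact wording of the optional sentences are assumptions made for illustration, not an official Creative Commons format.

```python
# Hypothetical helper that assembles an attribution notice from the
# elements listed above. The layout of the notice is an assumption.

CC_BY_4_0_URL = "https://creativecommons.org/licenses/by/4.0/"


def build_attribution(creator, source_url=None, also_credit=(),
                      copyright_notice=None, modified=False):
    """Return a one-line attribution notice for CC BY 4.0-licensed content."""
    sentences = []
    if copyright_notice:
        # Retain the copyright notice if the original carries one.
        sentences.append(copyright_notice.rstrip(".") + ".")
    credit = f"This work was created by {creator}"
    if also_credit:
        # Other parties designated by the rightholder, e.g. the funder.
        credit += " (with attribution to " + ", ".join(also_credit) + ")"
    credit += f" and is available under a CC BY 4.0 license ({CC_BY_4_0_URL})."
    sentences.append(credit)
    if source_url:
        # Keep a URL to the original work where practicable.
        sentences.append(f"Original: {source_url}")
    if modified:
        # Indicate that the content was modified.
        sentences.append("Changes were made to the original.")
    return " ".join(sentences)


if __name__ == "__main__":
    print(build_attribution("P. Kamocki"))
    print(build_attribution("P. Kamocki", modified=True))
```

Called with only the creator's name, the helper reproduces the minimal notice shown above, followed by the license URL.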

ELRC: That does not sound like a very permissive license after all. Is the CC BY 4.0 an open license?

Dr. iur. Pawel Kamocki: CC BY 4.0 is by all accounts an open license, as it meets the criteria set forth in the Open Definition. Actually, only two CC licenses, CC BY and CC BY-SA, are open licenses.

It should also be noted that no additional conditions or restrictions can be imposed on CC-licensed content. Therefore, anyone who shares CC-licensed content saying that it can only be used by institution x actually violates the license and infringes on the rightholder’s copyright. That said, modified versions of CC BY 4.0-licensed content can be shared under any conditions, including as ‘all rights reserved’. Only the SA (share-alike) requirement, for example in the CC BY-SA license, creates an obligation to share modified content under the same license.

ELRC: What are the main changes to the previous (not international) version of this license and what are potential weaknesses of the license?

Dr. iur. Pawel Kamocki: Most importantly, the previous versions did not cover the sui generis database right, which meant that they did not provide for an appropriate level of legal security in the EU. In fact, a bona fide user of a dataset licensed under a CC BY 3.0 license could theoretically still be sued for infringement of the sui generis database right.

Secondly, previous versions of CC licenses had many national versions, called “ported versions”. These versions were not only translated, but also adapted to local law. Oftentimes, some rather far-reaching choices were made in the adaptation process. And since in legal matters every word counts, we ended up with rather substantial differences between for example Dutch and Belgian versions of the same license.

Now, imagine you want to use a dataset licensed under a Dutch version of the license. You know it may not be identical to the German version, so you should probably make an effort to read it first. But what if you don’t speak Dutch…? As you see, the porting process had some adverse consequences, and this is why porting for CC 4.0 licenses is not authorised.

ELRC: The Commission has adopted the CC BY 4.0 license as a new standard license for the reuse of Commission documents. Is this concept easily transferrable to the reuse of Public Sector Information made available by national public administrations as well?

Dr. iur. Pawel Kamocki: There is no doubt that CC BY 4.0 is currently the best tool for those who want to make digital datasets openly available. So, the Commission made a wise choice.

However, this choice is not binding on the Member States, some of which have their own, long-standing traditions when it comes to making Public Sector Information available for re-use.

Some countries, such as Poland or Germany, dedicate a fair share of Public Sector Information to the Public Domain. Public Domain material is by definition not protected by intellectual property rights and therefore cannot be licensed at all; it can be freely used by anyone for any purpose.

Many other countries have their own licenses, more or less inspired by or compatible with CC BY. I can think of France, Norway, and above all the UK. In the UK, public sector information is protected by copyright belonging to the Crown (so-called Crown copyright). It is then made available under a license called Open Government License.

Ireland, for example, endorses the use of CC BY 4.0 for Public Sector Information (Circular 12/2016 of the Department of Public Expenditure and Reform), whereas France, probably for fear of ‘americanisation’, actually… prohibits the use of CC licenses (only Licence ouverte and Open Database License are allowed by art. D323-2-1 of the Code des relations entre le public et l’administration).

Undoubtedly, some more harmonisation at the EU level would be welcome, and the official endorsement of CC BY 4.0 by the Commission is a step in the right direction.

ELRC: In what way can making Public Sector Information available under the CC BY 4.0 license, for example in Open Data Portals, help to make language data sharing easier between national institutions but also across borders?

Dr. iur. Pawel Kamocki: As mentioned above, CC 4.0 licenses cover not only copyright, but also the sui generis database right, which makes them a perfect tool for sharing digital datasets such as language resources (LR).

Unlike their previous versions, CC 4.0 licenses will not have ‘ported’ versions. This indeed makes them perfect for international use.

The use of popular, internationally recognized tools can considerably reduce transaction costs related to the sharing of LR both at the national and the international level.

ELRC: You made this rather complex issue much clearer. Thank you very much Pawel!

 

Pawel Kamocki was trained in both law and corpus linguistics; he holds a Dr. iur. degree from the University of Münster, as well as a docteur en droit degree from Sorbonne Paris Cité. Pawel is working for ELDA as a Legal Issues Expert.

The interview was conducted by Lilli Smal (DFKI) who is part of the ELRC consortium.

[1] Cf. European Commission: https://ec.europa.eu/jrc/en/news/commission-makes-it-even-easier-citizens-reuse-all-information-it-publishes-online, last accessed: 12 June 2019.

Web crawling Report

This report was produced in the framework of the ELRC+ L2 Project “Tools and Resources for CEF Automated Translation”, under the SMART 2015/1091 service contract granted by the European Commission.

The purpose of this report is to analyze whether and under what conditions web crawling operations can be lawfully conducted.

It starts with a general overview of web crawling, which briefly presents the procedure and discusses possible scenarios for which crawled data can be used. Then, it proceeds to the legal analysis of the problem, which takes into account such legal frameworks as copyright, the sui generis database right, digital rights management, data protection, contract law and conflict of laws.

The analysis is focused on EU law (with the laws of Germany and France often quoted as examples), but some questions specific to the US law are also discussed. The Sanctions section discusses possible sanctions for unlawful web crawling.

The conclusion proposes a roadmap – a set of recommendations that should be taken into account before the start of any web crawling operation.