Key takeaways
- Scraping is not prohibited as such, but it is subject to a strict legal framework, and the fact that data are publicly accessible is not sufficient to authorise their exploitation.
- Database law restricts automated extractions, permitting only targeted, non-substantial uses or uses carried out within very narrowly defined statutory frameworks.
- Where scraping involves data that make it possible to identify individuals, the GDPR applies and requires compliance with core principles such as lawfulness, transparency, data minimisation and proportionality.
- The terms and conditions of online platforms may expressly prohibit scraping, and any breach of those terms is capable of giving rise to user liability, even where the data concerned are public.
- Non-compliant scraping practices expose operators to significant risks, ranging from contractual and administrative sanctions to criminal penalties in the most serious cases.
“Web scraping” (or “scraping”, rendered in French as moissonnage [1]) refers to a technique for the automated collection of data available on the Internet, whether freely accessible or protected by technical barriers, for the purposes of reuse or analysis. This method relies on indexing bots (crawlers) or automated scripts and covers a wide range of practices: the raw extraction of visible content (such as data from public LinkedIn profiles), the processing or enrichment of such data to populate databases, the training of artificial intelligence (AI) algorithms, the provision of comparison services, and the automation of access to large volumes of information.
However, these uses raise complex legal issues. The mere fact that data are publicly accessible online is not sufficient to render them freely exploitable. Depending on the nature of the information collected and the technical means used for its extraction, scraping practices may infringe several legally protected rights. These infringements are all the more concerning given the growing prevalence of scraping, which increases the risks to privacy, freedom of expression and, more broadly, fundamental rights.
Accordingly, the lawfulness of such practices depends on a combination of legal norms drawn from different regimes, including intellectual property law (in particular the rights of database producers), the right to respect for private life and the protection of personal data, as well as the rules governing the use of data for the development or training of artificial intelligence systems. This statutory framework is complemented by a contractual framework that is equally decisive. By way of illustration, LinkedIn has updated its terms and conditions of use [2], strengthening the prohibition on scraping. This raises the question of the obligations applicable to the exploitation and extraction of data.
The legal framework governing scraping
While scraping is not prohibited as such in France or in most European Union (EU) Member States, a number of legal frameworks regulate and restrict this practice. These include, in particular, the following regimes.
Database law
The sui generis right of the database producer regulates the extraction and re-utilisation of the contents of a database. Where a database has been made available to the public by the right holder, Article L.342-3 of the French Intellectual Property Code (Code de la propriété intellectuelle, CPI) [3] sets out statutory exceptions. Subject to conditions, these exceptions allow extractions to be carried out, including by automated means such as scraping, without infringing the producer’s rights. Accordingly, the database producer may not prohibit, in particular:
- The extraction or re-utilisation of a non-substantial part of the database by a person who has “lawful access” to it. [4] The concept of a “non-substantial part” is assessed both quantitatively (the volume of data extracted) and qualitatively (the value or strategic importance of the information extracted).
As a result, limited, targeted, non-mass, occasional, non-repetitive and proportionate scraping, carried out for example by an authenticated user or by a person benefiting from authorised access through a subscription to the service (lawful access), may in principle fall within the scope of this exception.
- The extraction and re-utilisation of a substantial part of the contents of a database (substantiality again being assessed both qualitatively and quantitatively) [5]. Such acts may be permitted solely for the purposes of illustration in the context of research, including where the database is designed for educational purposes or originates from a digital edition of written works, provided that cumulative conditions are met: the intended audience must consist mainly of researchers directly concerned; the source must be clearly indicated; the extraction and re-utilisation must not give rise to any recreational, entertainment or commercial exploitation; and a lump-sum remuneration must be negotiated, in particular where the database derives from a digital edition of written works.
- Extractions or copies carried out in the context of text and data mining (TDM) [6]. These are permitted under certain conditions, in accordance with Article L.122-5-3 of the CPI [7]. First, they are permitted where extractions, copies or reproductions of databases to which lawful access has been obtained are carried out solely for the purposes of scientific research, without the authorisation of the database producer, by or on behalf of one of the following: a research organisation, a publicly accessible library, a museum, an archival service, or an institution responsible for cinematographic, audiovisual or sound heritage. The data may be retained securely, in particular to allow verification of scientific results. This exception does not apply, however, where an undertaking associated with or holding shares in the entity carrying out the mining benefits from privileged access to the results.[8] Secondly, any person may carry out TDM on databases to which lawful access has been obtained, regardless of the purpose pursued, unless the right holder has expressly objected in an appropriate manner, in particular by means of machine-readable mechanisms for content made available to the public online. Digital copies or reproductions made for this purpose must be stored securely for the duration of the operation and destroyed once the mining has been completed.[9]
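The CPI does not tie the "machine-readable mechanisms" mentioned above to any single format. The sketch below assumes the publisher uses the TDM Reservation Protocol (TDMRep, a W3C Community Group specification). Under that assumption, it checks for a `tdm-reservation: 1` HTTP header or the equivalent HTML `<meta>` tag before a page is mined. The function and class names are hypothetical. The sketch ignores other possible signals, such as robots.txt directives, terms of use and the `/.well-known/tdmrep.json` file.

```python
from __future__ import annotations

from html.parser import HTMLParser
from typing import Mapping


class _TdmMetaParser(HTMLParser):
    """Collects the value of a <meta name="tdm-reservation"> tag, if present."""

    def __init__(self) -> None:
        super().__init__()
        self.reservation: str | None = None

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "meta":
            return
        # Attributes without a value are reported as None; treat them as empty.
        attributes = {name.lower(): (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "tdm-reservation":
            self.reservation = attributes.get("content", "").strip()


def tdm_reserved(headers: Mapping[str, str], html: str = "") -> bool:
    """Return True if the page signals a TDM opt-out (TDMRep convention assumed).

    In this sketch the HTTP header is checked first and, if present, decides;
    otherwise the HTML meta tag is used. A value of "1" means rights are reserved.
    """
    # HTTP header names are case-insensitive, so normalise them before lookup.
    normalised = {key.lower(): value.strip() for key, value in headers.items()}
    if "tdm-reservation" in normalised:
        return normalised["tdm-reservation"] == "1"
    parser = _TdmMetaParser()
    parser.feed(html)
    return parser.reservation == "1"
```

Under this assumption, a `True` result would mean the second, general-purpose TDM exception cannot be relied on for that page. The result has no bearing on the research exception, which does not depend on an opt-out.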
In addition, any contractual clause purporting to prohibit the exceptions relating to the extraction of a non-substantial part of a database or to TDM is deemed null and void [10].
Finally, these exceptions must not undermine the normal exploitation of the database or cause unjustified prejudice to the legitimate interests of its producer [11]. In other words, even where use falls within a regulated exception (for example, for research or educational purposes, or under TDM), the use of the extracted data must remain proportionate, limited in scope, and must not compete directly with the producer’s commercial or institutional uses.
Outside these strictly defined cases, a scraping operation may be characterised as unlawful extraction.
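The CPI sets no numerical threshold for a "non-substantial" part, and the qualitative dimension cannot be reduced to a count. Article L.342-2 of the CPI also allows the producer to prohibit repeated and systematic extraction of non-substantial parts where this manifestly exceeds normal use. As an internal safeguard, and purely as an illustration, a collector could therefore track extractions cumulatively per source and refuse further requests once a cap is reached. The class name, the 1 % ratio and the record counts are hypothetical policy values, not legal thresholds. Staying under them does not by itself make an extraction lawful.

```python
from dataclasses import dataclass


@dataclass
class ExtractionBudget:
    """Hypothetical internal cap on the cumulative share of one database extracted."""

    database_size: int        # estimated number of records in the source database
    max_ratio: float = 0.01   # illustrative policy value, NOT a legal threshold
    extracted: int = 0        # records extracted so far, counted across all runs

    def allow(self, requested: int) -> bool:
        """Record the extraction and return True only if it stays within the cap."""
        if requested < 0:
            raise ValueError("requested must be non-negative")
        limit = int(self.database_size * self.max_ratio)
        if self.extracted + requested > limit:
            return False  # refuse rather than extract beyond the internal cap
        self.extracted += requested
        return True
```

For example, with `ExtractionBudget(database_size=100_000)` the cap is 1,000 records. A first request for 600 is accepted, and a further request for 500 is refused because the cumulative total would reach 1,100.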
Personal data protection law
Where data collected through scraping make it possible to identify, directly or indirectly, a natural person, the applicable legal regime is that of personal data protection, governed primarily by the General Data Protection Regulation (Regulation (EU) 2016/679 – GDPR [12]) and by French Law No. 78-17 of 6 January 1978, known as the Loi Informatique et Libertés, as amended. In this context, any processing of personal data must, in particular, comply with the principles set out in Article 5 of the GDPR: lawfulness, fairness and transparency; purpose limitation (data must be collected for specified, explicit and legitimate purposes); data minimisation (only data that are strictly necessary must be collected); accuracy; storage limitation; and the integrity and confidentiality of the data.
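Data minimisation can be built into the collection pipeline itself, so that unnecessary fields are never stored. The sketch below assumes a hypothetical allow-list for a market-analysis purpose: any field not on the list is dropped from each scraped record before storage. The field names and the function name are illustrative only.

```python
from __future__ import annotations

from typing import Any, Mapping

# Hypothetical allow-list: each field kept should be justified by the documented purpose.
ALLOWED_FIELDS = frozenset({"job_title", "industry", "country"})


def minimise(record: Mapping[str, Any], allowed: frozenset[str] = ALLOWED_FIELDS) -> dict[str, Any]:
    """Return a copy of a scraped record containing only allow-listed fields."""
    return {key: value for key, value in record.items() if key in allowed}
```

Filtering at ingestion, rather than after storage, means that names and contact details not needed for the purpose never enter the database.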
As regards the lawfulness of processing [13], in the context of scraping the legal basis relied upon will often be legitimate interest. As regularly emphasised by the French data protection authority, the Commission nationale de l’informatique et des libertés (CNIL) [14], reliance on legitimate interest is subject to the fulfilment of three cumulative conditions.
First, the interest pursued must be legitimate, meaning that it must be lawful in light of all applicable rules, clearly and precisely defined, real and current. It cannot therefore consist of a hypothetical objective or one that conflicts with existing legislation, such as copyright law, nor of processing that bears no connection with the purposes or activities of the data controller. In addition, that interest must be brought to the attention of the data subjects, in accordance with the transparency obligations laid down by the GDPR.
Secondly, the envisaged processing must be necessary for the pursuit of the legitimate interest relied upon. The data controller must therefore demonstrate that no alternative means, less intrusive of the rights and freedoms of the data subjects, would allow the objective to be achieved. This requirement must be assessed in conjunction with the principle of data minimisation [15]. In practice, this involves verifying that the use of personal data is indispensable, and that processing them in a form allowing direct or indirect identification is justified by the purpose pursued. This assessment must also take account of existing technological alternatives, in particular developments enabling systems to be built on smaller volumes of data or on data that are less identifying. In this respect, data controllers are encouraged to favour privacy-respecting solutions from the design stage, in accordance with the principle of “privacy by design”.
Thirdly, the processing must not result in a disproportionate interference with the interests or fundamental rights and freedoms of the data subjects. To this end, the data controller is required to carry out a rigorous balancing exercise between the interest pursued and the concrete impact of the processing on individuals. This assessment requires not only the identification of the expected benefits, but also an evaluation of the risks incurred by the data subjects, taking into account, in particular, the specific context of the processing, the nature of the data processed, their sensitivity, and the category of persons concerned (vulnerable individuals, minors, etc.).
In this regard, the CNIL stresses that respect for the reasonable expectations of the data subjects is a central criterion when assessing the lawfulness of processing based on legitimate interest. While individuals may be aware that certain data published online may be consulted and/or reused, they cannot reasonably expect such processing to take place in all circumstances and for any purpose whatsoever.
Assessing those expectations therefore requires consideration of a range of factors, including whether the data are publicly accessible, the nature of the source websites – whether social networks, forums or specialist platforms – and any contractual or technical restrictions imposed by those sites, such as terms and conditions of use or exclusion protocols such as robots.txt.
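A site's robots.txt file is one concrete, machine-checkable input into this assessment, though it is not in itself a measure of data subjects' expectations. Python's standard library provides `urllib.robotparser.RobotFileParser` for this check. So that the sketch runs without network access, it parses an example file held in memory. The user-agent name `ExampleResearchBot` and the URLs are hypothetical. Against a live site, `set_url()` and `read()` would fetch the real file instead.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (hypothetical). In practice, use set_url() and read()
# to fetch the live file from https://<site>/robots.txt.
ROBOTS_TXT = """\
User-agent: ExampleResearchBot
Disallow: /profiles/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("ExampleResearchBot", "https://example.com/profiles/jane"))  # False
print(parser.can_fetch("ExampleResearchBot", "https://example.com/articles/1"))     # True
print(parser.can_fetch("OtherBot", "https://example.com/articles/1"))               # False
```

A `True` answer only means the file does not exclude the URL for that user agent. The terms and conditions of use, database law and the GDPR still apply.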
Moreover, where the balancing exercise reveals an imbalance to the detriment of the rights of individuals, additional safeguards must be implemented in order to mitigate the effects and comply with the principle of proportionality. It is for the data controller to assess, on a case-by-case basis, the relevance and necessity of such measures, having regard to the specific features of the envisaged processing. In this respect, it is recommended to exclude by default the collection of data from particularly sensitive websites, such as those containing pornographic content, health forums or genealogical platforms, where the nature or volume of the information hosted could lead to the collection of particularly intrusive personal data. Similarly, data should not be collected from sites that explicitly object to the scraping of their content or to its reuse for the purpose of building databases intended for the training of AI systems.
In the same vein, data collection should be limited to content that is freely accessible, namely content that can be consulted without registration or account creation and in respect of which individuals are clearly aware of its public nature. This excludes, in particular, content published as part of private use on social networks or on websites whose public dimension is not explicit, such as certain petition platforms.
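The default exclusions recommended in the two paragraphs above can be expressed as a filter applied before any page of a source is fetched. The sketch assumes that each candidate source has already been classified by hand. The category labels, the `Source` fields and the function name are hypothetical. A source is skipped if it falls into a sensitive category, if its content requires an account to view, or if it has objected to scraping or to reuse for AI training.

```python
from dataclasses import dataclass

# Hypothetical labels for the categories suggested for exclusion by default.
SENSITIVE_CATEGORIES = frozenset({"adult_content", "health_forum", "genealogy"})


@dataclass(frozen=True)
class Source:
    domain: str
    category: str
    requires_login: bool        # content only visible after registration or account creation
    objects_to_scraping: bool   # explicit objection (terms of use, opt-out signal, etc.)


def is_collectable(source: Source) -> bool:
    """Apply the default exclusions before any page of the source is fetched."""
    if source.category in SENSITIVE_CATEGORIES:
        return False
    if source.requires_login or source.objects_to_scraping:
        return False
    return True
```

Keeping the classification in a reviewed list, rather than inferring it automatically, also produces a record of which sites were excluded and why.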
In addition, the data controller should endeavour to disseminate, as widely as possible, clear and accessible information regarding the data collection and the rights of the data subjects, using a variety of communication channels, whether through the publication of articles on its own platforms, via social media, or by making available an up-to-date list of the websites concerned by scraping practices. In some cases, information relayed directly by the publishers of the source websites may also constitute good practice.
Provision should also be made for a prior and discretionary right to object, enabling individuals to refuse the use of their data even before collection takes place. In this respect, the CNIL encourages the use of technical solutions facilitating the exercise of this right, such as opt-out mechanisms or “block lists” [16], where appropriate. Furthermore, the processing of collected data should be accompanied, as soon as possible, by anonymisation procedures or, failing that, by pseudonymisation. To prevent abusive cross-referencing, it is also recommended to replace direct identifiers with random pseudonyms specific to each item of content (for example, to each post on a publicly accessible forum). The exception is where the data controller can demonstrate that grouping several data points relating to the same individual is necessary for the development of the AI system or model.
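The block list and the per-item pseudonyms described above can be combined in a single ingestion step. The sketch assumes a hypothetical opt-out list keyed on authors' profile URLs. The list is stored as keyed hashes (HMAC-SHA-256), so it does not hold the identifiers in clear. Items from individuals who have opted out are dropped before storage. Each remaining item's author identifier is then replaced by a fresh random token, so that two posts by the same person cannot be linked. The secret key, the field names and the function names are illustrative. Pseudonymised data remain personal data under the GDPR.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets
from typing import Iterable, Iterator

# Hypothetical secret key; in practice it would be stored separately from the data.
KEY = b"replace-with-a-secret-key"


def hash_identifier(identifier: str, key: bytes = KEY) -> str:
    """Keyed hash (HMAC-SHA-256) of a normalised identifier, used for the opt-out list."""
    normalised = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


def ingest(items: Iterable[dict], opt_out_hashes: set[str]) -> Iterator[dict]:
    """Drop opted-out authors, then give each remaining item its own random pseudonym."""
    for item in items:
        if hash_identifier(item["author_url"]) in opt_out_hashes:
            continue  # objection honoured before the item is stored
        cleaned = {key: value for key, value in item.items() if key != "author_url"}
        # A new random token per item: items by the same author are not linkable.
        cleaned["pseudonym"] = secrets.token_hex(8)
        yield cleaned
```

Grouping data points by person may be shown to be necessary for the AI system, as the exception above allows. In that case, a stable per-author keyed pseudonym would replace the per-item token, at the cost of making items linkable.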
Criminal law
Scraping, particularly where it is carried out by deliberately circumventing technical protection measures, may constitute one of the offences against automated data processing systems (systèmes de traitement automatisé de données, STAD) [17]. These offences include, in particular:
- fraudulently accessing or remaining in such a system;
- fraudulently extracting, reproducing, deleting or modifying the data it contains;
- obstructing or distorting its operation.
In addition, infringement of the rights of a database producer, in particular where acts of extraction or re-utilisation fail to comply with the conditions laid down in Article L.342-3 of the French Intellectual Property Code, may also constitute a specific criminal offence, punishable under Article L.343-4 of the same code.
A contractual framework for scraping: an illustration using LinkedIn’s terms and conditions of use
Article 8.2.2 of LinkedIn’s terms and conditions of use expressly prohibits scraping, stating that: “You agree not to develop, support or use software, devices, scripts, robots or any other means or processes (such as indexing robots, browser extensions and add-ons, or any other technology) intended to carry out web scraping or to copy the Services, including profiles and other data from the Services.”
This wording reflects the platform’s clear intention to prohibit any form of automated extraction, whether relating to profiles, content, technical feeds or elements of the platform’s infrastructure. This general prohibition is also closely linked to other contractual provisions. Article 8.2.3 thus prohibits users from “overriding any security feature or bypassing or circumventing any access controls or use limits of the Services (such as search results, profiles or videos)”. It should be borne in mind that scraping, as an automated method of large-scale data extraction, frequently entails circumventing technical measures implemented by LinkedIn, such as CAPTCHAs, exclusion files such as robots.txt, or IP-address-based blocking mechanisms. The prohibition set out in this clause therefore forms part of a broader approach designed to prevent any attempt to neutralise the technical barriers that protect the integrity of the service.
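On the technical side, the same logic suggests that a collector should treat blocking signals as an instruction to stop rather than as an obstacle to work around. The sketch assumes that HTTP status codes 401, 403 and 429 (Too Many Requests), and a served CAPTCHA page, are the relevant signals. The keyword test for CAPTCHAs is a crude, hypothetical heuristic, and the names are illustrative. When a signal is detected, the crawler halts and records the reason instead of retrying from another IP address or solving the challenge.

```python
from dataclasses import dataclass

# Statuses treated as "access refused": 401/403 (not authorised) and 429 (rate limited).
BLOCKING_STATUSES = frozenset({401, 403, 429})


@dataclass
class CrawlDecision:
    proceed: bool
    reason: str


def check_response(status: int, body: str) -> CrawlDecision:
    """Decide whether to continue after a response; never attempt a workaround."""
    if status in BLOCKING_STATUSES:
        return CrawlDecision(False, f"HTTP {status}: access refused, stop crawling")
    if "captcha" in body.lower():  # crude heuristic, for illustration only
        return CrawlDecision(False, "CAPTCHA served: stop, do not solve or bypass")
    return CrawlDecision(True, "ok")
```

Recording the reason for each stop also feeds the documentation of operations recommended later in this article.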
Article 8.3.4 further expressly prohibits “copying, using, displaying or distributing information (including content) obtained from the Services, whether directly or via third parties […] without the consent of the content owner.” On that basis, any scraping operation carried out without prior authorisation is capable of amounting to a breach of this clause.
In addition, LinkedIn’s prohibition on scraping was brought into focus in the hiQ Labs v LinkedIn proceedings before the US courts from 2017 onwards. The case concerned the automated exploitation, by hiQ Labs, of public LinkedIn profiles for predictive profiling on behalf of employers. The initial decisions temporarily allowed hiQ Labs to continue its activities. The proceedings ultimately concluded in 2022 [18] with a settlement in LinkedIn’s favour, requiring hiQ Labs to cease all scraping activity on the platform.
What sanctions may be incurred?
Whether contractual, administrative or criminal, the risks associated with scraping practices are significant and should prompt enhanced vigilance as to the extraction and use of data from websites or platforms.
Contractual sanctions and contractual liability
To return to the LinkedIn example, whereas the previous version of the terms and conditions, which entered into force in 2022, limited sanctions to measures such as restricting, suspending or closing a user account, the version currently in force significantly expands the platform’s contractual toolkit. LinkedIn now expressly reserves the right to limit, suspend or permanently block access to its services, and to delete any content or any data shared by a user in the event of a breach of the terms and conditions (including in relation to scraping practices).[19] This development strengthens LinkedIn’s ability to prevent and sanction prohibited conduct, while protecting the integrity of its digital ecosystem.
In addition, the contractual liability of a “scraper” [20] or platform user may be incurred under the ordinary rules of contractual liability.[21] This enables the platform to recover damages in compensation for losses caused by the breach of the terms and conditions. In this respect, the Paris Judicial Court [22] has pointed out that a website’s terms and conditions cannot produce effects vis-à-vis a third party who has neither accepted them nor otherwise contractually subscribed to them. Only users who have actually adhered to the terms and conditions may have their contractual liability engaged in the event of unauthorised scraping, which rules out any action on this basis against an unconnected third party.
Non-contractual liability
Even in the absence of a contractual relationship with a platform, scraping practices may give rise to non-contractual liability. This may arise, in particular, from an infringement of Article L.342-3 of the CPI in the event of unlawful extraction from databases, as confirmed by the decision referred to above. [23] In that case, however, the claimant was unsuccessful, having failed to demonstrate that the extractions at issue undermined the recovery of its investments (a condition set by the Court of Justice of the European Union (CJEU) in the CV Online judgment). [24]
Other legal bases may also be relied upon in support of claims against unlawful scraping practices, including economic parasitism, where an operator improperly takes advantage of a platform’s investments, technical resources or reputation to develop a competing service based on the extraction of that platform’s data. Such conduct may also amount to unfair competition, where it results in the creation of a similar service offering by exploiting a competitor’s data to replicate its functionalities, divert its customers or weaken its position in the market. [25]
In this respect, in a judgment of 30 September 2024, the Paris Commercial Court [26] held that a company specialising in recruitment software was not required to obtain the explicit consent of the individuals concerned. The company had collected and used large volumes of public data from LinkedIn profiles for sourcing purposes, and those data had been made public by the users themselves. Relying on Article 5 of the GDPR, the court considered that publishing a profile on LinkedIn evidences an intention to make it visible to potential employers, and therefore implies acceptance that the information may be collected or processed. [27] The court nevertheless found that the conduct amounted to unfair competition, because the defendant company had infringed LinkedIn’s terms and conditions by using unauthorised scraping techniques. It was therefore ordered to pay damages to the platform.
Administrative sanctions
Non-compliance with the GDPR exposes scrapers, in particular, to administrative fines of up to EUR 20 million or 4% of worldwide annual turnover, whichever is higher. [28]

By way of illustration, in a decision of 5 December 2024, [29] the CNIL’s restricted committee imposed an administrative fine of EUR 240,000 on KASPR for several infringements of the GDPR. These included, in particular, breaches of the obligation to have a lawful basis [30] and of the transparency and information obligations towards individuals. [31] The case concerned “aspiration de données” (literally, data “vacuuming”), to use the term adopted by the restricted committee.

The company offered a browser extension allowing the automatic collection of professional contact details (in particular telephone numbers and email addresses) from, among other sources, LinkedIn profiles. The extension collected these details even where individuals had expressly limited their visibility to first- or second-degree connections, totalling around 160 million contacts. LinkedIn provides four privacy settings for contact details: visible only to the user; visible to everyone; visible to first-degree connections; or visible to first- and second-degree connections. The CNIL considered that circumventing these privacy settings went beyond users’ reasonable expectations as to the protection of their personal data. The intrusion into private life being disproportionate, the collection carried out by the company was held to be unlawful. It could be justified neither by the individuals’ consent, nor by legitimate interest, nor by any other lawful basis (contract, etc.).
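The ceiling mentioned above is the higher of two limbs: a fixed amount, and a percentage of the undertaking's worldwide annual turnover for the preceding financial year. The short illustration below uses hypothetical turnover figures and rounds down to whole euros. It shows that the fixed EUR 20 million prevails below EUR 500 million of turnover, and the 4% limb above it. This is only the statutory maximum. Actual amounts are set case by case under the criteria of Article 83(2) GDPR, as the KASPR fine shows.

```python
def gdpr_max_fine(worldwide_turnover_eur: int) -> int:
    """Upper limit under Article 83(5) GDPR: EUR 20 million or 4% of turnover, whichever is higher."""
    fixed_cap = 20_000_000
    turnover_cap = worldwide_turnover_eur * 4 // 100  # 4%, rounded down to whole euros
    return max(fixed_cap, turnover_cap)


print(gdpr_max_fine(10_000_000))     # 20000000 (fixed amount prevails)
print(gdpr_max_fine(500_000_000))    # 20000000 (both limbs are equal)
print(gdpr_max_fine(2_000_000_000))  # 80000000 (4% limb prevails)
```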
In another case, by a decision of 8 December 2020, [32] the restricted committee imposed a fine of EUR 20,000 on a company for collecting and using data accessible online, via scraping practices, to populate a prospecting database used for commercial canvassing. The restricted committee considered that this purpose required prior explicit consent from users, in accordance with Article L.34-5 of the French Post and Electronic Communications Code. [33] The company was also sanctioned for opaque practices, including the sending of emails to recipients who had been informed neither of the collection nor of the purposes for which their data were being used.
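In practice, the consequence drawn in that decision is that scraped email addresses cannot simply be loaded into a prospecting list. The sketch below is a pre-send gate that assumes, as in the decision described, that prior consent is required. It relies on a hypothetical consent register recording when and how consent was obtained for each address, keyed by lower-case address. Addresses without a recorded consent, or on a suppression list, are excluded. Other prospecting regimes are not modelled, and all names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ConsentRecord:
    email: str
    obtained_at: datetime
    channel: str  # where consent was given, e.g. a sign-up form (hypothetical field)


def sendable(emails: list[str], consents: dict[str, ConsentRecord], suppressed: set[str]) -> list[str]:
    """Keep only addresses with a recorded prior consent that are not suppressed.

    `consents` and `suppressed` are assumed to be keyed by lower-case addresses.
    """
    result = []
    for email in emails:
        key = email.strip().lower()
        if key in suppressed:
            continue  # the person has objected
        if key not in consents:
            continue  # e.g. an address obtained by scraping, with no consent on record
        result.append(email)
    return result
```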
Criminal sanctions
Articles 323-1 and following of the French Criminal Code set out significant penalties for offences relating to STAD. For natural persons, penalties may reach EUR 300,000 in fines and ten years’ imprisonment in the most serious cases. For legal persons, fines are multiplied by five, and may therefore reach EUR 1,500,000, pursuant to Article 131-38 of the Criminal Code.
The specific offence under Article L.343-4 of the CPI (relating to the infringement of database producers’ rights) is punishable by up to three years’ imprisonment and a fine of EUR 300,000 for natural persons (increased to seven years’ imprisonment and a fine of EUR 750,000 where the offence is committed by an organised group). For legal persons, pursuant to Article L.343-6 of the CPI, these sanctions are aggravated, with a fine that may be multiplied by five, i.e. up to EUR 1,500,000.
In addition, in all cases, ancillary penalties may be imposed.
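The rule for legal persons is a simple multiplier. Under Article 131-38 of the Criminal Code, the maximum fine is five times that laid down for natural persons. The snippet below applies this rule to the natural-person maxima quoted above. The organised-group figure for legal persons (EUR 3,750,000) is not stated in the text; it is derived here by the same rule, for illustration only.

```python
LEGAL_PERSON_MULTIPLIER = 5  # Criminal Code, Article 131-38

# Maximum fines for natural persons, as quoted in the text (EUR).
NATURAL_PERSON_MAX_FINES = {
    "STAD offences, most serious cases (Criminal Code)": 300_000,
    "Database producer rights (CPI, L.343-4)": 300_000,
    "Database producer rights, organised group (CPI, L.343-4)": 750_000,
}


def legal_person_max_fine(natural_person_max: int) -> int:
    """Maximum fine for a legal person: five times the natural-person maximum."""
    return natural_person_max * LEGAL_PERSON_MULTIPLIER


for offence, amount in NATURAL_PERSON_MAX_FINES.items():
    print(f"{offence}: EUR {legal_person_max_fine(amount):,}")
```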
Practical guidance and recommendations
In summary, before embarking on any project that may involve scraping, the following good practices should be adopted in order to ensure legal certainty.
| Steps | Key actions |
| --- | --- |
| Definition of objectives and needs | – Clearly identify the objectives of the scraping: specific purposes (monitoring, market analysis, recruitment, etc.). – Identify the specific requirements (type of data, frequency, volume of information to be collected). |
| Validation through a prior legal audit | – Carry out a thorough legal audit of the proposed scraping practices in order to assess the associated legal risks. |
| Review of terms and conditions and of legitimate data access methods (Application Programming Interface – API) | – Review the terms and conditions in particular to determine the limits imposed on the use of the platform’s data and to verify whether the proposed scraping is expressly prohibited or restricted. – Ensure that scraping activities comply with the platform’s technical limitations and do not expose the organisation to blocking measures or sanctions. – Check whether the required information is available through legitimate access methods, such as the platforms’ APIs [34], in order to avoid any breach of the terms and conditions, and prioritise those access channels. |
| Compliance with applicable regulations (GDPR, intellectual property, etc.) | In particular, ensure compliance with the following: – Lawful basis: justify the processing of data by reference to an appropriate lawful basis. – Purpose limitation: verify that the collection complies with a defined, legitimate and explicit purpose. – Data minimisation: collect only the data strictly necessary for the defined purpose. – Respect for the database producer’s sui generis right: avoid substantial or repeated extractions that infringe the rights of the database producer. |
| Securing collected data | – Implement security measures to protect the collected data against any unauthorised access. – Avoid prolonged data retention and provide for deletion or anonymisation mechanisms once the purpose has been achieved. |
| Documentation of operations | – Document all stages and decisions relating to the project (review of terms and conditions, data protection impact assessment, etc.). – Maintain an audit trail in order to demonstrate compliance in the event of a dispute with a platform such as LinkedIn or an inspection by a data protection authority. |
| Training of teams | – Raise awareness among staff involved in the project of the legal and technical issues associated with scraping. |
| Monitoring and adaptation | – Implement a monitoring mechanism to detect changes to platforms’ terms and conditions (such as LinkedIn’s) or to the applicable legal framework. – Provide for regular review of the action plan in order to maintain compliance. |
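Two rows of the table, limiting retention and keeping an audit trail, lend themselves to a simple technical illustration. The sketch assumes a hypothetical 180-day retention period and a JSON Lines log file; the function names are also illustrative. Records older than the period are purged, and each purge produces a log entry with its date and counts. That entry can be appended to the log and later shown to an auditor. In reality, the retention period must follow from the documented purpose.

```python
from __future__ import annotations

import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

RETENTION = timedelta(days=180)  # hypothetical period, to be derived from the purpose


def purge_expired(records: list[dict], now: datetime) -> tuple[list[dict], dict]:
    """Drop records past the retention period; return kept records and a log entry."""
    kept = [r for r in records if now - r["collected_at"] <= RETENTION]
    entry = {
        "timestamp": now.isoformat(),
        "action": "retention_purge",
        "deleted": len(records) - len(kept),
        "retained": len(kept),
    }
    return kept, entry


def append_to_audit_log(entry: dict, log_path: Path) -> None:
    """Append one JSON line per operation; earlier lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry, sort_keys=True) + "\n")


# Usage: purge, then record the operation.
now = datetime.now(timezone.utc)
kept, entry = purge_expired([{"id": 1, "collected_at": now - timedelta(days=10)}], now)
```

The log records counts only, not the deleted data, so the audit trail does not itself extend the retention of personal data.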
References
1 CNIL, The legal basis of legitimate interest: focus note on measures to be implemented where data are collected through harvesting (web scraping), 19 June 2025
https://www.cnil.fr/fr/recommandations-developpement-ia-interet-legitime
2 LinkedIn, Terms and Conditions of Use, version in force since 20 November 2024
https://fr.linkedin.com/legal/user-agreement
3 French Intellectual Property Code (CPI), Article L.342-3
https://www.legifrance.gouv.fr/codes/article_lc/LEGIARTI000044365654
4 CPI, Article L.342-3, 1°
5 CPI, Article L.342-3, 4° and 4° bis
6 Text and data mining (TDM) is defined as “the implementation of an automated analysis technique applied to texts and data in digital form in order to extract information, in particular patterns, trends and correlations” (CPI, Article L.122-5-3, I)
https://www.legifrance.gouv.fr/codes/article_lc/LEGIARTI000044363192/2025-06-27
7 CPI, Article L.342-3, 6°
8 CPI, Article L.122-5-3, I
9 CPI, Article L.122-5-3, II
10 CPI, Article L.342-3, paragraph 10
11 CPI, Article L.342-3, paragraph 11
12 Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data
13 GDPR, Article 6
14 CNIL, AI: relying on the legal basis of legitimate interest to develop an AI system, 19 June 2025
https://www.cnil.fr/fr/base-legale-interet-legitime-developpement-systeme
CNIL, The legal basis of legitimate interest: focus note on measures to be implemented where data are collected through harvesting (web scraping), 19 June 2025
https://www.cnil.fr/fr/focus-interet-legitime-collecte-par-moissonnage
15 GDPR, Article 5
16 CNIL, How to use a suppression list to comply with objections to direct marketing
https://www.cnil.fr/fr/comment-utiliser-une-liste-repoussoir-pour-respecter-lopposition-la-prospection-commerciale
17 French Criminal Code, Articles 323-1 to 323-8
18 hiQ Labs, Inc. v LinkedIn Corp., United States Court of Appeals for the Ninth Circuit, 18 April 2022, No. 17-16783
https://cdn.ca9.uscourts.gov/datastore/opinions/2022/04/18/17-16783.pdf
19 LinkedIn, Terms and Conditions of Use, Article 3.4
20 Person engaging in scraping activities
21 French Civil Code, Article 1231-1
22 Paris Judicial Court (Tribunal judiciaire de Paris), 3rd Chamber, 2nd Section, 21 February 2025, No. 21/09261
23 See note 22
24 Court of Justice of the European Union, 3 June 2021, CV-Online Latvia SIA v Melons SIA, Case C-762/19
https://curia.europa.eu
25 Parasitic behaviour and unfair competition are based on Articles 1240 and 1241 of the French Civil Code.
26 Paris Commercial Court (Tribunal de commerce de Paris), 15th Chamber, 30 September 2024
27 This interpretation does not, however, appear to be consistent with that of the CNIL; see the section “Personal data protection law”.
28 GDPR, Article 83
29 CNIL, Restricted Committee, Decision No. SAN-2024-023 of 5 December 2024
https://www.legifrance.gouv.fr/cnil/id/CNILTEXT000050791828
30 GDPR, Article 6
31 GDPR, Articles 12 and 14
32 CNIL, Decision SAN-2020-018 of 8 December 2020
https://www.legifrance.gouv.fr/cnil/id/CNILTEXT000042848036
33 French Post and Electronic Communications Code (CPCE), Article L.34-5, paragraph 1:
“Direct marketing by means of an automated electronic communications system within the meaning of 6° of Article L.32, a fax machine or electronic mail using the contact details of a natural person, subscriber or user, who has not given prior consent to receive direct marketing by such means, is prohibited.”
34 For example, in the case of LinkedIn: the official LinkedIn API
Disclaimer
The opinions, presentations, figures and estimates set forth on the website including in the blog are for informational purposes only and should not be construed as legal advice. For legal advice you should contact a legal professional in your jurisdiction.
The use of any content on this website, including in this blog, for any commercial purposes, including resale, is prohibited, unless permission is first obtained from Evidency. Request for permission should state the purpose and the extent of the reproduction. For non-commercial purposes, all material in this publication may be freely quoted or reprinted, but acknowledgement is required, together with a link to this website.



