Privacy Policy
This page describes how GROBID Tools (the "site", reachable at grobid.org) handles information when someone visits the site or uses the in-browser PDF extractor. The goal is to explain things plainly, not to bury practices in legalese.
The short version
- The PDF you choose is processed entirely in your browser. The file, its text, and the metadata extracted from it are never transmitted to this site or any third party.
- Standard, anonymous web traffic data is collected via Google Analytics so the site can be maintained.
- Pages may display advertising provided by Google AdSense. AdSense and its partners may use cookies to personalise ads.
- You can opt out of personalised advertising and clear or block cookies in your browser at any time.
Information processed in your browser only
When you choose a PDF on the homepage, the file is loaded into your browser's memory using the standard FileReader and ArrayBuffer APIs. PDF.js parses the page content, and a set of regular-expression based heuristics extract titles, authors, identifiers, citations, and similar fields. Export files (JSON, plain text, Markdown, DOCX, BibTeX) are constructed in memory and offered as a browser download.
None of that data crosses the network, because there is no server-side processing endpoint to send it to. If you navigate away from the page or close the tab, the file and everything derived from it disappear with the tab's memory.
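To make the idea of regular-expression heuristics and in-memory exports concrete, here is a minimal sketch of what client-side extraction of this kind can look like. It is an illustration only, not the site's actual code: the field set, the patterns, and the function names (extractFields, toJsonExport) are hypothetical, and the input is assumed to be page text that PDF.js has already produced.

```typescript
// Hypothetical field set; the site's real heuristics extract more fields than this.
interface ExtractedFields {
  title: string | null;
  doi: string | null;
  year: string | null;
}

// Assumed patterns for illustration: a DOI starts with "10.", a registrant
// code of 4 to 9 digits, a slash, then a suffix without whitespace.
const DOI_PATTERN = /\b10\.\d{4,9}\/[^\s"<>]+/;
const YEAR_PATTERN = /\b(?:19|20)\d{2}\b/;

// Runs purely on a string held in memory; nothing here touches the network.
function extractFields(pageText: string): ExtractedFields {
  const lines = pageText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const doiMatch = pageText.match(DOI_PATTERN);
  const yearMatch = pageText.match(YEAR_PATTERN);
  return {
    // Naive assumption: the first non-empty line is the title.
    title: lines[0] ?? null,
    // Strip trailing punctuation that often follows a DOI in running text.
    doi: doiMatch ? doiMatch[0].replace(/[.,;)]+$/, "") : null,
    year: yearMatch ? yearMatch[0] : null,
  };
}

// Builds the export as a plain string in memory. In a browser, that string
// could then be wrapped in a Blob and offered as a download via an object URL.
function toJsonExport(fields: ExtractedFields): string {
  return JSON.stringify(fields, null, 2);
}
```

Because both functions take and return ordinary in-memory values, a pipeline built this way has no step at which file contents would need to be sent anywhere.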
Information collected by the site
The site collects the kind of basic data that nearly every web property collects:
- Server logs. The hosting provider keeps short-term records of HTTP requests, including IP address, user agent, referring URL, and timestamps. These are used to diagnose outages and to detect abuse.
- Web analytics. Google Analytics (measurement ID G-H1FWND0S6P) records page views, approximate location, device type, and similar aggregated metrics. Google processes IP addresses according to its own current data-handling policy.
- Advertising. Google AdSense may serve ads on this site. AdSense, Google's advertising partners, and other third parties may use cookies to serve ads based on prior visits to this and other sites.
Cookies and similar technologies
Cookies are small text files that a website (or a third party loaded by the website) stores in your browser. The categories used here are described in the cookies policy. In summary:
- Strictly necessary. Required for the site to load and function (for example, the service worker cache used to make the page work offline).
- Analytics. Used by Google Analytics to count unique visitors and measure broad usage patterns.
- Advertising. Used by Google AdSense and its partners to deliver and personalise ads, frequency-cap them, and measure their performance.
Google AdSense disclosure
Third-party vendors, including Google, use cookies to serve ads based on a user's prior visits to this website or other websites. Google's use of advertising cookies enables it and its partners to serve ads to you based on your visit to this site and/or other sites on the Internet.
You may opt out of personalised advertising by visiting Google Ads Settings. You can also opt out of a third-party vendor's use of cookies for personalised advertising by visiting www.aboutads.info or, in Europe, www.youronlinechoices.eu. Google's own advertising and privacy practices are described at policies.google.com/technologies/ads.
Third-party services in use
- Google Analytics — anonymous traffic measurement.
- Google AdSense — advertising.
- jsDelivr (cdn.jsdelivr.net) — delivery of the PDF.js, pdf-lib, and docx libraries that run inside your browser.
- The site's hosting provider — TLS termination and serving of static files.
Each of those vendors handles data under its own privacy policy.
Your rights under GDPR and similar laws
If you are in the European Economic Area, the United Kingdom, or another region with comparable privacy law, you generally have the right to:
- Ask what personal data, if any, the site holds about you.
- Ask for inaccurate data to be corrected.
- Ask for data to be deleted, subject to legal retention obligations on the hosting and analytics providers.
- Object to processing or restrict it.
- Withdraw consent for non-essential cookies through your browser settings.
- Lodge a complaint with your local data protection authority.
Because the PDF extractor runs entirely client-side, the site itself does not hold a database of file contents that could be the subject of a typical access request. Nothing on the server is tied to your file.
Your rights under CCPA
If you are a California resident, you have the right to know what categories of personal information are collected, the right to request deletion, and the right to opt out of the "sale" or "sharing" of personal information as those terms are defined under California law. The site does not sell personal information for money. The use of advertising cookies described above may, however, qualify as "sharing" under California's broader definition, and you may opt out using the AdSense controls linked above and your browser's privacy controls.
Children
The site is not directed at children under the age of 13 (or the equivalent minimum age in your jurisdiction). The site does not knowingly collect personal information from children. If you believe a child has provided personal information here, please contact us so the relevant data can be removed.
Data retention
Server logs are retained for a short rolling window for diagnostic purposes. Analytics data is retained according to the Google Analytics retention setting in effect for this property. The site does not retain anything related to the contents of files processed by the extractor, because nothing related to those files reaches the server.
International transfers
The hosting and analytics providers used here operate globally. Data may be processed in countries other than the one you are in, including in the United States. Where transfers are subject to GDPR or comparable rules, the providers rely on the legal mechanisms (such as Standard Contractual Clauses) that they describe in their own policies.
Changes to this policy
This policy may be updated from time to time, for example when a new vendor is added or when an existing vendor changes its practices. The "Last reviewed" date at the top of the page reflects the most recent substantive update. Material changes will be reflected in the date and, where appropriate, called out on the homepage.
Contact
Privacy questions and data-related requests can be sent to [email protected] with "Privacy" in the subject line. Please describe the request as specifically as possible so it can be handled correctly.