Microsoft Word (.docx) export
The DOCX export produces a Word document containing the extracted PDF's metadata, abstract, body text, and references. It is built locally in the browser using the docx library, then offered as a download. As with the other export formats, the file is created entirely on your machine.
Document structure
The exported file uses Word's default heading styles so it inherits the look of whatever template Word applies on opening:
- Heading 1 — the extracted title, centred.
- Centred italic line — the author list, when authors were detected.
- Heading 2 — "Document Metadata", followed by lines for DOI, year, page count, and any other identifiers.
- Heading 2 — "Abstract", with a single paragraph below, when an abstract was extracted.
- Heading 2 — "Full Text", with paragraph breaks reconstructed from the PDF's text layer.
- Heading 2 — "References", with one entry per parsed citation, prefixed with its number.
Because the document uses Word's standard heading styles, generating a table of contents from References → Table of Contents works without further configuration.
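The outline above can be sketched as a plain list of blocks. This is a minimal illustration, not the exporter's actual code: the ExtractedPdf shape, the field names, the block kinds, and the "1. " reference prefix are all assumptions made for the example, and the real export goes on to build Word paragraphs with the docx library.

```typescript
// Hypothetical shape of the extracted data; every field name is an assumption.
interface ExtractedPdf {
  title: string;
  authors: string[];
  doi?: string;
  year?: number;
  pageCount?: number;
  abstract?: string;
  fullText: string;
  references: string[];
}

// Hypothetical block kinds mirroring the styles listed above.
type BlockKind = "heading1" | "centredItalic" | "heading2" | "paragraph";

interface Block {
  kind: BlockKind;
  text: string;
}

function buildOutline(pdf: ExtractedPdf): Block[] {
  const blocks: Block[] = [{ kind: "heading1", text: pdf.title }];

  // The author line is emitted only when authors were detected.
  if (pdf.authors.length > 0) {
    blocks.push({ kind: "centredItalic", text: pdf.authors.join(", ") });
  }

  blocks.push({ kind: "heading2", text: "Document Metadata" });
  if (pdf.doi !== undefined) {
    blocks.push({ kind: "paragraph", text: `DOI: ${pdf.doi}` });
  }
  if (pdf.year !== undefined) {
    blocks.push({ kind: "paragraph", text: `Year: ${pdf.year}` });
  }
  if (pdf.pageCount !== undefined) {
    blocks.push({ kind: "paragraph", text: `Pages: ${pdf.pageCount}` });
  }

  // The abstract section is emitted only when an abstract was extracted.
  if (pdf.abstract !== undefined && pdf.abstract.trim() !== "") {
    blocks.push({ kind: "heading2", text: "Abstract" });
    blocks.push({ kind: "paragraph", text: pdf.abstract.trim() });
  }

  // Assumption: blank lines in the text layer mark paragraph breaks.
  blocks.push({ kind: "heading2", text: "Full Text" });
  for (const chunk of pdf.fullText.split(/\n\s*\n/)) {
    const text = chunk.trim();
    if (text !== "") {
      blocks.push({ kind: "paragraph", text });
    }
  }

  // One entry per parsed citation, prefixed with its number.
  blocks.push({ kind: "heading2", text: "References" });
  pdf.references.forEach((ref, i) => {
    blocks.push({ kind: "paragraph", text: `${i + 1}. ${ref}` });
  });

  return blocks;
}
```

Keeping the outline as plain data makes the heading order easy to check before anything is handed to a document library.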
What works well
- Reading and annotating. The result is a Word document like any other. Track Changes, Comments, and review tools all behave normally.
- Restyling. Attach a different template (Developer → Document Template on Windows, Tools → Templates and Add-ins on Mac) or pick a style set on the Design tab to give the document the look of a journal submission, an internal report, or a personal note style.
- Working with a reference manager. Open the file with a reference manager add-in for Word (Mendeley, EndNote, Zotero) or use Word's built-in citations tool to keep the document's bibliographic data in one place.
What does not transfer
- Equations. Mathematical formulas in the original PDF are usually rendered as images or as inline character runs that read poorly. The export does not attempt to reconstruct LaTeX or MathML.
- Tables. The cell-level structure of tables is lost. Table contents will appear as a sequence of paragraphs.
- Figures. Image content of the PDF is not extracted into the DOCX; only the text layer is.
- Two-column layout. The exported document is single-column. The original visual layout is not preserved.
Combining with Word's bibliography features
Word's built-in citations feature accepts source lists in its own XML format, the same format as the Sources.xml file Word itself writes. Many users prefer the Mendeley or Zotero plug-ins instead: both reference managers can import the BibTeX or RIS file produced alongside the DOCX export, and their plug-ins then insert formatted citations into the body of the document. A reasonable workflow is:
- Export both DOCX (for the body) and BibTeX or RIS (for the references) from the same PDF.
- Open the DOCX in Word, with your reference manager's plug-in active.
- Import the BibTeX or RIS file into the reference manager.
- Replace the inline citation markers in the body with proper citations from the manager, choosing the citation style you want.
- Regenerate the bibliography from the manager so it matches the inline citations exactly.
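Before replacing markers by hand, it can help to list which reference numbers the body actually cites. The sketch below assumes the inline markers are bracketed numbers such as [3], [1, 2], or [4-6]; that marker format is an assumption, and the function name citedNumbers is hypothetical. Any other bracketed number, such as a year written as [2020], would be picked up as well.

```typescript
// Collects the reference numbers cited in the body text.
// Assumption: markers look like [3], [1, 2] or [4-6].
function citedNumbers(body: string): number[] {
  const marker = /\[(\d+(?:\s*[-,]\s*\d+)*)\]/g;
  const cited = new Set<number>();
  let m: RegExpExecArray | null;

  while ((m = marker.exec(body)) !== null) {
    const inner = m[1] ?? "";
    for (const part of inner.split(",")) {
      // "4-6" expands to 4, 5, 6; a single number is a range of one.
      const nums = part.split("-").map((s) => parseInt(s.trim(), 10));
      const start = nums[0] ?? NaN;
      const end = nums[nums.length - 1] ?? start;
      for (let n = start; n <= end; n++) {
        cited.add(n);
      }
    }
  }

  return Array.from(cited).sort((a, b) => a - b);
}
```

Comparing the result against the numbered entries under "References" shows which entries are never cited in the body and which markers point at a missing entry.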
Sharing and privacy
The DOCX is created on your device and stored wherever your browser saves downloads. It does not contain any reference back to grobid.org and does not include hidden remote-loaded content. Sharing the file is no different from sharing any other Word document.
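If you want to check the absence of external links yourself, a .docx file is a ZIP archive, and links that point outside the package are declared in its relationship parts with TargetMode="External". The sketch below scans the text of one such part; the function name externalTargets is hypothetical, and the regular expressions assume double-quoted attributes rather than doing full XML parsing.

```typescript
// Returns the Target of every relationship marked TargetMode="External"
// in the text of one .rels part taken from a DOCX package.
// Assumption: attribute values are double-quoted.
function externalTargets(relsXml: string): string[] {
  const targets: string[] = [];
  // Matches <Relationship ...> elements but not the <Relationships> wrapper.
  const tag = /<Relationship\b[^>]*>/g;
  let m: RegExpExecArray | null;

  while ((m = tag.exec(relsXml)) !== null) {
    const element = m[0];
    if (!/\sTargetMode="External"/.test(element)) {
      continue;
    }
    const target = /\sTarget="([^"]*)"/.exec(element);
    if (target !== null && target[1] !== undefined) {
      targets.push(target[1]);
    }
  }

  return targets;
}
```

To use it, unzip the exported file and pass in the contents of word/_rels/document.xml.rels. An empty result means that part declares no external targets.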