By PDFKits Team — Published February 19, 2026
TL;DR: PDF privacy fails in invisible places: author names in document properties, recoverable text under cosmetic redactions, GPS coordinates in embedded photos, and Word revision history that survives export. Sanitizing means stripping metadata, flattening forms and annotations, and verifying that redactions remove text from the content stream rather than just cover it. Both GDPR and HIPAA treat leaked metadata as part of the record, so a hidden field can trigger a reportable incident. PDFKits sanitizes for free, entirely in your browser, with no upload.
A PDF is far more than its pages. Around the visible content streams sit a cross-reference table indexing every object, an XMP metadata stream recording authoring software and timestamps, a document-info dictionary (author, title, keywords) inherited from the source file, plus optional embedded files, JavaScript, form data, and annotations. Most of it is invisible during normal reading and trivially extractable with a free inspector or even a text editor.
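For a rough sense of how much non-page material a file carries, one can count well-known PDF name tokens in the raw bytes. The sketch below is an illustration under stated assumptions, not PDFKits' method: the function name `inventory_hidden_layers` and the marker list are hypothetical, and anything packed into compressed object streams will be missed, so a zero count is not proof of absence.

```python
import re

# Illustrative markers for the non-page structures described above.
# The list is an assumption for this sketch and is not exhaustive.
HIDDEN_LAYER_MARKERS = {
    "document-info": rb"/Info\b",
    "xmp-metadata": rb"<\?xpacket begin=",
    "embedded-files": rb"/EmbeddedFiles?\b",
    "javascript": rb"/(?:JavaScript|JS)\b",
    "form-data": rb"/AcroForm\b",
    "annotations": rb"/Annots\b",
}


def inventory_hidden_layers(pdf_bytes: bytes) -> dict:
    """Count each marker in the raw (uncompressed) part of the file."""
    return {
        label: len(re.findall(pattern, pdf_bytes))
        for label, pattern in HIDDEN_LAYER_MARKERS.items()
    }
```

A non-zero count only says the structure is referenced somewhere in the file; what it contains still has to be inspected with a proper PDF tool.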
Ranked by real-world exposure: document-info fields (author identity, original filename, sometimes the local file path); XMP metadata (software, edit timestamps, document ID chains, covered in depth in our PDF metadata editing guide); embedded image EXIF (phone photos keep GPS coordinates, so a "photo of a contract page" can leak a home address); recoverable redactions (black rectangles drawn as annotations leave the text selectable underneath); persistent form data (values stored in field entries or scripts even when fields display empty); and track-changes history that survives Word-to-PDF export, deletions and all. Cloud cleaners like Smallpdf can strip some of this, but only after you upload the sensitive file to their servers, which defeats the purpose.
Best for: lawyers, journalists, healthcare staff, compliance teams, and anyone whose documents would make news if their hidden layers were read aloud.
Beyond the classic failures, two quieter channels deserve a place on the checklist. First, named object layers: illustrations pasted from design tools can carry layer names visible in any PDF object inspector — one publicly traded company leaked an unreleased product codename this way, embedded in its own earnings PDF as "Q3 launch" layer metadata. Second, derived-document chains: XMP's DerivedFrom entries can map a published file back to internal drafts, revealing filenames and folder structures that were never meant to be public. Neither shows up in a visual review; both fall out automatically when you strip XMP and re-flatten before release. The deeper the document's editing history, the more of these breadcrumbs it accumulates — a one-pass export from a clean source is always safer than a file that has been through five tools and three contractors.
| Feature | PDFKits | Adobe Acrobat Pro | Smallpdf | iLovePDF |
|---|---|---|---|---|
| Cost | Free | $14.99/month | $9/month | $48/year Premium |
| Sensitive file leaves your device | Never | No (desktop) | Yes — cloud upload | Yes — cloud upload |
| Strip document-info + XMP | Yes | Yes (Sanitize) | Limited | Limited |
| Content-stream redaction | Yes | Yes | Limited | No |
| Flatten hidden form data | Yes | Yes | Partial | Yes |
| DPA / vendor review needed | No — no vendor | For cloud features | Yes | Yes |
For anyone bound by GDPR's data-minimization principle or HIPAA's minimum-necessary rule, browser-only processing simplifies the paperwork as much as the workflow: no third-party processor means no Data Processing Agreement, no sub-processor list, no breach-notification clause to negotiate.
Under GDPR Article 4(1), personal data is any information relating to an identifiable person — names, identifiers, location data. A document author's name in metadata and GPS coordinates inside an embedded photo both qualify, so they fall under the same lawful-basis, minimization, and breach rules as visible content. HIPAA's Privacy Rule draws the equivalent line for health information: a referral PDF whose metadata identifies the patient or clinic can constitute an impermissible disclosure even when the visible pages were properly authorized.
The operational rule is simple. If the visible content is meant to be public, strip everything else. If the content is restricted, control distribution and sanitize anyway — copies leak, and metadata travels with every copy.
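Trusting the black rectangle. The most common high-profile failure is an annotation overlay used in place of content removal. Always run the select-copy test: select across the redacted area, copy, paste into a text editor, and confirm nothing comes through.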
Encrypting instead of sanitizing. A password gates access, but every authorized recipient sees the metadata in full. The two layers solve different problems.
"Save As" before forwarding. Some viewers update document-info or add annotations on re-save, inserting your identity into a file you merely relayed. Forward the original unchanged, or sanitize first.
Forgetting scans from phones. Phone-"scanned" PDFs embed camera EXIF, including location if enabled. Desktop scanner output is cleaner but still carries scanning-software fields.
Sanitizing the final copy only. Drafts shared mid-negotiation leak the same way finals do. Make sanitization the default for anything leaving the organization, not a closing ritual.
No. Viewers render the content and leave the file intact; metadata survives views, downloads, and re-uploads until something explicitly strips it.
Start with File → Properties (Description and Advanced tabs), then test redactions with select-and-copy, and inspect embedded images for EXIF. Command-line users can run pdfinfo (part of Poppler) or search the raw file for /Author and /Creator entries.
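The raw-file search can be sketched in a few lines of Python. This is a minimal illustration, assuming the document-info dictionary is stored uncompressed and uses simple literal strings; the function name `find_info_entries` is hypothetical. Hex strings, nested parentheses, and UTF-16 values are not handled, so treat a miss as inconclusive.

```python
import re

# Keys commonly found in the document-info dictionary.
INFO_KEYS = (b"Author", b"Creator", b"Producer", b"Title")


def find_info_entries(pdf_bytes: bytes) -> dict:
    """Return literal-string values such as /Author (Jane Doe) found in the raw bytes."""
    found = {}
    for key in INFO_KEYS:
        # Match "/Key (value)", allowing backslash escapes inside the parentheses.
        pattern = rb"/" + key + rb"\s*\(((?:\\.|[^\\()])*)\)"
        values = [m.decode("latin-1") for m in re.findall(pattern, pdf_bytes)]
        if values:
            found[key.decode("ascii")] = values
    return found
```

In practice the bytes would come from something like `Path("contract.pdf").read_bytes()`, where the filename is a placeholder.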
Select across the redacted region, press Ctrl+C, and paste into a plain-text editor: nothing from the redacted region should appear. Then search the document for the redacted string and expect zero hits. Real redaction removes content; cosmetic redaction merely hides it.
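The string search can also be approximated at the byte level. The sketch below assumes Flate-compressed content streams and text stored as plain single-byte strings; the function name `redacted_text_survives` is hypothetical. A hit is conclusive evidence that the redaction failed. A miss is not proof of success, because fonts with custom encodings or other stream filters can hide the same text from this naive scan.

```python
import re
import zlib

# Raw stream bodies sit between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def redacted_text_survives(pdf_bytes: bytes, secret: str) -> bool:
    """Return True if the secret appears in raw or Flate-decoded stream data."""
    needle = secret.encode("latin-1", errors="replace")
    if needle in pdf_bytes:
        return True
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates a trailing byte lost to the regex boundary.
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image stream)
        if needle in data:
            return True
    return False
```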
Only against people without the password. Every legitimate recipient sees the metadata fully. Encryption is a confidentiality layer; sanitization is a leak-prevention layer; most sensitive sharing needs both.
Re-exporting carries some fields forward and adds fresh Creator and Producer entries. Sanitization deliberately strips document-info, XMP, JavaScript, form data, and annotations. Acrobat's Sanitize Document and a Clean Metadata + Flatten sequence in PDFKits both qualify.
No. Desktop scans carry scanner-software fields; phone scans add camera EXIF that may include GPS. The image pixels are clean, but the wrapper is not.
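Once an embedded photo has been extracted as a JPEG, checking it for location data needs no special tooling. The sketch below walks the JPEG segments, finds the Exif block, and looks for the GPS pointer tag (0x8825) in the first image directory. The function names are hypothetical, and the parser is deliberately simplified: it assumes every segment before the image data carries a length field and it reads only the first directory.

```python
import struct
from typing import Optional

GPS_IFD_POINTER_TAG = 0x8825  # EXIF tag that points at the GPS sub-directory


def find_exif_payload(jpeg: bytes) -> Optional[bytes]:
    """Return the TIFF payload of the first Exif APP1 segment, or None."""
    if jpeg[:2] != b"\xff\xd8":  # missing start-of-image marker: not a JPEG
        return None
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            return None
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: no metadata segments after this
            return None
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        body = jpeg[pos + 4:pos + 2 + length]  # length includes its own 2 bytes
        if marker == 0xE1 and body.startswith(b"Exif\x00\x00"):
            return body[6:]
        pos += 2 + length
    return None


def has_gps(jpeg: bytes) -> bool:
    """Return True if the first EXIF directory contains a GPS pointer entry."""
    tiff = find_exif_payload(jpeg)
    if tiff is None or len(tiff) < 8:
        return False
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])  # TIFF byte-order mark
    if order is None:
        return False
    (ifd_offset,) = struct.unpack(order + "I", tiff[4:8])
    if ifd_offset + 2 > len(tiff):
        return False
    (count,) = struct.unpack(order + "H", tiff[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        entry = tiff[start:start + 12]
        if len(entry) < 12:
            break
        (tag,) = struct.unpack(order + "H", entry[:2])
        if tag == GPS_IFD_POINTER_TAG:
            return True
    return False
```

A True result means the file declares a GPS directory; whether it holds usable coordinates still needs a full EXIF reader to confirm.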
Yes. Any metadata that identifies a person — author names, usernames in edit trails, location data in embedded photos — is personal data under Article 4(1) and must be included in processing records and minimization decisions.
Yes. Protected health information includes identifiers in any form; metadata pointing to a patient, doctor, or clinic that reaches an unauthorized party can be an impermissible disclosure under the Privacy Rule and may have to be reported under the Breach Notification Rule.
Transit adds headers to the email, not the attachment — the PDF arrives byte-identical, hidden data included. If you would not show a recipient the metadata, sanitize before attaching.
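The byte-identical claim is easy to check with Python's standard `email` package. The sketch below builds a message with a PDF attachment, serializes it as it would travel, parses it back, and returns the attachment bytes. The addresses, subject, filename, and the function name `roundtrip_attachment` are placeholders invented for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def roundtrip_attachment(pdf_bytes: bytes) -> bytes:
    """Attach the PDF, serialize the message, parse it back, return the attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"      # placeholder address
    msg["To"] = "recipient@example.com"     # placeholder address
    msg["Subject"] = "Contract"
    msg.set_content("See attached.")
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename="contract.pdf"
    )

    wire = msg.as_bytes()  # headers plus base64-encoded body, as sent
    received = message_from_bytes(wire, policy=policy.default)
    attachment = next(received.iter_attachments())
    return attachment.get_content()  # decoded back to the original bytes
```

If the returned bytes equal the input, every metadata field that was in the file before sending is still there on arrival.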
Sanitizing removes functionality by design: interactive forms, comments, embedded media. For a final read-only release that is desirable. For a working document, sanitize a copy for external distribution and keep the interactive original internally.
Clean Metadata — strip document-info and XMP fields. Redact PDF — true content-stream redaction. Flatten PDF — remove hidden form data and annotations. Extract Images — pull embedded photos for EXIF checks. Protect PDF — encrypt the sanitized output. Edit PDF — adjust visible content before sanitizing.