How to Fix PDF Text That Copies as Boxes or Symbols
You select text in a PDF, paste it into your document, and instead of words you see a line of empty boxes: □□□□□. Or maybe hollow rectangles, question marks in diamonds, or other symbols that clearly aren't what was on the page. The text looked perfectly fine in the PDF — so what happened?
This is a specific and particularly frustrating variation of the broken PDF text layer problem. Here's exactly what those boxes mean and how to fix it.
What Those Boxes Mean
When you see □ (empty boxes) or similar symbols after pasting from a PDF, your computer is telling you: "I received character codes, but I have no glyph to draw for them." The boxes are "missing glyph" indicators, sometimes nicknamed "tofu." The question mark in a diamond is a related symbol: the Unicode replacement character, U+FFFD.
The problem starts inside the PDF's font mapping. Every font in a PDF assigns an internal number to each glyph (character shape). When you copy text, the PDF viewer looks up each glyph number in a mapping table called a ToUnicode CMap to find the corresponding Unicode character code. If that mapping table is missing, the viewer often outputs the raw glyph number as a character code.
These raw glyph numbers don't correspond to real Unicode characters. When your text editor tries to display them, it has no font glyph to show, so it draws a box as a placeholder. The box is your operating system's way of saying "I don't know what character this is supposed to be."
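The lookup described above can be sketched in a few lines of Python. This is a simplified model and not how any particular viewer is implemented: the function name `extract_text`, the glyph numbers, and the fallback rule are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def extract_text(glyph_ids: List[int], to_unicode: Optional[Dict[int, str]]) -> str:
    """Model a viewer's copy step: map glyph numbers to Unicode characters."""
    out = []
    for gid in glyph_ids:
        if to_unicode is not None and gid in to_unicode:
            # Normal case: the mapping table supplies the real character.
            out.append(to_unicode[gid])
        else:
            # Assumed fallback: reuse the raw glyph number as a character code.
            out.append(chr(gid))
    return "".join(out)


# Hypothetical subset font in which glyphs 1-4 draw "H", "e", "l", "o".
glyphs = [1, 2, 3, 3, 4]
cmap = {1: "H", 2: "e", 3: "l", 4: "o"}

with_map = extract_text(glyphs, cmap)     # "Hello"
without_map = extract_text(glyphs, None)  # control codes U+0001..U+0004
```

Without the mapping table, the same five glyphs come out as control codes that no text editor has a glyph for, which is why they end up drawn as boxes.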
Why This Happens
The box problem has two primary causes:
Font subsetting without a ToUnicode CMap. To reduce file size, PDFs often include only the glyphs actually used in the document rather than the entire font. This is called font subsetting. When a font is subset, the glyph IDs are typically renumbered (glyph #0, #1, #2...) for efficiency. If the PDF creator doesn't include a ToUnicode CMap that maps these new glyph IDs to Unicode characters, the copied text is a run of meaningless numbers that display as boxes.
Private Use Area mapping. Some PDF creators map glyphs to the Unicode Private Use Area (PUA), a range of character codes reserved for custom symbols. These codes have no standard meaning, so they display as boxes in most text editors. The PDF viewer can still show the correct glyph because it draws the glyph outlines stored in the embedded font, but copied text carries the PUA codes, which are meaningless outside the PDF.
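To make the renumbering concrete, here is a toy model of subsetting in Python. It is only a sketch: the function `subset_font` is hypothetical, real subsetters work on font tables rather than strings, and reserving glyph 0 for the "missing glyph" slot is a font convention assumed here for simplicity.

```python
from typing import Dict, List, Tuple


def subset_font(text: str) -> Tuple[List[int], Dict[int, str]]:
    """Renumber glyphs in order of first use and record the reverse mapping."""
    glyph_for_char: Dict[str, int] = {}
    glyph_ids: List[int] = []
    for ch in text:
        if ch not in glyph_for_char:
            # Glyph 0 is left free for the "missing glyph" slot (assumption).
            glyph_for_char[ch] = len(glyph_for_char) + 1
        glyph_ids.append(glyph_for_char[ch])
    # This reverse table plays the role of the ToUnicode CMap.
    to_unicode = {gid: ch for ch, gid in glyph_for_char.items()}
    return glyph_ids, to_unicode


glyph_ids, to_unicode = subset_font("banana")
recovered = "".join(to_unicode[g] for g in glyph_ids)
```

The page content only stores `glyph_ids`. If `to_unicode` is thrown away when the PDF is written, nothing in the file says that glyph 1 was ever a "b".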
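If you want to check whether pasted text falls in the Private Use Area, Python's standard `unicodedata` module reports the category `Co` for those code points. The helper names and the sample string below are made up for illustration.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """True if the character belongs to a Unicode Private Use Area."""
    return unicodedata.category(ch) == "Co"


def private_use_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are private-use codes."""
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return 0.0
    return sum(is_private_use(c) for c in visible) / len(visible)


# Hypothetical paste result: two PUA codes followed by ordinary letters.
sample = "\uf041\uf042 ok"
ratio = private_use_ratio(sample)
```

A ratio close to 1.0 suggests the whole font was mapped into the PUA, while a small ratio may just mean the document uses a few custom symbols or icons.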
Both causes share the same root issue: the PDF was created without properly linking the visual glyphs to their Unicode character equivalents.
Quick Diagnosis: Is It a Font or Layer Problem?
When you encounter boxes, it helps to determine the exact nature of the problem:
- All text copies as boxes: The entire font mapping is missing. The PDF likely has no ToUnicode CMap for any of its fonts.
- Some text copies correctly, some as boxes: Only certain fonts in the PDF are affected. This is common when a document uses multiple fonts and some have proper mappings while others don't.
- Text copies as a mix of correct characters and boxes: The font has a partial CMap that maps some glyphs correctly but is missing entries for others.
- Text copies as different wrong characters (not boxes): This is a different problem — the CMap exists but has incorrect mappings. See our article on why PDF copy-paste gives gibberish.
Regardless of which variant you're experiencing, the fix is the same: rebuild the text layer from the visible content.
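The symptoms listed above can be roughly automated by pasting the copied text into a small script. The following is a heuristic sketch, not a definitive test: the category set and the labels returned by the hypothetical `diagnose` function are assumptions, and it cannot detect the "different wrong characters" case, because those characters are valid and displayable.

```python
import unicodedata

# Private use (Co), unassigned (Cn), and control (Cc) characters have no
# ordinary glyph, so editors tend to draw them as boxes.
SUSPECT_CATEGORIES = {"Co", "Cn", "Cc"}


def is_suspect(ch: str) -> bool:
    """True for characters that usually render as a box or placeholder."""
    if ch in "\n\r\t":
        return False
    return ch == "\ufffd" or unicodedata.category(ch) in SUSPECT_CATEGORIES


def diagnose(pasted: str) -> str:
    """Classify pasted text as 'empty', 'clean', 'all boxes', or 'mixed'."""
    chars = [c for c in pasted if not c.isspace()]
    if not chars:
        return "empty"
    bad = sum(is_suspect(c) for c in chars)
    if bad == 0:
        return "clean"
    if bad == len(chars):
        return "all boxes"
    return "mixed"
```

For example, `diagnose("He\uf041lo")` returns `"mixed"`, which corresponds to a font with only a partial mapping or to a document that uses several fonts.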
How to Fix It
Since the visual text on the page is correct — the PDF viewer draws the right glyphs — OCR can read that visual content and create a proper text layer with correct Unicode character codes. The process is:
- Remove the broken text layer. Strip out the existing text operators and their useless character mappings.
- Run OCR on each page. Read the visible text, identifying every character and its position on the page.
- Rebuild with correct Unicode. Create a new text layer where every character is mapped to its proper Unicode code point.
You can do this manually using tools like OCRmyPDF (free, command-line) or Adobe Acrobat Pro (paid, GUI). For the fastest and most accurate results, FixPDFCopy.com handles all three steps automatically using enterprise-grade OCR. Upload your PDF, and the boxes become proper, copyable text.
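For the OCRmyPDF route, a minimal Python wrapper might look like the sketch below. It assumes the `ocrmypdf` command-line tool is installed and on your PATH. The file names and function names are placeholders. The `--force-ocr` option tells OCRmyPDF to rasterize each page and run OCR even when a text layer already exists, which discards the broken layer but also converts vector content on the page to images.

```python
import shutil
import subprocess
from typing import List


def build_ocrmypdf_command(src: str, dst: str) -> List[str]:
    """Build the command line that replaces the text layer via OCR."""
    # --force-ocr: rasterize every page and OCR it, ignoring existing text.
    return ["ocrmypdf", "--force-ocr", src, dst]


def rebuild_text_layer(src: str, dst: str) -> bool:
    """Run OCRmyPDF if it is installed; return False when it is not found."""
    if shutil.which("ocrmypdf") is None:
        return False
    subprocess.run(build_ocrmypdf_command(src, dst), check=True)
    return True


# Placeholder file names for illustration only.
command = build_ocrmypdf_command("broken.pdf", "fixed.pdf")
```

Calling `rebuild_text_layer("broken.pdf", "fixed.pdf")` writes a new file and leaves the original untouched, so you can compare copy-paste results before and after.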
The key thing to understand is that the information isn't lost — it's visible right there on the page. The PDF just has a broken internal data structure that prevents your computer from accessing it as text. OCR bridges that gap by reading the visual content and creating the correct data structure.
For a broader understanding of all PDF copy-paste problems and their solutions, see our complete guide to the PDF copy-paste problem. If you're not sure whether your issue is boxes, gibberish, or something else, our fix PDF copy-paste page walks through all the symptoms.
Fix Your PDF Now
Upload your PDF and we'll fix the text layer in minutes. Just $1 + $0.01/page. 100% money-back guarantee.
Fix My PDF →
Frequently Asked Questions
Why do I see boxes instead of text when I paste?
The boxes represent character codes that your text editor has no glyph for. This happens when the PDF's font mapping table is missing or points to non-standard character codes instead of normal letters. The PDF viewer draws the correct glyphs from the outlines in the embedded font, but the copied character codes are meaningless.
Is this different from gibberish text when pasting?
It is a variation of the same problem. Gibberish means the mapping points to wrong but displayable characters. Boxes mean the mapping points to character codes that have no visual representation in your font. Both are caused by broken or missing Unicode mapping tables in the PDF.
Will a different PDF viewer fix the box problem?
No. The problem is in the PDF file, not the viewer. You might see slightly different symbols in different viewers — some show hollow boxes, others show question marks — but the underlying issue is the same. The character mapping data inside the PDF is broken.
Can I fix just the pages that have the box problem?
When you process a PDF through an OCR service, all pages are processed. This ensures consistent text quality throughout the document. Even pages that appear to copy correctly may have subtle encoding issues that get fixed in the process.