For PDFs

  • The text can’t be selected

    Try selecting the text with your mouse. If you can’t highlight words (only boxes appear, or nothing at all), the file is likely a scanned image.

  • Copying text gives strange results

    If you copy a few words, paste them into another document, and they come out scrambled, with missing letters or odd characters, the PDF’s text can’t be read reliably.

  • The file asks for a password or shows restrictions

    If opening the PDF prompts for a password, or you see a message about copying restrictions, the document is locked or restricted.

  • The PDF looks like a scanned document

    If the file looks more like a photograph or printed page than clean, digital text, it’s probably an image-based PDF.
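
For readers who prefer to automate the copy-and-paste check above, here is a minimal Python sketch. It is only an illustration, not how the app itself works: the function names `garbled_ratio` and `looks_garbled` and the 10% threshold are assumptions made for this example. It counts characters that rarely appear in normal prose (replacement characters, private-use code points, unassigned code points, and control characters) in text you pasted from the PDF.

```python
import unicodedata


def garbled_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are unlikely in normal prose."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        # Nothing was copied at all, which is itself a warning sign.
        return 1.0
    suspicious = 0
    for c in chars:
        category = unicodedata.category(c)
        # U+FFFD is the replacement character; Co = private use,
        # Cn = unassigned, Cc = control characters.
        if c == "\ufffd" or category in ("Co", "Cn", "Cc"):
            suspicious += 1
    return suspicious / len(chars)


def looks_garbled(text: str, threshold: float = 0.1) -> bool:
    """True if the pasted text is probably not usable (threshold is an assumption)."""
    return garbled_ratio(text) >= threshold
```

Paste a sentence copied from the PDF into `looks_garbled(...)`. A result of `True` suggests the same problem described in the list above. A result of `False` does not guarantee the whole file is fine, because only the pasted sample is checked.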


For links (webpages)

  • The content appears only after interaction

    If you open a webpage and the information loads only after you click something, scroll down, or wait a few seconds, it may not be immediately available for automatic extraction.

  • You see a login screen instead of content

    If you land on a page that asks for a username and password before showing any information, the content is hidden behind a login.

  • Text is shown inside images

    If the important text looks like it’s part of a banner, screenshot, or decorative image (and you can’t select it), it’s actually an image and not real text.

  • You are asked to verify you’re human (CAPTCHA)

    If a site asks you to click boxes, solve puzzles, or otherwise prove you’re human, it’s protecting its content from automatic tools.
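
Some of the webpage signs above can also be spotted in a page's saved HTML. The following Python sketch uses only the standard library and is a rough heuristic, not a description of how the app works: the class `PageHints`, the function `page_hints`, and the 200-character threshold are all assumptions made for this example. It looks for a password field (login screen), the word "captcha" in tag attributes, and a very small amount of visible text (which can mean the content loads later or sits inside images).

```python
from html.parser import HTMLParser


class PageHints(HTMLParser):
    """Collects rough signals from raw HTML (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.has_password_field = False
        self.has_captcha_marker = False
        self.image_count = 0
        self.text_chars = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        if tag == "input" and (attr.get("type") or "").lower() == "password":
            self.has_password_field = True
        if tag == "img":
            self.image_count += 1
        # Many CAPTCHA widgets mention "captcha" in a class, id, or src value.
        blob = " ".join(v for v in attr.values() if v).lower()
        if "captcha" in blob:
            self.has_captcha_marker = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Count only text a visitor could read, not script or style contents.
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def page_hints(html: str, min_text_chars: int = 200) -> dict:
    """Summarize the signals; min_text_chars is an arbitrary assumption."""
    parser = PageHints()
    parser.feed(html)
    parser.close()
    return {
        "login_wall": parser.has_password_field,
        "captcha": parser.has_captcha_marker,
        "little_text": parser.text_chars < min_text_chars,
        "images": parser.image_count,
    }
```

To try it, save a page from your browser and pass its contents as a string to `page_hints(...)`. Any flag set to `True` is a hint worth checking by hand, not proof: a page can contain a small login box and still show its full content.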


Pro tip


If you find it hard to highlight, copy, or even see the content of a file or webpage, there’s a good chance the app will have trouble processing it too.

When in doubt, preview the file or link before using it as a source so you can catch problems early.