If DeepResearch isn't analyzing an uploaded PDF or image, the issue likely falls into one of three categories: unsupported formats or corrupted files, server-side processing limitations, or content extraction failures. Let’s break these down with concrete examples.
First, file format compatibility is a common culprit. PDFs and standard image formats (like PNG or JPEG) are widely supported, but specific variants can still cause issues. For example, a PDF might be encrypted, rely on fonts that are not embedded, or contain complex vector graphics that the tool’s parser can’t handle. Similarly, images saved in less common formats (e.g., HEIC from iPhones) or truncated by a partial upload might fail to process. Check the tool’s documentation for supported formats and verify that the file isn’t password-protected or damaged. The file command (Linux and macOS) reports the detected type regardless of the extension, and online validators can help confirm file integrity.
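The format and integrity checks described above can be approximated locally before uploading. The following is a minimal sketch, not DeepResearch's own validation: the function names sniff_format and quick_checks are hypothetical, and the encryption and truncation tests are byte-level heuristics that can produce false positives or miss damage in the middle of a file.

```python
# Leading "magic" bytes for a few common formats.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}
# A few brands used by HEIF/HEIC containers (not an exhaustive list).
HEIF_BRANDS = {b"heic", b"heix", b"hevc", b"mif1", b"msf1"}


def sniff_format(data: bytes) -> str:
    """Guess the file type from its first bytes, ignoring the extension."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    # HEIF files start with a size field, then "ftyp", then a brand code.
    if data[4:8] == b"ftyp" and data[8:12] in HEIF_BRANDS:
        return "heic"
    return "unknown"


def quick_checks(data: bytes) -> list[str]:
    """Return human-readable warnings; an empty list means nothing obvious."""
    kind = sniff_format(data)
    warnings = []
    if kind == "unknown":
        warnings.append("unrecognized file signature")
    elif kind == "heic":
        warnings.append("HEIC/HEIF image; consider converting to PNG or JPEG")
    elif kind == "pdf":
        # Heuristic: encrypted PDFs carry an /Encrypt entry in the trailer.
        if b"/Encrypt" in data:
            warnings.append("PDF appears to be encrypted")
        # PDF readers look for the %%EOF marker near the end of the file.
        if b"%%EOF" not in data[-1024:]:
            warnings.append("PDF end marker missing; file may be truncated")
    elif kind == "png" and b"IEND" not in data[-16:]:
        # A complete PNG ends with an IEND chunk.
        warnings.append("PNG end chunk missing; file may be truncated")
    return warnings
```

To try it on a real file, read the bytes first, for example quick_checks(open("report.pdf", "rb").read()), where report.pdf is a placeholder name. An empty result only means that none of these specific problems were spotted, not that the file is guaranteed to parse.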
Second, server-side processing constraints could prevent analysis. Large PDFs (e.g., 100+ pages) or high-resolution images might exceed the memory or time limits set by the service. For instance, a 50 MB image might trigger a timeout if the tool expects smaller inputs. Check whether the service lists size restrictions in its API documentation or UI. If you’re using an API, inspect the HTTP response status code for errors like 413 Payload Too Large or 504 Gateway Timeout. Testing with smaller files (e.g., a 1-page PDF or a compressed image) can help isolate this issue.
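A small helper can make the size and status-code checks above repeatable. This is a sketch under stated assumptions: the 20 MB limit is a made-up placeholder (use whatever the service actually documents), the function names are hypothetical, and the hints are generic readings of standard HTTP status codes rather than messages DeepResearch is known to return.

```python
# Hypothetical limit; replace with the value the service documents.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024

# Generic interpretations of standard HTTP status codes.
STATUS_HINTS = {
    413: "payload too large: shrink or split the file",
    415: "unsupported media type: convert the file to a supported format",
    504: "gateway timeout: processing took too long; try a smaller file",
}


def exceeds_limit(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """True if a file of this size is larger than the assumed upload limit."""
    return size_bytes > limit


def explain_status(status: int) -> str:
    """Map an HTTP status code to a troubleshooting hint."""
    if status in STATUS_HINTS:
        return STATUS_HINTS[status]
    if 200 <= status < 300:
        return "request succeeded; look at content extraction instead"
    return f"unexpected status {status}; check the response body or logs"
```

The file size can be obtained with os.path.getsize(path) before uploading, and the status code comes from whichever HTTP client you use. If a file under the limit still times out, page count or image resolution may matter more than the byte size.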
Third, content extraction failures might occur even with valid files. Scanned PDFs and low-quality images that lack a machine-readable text layer will stump tools that rely on text extraction. For example, a scanned receipt saved as a PDF without an OCR text layer looks like a plain image to the tool, making text analysis impossible. Similarly, images with distorted text, complex layouts, or non-Latin scripts might not be parsed correctly. If the tool provides logs or error messages, look for clues like "no text detected" or "unsupported layout." Running the file through an OCR tool (e.g., Tesseract) first, or ensuring that textual content is embedded in the PDF, can mitigate this.
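One way to tell a scanned page from one with a real text layer is to look at how much text an extractor returns for it. The sketch below assumes you have already extracted text page by page with a PDF library of your choice. The function name pages_needing_ocr and the 20-character threshold are illustrative assumptions, not values taken from any tool.

```python
def pages_needing_ocr(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return 1-based page numbers whose extracted text looks empty.

    page_texts holds the text extracted from each page, in order.
    A page with fewer than min_chars non-whitespace characters is
    treated as image-only and flagged as a candidate for OCR.
    """
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            flagged.append(number)
    return flagged


# Example: page 1 has a text layer, pages 2 and 3 do not.
sample = ["Invoice total: 42.00 EUR, due on receipt", "", "  \n "]
print(pages_needing_ocr(sample))  # prints [2, 3]
```

Flagged pages are candidates for OCR preprocessing before re-uploading. A low threshold will also flag legitimately sparse pages such as title pages, so treat the result as a hint and not as a verdict.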
To resolve the issue, start by validating the file’s format and integrity, then check service limitations, and finally verify the content’s machine-readability. If the problem persists, consult the tool’s error logs or support channels with specific details about the file and failure behavior.
