Can RAGFlow handle images and tables in documents?

Yes. RAGFlow handles images and tables through visual layout analysis and structured extraction.

Tables are recognized via TSR (Table Structure Recognition), part of RAGFlow's DeepDoc parser, which identifies table boundaries, rows, columns, and cell contents. Extracted tables can be preserved as images (useful for visual analysis) or converted to structured text representations, depending on your use case and query patterns. For tables with complex layouts or mixed content, preserving the image often retains more context than text conversion. The extraction process outputs tables with cropped images, allowing downstream workflows to reason about table data both as text and visually.

Images embedded in documents (photographs, diagrams, screenshots, charts) are extracted and indexed separately, making them retrievable by query. For scanned documents where content exists only as images, OCR (Optical Character Recognition) via DeepDoc converts visual content to searchable text while maintaining page-level position metadata, so scanned documents can be retrieved much like born-digital ones. RAGFlow's multimodal support is growing: you can configure multimodal embeddings (which represent text and images jointly) if your knowledge base is image-heavy.

When chunking, RAGFlow respects table and image boundaries, avoiding cuts that would fragment structured data. For documents mixing tables, images, and text (common in PDFs and Word documents), RAGFlow's semantic chunking keeps all three modalities together, maintaining context. This matters for knowledge bases such as technical specifications, financial reports, or research papers, where tables and diagrams are primary information sources, and it is a key advantage over simpler text-only extraction approaches.
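To make the idea of boundary-respecting chunking concrete, here is a minimal Python sketch. It is not RAGFlow's implementation or API: the `Block` type, the `chunk_blocks` function, and the character budget are all hypothetical names chosen for illustration. It assumes a parser has already split a document into typed blocks, and it groups them into chunks without ever cutting a table or image in half.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical parsed unit; not a RAGFlow type."""
    kind: str      # "text", "table", or "image"
    content: str   # text, serialized table, or image caption/reference


def chunk_blocks(blocks: List[Block], max_chars: int = 200) -> List[List[Block]]:
    """Group blocks into chunks without splitting any single block.

    A block that is larger than max_chars (for example a big table)
    ends up in a chunk of its own instead of being cut.
    """
    chunks: List[List[Block]] = []
    current: List[Block] = []
    size = 0
    for block in blocks:
        block_len = len(block.content)
        # Close the current chunk if this block would overflow the budget.
        if current and size + block_len > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(block)
        size += block_len
    if current:
        chunks.append(current)
    return chunks


# Illustrative input: intro text, a table, its figure, and closing text.
sample = [
    Block("text", "a" * 120),
    Block("table", "b" * 150),
    Block("image", "c" * 30),
    Block("text", "d" * 100),
]
chunks = chunk_blocks(sample, max_chars=200)
kinds = [[b.kind for b in chunk] for chunk in chunks]
print(kinds)  # [['text'], ['table', 'image'], ['text']]
```

In this sketch the table and the image that follows it land in the same chunk, so a retriever would return them together. A real pipeline would likely also split long text blocks and measure size in tokens rather than characters.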

For production retrieval workflows, Zilliz Cloud provides fully managed vector search infrastructure with auto-scaling and enterprise security. Developers who prefer self-hosting can use Milvus, the open-source vector database behind Zilliz Cloud.

