{"exhaustive":{"nbHits":false,"typo":false},"exhaustiveNbHits":false,"exhaustiveTypo":false,"hits":[{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"yingfeng"},"story_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"Hybrid search of bm25,dense vector and sparse vector, and tensor based reranking(ColBERT).<p>Technology details:\n<a href=\"https://medium.com/@infiniflowai/dense-vector-sparse-vector-full-text-search-tensor-reranker-best-retrieval-for-rag-9c86b02a55ef\" rel=\"nofollow\">https://medium.com/@<em>infiniflow</em>ai/dense-vector-sparse-vector-...</a>"},"title":{"matchLevel":"none","matchedWords":[],"value":"Show HN: Infinity \u2013 Incredibly fast database for RAG with powerful hybrid search"},"url":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"https://github.com/<em>infiniflow</em>/infinity"}},"_tags":["story","author_yingfeng","story_40986523","show_hn"],"author":"yingfeng","created_at":"2024-07-17T14:48:30Z","created_at_i":1721227710,"num_comments":0,"objectID":"40986523","points":2,"story_id":40986523,"story_text":"Hybrid search of bm25,dense vector and sparse vector, and tensor based reranking(ColBERT).<p>Technology details:\n<a href=\"https:&#x2F;&#x2F;medium.com&#x2F;@infiniflowai&#x2F;dense-vector-sparse-vector-full-text-search-tensor-reranker-best-retrieval-for-rag-9c86b02a55ef\" rel=\"nofollow\">https:&#x2F;&#x2F;medium.com&#x2F;@infiniflowai&#x2F;dense-vector-sparse-vector-...</a>","title":"Show HN: Infinity \u2013 Incredibly fast database for RAG with powerful hybrid search","updated_at":"2024-09-20T17:30:59Z","url":"https://github.com/infiniflow/infinity"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"demilich"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"<a href=\"https://github.com/infiniflow/infinity\">https://github.com/<em>infiniflow</em>/infinity</a>, dense vector + sparse vector + fulltext search(BM25) + late interact reranker(Colbert)"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Full text search over Postgres: Elasticsearch vs. alternatives"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://blog.paradedb.com/pages/elasticsearch_vs_postgres"}},"_tags":["comment","author_demilich","story_41173288"],"author":"demilich","comment_text":"<a href=\"https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;infinity\">https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;infinity</a>, dense vector + sparse vector + fulltext search(BM25) + late interact reranker(Colbert)","created_at":"2024-08-07T13:31:01Z","created_at_i":1723037461,"objectID":"41181277","parent_id":41173288,"story_id":41173288,"story_title":"Full text search over Postgres: Elasticsearch vs. alternatives","story_url":"https://blog.paradedb.com/pages/elasticsearch_vs_postgres","updated_at":"2024-09-20T17:37:25Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"yingfeng"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"paradedb could also deliver three-way hybrid search through pg_vector, pg_sparse and pg_search. Compared with paradedb, infinity has following advantages:<p>1. Performance<p>The performance of pg_vector is far slower than vector search of Infinity due to the vector index design.\nThe performance of pg_sparse is also slower than sparse vector search of infinity.\nThe performance of pg_search is much slower than full text search of infinity. pg_search is based on Tantivy, which is much slower than the inverted index of infinity.<p>Detailed benchmark could be seen in this article : <a href=\"https://infiniflow.org/blog/fastest-hybrid-search\" rel=\"nofollow\">https://<em>infiniflow</em>.org/blog/fastest-hybrid-search</a> or github repo.<p>2. Infinity has all the builtin implementation of the above three search approaches. These indices could work smoothly together with the executor of infinity. The users could use any combination of the search approaches, together with the fused ranking algorithms, in a very efficient approach.<p>3. Infinity has also builtin support for tensor, which makes it possible to deliver an in-database colbert reranker compared with the cross encoder based reranker outside. The colbert reranker could bring much benefits for search qualities.<p>4. Infinity is much easier to use, it could be deployed as either a standalone server, or as an embedded python library just through pip install.<p>5. Infinity is designed start from scratch, it does not have the burden of postgresql, and is evolving fast. It will run on cloud in very near future which could save the cost a lot."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Dense Vector and Sparse Vector and Fulltext and Tensor Reranker = Best for RAG?"},"story_url":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"https://<em>infiniflow</em>.org/blog/best-hybrid-search-solution"}},"_tags":["comment","author_yingfeng","story_40981886"],"author":"yingfeng","children":[40985016],"comment_text":"paradedb could also deliver three-way hybrid search through pg_vector, pg_sparse and pg_search. Compared with paradedb, infinity has following advantages:<p>1. Performance<p>The performance of pg_vector is far slower than vector search of Infinity due to the vector index design.\nThe performance of pg_sparse is also slower than sparse vector search of infinity.\nThe performance of pg_search is much slower than full text search of infinity. pg_search is based on Tantivy, which is much slower than the inverted index of infinity.<p>Detailed benchmark could be seen in this article : <a href=\"https:&#x2F;&#x2F;infiniflow.org&#x2F;blog&#x2F;fastest-hybrid-search\" rel=\"nofollow\">https:&#x2F;&#x2F;infiniflow.org&#x2F;blog&#x2F;fastest-hybrid-search</a> or github repo.<p>2. Infinity has all the builtin implementation of the above three search approaches. These indices could work smoothly together with the executor of infinity. The users could use any combination of the search approaches, together with the fused ranking algorithms, in a very efficient approach.<p>3. Infinity has also builtin support for tensor, which makes it possible to deliver an in-database colbert reranker compared with the cross encoder based reranker outside. The colbert reranker could bring much benefits for search qualities.<p>4. Infinity is much easier to use, it could be deployed as either a standalone server, or as an embedded python library just through pip install.<p>5. Infinity is designed start from scratch, it does not have the burden of postgresql, and is evolving fast. It will run on cloud in very near future which could save the cost a lot.","created_at":"2024-07-17T04:46:13Z","created_at_i":1721191573,"objectID":"40982630","parent_id":40982600,"story_id":40981886,"story_title":"Dense Vector and Sparse Vector and Fulltext and Tensor Reranker = Best for RAG?","story_url":"https://infiniflow.org/blog/best-hybrid-search-solution","updated_at":"2024-10-04T01:47:20Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"yingfeng"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"From 0.8, RAGFlow(<a href=\"https://github.com/infiniflow/ragflow\">https://github.com/<em>infiniflow</em>/ragflow</a>) will provide no code workflow orchestration. This article describes what kind of graph orchestration engine is needed, and how it can be used to implement Agentic RAG."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Agentic RAG: Definition and Low-Code Implementation"},"story_url":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"https://medium.com/@<em>infiniflow</em>ai/agentic-rag-definition-and-low-code-implementation-d0744815029c"}},"_tags":["comment","author_yingfeng","story_40727327"],"author":"yingfeng","comment_text":"From 0.8, RAGFlow(<a href=\"https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow\">https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow</a>) will provide no code workflow orchestration. This article describes what kind of graph orchestration engine is needed, and how it can be used to implement Agentic RAG.","created_at":"2024-06-19T11:55:51Z","created_at_i":1718798151,"objectID":"40727395","parent_id":40727327,"story_id":40727327,"story_title":"Agentic RAG: Definition and Low-Code Implementation","story_url":"https://medium.com/@infiniflowai/agentic-rag-definition-and-low-code-implementation-d0744815029c","updated_at":"2024-09-20T17:14:17Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"yingfeng"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"RRF is a simple and effective means of fused ranking for multiple recall.\nWithin our open source RAG product RAGFlow(<a href=\"https://github.com/infiniflow/ragflow\">https://github.com/<em>infiniflow</em>/ragflow</a>), Elasticsearch is currently used instead of other general vector databases, because it can provide hybrid search right now. Under the default cases, embedding based reranker is not required, just RRF is enough, while even if reranker is used, keywords based retrieval is also a MUST to be hybridized with embedding based retrieval, that's just what RAGFlow's latest 0.7 release has provided.<p>On the other hand let me introduce another database we developed, Infinity(<a href=\"https://github.com/infiniflow/infinity\">https://github.com/<em>infiniflow</em>/infinity</a>), which can provide the hybrid search, you can see the performance here(<a href=\"https://github.com/infiniflow/infinity/blob/main/docs/references/benchmark.md\">https://github.com/<em>infiniflow</em>/infinity/blob/main/docs/refere...</a>), both vector search and full-text search could perform much faster than other open source alternatives.<p>From the next version(weeks later), Infinity will also provide more comprehensive hybrid search capabilities, what you have mentioned the 3-way recalls(dense vector, sparse vector, keyword search) could be provided within single request."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Better RAG Results with Reciprocal Rank Fusion and Hybrid Search"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://www.assembled.com/blog/better-rag-results-with-reciprocal-rank-fusion-and-hybrid-search"}},"_tags":["comment","author_yingfeng","story_40524759"],"author":"yingfeng","children":[40532447],"comment_text":"RRF is a simple and effective means of fused ranking for multiple recall.\nWithin our open source RAG product RAGFlow(<a href=\"https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow\">https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow</a>), Elasticsearch is currently used instead of other general vector databases, because it can provide hybrid search right now. Under the default cases, embedding based reranker is not required, just RRF is enough, while even if reranker is used, keywords based retrieval is also a MUST to be hybridized with embedding based retrieval, that&#x27;s just what RAGFlow&#x27;s latest 0.7 release has provided.<p>On the other hand let me introduce another database we developed, Infinity(<a href=\"https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;infinity\">https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;infinity</a>), which can provide the hybrid search, you can see the performance here(<a href=\"https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;infinity&#x2F;blob&#x2F;main&#x2F;docs&#x2F;references&#x2F;benchmark.md\">https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;infinity&#x2F;blob&#x2F;main&#x2F;docs&#x2F;refere...</a>), both vector search and full-text search could perform much faster than other open source alternatives.<p>From the next version(weeks later), Infinity will also provide more comprehensive hybrid search capabilities, what you have mentioned the 3-way recalls(dense vector, sparse vector, keyword search) could be provided within single request.","created_at":"2024-05-31T02:09:12Z","created_at_i":1717121352,"objectID":"40530785","parent_id":40524759,"story_id":40524759,"story_title":"Better RAG Results with Reciprocal Rank Fusion and Hybrid Search","story_url":"https://www.assembled.com/blog/better-rag-results-with-reciprocal-rank-fusion-and-hybrid-search","updated_at":"2024-09-20T17:10:15Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"vissidarte_choi"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"There are numerous strategies and methods available to enhance RAG performance, particularly when it comes to improving performance in parsing vast amounts of unstructured data. Additionally, various scenarios call for different parsing techniques. I would suggest exploring a RAG project that excels in document parsing: <a href=\"https://github.com/infiniflow/ragflow\">https://github.com/<em>infiniflow</em>/ragflow</a>"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Ask HN: RAG and unstructured data from several docs"}},"_tags":["comment","author_vissidarte_choi","story_40522529"],"author":"vissidarte_choi","comment_text":"There are numerous strategies and methods available to enhance RAG performance, particularly when it comes to improving performance in parsing vast amounts of unstructured data. Additionally, various scenarios call for different parsing techniques. I would suggest exploring a RAG project that excels in document parsing: <a href=\"https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow\">https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow</a>","created_at":"2024-05-30T11:53:30Z","created_at_i":1717070010,"objectID":"40522626","parent_id":40522529,"story_id":40522529,"story_title":"Ask HN: RAG and unstructured data from several docs","updated_at":"2024-09-20T17:09:26Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"demilich"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"Check <a href=\"https://github.com/infiniflow/infinity\">https://github.com/<em>infiniflow</em>/infinity</a> which combines vector search and full-text search providing extremely fast search performance."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Tantivy \u2013 full-text search engine library inspired by Apache Lucene"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://github.com/quickwit-oss/tantivy"}},"_tags":["comment","author_demilich","story_40492834"],"author":"demilich","children":[40497138],"comment_text":"Check <a href=\"https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;infinity\">https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;infinity</a> which combines vector search and full-text search providing extremely fast search performance.","created_at":"2024-05-28T02:39:21Z","created_at_i":1716863961,"objectID":"40496851","parent_id":40494383,"story_id":40492834,"story_title":"Tantivy \u2013 full-text search engine library inspired by Apache Lucene","story_url":"https://github.com/quickwit-oss/tantivy","updated_at":"2024-09-20T17:09:37Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"demilich"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"Try RAPTOR: <a href=\"https://arxiv.org/html/2401.18059v1\" rel=\"nofollow\">https://arxiv.org/html/2401.18059v1</a><p>An implementation: github.com/<em>infiniflow</em>/ragflow"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Systematically Improving Your RAG"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://jxnl.co/writing/2024/05/22/systematically-improving-your-rag/"}},"_tags":["comment","author_demilich","story_40443558"],"author":"demilich","comment_text":"Try RAPTOR: <a href=\"https:&#x2F;&#x2F;arxiv.org&#x2F;html&#x2F;2401.18059v1\" rel=\"nofollow\">https:&#x2F;&#x2F;arxiv.org&#x2F;html&#x2F;2401.18059v1</a><p>An implementation: github.com&#x2F;infiniflow&#x2F;ragflow","created_at":"2024-05-24T07:54:51Z","created_at_i":1716537291,"objectID":"40463970","parent_id":40443558,"story_id":40443558,"story_title":"Systematically Improving Your RAG","story_url":"https://jxnl.co/writing/2024/05/22/systematically-improving-your-rag/","updated_at":"2024-09-20T17:06:10Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"demilich"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"RAGFlow (github.com/<em>infiniflow</em>/ragflow) use OCR/layout recognition/TSR(table structure recognition) to understand the document structure and context. Is there any difference between RAGFlow and ZenDB?"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Towards accurate and efficient document analytics with large language models"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://arxiv.org/abs/2405.04674"}},"_tags":["comment","author_demilich","story_40349145"],"author":"demilich","children":[40351602],"comment_text":"RAGFlow (github.com&#x2F;infiniflow&#x2F;ragflow) use OCR&#x2F;layout recognition&#x2F;TSR(table structure recognition) to understand the document structure and context. Is there any difference between RAGFlow and ZenDB?","created_at":"2024-05-14T04:04:20Z","created_at_i":1715659460,"objectID":"40351510","parent_id":40349145,"story_id":40349145,"story_title":"Towards accurate and efficient document analytics with large language models","story_url":"https://arxiv.org/abs/2405.04674","updated_at":"2024-09-20T17:03:37Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"demilich"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"Use C++20 modules, take a look at this project: <a href=\"https://github.com/infiniflow/infinity\">https://github.com/<em>infiniflow</em>/infinity</a>"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Speeding up C++ build times"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://www.figma.com/blog/speeding-up-build-times/"}},"_tags":["comment","author_demilich","story_40178634"],"author":"demilich","comment_text":"Use C++20 modules, take a look at this project: <a href=\"https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;infinity\">https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;infinity</a>","created_at":"2024-04-29T13:05:53Z","created_at_i":1714395953,"objectID":"40197819","parent_id":40178634,"story_id":40178634,"story_title":"Speeding up C++ build times","story_url":"https://www.figma.com/blog/speeding-up-build-times/","updated_at":"2024-09-20T16:57:51Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"_akhe"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"Just link them to <a href=\"https://github.com/infiniflow/ragflow/blob/main/rag/llm/chat_model.py#L132\">https://github.com/<em>infiniflow</em>/ragflow/blob/main/rag/llm/chat...</a> :)"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"RAGFlow is an open-source RAG engine based on OCR and document parsing"},"story_url":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"https://github.com/<em>infiniflow</em>/ragflow"}},"_tags":["comment","author__akhe","story_39896923"],"author":"_akhe","comment_text":"Just link them to <a href=\"https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow&#x2F;blob&#x2F;main&#x2F;rag&#x2F;llm&#x2F;chat_model.py#L132\">https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow&#x2F;blob&#x2F;main&#x2F;rag&#x2F;llm&#x2F;chat...</a> :)","created_at":"2024-04-03T19:49:41Z","created_at_i":1712173781,"objectID":"39922180","parent_id":39902443,"story_id":39896923,"story_title":"RAGFlow is an open-source RAG engine based on OCR and document parsing","story_url":"https://github.com/infiniflow/ragflow","updated_at":"2024-09-20T16:46:42Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"_akhe"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"<a href=\"https://github.com/infiniflow/ragflow/blob/main/rag/llm/chat_model.py#L132\">https://github.com/<em>infiniflow</em>/ragflow/blob/main/rag/llm/chat...</a>"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"RAGFlow is an open-source RAG engine based on OCR and document parsing"},"story_url":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"https://github.com/<em>infiniflow</em>/ragflow"}},"_tags":["comment","author__akhe","story_39896923"],"author":"_akhe","comment_text":"<a href=\"https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow&#x2F;blob&#x2F;main&#x2F;rag&#x2F;llm&#x2F;chat_model.py#L132\">https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow&#x2F;blob&#x2F;main&#x2F;rag&#x2F;llm&#x2F;chat...</a>","created_at":"2024-04-03T19:47:59Z","created_at_i":1712173679,"objectID":"39922164","parent_id":39906414,"story_id":39896923,"story_title":"RAGFlow is an open-source RAG engine based on OCR and document parsing","story_url":"https://github.com/infiniflow/ragflow","updated_at":"2024-09-20T16:46:42Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"mpeg"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"it's <a href=\"https://huggingface.co/InfiniFlow/deepdoc\" rel=\"nofollow\">https://huggingface.co/<em>InfiniFlow</em>/deepdoc</a> and the code for usage is in <a href=\"https://github.com/infiniflow/ragflow/blob/main/deepdoc/README.md\">https://github.com/<em>infiniflow</em>/ragflow/blob/main/deepdoc/READ...</a> \u2013 it took me a bit of trial and error to get it working<p>It seems to be a YOLOv8 fine-tune, I only did a couple tests but results were decent. Another model that is supposed to be fine tuned for borderless is <a href=\"https://huggingface.co/keremberke/yolov8m-table-extraction\" rel=\"nofollow\">https://huggingface.co/keremberke/yolov8m-table-extraction</a> but I haven't had great results myself with it, but maybe worth a try for you."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"RAGFlow is an open-source RAG engine based on OCR and document parsing"},"story_url":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"https://github.com/<em>infiniflow</em>/ragflow"}},"_tags":["comment","author_mpeg","story_39896923"],"author":"mpeg","children":[39901507],"comment_text":"it&#x27;s <a href=\"https:&#x2F;&#x2F;huggingface.co&#x2F;InfiniFlow&#x2F;deepdoc\" rel=\"nofollow\">https:&#x2F;&#x2F;huggingface.co&#x2F;InfiniFlow&#x2F;deepdoc</a> and the code for usage is in <a href=\"https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow&#x2F;blob&#x2F;main&#x2F;deepdoc&#x2F;README.md\">https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow&#x2F;blob&#x2F;main&#x2F;deepdoc&#x2F;READ...</a> \u2013 it took me a bit of trial and error to get it working<p>It seems to be a YOLOv8 fine-tune, I only did a couple tests but results were decent. Another model that is supposed to be fine tuned for borderless is <a href=\"https:&#x2F;&#x2F;huggingface.co&#x2F;keremberke&#x2F;yolov8m-table-extraction\" rel=\"nofollow\">https:&#x2F;&#x2F;huggingface.co&#x2F;keremberke&#x2F;yolov8m-table-extraction</a> but I haven&#x27;t had great results myself with it, but maybe worth a try for you.","created_at":"2024-04-02T00:23:39Z","created_at_i":1712017419,"objectID":"39901031","parent_id":39900759,"story_id":39896923,"story_title":"RAGFlow is an open-source RAG engine based on OCR and document parsing","story_url":"https://github.com/infiniflow/ragflow","updated_at":"2024-09-20T16:43:28Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"rosspackard"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"Looks like they do but aren't really documented yet:<p><a href=\"https://github.com/infiniflow/ragflow/pull/119\">https://github.com/<em>infiniflow</em>/ragflow/pull/119</a>"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"RAGFlow is an open-source RAG engine based on OCR and document parsing"},"story_url":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"https://github.com/<em>infiniflow</em>/ragflow"}},"_tags":["comment","author_rosspackard","story_39896923"],"author":"rosspackard","comment_text":"Looks like they do but aren&#x27;t really documented yet:<p><a href=\"https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow&#x2F;pull&#x2F;119\">https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow&#x2F;pull&#x2F;119</a>","created_at":"2024-04-01T23:08:31Z","created_at_i":1712012911,"objectID":"39900457","parent_id":39898012,"story_id":39896923,"story_title":"RAGFlow is an open-source RAG engine based on OCR and document parsing","story_url":"https://github.com/infiniflow/ragflow","updated_at":"2024-09-20T16:43:14Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"rosspackard"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"Looks like they do but aren't really documented yet:<p><a href=\"https://github.com/infiniflow/ragflow/pull/119\">https://github.com/<em>infiniflow</em>/ragflow/pull/119</a>"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"RAGFlow is an open-source RAG engine based on OCR and document parsing"},"story_url":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"https://github.com/<em>infiniflow</em>/ragflow"}},"_tags":["comment","author_rosspackard","story_39896923"],"author":"rosspackard","comment_text":"Looks like they do but aren&#x27;t really documented yet:<p><a href=\"https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow&#x2F;pull&#x2F;119\">https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow&#x2F;pull&#x2F;119</a>","created_at":"2024-04-01T23:08:14Z","created_at_i":1712012894,"objectID":"39900451","parent_id":39900324,"story_id":39896923,"story_title":"RAGFlow is an open-source RAG engine based on OCR and document parsing","story_url":"https://github.com/infiniflow/ragflow","updated_at":"2024-09-20T16:43:14Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"esafak"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"Apparently &quot;deep document understanding&quot; refers to OCR and structured document parsing: <a href=\"https://github.com/infiniflow/ragflow/blob/main/deepdoc/README.md\">https://github.com/<em>infiniflow</em>/ragflow/blob/main/deepdoc/READ...</a><p>Since &quot;deep document understanding&quot; is not a term of art, I would have just said &quot;OCR and document parsing&quot;.<p>How well does it work? Please include benchmarks. You may be interested in<p><a href=\"https://paperswithcode.com/sota/optical-character-recognition-on-benchmarking\" rel=\"nofollow\">https://paperswithcode.com/sota/optical-character-recognitio...</a><p><a href=\"https://paperswithcode.com/task/document-layout-analysis\" rel=\"nofollow\">https://paperswithcode.com/task/document-layout-analysis</a><p>The models seem to be closed source, hosted here: <a href=\"https://huggingface.co/InfiniFlow/deepdoc\" rel=\"nofollow\">https://huggingface.co/<em>InfiniFlow</em>/deepdoc</a>"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"RAGFlow is an open-source RAG engine based on OCR and document parsing"},"story_url":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"https://github.com/<em>infiniflow</em>/ragflow"}},"_tags":["comment","author_esafak","story_39896923"],"author":"esafak","children":[39898051,39900261,39900480,39902470,39902662],"comment_text":"Apparently &quot;deep document understanding&quot; refers to OCR and structured document parsing: <a href=\"https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow&#x2F;blob&#x2F;main&#x2F;deepdoc&#x2F;README.md\">https:&#x2F;&#x2F;github.com&#x2F;infiniflow&#x2F;ragflow&#x2F;blob&#x2F;main&#x2F;deepdoc&#x2F;READ...</a><p>Since &quot;deep document understanding&quot; is not a term of art, I would have just said &quot;OCR and document parsing&quot;.<p>How well does it work? Please include benchmarks. You may be interested in<p><a href=\"https:&#x2F;&#x2F;paperswithcode.com&#x2F;sota&#x2F;optical-character-recognition-on-benchmarking\" rel=\"nofollow\">https:&#x2F;&#x2F;paperswithcode.com&#x2F;sota&#x2F;optical-character-recognitio...</a><p><a href=\"https:&#x2F;&#x2F;paperswithcode.com&#x2F;task&#x2F;document-layout-analysis\" rel=\"nofollow\">https:&#x2F;&#x2F;paperswithcode.com&#x2F;task&#x2F;document-layout-analysis</a><p>The models seem to be closed source, hosted here: <a href=\"https:&#x2F;&#x2F;huggingface.co&#x2F;InfiniFlow&#x2F;deepdoc\" rel=\"nofollow\">https:&#x2F;&#x2F;huggingface.co&#x2F;InfiniFlow&#x2F;deepdoc</a>","created_at":"2024-04-01T19:14:19Z","created_at_i":1711998859,"objectID":"39897959","parent_id":39896923,"story_id":39896923,"story_title":"RAGFlow is an open-source RAG engine based on OCR and document parsing","story_url":"https://github.com/infiniflow/ragflow","updated_at":"2024-09-20T16:47:06Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"thm"},"title":{"matchLevel":"none","matchedWords":[],"value":"RAGFlow is an open-source RAG engine based on OCR and document parsing"},"url":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"https://github.com/<em>infiniflow</em>/ragflow"}},"_tags":["story","author_thm","story_39896923"],"author":"thm","children":[39897959,39898012,39899351,39900324,39900338,39900598,39900736,39901356,39901380,39901514,39901877,39903108,39903304,39903439,39906512],"created_at":"2024-04-01T17:50:54Z","created_at_i":1711993854,"num_comments":53,"objectID":"39896923","points":230,"story_id":39896923,"title":"RAGFlow is an open-source RAG engine based on OCR and document parsing","updated_at":"2025-10-21T16:12:10Z","url":"https://github.com/infiniflow/ragflow"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"vissidarte_choi"},"title":{"matchLevel":"none","matchedWords":[],"value":"Dense Vector and Sparse Vector and Fulltext and Tensor Reranker = Best for RAG?"},"url":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"https://<em>infiniflow</em>.org/blog/best-hybrid-search-solution"}},"_tags":["story","author_vissidarte_choi","story_40981886"],"author":"vissidarte_choi","children":[40981941,40981951,40982503,40982600,40982602],"created_at":"2024-07-17T01:40:53Z","created_at_i":1721180453,"num_comments":12,"objectID":"40981886","points":7,"story_id":40981886,"title":"Dense Vector and Sparse Vector and Fulltext and Tensor Reranker = Best for RAG?","updated_at":"2025-07-15T11:10:40Z","url":"https://infiniflow.org/blog/best-hybrid-search-solution"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"vissidarte_choi"},"title":{"matchLevel":"none","matchedWords":[],"value":"All you need to know about RAG: Past, Present, and Future"},"url":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"https://medium.com/@<em>infiniflow</em>ai/all-you-need-to-know-about-rag-past-present-and-future-e3dd8c7ae641"}},"_tags":["story","author_vissidarte_choi","story_40362411"],"author":"vissidarte_choi","created_at":"2024-05-15T02:16:44Z","created_at_i":1715739404,"num_comments":0,"objectID":"40362411","points":7,"story_id":40362411,"title":"All you need to know about RAG: Past, Present, and Future","updated_at":"2024-09-20T17:04:43Z","url":"https://medium.com/@infiniflowai/all-you-need-to-know-about-rag-past-present-and-future-e3dd8c7ae641"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"vissidarte_choi"},"title":{"matchLevel":"none","matchedWords":[],"value":"DeepSeek-V2 integrated, RAGFlow v0.5.0 is released"},"url":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["infiniflow"],"value":"https://github.com/<em>infiniflow</em>/ragflow/releases/tag/v0.5.0"}},"_tags":["story","author_vissidarte_choi","story_40294775"],"author":"vissidarte_choi","created_at":"2024-05-08T05:54:31Z","created_at_i":1715147671,"num_comments":0,"objectID":"40294775","points":7,"story_id":40294775,"title":"DeepSeek-V2 integrated, RAGFlow v0.5.0 is released","updated_at":"2024-09-20T17:05:21Z","url":"https://github.com/infiniflow/ragflow/releases/tag/v0.5.0"}],"hitsPerPage":20,"nbHits":33,"nbPages":2,"page":0,"params":"query=infiniflow&advancedSyntax=true&analyticsTags=backend","processingTimeMS":20,"processingTimingsMS":{"_request":{"roundTrip":16},"afterFetch":{"merge":{"mergeLoop":{"prepareNextHit":15,"total":15},"total":15},"total":15},"fetch":{"query":3,"total":4},"total":20},"query":"infiniflow","serverTimeMS":22}
