[Xapian-discuss] storing documents in Xapian vs. external store (when other indexes are needed)

Marinos Yannikos mjy at geizhals.at
Mon Jan 19 12:46:58 GMT 2009


Hello,

I have a set of documents that are indexed with Xapian for fast text search
and also with external indexes (hash, B-tree etc., e.g. tokyocabinet) for
fast access by value. Is it a good idea to store the whole document in
Xapian's DB and fetch it by Xapian's doc_id after searching in the external
index? Or should it be the other way round, i.e. store the document
somewhere else and use some external oid as the Xapian "document"?

Short version: is Xapian/Flint a good place to store documents even if
they are often fetched by doc_id?

I can think of the following advantages/disadvantages for storing
documents in Flint:

+ faster retrieval by doc_id and by query since no external index
operation is needed
- possibly slower retrieval by some other indexed value if fetching from
Flint by doc_id is slower than the external storage solution (tokyocabinet
etc.)
- bigger DB, perhaps slower access
- document changes are probably slower even if the indexed text is not
changed
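
To make the two layouts concrete, here is a minimal toy sketch in Python.
It does not use the Xapian or tokyocabinet APIs: plain dicts stand in for
the text index, the stored document data and the external store, and all
class and method names are hypothetical. It only shows which lookups each
access path needs and says nothing about real-world speed.

```python
class DocsInsideTextIndex:
    """Layout A: the text index record holds the whole document."""

    def __init__(self):
        self.postings = {}   # term -> list of doc_ids (stand-in for the text index)
        self.doc_data = {}   # doc_id -> full document (stand-in for stored document data)
        self.by_value = {}   # value -> doc_id (stand-in for the external hash/B-tree index)

    def add(self, doc_id, text, value):
        self.doc_data[doc_id] = text
        self.by_value[value] = doc_id
        for term in set(text.lower().split()):
            self.postings.setdefault(term, []).append(doc_id)

    def search(self, term):
        # One lookup per hit: doc_id -> document, all inside the text index DB.
        return [self.doc_data[d] for d in self.postings.get(term, [])]

    def fetch_by_value(self, value):
        # External index gives the doc_id, then the text index DB gives the document.
        return self.doc_data[self.by_value[value]]


class DocsInExternalStore:
    """Layout B: the text index record holds only an external oid."""

    def __init__(self):
        self.postings = {}   # term -> list of doc_ids
        self.doc_data = {}   # doc_id -> oid (the only payload kept in the text index)
        self.store = {}      # oid -> full document (stand-in for the external store)
        self.by_value = {}   # value -> oid (stand-in for the external index)

    def add(self, doc_id, oid, text, value):
        self.doc_data[doc_id] = oid
        self.store[oid] = text
        self.by_value[value] = oid
        for term in set(text.lower().split()):
            self.postings.setdefault(term, []).append(doc_id)

    def search(self, term):
        # Two lookups per hit: doc_id -> oid, then oid -> document in the external store.
        return [self.store[self.doc_data[d]] for d in self.postings.get(term, [])]

    def fetch_by_value(self, value):
        # Both lookups stay in the external store; the text index is not touched.
        return self.store[self.by_value[value]]


if __name__ == "__main__":
    a = DocsInsideTextIndex()
    a.add(1, "fast text search", "sku-1")
    b = DocsInExternalStore()
    b.add(1, "oid-1", "fast text search", "sku-1")
    print(a.search("fast"), b.fetch_by_value("sku-1"))
```

In this model, layout A saves one lookup per search hit, while layout B
keeps by-value access entirely outside the text index. That matches the
first two points in the list above.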

Any opinions/suggestions? Am I on the wrong track for storing documents
that need several indexed values plus fast text search? (I know that the
problem fits an RDBMS well, but Xapian is so much faster.)

Regards,
 Marinos


