[Xapian-discuss] high update-frequency strategy
Jan
jan at griebsch.net
Thu Aug 13 08:18:40 BST 2009
Hi Everyone,
I'm evaluating Xapian for the following (hard) use case:
1) document structure: on average 100 KB of full text, 5 metadata fields of
about 100 bytes each, and 3 boolean flags
2) large index: ~1 TB of full text per disk (2 hard disks, mirrored)
3) low query frequency (<1/sec)
4) 10 inserts/sec (on a 4-core host)
5) *high update frequency for metadata*, mostly on the boolean flags:
~20-30 updates/sec
Requirements 3 and 4 are no problem: inserts can be cached and mostly
turned into bulk disk I/O when the load allows.
The question is whether 5) can be achieved. It seems that any
implementation of
updateMyDoc(myDocId, meta-key, meta-value)
invariably ends up running some variation of the following against the
(Flint) backend:
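A minimal sketch of what "caching inserts and steering them towards bulk I/O" could look like on the application side. `BatchingWriter`, `write_batch`, `current_load` and the thresholds are all hypothetical names; `write_batch` stands in for whatever performs the real index writes (for example a series of `add_document()` calls followed by `flush()`), and how the load is measured is left open.

```python
from typing import Callable, List


class BatchingWriter:
    """Hypothetical helper: buffers inserts in RAM and writes them in bulk."""

    def __init__(self, write_batch: Callable[[List[dict]], None],
                 max_pending: int = 1000) -> None:
        self.write_batch = write_batch  # stand-in for the real index writes
        self.max_pending = max_pending  # hard cap on buffered documents
        self.pending: List[dict] = []

    def insert(self, doc: dict) -> None:
        self.pending.append(doc)
        # Write immediately only when the buffer is full.
        if len(self.pending) >= self.max_pending:
            self.flush()

    def flush_if_idle(self, current_load: float,
                      idle_threshold: float = 0.5) -> bool:
        # Otherwise wait until the host is quiet enough for bulk I/O.
        if self.pending and current_load < idle_threshold:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        if self.pending:
            batch, self.pending = self.pending, []
            self.write_batch(batch)


# Usage: a list's append method plays the role of the bulk writer.
batches: List[List[dict]] = []
writer = BatchingWriter(batches.append, max_pending=3)
for i in range(7):
    writer.insert({"id": i})
writer.flush_if_idle(current_load=0.1)
print([len(b) for b in batches])  # [3, 3, 1]
```

The cap bounds memory use and the worst-case loss on a crash; the idle check is what turns many small writes into a few large ones.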
docid = query(myDocId)
doc = get_document(docid)
// "updating" then maps to:
* replace the doc's metadata in memory
* delete (or mark as deleted?) the old doc in the index
* re-insert the new doc
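To make the cost structure of that cycle explicit, here is a toy, in-memory model of it. `ToyIndex`, `update_my_doc` and the `"mail-42"`/`"read"` names are hypothetical; the methods only mirror the step names in the pseudocode and are not Xapian's API. The point it illustrates is that every single flag change pays for one full document read.

```python
from typing import Dict


class ToyIndex:
    """Hypothetical in-memory stand-in for the index; counts document reads."""

    def __init__(self) -> None:
        self.stored: Dict[int, Dict[str, str]] = {}  # docid -> fields
        self.by_app_id: Dict[str, int] = {}          # myDocId -> docid
        self.document_reads = 0
        self.next_docid = 1

    def add(self, my_doc_id: str, fields: Dict[str, str]) -> int:
        docid = self.next_docid
        self.next_docid += 1
        self.stored[docid] = dict(fields)
        self.by_app_id[my_doc_id] = docid
        return docid

    def get_document(self, docid: int) -> Dict[str, str]:
        # On a real disk-based index this is where uncached seeks happen.
        self.document_reads += 1
        return dict(self.stored[docid])

    def replace_document(self, docid: int, fields: Dict[str, str]) -> None:
        # "Delete old doc + re-insert new doc", keeping the same docid.
        self.stored[docid] = dict(fields)


def update_my_doc(index: ToyIndex, my_doc_id: str,
                  meta_key: str, meta_value: str) -> None:
    docid = index.by_app_id[my_doc_id]   # docid = query(myDocId)
    doc = index.get_document(docid)      # read the whole old document
    doc[meta_key] = meta_value           # replace metadata in memory
    index.replace_document(docid, doc)   # write the new version back


index = ToyIndex()
index.add("mail-42", {"read": "0", "starred": "0"})
for value in ("1", "0", "1"):
    update_my_doc(index, "mail-42", "read", value)
print(index.document_reads)  # 3: one full read per flag update
```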
The last two operations work on the index cache. The bottleneck seems to be
the get_document operation, which apparently causes uncached** disk seeks.
**Our RAM-to-disk ratio is too small for the OS disk cache to be effective.
Is there any way to make get_document "lazier", i.e. not do lookups in the
persistent index, and to do the metadata replace "dirty", i.e. simply write
the new value into the cache and not make it persistent until flush()?
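One way to get the "dirty" behaviour without any backend support is an application-level write-back overlay, sketched below under that assumption (it is not a Xapian feature). `DirtyFlagOverlay`, `load` and `store` are hypothetical: `load` stands in for `query(myDocId)` plus `get_document`, and `store` for replacing the document. Updates only touch RAM, and at flush time each changed document is read and written once, however many times its flags changed.

```python
from typing import Callable, Dict, Optional


class DirtyFlagOverlay:
    """Hypothetical write-back cache for metadata/flag updates."""

    def __init__(self,
                 load: Callable[[str], Dict[str, str]],
                 store: Callable[[str, Dict[str, str]], None]) -> None:
        self.load = load    # stand-in for query(myDocId) + get_document
        self.store = store  # stand-in for replacing the document
        self.dirty: Dict[str, Dict[str, str]] = {}

    def update(self, my_doc_id: str, meta_key: str, meta_value: str) -> None:
        # No index access here: just remember the latest value in RAM.
        self.dirty.setdefault(my_doc_id, {})[meta_key] = meta_value

    def pending_value(self, my_doc_id: str, meta_key: str) -> Optional[str]:
        # Lets the application see unflushed values; searches will not.
        return self.dirty.get(my_doc_id, {}).get(meta_key)

    def flush(self) -> int:
        # One load + one store per changed *document*, not per update.
        for my_doc_id, changes in self.dirty.items():
            doc = self.load(my_doc_id)
            doc.update(changes)
            self.store(my_doc_id, doc)
        written = len(self.dirty)
        self.dirty.clear()
        return written


# Usage with a plain dict standing in for the persistent index.
disk = {"a": {"read": "0"}, "b": {"read": "0"}}
loads = []

def load(my_doc_id):
    loads.append(my_doc_id)
    return dict(disk[my_doc_id])

def store(my_doc_id, doc):
    disk[my_doc_id] = doc

overlay = DirtyFlagOverlay(load, store)
overlay.update("a", "read", "1")
overlay.update("a", "read", "0")
overlay.update("a", "starred", "1")
overlay.update("b", "read", "1")
print(overlay.flush(), loads)  # 2 ['a', 'b']
```

The trade-offs are that queries run against the index do not see pending values until flush(), and pending updates are lost on a crash unless they are also logged somewhere durable.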
What are the performance advantages and disadvantages of modelling metadata
as prefixed terms vs. document values?
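To pin down what the two options mean for a boolean flag, here is a toy model of the data layout only; it says nothing about Xapian's actual on-disk formats or costs. The `"XREAD"` prefix, slot `0` and both class names are hypothetical. As a prefixed term, a flag lives in posting lists, so filtering reads one list but an update moves the document between two lists; as a value, an update overwrites one per-document entry but filtering has to inspect each document's value.

```python
from collections import defaultdict
from typing import Dict, Set


class FlagAsTerms:
    """Flag encoded as a prefixed boolean term, e.g. 'XREAD1' or 'XREAD0'."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def set_flag(self, docid: int, prefix: str, on: bool) -> None:
        # An update touches two posting lists: remove from one, add to other.
        self.postings[prefix + ("0" if on else "1")].discard(docid)
        self.postings[prefix + ("1" if on else "0")].add(docid)

    def matching(self, prefix: str, on: bool) -> Set[int]:
        # Filtering reads only the posting list of the wanted term.
        return set(self.postings[prefix + ("1" if on else "0")])


class FlagAsValues:
    """Flag stored in a per-document value slot."""

    def __init__(self) -> None:
        self.values: Dict[int, Dict[int, str]] = defaultdict(dict)

    def set_flag(self, docid: int, slot: int, on: bool) -> None:
        # An update overwrites a single entry belonging to the document.
        self.values[docid][slot] = "1" if on else "0"

    def matching(self, slot: int, on: bool) -> Set[int]:
        # Filtering has to inspect every document's value for this slot.
        want = "1" if on else "0"
        return {d for d, slots in self.values.items() if slots.get(slot) == want}


terms = FlagAsTerms()
terms.set_flag(1, "XREAD", True)
terms.set_flag(2, "XREAD", False)
terms.set_flag(1, "XREAD", False)

values = FlagAsValues()
values.set_flag(1, 0, True)
values.set_flag(2, 0, False)
print(terms.matching("XREAD", False), values.matching(0, True))  # {1, 2} {1}
```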
Did I leave out any important constraints or facts?
Otherwise, any help, hints or experiences would be *greatly* appreciated.
Thanks,
--jan