[Xapian-discuss] high update-frequency strategy
jan_web at gmx.net
Thu Aug 13 08:22:26 BST 2009
Hi Everyone,
I'm evaluating Xapian for the following (hard) use case:
1) Document structure: avg. 100 KB of full text, 5 meta-data fields of
~100 bytes each, and 3 boolean flags
2) Big index, i.e. full-text volume of ~1 TB per disk (2x HD, mirrored)
3) Low query frequency (<1/sec)
4) 10 inserts/sec (on a 4-core host)
5) *High update frequency of meta-data*, mostly on the boolean flags:
~20-30/sec
Requirements 3 and 4 are no problem: inserts can be cached and mostly
steered towards bulk disk I/O when the load allows for it.
The question is whether 5) can be achieved. It seems that any
implementation of
updateMyDoc(myDocId, meta-key, meta-value)
invariably ends up running some variation of the following against the
(Flint) backend:
docid = query(myDocId)
doc = get_document(docid)
// "updating" then maps to:
* replace the doc's meta-data in memory
* delete (mark as deleted?) the old doc in the index
* re-insert the new doc
The last two ops work on the index cache. The bottleneck seems to be the
get_document operation, which apparently causes (un-cached**) disk seeks.
**Our RAM/disk ratio is too small for the OS disk cache to be effective.
Is there any way to make get_document "lazier", i.e. not do lookups in
the persistent index, and to do the meta-data replace "dirty", i.e. simply
write the new value to the cache and not make it persistent until
flush()?
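To make the "dirty" idea concrete, here is a rough sketch in Python of what I have in mind, done at the application level rather than inside Xapian. None of this is Xapian API: FlagUpdateBuffer, apply_update and max_pending are hypothetical names, and apply_update stands in for whatever actually performs the read-modify-write on the index. The only point is that repeated updates to the same document are coalesced in memory, so each document is touched once per flush.

```python
from typing import Callable, Dict, Hashable


class FlagUpdateBuffer:
    """Coalesce meta-data updates in memory and apply them in batches.

    Hypothetical helper, not part of Xapian. `apply_update` is an
    assumed callback that writes the merged changes for one document
    to the index (e.g. via get_document + replace_document).
    """

    def __init__(
        self,
        apply_update: Callable[[Hashable, Dict[str, object]], None],
        max_pending: int = 1000,
    ) -> None:
        self._apply_update = apply_update
        self._max_pending = max_pending
        # doc id -> {meta key: latest value}; later writes overwrite earlier ones
        self._pending: Dict[Hashable, Dict[str, object]] = {}

    def update(self, doc_id: Hashable, key: str, value: object) -> None:
        """Record a change in memory only; nothing touches the index yet."""
        self._pending.setdefault(doc_id, {})[key] = value
        # Bound memory use: flush once enough distinct documents are dirty.
        if len(self._pending) >= self._max_pending:
            self.flush()

    def flush(self) -> int:
        """Apply each dirty document once and return how many were written."""
        pending, self._pending = self._pending, {}
        for doc_id, changes in pending.items():
            self._apply_update(doc_id, changes)
        return len(pending)
```

The obvious trade-off is that queries will not see buffered changes until flush() has run, and anything still in the buffer is lost on a crash, so this only helps if slightly stale flags are acceptable.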
What are the performance advantages and disadvantages of modeling
meta-data as prefixed terms vs. document values?
Did I leave out any important constraints or facts?
Otherwise: Any help, hints, experiences would be *greatly* appreciated.
Thanks,
--jan