[Xapian-discuss] xapian's cache

Andrey alpha04 at netvigator.com
Fri Nov 23 22:25:31 GMT 2007


Hi

About the "warming-up" of Xapian over the first few queries: at which 
layer is the data cached?
Xapian itself / the Xapian bindings / filesystem IO?

I don't know if that is the right question to ask. Say I have 2 machines:
Machine A) Write head of Xapian, writing to local HD (continuous writing, 
24 hrs) [Python]
Machine B) Read head, network mount of A's Xapian DB folder [PHP]

If I want faster searches, which machine's amount of RAM matters most?

What happens to the cache when the DB is flushed? Will the cache in memory 
be gone, or will it build up incrementally?

If I use Python to search and warm up the cache, does that benefit PHP 
searches?

Note that the DB is flushed every 10,000 docs (every 5 mins). Would search 
performance be better if it were separated into 2 DBs, searched together 
like this? Would the cache of db1 stay and still be of benefit?
db1 < very large
db2 < only today's documents, flushed every 5 mins / 10,000 docs
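
A rough Python sketch of searching the two databases as one, not tested against the setup described here. The paths "db1" and "db2" are placeholders and the helper open_combined is hypothetical. It relies on xapian.Database objects having an add_database() method; the opener is passed in as a parameter so the helper itself does not import xapian.

```python
def open_combined(paths, opener):
    """Open every path with `opener` and merge them into one searchable object.

    `opener` is expected to behave like xapian.Database: calling it with a
    path returns an object that has an add_database() method.
    """
    if not paths:
        raise ValueError("at least one database path is required")
    combined = opener(paths[0])
    for path in paths[1:]:
        # Each further database is added as a sub-database of the first.
        combined.add_database(opener(path))
    return combined


# Intended use with the real bindings (placeholder paths):
#   import xapian
#   db = open_combined(["db1", "db2"], xapian.Database)
#   enquire = xapian.Enquire(db)
```

Whether the cached blocks of db1 survive a flush of db2 is exactly the open question above; the sketch only shows the layout.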

One more question on Enquire.sort_by_value(): does it use string comparison 
only? It is relatively slow compared to sort_by_docid() (my values 
are all numeric timestamps).
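
To illustrate why the encoding would matter if values are compared as byte strings (an assumption, since that is what is being asked), here is a minimal Python sketch. The helper pad_timestamp and the width of 10 digits are hypothetical, chosen only for illustration.

```python
def pad_timestamp(ts, width=10):
    """Encode a non-negative integer timestamp as a fixed-width decimal string.

    With a fixed width, plain string comparison gives the same order as
    numeric comparison. The width of 10 is an illustrative assumption.
    """
    return str(ts).zfill(width)


timestamps = [1195856731, 999999999, 1195000000]

# Unpadded decimal strings: "999999999" sorts after "1195856731"
# because comparison is character by character.
as_strings = sorted(str(t) for t in timestamps)

# Fixed-width strings sort in the same order as the numbers themselves.
as_padded = sorted(pad_timestamp(t) for t in timestamps)

print(as_strings)
print(as_padded)
```

Xapian's sortable_serialise() function serves a similar purpose for numeric values; the padding above is only a stand-in to show the ordering issue.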

Also, when I use set_collapse_key (on MD5(title+domain)) to remove 
duplicated titles under a domain, I found it quite expensive in relative 
terms:
30M documents with collapse_key: 2-9+ secs
30M documents without collapse_key: 0.01 - 0.9 secs
(my keys are currently 32-byte strings)
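
For comparison, a small Python sketch of the two forms such a key can take: the 32-character hexadecimal MD5 (the current form) and the same digest as 16 raw bytes. The function names and the sample title and domain are hypothetical. Whether a shorter key actually reduces the collapse cost is an untested assumption, and whether the bindings store raw bytes unchanged is not checked here.

```python
import hashlib


def collapse_key_hex(title, domain):
    """32-character hexadecimal MD5 of title+domain (the current key form)."""
    return hashlib.md5((title + domain).encode("utf-8")).hexdigest()


def collapse_key_raw(title, domain):
    """The same MD5 as 16 raw bytes, half the size of the hex form."""
    return hashlib.md5((title + domain).encode("utf-8")).digest()


hex_key = collapse_key_hex("Some title", "example.com")
raw_key = collapse_key_raw("Some title", "example.com")

# Both forms identify the same title+domain pair; only the length differs.
print(len(hex_key), len(raw_key))
```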

I will keep testing after tuning the cache part and as the database grows.

Big Thanks
Andrey 





