[Xapian-discuss] making my db leaner and meaner

Ben Campbell ben at scumways.com
Thu Mar 26 16:30:09 GMT 2009


I'm trying to shrink my Xapian database to reduce the load on the poor
server. I think it has grown large enough that the machine is starting
to struggle with it a bit.

My indexing is pretty naive, and I've learnt a lot since I first began.
I suspect there is a lot of fat that could be trimmed...
Here are the improvements I'm planning:

- use a stopword list
I expect this to be a pretty big win, but I'm not yet sure how to pick a 
good set of stopwords (I've posted separately asking about this).

- reduce the number of values I use.
Currently I'm using 6 values, and most of them only store things I want
to display in my search results. I'll move those into a serialised form
in the document data (which is currently unused).
I only ever sort on one value (a datetime), so I'll ditch the other five.

- look at running xapian-compact from time to time
I add about 2000 documents per day (and almost never remove documents).
I'm not sure how much this would help, but you never know, and it's easy
to try.
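On picking stopwords: one common heuristic is to start from the terms that occur in a large fraction of the documents, since they do little to discriminate between results. The sketch below ranks terms by document frequency over some sample texts. It is only an illustration: the whitespace/lowercase tokenisation, the 0.5 threshold and the function name `stopword_candidates` are all assumptions, not anything Xapian provides.

```python
from collections import Counter
from typing import Iterable, List


def stopword_candidates(docs: Iterable[str], max_df_ratio: float = 0.5,
                        limit: int = 50) -> List[str]:
    """Return terms that appear in more than max_df_ratio of the documents,
    most frequent first (ties broken alphabetically).

    Tokenisation is a naive lowercase + whitespace split (an assumption);
    a real indexer would use the same tokeniser as the index.
    """
    df: Counter = Counter()
    n_docs = 0
    for text in docs:
        n_docs += 1
        # Count each term once per document: document frequency, not raw count.
        df.update(set(text.lower().split()))
    if n_docs == 0:
        return []
    threshold = max_df_ratio * n_docs
    common = [(term, count) for term, count in df.items() if count > threshold]
    common.sort(key=lambda tc: (-tc[1], tc[0]))
    return [term for term, _ in common[:limit]]
```

The resulting list still wants a manual review, because a term that is very common in one particular collection may be meaningful there. The reviewed words could then be added to a `xapian.SimpleStopper` (via its `add()` method) and passed to the `TermGenerator` with `set_stopper()`.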
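For the values change, here is a minimal sketch of the split between the document data and the single sort value, assuming JSON as the serialisation format (any format would do). It also assumes the datetime is stored as a fixed-width, zero-padded string, so that plain string comparison of values gives chronological order. The helper names are hypothetical.

```python
import json
from datetime import datetime


def make_document_data(fields: dict) -> str:
    """Serialise all display-only fields into one string for the document data."""
    return json.dumps(fields, sort_keys=True, separators=(",", ":"))


def load_document_data(data: str) -> dict:
    """Recover the display fields when rendering a search result."""
    return json.loads(data)


def sortable_datetime(dt: datetime) -> str:
    """Fixed-width YYYYMMDDHHMMSS string (for years 1000-9999): sorting these
    strings lexicographically gives the same order as sorting the datetimes."""
    return dt.strftime("%Y%m%d%H%M%S")
```

With the Python bindings, the two results would go into the document with `doc.set_data(...)` and `doc.add_value(0, ...)`, assuming slot 0 is the one used for sorting. Only that one value slot is left, and all the display fields are read back from the data in a single call.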

Does this all sound sane? Is there anything obvious I've missed?
I was toying with the idea of ditching the positional information on
terms, but that would prevent me from doing phrase queries like
"a walk in the park", right?
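To illustrate why phrase queries depend on positional information, here is a toy positional index. It is not how Xapian actually stores positions. A phrase matches only if its terms occur at consecutive positions, while a query without positions can only check that every term appears somewhere in the document. The index layout and function names are hypothetical.

```python
from typing import Dict, List

# Toy positional index: term -> {doc_id: [positions]}
PositionalIndex = Dict[str, Dict[int, List[int]]]


def build_index(docs: Dict[int, str]) -> PositionalIndex:
    """Index each term with its 1-based position in the document."""
    index: PositionalIndex = {}
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split(), start=1):
            index.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return index


def phrase_match(index: PositionalIndex, phrase: List[str], doc_id: int) -> bool:
    """True if the phrase terms occur at consecutive positions in doc_id."""
    postings = []
    for term in phrase:
        positions = index.get(term, {}).get(doc_id)
        if not positions:
            return False  # a term is missing, so the phrase cannot match
        postings.append(set(positions))
    return any(all(p + i in postings[i] for i in range(1, len(phrase)))
               for p in postings[0])


def bag_of_words_match(index: PositionalIndex, terms: List[str], doc_id: int) -> bool:
    """All that is possible without positions: every term appears somewhere."""
    return all(doc_id in index.get(term, {}) for term in terms)
```

For example, "the park was a long walk in the rain" contains every word of "a walk in the park", so the position-free check accepts it, but the phrase check rejects it.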

Any other ideas welcome :-)

Thanks,
Ben.


