[Xapian-discuss] performance issue

Andrey Kong alpha04 at netvigator.com
Thu Dec 21 09:48:20 GMT 2006


Hi

I started having performance problems after my database grew to about
450,000 documents; a query now takes 3 to 11 seconds.
I think there is something wrong with my structure...

Total number of terms in the database: 1,954,698
Number of documents: 470,000 (approx. 30-300 terms per doc)

postlist_DB file size: 1.3G
position_DB file size: 2.8G
record_DB file size: 4.6G
termlist_DB file size: 1.1G
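
As a rough back-of-the-envelope check, the figures above can be combined as
below. This is only a sketch: it assumes "G" means gigabytes and uses the
2G of RAM from the dev server listed further down; the variable names are
made up for illustration.

```python
# Table sizes as reported above, assumed to be in gigabytes.
table_sizes_gb = {
    "postlist": 1.3,
    "position": 2.8,
    "record": 4.6,
    "termlist": 1.1,
}
ram_gb = 2.0        # RAM of the dev server (from the hardware list)
num_docs = 470_000  # approximate document count

total_gb = sum(table_sizes_gb.values())
ratio_to_ram = total_gb / ram_gb
# Assumes 1 GB = 1024 * 1024 KB.
kb_per_doc = total_gb * 1024 * 1024 / num_docs

print(f"total on disk: {total_gb:.1f} GB")
print(f"database / RAM: {ratio_to_ram:.1f}x")
print(f"average per document: {kb_per_doc:.0f} KB")
```

The four tables add up to roughly 9.8G, several times the available RAM, so
it seems plausible that little of the database stays in the OS cache between
queries.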

I wonder whether 1,954,698 terms in my DB is normal, or whether much of it
is garbage?
The documents are basically web pages with the tags stripped, in Chinese
(segmented).
The query is a simple OP_OR, e.g. (google OR PTITLE:google OR yahoo OR
PTITLE:yahoo OR msn OR PTITLE:msn)
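
To make the shape of that query concrete, here is a minimal sketch in plain
Python (not the Xapian API) that expands a list of user words into the same
OR expression. The function name build_or_query is hypothetical; only the
PTITLE prefix comes from the example above.

```python
def build_or_query(words, title_prefix="PTITLE"):
    """Return an OR query string matching each word in the body or the title."""
    clauses = []
    for word in words:
        clauses.append(word)                      # unprefixed (body) term
        clauses.append(f"{title_prefix}:{word}")  # title-prefixed term
    return "(" + " OR ".join(clauses) + ")"


print(build_or_query(["google", "yahoo", "msn"]))
# (google OR PTITLE:google OR yahoo OR PTITLE:yahoo OR msn OR PTITLE:msn)
```

Each user word becomes two terms, so a three-word search is a six-term OR.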

Dev. server:
Intel P4 2.8 GHz HT
2 GB RAM
120 GB IDE, 7200 rpm, 2 MB cache, RAID 1

BTW, since processing the content of each captured web page (the pre-index
process) is very CPU-demanding, do you have any suggestions for some sort of
'parallel computing / computer clustering / grid' solution from your
experience in building a search engine?
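
For a single multi-core machine, one possible shape for this is sketched
below using Python's standard concurrent.futures module. The preprocess()
function is a hypothetical stand-in for the real pre-index step (here it
only strips tags and collapses whitespace), and the worker count and chunk
size are arbitrary assumptions.

```python
import re
from concurrent.futures import ProcessPoolExecutor

TAG_RE = re.compile(r"<[^>]+>")


def preprocess(html):
    """Hypothetical stand-in for the CPU-heavy step: strip tags, tidy spaces."""
    text = TAG_RE.sub(" ", html)
    return " ".join(text.split())


def preprocess_all(pages, workers=2):
    """Run preprocess() in separate worker processes, keeping input order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess, pages, chunksize=16))


if __name__ == "__main__":
    pages = ["<html><body><h1>Title</h1><p>Some  text</p></body></html>"] * 4
    for text in preprocess_all(pages):
        print(text)
```

Only the text processing is parallelised in this sketch; the processed text
would still be handed to a single indexing process, since a Xapian database
accepts one writer at a time.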

What is the cost of searching more than one Xapian DB? Is it a good idea
to break one DB down into two DBs if that option is available?
e.g.

DB [thread title + thread content]

VS

DB[thread title]
DB[thread content]

(People normally search thread title + thread content, BUT there is an
option to search ONLY the thread title.)
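
For illustration, the single-DB layout can be sketched with a toy inverted
index in plain Python (this is not Xapian code, and the sample threads are
invented). It assumes title words are stored twice, once unprefixed and
once under the PTITLE: prefix used in the query example earlier, so that
one index can serve both kinds of search.

```python
from collections import defaultdict


def index_single_db(threads):
    """One index: all words unprefixed, title words also under PTITLE:."""
    index = defaultdict(set)
    for doc_id, (title, content) in threads.items():
        for word in (title + " " + content).lower().split():
            index[word].add(doc_id)
        for word in title.lower().split():
            index["PTITLE:" + word].add(doc_id)
    return index


def search(index, word, title_only=False):
    """Look up one word, either anywhere in the thread or in the title only."""
    term = "PTITLE:" + word.lower() if title_only else word.lower()
    return sorted(index.get(term, set()))


# Invented sample data: doc_id -> (thread title, thread content).
threads = {
    1: ("Google search tips", "how to search faster"),
    2: ("Mail question", "is google mail down"),
}
index = index_single_db(threads)
print(search(index, "google"))                   # [1, 2]
print(search(index, "google", title_only=True))  # [1]
```

In this sketch the title-only option is just a lookup on the prefixed term,
so it does not by itself require a second database.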

Thanks
Andrey K 





