[Xapian-discuss] get_data not fast enough for query matches

James Aylett james-xapian at tartarus.org
Thu Feb 2 17:40:00 GMT 2006

On Thu, Feb 02, 2006 at 04:00:29PM +0000, Salem Berhanu wrote:

> Basically, I want users to be able to search different parts of a document. 
> For instance I want them to be able to search a title that contains the 
> term 'data compression' and in the description 'Rate-Distortion theory'. 

Another way of doing this is to use prefixes when generating the index
terms, e.g. Xtitle: for terms generated from words in the title. A
single database can then answer queries restricted to one field.
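
To make the prefix idea concrete, here is a minimal sketch in plain
Python that doesn't touch Xapian at all. The helper name prefixed_terms,
the "Xdesc:" prefix and the naive tokenisation are assumptions for
illustration only; in a real indexer each (term, position) pair would be
handed to something like Document.add_posting().

```python
import re

def prefixed_terms(prefix, text):
    """Return (term, position) pairs for text, each term carrying prefix.

    Hypothetical helper: lowercases the text and splits on anything that
    is not a letter or digit, which is far cruder than a real indexer.
    """
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    return [(prefix + word, pos) for pos, word in enumerate(words, start=1)]

# Terms from different fields never collide because the prefixes differ.
title_terms = prefixed_terms("Xtitle:", "Data Compression")
desc_terms = prefixed_terms("Xdesc:", "Rate-Distortion theory")
```

A search for 'theory' in the description then looks up the term
"Xdesc:theory" rather than plain "theory", so the same word in a title
wouldn't match.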

> This is the main reason I'm using several dbs. In addition I read that it's 
> better to have smaller dbs for better performance. (Maybe it's wrong)

It's not wrong, but it's not necessarily right either. Multiple
databases are more of a hassle to set up, so unless you actually need
to split for speed you're probably making life harder for yourself
than necessary.

> I don't actually run out of space when I grab the data, it just takes a 
> long time. For instance I wrote a small query script to search for a term, 
> let me know how many matches it finds and then loops through the matches 
> getting the data. I search for the word theory in description, within the 
> first 7 seconds it tells me it found 137480 which is good but then it takes 
> 2m15s to grab the data for each match.

Erm - is this grabbing the data for the *entire* result set of 137480?
What sort of search application needs to process every search result
in one go? Usually you'd only bother with perhaps a couple of hundred
at most at a time.
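
As a rough sketch of what fetching one page at a time could look like
with the Python bindings: the function name fetch_page and the default
page size of 200 are my own choices, and enquire is assumed to be an
Enquire object that already has its query set. I haven't timed this
against a database of your size.

```python
def fetch_page(enquire, first=0, page_size=200):
    """Fetch document data for one page of matches only.

    Assumes `enquire` behaves like a xapian.Enquire with a query set.
    """
    # Ask the matcher for at most page_size matches, starting at `first`.
    mset = enquire.get_mset(first, page_size)
    # An estimate of the total is available without visiting every match.
    estimated_total = mset.get_matches_estimated()
    # Only the documents on this page have their data read.
    page = [match.document.get_data() for match in mset]
    return estimated_total, page
```

The next page would be fetch_page(enquire, first=200), so get_data() is
only ever called for the matches actually being shown.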

I don't have any particularly big databases to hand, so I can't check
whether 2m15s is fast or slow for getting that amount of data out, but
perhaps you could first confirm that this is what you're actually
trying to do.


