[Xapian-discuss] get_data not fast enough for query matches

James Aylett james-xapian at tartarus.org
Thu Feb 2 17:40:00 GMT 2006

On Thu, Feb 02, 2006 at 04:00:29PM +0000, Salem Berhanu wrote:

> Basically, I want users to be able to search different parts of a document. 
> For instance I want them to be able to search a title that contains the 
> term 'data compression' and in the description 'Rate-Distortion theory'. 

Another way of doing this is to use prefixes when generating the index
terms, e.g. Xtitle: for terms generated from words in the title.
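
To make the prefix idea concrete, here is a minimal sketch in plain
Python. It does not use the Xapian bindings at all: the toy inverted
index, the helper names (prefixed_terms, build_index, search) and the
Xdesc: prefix are assumptions made up for illustration. Only Xtitle:
comes from the suggestion above. The point is just that terms from
different fields can share one index without colliding.

```python
from collections import defaultdict

# Field-to-prefix mapping. "Xtitle:" follows the suggestion above;
# "Xdesc:" is a hypothetical prefix chosen for this sketch.
FIELD_PREFIXES = {"title": "Xtitle:", "description": "Xdesc:"}


def prefixed_terms(field, text):
    """Return index terms for one field: lowercased words with the field prefix."""
    prefix = FIELD_PREFIXES[field]
    return [prefix + word for word in text.lower().split()]


def build_index(docs):
    """Map each prefixed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for term in prefixed_terms(field, text):
                index[term].add(doc_id)
    return index


def search(index, field, words):
    """Ids of documents whose field contains every word (an AND, not a phrase match)."""
    postings = [index.get(term, set()) for term in prefixed_terms(field, words)]
    return set.intersection(*postings) if postings else set()


# Two toy documents with the same words in opposite fields.
docs = {
    1: {"title": "Data compression basics",
        "description": "Rate-Distortion theory overview"},
    2: {"title": "Rate-Distortion theory",
        "description": "Notes on data compression"},
}
index = build_index(docs)

# Title must match one set of words, description another, in a single index.
matches = (search(index, "title", "data compression")
           & search(index, "description", "rate-distortion theory"))
```

Here only document 1 ends up in matches, because document 2 has the
same words in the opposite fields. In a real index the prefixing would
be done when the terms are generated, and the query would need the
same prefixes applied to its terms.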

> This is the main reason I'm using several dbs. In addition I read that it's 
> better to have smaller dbs for better performance. (Maybe it's wrong)

It's not wrong, but it's not necessarily right either. Multiple
databases are more of a hassle to set up, so unless you actually need
to split for speed you're probably making life harder for yourself
than necessary.

> I don't actually run out of space when I grab the data, it just takes a 
> long time. For instance I wrote a small query script to search for a term, 
> let me know how many matches it finds and then loops through the match 
> getting the data. I search for the word theory in description, within the 
> first 7 seconds it tells me it found 137480 which is good but then it takes 
> 2m15s to grab the data for each match.

Erm - is this grabbing the data for the *entire* result set of 137480?
What sort of search application needs to process all of its search
results in one go? Usually you'd only bother with perhaps a couple
of hundred at most at a time.

I don't have any particularly big databases to hand, so I can't check
whether this is particularly fast or slow for getting that amount of
data out, but perhaps you could confirm first that this is what you're
actually trying to do.
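
For what it's worth, fetching the data for one page of matches at a
time might look roughly like the sketch below. It is plain Python
rather than Xapian code: fetch_page and load_data are hypothetical
names, load_data stands in for whatever expensive per-document fetch
you are doing, and the page size of 200 is just the "couple of
hundred" figure from above.

```python
from itertools import islice


def fetch_page(match_ids, load_data, offset=0, page_size=200):
    """Load stored data only for the matches in [offset, offset + page_size)."""
    page = islice(match_ids, offset, offset + page_size)
    return [(doc_id, load_data(doc_id)) for doc_id in page]


calls = []  # records which documents were actually fetched


def load_data(doc_id):
    # Hypothetical stand-in for an expensive per-document data fetch.
    calls.append(doc_id)
    return "data for document %d" % doc_id


# Pretend the search matched 137480 documents, as in the report above.
match_ids = range(137480)

# Only the first 200 documents are touched, not the whole result set.
first_page = fetch_page(match_ids, load_data)
```

The match count can still be reported up front. The per-document cost
is then only paid for the matches that are actually displayed.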


  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org
