[Xapian-discuss] get_data not fast enough for query matches

Salem Berhanu salemb4 at hotmail.com
Sat Feb 4 15:20:38 GMT 2006


>Don't index different fields of a document into different databases if
>you want to be able to search them together - there's no good reason to
>and it just means you have to mess around merging the results from
>multiple searches.
>
>Instead prefix terms generated from additional fields, as James
>suggested.

This makes sense; I am going to do this. Also, I was wondering whether I 
should use the flint backend when I reindex. I have read on the list that 
it's supposed to be faster to query. Is it slower to index but faster to 
query, or faster in both cases? Also, is xapian-compact what I should use to 
merge/compact dbs indexed with the flint backend? I wasn't sure if it was 
fully implemented like quartz.
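
A minimal sketch of the prefixed-term idea discussed above, in plain Python so
it runs without any Xapian bindings. The field names, the FIELD_PREFIXES
mapping and the prefixed_terms helper are hypothetical and are not taken from
this thread; "S" follows the usual Xapian convention for title terms, and "XD"
is a made-up user-defined prefix for the description field.

```python
import re

# Hypothetical field-to-prefix mapping, not taken from the thread.
# "S" follows the common Xapian convention for title/subject terms;
# "XD" is a made-up user-defined prefix for the description field.
FIELD_PREFIXES = {"title": "S", "description": "XD"}


def prefixed_terms(fields):
    """Return the sorted set of prefixed terms for one document.

    `fields` maps a field name to its raw text. Each lowercased word gets
    the prefix of the field it came from, so the same word in two fields
    yields two distinct terms in a single database.
    """
    terms = set()
    for name, text in fields.items():
        prefix = FIELD_PREFIXES[name]
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            terms.add(prefix + word)
    return sorted(terms)


doc = {"title": "Game Theory", "description": "An introduction to theory"}
print(prefixed_terms(doc))
```

With the real bindings, each of these strings would presumably be added to the
document with Document.add_term(), and a search restricted to the description
field would then look for the term "XDtheory" rather than opening a second
database.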

> > I don't actually run out of space when I grab the data, it just takes a
> > long time. For instance I wrote a small query script to search for a term,
> > let me know how many matches it finds and then loop through the matches
> > getting the data. I search for the word theory in description, within the
> > first 7 seconds it tells me it found 137480 which is good but then it takes
> > 2m15s to grab the data for each match.
>
>We don't expect people to want all the results of a search that matches
>so many documents, so I'm not surprised that this isn't lightning fast.
>
>You're forcing the matcher to avoid most of its possible optimisations
>(which is probably why the search takes 7 seconds), and then you're
>retrieving lots of entries from the record table, which has been
>designed with the expectation that you'll want more like 10-1000
>results.

I wasn't aware I was forcing the matcher to avoid its possible 
optimisations. What am I doing that's forcing this?

>I'm guessing you're only trying to get all the results so you can merge
>the results from searching two fields in different databases, in which
>case this ceases to be an issue if you use term prefixes instead.  If
>I'm wrong, please explain *WHY* you want all 137480 matches.
>

Yep, that's the main reason. I think we also wanted to offer users the 
option of saving their search results, but I guess we can save the matches 
and display the data in small ranges, one page at a time.
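
A rough sketch of that paging idea, assuming the saved search is kept as a
plain list of document ids and the data is fetched only for the page being
displayed. The names page_of, fetch_page and fake_fetch are hypothetical, and
the fetch callable stands in for whatever actually reads a document's data
(with the Xapian Python bindings, something like
db.get_document(docid).get_data()).

```python
def page_of(docids, page, per_page=10):
    """Return the slice of saved document ids shown on a 1-based page."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    return docids[start:start + per_page]


def fetch_page(docids, page, fetch, per_page=10):
    """Fetch data only for the ids on the requested page.

    `fetch` is a hypothetical callable mapping a document id to its data;
    it is an assumption of this sketch, not part of any library.
    """
    return [fetch(docid) for docid in page_of(docids, page, per_page)]


# Stand-in for a saved result set: ids only, no document data.
saved = list(range(1, 137481))
calls = []


def fake_fetch(docid):
    # Records which ids were actually read, to show how few lookups happen.
    calls.append(docid)
    return "data for %d" % docid


rows = fetch_page(saved, page=3, fetch=fake_fetch, per_page=25)
print(len(rows), rows[0])
```

Only 25 lookups happen for the displayed page here, instead of one for each of
the 137480 matches.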

Thanks a lot!
Salem




