[Xapian-discuss] get_data not fast enough for query matches

Salem Berhanu salemb4 at hotmail.com
Thu Feb 2 16:00:29 GMT 2006


I'm sorry. I was trying to explain, I guess it wasn't clear enough. 
Basically, I want users to be able to search different parts of a document. 
For instance, I want them to be able to search for 'data compression' in the 
title and 'Rate-Distortion theory' in the description. This is the main 
reason I'm using several dbs. In addition, I read that it's better to have 
smaller dbs for better performance. (Maybe that's wrong.)
I don't actually run out of space when I grab the data, it just takes a long 
time. For instance, I wrote a small query script that searches for a term, 
tells me how many matches it finds, and then loops through the matches getting 
the data. When I search for the word 'theory' in the description, it tells me 
within the first 7 seconds that it found 137480 matches, which is good, but 
it then takes 2m15s to grab the data for each match.
Salem
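
To make the combining step concrete, here is a rough sketch in plain Python of 
what I mean. It does not use the Xapian API at all: the two sets are made-up 
ids standing in for what the title db and the description db would return, 
and combine_and and pages are hypothetical helper names. The page size of 50 
is the range size mentioned in my earlier message quoted below.

```python
def combine_and(*id_sets):
    """Return the ids present in every per-field result (boolean AND)."""
    if not id_sets:
        return set()
    result = set(id_sets[0])
    for ids in id_sets[1:]:
        result &= set(ids)
    return result


def pages(ids, size=50):
    """Yield the ids in sorted order, at most `size` at a time."""
    ordered = sorted(ids)
    for start in range(0, len(ordered), size):
        yield ordered[start:start + size]


# Made-up ids standing in for the matches from each per-field db.
title_hits = {1, 2, 3, 5, 8}
description_hits = {2, 3, 4, 8, 9}

# Documents that match in both the title and the description.
both = combine_and(title_hits, description_hits)

# Only the first page would be looked up in the external database.
first_page = next(pages(both, size=2))
```

The slow part for me is collecting the complete id sets in the first place, 
since that is where the data has to be fetched for every match.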



>From: James Aylett <james-xapian at tartarus.org>
>To: Salem Berhanu <salemb4 at hotmail.com>, xapian-discuss at lists.xapian.org
>Subject: Re: [Xapian-discuss] get_data not fast enough for query matches
>Date: Thu, 2 Feb 2006 15:15:48 +0000
>
>On Thu, Feb 02, 2006 at 03:00:54PM +0000, Salem Berhanu wrote:
>
> > I am not storing anything in the document data other than the
> > ids. Eventually I will link to an external database but I will do it
> > in ranges of not more than 50 at a time. However, I will need the
> > initial complete ids to combine with results from other xapian
> > dbs. This is because each document is broken up into chunks (since
> > the information can be logically divided) and indexed in separate
> > dbs. (eg. there is a title db, a description db ... ) I want to be
> > able to combine the results across these dbs using boolean
> > expressions (since I am assuming there isn't a built in way of doing
> > this).
>
>I'm sorry, there's not much point in trying to answer questions like
>this without understanding what you're actually trying to achieve. I
>don't know why you're using several different databases, for
>instance.
>
>Let's go back to the beginning. What sort of performance issues are
>you actually seeing? Have you investigated performance using the
>standard performance tools for your platform? For instance, are you
>running out of file buffer cache at the point that you start accessing
>the document data? Other people have handled larger data sets than
>you, so it may be that your hardware or configuration isn't quite
>right for what you're trying to do.
>
>(And please keep the discussion on list where others can help you as
>well :-)
>
>Cheers,
>James
>
>--
>/--------------------------------------------------------------------------\
>   James Aylett                                                  xapian.org
>   james at tartarus.org                               uncertaintydivision.org
