[Xapian-discuss] Incremental indexing

Marios Titas redneb8888 at gmail.com
Thu Mar 22 08:50:34 GMT 2012


On Tue, Mar 20, 2012 at 21:57, Olly Betts <olly at survex.com> wrote:
> So you could check if the wdf goes to zero and call remove_term() if it
> does.

If I am not mistaken, there is no fast way to look up a single term in
a specific document. So you have to go through all the terms in the
document and remove the ones whose wdf is zero. Is there a better
way?
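
A minimal sketch of that brute-force pass, for illustration only. It
assumes a document object shaped like the one in the Python bindings
(termlist() yielding items with .term and .wdf, plus remove_term()).
FakeDocument is a hypothetical stand-in so the sketch runs without
Xapian installed; it is not part of any library.

```python
class FakeDocument:
    """Hypothetical stand-in for a Xapian document (illustration only)."""

    class Item:
        def __init__(self, term, wdf):
            self.term = term
            self.wdf = wdf

    def __init__(self, wdfs):
        self._wdfs = dict(wdfs)

    def termlist(self):
        # Terms come back in sorted order, one item per term.
        return [self.Item(t, w) for t, w in sorted(self._wdfs.items())]

    def remove_term(self, term):
        del self._wdfs[term]


def remove_zero_wdf_terms(doc):
    """Drop every term whose wdf is zero and return the removed terms."""
    # Collect first, then remove, so the term list is not modified
    # while it is being walked.
    dead = [item.term for item in doc.termlist() if item.wdf == 0]
    for term in dead:
        doc.remove_term(term)
    return dead


doc = FakeDocument({"album": 2, "review": 0, "rock": 1, "stale": 0})
removed = remove_zero_wdf_terms(doc)
```

The cost is linear in the number of terms in the document, which is
exactly the overhead the question above is about.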


Olly, do you have any thoughts on the other idea, the one about
"group by"? Let me give you an example so that you can see what I
mean. Suppose that you have a data set of music albums. For each
album you have information such as the name of the artist and the
titles of the songs, but you also have a list of user reviews. The
challenge, of course, is how to update the search index efficiently
when a new review is posted. With the "group by" plus aggregate
function approach, you could just index each review as a separate
document and include the album id as a value. When querying, you
would group the matching documents by album id and return one result
per group, with a weight that takes the whole group into account.
This has the additional advantage that you could also search for
individual user reviews (i.e. without doing any grouping) as well as
for albums.
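
To make the grouping step concrete, here is a rough sketch in plain
Python. It makes no Xapian calls: the (docid, album id, weight) tuples
are made-up data standing in for whatever a match plus a value lookup
would return, and group_by_album is a hypothetical helper. Summing the
weights is only one possible aggregate; max or anything else could be
passed instead.

```python
from collections import defaultdict


def group_by_album(hits, aggregate=sum):
    """Group review hits by album id and combine their weights.

    hits: iterable of (review_docid, album_id, weight) tuples.
    Returns a list of (album_id, combined_weight, review_docids),
    best album first.
    """
    groups = defaultdict(list)
    for docid, album_id, weight in hits:
        groups[album_id].append((weight, docid))

    results = []
    for album_id, members in groups.items():
        combined = aggregate(w for w, _ in members)
        # Within a group, list the best-matching reviews first.
        docids = [d for _, d in sorted(members, reverse=True)]
        results.append((album_id, combined, docids))

    results.sort(key=lambda r: r[1], reverse=True)
    return results


# Made-up hits: one tuple per matching review document.
hits = [
    (1, "album-a", 2.0),
    (2, "album-b", 1.5),
    (3, "album-b", 1.0),
    (4, "album-a", 0.25),
    (5, "album-c", 1.75),
]
albums = group_by_album(hits)
```

With sum as the aggregate, album-b ranks first because two reviews
match; with max, album-a would rank first instead. Skipping the
grouping and using the hits as they are gives the per-review search.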


