[Xapian-discuss] Filter similar results

Robby Walker robby.walker at gmail.com
Sun Sep 17 04:57:58 BST 2006


> > I'm trying to implement something like Google's "similar results
> > omitted" and I'm not sure how to go about it.

> My thought would be to enhance the collapse feature to allow collapsing
> to leave N documents with an identical collapse key instead of just one.
> It's probably a comparable amount of work to trying to do it outside of
> Xapian and you get a much more satisfactory solution.

I've been thinking this over for a couple of days now, and I think in
fact going one step more generic might be even better.  Add the
concept of an MSetDecider that takes an MSetItem and an MSet and
decides what item to remove (if any).  The existing code would work as
an MSetDecider based on collapse_key, Weight class, and sort_key.  You
could easily write an MSetDecider that also uses collapse_count.

For my case, I could write a completely different MSetDecider that
decides partly based on the number of documents found so far in each
set.
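
A rough sketch of how such a hook might look, written against stand-in
types rather than Xapian's real classes. Everything here is hypothetical:
Item, ItemList, the MSetDecider interface, the kKeepAll sentinel, the
KeepNPerKey decider and the collect() loop are illustration only, and none
of them exist in Xapian. The decider sees the list with the candidate
already appended and names the one item to drop, if any. KeepNPerKey keeps
up to N items per collapse key and evicts the lowest-weighted one beyond
that.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a match-set entry (NOT Xapian's MSetItem).
struct Item {
    unsigned docid;
    double weight;
    std::string collapse_key;
};

using ItemList = std::vector<Item>;

// Sentinel meaning "remove nothing" (assumption of this sketch).
const std::size_t kKeepAll = static_cast<std::size_t>(-1);

// Hypothetical hook: `items` already holds the candidate at index
// `candidate`; return the index of the item to drop, or kKeepAll.
class MSetDecider {
public:
    virtual ~MSetDecider() = default;
    virtual std::size_t item_to_remove(const ItemList& items,
                                       std::size_t candidate) const = 0;
};

// Keeps at most `limit` items per collapse key, evicting the one with
// the lowest weight (the candidate itself on ties).
class KeepNPerKey : public MSetDecider {
    std::size_t limit_;
public:
    explicit KeepNPerKey(std::size_t limit) : limit_(limit) {}

    std::size_t item_to_remove(const ItemList& items,
                               std::size_t candidate) const override {
        const std::string& key = items[candidate].collapse_key;
        std::size_t count = 0;
        std::size_t worst = candidate;
        for (std::size_t i = 0; i < items.size(); ++i) {
            if (items[i].collapse_key != key) continue;
            ++count;
            if (items[i].weight < items[worst].weight) worst = i;
        }
        return count > limit_ ? worst : kKeepAll;
    }
};

// Toy collection loop: one virtual call per candidate document.
ItemList collect(const std::vector<Item>& candidates,
                 const MSetDecider& decider) {
    ItemList kept;
    for (const Item& c : candidates) {
        kept.push_back(c);
        std::size_t victim = decider.item_to_remove(kept, kept.size() - 1);
        if (victim != kKeepAll) {
            kept.erase(kept.begin() + static_cast<std::ptrdiff_t>(victim));
        }
    }
    return kept;
}
```

With limit 1 this behaves like a plain collapse on the key. A decider that
looks at per-key counts, as described above, only needs a different
item_to_remove().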

For a simpler example (still semi-applicable to my problem) let's
assume that a user wants to sort based on similarity.  So, maybe we're
searching an employee database and we say something like
Lastname:Walter AND UsesGmail AND LikesXapian.  And maybe we want to
sort partially based on similarity to last name.  So I (Walker) would
show up early based on my last name's similarity to Walter.  I don't
believe a ranking like this can be done currently.  Is that true?
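
To make the similarity idea concrete, here is a small standalone sketch
that orders names by edit distance to a target. Levenshtein distance is
only one possible similarity measure and is an assumption here, as are the
function names edit_distance and rank_by_similarity. Nothing in it uses or
reflects Xapian's API, and it covers only the similarity part of the sort,
not how it would be mixed with the match weight.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance: minimum number of single-character insertions,
// deletions and substitutions needed to turn `a` into `b`.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
        }
        prev.swap(cur);
    }
    return prev[b.size()];
}

// Orders names by increasing distance to `target`; names at the same
// distance keep their original relative order.
std::vector<std::string> rank_by_similarity(std::vector<std::string> names,
                                            const std::string& target) {
    std::stable_sort(names.begin(), names.end(),
                     [&target](const std::string& x, const std::string& y) {
                         return edit_distance(x, target) <
                                edit_distance(y, target);
                     });
    return names;
}
```

Under this measure "Walker" is one substitution away from "Walter", so it
sorts directly after an exact match.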

So, it would add a few nice ways to customize Xapian.  On the downside,
it puts one or more virtual function calls in the middle of an inner
loop and probably complicates the remote communication.  (For the time
being, you could support only the default MSetDecider over remote
connections.)

I'm willing to write the patch provided you tell me it's 1) not stupid
and 2) maybe even useful.  Thoughts?

Thanks,
Robby
