[Xapian-devel] Document clustering module?

Olly Betts olly at survex.com
Sun Sep 16 16:17:01 BST 2007


[There's no need to Cc: me on list replies]

On Sun, Sep 16, 2007 at 08:26:05PM +0800, Yung-chung Lin wrote:
> The attached file is my current public clustering interface. I think
> it would be easier to have discussions with a header file present.

Good idea.

> DSet, in the header file, stands for one cluster of documents and
> MultiDSet stands for clusters of documents.

Returning a vector of vectors by value seems suboptimal, since every
cluster may get copied on return.

Simply using a typedef of a vector is problematic too: existing Xapian
classes are either reference-counted handles or have very few members,
so users can expect that copying them is cheap.
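
To make the copy-cost point concrete, here is a minimal sketch of a
cheap-to-copy handle. Everything in it is an assumption for illustration:
ClusterSet, add_cluster() and shares_storage_with() are hypothetical names,
not Xapian API, and std::shared_ptr stands in for Xapian's internal
reference counting.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical names throughout; none of this is existing Xapian API.
typedef unsigned docid;

class ClusterSet {
    // Shared, reference-counted storage: one inner vector per cluster.
    std::shared_ptr<std::vector<std::vector<docid> > > internal;

  public:
    ClusterSet()
        : internal(std::make_shared<std::vector<std::vector<docid> > >()) {}

    // Append a cluster. Every copy of this handle sees the change.
    void add_cluster(const std::vector<docid>& docs) {
        internal->push_back(docs);
    }

    // Number of clusters held.
    std::size_t size() const { return internal->size(); }

    const std::vector<docid>& operator[](std::size_t i) const {
        return (*internal)[i];
    }

    // True if both handles refer to the same underlying storage.
    bool shares_storage_with(const ClusterSet& o) const {
        return internal == o.internal;
    }
};

// Returning by value is cheap here: only the handle is copied,
// not the vectors of document ids behind it.
inline ClusterSet make_example_clusters() {
    ClusterSet result;
    result.add_cluster(std::vector<docid>(3, 1));  // three entries of docid 1
    result.add_cluster(std::vector<docid>(2, 7));  // two entries of docid 7
    return result;
}
```

Copying a ClusterSet copies one pointer and bumps a reference count, which
matches what users expect from the existing handle classes. The trade-off
is that copies alias the same data, so a change made through one copy is
visible through the others.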

> I am using a standalone similarity function
> 'calculate_doc_similarity()' which is overridable.

Unfortunately, you can't usefully put virtual functions on classes which
use RefCntPtr - if you subclass, you're only subclassing the "pointer"
bit, so Xapian won't be able to call back to the overridden method.
Bug#186 is relevant (I had some further thoughts about how we could
address this but I don't have a full solution yet):

http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=186
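
One way this problem shows up can be sketched as follows. The classes are
hypothetical stand-ins (Clusterer, MyClusterer, Library and Internal are
not Xapian classes, and std::shared_ptr replaces RefCntPtr); the sketch
assumes the library keeps its own copy of the handle, as it would for
other reference-counted classes.

```cpp
#include <memory>

// Hypothetical stand-ins, not Xapian classes.
struct Internal {
    double weight;
    Internal() : weight(1.0) {}
};

class Clusterer {
    std::shared_ptr<Internal> internal;  // stand-in for RefCntPtr

  public:
    Clusterer() : internal(std::make_shared<Internal>()) {}
    virtual ~Clusterer() {}

    // Intended to be overridable by users.
    virtual double similarity() const { return internal->weight; }
};

// A user subclass: it only extends the "pointer" part of the object.
class MyClusterer : public Clusterer {
  public:
    double similarity() const { return 42.0; }
};

// The library stores a copy of the handle. The copy has static type
// Clusterer, so the derived part (and its override) is sliced away.
class Library {
    Clusterer stored;

  public:
    explicit Library(const Clusterer& c) : stored(c) {}

    // Calls Clusterer::similarity(), never the user's override.
    double run() const { return stored.similarity(); }
};
```

Calling similarity() directly on a MyClusterer gives 42.0, but
Library(MyClusterer()).run() gives 1.0: the copy held by the library is a
plain Clusterer, so the override is never reached.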

> Maybe putting the similarity function into a class would be even
> better. It needs discussion.

I think that is probably the answer.
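
A rough sketch of what that could look like, in the style of the abstract
functor classes Xapian already has (Xapian::MatchDecider, for example). The
names DocSimilarity, DotProductSimilarity, DocVector and most_similar() are
made up for illustration, and a plain vector of weights stands in for
whatever document representation the clusterer would really use.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch; names are illustrative, not existing Xapian API.
typedef std::vector<double> DocVector;  // stand-in for a document's term weights

// Abstract similarity functor which users subclass.
class DocSimilarity {
  public:
    virtual ~DocSimilarity() {}
    virtual double operator()(const DocVector& a, const DocVector& b) const = 0;
};

// Example implementation: dot product over the shared prefix of the vectors.
class DotProductSimilarity : public DocSimilarity {
  public:
    double operator()(const DocVector& a, const DocVector& b) const {
        std::size_t n = a.size() < b.size() ? a.size() : b.size();
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
        return sum;
    }
};

// The caller passes the functor by reference, so the dynamic type is kept
// and the user's override is the one that gets called.
// Returns the index of the most similar document, or 0 if docs is empty.
inline std::size_t most_similar(const DocVector& query,
                                const std::vector<DocVector>& docs,
                                const DocSimilarity& sim) {
    std::size_t best = 0;
    double best_score = 0.0;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        double score = sim(query, docs[i]);
        if (i == 0 || score > best_score) {
            best = i;
            best_score = score;
        }
    }
    return best;
}
```

Since the functor is a plain class rather than a reference-counted handle,
the user owns the object and the library only sees a reference to it,
which avoids the subclassing problem.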

> Now, I am using MultiDSet to store documents. I am wondering whether it
> would be better to return multiple MSets (a MultiMset), but the design
> would be different and more complicated.

I think I need to mull over how this would all be used.  Reusing MSet
would be nice if it's a good fit, since adding more API classes tends to
make it harder to learn the API, so it's good if it can be avoided.  But
forcing reuse where something isn't a natural fit would be worse.

> #include <xapian/base.h>
> #include <xapian/deprecated.h>
> #include <xapian/enquire.h>
> #include <xapian/types.h>
> #include <xapian/database.h>
> #include <xapian/document.h>
> #include <xapian/visibility.h>

You don't use deprecated.h here.

And I don't think you need database.h or enquire.h - you can just
forward declare "class Database;" and "class MSet;" inside the
namespace.
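
Something along these lines, as a sketch. ClusterSketch and its members are
hypothetical and only there to show that references and pointers to
forward-declared classes are enough for the declarations in a header.

```cpp
namespace Xapian {

// Forward declarations: sufficient for references and pointers below.
class Database;
class MSet;

// Hypothetical class, for illustration only.
class ClusterSketch {
    const Database* db;  // a pointer to an incomplete type is fine

  public:
    ClusterSketch() : db(0) {}
    explicit ClusterSketch(const Database& db_) : db(&db_) {}

    // Declared only; the source file defining this would include the
    // full database and MSet headers.
    void cluster(const MSet& mset);

    bool has_database() const { return db != 0; }
};

}
```

Only the source file which defines cluster() needs the full class
definitions, so users of the header don't pay for parsing headers that
the interface doesn't depend on.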

Cheers,
    Olly


