[Xapian-devel] Document clustering module?
Olly Betts
olly at survex.com
Sun Sep 16 16:17:01 BST 2007
[There's no need to Cc: me on list replies]
On Sun, Sep 16, 2007 at 08:26:05PM +0800, Yung-chung Lin wrote:
> The attached file is my current public clustering interface. I think
> it would be easier to have discussions with a header file present.
Good idea.
> DSet, in the header file, stands for one cluster of documents and
> MultiDSet stands for clusters of documents.
Returning a vector of vectors by value seems suboptimal.
Simply using a typedef of a vector is problematic too: existing Xapian
classes are either reference-counted handles or have very few members,
so users can expect that copying them is cheap. A typedef'd vector
would break that expectation.
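To illustrate what "cheap to copy" could look like here, below is a minimal sketch of a handle class whose copies share one underlying vector of clusters. The names ClusterSet, add_cluster and shares_data_with are hypothetical, and std::shared_ptr is only a stand-in for Xapian's own reference counting; none of this is existing Xapian API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical names throughout; this is not existing Xapian API.
typedef unsigned docid;

class ClusterSet {
    // The bulky data lives behind a shared pointer, so copying a
    // ClusterSet copies one pointer rather than every cluster.
    struct Internal {
        std::vector<std::vector<docid> > clusters;
    };
    std::shared_ptr<Internal> internal;

  public:
    ClusterSet() : internal(std::make_shared<Internal>()) {}

    // Append one cluster. Because copies share data, the new cluster
    // is visible through every copy of this handle.
    void add_cluster(const std::vector<docid>& docs) {
        internal->clusters.push_back(docs);
    }

    // Number of clusters held.
    std::size_t size() const { return internal->clusters.size(); }

    // Read-only access to the i-th cluster.
    const std::vector<docid>& operator[](std::size_t i) const {
        return internal->clusters[i];
    }

    // True if both handles refer to the same underlying data.
    bool shares_data_with(const ClusterSet& other) const {
        return internal == other.internal;
    }
};
```

With something like this, a function can return a ClusterSet by value without copying any document ids.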
> I am using a standalone similarity function
> 'calculate_doc_similarity()' which is overridable.
Unfortunately, you can't usefully put virtual functions on classes which
use RefCntPtr - if you subclass, you're only subclassing the "pointer"
bit, so Xapian won't be able to call back to the overridden method.
Bug#186 is relevant (I had some further thoughts about how we could
address this but I don't have a full solution yet):
http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=186
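The problem can be shown with a deliberately simplified model. This is not Xapian's actual RefCntPtr code: Clusterer, Internal and library_compare are hypothetical names, and std::shared_ptr stands in for the reference-counted pointer. The point is only that library-side code holds on to the shared internals, so an override on the handle is never reached.

```cpp
#include <cassert>
#include <memory>

// Hypothetical handle class: a thin wrapper around shared internals.
class Clusterer {
  public:
    struct Internal {
        // The default similarity lives with the shared data.
        double similarity(unsigned, unsigned) const { return 0.0; }
    };

    Clusterer() : internal(std::make_shared<Internal>()) {}
    virtual ~Clusterer() {}

    // A user might hope that overriding this changes what the
    // library does.
    virtual double calculate_doc_similarity(unsigned a, unsigned b) const {
        return internal->similarity(a, b);
    }

    std::shared_ptr<Internal> internal;
};

// The subclass only extends the "pointer" part.
class MyClusterer : public Clusterer {
  public:
    double calculate_doc_similarity(unsigned, unsigned) const {
        return 1.0;
    }
};

// Library-side code in this model: it keeps only the shared Internal,
// so nothing from the derived handle is available to call back into.
double library_compare(const Clusterer& c) {
    std::shared_ptr<Clusterer::Internal> kept = c.internal;
    return kept->similarity(1, 2);
}
```

Calling calculate_doc_similarity() directly on a MyClusterer gives 1.0, but library_compare() still returns the default 0.0.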
> Maybe putting the similarity function into a class would be even
> better. It needs discussion.
I think that is probably the answer.
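One possible shape for that, sketched under assumptions: an abstract functor class which the caller subclasses and passes in by reference, along the lines of how Xapian::MatchDecider is user-subclassed. DocSimilarity, DotProductSimilarity, TermVector and most_similar_score are hypothetical names invented for illustration, and a plain vector of weights stands in for a document.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a document: a dense vector of term weights.
typedef std::vector<double> TermVector;

// Abstract similarity functor. It is an ordinary class rather than a
// reference-counted handle, so virtual dispatch reaches the subclass.
class DocSimilarity {
  public:
    virtual ~DocSimilarity() {}
    virtual double operator()(const TermVector& a,
                              const TermVector& b) const = 0;
};

// Example implementation: dot product over the shared dimensions.
class DotProductSimilarity : public DocSimilarity {
  public:
    double operator()(const TermVector& a, const TermVector& b) const {
        std::size_t n = a.size() < b.size() ? a.size() : b.size();
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
        return sum;
    }
};

// Library-side code takes the functor by reference, so the caller's
// object (and its override) is what actually gets called.
double most_similar_score(const TermVector& query,
                          const std::vector<TermVector>& docs,
                          const DocSimilarity& sim) {
    double best = 0.0;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        double s = sim(query, docs[i]);
        if (s > best) best = s;
    }
    return best;
}
```

The caller keeps ownership of the functor and must keep it alive for as long as the library may call it.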
> Now, I am using MultiDSet to store documents. I am wondering whether
> it would be better if it returned multiple MSets (a MultiMSet), but
> the design would be different and more complicated.
I think I need to mull over how this would all be used. Reusing MSet
would be nice if it's a good fit, since adding more API classes tends to
make it harder to learn the API, so it's good if it can be avoided. But
forcing reuse where something isn't a natural fit would be worse.
> #include <xapian/base.h>
> #include <xapian/deprecated.h>
> #include <xapian/enquire.h>
> #include <xapian/types.h>
> #include <xapian/database.h>
> #include <xapian/document.h>
> #include <xapian/visibility.h>
You don't use deprecated.h here.
And I don't think you need database.h or enquire.h - you can just
forward declare "class Database;" and "class MSet;" inside the
namespace.
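A rough sketch of the forward-declaration approach follows. The Demo namespace, the Clusterer class and the stand-in definitions of Database and MSet are all hypothetical, there only so the example compiles on its own. In the real header the forward declarations would sit inside namespace Xapian, and the .cc file would include the full headers.

```cpp
#include <cassert>

// ---- What the public header would contain ----
namespace Demo {  // stand-in for the real namespace

// Forward declarations are enough here, because the header only uses
// these types by reference and never needs their size or members.
class Database;
class MSet;

class Clusterer {
  public:
    // Declared in terms of incomplete types; no extra #include needed.
    unsigned cluster(const Database& db, const MSet& matches) const;
};

}  // namespace Demo

// ---- What the implementation file would contain ----
namespace Demo {

// Stand-in definitions; a real .cc file would include the full headers.
class Database {
  public:
    unsigned doccount;
};

class MSet {
  public:
    unsigned count;
};

// The full definitions are visible here, so members can be used.
unsigned Clusterer::cluster(const Database&, const MSet& matches) const {
    return matches.count;
}

}  // namespace Demo
```

Code which includes only the header part can still declare and pass around references to Database and MSet; it just can't look inside them.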
Cheers,
Olly