[Xapian-devel] Document clustering module?

Olly Betts olly at survex.com
Tue Sep 18 20:46:30 BST 2007


On Tue, Sep 18, 2007 at 12:46:30PM +0800, ??? ????????? ??? (Yung-chung Lin) wrote:
> I have made a new class for calculating document similarity. Please
> review it.

Generally looks good - a few comments below.

> Maybe the class should be an internal one, since this will
> only be called by Xapian::Cluster in my plan.

I think a similarity measure probably has other uses though.

Also, users might want to implement their own measure (like they can do
with Xapian::Weight, etc) and let Xapian cluster using it.

> +class Xapian::DocSim::Internal : public Xapian::Internal::RefCntBase {
> +  public:
> +    Xapian::Database db;
> +};

If we want to allow user subclassing, we can't use RefCntBase and
RefCntPtr.  Xapian::Weight is probably a good model, except this
probably doesn't need to be serialisable.

> +Xapian::DocSim::DocSim(Xapian::DocSim::Internal * internal_)
> +{
> +    internal = internal_;
> +}
> +
> +Xapian::DocSim::DocSim()
> +        : internal(new Xapian::DocSim::Internal())
> +{
> +}
> +
> +Xapian::DocSim::DocSim(const Xapian::DocSim & other)
> +        : internal(other.internal)
> +{
> +}

Generally we should probably put trivial constructors like these in the
headers so that then can be inlined.  Sometimes that means other headers
need to be pulled in, which is one reason not to do this.

Similarly, trivial non-virtual methods (especially setters and getters)
should probably go in the header.

> +libxapian_la_SOURCES += \
> +	docsim/docsim.cc docsim/docsim_cosine.cc

Use a vertical list here, so the "docsim/" bits all line up.

> --- ../xapian-core-1.0.2/tests/docsimtest.cc	1970-01-01 08:00:00.000000000 +0800
> +++ tests/docsimtest.cc	2007-09-18 12:20:22.000000000 +0800

I think these tests should probably go in apitest, so we can run them on
all backends (in particular, testing with the remote backend would be
useful).

> +    /// Make a new empty DocSim
> +    DocSimCosine() { };

There's a bogus trailing ";" here.

> +    /// Destructor
> +    ~DocSimCosine() { };

And here.

Also, virtual functions generally shouldn't be declared inline in the
header as they can't generally be inlined.

Cheers,
    Olly



More information about the Xapian-devel mailing list