[Xapian-devel] Add an example to the community page and contribute more code

aarsh shah aarshkshah1992 at gmail.com
Sun Jan 27 19:25:55 GMT 2013


Hey, hi :) I've sent a pull request on GitHub for the paicehusk stemmer
branch. Please review it and let me know what changes I should make, as this
is my first contribution to the community. I also tested the stemmer on the
voc.txt file provided in the Xapian stemming data directory, and it did a
fine job with it (I've included the resulting 'output.txt' file with the
pull request).

I've also updated all the documentation I could find to include this stemmer,
and those changes are in the pull request too.

However, the work is not yet complete. I've commented out the following
code in the xapian-core/languages/stem.cc file:

    case PAICEHUSK:
        internal = new StemPaiceHusk;
        return;

because I get the error "undefined reference to vtable for
Xapian::StemPaiceHusk" if I try to use the code above. However, the stemmer
works fine when I use it externally with Xapian. I think this is because
I haven't yet figured out how to modify the
xapian-core/languages/sbl-dispatch.h file and the Makefile.mk file to
incorporate it into the library. Could you please help with this? I tried
what you suggested in your mail (setting // alias paicehusk and appending
paicehusk.cc to the Makefile.mk file), but it didn't work, as I don't know
how to modify the sbl-dispatch.h file.
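
A note on that linker error, hedged because the actual Xapian class layout is not shown here: with GCC and Clang, a class's vtable is emitted in the object file that defines its "key function", i.e. its first non-inline, non-pure virtual member function. So "undefined reference to vtable for X" usually means either that some declared virtual member of X was never defined, or that the .cc file defining it is not being compiled into the library, which would fit the Makefile.mk suspicion above. The sketch below uses hypothetical names (StemBase, StemExample, make_stemmer), not Xapian's real internals, and shows the shape that links correctly.

```cpp
// Hypothetical sketch: StemBase, StemExample and make_stemmer are
// illustrative names, not Xapian's actual internal classes.
#include <cassert>
#include <string>

class StemBase {
  public:
    virtual ~StemBase() {}
    // Pure virtual members need no definition in this class.
    virtual std::string operator()(const std::string& word) = 0;
    virtual std::string get_description() const = 0;
};

class StemExample : public StemBase {
  public:
    // Declared here, defined out of line below. Under the Itanium C++ ABI
    // (GCC/Clang), the vtable is emitted where the first non-inline,
    // non-pure virtual member (here operator()) is defined. If that
    // definition is missing, or its .cc file is not compiled into the
    // library, the linker reports "undefined reference to vtable for ...".
    std::string operator()(const std::string& word) override;
    std::string get_description() const override;
};

// In a real tree these definitions would live in their own .cc file,
// which must be listed in the build so it is compiled and linked.
std::string StemExample::operator()(const std::string& word) {
    // Toy rule for illustration only: strip a trailing "s".
    if (word.size() > 3 && word.back() == 's')
        return word.substr(0, word.size() - 1);
    return word;
}

std::string StemExample::get_description() const {
    return "example";
}

// Mirrors the shape of a factory switch like the one in stem.cc: creating
// the object through a base pointer only links once the vtable exists.
StemBase* make_stemmer() {
    return new StemExample;
}
```

If the out-of-line definitions were left out of the build, the `new StemExample` line is where the vtable reference would fail to resolve.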

Please let me know if you find any part of the pull request code
unsatisfactory, and I'll modify it and send a new pull request. Thank you for
the awesome documentation and help, which really helped during development. :)

PS: I debugged the stemmer by learning Valgrind. Feels good :)

-Regards
-Aarsh

On Thu, Jan 24, 2013 at 3:07 PM, Olly Betts <olly at survex.com> wrote:

> On Wed, Jan 23, 2013 at 10:45:42AM +0530, aarsh shah wrote:
> > Hi Olly :) I guess you are busy these days.
>
> We have visitors staying at the moment, so I'm afraid I'm not online as
> much as I typically am.  It sounds like you're making good progress
> unaided though!
>
> > Please can you just let me know about the documentation standards
> > and expectations that the community has? I want to document the stemmer code
> > as nicely as I can :)
>
> I'd recommend reading the advice in the "HACKING" document, which is in
> the source tree in xapian-core/HACKING, but you can see it online too.
> It's useful to look through all of it if you're working on the code, but
> the part which is particularly pertinent starts here:
>
> http://trac.xapian.org/browser/trunk/xapian-core/HACKING#L1043
>
> For a patch like this, there's not a lot of user documentation needed -
> look to see where we say which stemmers we offer and update those
> places.  It's an implementation of an existing algorithm, so a link to
> wherever it is officially described would be useful.
>
> For a new stemming algorithm, test coverage is quite important.  We want
> to check that it implements the described algorithm, so any examples
> from the description should definitely be in the test data.  Also make
> sure each rule in the stemmer (assuming it is rule based) has at least
> one example which exercises it in the test data.  It's also good to
> stem the English word list we already have with the new stemmer and
> include that, which helps to ensure it doesn't crash or hang on those
> inputs, and that it continues to return the same results for them in
> the future (which is useful even if those results haven't all been
> checked by hand).
>
> The data files for stemming tests live in xapian-data/stemming/ in
> the source tree.
>
> If there's one or more existing implementations available, then it's
> useful to run the English word list through those too and compare the
> results with what you get.
>
> Cheers,
>     Olly
>
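
Following Olly's advice about stemming the existing English word list and comparing the results, here is a minimal sketch of the comparison step. It assumes the word list and the expected stems are parallel text files with one entry per line (the voc.txt/output.txt pairing mentioned earlier); the expected stems can come either from a previously recorded run or from another implementation. find_mismatches and Mismatch are hypothetical names, not part of Xapian's test harness.

```cpp
// Hypothetical helper, not part of Xapian's test suite.
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One disagreement between the stemmer and the expected stem.
struct Mismatch {
    std::size_t line;      // 1-based line number in the word list
    std::string word;      // input word
    std::string expected;  // stem from the recorded or reference output
    std::string actual;    // stem the current implementation produced
};

// Stems each word read from `voc` and compares it with the corresponding
// line of `expected`. Both streams are assumed to hold one entry per line,
// in the same order. The loop stops at the end of the shorter stream, so
// a difference in length between the two files is not reported.
std::vector<Mismatch> find_mismatches(
    std::istream& voc, std::istream& expected,
    const std::function<std::string(const std::string&)>& stem) {
    std::vector<Mismatch> result;
    std::string word, want;
    std::size_t line = 0;
    while (std::getline(voc, word) && std::getline(expected, want)) {
        ++line;
        std::string got = stem(word);
        if (got != want)
            result.push_back({line, word, want, got});
    }
    return result;
}
```

For example, with `std::ifstream voc("voc.txt")` and `std::ifstream out("output.txt")`, calling `find_mismatches(voc, out, my_stem)` (where `my_stem` is a hypothetical wrapper around the new stemmer) returns every line where the stemmer now disagrees; an empty result means the recorded output is reproduced exactly.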
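
For the point about making sure each rule has at least one test word that exercises it, one approach is to count how often each rule fires while the test data is stemmed, then list the rules that never did. This sketch uses a deliberately simplified, hypothetical suffix-rule format (first matching rule wins; strip the suffix, append a replacement). The real Paice/Husk rules also carry intact-word and continue/stop flags, which are not modelled here.

```cpp
// Hypothetical, simplified rule engine for illustrating rule coverage;
// this is NOT the actual Paice/Husk rule format.
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Rule {
    std::string suffix;   // ending to match
    std::string replace;  // text appended after removing the suffix
    std::size_t hits;     // how many words fired this rule so far

    Rule(std::string s, std::string r)
        : suffix(std::move(s)), replace(std::move(r)), hits(0) {}
};

static bool ends_with(const std::string& w, const std::string& s) {
    return w.size() >= s.size() &&
           w.compare(w.size() - s.size(), s.size(), s) == 0;
}

// Applies the first matching rule (if any) and records that it fired.
std::string apply_rules(std::vector<Rule>& rules, const std::string& word) {
    for (Rule& r : rules) {
        if (ends_with(word, r.suffix)) {
            ++r.hits;
            return word.substr(0, word.size() - r.suffix.size()) + r.replace;
        }
    }
    return word;
}

// Returns the suffixes of rules that no test word has exercised yet.
std::vector<std::string> unexercised(const std::vector<Rule>& rules) {
    std::vector<std::string> out;
    for (const Rule& r : rules)
        if (r.hits == 0)
            out.push_back(r.suffix);
    return out;
}
```

After running all the test words through `apply_rules`, any suffix returned by `unexercised` marks a rule that still needs an example in the test data.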


More information about the Xapian-devel mailing list