[Xapian-tickets] [Xapian] #448: Allow usage of custom stemmers

Xapian nobody at xapian.org
Fri Feb 19 03:35:01 GMT 2010


#448: Allow usage of custom stemmers
-------------------------+--------------------------------------------------
 Reporter:  esizikov     |        Owner:  olly    
     Type:  enhancement  |       Status:  reopened
 Priority:  normal       |    Milestone:  1.2.x   
Component:  Library API  |      Version:  1.0.17  
 Severity:  normal       |   Resolution:          
 Keywords:               |    Blockedby:          
 Platform:  All          |     Blocking:          
-------------------------+--------------------------------------------------
Changes (by olly):

  * milestone:  => 1.2.x


Comment:

 I'm afraid I've no useful ideas about what the problem with SWIG is.
 You're doing some unusual hacking around with types here, which may be
 what is confusing it.

 > It's OK to put ABI changes into trunk only if I'll get it as a stable
 release in a near future

 I'm afraid at this point in the release cycle, you aren't likely to get
 ABI changes for a new feature into anything other than trunk.  There would
 have to be a compelling reason and a clean patch ready for merging.  1.0.x is
 ABI stable, and we're at the release candidate stage for 1.2.0.  We're
 already months past where we were hoping to release 1.2.0, so we want to
 avoid changes which might cause further delays.

 > std::string::data() is a pointer to already prepared buffer of BUFSIZ
 size (see the line # 158 in the attachment v 0.1.1, so I'm using
 std::string as a self-contained buffer which will be freed immediately
 after going out of scopes.

 That's simply not a valid thing to do.  It might happen to work with the
 current version of your compiler, but that really means nothing if the
 language standard doesn't permit it.  You need to use a temporary buffer
 and create (or assign to) the string object from that.  It's annoying that
 you have to have an extra copying step, but that's life.
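
 A minimal sketch of the temporary-buffer approach described above.  The
 routine fill_buffer() is a hypothetical stand-in for whatever C-style code
 writes the stemmed word; it is not part of Xapian or of the attached patch.

```cpp
#include <cstddef>
#include <cstdio>   // BUFSIZ
#include <cstring>
#include <string>

// Hypothetical stand-in for a C-style routine that writes a NUL-terminated
// result into a caller-supplied buffer and returns its length.  Here it just
// copies the input, truncating if needed.
static std::size_t fill_buffer(const char* word, char* out, std::size_t out_size) {
    std::size_t len = std::strlen(word);
    if (len >= out_size) len = out_size - 1;
    std::memcpy(out, word, len);
    out[len] = '\0';
    return len;
}

std::string stem_via_temp_buffer(const std::string& word) {
    // Write into a plain local buffer we own, rather than through the
    // pointer returned by std::string::data().
    char buf[BUFSIZ];
    std::size_t len = fill_buffer(word.c_str(), buf, sizeof buf);
    // The extra copy happens here, when the string is built from the buffer.
    return std::string(buf, len);
}
```

 The local buffer is released when the function returns, so the lifetime
 behaviour is the same as before; the only cost is the one extra copy.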

 > I didn't manage to build without copying create_s() function into my
 code - I would say it's a design issue of Xapian::Stem : why doesn't it a
 protected member of the class but an extern function? If I'm subclassing
 Xapian::Stem from outside the Xapian core/bindings (e.g. this extension
 module) I do not have it's implementation but only declaration. If you
 tell me how to get it to work without copying the code I'd gladly do it
 that way.

 Mostly just because of how the code evolved from Snowball's C support
 code, I guess.  But you don't need these Snowball-related structures for
 your stemmer, so including and initialising them there is just pointless
 overhead.

 Stepping back for a moment, I think a cleaner approach to allowing user-
 provided stemming algorithms would be:

   * Rename "Stem::Internal" to "Stem::SnowballImplementation".
   * Create an API-visible Stem::Implementation class.
     + Change Xapian::Stem to hold this in its internal pointer.
     + Make "Stem::SnowballImplementation" a subclass of this.
   * Add a new Xapian::Stem constructor to the API which takes
     "Stem::Implementation*".

 Then you can just create a MyHunspellStem subclass of
 "Xapian::Stem::Implementation", and wrap it in a Xapian::Stem object to
 pass to the Xapian API.  No need to grub around with Xapian's internals,
 and have your code break when we change how the Snowball stemmers work.
 You can just return a std::string instead of messing around with the
 Snowball support code's odd interface.  Should be easier to wrap for SWIG.
 And no need to have both an internal and wrapper class for each user
 defined stemmer.
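
 The proposal above could be sketched roughly as follows.  This is a
 self-contained mock in a made-up "sketch" namespace, not Xapian's actual
 code: the use of std::shared_ptr for the internal pointer, the operator()
 signature, and the toy stemming rule are all assumptions for illustration.

```cpp
#include <memory>
#include <string>

namespace sketch {  // hypothetical namespace; not the real Xapian headers

class Stem {
  public:
    // API-visible base class which user stemmers would subclass.
    class Implementation {
      public:
        virtual ~Implementation() {}
        // Return the stemmed form of word as a std::string.
        virtual std::string operator()(const std::string& word) = 0;
    };

    // New constructor taking an implementation; the Stem takes ownership.
    explicit Stem(Implementation* impl) : internal(impl) {}

    // Forward to the implementation; with no implementation, return the
    // word unchanged.
    std::string operator()(const std::string& word) const {
        return internal ? (*internal)(word) : word;
    }

  private:
    // Assumption: shared ownership via std::shared_ptr.
    std::shared_ptr<Implementation> internal;
};

// Toy user-defined stemmer standing in for MyHunspellStem.  It only strips
// a trailing "s"; a real one would call into Hunspell.
class MyHunspellStem : public Stem::Implementation {
  public:
    std::string operator()(const std::string& word) override {
        if (word.size() > 1 && word.back() == 's')
            return word.substr(0, word.size() - 1);
        return word;
    }
};

}  // namespace sketch
```

 With this shape, user code would wrap the subclass directly, e.g.
 "sketch::Stem stem(new sketch::MyHunspellStem);" and then call
 "stem(word)", without touching any Snowball support structures.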

 Would that meet your needs?

 This looks to me to only involve upwardly compatible API and ABI changes,
 so it could potentially even get backported to 1.0.x.  So I'm marking it
 for 1.2.x.

-- 
Ticket URL: <http://trac.xapian.org/ticket/448#comment:8>
Xapian <http://xapian.org/>