[Xapian-tickets] [Xapian] #448: Allow usage of custom stemmers
Xapian
nobody at xapian.org
Fri Feb 19 03:35:01 GMT 2010
#448: Allow usage of custom stemmers
-------------------------+--------------------------------------------------
Reporter: esizikov | Owner: olly
Type: enhancement | Status: reopened
Priority: normal | Milestone: 1.2.x
Component: Library API | Version: 1.0.17
Severity: normal | Resolution:
Keywords: | Blockedby:
Platform: All | Blocking:
-------------------------+--------------------------------------------------
Changes (by olly):
* milestone: => 1.2.x
Comment:
I've no useful idea what the problem with SWIG is, I'm afraid. You're
doing some unusual hacking around with types here, which may be what is
confusing it.
> It's OK to put ABI changes into trunk only if I get it in a stable
> release in the near future
I'm afraid at this point in the release cycle, you aren't likely to get
ABI changes for a new feature into anything other than trunk. There would
have to be a compelling reason and a clean patch ready for merging. 1.0.x is
ABI stable, and we're at the release candidate stage for 1.2.0. We're
already months past where we were hoping to release 1.2.0, so we want to
avoid changes which might cause further delays.
> std::string::data() is a pointer to an already prepared buffer of BUFSIZ
> size (see line # 158 in attachment v 0.1.1), so I'm using std::string
> as a self-contained buffer which will be freed immediately after it goes
> out of scope.
That's simply not a valid thing to do. It might happen to work with the
current version of your compiler, but that really means nothing if the
language standard doesn't permit it. You need to use a temporary buffer
and create (or assign to) the string object from that. It's annoying that
you have to have an extra copying step, but that's life.
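A minimal C++ sketch of the temporary-buffer approach, assuming a C-style routine that writes its result into a caller-supplied buffer. `legacy_stem()` and `stem_word()` are hypothetical names used only for illustration, and the placeholder "stemming" just strips a trailing "ing".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>   // BUFSIZ
#include <cstring>
#include <string>
#include <vector>

// Hypothetical C-style routine standing in for a stemmer library that writes
// its result into a caller-supplied buffer and returns the bytes written.
// Placeholder logic: strip a trailing "ing".
static std::size_t legacy_stem(const char* word, char* out, std::size_t out_size) {
    std::size_t len = std::strlen(word);
    if (len > 3 && std::strcmp(word + len - 3, "ing") == 0) len -= 3;
    if (len > out_size) len = out_size;  // never write past the buffer
    std::memcpy(out, word, len);
    return len;
}

// Don't write through std::string::data() (it returns const char* before
// C++17). Instead, fill a temporary buffer we own, then copy it into the
// std::string we return.
std::string stem_word(const std::string& word) {
    std::vector<char> buf(BUFSIZ);  // freed automatically at end of scope
    std::size_t n = legacy_stem(word.c_str(), buf.data(), buf.size());
    assert(n <= buf.size());
    return std::string(buf.data(), n);  // the extra copying step
}
```

With this placeholder logic `stem_word("running")` returns "runn"; the point is only that the string is constructed from a buffer the code legitimately owns, at the cost of one copy.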
> I didn't manage to build without copying the create_s() function into my
> code. I would say it's a design issue of Xapian::Stem: why isn't it a
> protected member of the class rather than an extern function? If I'm
> subclassing Xapian::Stem from outside the Xapian core/bindings (e.g. this
> extension module) I don't have its implementation, only its declaration.
> If you tell me how to get it to work without copying the code I'd gladly
> do it that way.
Mostly just because of how the code evolved from Snowball's C support code,
I guess. But you don't need these Snowball-related structures for your
stemmer, so it's just pointless overhead to have them in your stemmer and
to initialise them.
Stepping back for a moment, I think a cleaner approach to allowing
user-provided stemming algorithms would be:
 * Rename "Stem::Internal" to "Stem::SnowballImplementation".
 * Create an API-visible Stem::Implementation class.
   + Change Xapian::Stem to hold this in its internal pointer.
   + Make "Stem::SnowballImplementation" a subclass of this.
 * Add a new Xapian::Stem constructor to the API which takes a
   "Stem::Implementation*".
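The proposed layout might look roughly like the following self-contained C++ sketch. It is not Xapian's actual code: the classes live in a hypothetical `sketch` namespace, `std::shared_ptr` stands in for Xapian's own reference counting, and the Snowball subclass is a stub that returns words unchanged.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical sketch of the proposed class layout (not a released API).
namespace sketch {

class Stem {
  public:
    // API-visible base class that user-provided stemmers would derive from.
    class Implementation {
      public:
        virtual ~Implementation() {}
        virtual std::string operator()(const std::string& word) = 0;
        virtual std::string get_description() const = 0;
    };

    // Renamed from Stem::Internal: the Snowball stemmers become just one
    // subclass of Implementation. Stub body for illustration only.
    class SnowballImplementation : public Implementation {
        std::string language;
      public:
        explicit SnowballImplementation(const std::string& lang) : language(lang) {}
        std::string operator()(const std::string& word) override {
            return word;  // placeholder: real code would run the Snowball algorithm
        }
        std::string get_description() const override {
            return "Snowball(" + language + ")";
        }
    };

    Stem() {}  // no implementation: words pass through unchanged
    explicit Stem(const std::string& language)
        : internal(std::make_shared<SnowballImplementation>(language)) {}
    // The proposed new constructor: Stem takes ownership of a user implementation.
    explicit Stem(Implementation* p) : internal(p) { assert(p != nullptr); }

    std::string operator()(const std::string& word) const {
        if (!internal) return word;
        return (*internal)(word);  // delegate through the internal pointer
    }
    std::string get_description() const {
        return internal ? "sketch::Stem(" + internal->get_description() + ")"
                        : "sketch::Stem()";
    }

  private:
    std::shared_ptr<Implementation> internal;  // Snowball or user-provided
};

}  // namespace sketch
```

Because Xapian::Stem only ever talks to the abstract base through its internal pointer, the Snowball stemmers and user-supplied ones become interchangeable, and existing callers of the language-name constructor are unaffected.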
Then you can just create a MyHunspellStem subclass of
"Xapian::Stem::Implementation" and wrap it in a Xapian::Stem object to
pass to the Xapian API. There's no need to grub around in Xapian's
internals, and no risk of your code breaking when we change how the
Snowball stemmers work. You can just return a std::string instead of
messing around with the Snowball support code's odd interface. It should
also be easier to wrap with SWIG, and there's no need to have both an
internal and a wrapper class for each user-defined stemmer.
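From the user's side, a custom stemmer under this proposal might be sketched as below. The minimal `sketch::Stem` declaration is a hypothetical stand-in for the proposed API so the example compiles on its own, `MyHunspellStem`'s body is a placeholder (lowercasing and stripping a trailing "s") rather than a real Hunspell lookup, and `stem_for_index()` is likewise a hypothetical caller.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Minimal stand-in for the proposed API (hypothetical, not a released
// Xapian interface), declared here so the example is self-contained.
namespace sketch {
class Stem {
  public:
    class Implementation {
      public:
        virtual ~Implementation() {}
        virtual std::string operator()(const std::string& word) = 0;
        virtual std::string get_description() const = 0;
    };
    explicit Stem(Implementation* p) : internal(p) { assert(p != nullptr); }
    std::string operator()(const std::string& word) const { return (*internal)(word); }
  private:
    std::shared_ptr<Implementation> internal;  // shared between copies
};
}  // namespace sketch

// A user-defined stemmer: no Snowball structures, no create_s(), and it
// simply returns a std::string. A real version would presumably consult a
// Hunspell dictionary; this placeholder lowercases ASCII letters and
// strips one trailing "s".
class MyHunspellStem : public sketch::Stem::Implementation {
  public:
    std::string operator()(const std::string& word) override {
        std::string result = word;
        for (char& c : result)
            if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
        if (result.size() > 1 && result.back() == 's') result.pop_back();
        return result;
    }
    std::string get_description() const override { return "MyHunspellStem"; }
};

// Hypothetical caller standing in for any API that accepts a Stem.
std::string stem_for_index(const sketch::Stem& stemmer, const std::string& word) {
    return stemmer(word);
}
```

Usage: `sketch::Stem stem(new MyHunspellStem);` followed by `stem_for_index(stem, "Cats")` returns "cat". Since copies of the wrapper share the implementation pointer, it can be passed around by value just like a Snowball-based stemmer.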
Would that meet your needs?
This looks to me to only involve upwardly compatible API and ABI changes,
so it could potentially even be backported to 1.0.x. I'm therefore
marking it for 1.2.x.
--
Ticket URL: <http://trac.xapian.org/ticket/448#comment:8>
Xapian <http://xapian.org/>