[Xapian-discuss] word context, numeric values, and characters
Olly Betts
olly at survex.com
Wed Dec 7 03:11:32 GMT 2005
On Tue, Dec 06, 2005 at 09:01:10PM -0600, Peter Karman wrote:
> For example, the html "<title>foo</title>" could be indexed as "Tfoo". A
> query for "title:foo" would be parsed with add_prefix("title","T") and that
> would generate a match for "Tfoo".
>
> Am I understanding that process correctly?
Yes, that's spot on.
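To make the mapping concrete, here is a toy sketch using only the standard library. It is not Xapian's QueryParser; the function name map_field_term and the std::map standing in for the table that add_prefix() builds are assumptions made for illustration.

```cpp
#include <map>
#include <string>

// Toy stand-in for the field-to-prefix table; this is NOT Xapian's
// QueryParser, only an illustration of the mapping described above.
std::string map_field_term(const std::map<std::string, std::string>& prefixes,
                           const std::string& query_word) {
    std::string::size_type colon = query_word.find(':');
    if (colon == std::string::npos) return query_word;  // no field given
    auto it = prefixes.find(query_word.substr(0, colon));
    if (it == prefixes.end()) return query_word;        // unknown field
    // Replace "field:" with the term prefix registered for that field.
    return it->second + query_word.substr(colon + 1);
}
```

With a table containing {"title", "T"}, the query word "title:foo" maps to the term "Tfoo", which is the term the indexer stored for the title text.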
> 2. add_value() and set_data() require char* arguments; there is no support
> for an int or other numeric value. How then does sorting work for numeric
> values?
(Actually, they take std::string rather than char*, but you can pass a
char* or const char* and C++ will convert it automatically...)
If you want to set a numeric value, you'll need to convert it to a
string first (although a convenience overload which handled this for you
might be handy, particularly for add_value).
Currently numeric sorting isn't supported directly, though if you
left-pad the values with zeros or spaces to a fixed width you can get
the same effect with a string sort.
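A minimal sketch of the padding trick, assuming non-negative integer values. The helper name pad_number and the default width of 20 (enough for any 64-bit unsigned value) are hypothetical and not part of Xapian.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Xapian): encode an unsigned integer as a
// fixed-width, zero-padded decimal string so that plain string comparison
// orders values the same way numeric comparison would.
std::string pad_number(std::uint64_t n, std::size_t width = 20) {
    std::string digits = std::to_string(n);
    if (digits.size() > width) {
        // A value wider than the field would break the sort order.
        throw std::length_error("value does not fit in the chosen width");
    }
    return std::string(width - digits.size(), '0') + digits;
}
```

The result could then be stored with something like doc.add_value(slot, pad_number(n)). Without padding, "9" sorts after "10" as a string; with padding to four digits, "0009" sorts before "0010". Negative numbers would need a different encoding.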
The plan is to allow a user-specified sort functor (similar in style to
Xapian::MatchDecider).
> 3. add_term() and add_posting() do not parse the passed char* string at
> all; it is indexed as-is. Any parsing (stemming, splitting into words on
> non-word characters) must happen before adding to the db. indextext.cc is
> one example in the omega package of text parsing prior to adding to the db.
Yes.
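As a rough sketch of that kind of pre-processing, the function below splits text on non-alphanumeric characters, lowercases each word, and records its position. The name tokenize is hypothetical, it handles only single-byte characters, and it does no stemming; omega's indextext.cc does considerably more.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pre-processing step (not Xapian API): split text on
// non-alphanumeric characters, lowercase each word, and record its
// 1-based word position.
std::vector<std::pair<std::string, unsigned>> tokenize(const std::string& text) {
    std::vector<std::pair<std::string, unsigned>> terms;
    std::string word;
    unsigned pos = 0;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            word += static_cast<char>(std::tolower(ch));
        } else if (!word.empty()) {
            terms.emplace_back(word, ++pos);  // word ended at a separator
            word.clear();
        }
    }
    if (!word.empty()) terms.emplace_back(word, ++pos);  // trailing word
    return terms;
}
```

Each (term, position) pair could then be passed to doc.add_posting(term, position), with any stemming applied to the term first.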
Cheers,
Olly