[Xapian-discuss] word context, numeric values, and characters

Olly Betts olly at survex.com
Wed Dec 7 03:11:32 GMT 2005


On Tue, Dec 06, 2005 at 09:01:10PM -0600, Peter Karman wrote:
> For example, the html "<title>foo</title>" could be indexed as "Tfoo". A 
> query for "title:foo" would be parsed with add_prefix("title","T") and that 
> would generate a match for "Tfoo".
> 
> Am I understanding that process correctly?

Yes, that's spot on.

> 2. add_value() and set_data() require char* arguments; there is no support 
> for an int or other numeric value. How then does sorting work for numeric 
> values?

(Actually, they take std::string, not char*, but you can pass a char* or
const char* and C++ will convert it automatically.)

If you want to set a numeric value, you'll need to convert it to a
string first (although a convenience overload which handled this for you
might be handy, particularly for add_value).

Currently numeric sorting isn't supported directly, though if you
left-pad the values with zeros or spaces to a fixed width you can get the
same effect with a string sort.

The plan is to allow a user-specified sort functor (similar in style to
Xapian::MatchDecider).

> 3. add_term() and add_posting() do not parse the passed char* string at 
> all; it is indexed as-is. Any parsing (stemming, splitting into words on
> non-word characters) must happen before adding to the db. indextext.cc is 
> one example in the omega package of text parsing prior to adding to the db.

Yes.

Cheers,
    Olly


