[Xapian-discuss] word context, numeric values, and characters
peter at peknet.com
Wed Dec 7 03:01:10 GMT 2005
I have a few assumptions about Xapian's features that I'd like to confirm.
I've read the API docs, some of the example .cc files, the mailing list, and the
wiki, and want to know if I'm understanding them correctly.
1. contextual data
The convention for storing contextual information about words (e.g., which HTML
tag they appear in, or which field/column in a db) is to prefix the term with
a string, and then map that string using add_prefix() or add_boolean_prefix()
when constructing a query.
For example, the HTML "<title>foo</title>" could be indexed as "Tfoo". A query
for "title:foo" would be parsed with add_prefix("title","T"), and that would
generate a match for "Tfoo".
Am I understanding that process correctly?
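To make the question concrete, here is a minimal sketch of the mapping I have in
mind. It does not call Xapian at all: map_prefixed_term() is a hypothetical
helper, and the lookup table stands in for whatever add_prefix() records
internally, which is an assumption on my part.

```cpp
#include <map>
#include <string>

// Hypothetical helper, not part of Xapian: rewrites "field:word" into
// "<prefix>word" using a field-to-prefix table.
std::string map_prefixed_term(const std::map<std::string, std::string>& prefixes,
                              const std::string& query_term) {
    const std::string::size_type colon = query_term.find(':');
    if (colon == std::string::npos) {
        return query_term;  // no field given, leave the term unprefixed
    }
    const std::string field = query_term.substr(0, colon);
    const std::string word = query_term.substr(colon + 1);
    const auto it = prefixes.find(field);
    if (it == prefixes.end()) {
        return query_term;  // unknown field, leave the input untouched
    }
    return it->second + word;
}
```

With a table that maps "title" to "T", the input "title:foo" comes back as
"Tfoo", which is the term I would expect to have been indexed.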
2. add_value() and set_data() require char* arguments; there is no support for
an int or other numeric value. How then does sorting work for numeric values?
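One workaround I can imagine, and this is only my assumption rather than
anything the docs confirm, is to store non-negative integers as zero-padded,
fixed-width strings so that string order matches numeric order. The
pad_number() helper below is hypothetical and uses only the standard library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Xapian: encodes a non-negative integer as a
// fixed-width, zero-padded decimal string so that comparing the strings gives
// the same order as comparing the numbers.
std::string pad_number(unsigned long n, std::size_t width) {
    const std::string digits = std::to_string(n);
    if (digits.size() > width) {
        // A longer string would break the fixed-width ordering guarantee.
        throw std::length_error("number does not fit in the requested width");
    }
    return std::string(width - digits.size(), '0') + digits;
}
```

Unpadded, "9" sorts after "10" as a string; padded to the same width, the two
values compare in numeric order. Negative and fractional values would need a
different encoding, which this sketch does not attempt.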
3. add_term() and add_posting() do not parse the passed char* string at all; it
is indexed as-is. Any parsing (stemming, splitting into words on non-word
characters) must happen before adding to the db. indextext.cc is one example in
the omega package of text parsing prior to adding to the db.
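As a rough sketch of the kind of pre-processing I mean (this is not what
indextext.cc does, and split_words() is a hypothetical name), the text would be
lowercased and split on non-alphanumeric characters before each resulting word
is handed to add_term() or add_posting(). Stemming is left out, and only ASCII
is handled.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, not part of Xapian or omega: lowercases the input and
// splits it into words at every non-alphanumeric character.
std::vector<std::string> split_words(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (const char ch : text) {
        // Cast first: passing a negative char to <cctype> functions is undefined.
        const unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalnum(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            words.push_back(current);  // a separator ends the current word
            current.clear();
        }
    }
    if (!current.empty()) {
        words.push_back(current);  // flush the last word
    }
    return words;
}
```

Each element of the returned vector is what I assume would then be passed,
unchanged, to the indexing call.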
Thanks in advance for clarifying.
Peter Karman . http://peknet.com/ . peter at peknet.com