[Xapian-discuss] Best way to index relational table
Michael Schlenker
schlenk at isn-oldenburg.de
Mon Jul 24 15:31:53 BST 2006
Sebastian Araya wrote:
> Hi!
>
>
> I'm facing the following issue: I need to index with xapian a relational
> table, which has a few enumerated fields and two text fields:
>
>   E1  E2  E3     T1  T2
>   ----------     ------
>   Enumerated     Text
>
> I want to index the table in order to narrow by specific field (e.g. for a
> given text in T1 and by enumerated fields E2 and E3). For example, if E2 is
> author's code, E3 is theme's code and T1 is the chapter text, I want to issue:
>
> "there and back again author:tolkien theme:fantasy"
>
> I was playing with Lucene and there is an api call to index by term like
> Lucene::Keyword and Lucene::Text to index text (contents), so I can do the
> following:
>
> addField( Lucene::Keyword, 'author:tolkien' )
> addField( Lucene::Keyword, 'theme:fantasy' )
> addField( Lucene::Text, <textofthebook> )
>
> I think it is possible in Xapian to index term names and term data, but I
> haven't found the right way. Could you give an example or a small sample?
Yes, you can do that easily with Xapian. Depending on your needs, a
combination of a standard relational table for the keywords and Xapian for
the fulltext fields could be the best solution.
With the Tcl binding, for example, that translates to:
xapian::Document doc
doc add_term author:tolkien
doc add_term theme:fantasy
# Not sure whether Lucene::Text only stores the text or also indexes it
# by breaking it down into terms.
# set_data only stores the fulltext; it does not break it down into terms.
# The examples dir has a proc that does the indexing.
doc set_data $textOfTheBook
You simply prefix your category values with a unique prefix that is not
used in normal terms. In this example you would disallow ':' in normal
terms; uppercase prefixes are another possibility if you lowercase all
your terms while indexing.
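The uppercase-prefix variant can be sketched without any Xapian calls. This is only an illustration of the naming scheme, written in plain Python: the function name build_terms, the PREFIXES mapping, and the particular prefixes "A" and "XTHEME" are assumptions made for the example, not something Xapian requires. The returned strings are what you would pass to add_term one by one.

```python
import re

# Hypothetical mapping from enumerated column names to term prefixes.
# Any uppercase strings work, as long as normal terms are lowercased.
PREFIXES = {"author": "A", "theme": "XTHEME"}


def build_terms(enumerated, text):
    """Return the list of index terms for one table row."""
    terms = []
    for field, value in enumerated.items():
        # Each enumerated value becomes a single prefixed term.
        terms.append(PREFIXES[field] + value.lower())
    # Free text is lowercased and split into word terms, so no text term
    # can ever start with one of the uppercase prefixes.
    terms.extend(re.findall(r"[a-z0-9]+", text.lower()))
    return terms


if __name__ == "__main__":
    row = {"author": "Tolkien", "theme": "fantasy"}
    for term in build_terms(row, "There and Back Again"):
        print(term)
```

Because every text term is lowercased before it is added, a word such as "author" in the chapter text ends up as the term "author" and can never collide with the filter term "Atolkien".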
Michael