[Xapian-discuss] Re: Best way to index relational table

Michael Schlenker schlenker at isn-oldenburg.de
Mon Jul 24 16:56:50 BST 2006


Sebastian Araya wrote:
> Michael Schlenker <schlenk <at> isn-oldenburg.de> writes:
> 
>> Sebastian Araya wrote:
>>> Hi!
>>>
>>>
>>>   I'm facing the following issue: I need to index with xapian a relational 
>>> table, which has a few enumerated fields and two text fields:
>>>
>>> E1    E2    E3    T1    T2
>>> --------------    --------
>>>   Enumerated        Text
>>>
>>>   I want to index the table in order to narrow by specific field (e.g. for a
>>> given text in T1 and by enumerated fields E2 and E3). For example, if E2 is 
>>> author's code, E3 is theme's code and T1 is the chapter text, I want to issue:
>>> "there and back again author:tolkien theme:fantasy"
>>>
>>>   I was playing with Lucene and there is an api call to index by term like 
>>> Lucene::Keyword and Lucene::Text to index text (contents), so I can do the 
>>> following:
>>>
>>>   addField( Lucene::Keyword, 'author:tolkien' )
>>>   addField( Lucene::Keyword, 'theme:fantasy' )
>>>   addField( Lucene::Text, <textofthebook> )
>>>
>>>   I think it is possible in Xapian to index termnames and termdata, but I
>>> haven't found the right way... could you give an example or a little sample?
>> Yes, you can do that easily with Xapian. Depending on your needs a
>> combination of a standard relational table for keywords and xapian for
>> the fulltext fields could be the best solution.
>>
>> For the Tcl binding for example that translates to:
>>
>> xapian::Document doc
>> doc add_term author:Tolkien
>> doc add_term theme:fantasy
>>
> Hello Michael,
> 
>   thanks for your quick answer!
> 
>   I'm working in PHP, so I will ask you some terrible beginner's questions
> about Tcl... when you issue:
> 
> doc add_term author:Tolkien
> doc add_term theme:fantasy
> doc set_data $textOfTheBook
> 
>   'author:Tolkien' and 'theme:fantasy' are treated as strings, right? 
Right. Tcl does not need quotes ("") when the string contains no whitespace.

> And $textOfTheBook is a variable which holds the text (ok?). 
Nearly. $textOfTheBook is the value of the variable, not the variable
itself; PHP does not make the distinction between value and variable as
clear.

> So, when I want to 
> perform a search I could issue:
> 
> there and back again author:Tolkien theme:fantasy
> 
>   Now, suppose that I need to narrow my search to two themes or categories, 
> like 'theme:fantasy' and 'theme:filology'. How can I create the query? I 
> don't think this will work:
> 
> $query = new_Query( Query_OP_AND, 'there and back again author:Tolkien 
> theme:fantasy OR theme:filology' );

> 
>   And the other question is: consider my entire problem, R(E1,E2,E3,T1,T2).
> Now I need to specify a query in T1, in T2, or in both T1 and T2. But T1 and
> T2 aren't indexed with termnames. 
> 
>   Let me explain again: E1, E2, E3 are enumerated fields (author: isbn: theme: 
> etc.) and T1 and T2 are text fields (like chapter text and editor's
> commentaries). So, how can I properly index this table in order to search in
> any of the text fields (but without overlaps)?
Sure, I know what you mean.
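
One way to keep the fields apart at index time is to give every field its
own term prefix. The following is only a sketch in plain Tcl (8.5 or later,
because of dict), not the actual indexing code: rowToTerms is a hypothetical
helper, and the prefixes A:, T:, XC: and XM: as well as the field names
chapter and comment are assumptions chosen for illustration. Each returned
term would then be handed to doc add_term.

```tcl
# Hypothetical helper: turn one table row (a dict) into a list of index terms.
# All prefixes and field names here are illustrative assumptions.
proc rowToTerms {row} {
    set terms {}
    # Enumerated fields become one exact-match term each.
    foreach {field prefix} {author A: theme T:} {
        if {[dict exists $row $field]} {
            lappend terms ${prefix}[string tolower [dict get $row $field]]
        }
    }
    # Every text field gets its own prefix, so the same word found in
    # two different fields yields two different terms.
    foreach {field prefix} {chapter XC: comment XM:} {
        if {![dict exists $row $field]} continue
        set words [regexp -all -inline {[[:alnum:]]+} [dict get $row $field]]
        foreach word $words {
            lappend terms ${prefix}[string tolower $word]
        }
    }
    return $terms
}

set row {author Tolkien theme fantasy
         chapter "There and back again" comment "Notes by the editor"}
puts [rowToTerms $row]
# prints: A:tolkien T:fantasy XC:there XC:and XC:back XC:again XM:notes XM:by XM:the XM:editor
```

With separate prefixes, a word that occurs only in the commentary never
matches a search restricted to the chapter text. On the query side the text
prefixes have to be registered with the query parser as well; Xapian's
QueryParser offers add_prefix for free-text fields next to
add_boolean_prefix, but check how the binding you use exposes it.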

I do something quite similar for a small XML file (a Tcl packages list
at http://www.flightlab.com/~jenglish/gutter/ ).

You can search it here (not polished or anything, it is just a proof of
concept that I need time to work on):
http://physnet.physik.uni-oldenburg.de/cgi-bin/gsearch.cgi

A command-line version of the search illustrates how the query is built;
see the part where add_boolean_prefix is called. It should be easy to
translate this to the PHP binding (which I don't use, although I have to
use PHP daily; Tcl is just nicer and easier for many things).

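# Excerpt from a larger script: dbfile and the log_error, log_info and
# log_debug procs are assumed to be defined elsewhere.

# Open the index and create the English stemmer used by the query parser.
proc openIndexDatabase {file} {
    xapian::Database xapiandb $file
    xapian::Stem estem "english"
    return xapiandb
}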

proc closeIndexDatabase {db} {
    $db -delete
}

if {[llength $argv] == 0} {
  log_error "Empty commandline"
  exit 1
}
set db [openIndexDatabase $dbfile]
xapian::Enquire enquire $db
log_info "Commandline is $argv"
log_debug "Building query"
xapian::QueryParser qparse
set qp qparse
$qp set_database $db
$qp set_stemmer estem
$qp add_boolean_prefix "author" "A:"
$qp add_boolean_prefix "license" "L:"
$qp add_boolean_prefix "package" "P:"
$qp add_boolean_prefix "doc" "D:"
set query [$qp parse_query [join $argv]]

log_debug "Performing query [$query get_description]"

enquire set_query $query
set matches [enquire get_mset 0 10]
log_info "[$matches get_matches_estimated] results found"

for {set i [$matches begin]} {![$i equals [$matches end]]} {$i next} {
        xapian::Document document [$i get_document]
        puts [format {ID %s %s%% [%s]} \
             [$i get_docid] [$i get_percent] [document get_data]]
}

closeIndexDatabase $db
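
For the question about narrowing to two themes, one option is to leave the
combining to the query parser. This is a minimal sketch, assuming a theme
prefix has been registered with add_boolean_prefix like the ones above (the
script itself registers no such prefix), and assuming the query parser ORs
boolean filters that share a prefix while ANDing filters with different
prefixes. Current Xapian versions document that behaviour, but verify it
against the version you run. buildQueryString is a hypothetical helper.

```tcl
# Hypothetical helper: append prefix:value filters to a free-text query.
# filters is a flat list of prefix/value pairs; a prefix may repeat.
proc buildQueryString {text filters} {
    set parts [list $text]
    foreach {prefix value} $filters {
        lappend parts "${prefix}:${value}"
    }
    return [join $parts " "]
}

set qs [buildQueryString "there and back again" \
    {author Tolkien theme fantasy theme filology}]
puts $qs
# prints: there and back again author:Tolkien theme:fantasy theme:filology
```

The resulting string can be passed to $qp parse_query in place of
[join $argv].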






