[Xapian-discuss] Re: Best way to index relational table
Michael Schlenker
schlenker at isn-oldenburg.de
Mon Jul 24 16:56:50 BST 2006
Sebastian Araya wrote:
> Michael Schlenker <schlenk <at> isn-oldenburg.de> writes:
>
>> Sebastian Araya wrote:
>>> Hi!
>>>
>>>
>>> I'm facing the following issue: I need to index with xapian a relational
>>> table, which has a few enumerated fields and two text fields:
>>>
>>> E1    E2    E3       T1    T2
>>> ------------------   ----------
>>> Enumerated           Text
>>>
>>> I want to index the table in order to narrow by specific field (e.g. for a
>>> given text in T1 and by enumerated fields E2 and E3). For example, if E2 is
>>> author's code, E3 is theme's code and T1 is the chapter text, I want to
>>> issue:
>>> "there and back again author:tolkien theme:fantasy"
>>>
>>> I was playing with Lucene and there is an api call to index by term like
>>> Lucene::Keyword and Lucene::Text to index text (contents), so I can do the
>>> following:
>>>
>>> addField( Lucene::Keyword, 'author:tolkien' )
>>> addField( Lucene::Keyword, 'theme:fantasy' )
>>> addField( Lucene::Text, <textofthebook> )
>>>
>>> I think it is possible in Xapian to index termnames and termdata, but I
>>> haven't found the right way... could you give an example or a little sample?
>> Yes, you can do that easily with Xapian. Depending on your needs a
>> combination of a standard relational table for keywords and xapian for
>> the fulltext fields could be the best solution.
>>
>> For the Tcl binding for example that translates to:
>>
>> xapian::Document doc
>> doc add_term author:Tolkien
>> doc add_term theme:fantasy
>>
> Hello Michael,
>
> thanks for your quick answer!
>
> I'm working in PHP, so I will ask you some terrible beginner's questions about
> Tcl... when you issue:
>
> doc add_term author:Tolkien
> doc add_term theme:fantasy
> doc set_data $textOfTheBook
>
> 'author:Tolkien' and 'theme:fantasy' are treated as strings, right?
Right. Tcl does not need "" when the string does not contain whitespace.
> And $textOfTheBook is a variable which holds the text (ok?).
Nearly. $textOfTheBook is the value of the variable, not the variable
itself; PHP does not make the distinction between value and variable as
clear.
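A small plain-Tcl sketch of both points, in case it helps when translating
to PHP. It needs no Xapian at all; the variable name textOfTheBook is taken
from the example above, and the sample text is made up for illustration.

```tcl
# A bare word without whitespace is already a string; the quotes are optional.
set a author:Tolkien
set b "author:Tolkien"
puts [string equal $a $b]        ;# prints 1

# "set" takes the variable NAME; $name substitutes the variable's VALUE.
set textOfTheBook "In a hole in the ground there lived a hobbit."
puts $textOfTheBook              ;# prints the text itself
puts textOfTheBook               ;# prints the literal word textOfTheBook
```

So a command such as doc set_data $textOfTheBook receives the text, never the
variable.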
> So, when I want to
> perform a search I could issue:
>
> there and back again author:Tolkien theme:fantasy
>
> Now, suppose that I need to narrow my search to two themes or categories,
> like 'theme:fantasy' and 'theme:filology'. How can I create the query? I
> think this won't work:
>
> $query = new_Query( Query_OP_AND, 'there and back again author:Tolkien
> theme:fantasy OR theme:filology' );
>
> The other question concerns my whole problem, R(E1,E2,E3,T1,T2): I need
> to be able to query T1 only, T2 only, or both T1 and T2. But T1 and T2
> aren't indexed with termnames.
>
> Let me explain again: E1, E2, E3 are enumerated fields (author: isbn: theme:
> etc.) and T1 and T2 are text fields (like chapter text and editor's commentaries).
> So, how can I properly index this table in order to search in any of the text
> fields (but without overlaps)?
Sure, I know what you mean.
I do something quite similar for a small XML file (a list of Tcl packages
at http://www.flightlab.com/~jenglish/gutter/ ).
You can search it here (not polished or anything, it is just a proof of
concept that I still need time to work on):
http://physnet.physik.uni-oldenburg.de/cgi-bin/gsearch.cgi
A commandline version of the search illustrates how the query is done;
see the part where add_boolean_prefix is called. It should be easy to
translate this to the PHP binding (which I don't use, although I have to
use PHP daily; Tcl is just nicer and easier for many things).
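Before the script itself, a rough plain-Tcl illustration of the idea behind
add_boolean_prefix: a field name typed by the user is mapped to a term
prefix, and the resulting term is used as a filter. This is only a sketch
of the concept. splitQuery is a hypothetical helper, not a Xapian call, and
the real QueryParser does far more (stemming, operators, quoting). The
prefix map mirrors the one used in the script.

```tcl
# Hypothetical helper, NOT part of Xapian: split a query string into
# prefixed filter terms and the remaining free-text words.
proc splitQuery {query prefixMap} {
    set filters {}
    set words {}
    foreach word [split $query] {
        if {$word eq ""} continue
        set idx [string first ":" $word]
        if {$idx > 0} {
            set field [string range $word 0 [expr {$idx - 1}]]
            set value [string range $word [expr {$idx + 1}] end]
            if {[dict exists $prefixMap $field]} {
                # Known field: emit the index prefix followed by the value.
                lappend filters [dict get $prefixMap $field]$value
                continue
            }
        }
        lappend words $word
    }
    return [list $filters $words]
}

set prefixMap {author A: license L: package P: doc D:}
lassign [splitQuery "there and back again author:tolkien" $prefixMap] filters words
puts $filters   ;# A:tolkien
puts $words     ;# there and back again
```

The point to take away is that the indexer has to store terms with exactly
the prefixes the parser is told about, otherwise the filters match nothing.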
package require xapian

# Note: $dbfile and the log_error/log_info/log_debug procs are not defined
# in this excerpt; they come from the rest of the script.

# Open the database read-only and create the stemmer used by the parser.
proc openIndexDatabase {file} {
    xapian::Database xapiandb $file
    xapian::Stem estem "english"
    return xapiandb
}

# Destroy the database object created by openIndexDatabase.
proc closeIndexDatabase {db} {
    $db -delete
}

if {[llength $argv] == 0} {
    log_error "Empty commandline"
    exit 1
}

set db [openIndexDatabase $dbfile]
xapian::Enquire enquire $db

log_info "Commandline is $argv"
log_debug "Building query"
xapian::QueryParser qparse
set qp qparse
$qp set_database $db
$qp set_stemmer estem
# Map the field names a user types (author:, license:, ...) onto the
# term prefixes used at index time; these act as boolean filters.
$qp add_boolean_prefix "author" "A:"
$qp add_boolean_prefix "license" "L:"
$qp add_boolean_prefix "package" "P:"
$qp add_boolean_prefix "doc" "D:"
set query [$qp parse_query [join $argv]]
log_debug "Performing query [$query get_description]"
enquire set_query $query

# Fetch the first ten matches and print docid, percentage and document data.
set matches [enquire get_mset 0 10]
log_info "[$matches get_matches_estimated] results found"
for {set i [$matches begin]} {![$i equals [$matches end]]} {$i next} {
    xapian::Document document [$i get_document]
    puts [format {ID %s %s%% [%s]} \
        [$i get_docid] [$i get_percent] [document get_data]]
}

closeIndexDatabase $db
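
The script above only covers searching. For the indexing side of the
R(E1,E2,E3,T1,T2) question, one possible approach is to give every field
its own term prefix, so that T1 and T2 can be searched separately without
overlapping. The sketch below only builds the term list in plain Tcl. The
rowTerms proc, the K:, T1: and T2: prefixes and the sample row are all
assumptions made for illustration (only A: appears in the script). With the
binding loaded, each term could then be handed to doc add_term as in the
earlier example.

```tcl
# Hypothetical sketch: build the list of index terms for one table row.
# Enumerated fields become one prefixed term each (usable as filters);
# every word of a text field gets that field's own prefix.
proc rowTerms {row} {
    set terms {}
    foreach {field prefix} {author A: theme K:} {
        lappend terms $prefix[dict get $row $field]
    }
    foreach {field prefix} {t1 T1: t2 T2:} {
        # Crude tokenizer: runs of letters and digits, lowercased.
        foreach word [regexp -all -inline {[[:alnum:]]+} [dict get $row $field]] {
            lappend terms $prefix[string tolower $word]
        }
    }
    return $terms
}

# Made-up sample row.
set row [dict create author tolkien theme fantasy \
    t1 "There and back again" t2 "Notes by the editor"]
foreach term [rowTerms $row] {
    puts $term
}
```

On the query side the enumerated fields would be registered with
add_boolean_prefix as shown, while for the text fields the non-boolean
add_prefix method of the QueryParser is probably the better fit, since
those terms should contribute to ranking rather than act as filters.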