[Xapian-discuss] New to Xapian (coming from Lucene)

Jeff Anderson captvanhalen at gmail.com
Fri Apr 13 17:48:02 BST 2007


On 4/13/07, Alexander Lind <malte at webstay.org> wrote:
> Are you sure you are not talking about values here? You can establish a
> value at say number 0, which is always say a product_id (linking the
> item to something in a database perhaps). And then number 1 could be the
> price of an item. Number 2 could be the weight.
> The numbers are not boosts, they are just there to let you establish a
> structure of values. You can use these values to sort results by price,
> sizes, popularity or whatnot.

Yeah, that's extremely clunky for my needs. I'd much rather work with an
API that lets me refer to product_id, price, and weight by name, not by
number. I know C++ has hashes. :)

This is where Perl could be used to provide a more useful interface to
Xapian. That is, someone (ahem :)) should write a CPAN module wrapper
around Search::Xapian that provides a bridge.
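A minimal sketch of such a bridge. It assumes only that the document object offers `add_value($slot, $value)` and `get_value($slot)`, as Search::Xapian::Document does. The package name `NamedValues` and its methods are hypothetical and not part of Search::Xapian. The package assigns slot numbers from a list of field names once, so the rest of the code only ever deals in names.

```perl
use strict;
use warnings;

# Hypothetical bridge (not part of Search::Xapian): maps field names to
# numeric value slots so callers never handle slot numbers themselves.
# Works with any document object that provides add_value($slot, $value)
# and get_value($slot).
package NamedValues;

sub new {
    my ($class, @names) = @_;
    my %slot;
    @slot{@names} = (0 .. $#names);    # first name -> slot 0, and so on
    return bless { slot => \%slot }, $class;
}

# Look up the slot for a field name; unknown names are an error.
sub slot_for {
    my ($self, $name) = @_;
    my $slot = $self->{slot}{$name};
    die "unknown field '$name'\n" unless defined $slot;
    return $slot;
}

# Store several named values on a document in one call.
sub set {
    my ($self, $doc, %fields) = @_;
    $doc->add_value($self->slot_for($_), $fields{$_}) for keys %fields;
    return $doc;
}

# Fetch one named value from a document.
sub get {
    my ($self, $doc, $name) = @_;
    return $doc->get_value($self->slot_for($name));
}

package main;
```

For example, `NamedValues->new(qw(product_id price weight))` puts `price` in slot 1. Calling `$fields->set($doc, price => '9.99')` and then `$fields->get($doc, 'price')` returns `'9.99'`. The slot order comes from the name list, so the same list has to be used at index time and at search time.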

Being able to access data via keys is SOOO much more flexible (for the
user). Check out this indexer script written against KinoSearch's API:

$indexer->spec_field(name => 'url',   analyzed => 0);
$indexer->spec_field(name => 'title', boost    => 4);
# Unanalyzed, so the exact ISBN can be used as a key for deletion below
$indexer->spec_field(name => 'isbn',  analyzed => 0);

for my $product (@product) {

    my %product = Workman::product_detail->to_hash(story => $product);
    my $doc     = $indexer->new_doc;

    # Remove any previously indexed copy of this product, keyed on its ISBN
    my $term = KinoSearch::Index::Term->new( 'isbn', $product{isbn_10} );
    $indexer->delete_docs_by_term($term);

    $doc->set_value( url   => $product{url} );
    $doc->set_value( title => $product{title} );
    $doc->set_value( isbn  => $product{isbn_10} );

    $indexer->add_doc($doc);

    print "Indexed $product{title}\n";
}

# Commit all additions and deletions to the index
$indexer->finish;

And now, when I need to display search results:

my $hits = $searcher->search( query => $query );

while ( my $hit = $hits->fetch_hit_hashref ) {
    printf "%s (%s)\n\n%s\n\n\n",
        $hit->{title},
        $hit->{url},
        $hit->{description},
    ;
}

I shouldn't be required to know that description is the third value,
or title is the first.

I think Xapian would greatly improve by providing such an API to its
users. Why do you think PHP is so much more popular than Perl? Because
it's easier to use. Period. Perhaps in the long run Xapian would be a
better solution, but I can be up and running with KinoSearch in the
time it took me to write this email. I'd still be writing plumbing
code to hook Xapian up to my site. :(

And all it takes is improving the API just a bit. Make new set_data()
and get_data() methods that take optional keys. Still want to use your
own JSON data structure? Fine ... don't supply a key.
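Here is one way keyed methods like these could be layered on without changing Xapian itself. Xapian stores a single opaque data string per document, so a wrapper can keep a hash inside that string. This sketch assumes `set_data($string)` and `get_data()` as provided by Search::Xapian::Document. The helper names (`set_data_fields`, `get_data_hash`, `get_data_field`) and the JSON encoding via the core JSON::PP module are illustrative choices, not Xapian API.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helpers layering keyed access on top of a document's single
# data string. Assumes the document object provides set_data($string) and
# get_data(). JSON is an arbitrary choice here; Xapian imposes no format.
my $json = JSON::PP->new->canonical(1);    # sorted keys: stable output

# Decode the whole data string; an empty string means no fields yet.
sub get_data_hash {
    my ($doc) = @_;
    my $raw = $doc->get_data;
    return {} unless defined $raw && length $raw;
    return $json->decode($raw);
}

# Merge %fields into whatever keyed data the document already holds.
sub set_data_fields {
    my ($doc, %fields) = @_;
    my $current = get_data_hash($doc);
    @{$current}{ keys %fields } = values %fields;
    $doc->set_data($json->encode($current));
    return $doc;
}

# Fetch a single field by key (undef if absent).
sub get_data_field {
    my ($doc, $key) = @_;
    return get_data_hash($doc)->{$key};
}
```

A document whose data was written in some other format will make `get_data_hash` die, so keyed and unkeyed use should not be mixed on the same document. Callers who want their own structure keep calling `set_data` and `get_data` directly.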

I just don't see why the Xapian API wouldn't supply such. :(



-- 
jeffa
