[Xapian-discuss] some sample perl code
Dan Carpenter
error27 at gmail.com
Sun Jan 29 01:53:01 GMT 2006
I had a hard time finding sample perl code, so here is mine. Instead
of storing each word's real position, it just stores the number of the
line the word appears on.
One thing that puzzled me is I'm not sure about the error handling on
set_data() and add_posting(). Wouldn't they normally return 0 on
failure so that you could say set_data() or warn "set_data() failed"?
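A possible answer, offered as a sketch rather than a statement about the bindings: as far as I can tell, Search::Xapian reports failures by throwing an exception (die) instead of returning a status, so the usual Perl idiom would be to trap the call with eval. The helper try_call() and the stand-in fake_set_data() below are hypothetical names used only for illustration. With the real module you would pass something like sub { $doc->set_data($name) }.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a code ref, trap any exception, and return
# 1 on success or 0 (after warning) on failure.
sub try_call {
    my ($what, $code) = @_;
    my $ok = eval { $code->(); 1 };
    warn "$what failed: $@" unless $ok;
    return $ok ? 1 : 0;
}

# Stand-in for a method that dies on bad input and returns nothing
# useful on success. This is NOT part of Search::Xapian.
sub fake_set_data {
    my ($data) = @_;
    die "empty data\n" unless length $data;
    return;
}

try_call('set_data()', sub { fake_set_data('/some/file') })
    or warn "unexpected failure\n";
try_call('set_data()', sub { fake_set_data('') })
    or print "second call was trapped\n";
```

The "1" at the end of the eval block matters: it lets a call that succeeds but returns nothing still count as success, which a plain "or warn" on the return value cannot do.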
I'm really impressed with Xapian. I'm using the flint backend and
it's really fast.
regards,
dan carpenter
#=========================
#!/usr/bin/perl -w
use strict;
use Search::Xapian;
use File::Find;

my $DATABASE_DIR = '/home/dcarpenter/tmp/firm';

my $db = Search::Xapian::WritableDatabase->new($DATABASE_DIR,
        Search::Xapian::DB_CREATE_OR_OPEN) or
    die "can't create write-able db object: $!\n";

my $dir = shift;
if (!$dir) {
    print "usage: index_data.pl <dir>\n";
    exit(1);
}

my $count = 0;

find(\&index, $dir);

sub index {
    # only index regular text files
    return unless -f $_ && -T $_;

    my $file = $_;
    my $doc = Search::Xapian::Document->new()
        or die "can't create doc object for $file: $!\n";

    # Assumption: set_data() and add_posting() do not return a status;
    # failures are raised as exceptions, so trap them with eval rather
    # than testing the return value.
    eval { $doc->set_data($File::Find::name); 1 }
        or warn "can't set_data in doc object for $file: $@\n";

    open(my $fh, '<', $file) or do {
        warn "can't open $File::Find::name: $!\n";
        return;
    };

    # use the line number as the position of every word on that line
    my $line = 1;
    while (<$fh>) {
        # strip leading/trailing non-word characters so that split()
        # never yields an empty term
        s/^\W+//;
        s/\W+$//;
        foreach my $word (split(/\W+/, $_)) {
            eval { $doc->add_posting($word, $line); 1 }
                or warn "can't add word $word $line: $@\n";
        }
        $line++;
    }
    close($fh);

    # only count the file if it really made it into the database
    eval { $db->add_document($doc); 1 } or do {
        warn "failed to add document: $file: $@\n";
        return;
    };

    $count++;
    if ($count % 500 == 0) {
        print "$count files indexed\n";
    }
}

print "Total: $count files indexed\n";
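
To make the line-number-as-position idea concrete, here is a small standalone sketch that tokenizes text the same way the script does and prints the (word, line) pairs that would be handed to add_posting(). It does not touch Xapian at all. The helper name postings_for() and the sample text are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [word, line_number] pairs for a block of
# text, splitting on non-word characters as the indexing script does.
sub postings_for {
    my ($text) = @_;
    my @postings;
    my $line = 1;
    foreach my $row (split /\n/, $text) {
        # strip leading/trailing non-word characters first
        $row =~ s/^\W+//;
        $row =~ s/\W+$//;
        push @postings, [$_, $line] foreach split /\W+/, $row;
        $line++;    # blank lines still advance the line counter
    }
    return @postings;
}

my @postings = postings_for("hello, world\n\n  foo bar!\n");
print "$_->[0] $_->[1]\n" foreach @postings;
# prints:
#   hello 1
#   world 1
#   foo 3
#   bar 3
```

Because every word on a line shares one position, a phrase search cannot distinguish word order within a line, but proximity between lines is still meaningful.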