[Xapian-discuss] some sample perl code

Dan Carpenter error27 at gmail.com
Sun Jan 29 01:53:01 GMT 2006


I had a hard time finding sample Perl code, so here is mine.  Instead
of storing each word's real position, it just stores the line number
as the position.
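For comparison, here is a minimal pure-Perl sketch of assigning real word positions instead of line numbers. A running counter increases once per word across the whole text. The helper name `word_positions` is hypothetical; in an indexer, each pair would be passed to `$doc->add_posting($word, $pos)`. With line numbers, two adjacent words on the same line share one position instead of getting consecutive ones, so phrase searches won't behave as expected.

```perl
use strict;
use warnings;

# Split each line into words the same way the indexer does (runs of \W
# separate words) and pair every word with its position in the whole
# text, counting from 1, rather than with its line number.
sub word_positions {
    my @pairs;
    my $pos = 0;
    foreach my $line (@_) {
        # grep drops the empty leading field split returns when a line
        # starts with non-word characters
        foreach my $word (grep { length } split /\W+/, $line) {
            push @pairs, [ $word, ++$pos ];
        }
    }
    return @pairs;
}

my @pairs = word_positions("Hello, world", "from Xapian");
print join(' ', map { "$_->[0]:$_->[1]" } @pairs), "\n";
# prints "Hello:1 world:2 from:3 Xapian:4"
```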

One thing that puzzled me is the error handling for set_data() and
add_posting().  Wouldn't they normally return 0 on failure, so that
you could write set_data() or warn "set_data() failed"?
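For what it's worth, the Xapian API does not report errors through return values. In the C++ API, `set_data()` and `add_posting()` return nothing (void), and failures are thrown as exceptions. Search::Xapian turns these into a Perl `die`, so `... or warn` never fires. The Perl idiom is to catch the exception with `eval`. Below is a minimal sketch that turns an exception into a false return value so the `or warn` style still works. `attempt()` is a hypothetical helper, not part of Search::Xapian, and the demo uses a stand-in `die` rather than a real Xapian call. Depending on the binding version, `$@` may hold a plain string or an exception object.

```perl
use strict;
use warnings;

# Run a code ref inside eval and turn a thrown exception into a false
# return value, so calls can be written in the "... or warn" style.
# attempt() is a hypothetical helper, not part of Search::Xapian.
sub attempt {
    my ($label, $code) = @_;
    # "; 1" makes the eval true even if the call itself returns false
    my $ok = eval { $code->(); 1 };
    warn "$label failed: $@" unless $ok;
    return $ok ? 1 : 0;
}

# Stand-in for a call that throws, the way a failing Xapian method would.
attempt('demo', sub { die "simulated error\n" }) or print "caught it\n";
```

With a real document object, a call would look like `attempt('add_posting', sub { $doc->add_posting($word, $line) }) or next;`.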

I'm really impressed with Xapian.  I'm using the flint backend and
it's really fast.

regards,
dan carpenter

#=========================
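#!/usr/bin/perl
use strict;
use warnings;
use Search::Xapian;
use File::Find;

# check the arguments before touching the database
my $dir = shift;
if (!$dir) {
    print STDERR "usage:  index_data.pl <dir>\n";
    exit(1);
}

my $DATABASE_DIR = '/home/dcarpenter/tmp/firm';
# Search::Xapian throws an exception on failure instead of returning
# false (and $! is not set), so catch it with eval and report $@.
my $db = eval {
    Search::Xapian::WritableDatabase->new($DATABASE_DIR,
        Search::Xapian::DB_CREATE_OR_OPEN);
} or die "can't open writable database $DATABASE_DIR: $@\n";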

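my $count = 0;
find(\&index_file, $dir);

# Called by File::Find for every entry under $dir.  File::Find has
# already chdir'd into the entry's directory, so $_ is the bare file
# name and $File::Find::name is the full path.  (Named index_file to
# avoid clashing with Perl's built-in index().)
sub index_file {
    # only index regular text files
    return unless -f $_ && -T _;

    my $file = $_;

    # Search::Xapian reports errors by throwing an exception
    # (set_data() and add_posting() return nothing), so catch
    # failures with eval instead of testing return values.
    my $ok = eval {
        my $doc = Search::Xapian::Document->new();
        # store the full path as the document data
        $doc->set_data($File::Find::name);

        open(my $fh, '<', $file)
            or die "can't open $File::Find::name: $!\n";
        my $line = 1;
        while (<$fh>) {
            s/^\W+//;
            s/\W+$//;
            # every word is indexed with its line number as the position
            foreach my $word (split /\W+/) {
                $doc->add_posting($word, $line);
            }
            $line++;
        }
        close($fh);

        $db->add_document($doc);
        1;
    };
    if (!$ok) {
        warn "failed to index $File::Find::name: $@";
        return;
    }

    $count++;
    if ($count % 500 == 0) {
        print "$count files indexed\n";
    }
}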

print "Total:  $count files indexed\n";
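File::Find changes into each directory by default, which is why the script opens the bare name in `$_`. The same traversal can also be written with File::Find's `no_chdir` option, which leaves `$_` holding the full path (the same value as `$File::Find::name`) so files can be opened without depending on the current directory. This is a minimal sketch. `text_files` is a hypothetical helper that only collects the paths an indexer would process; it does not touch Xapian.

```perl
use strict;
use warnings;
use File::Find;

# Return the paths of all regular text files under $dir, sorted.
# With no_chdir => 1, File::Find does not change directory and $_
# holds the full path, just like $File::Find::name.
sub text_files {
    my ($dir) = @_;
    my @found;
    find({
        # -T _ reuses the stat buffer filled in by -f
        wanted   => sub { push @found, $_ if -f $_ && -T _ },
        no_chdir => 1,
    }, $dir);
    return sort @found;
}
```

For example, `print "$_\n" for text_files($dir);` lists every file the indexer would open.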


