[Xapian-discuss] How to Retrieve content of the document?
Rohit
76.rohit at gmail.com
Thu Apr 21 10:24:03 BST 2011
Hi,
I have just started using Xapian, so this may be a beginner's question. I want
to know how I can access the content of a document retrieved by a search. I
indexed my database with the following code, which I found on this mailing list:
#!/usr/bin/perl -w
use strict;
use Search::Xapian;
use File::Find;

my $DATABASE_DIR = '/home/rohit/Desktop/SET/DB';
my $db = Search::Xapian::WritableDatabase->new($DATABASE_DIR,
    Search::Xapian::DB_CREATE_OR_OPEN) or
    die "can't create write-able db object: $!\n";
my $dir = shift;
if (!$dir) {
    print "usage: index_data.pl <dir>\n";
    exit(1);
}
my $count = 0;
find(\&index_file, $dir);

sub index_file {
    # only index regular text files
    return unless -T $_;
    my $file = $_;
    my $doc = Search::Xapian::Document->new()
        or die "can't create doc object for $file: $!\n";
    # the document data is the file's path, not its content
    $doc->set_data($File::Find::name);
    open(my $fh, '<', $file) or do {
        warn "can't open $file: $!\n";
        return;
    };
    my $line = 1;
    while (<$fh>) {
        # strip leading and trailing non-word characters
        s/^\W+//;
        s/\W+$//;
        # add each word as a term, with the line number as its position
        foreach my $word (split(/\W+/, $_)) {
            $doc->add_posting($word, $line);
        }
        $line++;
    }
    close($fh);
    $db->add_document($doc)
        or warn "failed to add document: $file\n";
    $count++;
    if ($count % 500 == 0) {
        print "$count files indexed\n";
    }
}

print "Total: $count files indexed\n";
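To make the word-splitting step of the indexer easier to check on its own, here is a minimal sketch that wraps the same regexes in a standalone function. The name tokenize_line is hypothetical and not part of Search::Xapian. It also shows that the words keep their original case, so "Steel" and "steel" would end up as different terms.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line into words the same way the indexer does.
sub tokenize_line {
    my ($text) = @_;
    $text =~ s/^\W+//;    # drop leading non-word characters
    $text =~ s/\W+$//;    # drop trailing non-word characters (incl. newline)
    return split(/\W+/, $text);
}

my @words = tokenize_line("  The steel-framed wing, Steel!\n");
print join('|', @words), "\n";    # prints "The|steel|framed|wing|Steel"
```

If case-insensitive matching is wanted, lowercasing each word with lc() before add_posting(), and lowercasing the query term the same way, would be one option.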
I then used the following script to search for a query:
#!/usr/bin/perl -w
use strict;
use Search::Xapian;

my $DATABASE_DIR = '/home/rohit/Desktop/SET/DB';
# searching only needs read access, so open the database read-only
my $db = Search::Xapian::Database->new($DATABASE_DIR) or
    die "can't open db: $!\n";
my $enq = $db->enquire('steel');
printf "Running query '%s'\n", $enq->get_query()->get_description();
my @matches = $enq->matches(0, 10);
print scalar(@matches) . " results found\n";
foreach my $match (@matches) {
    my $doc = $match->get_document();
    printf "ID %d %d%% [ %s ] \n", $match->get_docid(),
        $match->get_percent(), $doc->get_data();
}
This returns 8 documents, which I know is the correct answer because a
search engine I wrote myself gives the same results. The problem is that I
only get the document IDs, not the content. If I am not mistaken,
$doc->get_data() is supposed to give me the content, but it
isn't doing so. Any help would be appreciated.
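For context, get_data() returns exactly the string that was passed to set_data() at index time, and the indexer above passes $File::Find::name, which is the file's path. One possible approach is to store the text itself alongside the path. The sketch below packs both into one string; the helper names pack_doc_data and unpack_doc_data are hypothetical, and it assumes paths contain no newline. The Xapian calls appear only in comments so the example runs without a database.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Search::Xapian.
# Pack the path and the file text into one string suitable for set_data().
sub pack_doc_data {
    my ($path, $content) = @_;
    return "$path\n$content";
}

# Split a string produced by pack_doc_data() back into path and content.
sub unpack_doc_data {
    my ($data) = @_;
    my ($path, $content) = split(/\n/, $data, 2);
    return ($path, defined $content ? $content : '');
}

# At index time (assumption: $content holds the text read from the file):
#     $doc->set_data(pack_doc_data($File::Find::name, $content));
# At search time:
#     my ($path, $content) = unpack_doc_data($doc->get_data());

my $data = pack_doc_data('/tmp/example.txt', "first line\nsecond line\n");
my ($path, $content) = unpack_doc_data($data);
print "path: $path\n";
print "content:\n$content";
```

Storing the full text makes the database larger, so this is a trade-off against reading the files from disk at search time.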
My output looks like this:
Running query 'Xapian::Query(steel)'
10 results found
ID 312 100% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield0859 ]
ID 1712 100% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield0859 ]
ID 513 75% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield1178 ]
ID 1913 75% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield1178 ]
ID 931 69% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield0891 ]
ID 2331 69% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield0891 ]
ID 648 68% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield1025 ]
ID 2048 68% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield1025 ]
ID 27 63% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield0856 ]
ID 1427 63% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield0856 ]
Instead of [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield0856 ], I
want the content of document 856.
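Since the stored data is a file path, one way to get at the content without re-indexing is to read the file at search time. This is only a sketch, assuming the files are still at the paths recorded during indexing; read_file_content is a hypothetical helper, not a Search::Xapian function.

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole content of the file at $path,
# or undef (after a warning) if the file cannot be opened.
sub read_file_content {
    my ($path) = @_;
    open(my $fh, '<', $path) or do {
        warn "can't open $path: $!\n";
        return;
    };
    local $/;    # slurp mode: read the whole file in one go
    my $content = <$fh>;
    close($fh);
    return defined $content ? $content : '';
}

# In the search loop this could be used roughly as:
#     my $content = read_file_content($doc->get_data());
#     print $content if defined $content;
```

This keeps the database small, at the cost of depending on the original files staying in place.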
Thanks,
Rohit.