[Xapian-discuss] How to Retrieve content of the document?

Rohit 76.rohit at gmail.com
Thu Apr 21 10:24:03 BST 2011


Hi,
I have just started using Xapian, so I may sound like a noob. I want to know
how I can access the content of a document retrieved by a search. I used
code found on this mailing list to index my database:

#!/usr/bin/perl -w

use strict;
use Search::Xapian;
use File::Find;


my $DATABASE_DIR = '/home/rohit/Desktop/SET/DB';
my $db = Search::Xapian::WritableDatabase->new($DATABASE_DIR,
         Search::Xapian::DB_CREATE_OR_OPEN) or
     die "can't create write-able db object: $!\n";

my $dir = shift;
if (!$dir) {
    print "usage:  index_data.pl <dir>\n";
    exit(1);
}

my $file;
my $doc;
my $line;
my @words;
my $tmp;
my $count = 0;
find(\&index, $dir);
sub index {
    # only index regular text files
    return unless -T $_;

    $file = $_;
    $doc = Search::Xapian::Document->new()
        or die "can't create doc object for $file: $!\n";
    # store the file's full path as the document data
    $doc->set_data($File::Find::name);

    $line = 1;
    open(FILE, '<', $file)
        or do { warn "can't open $file: $!\n"; return; };
    while (<FILE>) {
        s/^\W+//;
        s/\W+$//;
        @words = split(/\W+/, $_);
        foreach $tmp (@words) {
            # use the line number as the term position
            $doc->add_posting($tmp, $line);
        }
        $line++;
    }
    close(FILE);
    print $doc->values_begin()->get_value();

    $db->add_document($doc)
        or warn "failed to add document: $file\n";

    $count++;
    if ($count % 500 == 0) {
        print "$count files indexed\n";
    }
}

print "Total:  $count files indexed\n";


I then used the following script to run a query:

#!/usr/bin/perl -w

use strict;
use Search::Xapian;

my $DATABASE_DIR = '/home/rohit/Desktop/SET/DB';
my $db = Search::Xapian::WritableDatabase->new($DATABASE_DIR,
         Search::Xapian::DB_OPEN) or
     die "can't create write-able db object: $!\n";
my $enq = $db->enquire('steel');

printf "Running query '%s'\n", $enq->get_query()->get_description();

my @matches = $enq->matches(0, 10);

print scalar(@matches) . " results found\n";

foreach my $match (@matches) {
    my $doc = $match->get_document();
    printf "ID %d %d%% [ %s ] \n", $match->get_docid(),
        $match->get_percent(), $doc->get_data();
}

This returns 8 documents, which I know is the correct answer because I
have made a search engine which gives me the same results. The problem is
that I only get the document numbers (IDs) but not the content.
$doc->get_data() is supposed to give me the content, if I am not mistaken,
but it isn't doing so. Any help would be appreciated.

My output looks like this:
Running query 'Xapian::Query(steel)'
10 results found
ID 312 100% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield0859 ]
ID 1712 100% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield0859 ]
ID 513 75% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield1178 ]
ID 1913 75% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield1178 ]
ID 931 69% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield0891 ]
ID 2331 69% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield0891 ]
ID 648 68% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield1025 ]
ID 2048 68% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield1025 ]
ID 27 63% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield0856 ]
ID 1427 63% [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield0856 ]


Instead of [ /home/rohit/Desktop/SET/HW/HW1/cranfieldDocs/cranfield0856 ], I
want the content of document no. 856.
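
One workaround that might do, assuming get_data() simply returns the string
that was passed to set_data() at index time (here the file path), is to open
that path and read the file. A minimal sketch; slurp_file is a hypothetical
helper name, not part of Search::Xapian:

```perl
use strict;
use warnings;

# Hypothetical helper: return the full contents of the file at $path,
# or undef if the file cannot be opened.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or return undef;
    local $/;              # slurp mode: read the whole file in one go
    my $content = <$fh>;
    close($fh);
    return $content;
}

# In the search loop, $doc->get_data() would supply the path:
#   my $content = slurp_file($doc->get_data());
#   print defined $content ? $content : "can't read file\n";
```

Alternatively, the text itself could presumably be passed to set_data() when
indexing, at the cost of a larger database.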



Thanks,
Rohit.

