[Xapian-discuss] RE : PHP XapianTermIterator/XapianPositionIterator usage
Menard, Daniel
Daniel.Menard at ehesp.fr
Mon Jan 18 10:53:30 GMT 2010
> I've been digging around trying to find sample usage of
> XapianTermIterator/XapianPositionIterator in PHP.
> [...]
> Can someone provide a basic PHP example of how to:
>
> //pseudo code
> $position_iterator = new XapianPositionIterator();
> $term_iterator = new XapianTermIterator();
>
> foreach $term ($position_iterator)
> foreach $position ($term_iterator($term))
> ...
Hello,
Currently, the Xapian PHP bindings don't support native PHP iterators (so
they can't be used directly with foreach), but the iterator classes
supplied by Xapian are easy to use. The documentation is here:
http://xapian.org/docs/bindings/php/
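If you really want the foreach syntax from your pseudocode, one option is a small generator that wraps a begin/end iterator pair. This is a minimal sketch, assuming PHP 5.5 or later (for `yield`). `xapian_iterate()` is a hypothetical helper, not part of the bindings, and it relies only on the `equals()` and `next()` methods that the Xapian iterators provide.

```php
<?php
// Hypothetical helper (NOT part of the Xapian bindings): turns a
// begin/end pair of Xapian-style iterators into a PHP generator, so the
// values can be consumed with foreach. Requires PHP >= 5.5 for yield.
// Note: PHP objects are passed by handle, so $begin itself is advanced.
function xapian_iterate($begin, $end, callable $getter)
{
    // Walk forward until the current iterator compares equal to the end one
    for ($it = $begin; !$it->equals($end); $it->next()) {
        // Hand back whatever the caller wants from the current position
        yield $getter($it);
    }
}

// Example usage (assumes $database is an open XapianDatabase):
// $terms = xapian_iterate($database->allterms_begin(),
//                         $database->allterms_end(),
//                         function ($it) { return $it->get_term(); });
// foreach ($terms as $term) echo $term, "\n";
```

The getter callback keeps the wrapper generic: the same helper works for term, posting and position iterators, with `get_term()`, `get_docid()` or `get_termpos()` passed in as needed.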
But I think you're missing a level: you can iterate over all the terms in
the database (allterms), get the IDs of all documents containing a
particular term (its posting list), and then get all positions of that
term in each of those documents. AFAIK, though, you can't directly get
all positions of a term across the whole database.
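If you only need the positions of a single term, the last two levels can be folded into one helper. This is a minimal sketch, assuming the PHP bindings expose Xapian::Database::positionlist_begin($docid, $term) and positionlist_end($docid, $term) under the same names as the C++ API (the listing below instead calls positionlist_begin() on the posting iterator). `term_positions()` is a hypothetical name. The posting-list loop is still needed, because positions are stored per document.

```php
<?php
// Hypothetical helper: collect every position of $term, grouped by
// document ID, as array(docid => array(pos, pos, ...)).
// Assumes Database::postlist_begin/end and
// Database::positionlist_begin/end($docid, $term) are exposed as in C++.
function term_positions($database, $term)
{
    $result = array();

    // Level 1: every document containing the term (its posting list)
    $docs = $database->postlist_begin($term);
    $endDocs = $database->postlist_end($term);
    while (!$docs->equals($endDocs)) {
        $docid = $docs->get_docid();

        // Level 2: every position of the term inside this document
        $positions = array();
        $pos = $database->positionlist_begin($docid, $term);
        $endPos = $database->positionlist_end($docid, $term);
        while (!$pos->equals($endPos)) {
            $positions[] = $pos->get_termpos();
            $pos->next();
        }

        $result[$docid] = $positions;
        $docs->next();
    }
    return $result;
}
```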
Perhaps the following PHP code, which works for me, will be useful:
<?php
require_once '/path/to/your/xapian.php'; // adjust the path
$database = new XapianDatabase('/path/to/your/database'); // adjust the path
dumpTerms($database, '10:test', 100);

function dumpTerms(XapianDatabase $database, $start = false, $max = 10)
{
    echo "<pre>\n"; // just in case this script is run from the web

    // Get "all terms" iterators from the database
    $terms = $database->allterms_begin(); // XapianTermIterator
    $endTerms = $database->allterms_end(); // XapianTermIterator

    // Skip the iterator to $start, or to the first term after it
    if (false !== $start) $terms->skip_to($start);

    // First loop: dump terms
    $nb = 0;
    while (! $terms->equals($endTerms))
    {
        // No more than $max terms
        if ($nb >= $max) break;

        // Get some info about the current term
        $term = $terms->get_term();
        printf
        (
            "term=%s, freq=%d\n",
            $term,
            $terms->get_termfreq() // # of docs containing this term
        );

        // Second loop: dump IDs of documents containing this term
        $docs = $database->postlist_begin($term); // XapianPostingIterator
        $endDocs = $database->postlist_end($term); // XapianPostingIterator
        while (! $docs->equals($endDocs))
        {
            printf
            (
                '- doc ID=%d, doc length=%d, wdf=%d, positions=',
                $docs->get_docid(),
                $docs->get_doclength(), // total number of terms in this doc
                $docs->get_wdf() // # of occurrences of this term in this doc
            );

            // Third loop: dump positions for this particular (term, document) pair
            $positions = $docs->positionlist_begin(); // XapianPositionIterator
            $endPositions = $docs->positionlist_end(); // XapianPositionIterator
            while (! $positions->equals($endPositions))
            {
                printf('%d, ', $positions->get_termpos());
                $positions->next(); // next position
            }
            echo "\n";

            $docs->next(); // next doc
        }

        ++$nb;
        $terms->next(); // next term
    }
    echo "</pre>\n";
}
?>
Cheers,
Daniel