[Xapian-discuss] RE : PHP XapianTermIterator/XapianPositionIterator usage

Menard, Daniel Daniel.Menard at ehesp.fr
Mon Jan 18 10:53:30 GMT 2010


> I've been digging around trying to find sample usage of
> XapianTermIterator/XapianPositionIterator in PHP.
> [...]
> Can someone provide a basic PHP example of how to:
> 
> //pseudo code
> $position_iterator = new XapianPositionIterator();
> $term_iterator     = new XapianTermIterator();
> 
> foreach $term ($position_iterator)
>     foreach $position ($term_iterator($term))
> ...


Hello,

Currently, the Xapian PHP bindings do not expose their iterators as
native PHP iterators (so foreach will not work on them), but using the
ones supplied by Xapian is easy. Documentation is here:
http://xapian.org/docs/bindings/php/

But I think that you're missing a level: you can iterate over all terms
in the database (termlist), you can get all document IDs for a
particular term (postings), and then get all positions of this term in
each of those documents, but AFAIK you can't directly get all positions
for a term.

Perhaps the following PHP code, which works for me, will be useful:

<?php
require_once '/path/to/your/xapian.php'; // adjust the path

$database = new XapianDatabase('/path/to/your/database'); // adjust the path
dumpTerms($database, '10:test', 100);

function dumpTerms(XapianDatabase $database, $start=false, $max=10)
{
    echo "<pre>\n"; // just in case this script is run from the web

    // Get "all terms" iterators from the database
    $terms = $database->allterms_begin();   // XapianTermIterator
    $endTerms = $database->allterms_end();  // XapianTermIterator

    // Skip iterator to $start or the first term after
    if (false !== $start) $terms->skip_to($start);

    // First loop: dump terms
    $nb = 0;
    while (! $terms->equals($endTerms))
    {
        // No more than $max terms
        if ($nb >= $max) break;
        
        // Get some info about the current term
        $term = $terms->get_term();
        printf
        (
            "term=%s, freq=%d\n",
            $term,
            $terms->get_termfreq()  // # of docs containing this term
        );

        // Second loop: dump IDs of documents containing this term
        $docs = $database->postlist_begin($term); // PostingIterator
        $endDocs = $database->postlist_end($term); // PostingIterator
        while (! $docs->equals($endDocs))
        {
            printf
            (
                '- doc ID=%d, doc length=%d, wdf=%d, positions=',
                $docs->get_docid(),
                $docs->get_doclength(), // total number of terms in this doc
                $docs->get_wdf()        // # of occurrences of this term in this doc
            );
            
            // Third loop: dump positions for this particular (term+document)
            $positions = $docs->positionlist_begin(); // PositionIterator
            $endPositions = $docs->positionlist_end(); // PositionIterator
            while (! $positions->equals($endPositions))
            {
                printf
                (
                    '%d, ',
                    $positions->get_termpos()
                );

                // Next pos
                $positions->next();
            }
            echo "\n";
            
            // Next doc
            $docs->next();
        }
        
        // Next term
        ++$nb;
        $terms->next();
    }
}
?>
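
If what you are really after is foreach-style loops, a small generator
can wrap any begin/end iterator pair. This is only a sketch under some
assumptions: xapianIterate() and positionsByDocument() are hypothetical
helpers of mine, not part of the Xapian bindings; they need PHP 5.5 or
later (for yield); and they rely only on the iterator methods already
used in the script above (equals(), next(), get_docid(), get_termpos(),
postlist_begin()/postlist_end(), positionlist_begin()/positionlist_end()).

```php
<?php
// Hypothetical helper (not part of the Xapian bindings): walks a
// begin/end iterator pair and yields the iterator itself at each step,
// so the pair can be consumed with foreach. Assumes PHP 5.5+.
function xapianIterate($begin, $end)
{
    while (! $begin->equals($end)) {
        yield $begin;
        $begin->next();
    }
}

// Hypothetical helper: returns array(docid => array(position, ...))
// for one term, by looping over its postings and, for each posting,
// over its position list.
function positionsByDocument($database, $term)
{
    $result = array();

    $docs = xapianIterate(
        $database->postlist_begin($term),
        $database->postlist_end($term)
    );
    foreach ($docs as $doc) {
        $positions = array();

        $list = xapianIterate(
            $doc->positionlist_begin(),
            $doc->positionlist_end()
        );
        foreach ($list as $pos) {
            $positions[] = $pos->get_termpos();
        }

        $result[$doc->get_docid()] = $positions;
    }

    return $result;
}
```

With a real database this would be called as
print_r(positionsByDocument($database, 'test')); after the require_once
and new XapianDatabase(...) lines from the script above. Note that the
generator yields the same iterator object at every step, so read what
you need from it inside the loop body instead of storing it for later.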

Cheers,

Daniel


