[Xapian-discuss] term iterators

Olly Betts olly at survex.com
Tue Dec 14 16:04:52 GMT 2004


On Tue, Dec 14, 2004 at 12:36:53PM -0300, Georges Dupret wrote:
> I need to iterate over pairs of terms to compute the term correlation
> matrix. My first attempt was:
> 
> for (term1 = db.allterms_begin(); term1 != db.allterms_end(); ++term1) {
>     for (term2 = term1 + 1; term2 != db.allterms_end(); ++term2) {
>         ...
>     }
> }
> 
> this doesn't work because term1 + 1 is not defined, so I did
> 
> for (term1 = db.allterms_begin(); term1 != db.allterms_end(); ++term1) {
>     term2 = term1;
>     ++term2;
>     for (; term2 != db.allterms_end(); ++term2) {
>         ...
>     }
> }
> 
> and to my surprise, incrementing term2 also incremented term1. Is
> this really what is intended?

Yes.  TermIterators have the semantics of STL input iterators.  If you
copy an iterator and increment one of the two, using the other gives
undefined behaviour (at present I believe you'll always get both
incremented, but that might change in future).
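
As an analogy only, the same single-pass behaviour can be seen with a
standard input iterator.  The sketch below uses std::istream_iterator
rather than Xapian, and the function name is made up for illustration.
A copy shares the underlying stream, so advancing the copy consumes a
token that the original then never sees.  The visible symptom differs
from the TermIterator case described above, but the cause is the same:
the copies are not independent.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// Returns the token the original iterator ends up on after a copy of it
// has been advanced once and the original is then advanced once.
// Assumes `text` holds at least three whitespace-separated tokens.
std::string token_after_interleaved_increment(const std::string& text) {
    std::istringstream in(text);
    const std::istream_iterator<std::string> end;
    std::istream_iterator<std::string> original(in);  // reads token 1
    std::istream_iterator<std::string> copy = original;
    assert(copy != end);
    ++copy;      // reads token 2 from the shared stream
    assert(copy != end);
    ++original;  // reads token 3: token 2 was already consumed via `copy`
    assert(original != end);
    return *original;
}
```

Called with "a b c", the original iterator skips straight from "a" to
"c", which is why a copied input iterator cannot serve as a saved
position.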

> Finally, I solved the problem with
> 
> for (term1 = db.allterms_begin(); term1 != db.allterms_end(); ++term1) {
>     term2 = db.allterms_begin();
>     term2.skip_to(*term1);
>     if (term2 == db.allterms_end()) {
>         cerr << "term2 end of list while term1 is '" << *term1 << "'\n";
>         exit(1);
>     } else {
>         ++term2;
>     }
>     for (; term2 != db.allterms_end(); ++term2) {
>         ...
>     }
> }

That looks about right.  Perhaps we should offer a "clone" method which
creates a separate iterator using allterms_begin() and skip_to().
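
Until something like that exists, a helper along these lines could be
written outside the library.  This is only a sketch: clone_at is a
hypothetical name, not a Xapian method, and it is a template so that it
compiles without Xapian installed.  ToyDb and ToyIter are stand-ins
invented here to exercise it; with the real library one would pass a
Xapian::Database, on the assumption that allterms_begin() and skip_to()
behave as in the quoted loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Xapian): returns a fresh iterator,
// independent of any existing one, positioned at the first term >= `term`.
template <typename Database>
auto clone_at(const Database& db, const std::string& term)
    -> decltype(db.allterms_begin()) {
    auto it = db.allterms_begin();  // new iterator with its own position
    it.skip_to(term);
    return it;
}

// Stand-ins invented for this sketch: a sorted in-memory term list that
// mimics the three calls used above.
struct ToyIter {
    const std::vector<std::string>* terms;
    std::size_t pos;
    void skip_to(const std::string& t) {
        while (pos < terms->size() && (*terms)[pos] < t) ++pos;
    }
    ToyIter& operator++() {
        assert(pos < terms->size());
        ++pos;
        return *this;
    }
    const std::string& operator*() const {
        assert(pos < terms->size());
        return (*terms)[pos];
    }
    bool operator==(const ToyIter& o) const { return pos == o.pos; }
    bool operator!=(const ToyIter& o) const { return pos != o.pos; }
};

struct ToyDb {
    std::vector<std::string> terms;  // must be kept sorted
    ToyIter allterms_begin() const { return ToyIter{&terms, 0}; }
    ToyIter allterms_end() const { return ToyIter{&terms, terms.size()}; }
};
```

The inner loop would then start from clone_at(db, *term1) followed by
one increment, leaving term1 untouched.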

> Note that if term2 is not set to db.allterms_begin(), the code crashes.
> 
> Is there a more elegant way to iterate over pairs of terms?

If you're manipulating the terms a lot, you could pull them out into a
vector or something first, then use that.  For just iterating twice, I
suspect it's not worthwhile, and sucking everything into memory works
less well if the database is too big, whereas iterating from the disk
table probably degrades more gracefully.
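
For the in-memory variant, a minimal sketch might look like the
following.  The names collect_terms and for_each_pair are made up for
illustration, and the code is generic over any single-pass range whose
elements convert to std::string, so it does not depend on Xapian
itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Copies every term from a single-pass range into memory, so the terms
// can then be revisited as often as needed.
template <typename InputIt>
std::vector<std::string> collect_terms(InputIt first, InputIt last) {
    std::vector<std::string> terms;
    for (; first != last; ++first) {
        terms.push_back(*first);
    }
    return terms;
}

// Calls visit(terms[i], terms[j]) once for every i < j and returns the
// number of pairs visited.
template <typename Visitor>
std::size_t for_each_pair(const std::vector<std::string>& terms,
                          Visitor visit) {
    std::size_t visited = 0;
    for (std::size_t i = 0; i < terms.size(); ++i) {
        for (std::size_t j = i + 1; j < terms.size(); ++j) {
            visit(terms[i], terms[j]);
            ++visited;
        }
    }
    // Sanity check: n terms give n * (n - 1) / 2 unordered pairs.
    assert(visited == terms.size() * (terms.size() - 1) / 2);
    return visited;
}
```

With Xapian, collect_terms(db.allterms_begin(), db.allterms_end())
should fill the vector in one pass, assuming the iterator dereferences
to a std::string.  The pair count grows quadratically with the number
of terms, so the memory concern above applies to the term list, while
the running time is dominated by the pair loop either way.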

Cheers,
    Olly


