<div dir="ltr"><i>I think that perhaps the best way to map this into Xapian is for each<br>
Lucene "segment" to be handled as a database in Xapian, and use the<br>
multi-database support to search them together</i><div class="gmail_extra"><br></div><div class="gmail_extra">Yes, there were two choices at the beginning:<br></div><div class="gmail_extra">1. Use the multi-database support.<br></div>
<div class="gmail_extra">2. Treat the Lucene database as a single database.<br></div><div class="gmail_extra">In the end I chose 2. That was a long time ago and I am not quite sure why I made this decision; possibly because:<br></div><div class="gmail_extra">
1. We can handle multiple Lucene databases.<br></div><div class="gmail_extra">2. I was not sure whether multi-database could meet the requirements, such as:<br></div><div class="gmail_extra"> Getting the doc_freq (how many documents contain the term) of a particular term. Actually, I want to<br>
</div><div class="gmail_extra"> get the sum of the doc_freq of a particular term across all Lucene segments, and I am not sure Xapian's multi-database support does it this way.<br><br></div><div class="gmail_extra">Do you think multi-database is a better way to handle a Lucene database?<br>
<br><br><i>But we already handle merging allterms lists for multiple databases.</i><br></div><div class="gmail_extra">If the term lists are merged, I think that is the most appropriate way to solve this issue.<br></div><div class="gmail_extra">
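<br>To make the requirement concrete, here is a minimal sketch of what a merged view over segments has to produce: terms in sorted byte order, without duplicates, with the doc_freq summed across segments. This is not Xapian or Lucene code. The names TermEntry, SegmentTerms and merge_segments are hypothetical, and the sketch assumes each segment already lists its terms in sorted order with no duplicates.<br>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one segment's term dictionary:
// (term, doc_freq) pairs, sorted by term, no duplicate terms.
using TermEntry = std::pair<std::string, unsigned>;
using SegmentTerms = std::vector<TermEntry>;

// Merge per-segment term lists into one sorted, duplicate-free list,
// summing doc_freq for terms that occur in more than one segment.
std::vector<TermEntry> merge_segments(const std::vector<SegmentTerms>& segments) {
    // Heap item: (term, segment index). std::greater makes this a min-heap,
    // so the smallest term across all segments is popped first.
    using HeapItem = std::pair<std::string, std::size_t>;
    std::priority_queue<HeapItem, std::vector<HeapItem>, std::greater<HeapItem>> heap;

    // pos[s] is the index of the next unread entry in segment s.
    std::vector<std::size_t> pos(segments.size(), 0);

    for (std::size_t s = 0; s < segments.size(); ++s) {
        // Assumed precondition: every segment is already sorted.
        assert(std::is_sorted(segments[s].begin(), segments[s].end()));
        if (!segments[s].empty()) heap.push({segments[s][0].first, s});
    }

    std::vector<TermEntry> merged;
    while (!heap.empty()) {
        const HeapItem top = heap.top();
        heap.pop();
        const std::size_t s = top.second;
        const unsigned df = segments[s][pos[s]].second;

        if (!merged.empty() && merged.back().first == top.first) {
            // Same term seen in an earlier segment: accumulate its doc_freq.
            merged.back().second += df;
        } else {
            merged.push_back({top.first, df});
        }

        // Advance this segment and queue its next term, if any.
        if (++pos[s] < segments[s].size()) heap.push({segments[s][pos[s]].first, s});
    }
    return merged;
}
```

Walking the segments one after another, as my current allterms_begin() does, would instead return a term once per segment that contains it and would not keep the overall sorted order.<br>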
<br><div class="gmail_quote">2013/10/31 Olly Betts <span dir="ltr">&lt;<a href="mailto:olly@survex.com" target="_blank">olly@survex.com</a>&gt;</span><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
[Replying to xapian-devel, as I think a wider audience would be useful]<br>
<div><br>
On Mon, Oct 21, 2013 at 11:24:51PM +0800, jiangwen jiang wrote:<br>
> yes, it's less efficient. A Lucene database has multiple segments, and each<br>
> segment can be treated as an independent database. The same term may exist in >=<br>
> 1 segments.<br>
<br>
</div>Sorry for taking a while to respond - I've been both busy and mulling<br>
this over.<br>
<br>
I think that perhaps the best way to map this into Xapian is for each<br>
Lucene "segment" to be handled as a database in Xapian, and use the<br>
multi-database support to search them together.<br>
<br>
That's likely to need some adjustments to the multi-database support,<br>
but I think otherwise we'll end up duplicating a lot of that machinery<br>
in the Lucene backend anyway.<br>
<br>
I've not looked at the Lucene file structure with this in mind yet<br>
though - do you see any obvious problems with this approach?<br>
<div><br>
> Xapian::TermIterator it = db_in.allterms_begin();<br>
> This method traverses all terms in the first segment, then the second<br>
> segment, until the last segment.<br>
<br>
</div>Iteration over all terms should return the terms in sorted order (by<br>
byte value) and without duplicates, neither of which is achieved by<br>
handling each segment in turn like this. But we already handle merging<br>
allterms lists for multiple databases.<br>
<br>
Cheers,<br>
Olly<br>
</blockquote></div><br></div></div>