[Xapian-tickets] [Xapian] #181: Optional Termlist Table
Xapian
nobody at xapian.org
Fri Sep 11 12:25:23 BST 2009
#181: Optional Termlist Table
---------------------------+------------------------------------------------
Reporter: olly | Owner: olly
Type: enhancement | Status: assigned
Priority: high | Milestone: 1.2.0
Component: Backend-Chert | Version: SVN trunk
Severity: minor | Resolution:
Keywords: | Blockedby: 363
Platform: All | Blocking:
---------------------------+------------------------------------------------
Changes (by olly):
* milestone: 1.1.4 => 1.2.0
Old description:
> The termlist table should be optional. Without it, documents can't be
> deleted or replaced and query expansion can't work, but most other
> things could be made to work.
>
> Things which will now work:
>
> - Database::alldocs_begin() - no longer uses the termlist.
> - Database::get_doclength() - uses the document lengths stored in the
> postlist.
> - Determining the percentage scores when the top document doesn't match
> all the query terms no longer uses the termlist (was #363).
>
> Things which currently use it:
>
> - Enquire::matching_terms_begin() - we could record this information
> during the match, though it might be hard to do without a speed penalty.
> - WritableDatabase::delete_document() - we could allow this with inexact
> statistics, as Lucene does (#388).
> - WritableDatabase::replace_document() if the document exists already
> (again, possible with inexact statistics).
>
> Things which just wouldn't work:
>
> - Database::termlist_begin()
> - Document::termlist_begin()
> - Document::termlist_count()
> - Enquire::get_eset()
New description:
The termlist table should be optional. Without it, documents can't be
deleted or replaced and query expansion can't work, but most other
things could be made to work.

Things which will now work:

- Database::alldocs_begin() - no longer uses the termlist.
- Database::get_doclength() - uses the document lengths stored in the
  postlist.
- Determining the percentage scores when the top document doesn't match
  all the query terms no longer uses the termlist (was #363).

Things which currently use it:

- Enquire::matching_terms_begin() - we could record this information
  during the match, though it might be hard to do without a speed penalty.
- WritableDatabase::delete_document() - we could allow this with inexact
  statistics, as Lucene does (#388).
- WritableDatabase::replace_document() if the document exists already
  (again, possible with inexact statistics).
- At least currently, chert stores the list of which values are used by a
  document in the termlist table, so things like iterating the values in
  a document require it. It's not clear whether this argues for putting
  this data elsewhere.

Things which just wouldn't work:

- Database::termlist_begin()
- Document::termlist_begin()
- Document::termlist_count()
- Enquire::get_eset()
--
Comment:
I've added support for chert databases without a termlist table in r13488.
xapian-check handles them, as does xapian-compact (and trying to merge
databases when some have termlists and some don't generates output without
a termlist and a message explaining this).
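
The merge rule described above (mixed inputs produce output without a termlist, plus an explanatory message) can be sketched roughly as follows. This is only an illustration, not xapian-compact's actual code: the helper names has_termlist() and plan_merge() are hypothetical, the check assumes each database is a directory whose termlist table lives in files matching termlist.*, and staying silent when no input has a termlist is an assumption too.

```python
from pathlib import Path


def has_termlist(db_dir):
    """Return True if db_dir contains any file matching termlist.*"""
    return any(Path(db_dir).glob("termlist.*"))


def plan_merge(db_dirs):
    """Decide whether merged output can keep a termlist table.

    Returns (keep_termlist, message).  The message is None unless the
    inputs are mixed (some with a termlist, some without).
    """
    db_dirs = [Path(d) for d in db_dirs]
    without = [d for d in db_dirs if not has_termlist(d)]
    if not without:
        # Every input has a termlist, so the output can have one too.
        return True, None
    if len(without) == len(db_dirs):
        # No input has a termlist; nothing to explain (an assumption).
        return False, None
    names = ", ".join(str(d) for d in without)
    return False, "Output will have no termlist table: missing in " + names
```

The point of the rule is that a termlist covering only some of the documents would be worse than none, so a single input without one is enough to drop the table from the output.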

Currently the only way to create such a database is to create a chert
database and do "rm termlist.*".

There's also no explicit test coverage for this yet.

But with what is now in place, we can add better test coverage and support
for generating such databases via the API in 1.2.x and they'll work with
1.2.0. So I'm updating the milestone.
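
For anyone wanting to script the "rm termlist.*" step mentioned above, a minimal sketch might look like this. It uses only the Python standard library and knows nothing about Xapian itself: strip_termlist() is a hypothetical helper, and it assumes the database is a plain directory that no writer currently has open (working on a copy is probably safest).

```python
from pathlib import Path


def strip_termlist(db_dir):
    """Delete every file matching termlist.* in db_dir.

    Returns the names of the files that were removed, so the caller can
    log them.  Other tables in the directory are left untouched.
    """
    removed = []
    for path in sorted(Path(db_dir).glob("termlist.*")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

Running xapian-check on the result afterwards seems a sensible sanity check, since the comment above says it handles such databases.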
--
Ticket URL: <http://trac.xapian.org/ticket/181#comment:16>
Xapian <http://xapian.org/>