[Xapian-tickets] [Xapian] #181: Optional Termlist Table

Xapian nobody at xapian.org
Fri Sep 11 12:25:23 BST 2009


#181: Optional Termlist Table
---------------------------+------------------------------------------------
 Reporter:  olly           |        Owner:  olly     
     Type:  enhancement    |       Status:  assigned 
 Priority:  high           |    Milestone:  1.2.0    
Component:  Backend-Chert  |      Version:  SVN trunk
 Severity:  minor          |   Resolution:           
 Keywords:                 |    Blockedby:  363      
 Platform:  All            |     Blocking:           
---------------------------+------------------------------------------------
Changes (by olly):

  * milestone:  1.1.4 => 1.2.0


Old description:

> The termlist table should be optional - without it, documents can't be
> deleted or replaced, and query expansion couldn't work, but most other
> things could be made to work.
>
> Things which will now work:
>
>  - Database::alldocs_begin() - no longer uses the termlist.
>  - Database::get_doclength() - uses the document lengths stored in the
> postlist.
>  - Determining the percentage scores when the top document doesn't match
> all the query terms no longer uses the termlist (was #363).
>
> Things which currently use it:
>
>  - Enquire::matching_terms_begin() - we could record this information
> during the match, though it might be hard to do without a speed penalty.
>  - WritableDatabase::delete_document() - we could allow this with inexact
> statistics like how lucene does (#388).
>  - WritableDatabase::replace_document() if the document exists already
> (again, possible with inexact statistics).
>
> Things which just wouldn't work:
>
>  - Database::termlist_begin()
>  - Document::termlist_begin()
>  - Document::termlist_count()
>  - Enquire::get_eset()

New description:

 The termlist table should be optional - without it, documents can't be
 deleted or replaced, and query expansion won't work, but most other
 things could be made to work.

 Things which will now work:

  - Database::alldocs_begin() - no longer uses the termlist.
  - Database::get_doclength() - uses the document lengths stored in the
 postlist.
  - Determining the percentage scores when the top document doesn't match
 all the query terms no longer uses the termlist (was #363).

 Things which currently use it:

  - Enquire::matching_terms_begin() - we could record this information
 during the match, though it might be hard to do without a speed penalty.
  - WritableDatabase::delete_document() - we could allow this with inexact
 statistics, as Lucene does (#388).
  - WritableDatabase::replace_document() if the document exists already
 (again, possible with inexact statistics).
  - At least currently, chert stores the list of value slots used by each
 document in the termlist table, so things like iterating the values in a
 document require it.  Not sure whether this argues for putting this data
 elsewhere.

 Things which just wouldn't work:

  - Database::termlist_begin()
  - Document::termlist_begin()
  - Document::termlist_count()
  - Enquire::get_eset()

--

Comment:

 I've added support for chert databases without a termlist table in r13488.

 xapian-check handles them, as does xapian-compact.  Trying to merge
 databases when some have termlists and some don't generates output without
 a termlist, plus a message explaining this.

 Currently the only way to create such a database is to create a chert
 database and do "rm termlist.*".
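
 As a rough sketch of that manual step, something like the following
 could be used to make a termlist-less copy while leaving the original
 database intact.  The helper name strip_termlist and the copy-first
 approach are assumptions for illustration only; nothing like this ships
 with Xapian.

```shell
#!/bin/sh
# Sketch only: make a copy of a chert database directory without its
# termlist table.  strip_termlist is a hypothetical helper name, not
# part of Xapian.
strip_termlist() {
    src=$1
    dest=$2
    # Refuse to clobber an existing destination.
    if [ -e "$dest" ]; then
        echo "strip_termlist: $dest already exists" >&2
        return 1
    fi
    # Work on a copy so the original database keeps its termlist.
    cp -R "$src" "$dest" || return 1
    # The manual "rm termlist.*" step, applied to the copy only.
    rm -f "$dest"/termlist.*
}
```

 For example, "strip_termlist /path/to/db /path/to/db-notermlist"
 (placeholder paths), after which running xapian-check on the copy
 should be a reasonable sanity check.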

 There's also no explicit test coverage for this yet.

 But with what is now in place, we can add better test coverage and support
 for generating such databases via the API in 1.2.x and they'll work with
 1.2.0.  So I'm updating the milestone.

-- 
Ticket URL: <http://trac.xapian.org/ticket/181#comment:16>
Xapian <http://xapian.org/>