[Xapian-tickets] [Xapian] #742: Xapian should provide a way to securely remove a document from the database

Xapian nobody at xapian.org
Fri Dec 2 21:13:13 GMT 2016


#742: Xapian should provide a way to securely remove a document from the database
---------------------------+------------------
        Reporter:  dkg     |      Owner:  olly
            Type:  defect  |     Status:  new
        Priority:  normal  |  Milestone:
       Component:  Other   |    Version:
        Severity:  normal  |   Keywords:
      Blocked By:          |   Blocking:
Operating System:  All     |
---------------------------+------------------
 Currently, if I remove a document from a Xapian index, the blocks holding
 its indexed terms remain in the db and are only marked as part of the
 freelist.

 This means that removal of a document is "insecure" in the sense that if
 someone gained access to the index after the document was deleted, they
 could recover information about the document by inspecting the contents of
 the freelist.

 There may be other traces of a document that are retained in the index as
 well: for example, on IRC, olly mentioned:

 > oh, there's one awkward thing in the backend stuff -- dividing keys get
 > created in the branch levels based on the leaf level keys around where the
 > block is split

 Some of these fixes may be easier to do than others.

 For example, it might be pretty easy to zero blocks when they're returned
 to the freelist, but it might be harder to deal with the dividing keys.
 It's still worth fixing the easy parts, even if some harder challenges
 remain.
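
 As a rough illustration of the "easy part", here is a toy block store
 with a freelist and an optional zero-on-free step.  It is a sketch only:
 ToyBlockStore, write_block, free_block, raw_contains and the 64-byte block
 size are all made up for this example and are not part of Xapian's
 backend.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: fixed-size blocks held in memory plus a freelist of
// block ids.  None of these names come from Xapian.
class ToyBlockStore {
public:
    static constexpr std::size_t kBlockSize = 64;

    explicit ToyBlockStore(bool zero_on_free) : zero_on_free_(zero_on_free) {}

    // Store data in a block, reusing a freed block when one is available.
    // Data longer than kBlockSize is truncated (good enough for a toy).
    std::size_t write_block(const std::string& data) {
        std::size_t id;
        if (!freelist_.empty()) {
            id = freelist_.back();
            freelist_.pop_back();
        } else {
            id = storage_.size() / kBlockSize;
            storage_.resize(storage_.size() + kBlockSize, '\0');
        }
        const std::size_t n =
            data.size() < kBlockSize ? data.size() : kBlockSize;
        auto first = storage_.begin() + id * kBlockSize;
        std::fill(first, first + kBlockSize, '\0');
        std::copy_n(data.begin(), n, first);
        return id;
    }

    // Return a block to the freelist, scrubbing it first if requested.
    void free_block(std::size_t id) {
        if (zero_on_free_) {
            auto first = storage_.begin() + id * kBlockSize;
            std::fill(first, first + kBlockSize, '\0');
        }
        freelist_.push_back(id);
    }

    // Stand-in for someone scanning the raw storage for a known term.
    bool raw_contains(const std::string& needle) const {
        return std::search(storage_.begin(), storage_.end(),
                           needle.begin(), needle.end()) != storage_.end();
    }

private:
    bool zero_on_free_;
    std::vector<char> storage_;
    std::vector<std::size_t> freelist_;
};
```

 With zero_on_free set to false, a term written with write_block() is
 still found by raw_contains() after free_block(); with it set to true the
 term is gone.  A real backend would also have to make sure the zeroed
 block actually reaches the disk, and this does nothing about the dividing
 keys mentioned above.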

 Another way to think about the problem is one of "index reproducibility"
 -- if an index contains exactly the same set of documents as another
 index, a byte-for-byte identical data store on disk is the ideal.  Any
 divergence from that ideal leaks some information about documents that
 have been added to the database in the past, and then subsequently removed.
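
 One way to test for this property would be a byte-for-byte comparison of
 the on-disk files of two indexes built from the same set of documents.
 The helper below is a minimal sketch of such a check for a single pair of
 files; files_identical is a hypothetical name, and since a Xapian database
 is a directory of several files, the check would have to be repeated for
 each file in it.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// Return true only if both files can be opened and hold exactly the same
// bytes.  Hypothetical helper for illustration, not a Xapian API.
bool files_identical(const std::string& path_a, const std::string& path_b) {
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) {
        return false;
    }
    std::istreambuf_iterator<char> a_it(a), b_it(b), end;
    // The four-iterator overload of std::equal (C++14) also fails when one
    // file is a strict prefix of the other.
    return std::equal(a_it, end, b_it, end);
}
```

 If one index had a document added and then removed while the other never
 saw it, any difference reported here would be exactly the kind of leak
 described above.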

 It's possible that any of these fixes incur a cost that some people are
 reluctant to pay (e.g. they're not concerned about the confidentiality of
 any of their indexed documents, or they're confident in the long-term
 confidentiality of the index itself for other reasons).  So it seems
 likely that the feature needs to be optional.  Whether the feature should
 be opt-in or opt-out, and whether the choice is made on a per-deletion,
 per-database, or per-Xapian-session basis, I don't know.

 I'm happy to review API proposals if that'd be useful.

--
Ticket URL: <https://trac.xapian.org/ticket/742>
Xapian <https://xapian.org/>