[Xapian-tickets] [Xapian] #742: Xapian should provide a way to securely remove a document from the database

Xapian nobody at xapian.org
Thu Dec 8 20:22:09 GMT 2016


#742: Xapian should provide a way to securely remove a document from the database
--------------------+-------------------------
 Reporter:  dkg     |             Owner:  olly
     Type:  defect  |            Status:  new
 Priority:  normal  |         Milestone:
Component:  Other   |           Version:
 Severity:  normal  |        Resolution:
 Keywords:          |        Blocked By:
 Blocking:          |  Operating System:  All
--------------------+-------------------------

Comment (by olly):

 I think there are five parts to this, in approximately increasing
 difficulty (and also roughly decreasing benefit - the first three items
 are all pretty key though):

  * When a block is compacted to put all the freespace together, we should
 zero out the freespace after the compact as it may contain traces of the
 items in the block.  I'm not sure if this is something to always do, but
 it's clearly not horrendously expensive. [`GlassTable::compact()`]
  * Blocks are copy on write.  When an item is removed but the block isn't
 emptied, a copy of that block is made, and the directory at the start of
 the block is updated to remove the pointer to the item.  We should also
 zero out the space it occupied.  I'm not sure if this is something to
 always do, but it's clearly not horrendously expensive.
 [`GlassTable::delete_leaf_item()`] and [`GlassTable::delete_branch_item()`]
  * When we copy on write, the old version of the block is added to the
 freelist as being "freed by revision $new_revision".  Such blocks need to
 be zeroed out at some point, if they've not been reused.  However, we
 can't just zero out when added to the freelist - at that point
 $new_revision hasn't been committed yet, and even after commit, readers
 may be using the old revision.  This is related to
 `DatabaseModifiedError`, and the plan to support MVCC to help avoid that.
 I think this may need a "secure commit" option which goes through blocks
 on the freelist and zeros them out (except that it needs to set the
 revision in the zeroed blocks to a new revision).  This seems like
 something that we can't really just always do, since there seems to need
 to be some choice as to when.
  * When an item at the start or end of a leaf block is deleted, the
 dividing key ought to be recomputed, as it may leak some information (at
 least for tables where the key is computed from a term rather than just a
 document id).  I '''think''' it doesn't need recomputing if other items in
 the block are deleted, because even though the dividing key might
 originally have been computed based on those, it could also have been
 computed from the key which got inserted between that item and the
 dividing key.  Recomputing dividing keys may be worthwhile to do sometimes
 in general - it's something I've wondered before.  It might be worth doing
 anyway - when a block is COWed, the parent block also needs to be COWed to
 update the pointer to the child (but the parent COW is typically done once
 for multiple updates to its children).  If you want to experiment to see
 what the dividing keys reveal, then this command will show all of them for
 a table: `xapian-check somedb/postlist f|grep ' --> \['` (`postlist` and
 `position` are the interesting ones).
  * As you say, there may be more subtle information leaks via the "shape"
 of the DB.  That's hardest to deal with, except by simply compacting the
 database (running compact after each commit would actually address all
 your concerns, though it's quite a big hammer).  But running compact
 regularly would make a lot of sense as it would limit how long any leaked
 information could persist for.  Some of my plans for the next backend
 would probably help here too (I'm thinking that mass updates would get
 applied in a [https://en.wikipedia.org/wiki/Log-structured_merge-tree
 LSMT]-like manner - that doesn't give an identical DB for a given doc
 set, but it should reduce the variance significantly).
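
 The first two items (zeroing the space of a deleted item, and zeroing
 the free space after compacting a block) can be illustrated with a toy
 model.  This is only a sketch under stated assumptions: `ToyBlock`, its
 flat directory and the `zero_freed` switch are hypothetical names made up
 for illustration, and the layout is far simpler than a real Glass block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Toy model only: this is NOT Glass's real on-disk block layout.
struct ToyBlock {
    struct Slot { std::size_t offset, length; };

    std::vector<char> data;       // raw block bytes
    std::vector<Slot> directory;  // one entry per live item, in offset order
    std::size_t used_end = 0;     // first byte after the last stored item
    bool zero_freed;              // hypothetical "secure" switch

    ToyBlock(std::size_t size, bool zero_freed_)
        : data(size, '\0'), zero_freed(zero_freed_) {}

    // Append an item after the existing ones; fails if it does not fit.
    bool add_item(const std::string& item) {
        if (item.size() > data.size() - used_end) return false;
        std::memcpy(data.data() + used_end, item.data(), item.size());
        directory.push_back({used_end, item.size()});
        used_end += item.size();
        return true;
    }

    std::string get_item(std::size_t index) const {
        const Slot& s = directory.at(index);
        return std::string(data.data() + s.offset, s.length);
    }

    // Drop the directory entry; optionally zero the bytes it pointed at.
    void delete_item(std::size_t index) {
        const Slot s = directory.at(index);
        directory.erase(directory.begin() + static_cast<std::ptrdiff_t>(index));
        if (zero_freed) std::fill_n(data.data() + s.offset, s.length, '\0');
    }

    // Slide live items together; optionally zero the free space left over.
    void compact() {
        std::size_t write_pos = 0;
        for (Slot& s : directory) {
            assert(write_pos <= s.offset);  // items only ever move towards 0
            std::memmove(data.data() + write_pos, data.data() + s.offset, s.length);
            s.offset = write_pos;
            write_pos += s.length;
        }
        used_end = write_pos;
        if (zero_freed)
            std::fill(data.data() + write_pos, data.data() + data.size(), '\0');
    }

    // True if the raw bytes of the block still contain `needle` anywhere.
    bool contains(const std::string& needle) const {
        return std::search(data.begin(), data.end(),
                           needle.begin(), needle.end()) != data.end();
    }
};
```

 With `zero_freed` off, deleting the middle one of the items "alpha",
 "secret", "omega" leaves "secret" readable in the raw bytes, and a later
 compact still leaves a stray "t" plus a stale copy of "omega" in the free
 space.  With it on, neither trace survives.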

 I think this has to be a per-Database setting, which can only be enabled
 at creation or right after compaction (or perhaps enabling it goes through
 and zeros out stuff, etc., but that's probably comparable work to
 compacting).  I don't see how this can work on a per-deletion basis or a
 per-session basis - if you've not been zeroing from creation, then there
 could be copies of the sensitive data all over the place.
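
 For the third item in the list above (old copies of blocks sitting on
 the freelist), the timing rule for a "secure commit" might be sketched as
 follows.  This is a toy model, not Glass's freelist format: `FreedBlock`,
 `ToyStore` and `scrub_freelist()` are hypothetical names, and it assumes
 the oldest revision still open by any reader is known, which is exactly
 the information the MVCC plan mentioned above would have to provide.  It
 also leaves out setting a new revision in the zeroed blocks.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: names and layout are hypothetical.
struct FreedBlock {
    std::uint32_t block_number;
    std::uint64_t freed_by_revision;  // revision whose commit made it unused
    bool scrubbed;                    // already zeroed by an earlier pass
};

struct ToyStore {
    std::vector<std::vector<char>> blocks;  // indexed by block number
    std::vector<FreedBlock> freelist;
};

// Zero every freed block that no reader can still need, and return how
// many blocks were zeroed.  A block freed by revision R is still live in
// revisions older than R, so it is only safe to zero once R has been
// committed and the oldest open reader is at revision R or later.
std::size_t scrub_freelist(ToyStore& store,
                           std::uint64_t committed_revision,
                           std::uint64_t oldest_reader_revision) {
    std::size_t scrubbed = 0;
    for (FreedBlock& fb : store.freelist) {
        if (fb.scrubbed) continue;
        if (fb.freed_by_revision > committed_revision) continue;      // not committed yet
        if (fb.freed_by_revision > oldest_reader_revision) continue;  // a reader may still use it
        std::vector<char>& bytes = store.blocks.at(fb.block_number);
        std::fill(bytes.begin(), bytes.end(), '\0');
        fb.scrubbed = true;
        ++scrubbed;
    }
    return scrubbed;
}
```

 Running such a pass at commit time is one possible answer to the "choice
 as to when" question, since blocks that are skipped simply stay on the
 freelist until a later pass finds them eligible.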

--
Ticket URL: <https://trac.xapian.org/ticket/742#comment:1>
Xapian <https://xapian.org/>