[Xapian-tickets] [Xapian] #742: Xapian should provide a way to securely remove a document from the database
Xapian
nobody at xapian.org
Thu Dec 8 20:22:09 GMT 2016
#742: Xapian should provide a way to securely remove a document from the database
--------------------+-------------------------
Reporter: dkg | Owner: olly
Type: defect | Status: new
Priority: normal | Milestone:
Component: Other | Version:
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
--------------------+-------------------------
Comment (by olly):
I think there are five parts to this, in approximately increasing
difficulty (and also roughly decreasing benefit - the first three items
are all pretty key though):
* When a block is compacted to put all the freespace together, we should
zero out the freespace after the compact as it may contain traces of the
items in the block. I'm not sure if this is something to always do, but
it's clearly not horrendously expensive. [`GlassTable::compact()`]
* Blocks are copy on write. When an item is removed but the block isn't
emptied, a copy of that block is made, and the directory at the start of
the block is updated to remove the pointer to the item. We should also
zero out the space the item occupied. I'm not sure if this is something to
always do, but it's clearly not horrendously expensive.
[`GlassTable::delete_leaf_item()`] and [`GlassTable::delete_branch_item()`]
* When we copy on write, the old version of the block is added to the
freelist as being "freed by revision $new_revision". Such blocks need to
be zeroed out at some point, if they've not been reused. However, we
can't just zero out when added to the freelist - at that point
$new_revision hasn't been committed yet, and even after commit, readers
may be using the old revision. This is related to
`DatabaseModifiedError`, and the plan to support MVCC to help avoid that.
I think this may need a "secure commit" option which goes through blocks
on the freelist and zeros them out (except that it needs to set the
revision in the zeroed blocks to a new revision). This seems like
something that we can't really just always do, since there seems to need
to be some choice as to when.
* When an item at the start or end of a leaf block is deleted, the
dividing key ought to be recomputed, as it may leak some information (at
least for tables where the key is computed from a term rather than just a
document id). I '''think''' it doesn't need recomputing if other items in
the block are deleted, because even though the dividing key might
originally have been computed based on those, it could also have been
computed from the key which got inserted between that item and the
dividing key. Recomputing dividing keys may be worthwhile to do sometimes
in general - it's something I've wondered before. It might be worth doing
anyway - when a block is COWed, the parent block also needs to be COWed to
update the pointer to the child (but the parent COW is typically done once
for multiple updates to its children). If you want to experiment to see
what the dividing keys reveal, then this command will show all of them for
a table: `xapian-check somedb/postlist f|grep ' --> \['` (`postlist` and
`position` are the interesting ones).
* As you say, there may be more subtle information leaks via the "shape"
of the DB. That's hardest to deal with, except by simply compacting the
database (running compact after each commit would actually address all
your concerns, though it's quite a big hammer). But running compact
regularly would make a lot of sense as it would limit how long any leaked
information could persist for. Some of my plans for the next backend
would probably help here too (I'm thinking that mass updates would get
applied in a [https://en.wikipedia.org/wiki/Log-structured_merge-tree
LSMT]-like manner - that doesn't give an identical DB for a given doc
set, but it should reduce the variance significantly).
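To make the first two items concrete, here is a minimal sketch of zeroing
on delete and on compact. It uses a made-up `ToyBlock` type, not the real
glass block layout: the block size, the separate directory vector and all
the member names are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical toy block (NOT the glass on-disk format): items are packed
// downwards from the end of the block, and a directory records where each
// live item sits.
struct ToyBlock {
    static constexpr std::size_t SIZE = 64;
    struct Entry { std::size_t off, len; };

    unsigned char data[SIZE] = {};
    std::vector<Entry> dir;
    std::size_t low = SIZE;  // everything below this offset is free space

    bool add(const std::string& item) {
        if (item.size() > low) return false;
        low -= item.size();
        std::memcpy(data + low, item.data(), item.size());
        dir.push_back({low, item.size()});
        return true;
    }

    // Drop directory entry i.  Without `wipe` only the pointer goes away
    // and the item's bytes stay in the block.
    void remove(std::size_t i, bool wipe) {
        if (wipe) std::memset(data + dir[i].off, 0, dir[i].len);
        dir.erase(dir.begin() + static_cast<std::ptrdiff_t>(i));
    }

    // Repack live items at the end of the block.  Without `wipe` the new
    // image starts as a copy of the old one, so stale bytes survive in the
    // free space; with `wipe` it starts zeroed.
    void compact(bool wipe) {
        unsigned char fresh[SIZE];
        if (wipe) std::memset(fresh, 0, SIZE);
        else std::memcpy(fresh, data, SIZE);
        std::size_t top = SIZE;
        for (Entry& e : dir) {
            top -= e.len;
            std::memcpy(fresh + top, data + e.off, e.len);
            e.off = top;
        }
        std::memcpy(data, fresh, SIZE);
        low = top;
    }

    // True if the raw block image still contains `needle` anywhere.
    bool contains(const std::string& needle) const {
        const unsigned char* end = data + SIZE;
        return std::search(data, end, needle.begin(), needle.end()) != end;
    }

    // True if every byte below `low` is zero.
    bool free_space_is_zero() const {
        return std::all_of(data, data + low,
                           [](unsigned char c) { return c == 0; });
    }
};
```

With this model, adding "alpha", "secret" and "omega" and then calling
`remove(1, false)` leaves "secret" findable in the raw block image, while
`remove(1, true)` followed by `compact(true)` leaves no trace of it and an
all-zero free region.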
I think this has to be a per-database setting, which can only be enabled
at creation or right after compaction (or perhaps enabling it goes through
and zeros out stuff, etc., but that's probably comparable work to
compacting). I don't see how this can work on a per-deletion basis or a
per-session basis - if you've not been zeroing from creation, then there
could be copies of the sensitive data all over the place.
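The stale copies come from copy on write: the old version of a block
lingers on the freelist until something reuses it. A rough sketch of the
timing constraint from the third item follows, using a made-up `ToyStore`
type. It assumes the caller knows the oldest revision any reader still has
open, which the current backend does not track, and it leaves out stamping
a new revision into the zeroed blocks.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical toy store, not Xapian code.
using Block = std::array<unsigned char, 16>;

struct ToyStore {
    struct FreeInfo {
        unsigned freed_by;  // revision whose commit frees the block
        bool wiped;         // already zeroed by secure_wipe()
    };

    std::vector<Block> blocks;
    std::map<std::size_t, FreeInfo> freelist;  // block number -> info
    unsigned committed = 0;                    // latest committed revision

    // Copy on write replaced block n: the old copy is freed by the
    // revision currently being built, which is not committed yet.
    void free_block(std::size_t n) {
        freelist[n] = FreeInfo{committed + 1, false};
    }

    void commit() { ++committed; }

    // Zero every free block that is safe to touch: the freeing revision
    // must be committed, and no reader may be on an older revision.
    // `oldest_reader` is an assumption (see the lead-in).
    // Returns the number of blocks zeroed by this call.
    std::size_t secure_wipe(unsigned oldest_reader) {
        std::size_t n_wiped = 0;
        for (auto& entry : freelist) {
            FreeInfo& info = entry.second;
            if (info.wiped) continue;
            if (info.freed_by <= committed && info.freed_by <= oldest_reader) {
                blocks[entry.first].fill(0);
                info.wiped = true;
                ++n_wiped;
            }
        }
        return n_wiped;
    }
};
```

In this model a block freed while building revision 1 is left alone by
`secure_wipe()` before `commit()`, and also after it for as long as
`oldest_reader` is 0. It is only zeroed once every reader has moved to
revision 1 or later.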
--
Ticket URL: <https://trac.xapian.org/ticket/742#comment:1>
Xapian <https://xapian.org/>