[Xapian-discuss] Encrypted Database Files
Michael Schlenker
schlenk at uni-oldenburg.de
Tue Jan 24 15:46:00 GMT 2006
David Blewett wrote:
> Hi all:
>
> I'm considering using Xapian to index email messages in an IMAP server
> I'm writing. Is it possible to encrypt the databases stored on disk, so
> that someone cannot recover their contents?
It all depends on your threat analysis.
It's certainly possible to encrypt a file or directory on disk using one
of the loopback encryption filesystems on modern Unix systems, or similar
technologies on Windows. This basically protects your data when your
system fails and you have to ship the machine to external, untrusted
service personnel.
If the attacker can read the process memory while the process is running
(root kits, debuggers, creative root users with access to the /proc
filesystem) you are in trouble anyway. You can slow an attacker down by
obscuring the layout of the structures and algorithms used, but a
sophisticated attacker can get past this. Inspect your favorite virus,
trojan, or copy protection/DRM scheme for this kind of coding.
> What I would like to do is when a message is received, send it through
> Xapian to be indexed. Then encrypt the contents and store it. When I run
> a search through Xapian, all I need is some sort of ID so I can retrieve
> the message and decrypt it. I don't want someone to be able to use the
> Xapian database to reconstruct the messages indexed. Is this possible?
You could always filter the stemmed terms through an encryption system
like AES or DES in ECB (electronic codebook) mode before indexing them,
and store the encrypted stemmed words instead of the clear text words.
ECB is the mode to use here because it encrypts a given term the same
way every time, and while searching you don't have the positional
information the other modes need. This would prevent reconstruction of
the messages from the index as long as the key stays safe. For searching
you would have to encrypt the query terms after stemming.
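As a rough sketch of that index/search symmetry: the only property the
scheme relies on is that the same term under the same key always maps to
the same stored token. Python's standard library has no AES or DES, so
the sketch below swaps in HMAC-SHA256 as the keyed deterministic
transform. That is an assumption made for illustration, not the ECB
encryption described above, and unlike a cipher it cannot be reversed
even with the key. The names stem(), protect_term(), index_terms() and
query_matches() are hypothetical, and stem() is only a lowercase
placeholder for a real stemmer.

```python
import hashlib
import hmac


def stem(word: str) -> str:
    # Hypothetical placeholder: a real setup would call an actual stemmer.
    return word.lower()


def protect_term(key: bytes, term: str) -> str:
    # Keyed, deterministic transform: same key + same term -> same token.
    # HMAC-SHA256 stands in for the block cipher (assumption, see above).
    digest = hmac.new(key, term.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:32]  # keep 128 bits of the digest as the stored term


def index_terms(key: bytes, text: str) -> set:
    # Terms that would be stored in the index instead of the clear text.
    return {protect_term(key, stem(word)) for word in text.split()}


def query_matches(key: bytes, query_word: str, indexed: set) -> bool:
    # The query goes through the same stem-then-protect path as the index.
    return protect_term(key, stem(query_word)) in indexed
```

Calling index_terms(key, "Meeting moved to Friday") and then
query_matches(key, "friday", ...) with the same key finds the term;
with a different key it does not, and the stored set contains no clear
text words.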
You will see a performance hit, especially while indexing, because
encryption isn't a cheap operation. Additionally, your index will
probably be larger, as you are probably using a block cipher which has a
fixed block length, and your words may not fill whole blocks and have to
be padded with nulls for encryption. (IIRC AES has 128-bit blocks, i.e.
16 bytes, so all words whose length is not a multiple of sixteen (for
single-byte charsets) will need extra space...).
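To put numbers on the padding overhead, here is a small sketch that
computes how many bytes a term occupies once it is null-padded to whole
cipher blocks. The function name padded_length() and its defaults are
made up for this example; it assumes a single-byte charset (latin-1) and
16-byte blocks as in AES, with 8 for DES.

```python
def padded_length(term: str, block_bytes: int = 16,
                  encoding: str = "latin-1") -> int:
    # Size of the term in bytes before padding.
    n = len(term.encode(encoding))
    # Round up to a whole number of blocks (at least one block).
    blocks = max(1, -(-n // block_bytes))
    return blocks * block_bytes


for word in ["mail", "encryption", "a" * 17]:
    print(word, len(word), padded_length(word), padded_length(word, 8))
```

So a 4-letter term takes a full 16 bytes with AES-sized blocks, and a
17-letter term takes 32; most stemmed words are short, so most of the
stored term is padding.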
Michael