[Xapian-discuss] Encrypted Database Files

Michael Schlenker schlenk at uni-oldenburg.de
Tue Jan 24 15:46:00 GMT 2006


David Blewett wrote:
> Hi all:
> 
> I'm considering using Xapian to index email messages in an IMAP server 
> I'm writing. Is it possible to encrypt the databases stored on disk, so 
> that someone cannot recover their contents?

It all depends on your threat analysis.

It's certainly possible to encrypt a file or directory on disk using one
of the loopback encryption filesystems on modern Unix systems, or similar
technologies on Windows. This basically protects your data when your
system fails and you have to ship the machine to external, untrusted
service personnel.

If the attacker can read the process memory while the process is running
(rootkits, debuggers, creative root users with access to the /proc
filesystem), you are in trouble anyway. You can slow an attacker down by
obscuring the layout of the data structures and algorithms used, but a
sophisticated attacker can see through this. Look at your favorite virus,
trojan, or copy-protection/DRM scheme for examples of this kind of coding.

> What I would like to do is when a message is received, send it through 
> Xapian to be indexed. Then encrypt the contents and store it. When I run 
> a search through Xapian, all I need is some sort of ID so I can retrieve 
> the message and decrypt it. I don't want someone to be able to use the 
> Xapian database to reconstruct the messages indexed. Is this possible?  

You could always filter the stemmed and indexed terms through an
encryption system like AES or DES in ECB (electronic code book) mode and
store the encrypted stemmed words instead of the cleartext words. ECB is
needed because it is deterministic: the other modes depend on position
(the previous ciphertext block, or a counter) or on an IV, none of which
you have while searching, so the same word would encrypt differently
each time and a query could never match it. This would prevent
reconstruction of the messages from the index as long as the key stays
safe. For searching you would have to stem the query and then encrypt
its terms with the same key.
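
A minimal sketch of this idea, with one deliberate substitution: Python's
standard library has no AES, so HMAC-SHA256 (from `hmac` and `hashlib`)
stands in as the deterministic keyed transform. Unlike AES-ECB it is
one-way, which is enough here because the index only ever needs to match
terms, never decrypt them. The `stem()` placeholder, the in-memory `index`
dict standing in for a Xapian database, and `SECRET_KEY` are all
hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical key for illustration only. It must be kept outside the
# index (e.g. in a key store); anyone holding it can test guessed words.
SECRET_KEY = b"example-key-do-not-use"


def stem(word: str) -> str:
    """Hypothetical placeholder stemmer: it only lowercases. A real
    indexer would run its actual stemmer here, before protect_term()."""
    return word.lower()


def protect_term(term: str, key: bytes = SECRET_KEY) -> str:
    """Map a stemmed term to an opaque, deterministic hex token.

    HMAC-SHA256 stands in for AES-ECB: the same term and key always give
    the same token, so query terms can be matched against indexed ones.
    """
    return hmac.new(key, term.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical in-memory inverted index standing in for a Xapian database:
# protected token -> set of message IDs. Cleartext words are never stored.
index: dict = defaultdict(set)


def index_message(msg_id: int, text: str) -> None:
    """Stem each word, protect it, and record the message ID under the token."""
    for word in text.split():
        index[protect_term(stem(word))].add(msg_id)


def search(query: str) -> set:
    """Return IDs of messages containing every query word (AND semantics).

    The query goes through the same stem -> protect pipeline as indexing.
    """
    results = None
    for word in query.split():
        # .get() avoids creating empty entries in the defaultdict.
        ids = index.get(protect_term(stem(word)), set())
        results = set(ids) if results is None else results & ids
    return results if results is not None else set()


if __name__ == "__main__":
    index_message(1, "Meeting moved to Friday")
    index_message(2, "Friday lunch plans")
    print(sorted(search("friday")))          # [1, 2]
    print(sorted(search("meeting FRIDAY")))  # [1]
    print(any("friday" in token for token in index))  # False: only hex tokens stored
```

Because the transform is deterministic (as ECB is), equal terms map to
equal tokens, so term frequencies remain visible to anyone who can read
the index; only the words themselves are hidden.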

You will see a performance hit, especially while indexing, because
encryption isn't a cheap operation. Additionally, your index will
probably be larger: a block cipher has a fixed block length, so words
that don't fill whole blocks have to be padded with zero bytes before
encryption. AES has 128-bit (16-byte) blocks, so every word whose length
is not a multiple of sixteen bytes (for single-byte charsets) will need
extra space; DES, with its 64-bit blocks, pads to a multiple of eight.
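
The padding overhead is simple arithmetic. A minimal sketch, assuming
zero-byte padding up to a whole number of blocks as described above, with
at least one block per word; `padded_length` and `zero_pad` are
hypothetical helpers, not part of any library:

```python
def padded_length(word: bytes, block_size: int = 16) -> int:
    """Bytes needed to store `word` after zero-padding it to a whole
    number of cipher blocks (16 bytes for AES, 8 for DES)."""
    blocks = max(1, -(-len(word) // block_size))  # ceiling division, at least one block
    return blocks * block_size


def zero_pad(word: bytes, block_size: int = 16) -> bytes:
    """Append zero bytes so the result fills whole blocks."""
    return word + b"\x00" * (padded_length(word, block_size) - len(word))


if __name__ == "__main__":
    for w in [b"mail", b"friday", b"internationalization"]:
        print(w.decode(), len(w), "->", padded_length(w), "bytes with AES,",
              padded_length(w, 8), "with DES")
```

For example, the 4-byte word `mail` occupies 16 bytes under AES (8 under
DES), and the 20-byte `internationalization` occupies 32 (24 under DES).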

Michael