[Xapian-devel] Reading a password-protected PDF

Olly Betts olly at survex.com
Tue Mar 5 06:23:38 GMT 2013


On Wed, Feb 27, 2013 at 03:06:29PM +0800, Zaim Zuhuri wrote:
> I was wondering if it is possible for xapian to read a password-protected
> PDF.
[...]
> 2. all PDF is set with the same password.
> 3. only the content of the PDF is encrypted, not the metadata.
> 
> If it is possible could you guys point me in the right direction.

omindex (the indexer in Xapian's Omega) runs pdftotext to extract text from
PDF files, so the question really is "can pdftotext read a
password-protected PDF?"

Looking at pdftotext --help, I see:

  -opw <string>     : owner password (for encrypted files)
  -upw <string>     : user password (for encrypted files)

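In PDF terms the user password is the one needed to open the document, and
the owner password additionally lifts any permission restrictions.  If
you're not sure which kind yours is, I'd try both and see which works.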
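
A minimal sketch of trying both options in turn, assuming pdftotext exits
with a non-zero status when it can't open the file with the password given.
The function name try_pdf_passwords is made up for illustration; it isn't
part of Xapian or pdftotext:

```shell
#!/bin/sh
# Hypothetical helper: report which pdftotext password option opens a PDF.
# Usage: try_pdf_passwords FILE PASSWORD
try_pdf_passwords() {
    pdf=$1
    password=$2
    for opt in -upw -opw; do
        # Send the extracted text to stdout ("-") and discard it;
        # only the exit status matters here.
        if pdftotext "$opt" "$password" "$pdf" - >/dev/null 2>&1; then
            echo "$opt"
            return 0
        fi
    done
    # Neither option worked with this password.
    return 1
}
```

Calling it as (say) try_pdf_passwords sample.pdf 'secret-password' should
print -upw or -opw, whichever succeeds first, and return non-zero if neither
does.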

So I'd try creating a simple wrapper script, so that when omindex runs
pdftotext it actually runs your wrapper, which in turn runs the real
pdftotext with the extra command line arguments:

#!/bin/sh
exec /usr/bin/pdftotext -upw 'secret-password' "$@"

Save that as (say) /home/zaim/pdftotext-wrapper/pdftotext, then make it
executable and add that directory to PATH before you run omindex:

chmod a+x /home/zaim/pdftotext-wrapper/pdftotext

env PATH="/home/zaim/pdftotext-wrapper:$PATH" omindex [...]
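
To check that the wrapper really is the one that gets found, something like
the following sketch should do.  The function name resolve_pdftotext is
made up for illustration, and it assumes a POSIX sh where "command -v"
prints the full path of the executable it finds:

```shell
#!/bin/sh
# Hypothetical helper: print which pdftotext would be found when the
# given directory is put first in PATH, as in the env command above.
# Usage: resolve_pdftotext WRAPPER_DIR
resolve_pdftotext() {
    wrapper_dir=$1
    env PATH="$wrapper_dir:$PATH" sh -c 'command -v pdftotext'
}
```

Running resolve_pdftotext /home/zaim/pdftotext-wrapper should then print the
path of the wrapper rather than /usr/bin/pdftotext.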

Cheers,
    Olly


