[Xapian-devel] Reading a password-protected PDF
Olly Betts
olly at survex.com
Tue Mar 5 06:23:38 GMT 2013
On Wed, Feb 27, 2013 at 03:06:29PM +0800, Zaim Zuhuri wrote:
> I was wondering if it is possible for xapian to read a password-protected
> PDF.
[...]
> 2. all PDF is set with the same password.
> 3. only the content of the PDF is encrypted, not the metadata.
>
> If it is possible could you guys point me in the right direction.
Xapian runs pdftotext to extract text from PDF files, so the question
really is "can pdftotext read a password-protected PDF?"
Looking at pdftotext --help, I see:
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
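As I understand it, the user password is the one needed to open the
document, while the owner password controls permissions such as printing
and copying. Which one pdftotext needs depends on how your PDFs were
protected, so I'd try both and see which works.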
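To find out which option works without doing a whole omindex run, you
could paste something like this into a shell. It's only a sketch:
`which_password_option` is a made-up helper name, and it relies on
pdftotext accepting `-` as the output file to mean stdout (poppler's
pdftotext does).

```shell
# Hypothetical helper: report which pdftotext password option opens a PDF.
# Assumes a pdftotext that accepts -upw/-opw, as listed by pdftotext --help.
# Set PDFTOTEXT beforehand to use a pdftotext other than /usr/bin/pdftotext.
PDFTOTEXT=${PDFTOTEXT:-/usr/bin/pdftotext}

which_password_option() {
    pdf=$1
    password=$2
    for opt in -upw -opw; do
        # Extract to stdout ("-") and throw the text away; only the exit
        # status matters here.
        if "$PDFTOTEXT" "$opt" "$password" "$pdf" - >/dev/null 2>&1; then
            echo "$opt"
            return 0
        fi
    done
    echo "neither -upw nor -opw worked for $pdf" >&2
    return 1
}
```

Then run it on one of your PDFs, e.g.
`which_password_option /path/to/one-of-your.pdf 'secret-password'`,
and use whichever option it prints in the wrapper.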
So I'd try creating a simple wrapper script, so that when omindex runs
pdftotext it actually runs your wrapper instead, which in turn runs the
real pdftotext with some extra command line arguments:
#!/bin/sh
# Run the real pdftotext with the password, passing on all of omindex's arguments.
exec /usr/bin/pdftotext -upw 'secret-password' "$@"
Save that as (say) /home/zaim/pdftotext-wrapper/pdftotext, then make it
executable and put that directory at the front of PATH before you run omindex:
chmod a+x /home/zaim/pdftotext-wrapper/pdftotext
env PATH="/home/zaim/pdftotext-wrapper:$PATH" omindex [...]
Cheers,
Olly