[Xapian-tickets] [Xapian] #780: Support for epub (was: Support for epub and .pkl file)
Xapian
nobody at xapian.org
Tue Apr 23 05:26:22 BST 2019
#780: Support for epub
----------------------------------------+-----------------------------
Reporter: jugnu | Owner: jugnu
Type: enhancement | Status: reopened
Priority: normal | Milestone: 1.5.0
Component: Omega | Version: 1.4.11
Severity: normal | Resolution:
Keywords: omega file support omindex | Blocked By:
Blocking: | Operating System: Linux
----------------------------------------+-----------------------------
Description changed by jugnu:
Old description:
> Git PR for issue : https://github.com/xapian/xapian/pull/235
>
> EPUB ------
>
> Skipping - unknown MIME type 'application/epub+zip'
> Skipping - unknown MIME type 'application/zip'
>
> .pkl (less priority) -------
>
> I was just playing around with different formats on my computer. I found
> and tested .pkl formatted. file command shows it to be 8086 relocatable
> (Microsoft). Also I ran omindex with .pkl inside, and it fails :
>
> {{{
> iamglass: Skipping - unknown MIME type 'application/octet-stream'
> polarities.pkl: Skipping - unknown MIME type 'application/octet-stream'
> Exception: DatabaseError: Modifications failed (DatabaseError: Error
> reading block 0 (Protocol error)), and couldn't open at the old revision:
> Error reading block 0.
> }}}
New description:
Git PR for issue : https://github.com/xapian/xapian/pull/235
EPUB ------
Skipping - unknown MIME type 'application/epub+zip'
Skipping - unknown MIME type 'application/zip'
Current situation: omindex is now able to index EPUB files. However,
work is still needed to parse the metadata correctly, such as the
overall title, individual chapters, authors etc. Many other parts of
index_file.cc likewise do basic indexing but lack complete metadata
parsing; the affected formats can be found by searching for:
// FIXME: Implement support for metadata.
To do:
1. Tests are needed to ensure that EPUB support generalizes across
different generations of EPUB, their file and directory structures,
etc.
2. Correct metadata parsing
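As a rough illustration of what the metadata step involves (this is not Omega's code, and it is in Python rather than the C++ of index_file.cc): an EPUB is a zip archive whose META-INF/container.xml names the package (OPF) file, and the OPF carries Dublin Core elements such as dc:title and dc:creator. The function name read_epub_metadata is hypothetical. Chapter titles are not handled here, since they live in a different place depending on the EPUB generation.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the EPUB container file and by Dublin Core metadata.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC = "{http://purl.org/dc/elements/1.1/}"


def read_epub_metadata(source):
    """Return (title, authors) for an EPUB given a path or file-like object.

    Hypothetical helper for illustration only; title is None if absent.
    """
    with zipfile.ZipFile(source) as zf:
        # container.xml points at the package document, whose location
        # differs between books (e.g. OEBPS/content.opf, EPUB/package.opf).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            raise ValueError("container.xml has no rootfile entry")
        package = ET.fromstring(zf.read(rootfile.get("full-path")))

    # Collect every non-empty dc:title / dc:creator in document order.
    titles = [e.text.strip() for e in package.iter(DC + "title") if e.text]
    authors = [e.text.strip() for e in package.iter(DC + "creator") if e.text]
    return (titles[0] if titles else None), authors
```

Because the package path is looked up rather than hard-coded, the same function can be run against test files with different directory layouts, which is the kind of coverage item 1 asks for.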
--
Ticket URL: <https://trac.xapian.org/ticket/780#comment:11>
Xapian <https://xapian.org/>