[Xapian-tickets] [Xapian] #743: omindex: delay libmagic checks
Xapian
nobody at xapian.org
Tue Dec 6 23:18:20 GMT 2016
#743: omindex: delay libmagic checks
---------------------------+------------------------
Reporter: olly | Owner: olly
Type: defect | Status: new
Priority: normal | Milestone: 1.4.x
Component: Omega | Version: git master
Severity: normal | Keywords:
Blocked By: | Blocking:
Operating System: All |
---------------------------+------------------------
Currently omindex's logic is:
* map the extension to a mime type
* if "ignore" or "skip" move on to next file
* if no mapping, call libmagic to get a mime type
* if libmagic doesn't recognise the file, move on to next file
* check file size (requires `stat()` call, which we have avoided so far if
the file system returns `d_type` from `readdir()`)
* if the size is 0 or > max_size then move on to next file
* create a `Document` object and partially set it up
* check timestamps from `stat()` and the DB for an existing entry and move
on to next file if this has been indexed and hasn't changed
* check for failed entry in DB and move on if we already tried and failed
(needs file size and last mod from `stat()`)
The ordering here isn't ideal - in particular:
* The probing done by libmagic is potentially fairly expensive since it
has to open and read the start of the file, so we should avoid calling
libmagic if another cheap check which doesn't need the mime type could
reject the file (e.g. size, possibly timestamps if we can uncouple those
checks from the check for the existing DB entry). If we have a mapping,
checking the mime type for "ignore" or "skip" is still a cheap early check.
* We create and set up the `Document` object a bit early (though this
shouldn't be very expensive).
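
One possible reordering is sketched below. It is only an illustration of the idea, not omindex's actual code: `Decision`, `FileInfo`, `decide()` and the `probe` callback (standing in for the libmagic call) are hypothetical names, and treating `max_size == 0` as "no limit" is an assumption. The point it shows is that the extension lookup and the size check both run before the expensive probe, so a file rejected by either never gets opened by libmagic.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical outcome of the pre-indexing checks (not omindex's real types).
enum class Decision { Skip, Index };

// Hypothetical stand-in for the per-file details the checks need.
struct FileInfo {
    std::string ext;      // extension without the dot
    std::uintmax_t size;  // size in bytes, as stat() would report it
};

// Sketch of the proposed ordering: cheap rejections first, libmagic last.
// `mime_map` maps extensions to mime types (or "ignore"/"skip").
// `probe` stands in for libmagic and returns "" if the file is unrecognised.
// Assumption: max_size == 0 means "no size limit".
Decision decide(const FileInfo& file,
                const std::map<std::string, std::string>& mime_map,
                std::uintmax_t max_size,
                const std::function<std::string()>& probe,
                std::string& mimetype)
{
    // 1. Extension lookup: cheap, and may reject the file outright.
    auto it = mime_map.find(file.ext);
    bool mapped = (it != mime_map.end());
    if (mapped && (it->second == "ignore" || it->second == "skip"))
        return Decision::Skip;

    // 2. Size check: needs stat(), but not the mime type.
    if (file.size == 0 || (max_size != 0 && file.size > max_size))
        return Decision::Skip;

    // 3. Timestamp checks could go here if they can be uncoupled from
    //    the lookup of the existing DB entry (omitted in this sketch).

    // 4. Only now fall back to the expensive probe, if there was no mapping.
    if (mapped) {
        mimetype = it->second;
    } else {
        mimetype = probe();
        if (mimetype.empty())
            return Decision::Skip;
    }
    return Decision::Index;
}
```

With this ordering an unmapped file that is empty or over the size limit is skipped without `probe` ever being invoked, and the `Document` object would only need to be created after `decide()` returns `Decision::Index`.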
--
Ticket URL: <https://trac.xapian.org/ticket/743>
Xapian <https://xapian.org/>