[Xapian-tickets] [Xapian] #595: Allow omega to index Atom feed MIME type
Xapian
nobody at xapian.org
Sat Apr 14 01:05:53 BST 2012
#595: Allow omega to index Atom feed MIME type
-------------------------+--------------------------------------------------
Reporter: mihaibivol | Owner: olly
Type: enhancement | Status: assigned
Priority: normal | Milestone: 1.2.10
Component: Omega | Version: SVN trunk
Severity: minor | Keywords:
Blockedby: | Platform: All
Blocking: |
-------------------------+--------------------------------------------------
Changes (by olly):
* status: new => assigned
* milestone: => 1.2.10
Comment:
OK, I've applied your patch in r16494, along with a few tweaks, and added
a ChangeLog entry in r16495. Thanks for your contribution to Xapian!
I'll attach a patch with just the extra stuff, but here's a quick
breakdown:
* Update copyright headers.
* Fix indentation to a 4-space indent, tab-filled with 8-space-wide tabs.
* Refactor slightly to avoid the need for text_copy.
* Parse {{{type="html"}}} content as utf-8 rather than iso-8859-1. The RFC
doesn't seem to discuss this, but the XML will usually be utf-8. (Now I
think about it, we should probably use the same charset as the Atom file
itself uses; I'll look at that.)
* Parse the HTML as if the charset came from a meta tag, so that the parser
doesn't throw an exception if a meta tag inside the HTML specifies a
different character set. Perhaps we should honour such an override, but we
definitely don't want to die with an uncaught exception, which is what
happened before.
* {{{keywords = keywords + ' ' + new_keyword;}}} will create a couple of
unnecessary temporary string objects with most compilers - using {{{+=}}}
or {{{append}}} avoids that.
* If {{{type}}} isn't specified, we should assume {{{type=text}}} rather
than reusing whichever type was last specified on an earlier element.
* Renamed {{{is_escaped}}} to {{{is_ignored}}}.
* Remove unneeded {{{#include <iostream>}}}.
* We have a preferred include order: {{{<config.h>}}} first, then the
header corresponding to the source file, then other headers from our code
in alphabetical order, then standard/system headers.
* Updated documentation.
* Added atomparsetest with some automated tests.
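To illustrate the {{{+=}}} point from the breakdown above, here's a minimal sketch. The helper name {{{join_keywords}}} and its vector argument are hypothetical and not taken from the omega source; the only point is that appending in place avoids building temporaries for each {{{a + ' ' + b}}} expression.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from omega): build a space-separated keyword
// string by appending in place rather than via `keywords + ' ' + word`.
std::string join_keywords(const std::vector<std::string>& words) {
    std::string keywords;
    for (const std::string& word : words) {
        // Only add a separator between keywords, not before the first one.
        if (!keywords.empty()) keywords += ' ';
        // operator+= grows the existing buffer; no temporary string is built.
        keywords += word;
    }
    return keywords;
}
```

Calling {{{join_keywords({"atom", "feed"})}}} gives a single string with the two keywords separated by one space.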
Marking to backport for 1.2.10.
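On the {{{type}}} default mentioned in the breakdown above, the idea can be sketched roughly as below. The names {{{TextType}}} and {{{type_from_attributes}}} and the attribute map are assumptions for illustration, not the actual parser code; the key detail is that the type is reset to text for every element before the attributes are examined, so nothing carries over from an earlier element.

```cpp
#include <map>
#include <string>

// Hypothetical enum for the Atom text construct types.
enum class TextType { Text, Html, Xhtml };

// Hypothetical helper: work out the type for ONE element from its attributes.
// The default is set afresh on each call, so an element without a type
// attribute is treated as type=text regardless of what came before.
TextType type_from_attributes(const std::map<std::string, std::string>& attrs) {
    TextType type = TextType::Text;
    auto it = attrs.find("type");
    if (it != attrs.end()) {
        if (it->second == "html") {
            type = TextType::Html;
        } else if (it->second == "xhtml") {
            type = TextType::Xhtml;
        }
        // Any other value falls through as plain text in this sketch.
    }
    return type;
}
```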
If you have any good ideas about the HTML charset handling, let me know.
--
Ticket URL: <http://trac.xapian.org/ticket/595#comment:4>
Xapian <http://xapian.org/>