[Xapian-tickets] [Xapian] #595: Allow omega to index Atom feed MIME type

Xapian nobody at xapian.org
Sat Apr 14 01:05:53 BST 2012


#595: Allow omega to index Atom feed MIME type
-------------------------+--------------------------------------------------
 Reporter:  mihaibivol   |       Owner:  olly     
     Type:  enhancement  |      Status:  assigned 
 Priority:  normal       |   Milestone:  1.2.10   
Component:  Omega        |     Version:  SVN trunk
 Severity:  minor        |    Keywords:           
Blockedby:               |    Platform:  All      
 Blocking:               |  
-------------------------+--------------------------------------------------
Changes (by olly):

  * status:  new => assigned
  * milestone:  => 1.2.10


Comment:

 OK, I've applied your patch in r16494, along with a few tweaks, and added
 a ChangeLog entry in r16495.  Thanks for your contribution to Xapian!

 I'll attach a patch with just the extra stuff, but here's a quick
 breakdown:

  * Update copyright headers.
  * Fix indentation to a 4-space indent, tab-filled with 8-space-wide tabs.
  * Refactor slightly to avoid the need for text_copy.
  * Parse type="html" as UTF-8 rather than ISO-8859-1 - the RFC doesn't
 seem to discuss this, but the XML will usually be UTF-8 (now I think
 about it, we should probably use the same charset as the Atom file
 uses; I'll look at that...)
  * Parse the HTML as if the charset came from a meta tag, so that it
 doesn't throw an exception if a meta tag in it specifies a different
 character set (perhaps we should honour such an override, but we
 definitely don't want to die with an uncaught exception, which is what
 happened before).
  * {{{keywords = keywords + ' ' + new_keyword;}}} will create a couple of
 unnecessary temporary string objects with most compilers - using {{{+=}}}
 or {{{append}}} avoids that.
  * If {{{type}}} isn't specified, we should assume {{{type=text}}} rather
 than reusing the last type we saw specified on an earlier element.
  * Renamed {{{is_escaped}}} to {{{is_ignored}}}.
  * Remove unneeded {{{#include <iostream>}}}
  * We have a preferred include order: {{{<config.h>}}} first, then the
 header corresponding to the source file, then other headers from our code
 in alphabetical order, then standard/system headers.
  * Updated documentation.
  * Added atomparsetest with some automated tests.
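
 To illustrate the {{{+=}}} point in the list above, here is a minimal
 sketch. The helper name {{{append_keyword}}} is hypothetical and is not
 taken from the patch; only {{{keywords}}} and {{{new_keyword}}} come from
 the snippet quoted above.

```cpp
#include <string>

// Hypothetical helper, for illustration only (not from the Omega source).
// Produces the same result as:
//     keywords = keywords + ' ' + new_keyword;
// but appends in place, so no temporary std::string objects are built
// for the intermediate concatenations.
void append_keyword(std::string& keywords, const std::string& new_keyword)
{
    keywords += ' ';
    keywords += new_keyword;
}
```

 Calling {{{append_keyword(keywords, new_keyword)}}} leaves {{{keywords}}}
 holding the old value, a space, and the new keyword, as the original
 expression did.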

 Marking to backport for 1.2.10.
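
 The {{{type}}} defaulting mentioned in the breakdown can be sketched
 roughly as below. This is only an illustration: the function name
 {{{element_type}}} and the use of a {{{std::map}}} for the attributes are
 assumptions, not how the committed parser is structured. The point is that
 the type is worked out afresh for each element instead of being kept in
 state shared across elements.

```cpp
#include <map>
#include <string>

// Hypothetical helper, for illustration only (not from the Omega source).
// Returns the content type for a single element from that element's own
// attributes, falling back to "text" when no type attribute is present.
std::string element_type(const std::map<std::string, std::string>& attrs)
{
    std::map<std::string, std::string>::const_iterator it = attrs.find("type");
    if (it == attrs.end()) {
        // No type on this element: assume text rather than reusing
        // whatever an earlier element specified.
        return "text";
    }
    return it->second;
}
```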

 If you have any good ideas about the HTML charset handling, let me know.

-- 
Ticket URL: <http://trac.xapian.org/ticket/595#comment:4>
Xapian <http://xapian.org/>