[Xapian-tickets] [Xapian] #599: The Omega HTML parser resets contents if a further <body> tag is found
Xapian
nobody at xapian.org
Wed May 16 07:27:17 BST 2012
#599: The Omega HTML parser resets contents if a further <body> tag is found
--------------------+-------------------------------------------------------
Reporter: medoc | Owner: olly
Type: defect | Status: new
Priority: normal | Milestone:
Component: Omega | Version:
Severity: normal | Keywords:
Blockedby: | Platform: All
Blocking: |
--------------------+-------------------------------------------------------
In myhtmlparse.cc, around line 81, the Omega HTML parser resets the
content collected so far each time it finds an opening <body> tag.
Some badly malformed HTML files contain several opening <body> tags, and
resetting on the second and later occurrences discards the text before them.
At least Firefox and Opera ignore further <body> tags. Incidentally they
also just ignore closing </body> and </html> tags.
I noticed this through a reported Recoll issue (Recoll uses the Omega
parser mostly unmodified), and I have changed the behaviour locally.
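To illustrate the behaviour being asked for, here is a minimal sketch. It is
not the code from myhtmlparse.cc: the class BodyTextExtractor, its members
and the seen_body flag are all hypothetical names made up for this example,
and the tag scanning is deliberately naive (no entities, comments, or '>'
inside attribute values). It only shows the idea of resetting the collected
text on the first <body> and ignoring later <body>, </body> and </html> tags.

```cpp
#include <cctype>
#include <string>

// Hypothetical, simplified text extractor; NOT the actual Omega parser.
class BodyTextExtractor {
    std::string dump;        // text collected so far
    bool seen_body = false;  // true once the first <body> has been handled

    void opening_tag(const std::string& name) {
        if (name == "body") {
            if (seen_body) return;  // later <body> tags: keep existing text
            seen_body = true;
            dump.clear();           // first <body>: drop text seen before it
        }
    }

  public:
    void parse(const std::string& html) {
        std::string::size_type i = 0;
        while (i < html.size()) {
            if (html[i] != '<') {
                dump += html[i++];  // plain text outside of tags
                continue;
            }
            std::string::size_type end = html.find('>', i);
            if (end == std::string::npos) break;  // unterminated tag: stop

            std::string::size_type j = i + 1;
            bool closing = (j < end && html[j] == '/');
            if (closing) ++j;

            // Lower-cased tag name, up to the first non-alphanumeric char.
            std::string name;
            while (j < end &&
                   std::isalnum(static_cast<unsigned char>(html[j]))) {
                name += static_cast<char>(
                    std::tolower(static_cast<unsigned char>(html[j])));
                ++j;
            }

            // Closing tags (including </body> and </html>) are ignored,
            // so any text that follows them is still collected.
            if (!closing) opening_tag(name);
            i = end + 1;
        }
    }

    const std::string& text() const { return dump; }
};
```

With this sketch, parsing a document that contains a second <body> tag keeps
the text on both sides of it, and text after </body> is kept as well.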
--
Ticket URL: <http://trac.xapian.org/ticket/599>
Xapian <http://xapian.org/>