[Xapian-tickets] [Xapian] #761: Implement symbol layout tree from presentation mathml expression

Xapian nobody at xapian.org
Thu May 10 19:36:34 BST 2018


#761: Implement symbol layout tree from presentation mathml expression
---------------------------+--------------------
        Reporter:  gp1308  |      Owner:  gp1308
            Type:  task    |     Status:  new
        Priority:  normal  |  Milestone:
       Component:  Other   |    Version:
        Severity:  normal  |   Keywords:
      Blocked By:          |   Blocking:
Operating System:  All     |
---------------------------+--------------------
 This ticket is for discussing the implementation of a symbol layout tree.

 Some background information:

 ** Presentation MathML **

 Presentation MathML is one of the formats used to represent math expressions
 in documents. Presentation elements fall into two broad classes:

 * Token elements
     - mi, mo, mn: these elements correspond to a visible symbol (an
 identifier, an operator such as +, / or %, or a number).

 * Layout schemata
     - mrow, mfrac, msqrt, mroot, mfenced: these elements are used to
 represent fractions and radicals, or to group subexpressions.
     - msub, msup, msubsup, munder, mover, mmultiscripts: these elements
 are used to attach scripts (sub, super, under, over) to a base.
     - mtable, mtr, mtd: these elements correspond to tables, matrices, and
 vectors.
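
 As a rough illustration of the classification above, the element names can
 be kept in two lookup sets. The names `TOKEN_ELEMENTS`, `LAYOUT_ELEMENTS`
 and `classify` are made up for this sketch and are not part of Xapian.

```python
# Element names are copied from the lists above; the set and function
# names are illustrative only.
TOKEN_ELEMENTS = {"mi", "mo", "mn"}
LAYOUT_ELEMENTS = {
    "mrow", "mfrac", "msqrt", "mroot", "mfenced",                   # grouping
    "msub", "msup", "msubsup", "munder", "mover", "mmultiscripts",  # scripts
    "mtable", "mtr", "mtd",                                         # tables
}


def classify(tag):
    """Return 'token', 'layout' or 'other' for a Presentation MathML tag."""
    if tag in TOKEN_ELEMENTS:
        return "token"
    if tag in LAYOUT_ELEMENTS:
        return "layout"
    return "other"
```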

 ** Symbol layout tree **

 In general, a math expression is a group of symbols (integers, operators,
 summation and integral signs, etc.) written on a horizontal line, together
 with special structures such as subscripts, superscripts, and the limits of
 an integral or summation, which are written above or below that line.
 The tree is built by traversing the expression from left to right, starting
 with the first symbol. The result is a deep tree whose branches represent
 scripts or radical indices.

 Each node in the tree represents either a symbol or a grouping construct
 such as a table, vector, matrix or parenthesized expression.

 Every node is assigned a label. A label has two parts: node_type and
 value. The node type can be integer, operator, variable, matrix, etc.; it
 describes the kind of value stored in the node. For example, to represent
 the integer 2 in the symbol tree, a node is created with the label `N!2`.
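
 A minimal sketch of the label format, assuming the two parts are simply
 joined with `!` as in `N!2`. Only the `N` prefix comes from the text above;
 the `V` and `O` prefixes and the helper names are hypothetical.

```python
def make_label(node_type, value):
    """Join a node type prefix and a value into a label such as 'N!2'."""
    return f"{node_type}!{value}"


def split_label(label):
    """Split a label at the first '!' into (node_type, value)."""
    node_type, _, value = label.partition("!")
    return node_type, value


integer_two = make_label("N", 2)    # prefix taken from the ticket text
variable_x = make_label("V", "x")   # hypothetical prefix for variables
factorial = make_label("O", "!")    # hypothetical prefix for operators
```

 Splitting at the first `!` keeps values that themselves contain `!` (such
 as a factorial operator) intact.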

 Every edge represents a spatial relationship between two adjacent symbols.
 For example, an edge of type `next` means the two symbols are adjacent on a
 horizontal line, and `above` means the parent node is the base and the
 child node is its superscript.
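
 To make the node and edge description concrete, here is a small sketch that
 builds such a tree for `x^2 + 1`. It uses Python's standard
 `xml.etree.ElementTree` as a stand-in parser (not the hand-written parser
 discussed in this ticket), handles only `math`, `mrow`, `msup` and the
 token elements, and expects input without an XML namespace. The class and
 function names and the `V`/`O` prefixes are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical type prefixes; only "N" (number) appears in the ticket.
PREFIX = {"mn": "N", "mi": "V", "mo": "O"}


class Node:
    def __init__(self, label):
        self.label = label
        self.edges = {}  # edge type ("next", "above") -> child Node


def last_on_baseline(node):
    """Follow 'next' edges to the last symbol on the horizontal line."""
    while "next" in node.edges:
        node = node.edges["next"]
    return node


def build(elem):
    """Return the first node of the layout tree for one MathML element."""
    if elem.tag in PREFIX:
        return Node(f"{PREFIX[elem.tag]}!{(elem.text or '').strip()}")
    if elem.tag in ("math", "mrow"):
        first = tail = None
        for child in elem:  # left-to-right traversal
            node = build(child)
            if first is None:
                first = node
            else:
                tail.edges["next"] = node
            tail = last_on_baseline(node)
        return first
    if elem.tag == "msup":
        base, script = (build(child) for child in elem)
        last_on_baseline(base).edges["above"] = script
        return base
    raise ValueError(f"unsupported element: {elem.tag}")


def edge_list(node):
    """List (parent label, edge type, child label) triples, depth first."""
    out = []
    for kind, child in node.edges.items():
        out.append((node.label, kind, child.label))
        out.extend(edge_list(child))
    return out


SAMPLE = "<math><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mn>1</mn></math>"
root = build(ET.fromstring(SAMPLE))
```

 Here `root` is the node for `x`; it carries an `above` edge to `N!2` and a
 `next` edge to the `+` operator, which in turn has a `next` edge to `N!1`.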

 Complete details on the symbol layout tree can be found in the wiki:
 https://github.com/guruhegde/xapian-gsoc-diary/blob/master/docs/slt.rst/
 (link to be updated at a later point)

 Implementation:

 After considering various options for parsing MathML, I feel it is
 better to implement our own parser rather than use an existing XML parser.
 Having studied the code of //rapidxml// (an XML parser) and
 //MyHtmlParser// (from Omega), I think this can be done in the time slot
 allocated.

 Question:

 * Interface for indexing math expressions - Do we provide a new method in
 the TermGenerator class (e.g. index_math) or build a new API class such as
 MathTermGenerator? Please suggest any other way to do it.

 Another option I have in mind is to keep using the
 `TermGenerator.index_text` interface for indexing: if a `<math>` tag is
 detected, the text up to the matching `</math>` tag is treated as a math
 expression and passed to the math indexing module. (I guess we use
 `Utf8Iterator`, so the iterator would be passed to the math indexing
 module.)
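
 The second option could look roughly like the following sketch, which
 splits the input into plain-text and math segments so that each can be
 routed to the matching indexer. It works on a Python string rather than on
 a Xapian iterator, does not handle nested `<math>` elements, and the name
 `split_math` is made up for illustration.

```python
import re

# Match "<math>" or "<math ...>" but not longer tag names starting with "math".
MATH_OPEN = re.compile(r"<math[\s>]")
MATH_CLOSE = "</math>"


def split_math(text):
    """Yield ('text', s) and ('math', s) segments in document order."""
    pos = 0
    while True:
        match = MATH_OPEN.search(text, pos)
        if match is None:
            break
        start = match.start()
        end = text.find(MATH_CLOSE, start)
        if end == -1:
            break  # unterminated <math>: the rest stays plain text
        end += len(MATH_CLOSE)
        if start > pos:
            yield ("text", text[pos:start])
        yield ("math", text[start:end])
        pos = end
    if pos < len(text):
        yield ("text", text[pos:])
```

 A caller would then pass `text` segments to the existing text indexing path
 and `math` segments to the math indexing module.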

--
Ticket URL: <https://trac.xapian.org/ticket/761>
Xapian <https://xapian.org/>