[Xapian-tickets] [Xapian] #752: Segmentation fault in matcher/queryoptimiser

Xapian nobody at xapian.org
Tue Aug 8 04:55:00 BST 2017


#752: Segmentation fault in matcher/queryoptimiser
---------------------+-------------------------------
 Reporter:  rsto     |             Owner:  olly
     Type:  defect   |            Status:  assigned
 Priority:  normal   |         Milestone:  1.4.5
Component:  Matcher  |           Version:  git master
 Severity:  normal   |        Resolution:
 Keywords:           |        Blocked By:
 Blocking:           |  Operating System:  All
---------------------+-------------------------------
Changes (by olly):

 * status:  new => assigned
 * version:   => git master
 * component:  Other => Matcher
 * milestone:   => 1.4.5


Old description:

> As posted on the xapian-devel mailing list on July 31, 2017:
>
> Since a couple of weeks we are experiencing occasional segmentation
> faults within Xapian 1.5. We can't reproduce the crashes, but we have
> strong hints that they are due to memory corruption. We have narrowed
> down our root cause analysis to phrase searches on multi-databases that
> fail on reading the `hint` field in the`QueryOptimiser`class [1].
>
> We'd appreciate any hints on how to fix this. I've written up our
> findings and solution attempts below. Should we post this on trac?
>
> Our findings so far
> ================
>
> In a core dump we see that calling the `open_nearby_postlist` function
> on the `hint` variable [2] falls of a cliff, resulting in a segfault:
>
> {{{
>     (gdb) bt 2
>     #0  0x000000000001eaa1 in ?? ()
>     #1  0x00007fa19d09231f in LocalSubMatch::open_post_list
>     (this=0x13527d0, term=..., wqf=1, factor=1,
>         need_positions=<optimized out>, in_synonym=<optimized out>,
>         qopt=0x7ffe66370940, lazy_weight=false)
>         at matcher/localsubmatch.cc:289
> }}}
>
> the line at localsubmatch.cc:289 is
>
> {{{
>     pl = hint->open_nearby_postlist(term);
> }}}
>
> Unfortunately, the compiler had optimised a lot of debugging information
> away. Still, it's clear where the system crashed.
>
> Following a similar crash and re-running the query we could not
> reproduce the crash, but valgrind catched a read-after-free on the
> `hint` field (full valgrind log attached):
>
> {{{
>     ==2265126== Invalid read of size 8
>     ==2265126==    at 0x9A6B313:
>     LocalSubMatch::open_post_list(std::string const&, unsigned int,
>     double, bool, bool, QueryOptimiser*, bool) (localsubmatch.cc:289)
> }}}
>
> which got free'd in this codepath:
>
> {{{
>     ==2265126==  Address 0xb3fdfd0 is 0 bytes inside a block of size 216
>     free'd
>     ==2265126==    at 0x4C2A360: operator delete(void*) (in
>     /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>     ==2265126==    by 0x99C0B75: operator() (queryinternal.cc:65)
>     ==2265126==    by 0x99C0B75:
> for_each<__gnu_cxx::__normal_iterator<Xapian::PostingIterator::Internal**,
>     std::vector<Xapian::PostingIterator::Internal*> >,
>     delete_ptr<Xapian::PostingIterator::Internal> > (stl_algo.h:3755)
>     ==2265126==    by 0x99C0B75: Xapian::Internal::Context::reset()
>     (queryinternal.cc:155)
>     ==2265126==    by 0x99C481F:
>     Xapian::Internal::QueryWindowed::postlist_windowed(Xapian::Query::op,
>     Xapian::Internal::AndContext&, QueryOptimiser*, double) const
>     (queryinternal.cc:1668)
> }}}
>
> which is triggered by this condition in queryinternal.cc#L1665 ([2]);
>
> {{{
>         if (!qopt->db.has_positions()) {
>         // No positions in this subdatabase so this matches nothing,
>         // which means the whole andcontext matches nothing.
>         ctx.reset();
>         return;
>     }
> }}}
>
> How to fix this
> ============
> Here is what I believe is happening:
>
> We are using subdatabases (all glass) for caching recent database
> additions, and some of these do not contain positional information. This
> makes the internal query code reset the AND-context, which in effect
> frees its postlist. One of the postlist entries is still pointed at by
> the `hint` field of QueryOptimiser from a previous submatch, and the
> next call to `get_hint_postlist` in queryoptimiser.h#L106 references the
> bogus address.
>
> A fix to avoid this is simple: just reset the QueryOptimiser hint field
> when free'ing the context postlist. I've written a one-liner for this
> here [3]. But I'm not yet convinced that's all there is: could the hint
> field be used already somewhere else? Should we probably keep it and
> make QueryOptimiser take ownership?
>
> [1]
> https://github.com/xapian/xapian/blob/master/xapian-
> core/matcher/queryoptimiser.h#L51
> [2]
> https://github.com/xapian/xapian/blob/master/xapian-
> core/api/queryinternal.cc#L1665
> [3]
> https://github.com/rsto/xapian/commit/3e7d65b25eef00347f5c764af5ff4d770433ac9b
> [4]
> https://github.com/xapian/xapian/blob/master/xapian-
> core/matcher/queryoptimiser.h#L106

New description:

 As posted on the xapian-devel mailing list on July 31, 2017:

 Since a couple of weeks we are experiencing occasional segmentation
 faults within Xapian 1.5. We can't reproduce the crashes, but we have
 strong hints that they are due to memory corruption. We have narrowed
 down our root cause analysis to phrase searches on multi-databases that
 fail on reading the `hint` field in the`QueryOptimiser`class [1].

 We'd appreciate any hints on how to fix this. I've written up our
 findings and solution attempts below. Should we post this on trac?

 = Our findings so far =

 In a core dump we see that calling the `open_nearby_postlist` function
 on the `hint` variable [2] falls of a cliff, resulting in a segfault:

 {{{
     (gdb) bt 2
     #0  0x000000000001eaa1 in ?? ()
     #1  0x00007fa19d09231f in LocalSubMatch::open_post_list
     (this=0x13527d0, term=..., wqf=1, factor=1,
         need_positions=<optimized out>, in_synonym=<optimized out>,
         qopt=0x7ffe66370940, lazy_weight=false)
         at matcher/localsubmatch.cc:289
 }}}

 the line at localsubmatch.cc:289 is

 {{{
     pl = hint->open_nearby_postlist(term);
 }}}

 Unfortunately, the compiler had optimised a lot of debugging information
 away. Still, it's clear where the system crashed.

 Following a similar crash and re-running the query we could not
 reproduce the crash, but valgrind catched a read-after-free on the
 `hint` field (full valgrind log attached):

 {{{
     ==2265126== Invalid read of size 8
     ==2265126==    at 0x9A6B313:
     LocalSubMatch::open_post_list(std::string const&, unsigned int,
     double, bool, bool, QueryOptimiser*, bool) (localsubmatch.cc:289)
 }}}

 which got free'd in this codepath:

 {{{
     ==2265126==  Address 0xb3fdfd0 is 0 bytes inside a block of size 216
     free'd
     ==2265126==    at 0x4C2A360: operator delete(void*) (in
     /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
     ==2265126==    by 0x99C0B75: operator() (queryinternal.cc:65)
     ==2265126==    by 0x99C0B75:
 for_each<__gnu_cxx::__normal_iterator<Xapian::PostingIterator::Internal**,
     std::vector<Xapian::PostingIterator::Internal*> >,
     delete_ptr<Xapian::PostingIterator::Internal> > (stl_algo.h:3755)
     ==2265126==    by 0x99C0B75: Xapian::Internal::Context::reset()
     (queryinternal.cc:155)
     ==2265126==    by 0x99C481F:
     Xapian::Internal::QueryWindowed::postlist_windowed(Xapian::Query::op,
     Xapian::Internal::AndContext&, QueryOptimiser*, double) const
     (queryinternal.cc:1668)
 }}}

 which is triggered by this condition in queryinternal.cc#L1665 ([2]);

 {{{
         if (!qopt->db.has_positions()) {
         // No positions in this subdatabase so this matches nothing,
         // which means the whole andcontext matches nothing.
         ctx.reset();
         return;
     }
 }}}

 = How to fix this =

 Here is what I believe is happening:

 We are using subdatabases (all glass) for caching recent database
 additions, and some of these do not contain positional information. This
 makes the internal query code reset the AND-context, which in effect
 frees its postlist. One of the postlist entries is still pointed at by
 the `hint` field of `QueryOptimiser` from a previous submatch, and the
 next call to `get_hint_postlist` in queryoptimiser.h#L106 references the
 bogus address.

 A fix to avoid this is simple: just reset the `QueryOptimiser` hint field
 when free'ing the context postlist. I've written a one-liner for this
 here [3]. But I'm not yet convinced that's all there is: could the hint
 field be used already somewhere else? Should we probably keep it and
 make `QueryOptimiser` take ownership?

 [1]
 https://github.com/xapian/xapian/blob/master/xapian-
 core/matcher/queryoptimiser.h#L51
 [2]
 https://github.com/xapian/xapian/blob/master/xapian-
 core/api/queryinternal.cc#L1665
 [3]
 https://github.com/rsto/xapian/commit/3e7d65b25eef00347f5c764af5ff4d770433ac9b
 [4]
 https://github.com/xapian/xapian/blob/master/xapian-
 core/matcher/queryoptimiser.h#L106

--

--
Ticket URL: <https://trac.xapian.org/ticket/752#comment:1>
Xapian <https://xapian.org/>
Xapian



More information about the Xapian-tickets mailing list