Glass cursor crash on empty table when doing Honey compact

Bron Gondwana brong at fastmailteam.com
Fri Sep 4 11:57:41 BST 2020


Hi,

This is a fun one!  I've been playing with the following pattern:

* multiple read-only databases being compacted into one.
* combined metadata to add to the destination database which needs to be calculated (language counts and indexer versions)

This is in cyrus imapd.  Existing code is here:

https://github.com/cyrusimap/cyrus-imapd/blob/4376a87773f709516105b84e3cf12b7d55775a55/imap/xapian_wrap.cpp#L522

It does the compact, then opens the target database to fiddle the metadata.  Obviously, this isn't allowed with Honey, so I decided to do the following instead:

* create a new WritableDatabase and write in the metadata I want with a magic prefix on it.
* update the Xapian::Compactor subclass with the magic to pick the value with the prefix and return that (including empty to remove it).  That code is here:

https://github.com/cyrusimap/cyrus-imapd/commit/5ff26ec64197efa568c56fc2189f3f05647b8db2

It works fine still when writing to Glass, but is crashes when writing to Honey:

Program received signal SIGSEGV, Segmentation fault.
Glass::LeafItem_wr::form_key (key_="", this=0x5555555d9308) at backends/glass/glass_table.h:255
255 set_key_len(key_len);
(gdb) bt
#0  Glass::LeafItem_wr::form_key (key_="", this=0x5555555d9308) at backends/glass/glass_table.h:255
#1  GlassTable::form_key (this=this at entry=0x5555555d92d8, key="") at backends/glass/glass_table.cc:1250
#2  0x00007ffff733bfdb in GlassCursor::find_entry_ge (this=this at entry=0x55555558fea0, key="") at backends/glass/glass_cursor.cc:257
#3  0x00007ffff73810e8 in GlassCursor::rewind (this=0x55555558fea0) at /usr/include/c++/9/bits/char_traits.h:300
#4  HoneyCompact::PositionCursor<GlassTable const&>::PositionCursor (offset_=<optimised out>, in=0x5555555d92d8, this=0x55555558fea0)
    at backends/honey/honey_compact.cc:1662
#5  HoneyCompact::merge_positions<HoneyTable, GlassTable const> (out=0x5555555b31c0, inputs=std::vector of length 2, capacity 2 = {...},
    offset=std::vector of length 2, capacity 2 = {...}) at backends/honey/honey_compact.cc:1747
#6  0x00007ffff737367b in HoneyDatabase::compact (compactor=0x7ffff7fc8828 <xapian_compact_dbs::comp>,
    destdir=0x5555555905e0 "/tmpfs/cas/1027490101/search2/c/user/cassandane/xapian.NEW", fd=-1, source_backend=<optimised out>,
    sources=std::vector of length 2, capacity 2 = {...}, offset=std::vector of length 2, capacity 2 = {...}, compaction=Xapian::Compactor::FULLER,
    flags=<optimised out>, last_docid=<optimised out>) at backends/honey/honey_compact.cc:2325
#7  0x00007ffff72ac649 in Xapian::Database::compact_ (this=this at entry=0x7fffffffddf0, output_ptr=output_ptr at entry=0x7fffffffdf60, fd=fd at entry=0, flags=<optimised out>,
    flags at entry=1290, block_size=block_size at entry=0, compactor=compactor at entry=0x7ffff7fc8828 <xapian_compact_dbs::comp>) at /usr/include/c++/9/bits/basic_string.h:2300
#8  0x00007ffff7f6884a in Xapian::Database::compact (compactor=..., block_size=0, flags=1290, output="/tmpfs/cas/1027490101/search2/c/user/cassandane/xapian.NEW",
    this=0x7fffffffddf0) at /usr/local/cyruslibs-v33/include/xapian-1.5/xapian/database.h:818


The issue is, 'p' is NULL:

(gdb) p in
$4 = (const GlassTable *) 0x5555555d92d8
(gdb) p *in
$5 = {tablename = 0x7ffff74b33e9 "position", revision_number = 1, item_count = 0, block_size = 8192, flags = -1, faked_root_block = true, sequential = true,
  handle = -1, level = 0, root = 0, kt = {<Glass::LeafItem_base<unsigned char*>> = {p = 0x0}, <No data fields>}, buffer = 0x0, free_list = {revision = 0,
    first_unused_block = 0, fl = {n = 0, c = 0}, fl_end = {n = 0, c = 0}, flw = {n = 0, c = 0}, flw_appending = false, p = 0x0, pw = 0x0},
  name = "/tmpfs/cas/1027490101/search2/c/user/cassandane/xapian.NEW.META/position <http://xapian.new.meta/position>.", seq_count = 0, changed_n = 0, changed_c = 0, max_item_size = 0,
  Btree_modified = false, full_compaction = false, writable = false, cursor_created_since_last_modification = true, cursor_version = 0, changes_obj = 0x0, C = {{
      data = 0x0, c = -1, rewrite = false}, {data = 0x0, c = -1, rewrite = false}, {data = 0x0, c = -1, rewrite = false}, {data = 0x0, c = -1, rewrite = false}, {
      data = 0x0, c = -1, rewrite = false}, {data = 0x0, c = -1, rewrite = false}, {data = 0x0, c = -1, rewrite = false}, {data = 0x0, c = -1, rewrite = false}, {
      data = 0x0, c = -1, rewrite = false}, {data = 0x0, c = -1, rewrite = false}}, split_p = 0x0, compress_min = 0, comp_stream = {compress_strategy = 0, out_len = 0,
    out = 0x0, deflate_zstream = 0x0, inflate_zstream = 0x0}, lazy = true, last_readahead = 4294967295, offset = 0}

Because the position table is empty!  I'm not sure why 'kt' isn't being initialised.  I patched around it on my testbed as follows:

diff --git a/xapian-core/backends/glass/glass_cursor.cc b/xapian-core/backends/glass/glass_cursor.cc
index c20ff1ab04f8..07c2a58cb7c8 100644
--- a/xapian-core/backends/glass/glass_cursor.cc
+++ b/xapian-core/backends/glass/glass_cursor.cc
@@ -106,6 +106,10 @@ bool
GlassCursor::next()
{
     LOGCALL(DB, bool, "GlassCursor::next", NO_ARGS);
+    if (B->empty()) {
+ is_after_end = true;
+ RETURN(false);
+    }
     Assert(!is_after_end);
     if (B->cursor_version != version) {
// find_entry() will call rebuild().
@@ -144,6 +148,10 @@ bool
GlassCursor::find_entry(const string &key)
{
     LOGCALL(DB, bool, "GlassCursor::find_entry", key);
+    if (B->empty()) {
+        is_after_end = true;
+        RETURN(false);
+    }
     if (B->cursor_version != version) {
rebuild();
     }
@@ -190,6 +198,10 @@ void
GlassCursor::find_entry_lt(const string &key)
{
     LOGCALL_VOID(DB, "GlassCursor::find_entry_lt", key);
+    if (B->empty()) {
+        is_after_end = true;
+        return;
+    }
     if (!find_entry(key)) {
// The entry wasn't found, so find_entry() left us on the entry before
// the one we asked for and we're done.
@@ -213,6 +225,10 @@ bool
GlassCursor::find_exact(const string &key)
{
     LOGCALL(DB, bool, "GlassCursor::find_exact", key);
+    if (B->empty()) {
+        is_after_end = true;
+        RETURN(false);
+    }
     is_after_end = false;
     is_positioned = false;
     if (rare(key.size() > GLASS_BTREE_MAX_KEY_LEN)) {
@@ -238,6 +254,10 @@ bool
GlassCursor::find_entry_ge(const string &key)
{
     LOGCALL(DB, bool, "GlassCursor::find_entry_ge", key);
+    if (B->empty()) {
+        is_after_end = true;
+        RETURN(false);
+    }
     if (B->cursor_version != version) {
rebuild();
     }

But I'm not sure if the right approach is to patch the cursor to not try and build a key if there's no data, or if kt should be initialised.

Or indeed, if the Honey compactor should check for empty tables and just skip them without trying to create a cursor!

What I think I would really like is an API to pass a hash of key/value metadata items into db.compact rather than having to do this workaround, but either way it's bad if we crash on an empty database!

Cheers,

Bron.

--
  Bron Gondwana, CEO, Fastmail Pty Ltd
  brong at fastmailteam.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20200904/be8d2e4f/attachment-0001.htm>


More information about the Xapian-devel mailing list