[Xapian-discuss] Feature request: Lighten pressure on backup
Jesper Krogh
jesper at krogh.cc
Mon Mar 24 06:07:38 GMT 2008
Hi.
This is a small feature request for Xapian. Currently I have a
Xapian database with >5m records; the files take up around 124GB in the
Xapian database directory, with a few "quite large" files:
# du -sh *
0 flintlock
4.0K iamflint
1000K position.baseA
63G position.DB
716K postlist.baseA
624K postlist.baseB
45G postlist.DB
8.0K record.baseA
385M record.DB
240K termlist.baseA
15G termlist.DB
12K value.baseB
696M value.DB
(And it is my impression that my record.DB file is fairly small.)
The idea comes from PostgreSQL's filesystem layout: it splits each table
into segment files with a (probably historic) maximum size, 1GB by
default, and that helps backups significantly.
Xapian's current layout poses some "challenges" for backup systems, since
the daily incremental run now basically has to back up the complete set
(124GB) even if only a single new document has been merged.
The suggestion would be to split the files into several smaller files. I
know that the algorithms for searching the B-trees would probably become
a bit more complex, but the result could be that a change only touches
a subset of the files, which would make the backup much easier.
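To illustrate why splitting helps an incremental backup, here is a minimal Python sketch. It is not Xapian code: the segment size, the function names and the sample data are all made up for the example, and it assumes that an update only rewrites a small region of a table, which may not hold for every B-tree update.

```python
import hashlib
from typing import List

# Hypothetical segment size; tiny on purpose so the demo is easy to follow.
SEGMENT_SIZE = 4


def segment_digests(data: bytes, segment_size: int = SEGMENT_SIZE) -> List[str]:
    """Hash each fixed-size segment of the data separately."""
    return [
        hashlib.sha256(data[i:i + segment_size]).hexdigest()
        for i in range(0, len(data), segment_size)
    ]


def changed_segments(old: bytes, new: bytes,
                     segment_size: int = SEGMENT_SIZE) -> List[int]:
    """Return the indexes of segments an incremental backup must copy."""
    old_d = segment_digests(old, segment_size)
    new_d = segment_digests(new, segment_size)
    changed = []
    for i in range(max(len(old_d), len(new_d))):
        o = old_d[i] if i < len(old_d) else None
        n = new_d[i] if i < len(new_d) else None
        if o != n:
            changed.append(i)
    return changed


old_table = b"aaaabbbbccccdddd"
new_table = b"aaaabbXbccccdddd"  # one byte modified in place
print(changed_segments(old_table, new_table))
```

With one large file, the backup sees "the file changed" and copies everything; with segments, only the segment at index 1 would need to be copied in this example.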
Another solution could be to let Xapian query several databases and
"merge" the results. Then I could create a new database each day and merge
once a week (or on whatever schedule fits the purpose).
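The "query several databases and merge" idea could look roughly like the Python sketch below. It does not use the Xapian API; the function name, the (score, document id) tuples and the sample hits are hypothetical, and it assumes that scores from the different databases are directly comparable, which would need checking in practice.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

# Hypothetical hit representation: (score, document id).
Hit = Tuple[float, str]


def merge_results(per_db_hits: Iterable[List[Hit]], limit: int) -> List[Hit]:
    """Merge per-database result lists into one ranked list.

    Each input list must already be sorted by score, best first.
    """
    # heapq.merge does a lazy k-way merge; negate the score so that the
    # highest score sorts first.
    merged = heapq.merge(*per_db_hits, key=lambda hit: -hit[0])
    return list(islice(merged, limit))


weekly_hits = [(0.9, "w1"), (0.4, "w2")]  # from the large weekly database
today_hits = [(0.7, "d1"), (0.1, "d2")]   # from today's small database
print(merge_results([weekly_hits, today_hits], limit=3))
```

Only the small daily database would change between merges, so the daily incremental backup would stay small and the large one would be copied once a week.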
Other suggestions are welcome.
Thanks.
Jesper