02 February 2025

Backing up an SQLite Database with Borg and De-duplication

I have a reasonably large SQLite database that is fine with daily backups, as losing a day's worth of data would not be terrible. I'd also like to keep 5 to 7 days of backups, "just in case".

The current backup strategy is to create a date-stamped file each day, gzip it, delete the oldest and sync the lot to an S3-like store.
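
For reference, the existing scheme is roughly the following (paths, retention and the remote are illustrative placeholders; sqlite3's .backup command is one way to take a consistent copy of a live database):

STAMP=$(date +%F)
# Take a consistent copy of the live database, then gzip it.
sqlite3 /srv/app/data.db ".backup '/backups/data-$STAMP.db'"
gzip /backups/data-$STAMP.db
# Drop gzipped backups older than 7 days.
find /backups -name 'data-*.db.gz' -mtime +7 -delete
# Sync what is left to the S3-like store; rclone is just one option here.
rclone sync /backups remote:db-backups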

This works, but each daily backup is mostly a duplicate of the previous one, and the backups are not encrypted.

So I thought it would be interesting to see how Borg Backup behaves with this sort of backup.

The database in question is about 325MB gzipped, and 1.8GB uncompressed.
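
As a side benefit, Borg can also close the encryption gap: the repository is created once with borg init, and the encryption mode is chosen at that point. Something along these lines (repokey keeps the key inside the repository, protected by a passphrase):

borg init --encryption=repokey /home/sodonnell/Downloads/backup/compressed_dbs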

Borg and Gzipped Databases

For my first try, I took 5 daily gzipped database backups and added them to Borg in turn:

borg create --stats /home/sodonnell/Downloads/backup/compressed_dbs::1st ./current
...
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              340.64 MB            320.21 MB            320.21 MB
All archives:              340.64 MB            320.21 MB            320.22 MB

                       Unique chunks         Total chunks
Chunk index:                     146                  146

Then I added the next 4 databases:

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              342.98 MB            322.51 MB            322.51 MB
All archives:                1.71 GB              1.61 GB              1.61 GB

                       Unique chunks         Total chunks
Chunk index:                     667                  667
------------------------------------------------------------------------------

Notice there are 667 total chunks and 667 unique chunks, so with these gzipped files Borg is unable to de-duplicate any of the backups, even though the underlying databases are mostly identical.

Borg and Uncompressed Databases

Next I repeated the test, adding each database in turn without gzipping them first. After adding the same 5 DBs the stats looked like:

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.93 GB            553.39 MB              8.00 MB
All archives:                9.62 GB              2.76 GB            590.26 MB

                       Unique chunks         Total chunks
Chunk index:                     787                 3699
------------------------------------------------------------------------------

This looks much better. LZ4 does not compress as well as gzip (553MB for this archive versus about 320MB gzipped), but de-duplication is working: the total deduplicated size is 590MB, versus 1.6GB for the previous attempt.

Experimenting With Compression

The tests above used the default compression, LZ4, which is fast but does not achieve great compression ratios. Borg supports other compression algorithms, so I performed a few tests with zstd, starting with level 15:

borg create --stats --compression zstd,15 /home/sodonnell/Downloads/backup/uncompressed_zstd15::1st ./current
------------------------------------------------------------------------------
Repository: /home/sodonnell/Downloads/backup/uncompressed_zstd15
Archive name: 1st
Archive fingerprint: 05bc3fb4666357daa2da63094213e933b7104bd7c29fb60e8e5c5db2e66fde85
Time (start): Wed, 2025-01-29 21:45:03
Time (end):   Wed, 2025-01-29 21:48:01
Duration: 2 minutes 57.69 seconds
Number of files: 1
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.92 GB            225.38 MB            225.38 MB
All archives:                1.92 GB            225.38 MB            225.41 MB

                       Unique chunks         Total chunks
Chunk index:                     747                  747
------------------------------------------------------------------------------

Repeating the test at level 10:

borg create --stats --compression zstd,10 /home/sodonnell/Downloads/backup/uncompressed_zstd15::1st ./current
------------------------------------------------------------------------------
Repository: /home/sodonnell/Downloads/backup/uncompressed_zstd15
Archive name: 1st
Archive fingerprint: dd26f83b534330b4a4444c42e5f74c6a390a280d1ebbc833a63c016c8e2c3407
Time (start): Wed, 2025-01-29 21:48:42
Time (end):   Wed, 2025-01-29 21:49:51
Duration: 1 minutes 9.46 seconds
Number of files: 1
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.92 GB            225.64 MB            225.64 MB
All archives:                1.92 GB            225.64 MB            225.67 MB

                       Unique chunks         Total chunks
Chunk index:                     721                  721
------------------------------------------------------------------------------

And then level 5:

borg create --stats --compression zstd,5 /home/sodonnell/Downloads/backup/uncompressed_zstd15::1st ./current
Enter passphrase for key /home/sodonnell/Downloads/backup/uncompressed_zstd15:
------------------------------------------------------------------------------
Repository: /home/sodonnell/Downloads/backup/uncompressed_zstd15
Archive name: 1st
Archive fingerprint: 4f7f10988eef17bc97f3052efe42d2641686b7f08e3ead4352a4872ed6b4a27b
Time (start): Wed, 2025-01-29 21:50:43
Time (end):   Wed, 2025-01-29 21:51:19
Duration: 35.35 seconds
Number of files: 1
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.92 GB            267.63 MB            267.63 MB
All archives:                1.92 GB            267.63 MB            267.66 MB

                       Unique chunks         Total chunks
Chunk index:                     751                  751
------------------------------------------------------------------------------

So level 15 took nearly 3 minutes compressing to 225MB.

Level 10 was faster at 1m 9s, also compressing to 225MB.

Level 5 took only 35 seconds and compressed to 267MB, which is significantly faster for not much increase in size.

Compression and the Next File

Borg operates by splitting a file into chunks and hashing each chunk. Each hash is checked against the hashes already in the repository to see if that chunk exists; only if it does not is the chunk compressed, encrypted and stored. Therefore, even with expensive compression, adding a second, mostly duplicate file should be much faster. Adding the second database copy at level 15 compression:

------------------------------------------------------------------------------
Repository: /home/sodonnell/Downloads/backup/uncompressed_zstd15
Archive name: 2nd
Archive fingerprint: 7d2e0b837ba60ccea38c69c4d45b23b0ebebae4aa7d5aeed3bf1995bdfd0b5cd
Time (start): Wed, 2025-01-29 22:40:52
Time (end):   Wed, 2025-01-29 22:41:04
Duration: 11.99 seconds
Number of files: 1
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.92 GB            225.62 MB              2.78 MB
All archives:                3.84 GB            450.85 MB            228.08 MB

                       Unique chunks         Total chunks
Chunk index:                     725                 1428
------------------------------------------------------------------------------

So 12 seconds, compared to nearly 3 minutes for the first copy added to the repository.
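
To make the chunk-and-hash idea concrete, here is a toy sketch of hash-based de-duplication using fixed-size chunks and a directory keyed by SHA-256, against a hypothetical database.db. Borg's real chunker is content-defined (variable-size), which is what lets it keep finding duplicates even when data shifts around inside the file, so treat this purely as an illustration:

mkdir -p chunk-store tmp-chunks
# Split the file into fixed-size pieces (Borg uses variable-size, content-defined chunks).
split -b 2M -a 4 database.db tmp-chunks/chunk.
for c in tmp-chunks/chunk.*; do
  h=$(sha256sum "$c" | cut -d' ' -f1)
  # Only chunks not already in the store are kept; in Borg these are the
  # only ones that get compressed and encrypted.
  [ -e "chunk-store/$h" ] || cp "$c" "chunk-store/$h"
done
rm -r tmp-chunks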

MySQL Dump

Staying with the subject of database backups, a basic way to back up MySQL is mysqldump, which writes to stdout. Borg can create an archive containing a single file read from stdin, e.g.:

mysqldump <options> | borg create --compression zstd,5 --stdin-name 'mysql-dump.sql' repo::archive -
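
The same stdin trick should work for SQLite itself, piping a plain-text dump straight into Borg (paths illustrative). A text dump will chunk and de-duplicate differently from the binary database file, so it is worth testing both:

sqlite3 /srv/app/data.db .dump | borg create --compression zstd,5 --stdin-name 'sqlite-dump.sql' repo::archive -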

Conclusion

Despite the SQLite databases being mostly identical, gzipping them before adding them to Borg yields files in which Borg finds no duplicate chunks, and results in the largest archives.

For my database (mostly new inserts, few if any deletes), de-duplication works very well, and even level 5 zstd compression beats gzip. Expensive compression only hurts on the first file; later, nearly duplicate files are stored much more quickly.

I suspect mileage will vary depending on how much the database changes between backups, so running a few tests against a specific database is the best way to find the right compression level and the right way to use Borg for a particular use case.
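
Finally, for the 5 to 7 days of retention mentioned at the start, Borg's prune command replaces the "delete the oldest file" step. Something like the following, with the repository path as a placeholder (on Borg 1.2 and later a borg compact run is needed afterwards to actually reclaim the space):

# Keep only the 7 most recent daily archives.
borg prune --stats --keep-daily=7 /path/to/repo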
