Skip to content
This repository has been archived by the owner on Mar 9, 2019. It is now read-only.

database file size not updating? #308

Closed
ghost opened this issue Feb 18, 2015 · 5 comments
Closed

database file size not updating? #308

ghost opened this issue Feb 18, 2015 · 5 comments

Comments

@ghost
Copy link

ghost commented Feb 18, 2015

First off, thanks for putting together a great DB. I've been using it on several projects and like it a lot.

I did notice today that I have two databases with the same number of rows: 30,974
I was amazed to see that their file sizes are exactly the same: 134,217,728 bytes

However, they contain a different number of buckets and different values (although the keys are the same).

I also noticed that after deleting tens of thousands of key/value pairs from the database with the following code, the file size remained unchanged.

db.Update(func(tx *bolt.Tx) error {
  b := tx.Bucket([]byte("bucketName"))
  err := b.Delete([]byte("keyToDelete"))
  return err
})

I double checked the database and it's only returning the correct quantity of key/value pairs when iterating with a ForEach, so the delete did appear to work.

Before deleting the key/value pairs from each database, I moved them to a backup. The two backup files for the two databases are different files sizes: 67,108,864 and 16,777,216 bytes

So, it looks like it's writing different volumes, but I expected the file sizes of the original databases to be reduced by those respective amounts.

Maybe this isn't an issue, but I'm just curious as to why the file sizes are remaining the same after the deletes and why two different databases would have the same number of bytes.

Thanks!

@benbjohnson
Copy link
Member

Good questions. Bolt works by allocating 4KB pages and organizing those into a B+tree. It starts at the beginning of the file and keeps allocating more pages at the end as it needs them. Because Bolt is copy-on-write, when a page is updated, its contents get copied to a new page and the old one is freed. So as pages are freed they can be reused the next time Bolt needs to allocate.

For example, let's look at some sweet ASCII art visualization. When you first create a Bolt database it creates a 1MB file which provides space for some metadata pages (M), a page for storing free pages (F), and some actual data (D). The rest of the pages are unallocated ( ):

|M|M|F|D| | | | | |

When you update your data page, it'll allocate a new page and copy over old data and make any updates there. It'll then free the old page.

|M|M|F| |D| | | | |

As you start writing more data (and not just updating existing pages), you start to allocate more pages:

|M|M|F| |D| | | | |
        ▼
|M|M|F|D|D| | | | |
        ▼
|M|M|F|D|D|D| | | |
        ▼
|M|M|F|D|D|D|D| | |
        ▼
|M|M|F|D|D|D|D|D| |
        ▼
|M|M|F|D|D|D|D|D|D|

Finally when you've exhausted your original 1MB, Bolt will remap the database to give you 2MB. (It keeps doubling until it gets to 1GB and then it goes up in 1GB increments)

|M|M|F|D|D|D|D|D|D| | | | | | | | | |

That's why your databases are the same size. Bolt bumps the data file size up in fixed size increments. This used to not be the case but there's was an edge case uncovered where file data needed to have the file size sync so we needed to truncate (#284).


The reason why the database doesn't shrink is because of data can be allocated anywhere and live for any period of time. For example, once you fill up your 1MB data file like this:

|M|M|F|D|D|D|D|D|D|

And you remove some data that causes a page to be removed, your deleted page could be anywhere so your database looks like this:

|M|M|F|D| |D|D|D|D|

Bolt can't truncate the file because the free page is in the middle of the file. We could try to compact pages from the end over to the beginning but that gets complicated as we'd need to update all the references to that page in parent pages.

I'd like to add a bolt compact tool to the Bolt CLI but honestly there hasn't really been a need for it. It would work pretty simply -- it just rewrites a database in key order to a new database. That would create a compacted database.


tl;dr - the database size doesn't shrink & the database grows in fixed sizes. :)

@GregIngelmo
Copy link

Great explanation, this certainly solidified how it works for me.

@ghost
Copy link
Author

ghost commented Feb 18, 2015

Super helpful, thanks for explaining it :)

@rakoo
Copy link

rakoo commented Mar 29, 2015

Maybe it could be possible to truncate the file up to the next power of 2 that includes all data ? i.e if the end of the file is full of empty pages, it should be quick and easy to truncate the underlying file ? The power of 2 is still important so that the file keeps a size that is consistent with when it grows as per your explanation (so when the file size is more than 1 GiB, truncate by chunks of 1 GiB)

@MJDSys
Copy link
Contributor

MJDSys commented Mar 29, 2015

I don't think that case would occur very often @rakoo. Instead, could bolt punch holes in the file where pages are empty? Maybe using a special call, to avoid latency issues in the normal case? That way an empty pages can be reclaimed by the filesystem, even if they are in the middle of the file.

@benbjohnson, would that be accepted if someone (not necessarialy me with my current time budget) made a PR with it? Linux, at least, has fallocate with FALLOC_FL_PUNCH_HOLE: http://man7.org/linux/man-pages/man2/fallocate.2.html . Not sure if Mac OSX/Windows has a similar system call.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants