31 May 2012

NSMutableDictionary Growth Patterns

While doing a code review, I made the suggestion to another engineer that he use [NSMutableDictionary dictionaryWithCapacity:] when creating a dictionary with a known size. I realized shortly after that that I didn’t have a very good understanding of the behavior of the class. I knew that it would resize itself periodically as items were added, but when and how often? When specifying a capacity, how many items will it be able to hold before the next resize? I couldn’t find the answers anywhere online, so I decided to figure them out myself.

Approach

I needed to find an easy way to determine when a dictionary had been resized. I knew I could use Instruments to measure it, but tying Instruments’s output to specific dictionary sizes would be more manual than I’d prefer. Instead, I relied on the fact that an NSDictionary’s keys are ordered based on how they’re hashed. After being rehashed, chances are that the order of the keys will be different. This approach can only pick up positive signals, but as the number of keys grows, the likelihood of not detecting a rehash drops pretty dramatically.

My testing code can be found here.

When?

The first question was “when does the dictionary resize itself?”. This was easy enough to check. I added items one at a time, and checked if my “did resize” condition for the keys was met.

[4,7,12,20,33,53,86,119,156,238,391,673,1066,1733,2796,4544,7392]

When the dictionary reached one of the above item counts, it resized itself. It was tempting to try to calculate the scaling factor. It seemed feasible since these resizes likely happen once the count exceeds some threshold determined by the real size and some load factor. All I needed to do was consider the ratio of consecutive numbers in the above series. Unfortunately, NSMutableDictionary is not a concrete class, but rather a class cluster. There are any number of private implementations of that interface with potentially wildly different characteristics sitting behind it. For now, I’ll settle with “it probably starts with space for 4 items” and “it grows by, very roughly, a factor of 1.62”.

How Much?

My next question was regarding initializing with some capacity. The docs only told me that it

Initializes a newly allocated mutable dictionary, allocating enough memory to hold numItems entries.

Does it allocate just enough, or well more? By changing just one line in my test code, I was able to get the sizes at which it would resize.

[4,7,12,20,33,53,86,119,156,238,391,673,1066,1733,2796,4544,7392]

Unexpected.

It seems these resize points are fixed. Initializing with a capacity of of anything from 7 to 11 appears to result in dictionaries of the same actual capacity, not resizing until item 12. Perhaps that makes the most sense. I admittedly don’t know a ton about data structures.

Time to fix that.

Comments? Thoughts? Contact me on twitter or via email.