UUIDs are unique, but they’re hard to handle. At 128 bits, they aren’t very popular with embedded databases, and you often need special structures to handle them in memory. Storing them as strings isn’t a good use of storage. Moreover, SQLite doesn’t seem to store 128-bit entities very efficiently.
We need something that can fit in 64 bits. Most programming languages handle 64 bits just fine and have had native 64-bit data types for decades.
I’ve long been fascinated with Twitter’s Snowflake. No, not the snowflake data warehouse design pattern. Not the winter ones either. Early this decade Twitter published the method it uses to generate identifier values for tweets that fit within a 63-bit integer. What’s so great about it? Twitter has zillions of tweets to handle, and having every identifier still fit inside a normal integer is mind-boggling to me.
More often than not, developers are lazy (me included) and just opt to use a UUID instead. Unfortunately, UUIDs, being a 128-bit data type, are often hard to handle. For a long time Core Data didn’t have a UUID type, and using one meant the pain of resorting to BLOB (binary large object) types along with all their potential issues, including not playing well with indexes and primitive types. Or worse, storing it as a string and using 36 bytes for something that needs only 16.
In short, a Snowflake is a roughly sequential identifier consisting of three parts:
- A timestamp.
- A worker number.
- A sequence number.
My biggest need for this is with Core Data, mainly because Core Data objects don’t get a permanent identifier until they are saved into a persistent store. They have an objectID attribute that is supposed to be unique. In reality this attribute only becomes permanent after the object gets written to disk and SQLite has created a primary key for it (that is, for SQLite-backed persistent stores). This makes it really hard to use as an ID to reference for UI state preservation and restoration. Hence another ID is often required, one that is present when the object is instantiated and remains the same for its entire lifecycle.
I’ve recently made a Swift-based implementation of the Snowflake ID generator: Miniflake. I’ve made it available to all current Apple computers, big and small, across watchOS, iOS, tvOS, and macOS.
I’ve modified Twitter’s Snowflake schema a bit and made the sequence number more significant than the worker ID field. This makes the identifiers more sortable and should better fit uses in the Apple ecosystem, which is skewed more towards single-user, multiple-device scenarios.
The following is my implementation of Snowflake:
- The 41 most significant bits make up the timestamp portion. This is the number of milliseconds since a custom epoch. These bits are good for over 69 years, and choosing an epoch near the time the first ID is created pushes the overflow date out as far as possible.
- The next 12 bits are a sequence number. This field is likely to be zero unless more than one value needs to be generated within a given millisecond. Hence each worker can theoretically generate 4096 distinct identifier values per millisecond.
- Finally the least-significant 10 bits are used to identify the worker which doles out these IDs. Typically this is the instance number of the associated microservice that creates it.
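The layout above can be sketched with a few shifts and masks. This is only an illustration of the 41/12/10 split, not Miniflake’s actual code; the names packFlake and unpackFlake and the constants are made up for this example.

```swift
// Field widths as described in the list above; all names here are illustrative.
let sequenceBits: UInt64 = 12
let workerBits: UInt64 = 10
let timestampMask: UInt64 = (1 << 41) - 1
let sequenceMask: UInt64 = (1 << sequenceBits) - 1
let workerMask: UInt64 = (1 << workerBits) - 1

/// Packs timestamp, sequence, and worker into one 63-bit value.
/// Out-of-range inputs are simply masked down to their field width.
func packFlake(milliseconds: UInt64, sequence: UInt64, worker: UInt64) -> Int64 {
    let time = (milliseconds & timestampMask) << (sequenceBits + workerBits)
    let seq = (sequence & sequenceMask) << workerBits
    // 41 + 12 + 10 = 63 bits, so the result always fits a signed Int64.
    return Int64(time | seq | (worker & workerMask))
}

/// Splits a non-negative identifier back into its three fields.
func unpackFlake(_ id: Int64) -> (milliseconds: UInt64, sequence: UInt64, worker: UInt64) {
    let raw = UInt64(id)
    return (raw >> (sequenceBits + workerBits),
            (raw >> workerBits) & sequenceMask,
            raw & workerMask)
}
```

Because the sequence field sits above the worker field, two IDs from the same millisecond compare by sequence number first and by worker only as a tie-breaker.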
In my implementation, generator instances are meant to be associated with a thread or a Core Data context. Ideally there should be one associated with each serial dispatch queue as well, but things aren’t that straightforward with dispatch queues. A dispatch queue may use other queues for the actual execution of tasks. In turn, these other queues may not always be one of the system-global queues. In any case, pretty much all of these dispatch queues operate on a relatively small pool of threads. Hence attaching the generator instances as thread-local storage is a good bet.
Most use cases are covered by calling one of the two nextFlakeID() extension methods, defined on NSManagedObjectContext and on Thread.
Both methods return the next 64-bit identifier value that you would need to associate with your data object. Use the NSManagedObjectContext extension when you need the ID value inside awakeFromInsert to set up the initial values of a Core Data object. Otherwise use the extension attached to Thread.currentThread instead. Both extension methods lazily create a generator instance and attach it to the thread or Core Data context.
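To illustrate the lazy, per-thread attachment, here is a rough sketch that caches a generator in the thread’s threadDictionary. SketchGenerator, sketchNextFlakeID(), and the dictionary key are hypothetical stand-ins; the real extension methods may well be implemented differently.

```swift
import Foundation

/// Hypothetical stand-in for a real generator; it only counts upward
/// so that the example stays short.
final class SketchGenerator {
    private var counter: Int64 = 0
    func nextValue() -> Int64 {
        counter += 1
        return counter
    }
}

extension Thread {
    // Made-up key under which the generator is cached on each thread.
    private static let sketchGeneratorKey = "sketch.flake.generator"

    /// Returns the next value from this thread's generator, creating the
    /// generator on first use and storing it in threadDictionary.
    func sketchNextFlakeID() -> Int64 {
        if let existing = threadDictionary[Thread.sketchGeneratorKey] as? SketchGenerator {
            return existing.nextValue()
        }
        let created = SketchGenerator()
        threadDictionary[Thread.sketchGeneratorKey] = created
        return created.nextValue()
    }
}
```

Calling Thread.current.sketchNextFlakeID() twice on the same thread reuses the same generator, while another thread would lazily get its own.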
However, you can use either the FlakeMaker or the InProcessFlakeMaker class directly if you have more esoteric uses.
For server-based scenarios you can use the Swift class FlakeMaker. Instantiating it requires an instance number, which should be a positive integer less than 1024. Any larger number is reduced modulo 1024 so that it fits the 10-bit worker field. Your entire landscape must call in synchronously to one of these ID generators to make sure that there won’t be any duplicate IDs. Of course, you would only need this if you can’t have a central database that generates those identifier values in the first place.
For on-device scenarios, you should use the InProcessFlakeMaker class. It takes care of assigning instance numbers for you. These instance numbers are assigned randomly to minimize clashes. Take care not to have more than 1024 instances live at any given moment, or your application will crash with a fatalError. The two nextFlakeID() extension methods use this class, hence you need to make sure that you don’t have more than 1024 threads or NSManagedObjectContext instances lying around.
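The idea of handing out random instance numbers from a pool of 1024 can be sketched as follows. SketchInstancePool is a hypothetical type written for this post, not part of Miniflake, and it returns nil on exhaustion instead of calling fatalError so that the behavior is easy to observe. It assumes Swift 4.2 or later for randomElement().

```swift
/// Hypothetical pool that hands out random, currently unused instance numbers.
struct SketchInstancePool {
    static let capacity = 1024        // what a 10-bit worker field can hold
    private var inUse = Set<Int>()

    /// Returns a random free number, or nil when all 1024 are taken.
    mutating func acquire() -> Int? {
        let free = (0..<SketchInstancePool.capacity).filter { !inUse.contains($0) }
        guard let picked = free.randomElement() else { return nil }
        inUse.insert(picked)
        return picked
    }

    /// Makes a number available again once its generator goes away.
    mutating func release(_ number: Int) {
        inUse.remove(number)
    }
}
```

Scanning all 1024 slots on every call is wasteful but keeps the sketch obvious; a real implementation would more likely retry random picks or keep a free list.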
Instance numbers are randomized to minimize the chance of IDs clashing when more than one device accesses a synchronized data store. In any case these clashes should be rare, since a synchronized store is typically owned by one person. A clash can only happen if the same person creates records of the same type on two devices within the same millisecond and is unlucky enough that both generators have the same instance number. In short, by my rough estimate the chance of a clash is less than one in a million, so you’d likely see one only once you reach something like two million concurrent users modifying the same data store. In any case, you should already have server-based logic that takes care of this well before you reach your first million concurrent users.
InProcessFlakeMaker instances are not thread-safe. Each thread should get its own instance, because ID generation modifies internal state and uses no locks in order to maximize throughput. Nevertheless, the two nextFlakeID() extension methods take care of this by creating dedicated instances for their respective objects.
FlakeMaker protects against both sequence number overflows and time moving backward. The latter is usually caused by network time synchronization: should the system clock run too fast, NTP could set it back by a few seconds. FlakeMaker ensures that the ID values it generates keep increasing. Similarly, if the sequence number portion goes over what 12 bits can hold, the logic bumps the time field by one millisecond. Have a look at the nextValue() method to see how this works.
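For a feel of how both protections can fit together, here is a simplified sketch. It is not the actual nextValue() from the repository; SketchFlakeMaker and its injectable clock closure are assumptions made so that the behavior can be exercised without touching the system clock.

```swift
/// Simplified, hypothetical generator whose values never decrease.
struct SketchFlakeMaker {
    let worker: UInt64
    let clock: () -> UInt64           // milliseconds since a custom epoch
    private var lastTime: UInt64 = 0
    private var sequence: UInt64 = 0

    init(worker: UInt64, clock: @escaping () -> UInt64) {
        self.worker = worker & 0x3FF  // keep only the 10 worker bits
        self.clock = clock
    }

    mutating func nextValue() -> Int64 {
        let now = clock()
        if now > lastTime {
            // Time advanced: start a fresh sequence for the new millisecond.
            lastTime = now
            sequence = 0
        } else {
            // Same millisecond, or the clock moved backward: keep the last
            // timestamp and advance the sequence instead.
            sequence += 1
            if sequence > 0xFFF {
                // The sequence no longer fits 12 bits: bump the time field.
                lastTime += 1
                sequence = 0
            }
        }
        let time = (lastTime & ((1 << 41) - 1)) << 22
        return Int64(time | (sequence << 10) | worker)
    }
}
```

Feeding it a clock that reports 100, 100, 99, 101 still yields four strictly increasing values, because the backward step at 99 is treated like one more request within millisecond 100.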
The code is available in my GitHub account as MiniFlake. You can also consume it from CocoaPods.
- Xcode 9.2
- Swift 4.0
- macOS 10.12+
- iOS 10.3+
- tvOS 10.2+
- watchOS 3.2+
MIT or Simplified BSD (they’re basically the same, really), because I know how unusable GPL is with regard to software for Apple computers.
That’s all for now. Enjoy!