Tags: python, cassandra, timeuuid, clustering-key

Cassandra TimeUUID exhausts file descriptors when uuid1 is used as the default


I have a Cassandra model defined as:

import uuid
from cassandra.cqlengine import columns
from cassandra.cqlengine.models import Model

class MyModel(Model):
    ...
    ...
    created_at = columns.TimeUUID(primary_key=True,
                         clustering_order='DESC',
                         default=uuid.uuid1)
    ...
    ...

Recently the app ran into a problem where uuid1 creation doesn't close files, and it hit the file descriptor limit. I tried to find a solution, but the options I came up with don't seem to work:

  • Replace uuid1 in default with uuid4. But TimeUUID needs a time component, and only uuid1 provides that.
  • Replace uuid1 with cassandra.util.uuid_from_time(time.time()). When I checked the code for both uuid1 and uuid_from_time, they look much the same, so that would not solve the problem either.
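To illustrate the first bullet, here is a minimal sketch using only Python's standard uuid module. It shows that a version 1 UUID embeds a timestamp (counted in 100 ns ticks since 1582-10-15) while a version 4 UUID is random and has none. The helper name uuid1_to_datetime is hypothetical and not part of the Cassandra driver.

```python
import uuid
from datetime import datetime, timezone

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns ticks.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def uuid1_to_datetime(u):
    """Hypothetical helper: recover the embedded time of a version 1 UUID."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp")
    unix_seconds = (u.time - UUID_EPOCH_OFFSET) / 10_000_000
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


u1 = uuid.uuid1()
u4 = uuid.uuid4()
print("uuid1 version:", u1.version, "embedded time:", uuid1_to_datetime(u1))
print("uuid4 version:", u4.version, "(no time component)")
```

Passing a uuid4 value to the helper raises ValueError, which is the reason uuid4 cannot serve as the default for a TimeUUID column.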

The last option is to replace TimeUUID with the Timestamp type, but this created_at column is part of the primary key and has a clustering order, so I don't know whether I can do that.

My column family already has 1,000,000+ rows, so I can't just drop it.

I also want to know: what is the advantage of using TimeUUID instead of Timestamp?


Solution

  • Are you certain you're hitting the libuuid issue you linked? Your code snippet shows the standard library uuid, which probably doesn't have that issue. Is it possible there's a different file descriptor leak in your program?

    If it is libuuid, the easiest course would be to use the standard library implementation. If speed is a major concern for you, you might look into building a different version of libuuid to use with python-libuuid. I tried this one quickly and didn't notice any file descriptors leaking: http://www.ossp.org/pkg/lib/uuid/
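One way to check whether UUID generation is really what leaks is to count the process's open file descriptors before and after a batch of calls. The following is a rough diagnostic sketch, not a definitive test. It assumes a Linux-style /proc/self/fd directory (falling back to /dev/fd), and the helper names open_fd_count and fd_growth are hypothetical.

```python
import os
import uuid


def open_fd_count():
    """Count this process's open file descriptors via /proc/self/fd or /dev/fd."""
    path = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(path))


def fd_growth(factory, iterations=10000):
    """Call factory() repeatedly and return how many descriptors were gained."""
    before = open_fd_count()
    for _ in range(iterations):
        factory()
    return open_fd_count() - before


uuid.uuid1()  # warm-up call so any one-time setup is not counted
print("fd growth after 10,000 uuid1 calls:", fd_growth(uuid.uuid1))
```

If the reported growth stays at zero for the UUID generator you actually use, the leak is more likely somewhere else in the program. The same helper can be pointed at any other suspect callable.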

    "I also want to know, what is the advantage of using TimeUUID instead of timestamp?"

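    You won't be able to change the type of the column on your existing table, but to answer your question: TimeUUID is usually used to avoid collisions when multiple events could be written with the same timestamp value. A version 1 UUID carries extra clock-sequence and node bits alongside the time, so two events in the same instant can still get distinct keys.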
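The collision point can be sketched with the standard library alone. The helper below, timeuuid_from_unix, is a hypothetical stand-in written for illustration. It is not the driver's cassandra.util.uuid_from_time, though it builds the same kind of version 1 UUID from a given Unix time. Two events written at the same timestamp overwrite each other when keyed by the timestamp, but stay distinct when keyed by a TimeUUID.

```python
import random
import uuid

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns ticks.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_from_unix(unix_seconds, clock_seq=None, node=None):
    """Hypothetical helper: build a version 1 UUID for a given Unix time."""
    if clock_seq is None:
        clock_seq = random.getrandbits(14)
    if node is None:
        # Random 48-bit node with the multicast bit set, marking it as non-MAC.
        node = random.getrandbits(48) | (1 << 40)
    ticks = int(unix_seconds * 10_000_000) + UUID_EPOCH_OFFSET
    fields = (
        ticks & 0xFFFFFFFF,        # time_low
        (ticks >> 32) & 0xFFFF,    # time_mid
        (ticks >> 48) & 0x0FFF,    # time_hi (version bits are set by UUID())
        (clock_seq >> 8) & 0x3F,   # clock_seq_hi (variant bits are set by UUID())
        clock_seq & 0xFF,          # clock_seq_low
        node,
    )
    return uuid.UUID(fields=fields, version=1)


ts = 1_700_000_000  # two events share this exact timestamp
first = timeuuid_from_unix(ts)
second = timeuuid_from_unix(ts)

by_timestamp = {ts: "first event"}
by_timestamp[ts] = "second event"  # overwrites the first event

by_timeuuid = {first: "first event", second: "second event"}

print("rows keyed by timestamp:", len(by_timestamp))
print("rows keyed by TimeUUID:", len(by_timeuuid))
print("same embedded time:", first.time == second.time)
```

The dictionary here only stands in for a clustering key. The sketch makes no claim about how Cassandra orders or stores the values internally.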