Search code examples
cassandranormalizationcqlcql3denormalization

Mixing column types in Cassandra / wide rows


I am trying to learn how to implement a feed in cassandra (think twitter). I want to use wide rows to store all the posts made by a user. I am thinking about adding user information or statistical information in the same row (num of posts, last post date, user name, etc.).

My question is: is name, age, etc. "field name" stored in column? Or those wide rows only store the column-name and values specified? Am I wasting disk space? Am I compromising performance somehow?

Thanks!

-- TABLE CREATION

CREATE TABLE user_feed (
    owner_id int,
    name text,
    age int,
    posted_at timestamp,
    post_text text,
    PRIMARY KEY (owner_id, posted_at)
);

-- INSERTING THE USER

insert into user_feed (owner_id, name, age, posted_at) values (1, 'marc', 36, 0);

-- INSERTING USER POSTS

insert into user_feed (owner_id, posted_at, post_text) values (1, dateof(now()), 'first post!');
insert into user_feed (owner_id, posted_at, post_text) values (1, dateof(now()), 'hello there');
insert into user_feed (owner_id, posted_at, post_text) values (1, dateof(now()), 'i am kind of happy');

-- GETTING THE FEED

select * from user_feed where owner_id=1 and posted_at>0;

-- RESULT

 owner_id | posted_at                | age  | name | post_text
----------+--------------------------+------+------+--------------------
        1 | 2014-07-04 12:01:23+0000 | null | null |        first post!
        1 | 2014-07-04 12:01:23+0000 | null | null |        hello there
        1 | 2014-07-04 12:01:23+0000 | null | null | i am kind of happy

-- GETTING USER INFO - ONLY USER INFO IS POSTED_AT=0

select * from user_feed where owner_id=1 and posted_at=0;

-- RESULT

 owner_id | posted_at                | age  | name | post_text
----------+--------------------------+------+------+--------------------
        1 | 1970-01-01 00:00:00+0000 |   36 | marc |               null

Solution

  • What about making them static?

    A static column is the same in all partition key and since your partition key is the id of the owner you could avoid wasting space and retrieve the user informations in any query.

    CREATE TABLE user_feed (
        owner_id int,
        name text static,
        age int static,
        posted_at timestamp,
        post_text text,
        PRIMARY KEY (owner_id, posted_at)
    );
    

    Cheers, Carlo