
Cassandra nodejs driver, how to update data correctly


I am new to Cassandra and I'm not sure my data model is correct. I designed it around the queries I want to make in my application: I want to create and update book objects, and I want to find books by author and by publish date. I am using the DataStax Node.js driver for Cassandra (with TypeScript), and here is my schema so far:

CREATE TABLE IF NOT EXISTS books_by_author (
    author_id UUID,
    book_id UUID,
    book_name TEXT,
    date_published TIMESTAMP,
    PRIMARY KEY (author_id, date_published)
);

CREATE TABLE IF NOT EXISTS books (
    book_id UUID PRIMARY KEY,
    book_name TEXT,
    book_description TEXT,
    date_published TIMESTAMP,
    author_id UUID,
    author_name TEXT
    -- + many more columns for book details
);

With author_id and date_published as the primary key, I was able to make queries with the Node.js driver, with help from the DataStax documentation:

const q = cassandra.mapping.q;

const results = await this.bookMapper.find(
  {
    authorId: '1',
    // given time range for publish date; works fine
    datePublished: q.and(q.gte(start), q.lte(end)),
  },
  docInfo,
  options);

The above code works well; I can get the list of books by an author within a given publish-date range. The bookMapper maps both tables (books_by_author, books), so I use it for all my DB queries.
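For context, mapping one model to several denormalized tables is done through the driver's mapping options. The sketch below shows only the options object; the keyspace, the connected `client`, and the commented wiring lines are assumptions, but listing both tables under one model is the documented way to have the mapper keep them in sync:

```typescript
// Mapping options for cassandra-driver's Mapper (sketch; client and
// keyspace setup omitted). With both tables listed under one model,
// mapper inserts/updates write to books and books_by_author together.
const mappingOptions = {
  models: {
    Book: {
      tables: ['books', 'books_by_author'],
      // With the real driver you would typically also set, e.g.:
      // mappings: new cassandra.mapping.UnderscoreCqlToCamelCaseMappings(),
    },
  },
};

// Wiring it up (assumes a connected cassandra-driver Client `client`):
// const mapper = new cassandra.mapping.Mapper(client, mappingOptions);
// const bookMapper = mapper.forModel('Book');
```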

Then I ran into an issue. I had created a book in my application but given it the wrong publish date, and I wanted to change it. To see how this could be done, I wrote a unit test that saves a book to the DB, then uses bookMapper.update to change the book's datePublished property. Here is some pseudocode of what I tried:

const bookId = '123uuid';

const existingBook = new Book({
    id: bookId,
    name: 'The Book',
    datePublished: '2020-07-03T13:00:00.000Z',
    description: 'Book description',
    author: {
      id: '1',
      name: 'A. Author',
    }
});
... // insert existingBook to DB and read book details from DB using bookMapper.get({bookId})

const modifiedBook = new Book({
    id: bookId,
    name: 'The Book',
    datePublished: '2020-07-02T13:00:00.000Z', // modified publish date
    description: 'Modified book description', // modified the book description as well
    author: {
      id: '1',
      name: 'A. Author',
    }
});

await this.bookMapper.update(modifiedBook); // update the book

await this.bookMapper.get({bookId}); // returns the book with data from existingBook, not modifiedBook

await this.bookMapper.find(
          {
            authorId: '1', datePublished: q.and(q.gte(start), q.lte(end)),
          },
          docInfo,
          options); 
// query with author id, returns a list of 2 books, both the existingBook and modifiedBook ??

As you can see, the update actually created a new book row in the DB, and now I have 2 books instead of 1. I have no idea what the correct way to update this data is. I tried using batching:

let changes = [];
changes.push(this.bookMapper.batching.remove(existingBook));
changes.push(this.bookMapper.batching.insert(modifiedBook));
await this.mapper.batch(changes);

const book = await this.bookMapper.get({bookId});
--> book is null!

When I batch the remove and the insert, the remove always seems to be applied last, no matter in which order I add the statements to my changes array, so the book gets deleted and my final get returns null.

I wanted to use batching to make the operation atomic. I don't want to first delete the existing book and then insert the new one in separate DB calls, because if an error occurs after the delete but before the insert, I will have lost my book data.

My question: What is the correct way to update the book data when the updated property happens to be part of a primary key? Thank you.


Solution

  • This is a well-known "feature" of Cassandra: within a batch, both statements get the same write timestamp, so the DELETE wins over the INSERT. The only way around this is to set timestamps explicitly on each operation, with the DELETE's timestamp lower than the INSERT's. I'm not a Node.js developer, so here is how it should look in pseudo-code/CQL (the Node.js mapper should support setting a custom timestamp on statements):

    TS = currentTimestampInMicroseconds
    BEGIN BATCH
    DELETE FROM table USING TIMESTAMP TS-1 WHERE PK = ...;
    INSERT INTO table (...) VALUES (...) USING TIMESTAMP TS;
    APPLY BATCH;
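In the Node.js driver, one way to express this is to build the two statements with explicit `USING TIMESTAMP` values and run them through `client.batch`. The sketch below only constructs the statements; the key values are hypothetical (taken from the question), and the commented `client.batch` call assumes a connected cassandra-driver `Client`:

```typescript
// Sketch: replace a row in books_by_author whose clustering key
// (date_published) changed, giving the DELETE a strictly lower write
// timestamp than the INSERT so it cannot shadow the new row.
// The key values below are hypothetical, taken from the question.
const authorId = '1';
const bookId = '123uuid';
const bookName = 'The Book';
const oldDatePublished = '2020-07-03T13:00:00.000Z';
const newDatePublished = '2020-07-02T13:00:00.000Z';

// Cassandra write timestamps are in microseconds; Date.now() returns
// milliseconds, so multiply by 1000 (sub-millisecond precision is lost).
const tsMicros = Date.now() * 1000;

const statements = [
  {
    // DELETE at tsMicros - 1: strictly older than the INSERT below.
    query: `DELETE FROM books_by_author USING TIMESTAMP ${tsMicros - 1} ` +
           `WHERE author_id = ? AND date_published = ?`,
    params: [authorId, oldDatePublished],
  },
  {
    query: `INSERT INTO books_by_author (author_id, date_published, book_id, book_name) ` +
           `VALUES (?, ?, ?, ?) USING TIMESTAMP ${tsMicros}`,
    params: [authorId, newDatePublished, bookId, bookName],
  },
];

// With a connected cassandra-driver Client this would run as a logged
// (atomic) batch:
// await client.batch(statements, { prepare: true });
```

Because the timestamps are baked into the CQL text rather than left to the coordinator, the order of the statements in the array no longer matters: the DELETE always loses to the INSERT.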