java solr riak batching

Riak inserting Lists


How do I go about inserting a list of maps in Riak? I'd also like to be able to query the list via Solr.

Here is my target data model:

{  
   "id":"fa0b758cf8de4a40a54f215563bb483c",
   "version":"g2wAAAABaAJtAAAADN4oTJvAfuENAAE8fWEBag==",
   "creator":"ADMIN",
   "creatorAppId":"RIAK_QA_APP1",
   "creation":1470350862095,
   "customerId":"68af96dae60ccac4e6",
   "customerName":"John Appleseed",
   "orders":[  
      {  
         "orderId":"238dhu38ehj",
         "orderType":"sporting",
         "orderDescription":"Baseball Batt",
         "dateOfPurchase":"5470354262012",
         "delivery":"2 day express",
         "processTimeMS":56,
         "customerAttributes":[  
            {  
               "key":"lastSessionDuration",
               "value":"1 day"
            },
            {  
               "key":"memberSince",
               "value":"1470350862095"
            }
         ]
      },
      {  
         "orderId":"9sdjh349hn",
         "orderType":"furniture",
         "orderDescription":"Sectional Couch",
         "dateOfPurchase":"0970354262087",
         "delivery":"Overnight",
         "processTimeMS":78,
         "customerAttributes":[  
            {  
               "key":"lastSessionDuration",
               "value":"1 day"
            },
            {  
               "key":"memberSince",
               "value":"1470350862095"
            }
         ]
      },
      {  
         "orderId":"1009shdj473",
         "orderType":"gaming",
         "orderDescription":"FIFA 2016 - XBox One",
         "dateOfPurchase":"1470354342013",
         "delivery":"UPS Ground",
         "processTimeMS":68,
         "customerAttributes":[  
            {  
               "key":"lastSessionDuration",
               "value":"1 day"
            },
            {  
               "key":"memberSince",
               "value":"1470350862095"
            }
         ]
      }
   ]
}

The current data model persists each order item separately, resulting in a high number of Riak writes. This becomes a bottleneck when our messaging system starts pushing through thousands of messages/sec. So the intent here (as a sort of POC) is to consolidate all order items per customer into an "orders" list and persist it as a single resource, similar to a batch.

On that note, does Riak support any kind of batch insert? I was unable to find a solution so I'm kind of manually doing this by merging the data.


Solution

  • You don't have to do anything special to store complex data structures in Riak. The Java client supports serialization to JSON (via Jackson, so you can use Jackson capabilities such as annotations).

    I would suggest modeling your data with DTOs and just sending them via Riak's StoreValue command. Code like this will work:

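    import java.util.List;

    import com.basho.riak.client.api.commands.kv.StoreValue;
    import com.basho.riak.client.core.query.Location;
    import com.basho.riak.client.core.query.Namespace;

    public class CustomerData {
        private String id;
        private String version;
        // other customer fields
        private List<Order> orders;
        // getters and setters omitted; Jackson needs them (or annotations) to see private fields
    }
    
    public class Order {
        private String orderId;
        private String orderType;
        // other order fields
    }
    
    // Store the whole customer record, orders included, under a single key
    CustomerData data = ...;
    Location location = new Location(new Namespace(BUCKET_TYPE, BUCKET_NAME), key);
    StoreValue storeCommand = new StoreValue.Builder(data).withLocation(location).build();
    riakClient.execute(storeCommand);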
    

    Alternatively, you can model this as Maps and Lists of Objects.
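
    As a rough sketch of that alternative, the record could be assembled from plain java.util collections. The class and helper names here (CustomerMapModel, order, customer) are made up for illustration, and only a few of the fields from the question are shown:

    ```java
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class CustomerMapModel {
        // Builds one order entry as a plain map (hypothetical helper).
        public static Map<String, Object> order(String orderId, String orderType, String description) {
            Map<String, Object> order = new LinkedHashMap<>();
            order.put("orderId", orderId);
            order.put("orderType", orderType);
            order.put("orderDescription", description);
            return order;
        }

        // Builds the customer record with a nested "orders" list (hypothetical helper).
        public static Map<String, Object> customer(String id, String customerName,
                                                   List<Map<String, Object>> orders) {
            Map<String, Object> customer = new LinkedHashMap<>();
            customer.put("id", id);
            customer.put("customerName", customerName);
            customer.put("orders", new ArrayList<>(orders)); // defensive copy of the list
            return customer;
        }

        public static void main(String[] args) {
            List<Map<String, Object>> orders = new ArrayList<>();
            orders.add(order("238dhu38ehj", "sporting", "Baseball Batt"));
            orders.add(order("9sdjh349hn", "furniture", "Sectional Couch"));
            Map<String, Object> data =
                    customer("fa0b758cf8de4a40a54f215563bb483c", "John Appleseed", orders);
            // 'data' would take the place of the DTO in the StoreValue call above.
            System.out.println(data);
        }
    }
    ```

    The trade-off is that you lose compile-time checking of field names in exchange for not having to maintain DTO classes.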

    Keep in mind, though, that if you want to update, add or delete individual orders, or change customer data, you will have to read the entire list, change an item, and then write the entire list back, potentially increasing the chance of a conflict (concurrent updates to the same key).

    Solr definitely supports complex field structures, multi-valued fields and dynamic fields. Have a look at Creating Search Schemas. To give you an example, I'd need to know which parts of your data you want searchable.

    And no, Riak does not support batch insert.
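
    So the merge has to happen on the client before a single write per customer. A minimal sketch of that grouping step, assuming each incoming message carries a customer ID and one order (OrderMerger and OrderMessage are hypothetical names, and only the order ID is kept to keep it short):

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class OrderMerger {
        // Hypothetical shape of one incoming message: a customer ID plus one order ID.
        public static final class OrderMessage {
            final String customerId;
            final String orderId;

            public OrderMessage(String customerId, String orderId) {
                this.customerId = customerId;
                this.orderId = orderId;
            }
        }

        // Groups order IDs by customer, keeping the arrival order within each customer.
        public static Map<String, List<String>> mergeByCustomer(List<OrderMessage> messages) {
            Map<String, List<String>> merged = new LinkedHashMap<>();
            for (OrderMessage m : messages) {
                merged.computeIfAbsent(m.customerId, k -> new ArrayList<>()).add(m.orderId);
            }
            return merged;
        }

        public static void main(String[] args) {
            List<OrderMessage> messages = Arrays.asList(
                    new OrderMessage("68af96dae60ccac4e6", "238dhu38ehj"),
                    new OrderMessage("68af96dae60ccac4e6", "9sdjh349hn"));
            // Each map entry would then become one stored object instead of one write per order.
            System.out.println(mergeByCustomer(messages));
        }
    }
    ```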

    UPDATE: A custom (non-generic) Solr schema for indexing by order IDs in your case should include

    <field name="orders.orderId" type="string" indexed="true" stored="false" multiValued="true" />
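
    With a field like that indexed, a lookup by order ID comes down to a Solr query string on orders.orderId. A small sketch of building one, assuming the field name from the schema line above; OrderIdQuery and its methods are illustrative helpers, not part of the Riak client, and the result would still have to be passed to the client's search command:

    ```java
    public class OrderIdQuery {
        // Characters with special meaning in the Solr/Lucene query syntax.
        private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

        // Backslash-escapes special characters and whitespace in a single term.
        public static String escape(String term) {
            StringBuilder sb = new StringBuilder();
            for (char c : term.toCharArray()) {
                if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                    sb.append('\\');
                }
                sb.append(c);
            }
            return sb.toString();
        }

        // Builds a query that matches any customer record containing the given order ID.
        public static String byOrderId(String orderId) {
            return "orders.orderId:" + escape(orderId);
        }

        public static void main(String[] args) {
            System.out.println(byOrderId("238dhu38ehj"));
        }
    }
    ```

    Because the field is multi-valued, a match on any one order in the list returns the whole customer record.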