python, pandas, numpy, google-cloud-datastore, google-cloud-bigtable

Is Bigtable or Datastore more suited to storing and using financial data for online applications?


I'm creating a stock analysis web application. I want to store financial data for multiple stocks and then run a stock screener on them. The screener retrieves multiple stocks from the backend and performs a technical indicator test on each one. Stocks that pass the test are returned to the user. Let's say I want to store a pandas.DataFrame for exampleStock:

          open    high      low   close    volume
date                                                 
2017-08-01  247.46  247.50  246.716  247.32  55050401
2017-08-02  247.47  247.60  246.370  247.44  47211216
2017-08-03  247.31  247.34  246.640  246.96  40855997
2017-08-04  247.52  247.79  246.970  247.41  60191838
2017-08-07  247.49  247.87  247.370  247.87  31995021
....

I have been using Datastore. I create one entity per stock, setting the key to the stock's symbol. I use a model like this:

from google.appengine.ext import ndb

class Stocks(ndb.Model):
    dates  = ndb.StringProperty(repeated=True)
    open   = ndb.FloatProperty(repeated=True)
    high   = ndb.FloatProperty(repeated=True)
    low    = ndb.FloatProperty(repeated=True)
    close  = ndb.FloatProperty(repeated=True)
    volume = ndb.FloatProperty(repeated=True)
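
To make the storage layout concrete, here is a minimal sketch of how date-ordered rows could be turned into the parallel lists that the repeated properties hold. The helper name `rows_to_columns` and the `FIELDS` tuple are hypothetical, and the sketch uses plain Python tuples in place of the DataFrame so it runs without pandas or App Engine.

```python
# Sample rows copied from the DataFrame shown in the question:
# (date, open, high, low, close, volume)
rows = [
    ("2017-08-01", 247.46, 247.50, 246.716, 247.32, 55050401),
    ("2017-08-02", 247.47, 247.60, 246.370, 247.44, 47211216),
    ("2017-08-03", 247.31, 247.34, 246.640, 246.96, 40855997),
]

# Property names of the Stocks model, in the same order as the row tuples.
FIELDS = ("dates", "open", "high", "low", "close", "volume")


def rows_to_columns(rows):
    """Hypothetical helper: pivot row tuples into one list per property."""
    columns = {name: [] for name in FIELDS}
    for row in rows:
        for name, value in zip(FIELDS, row):
            # Every property except dates is a FloatProperty in the model,
            # so numeric values (including volume) are cast to float.
            columns[name].append(value if name == "dates" else float(value))
    return columns


columns = rows_to_columns(rows)
# Assumption: with ndb available, the entity could then be built as
# Stocks(id="exampleStock", **columns) and saved with .put().
```

Each list stays aligned by index, so position i in every property refers to the same trading day.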

Then I retrieve multiple entities and loop over them with the technical indicator check:

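import numpy

# list_of_keys holds the ndb keys of the stocks to screen
listOfStocks = ndb.get_multi(list_of_keys)
for stock in listOfStocks:
    # doIndicatorCheck is the screener's own test on the closing prices
    doIndicatorCheck(numpy.array(stock.close))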
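
The question does not show what the indicator check does, so the following is only an illustrative sketch. It assumes a hypothetical test, "latest close is above its simple moving average", and uses plain lists instead of numpy arrays and Datastore entities so it runs on its own. The names `simple_moving_average`, `do_indicator_check`, and `closes_by_symbol` are made up for the example.

```python
def simple_moving_average(values, window):
    """Mean of the last `window` values, or None if there is too little data."""
    if window <= 0 or len(values) < window:
        return None
    return sum(values[-window:]) / window


def do_indicator_check(close, window=3):
    """Hypothetical indicator: pass when the latest close is above its SMA."""
    sma = simple_moving_average(close, window)
    return sma is not None and close[-1] > sma


# Stand-in for the entities returned by ndb.get_multi: symbol -> closing prices.
closes_by_symbol = {
    "exampleStock": [247.32, 247.44, 246.96, 247.41, 247.87],
    "otherStock": [10.0, 9.5, 9.0],
}

# Keep only the symbols whose closing prices pass the check.
passing = [
    symbol
    for symbol, close in closes_by_symbol.items()
    if do_indicator_check(close)
]
```

The screener result is then just the list of passing symbols, which is what gets returned to the user.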

I want to query for stocks, run the indicator check, and return the results to the user as fast as possible. Should I be using Bigtable for this, or is Datastore fine? If Datastore is fine, is this the ideal way to do it?

Thanks in advance.


Solution

  • Disclosure: I am a product manager for Cloud Bigtable.

    If you plan to store a large amount of financial data, for example covering the entire stock market, Cloud Bigtable is a good choice. It scales to terabytes and petabytes while keeping request latency low, it is already in use in financial, risk, and anti-fraud applications, and it natively supports time series via its third dimension (the timestamp attached to each cell). See this blog post and video on how FIS used Cloud Bigtable for their bid on the SEC CAT project.
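
To illustrate what the "third dimension" means, here is a small in-memory sketch of the Bigtable data model, in which each value is addressed by row key, column, and timestamp. It is not the Cloud Bigtable client API. The row key `exampleStock`, the column name `price:close`, and the helpers `write_cell` and `read_series` are assumptions chosen for the example.

```python
from datetime import datetime, timezone

# In-memory stand-in for a table: (row_key, column, timestamp_micros) -> value
cells = {}


def micros(date_string):
    """Convert a YYYY-MM-DD date (taken as UTC midnight) to epoch microseconds."""
    dt = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1_000_000


def write_cell(row_key, column, date_string, value):
    """Store one value; the timestamp is the third coordinate of the cell."""
    cells[(row_key, column, micros(date_string))] = value


def read_series(row_key, column):
    """Return (timestamp, value) pairs for one row and column, newest first."""
    series = [
        (ts, value)
        for (row, col, ts), value in cells.items()
        if row == row_key and col == column
    ]
    return sorted(series, reverse=True)


write_cell("exampleStock", "price:close", "2017-08-01", 247.32)
write_cell("exampleStock", "price:close", "2017-08-02", 247.44)
write_cell("exampleStock", "price:close", "2017-08-03", 246.96)

closes_newest_first = [value for _, value in read_series("exampleStock", "price:close")]
```

In this layout a whole price history lives under one row and column, and the timestamp selects the day. Whether that or a row key that embeds the date works better depends on the access pattern and on how many versions per cell you plan to keep.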

    That said, Cloud Bigtable is strongly consistent in a single cluster, but eventually-consistent if you use replication, so you have to keep that in mind. If your users expect strong consistency, your options are:

    • use a single cluster instance (replication only within a single zone)
    • if you use cross-zone replication, route requests to a single cluster via application profiles
    • consider using a different system which provides strong consistency

    Firestore will provide a serverless document database with strong consistency for your applications, so you should consider Firestore if that is important for your use case.

    If you want to be able to run SQL queries on your data, consider:

    Hope this helps!