I'm creating a stock analysis web application. I want to store financial data for multiple stocks and then run a stock screener on them. The screener retrieves multiple stocks from the backend and performs a technical indicator test on each one. Stocks that pass the test are returned to the user. Let's say I want to store a pandas.DataFrame for exampleStock:
              open    high      low   close    volume
date
2017-08-01  247.46  247.50  246.716  247.32  55050401
2017-08-02  247.47  247.60  246.370  247.44  47211216
2017-08-03  247.31  247.34  246.640  246.96  40855997
2017-08-04  247.52  247.79  246.970  247.41  60191838
2017-08-07  247.49  247.87  247.370  247.87  31995021
....
I have been using Datastore. I create an entity for each stock, setting the key to the stock's symbol. I use a model like this:
from google.appengine.ext import ndb

class Stocks(ndb.Model):
    dates = ndb.StringProperty(repeated=True)
    open = ndb.FloatProperty(repeated=True)
    high = ndb.FloatProperty(repeated=True)
    low = ndb.FloatProperty(repeated=True)
    close = ndb.FloatProperty(repeated=True)
    volume = ndb.FloatProperty(repeated=True)
Then I retrieve multiple entities and loop over them with the technical indicator check:
import numpy

listOfStocks = ndb.get_multi(list_of_keys)
for stock in listOfStocks:
    doIndicatorCheck(numpy.array(stock.close))
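For illustration only, here is a minimal sketch of what a check like doIndicatorCheck could look like. The body is an assumption, not the actual indicator: it uses a hypothetical moving-average comparison, and the short_window and long_window parameters are made up for the example.

```python
import numpy as np


def doIndicatorCheck(close, short_window=3, long_window=5):
    """Hypothetical test: pass when the short SMA of closes exceeds the long SMA."""
    close = np.asarray(close, dtype=float)
    if close.size < long_window:
        # Not enough price history to evaluate the indicator.
        return False
    short_sma = close[-short_window:].mean()
    long_sma = close[-long_window:].mean()
    return bool(short_sma > long_sma)


# Closing prices taken from the sample DataFrame above.
example_close = [247.32, 247.44, 246.96, 247.41, 247.87]
passed = doIndicatorCheck(example_close)
```

A check of this shape only needs the close array, so the other repeated properties are fetched but not used by it.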
I want to query for stocks, run the indicator check, and return the results to the user as fast as possible. Should I be using Bigtable for this, or is Datastore fine? If Datastore is fine, is this the ideal way to do it?
Thanks in advance.
Disclosure: I am a product manager for Cloud Bigtable.
If you plan to store a large amount of financial data, covering the entire stock market, Cloud Bigtable is a good choice. It scales to terabytes and petabytes with low-latency responses, it is already in use in financial, risk, and anti-fraud applications, and it natively supports time series via its third dimension (the timestamp on each cell). See this blog post and video on how FIS used Cloud Bigtable for their bid on the SEC CAT project.
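As a rough sketch of how time-series data is often laid out for Bigtable: rows are stored sorted lexicographically by row key, so a key that puts the symbol first and the date second keeps one stock's history contiguous and readable as a single range. The key layout, make_row_key, and scan_range below are hypothetical names for illustration, and a sorted Python list stands in for the table; no Bigtable client calls are shown.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def make_row_key(symbol, day):
    """Hypothetical key layout: '<SYMBOL>#<YYYYMMDD>'."""
    return f"{symbol.upper()}#{day:%Y%m%d}"


def scan_range(sorted_keys, symbol, start, end):
    """Return keys for one symbol with start <= date <= end (inclusive)."""
    lo = bisect_left(sorted_keys, make_row_key(symbol, start))
    hi = bisect_right(sorted_keys, make_row_key(symbol, end))
    return sorted_keys[lo:hi]


# A sorted list of keys stands in for the table's row ordering.
days = (date(2017, 8, 1), date(2017, 8, 2), date(2017, 8, 3))
keys = sorted(make_row_key(s, d) for s in ("MSFT", "AAPL") for d in days)
hits = scan_range(keys, "AAPL", date(2017, 8, 2), date(2017, 8, 3))
```

Because the date is zero-padded and follows the symbol, lexicographic order matches chronological order within each stock.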
That said, Cloud Bigtable is strongly consistent within a single cluster but eventually consistent if you use replication, so keep that in mind. If your users expect strong consistency, your options are:
Firestore will provide a serverless document database with strong consistency for your applications, so you should consider Firestore if that is important for your use case.
If you want to be able to run SQL queries on your data, consider:
Hope this helps!