python database algorithmic-trading

How to set up data collection for small-scale algorithmic trading software


This is a question on a conceptual level.

I'm building a piece of small-scale algorithmic trading software, and I am wondering how I should set up the data collection/retrieval within that system. The system should be fully autonomous.

The algorithm I want to trade live currently runs at a very low frequency. However, I would like to be able to trade at a higher frequency in the future, so I think it would be a good idea to set up the data collection with a websocket and receive real-time trades from the start. I can aggregate these later if need be.

My first question is: considering the fact that the data will be real time, can I use a CSV-file for storage in the beginning, or would you recommend something more substantial?

In any case, the data collection would proceed as a daemon in my application.

My second question is: are there any frameworks for handling real-time incoming data that keep the database consistent while the rest of the software is querying it, so that conflicts are avoided?

My third and final question is: do you believe it is a wise approach to use a websocket in this case or would it be better to query every time data is needed for the application?


Solution

  • CSV is a nice exchange format, but because it is a plain text file, it is not well suited to real-time updates. This is only my opinion, but I cannot think of a reason to prefer it to a database.

    To handle real-time conflicts, you will eventually need a professional-grade database. PostgreSQL has a reputation for being robust, and MariaDB is probably a sound choice too. You could use a lighter database such as SQLite during development, but beware of the slight differences between engines: it is easy to write something that works on one database and breaks on another. On the other hand, if portability across databases is important, you should use at least two databases: one at development time and a different one at integration time.

    A question to ask yourself right away is whether you want a relational database or a NoSQL one. The former ensures ACID (Atomicity, Consistency, Isolation, Durability) transactions; the latter offers greater scalability.
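
As a rough illustration of the development-time setup described above, here is a minimal sketch using SQLite from Python's standard library. Everything named here is an assumption made for the example: the `trades` table and its columns, the `open_db`, `writer` and `minute_bars` functions, and the `queue.Queue` that stands in for whatever callback your websocket client provides. It shows one collector thread committing each trade in its own transaction while other parts of the program open their own connections to read, with write-ahead logging (WAL) enabled so that readers do not block the writer. It is not a substitute for PostgreSQL or MariaDB under real load.

```python
import queue
import sqlite3
import threading

# Hypothetical schema for raw trades; adjust to what your feed delivers.
SCHEMA = """
CREATE TABLE IF NOT EXISTS trades (
    id     INTEGER PRIMARY KEY,
    ts     REAL NOT NULL,   -- trade time, seconds since epoch
    symbol TEXT NOT NULL,
    price  REAL NOT NULL,
    qty    REAL NOT NULL
)
"""


def open_db(path):
    """Open a connection with WAL enabled and make sure the table exists."""
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(SCHEMA)
    return conn


def writer(path, feed):
    """Collector: take trades off the queue and commit them one by one."""
    conn = open_db(path)  # each thread uses its own connection
    while True:
        trade = feed.get()
        if trade is None:  # sentinel value: stop collecting
            break
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO trades (ts, symbol, price, qty) VALUES (?, ?, ?, ?)",
                trade,
            )
    conn.close()


def minute_bars(path, symbol):
    """Aggregate raw trades into (minute_start, high, low, volume) rows."""
    conn = open_db(path)
    rows = conn.execute(
        "SELECT CAST(ts / 60 AS INTEGER) * 60 AS bucket, "
        "       MAX(price), MIN(price), SUM(qty) "
        "FROM trades WHERE symbol = ? "
        "GROUP BY bucket ORDER BY bucket",
        (symbol,),
    ).fetchall()
    conn.close()
    return rows


def start_collector(path):
    """Start the writer as a daemon thread; returns (thread, queue)."""
    feed = queue.Queue()
    thread = threading.Thread(target=writer, args=(path, feed), daemon=True)
    thread.start()
    return thread, feed
```

In use, the websocket handler would only call `feed.put((ts, symbol, price, qty))`, and the trading logic would call `minute_bars(...)` (or any other query) whenever it needs data. Keeping raw trades and aggregating on read means the low-frequency strategy and a later higher-frequency one can share the same table.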