csv, machine-learning, machine-learning-model

Alternative methods to supply data for machine learning (other than using CSV files)


I have a question about applying machine learning in the real world. It might sound stupid, lol.

I've been self-studying machine learning for a while, and most of the exercises used CSV files as the data source (both processed and raw). Are there any methods other than importing a CSV file to supply data for machine learning?

Example: streaming data from a live Facebook/Twitter feed for machine learning in real time, rather than collecting old data and storing it in a CSV file.


Solution

  • The data source can be anything. For exercises, it's usually provided as a CSV or JSON file. In the real world, though, a website such as Twitter (your example) would store its data in a relational database such as a SQL database, and keep some of it in an in-memory cache.

    You can use either of these to retrieve and process your data. The catch is that when there is too much data to fit in memory, you can't simply query everything at once and process it; in that case, you use algorithms that process the data in chunks.

    A good thing about SQL databases is that they provide a set of functions you can invoke right in your SQL query to compute results efficiently. For example, you can get the sum of a column across the whole table with the SUM() function, which makes data manipulation efficient and easy.
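
To make the two ideas above concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under stated assumptions: the in-memory database, the `samples` table, and its `feature` and `label` columns are hypothetical stand-ins for whatever database a real application would use. It reads rows in fixed-size chunks instead of loading the whole table, then compares the result with letting the database compute SUM() itself.

```python
import sqlite3


def build_demo_db(n_rows=1000):
    """Create an in-memory SQLite database with a hypothetical `samples` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (id INTEGER PRIMARY KEY, feature REAL, label INTEGER)"
    )
    # Made-up demo data: feature = id * 0.5, label alternates between 0 and 1.
    rows = [(i, i * 0.5, i % 2) for i in range(n_rows)]
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def iter_chunks(conn, chunk_size=100):
    """Yield lists of (feature, label) rows, at most `chunk_size` rows at a time."""
    cur = conn.execute("SELECT feature, label FROM samples ORDER BY id")
    while True:
        chunk = cur.fetchmany(chunk_size)
        if not chunk:
            break
        yield chunk


def chunked_feature_sum(conn, chunk_size=100):
    """Sum the `feature` column in Python, one chunk at a time."""
    total = 0.0
    for chunk in iter_chunks(conn, chunk_size):
        total += sum(feature for feature, _ in chunk)
    return total


def sql_feature_sum(conn):
    """Let the database compute the same sum with the SQL SUM() aggregate."""
    (total,) = conn.execute("SELECT SUM(feature) FROM samples").fetchone()
    return total


conn = build_demo_db()
print(chunked_feature_sum(conn))  # computed in Python, chunk by chunk
print(sql_feature_sum(conn))      # computed inside the database
```

In a real pipeline, each chunk from `iter_chunks` could be handed to a model that supports incremental training, rather than only being summed. The same pattern applies to other databases through their own Python drivers, though connection details will differ.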