
How can I retrieve all historical Ethereum blockchain data for data processing?


I want to get the "entire" Ethereum blockchain data, not just the data for a few smart contracts. By data I mean transaction details, including the logs they generate.

I can get real-time data using Infura, but fetching all the historical data that way is practically impossible: it would take so many network requests that the cost would be prohibitive.
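A rough sketch of why the request count explodes: the standard JSON-RPC method `eth_getLogs` takes a block range, and providers cap how large that range or its result may be, so a full backfill has to be split into many requests. The 2,000-block chunk size, the helper names `block_ranges`, `get_logs_payload` and `post`, and filtering on the `Transfer` event topic are assumptions for illustration, not documented Infura limits.

```python
import json
import urllib.request

# topic0 of Transfer(address,address,uint256), i.e. the Keccak-256 hash of
# that signature. ERC-20 and ERC-721 share this event signature.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
# Assumed chunk size; real providers impose their own range/result limits.
CHUNK_BLOCKS = 2_000


def block_ranges(start, end, chunk=CHUNK_BLOCKS):
    """Yield inclusive (from_block, to_block) pairs covering start..end."""
    for lo in range(start, end + 1, chunk):
        yield lo, min(lo + chunk - 1, end)


def get_logs_payload(from_block, to_block, request_id):
    """Build one eth_getLogs JSON-RPC request filtered on the Transfer topic."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getLogs",
        "params": [{
            "fromBlock": hex(from_block),  # JSON-RPC quantities are hex strings
            "toBlock": hex(to_block),
            "topics": [TRANSFER_TOPIC],
        }],
    }


def post(url, payload):
    """Send one JSON-RPC request; every call is a separate network round trip."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `block_ranges(0, 15_000_000, 2_000)` yields 7,501 ranges, each of which would be one `post(url, get_logs_payload(lo, hi, i))` round trip, and a busy range may still exceed a provider's result cap and need splitting further.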

I need the historical data because I am trying to build an indexed database from the "append-only" Ethereum transaction data so that I can query it easily.
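One way to preserve the append-only property in a database is to key every log by its position in the chain. The minimal sketch below uses Python's built-in `sqlite3` only as a stand-in for whatever database you choose; the table layout and the names `open_db` and `append_logs` are assumptions. Because `(block_number, log_index)` uniquely identifies a log, `INSERT OR IGNORE` makes re-loading an overlapping block range harmless. It does not handle chain reorganizations, which only affect the most recent blocks.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS logs (
    block_number INTEGER NOT NULL,
    log_index    INTEGER NOT NULL,  -- position of the log within its block
    tx_hash      TEXT    NOT NULL,
    address      TEXT    NOT NULL,  -- contract that emitted the log
    topic0       TEXT,
    topic1       TEXT,
    topic2       TEXT,
    topic3       TEXT,
    data         TEXT,
    PRIMARY KEY (block_number, log_index)
);
-- Speeds up "all events of type X from contract Y" lookups.
CREATE INDEX IF NOT EXISTS logs_by_contract_topic ON logs (address, topic0);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def append_logs(conn, rows):
    """Insert 9-column log rows; rows already present are silently skipped."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO logs VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            rows,
        )
```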

To be more precise, I would like to retrieve all NFT (ERC-721 and ERC-1155) transfer transactions and their logs, so that I can run queries such as these and many more: all NFTs owned by a particular wallet, or the transfer history of a particular NFT token.
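As a sketch of how those two queries fall out of the raw logs: an ERC-721 `Transfer` log carries `from`, `to` and `tokenId` as its three indexed topics, so replaying the logs in chain order yields both current ownership and per-token history. This assumes each log is a dict with `address`, `topics`, and integer `blockNumber`/`logIndex` (raw JSON-RPC returns the latter two as hex strings, so convert them with `int(x, 16)` first); the helper names are hypothetical. ERC-1155 emits different events (`TransferSingle` and `TransferBatch`, with ids and amounts in `data`) and would need its own decoder. Some early NFT contracts that predate the final ERC-721 standard also do not index their `Transfer` arguments, so the 4-topic check would skip them.

```python
from collections import defaultdict

# Keccak-256 of "Transfer(address,address,uint256)".
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
ZERO_ADDRESS = "0x" + "0" * 40  # "from" on mints, "to" on burns


def topic_to_address(topic):
    """Indexed addresses are left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + topic[-40:].lower()


def decode_erc721_transfer(log):
    """Return (contract, from, to, token_id), or None if not an ERC-721 Transfer.

    ERC-20 Transfer shares topic0 but has only 3 topics (the amount is in
    `data`), so requiring 4 topics separates the two standards."""
    topics = log["topics"]
    if len(topics) != 4 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    return (
        log["address"].lower(),
        topic_to_address(topics[1]),
        topic_to_address(topics[2]),
        int(topics[3], 16),
    )


def build_index(logs):
    """Replay logs in chain order to get current owners and per-token history."""
    owner = {}                   # (contract, token_id) -> current owner
    history = defaultdict(list)  # (contract, token_id) -> [(block, from, to)]
    # Assumes blockNumber/logIndex are already ints, not hex strings.
    ordered = sorted(logs, key=lambda entry: (entry["blockNumber"], entry["logIndex"]))
    for log in ordered:
        decoded = decode_erc721_transfer(log)
        if decoded is None:
            continue
        contract, sender, receiver, token_id = decoded
        key = (contract, token_id)
        history[key].append((log["blockNumber"], sender, receiver))
        if receiver == ZERO_ADDRESS:
            owner.pop(key, None)  # burned: nobody owns it any more
        else:
            owner[key] = receiver
    return owner, history


def tokens_owned_by(owner, wallet):
    """All (contract, token_id) pairs currently held by `wallet`."""
    wallet = wallet.lower()
    return sorted(key for key, addr in owner.items() if addr == wallet)
```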


Solution

  • I have found two solutions.

    1. As @Mikko mentioned, you can run your own node, and it seems less complex than I expected. Search for "geth" (Go Ethereum), then connect your web3 library to your node, just as you would connect to Infura.

    I have not tried this, however, because I found a much better solution.

    2. Google Cloud BigQuery's public dataset contains all the historical Ethereum data. BigQuery is Google's data warehouse service, where you can query data using ordinary SQL. New data is added every day. I have already tested some simple queries from its console, and the results were good.

    I am planning to fetch all the historical data I need from BigQuery, store it in my own database, and afterwards get real-time data from Infura. Since I no longer have to fetch the historical data from Infura, the cost becomes very affordable.
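A minimal sketch of that backfill step, assuming the public dataset `bigquery-public-data.crypto_ethereum` exposes a `logs` table with `block_number`, `log_index`, `transaction_hash`, `address` and an array column `topics` (check the schema in the BigQuery console before relying on these names). It uses the `google-cloud-bigquery` client with named query parameters and needs that package plus Google Cloud credentials; `BACKFILL_SQL` and `fetch_backfill` are hypothetical names. `cutoff_block` should be a block BigQuery has already loaded, since the dataset is only updated daily.

```python
# topic0 of Transfer(address,address,uint256); 4 topics means ERC-721.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

# Assumed table and column names; verify against the dataset schema.
BACKFILL_SQL = """
SELECT block_number, log_index, transaction_hash, address, topics
FROM `bigquery-public-data.crypto_ethereum.logs`
WHERE topics[SAFE_OFFSET(0)] = @topic0
  AND ARRAY_LENGTH(topics) = 4
  AND block_number <= @cutoff
"""


def fetch_backfill(cutoff_block):
    """Yield historical ERC-721 Transfer logs up to and including cutoff_block."""
    # Imported lazily so this module loads even without the optional package.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses the default Google Cloud credentials
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("topic0", "STRING", TRANSFER_TOPIC),
            bigquery.ScalarQueryParameter("cutoff", "INT64", cutoff_block),
        ]
    )
    for row in client.query(BACKFILL_SQL, job_config=job_config).result():
        yield dict(row.items())
```

The real-time side would then start from `cutoff_block + 1`, so the two sources neither overlap nor leave a gap. Selecting only the needed columns also reduces the bytes scanned, which is what BigQuery's on-demand pricing bills for.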