
Nodes in blockchain


I'm having trouble understanding nodes in a blockchain. Here are my questions:

  1. Not sure but I heard that a node is software. So, is all the chain data stored on the client side? The client might not be able to store so much data, or am I wrong?
  2. How do they synchronize? What if a node is offline when a transaction happens?

I have already searched Stack Overflow for similar questions, but none of them had good accepted answers.

Thanks


Solution

  • In this context, nodes are members of a peer-to-peer (P2P) network. All of them are hierarchically equal - unlike the client-server model for example, where the client only initiates requests and the server only responds to them.

    For example with Ethereum, all computers (nodes) that are connected to the network are able to

    • broadcast transactions to other nodes
    • validate blocks incoming from other nodes
    • propose new blocks (in accordance with the rules of the Ethereum network, e.g. need to have staked at least 32 ETH)
    • and receive/send other types of messages.

    What you may have heard in connection with the word "client" is "node client software". This is, for example, Go Ethereum, EthereumJS, or other software that connects your computer (node) to the Ethereum network, listens for incoming messages (e.g. new blocks), and broadcasts messages (e.g. new transactions that you want included in a future block).

    Running your own node is not easy. If you're planning to develop a simple app that leverages blockchain, you can just connect to a third-party node provider (most of them have limited free plans and less limited paid plans) and communicate with it over its JSON-RPC API, either directly or through RPC wrapper libraries such as web3.js and ethers.js.
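To illustrate what talking to a provider's node over RPC might look like without a wrapper library, here is a minimal sketch using only the Python standard library. `RPC_URL` is a placeholder and the helper names (`build_request`, `parse_quantity`, `call`) are made up for this example; `eth_blockNumber` is a standard Ethereum JSON-RPC method.

```python
import json
import urllib.request

# Placeholder endpoint (assumption): substitute the URL your provider gives you.
RPC_URL = "https://example-node-provider.invalid/v1/YOUR_API_KEY"


def build_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body as bytes."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    }
    return json.dumps(payload).encode("utf-8")


def parse_quantity(hex_value):
    """Ethereum JSON-RPC encodes integers as 0x-prefixed hex strings."""
    return int(hex_value, 16)


def call(method, params, url=RPC_URL):
    """Send one request to the node and return its "result" field."""
    request = urllib.request.Request(
        url,
        data=build_request(method, params),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]
```

With a real endpoint in `RPC_URL`, `parse_quantity(call("eth_blockNumber", []))` should give the number of the latest block known to that node. Libraries such as web3.js and ethers.js wrap this same request/response pattern.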


    Not sure but I heard that a node is software.

    A node is a member of a network, but you need software (a node client) in order to communicate with other members of the network.

    So, is all the chain data stored on the client side?

    Each node holds the same data.

    To be more specific - all nodes hold the latest state (e.g. current balance of all addresses), and some opt in to also hold the archival state (e.g. balances of all addresses at all blocks prior to the current one).
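The difference between holding only the latest state and also holding the archival state can be pictured with a toy model. This is purely illustrative: `ToyNode` and its methods are hypothetical names, and real clients store state in far more compact structures than per-block dictionary copies.

```python
class ToyNode:
    """Tracks balances; keeps per-block snapshots only in archive mode."""

    def __init__(self, archive=False):
        self.archive = archive
        self.block_number = 0
        self.latest = {}   # address -> balance at the current block
        self.history = {}  # block number -> snapshot (archive mode only)

    def apply_block(self, balance_changes):
        """Apply one block's balance changes on top of the latest state."""
        self.block_number += 1
        for address, delta in balance_changes.items():
            self.latest[address] = self.latest.get(address, 0) + delta
        if self.archive:
            # An archive node additionally remembers the state at every block.
            self.history[self.block_number] = dict(self.latest)

    def balance_at(self, address, block_number=None):
        """Return a balance at the current block, or at a past block."""
        if block_number is None or block_number == self.block_number:
            return self.latest.get(address, 0)
        if not self.archive:
            raise LookupError("historical state is not stored on this node")
        return self.history[block_number].get(address, 0)


full, archive = ToyNode(), ToyNode(archive=True)
for node in (full, archive):
    node.apply_block({"alice": 10})
    node.apply_block({"alice": -3, "bob": 3})
```

Both nodes agree on the current balances, but only `archive.balance_at("alice", 1)` can answer a question about an earlier block; the same call on `full` raises `LookupError`.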

    The client might not be able to store so much data, or am I wrong?

    The current state takes about 1.2 TB (chart). The archival state is currently about 15 TB (chart). That might not be manageable on a regular laptop, but businesses with large enough infrastructure are still able to store this amount of data fairly easily.

    Having said that, there are some initiatives for sharding the current state between multiple nodes (the node could hold less than the full 1.2 TB and could ask others for the remaining data when needed), but they are still mostly in the research and proof-of-concept phase.

    How do they synchronize?

    Each node has a list of "bootstrap nodes" that it initially connects to and asks for "their neighbors" - other nodes that this bootstrap node knows of. Then it asks these neighbors for the lists of their neighbors, and so on, until it reaches the maximum number of known neighbors (configurable in your node client software).
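The discovery loop described above can be sketched as a breadth-first walk. This is a simplified in-memory model, not the real discovery protocol: `TOY_NETWORK`, `discover_peers`, and the `ask_neighbors` callback (standing in for the network request "which nodes do you know of?") are all assumptions made for the example.

```python
from collections import deque

# Hypothetical network: each node name maps to the nodes it knows of.
TOY_NETWORK = {
    "boot": ["a", "b"],
    "a": ["boot", "c"],
    "b": ["c", "d"],
    "c": ["e"],
    "d": [],
    "e": [],
}


def discover_peers(bootstrap_nodes, ask_neighbors, max_peers):
    """Collect known nodes breadth-first, starting from the bootstrap nodes."""
    known = []
    seen = set()
    queue = deque()
    for node in bootstrap_nodes:
        if node not in seen:
            seen.add(node)
            queue.append(node)
    # Stop when nobody is left to ask or the configured limit is reached.
    while queue and len(known) < max_peers:
        node = queue.popleft()
        known.append(node)
        for neighbor in ask_neighbors(node):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return known


def ask_toy_network(node):
    """Look up a node's neighbors in the toy network."""
    return TOY_NETWORK.get(node, [])


peers = discover_peers(["boot"], ask_toy_network, max_peers=4)
```

Here `peers` ends up as `["boot", "a", "b", "c"]`: the walk stops as soon as the limit of four known nodes is reached, even though `d` and `e` are still reachable.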

    Once your node receives a message (e.g. that there is a new block), it's supposed to relay this message to its neighbors.

    This communication standard is called DevP2P.
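The relay behaviour can be illustrated with a small in-memory sketch. It does not implement the DevP2P wire protocol; `GossipNode` and its methods are hypothetical names, and the only idea shown is that each node forwards a message once and ignores copies it has already seen.

```python
class GossipNode:
    """A node that relays each new message to all of its neighbors once."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.seen = set()  # ids of messages this node has already handled

    def connect(self, other):
        """Create a two-way link between this node and another one."""
        self.neighbors.append(other)
        other.neighbors.append(self)

    def receive(self, message_id, payload):
        """Handle a message once, then relay it to every neighbor."""
        if message_id in self.seen:
            return  # already relayed; this stops messages looping forever
        self.seen.add(message_id)
        for neighbor in self.neighbors:
            neighbor.receive(message_id, payload)


a, b, c, d = (GossipNode(name) for name in "abcd")
a.connect(b)
b.connect(c)
c.connect(a)  # a, b and c form a cycle
c.connect(d)
a.receive("block-1", {"number": 1})
```

After the single call on `a`, every node has `"block-1"` in its `seen` set, including `d`, which is not directly connected to `a`.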

    What if a node is offline when a transaction happens?

    When the node comes back online, it asks its neighbors for the latest state, and updates its own database.
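One simplified way to picture this catch-up step is a node appending the blocks it missed. The sketch below assumes the local chain is a prefix of the neighbor's chain and uses made-up helpers (`make_chain`, `catch_up`) and fake hashes; real clients use more elaborate sync strategies and validate everything they receive.

```python
def make_chain(length):
    """Build a toy chain whose blocks link to their parent by a fake hash."""
    return [
        {"number": i, "hash": f"h{i}", "parent": f"h{i - 1}"}
        for i in range(1, length + 1)
    ]


def catch_up(local_chain, peer_chain):
    """Append the blocks the local node missed; return how many were added."""
    missing = peer_chain[len(local_chain):]
    for block in missing:
        # Accept a block only if it extends the current local tip.
        if local_chain and block["parent"] != local_chain[-1]["hash"]:
            raise ValueError(f"block {block['number']} does not extend local tip")
        local_chain.append(block)
    return len(missing)


peer_chain = make_chain(5)
local_chain = list(peer_chain[:2])  # the node went offline after block 2
added = catch_up(local_chain, peer_chain)
```

Here `added` is 3 and `local_chain` ends up identical to `peer_chain`. A node that missed nothing gets 0 back and its chain is left unchanged.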