Search code examples
ethereumgo-ethereum

geth does not persist trie node data from memory to disk on ungraceful system restart


Issue: geth 1.8.22 starts mining from one of the first blocks instead of the last one on system reboot.

What we have

We have 3 synced private geth nodes using PoA(clique).

What happened

One day(a week ago) we had issues with our hosting provider so we had to restart 2 out of 3 nodes(each node is on separate VPS). Current block is 4 000 000. When node 1 and node 2 were restarted they started mining from block 372 instead of the last one 4 000 000.

Why it happened (my guess)

Geth 1.8.22 keeps some data with trie node data in RAM instead of a disk. On graceful node shutdown(for example from console) this trie node data is saved to hard drive from RAM. On forced system shutdown(for example from hosting admin panel) trie node data does not have time to be saved on a hard drive. We had our nodes running for 6 months without any reboot so I think that this trie node data was kept in RAM for the whole time and it was vanished on system reboot(though we still have node 3 which is up and running).

Logs

Here are the logs when I'm trying to run the backup version of one of the nodes:

vladimir@comp:~/Public/projects/ethereum/repro-geth-bug/geth-linux-amd64-1.8.22-7fa3509e$ ./geth --datadir ../opt/ethereum/data/ --networkid 1515 --unlock 0xd6ee38421e1713dd50e888c6d689b82953946bc3 --password ../opt/ethereum/unlock_password --port 30306 --mine
INFO [11-21|17:06:25.374] Maximum peer count                       ETH=25 LES=0 total=25
INFO [11-21|17:06:25.374] Starting peer-to-peer node               instance=Geth/v1.8.22-stable-7fa3509e/linux-amd64/go1.11.5
INFO [11-21|17:06:25.374] Allocated cache and file handles         database=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth/chaindata cache=512 handles=2048
INFO [11-21|17:06:26.550] Initialised chain configuration          config="{ChainID: 1515 Homestead: 1 DAO: <nil> DAOSupport: false EIP150: 2 EIP155: 3 EIP158: 3 Byzantium: 4 Constantinople: 5  ConstantinopleFix: <nil> Engine: clique}"
INFO [11-21|17:06:26.550] Initialising Ethereum protocol           versions="[63 62]" network=1515
WARN [11-21|17:06:26.579] Head state missing, repairing chain      number=4073749 hash=9bfb53…56d503
INFO [11-21|17:07:45.179] Rewound blockchain to past state         number=371     hash=102018…d91947
INFO [11-21|17:07:45.180] Loaded most recent local header          number=4073749 hash=9bfb53…56d503 td=8147499 age=2d5h43m
INFO [11-21|17:07:45.180] Loaded most recent local full block      number=371     hash=102018…d91947 td=743     age=7mo3w6d
INFO [11-21|17:07:45.180] Loaded most recent local fast block      number=4073749 hash=9bfb53…56d503 td=8147499 age=2d5h43m
INFO [11-21|17:07:45.180] Loaded local transaction journal         transactions=3 dropped=3
INFO [11-21|17:07:45.180] Regenerated local transaction journal    transactions=0 accounts=0
WARN [11-21|17:07:45.180] Blockchain not empty, fast sync disabled 
INFO [11-21|17:07:45.623] New local node record                    seq=6 id=e8c5a9e8848d4e30 ip=127.0.0.1 udp=30306 tcp=30306
INFO [11-21|17:07:45.623] Started P2P networking                   self=enode://9647000ba2579dd529574b49f472f029839a09257c1bc3ade5135cbbb5f3ceaf1237aff5b6b947d2fa4f218fa24858dc2767bd4b78e082b04c9d013c1482cfa6@127.0.0.1:30306
INFO [11-21|17:07:45.624] IPC endpoint opened                      url=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth.ipc
INFO [11-21|17:07:46.192] Unlocked account                         address=0xd6ee38421e1713dD50E888c6D689B82953946bC3
INFO [11-21|17:07:46.192] Transaction pool price threshold updated price=1000000000
INFO [11-21|17:07:46.192] Transaction pool price threshold updated price=1000000000
INFO [11-21|17:07:46.192] Etherbase automatically configured       address=0xd6ee38421e1713dD50E888c6D689B82953946bC3
INFO [11-21|17:07:46.192] Commit new mining work                   number=372     sealhash=685e15…2c52df uncles=0 txs=0 gas=0 fees=0 elapsed=75.951µs
INFO [11-21|17:07:46.192] Successfully sealed new block            number=372     sealhash=685e15…2c52df hash=0c60ef…f29e6b elapsed=385.27µs
INFO [11-21|17:07:46.192] 🔨 mined potential block                  number=372     hash=0c60ef…f29e6b
INFO [11-21|17:07:46.193] Commit new mining work                   number=373     sealhash=337ae5…2b4704 uncles=0 txs=0 gas=0 fees=0 elapsed=222.362µs
INFO [11-21|17:07:47.962] Mapped network port                      proto=tcp extport=30306 intport=30306 interface="UPNP IGDv1-IP1"
INFO [11-21|17:07:48.391] Mapped network port                      proto=udp extport=30306 intport=30306 interface="UPNP IGDv1-IP1"
INFO [11-21|17:07:49.625] New local node record                    seq=7 id=e8c5a9e8848d4e30 ip=128.71.103.50 udp=30306 tcp=30306
INFO [11-21|17:07:51.001] Successfully sealed new block            number=373     sealhash=337ae5…2b4704 hash=b67668…81f164 elapsed=4.807s
INFO [11-21|17:07:51.001] 🔨 mined potential block                  number=373     hash=b67668…81f164
INFO [11-21|17:07:51.002] Commit new mining work                   number=374     sealhash=c0e9f6…628d51 uncles=0 txs=0 gas=0 fees=0 elapsed=1.434ms
INFO [11-21|17:07:56.001] Successfully sealed new block            number=374     sealhash=c0e9f6…628d51 hash=77aae2…9c44e8 elapsed=4.998s
INFO [11-21|17:07:56.001] 🔨 mined potential block                  number=374     hash=77aae2…9c44e8
INFO [11-21|17:07:56.003] Commit new mining work                   number=375     sealhash=6f7db7…adca12 uncles=0 txs=0 gas=0 fees=0 elapsed=1.305ms
^CINFO [11-21|17:07:58.483] Got interrupt, shutting down... 
INFO [11-21|17:07:58.483] IPC endpoint closed                      url=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth.ipc
INFO [11-21|17:07:58.483] Writing cached state to disk             block=374 hash=77aae2…9c44e8 root=e16e04…e93be1
INFO [11-21|17:07:58.483] Persisted trie from memory database      nodes=0 size=0.00B time=7.185µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [11-21|17:07:58.483] Writing cached state to disk             block=373 hash=b67668…81f164 root=e16e04…e93be1
INFO [11-21|17:07:58.483] Persisted trie from memory database      nodes=0 size=0.00B time=2.571µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [11-21|17:07:58.484] Writing cached state to disk             block=247 hash=7b422a…5f9a62 root=e16e04…e93be1
INFO [11-21|17:07:58.484] Persisted trie from memory database      nodes=0 size=0.00B time=2.784µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [11-21|17:07:58.484] Blockchain manager stopped 
INFO [11-21|17:07:58.484] Stopping Ethereum protocol 
INFO [11-21|17:07:58.484] Ethereum protocol stopped 
INFO [11-21|17:07:58.484] Transaction pool stopped 
INFO [11-21|17:07:58.497] Database closed                          database=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth/chaindata

How to fix

The 1st thing that comes to mind is to restart geth nodes(gracefully) via cron everyday so that nodes persist trie node data on the disk.

How to handle UNgraceful system shutdown so that geth node persists data and keeps mining from the latest block on restart?


Solution

  • Please check the full answer: https://github.com/ethereum/go-ethereum/issues/20383#issuecomment-558107815

    In short:

    • geth persists data after 1 hour worth of block processing
    • if your network is super light (i.e. mostly empty blocks), it takes a very very long time until blocks are flushed from memory to hard drive
    • currently there is no way to configure the period of persistency rounds in geth

    Solution: restart geth periodically so it saves data from RAM to hard drive