
How to recreate the hash digest of a multihash in IPFS


Assuming I'm adding data to IPFS like this:

$ echo Hello World | ipfs add

This will give me QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u - a CID which is a Base58 encoded Multihash.

Converting it to Base16 tells me that the hash digest of what IPFS added is a SHA2-256 hash:

12 - 20 - 74410577111096cd817a3faed78630f2245636beded412d3b212a2e09ba593ca
<hash-type> - <hash-length> - <hash-digest>
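
The Base58-to-Base16 step above can be reproduced without IPFS tooling. The following is a minimal sketch, not the IPFS implementation: the helper names b58decode and split_multihash are made up for illustration, and it assumes the hash code and digest length each fit in a single byte (true for SHA2-256).

```python
# Base58 alphabet (Bitcoin variant), which CIDv0 strings use
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a Base58 string into bytes (hypothetical helper)."""
    number = 0
    for char in text:
        number = number * 58 + B58_ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading '1' in the text stands for one leading zero byte
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + body


def split_multihash(raw: bytes):
    """Split a multihash into (hash code, digest length, digest).

    Assumption: both prefix values are single bytes, as for SHA2-256.
    """
    return raw[0], raw[1], raw[2:]


cid = "QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u"
code, length, digest = split_multihash(b58decode(cid))

# Prints <hash-type> - <hash-length> - <hash-digest>, all in hex
print(f"{code:x} - {length:x} - {digest.hex()}")
```

Hash code 0x12 identifies SHA2-256 and 0x20 is the digest length (32 bytes), so the remaining bytes are the digest itself.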

I know that IPFS doesn't just hash the data, but actually serializes it as a UnixFS protobuf first and then puts that in a DAG node.

I'd like to demystify how to arrive at 74410577111096cd817a3faed78630f2245636beded412d3b212a2e09ba593ca, but I'm not sure how to get hold of the DAG node that holds the UnixFS protobuf with the data.

For example I can write the serialized raw data to disk and inspect it with a protobuf decoder:

$ ipfs block get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u > /tmp/block.raw
$ protoc --decode_raw < /tmp/block.raw

This will give me the serialized data in a readable format:

1 {
  1: 2
  2: "Hello World\n"
  3: 12
}

However, piping that through SHA-256 still gives me a different hash, which makes sense because IPFS puts the protobuf in a dag and multihashes that one.

$ protoc --decode_raw < /tmp/block.raw | shasum -a 256

So I decided to figure out how to get hold of that DAG node and hash it myself to get the hash I'm looking for.

I was hoping using ipfs dag get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u will give me a multihash that can then be decoded, but it turns out it returns some other data hash that I don't know how to inspect:

$ ipfs dag get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u
{"data":"CAISDEhlbGxvIFdvcmxkChgM","links":[]}

Any ideas on how to decode data from here?

UPDATE

data is a Base64 representation of the node's Data field, i.e. the serialized UnixFS protobuf that wraps the original data: https://github.com/ipfs/go-ipfs/issues/4115
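
To see what the data value contains, it can be Base64-decoded and walked as protobuf wire format. This is a minimal sketch rather than a full protobuf parser: read_varint and decode_fields are illustrative names, and only wire types 0 (varint) and 2 (length-delimited) are handled, which is all this message uses.

```python
import base64


def read_varint(buf: bytes, pos: int):
    """Read a protobuf varint at pos; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last varint byte
            return value, pos
        shift += 7


def decode_fields(buf: bytes):
    """Decode a flat protobuf message into (field number, value) pairs.

    Assumption: only wire types 0 and 2 appear in the message.
    """
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            size, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + size], pos + size
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, value))
    return fields


# The "data" value returned by ipfs dag get in the question
unixfs = base64.b64decode("CAISDEhlbGxvIFdvcmxkChgM")
fields = decode_fields(unixfs)
print(fields)  # [(1, 2), (2, b'Hello World\n'), (3, 12)]
```

The three fields are the same ones protoc --decode_raw shows inside the 1 { ... } wrapper.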


Solution

  • The hash you're looking for is the hash of the output of ipfs block get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u. IPFS hashes the encoded value.

    Instead of running:

    protoc --decode_raw < /tmp/block.raw | shasum -a 256
    

    Just run:

    shasum -a 256 < /tmp/block.raw
    

    "but it turns out it returns some other data hash that I don't know how to inspect"

    That's because we currently use a protobuf inside of a protobuf. The outer protobuf has the structure {Data: DATA, Links: [{Name: ..., Size: ..., Hash: ...}]}.

    In:

    1 {
      1: 2
      2: "Hello World\n"
      3: 12
    }
    

    The 1 { ... } part is the Data field of the outer protobuf. However, protoc --decode_raw recursively decodes this object, so it decodes the Data field to:

    • Field 1 (Type): 2 (File)
    • Field 2 (Data): "Hello World\n"
    • Field 3 (filesize): 12 (bytes)

    For context, the relevant protobuf definitions are:

    Outer:

    // An IPFS MerkleDAG Link
    message PBLink {
    
      // multihash of the target object
      optional bytes Hash = 1;
    
      // utf string name. should be unique per object
      optional string Name = 2;
    
      // cumulative size of target object
      optional uint64 Tsize = 3;
    }
    
    // An IPFS MerkleDAG Node
    message PBNode {
    
      // refs to other objects
      repeated PBLink Links = 2;
    
      // opaque user data
      optional bytes Data = 1;
    }
    

    Inner:

    message Data {
        enum DataType {
            Raw = 0;
            Directory = 1;
            File = 2;
            Metadata = 3;
            Symlink = 4;
            HAMTShard = 5;
        }
    
        required DataType Type = 1;
        optional bytes Data = 2;
        optional uint64 filesize = 3;
        repeated uint64 blocksizes = 4;
    
        optional uint64 hashType = 5;
        optional uint64 fanout = 6;
    }
    
    message Metadata {
        optional string MimeType = 1;
    }
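
Putting the two definitions together, the block for this small file can be rebuilt by hand and hashed. The sketch below is a hand-rolled encoder, not the code go-ipfs uses: the helper names are invented for illustration, and it assumes a single-chunk file with no links, encoded in the field order that protoc --decode_raw displayed.

```python
import hashlib


def varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def varint_field(number: int, value: int) -> bytes:
    """Encode a wire type 0 (varint) field."""
    return varint(number << 3 | 0) + varint(value)


def bytes_field(number: int, payload: bytes) -> bytes:
    """Encode a wire type 2 (length-delimited) field."""
    return varint(number << 3 | 2) + varint(len(payload)) + payload


content = b"Hello World\n"

# Inner UnixFS message: Type = File (2), Data, filesize
inner = (
    varint_field(1, 2)
    + bytes_field(2, content)
    + varint_field(3, len(content))
)

# Outer PBNode: only the Data field (1), since there are no Links
block = bytes_field(1, inner)

digest = hashlib.sha256(block).digest()
# Multihash prefix: 0x12 = SHA2-256, 0x20 = 32-byte digest
multihash = bytes([0x12, 0x20]) + digest

print(block.hex())
print(digest.hex())
```

If the assumptions hold, block matches the bytes written by ipfs block get, and the printed digest should be the one extracted from the CID in the question.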