Message digest in a base64 encoded signed attributes DER structure

I have the following ASN.1 dump:

SET (4 elem)
  SEQUENCE (2 elem)
    OBJECT IDENTIFIER 1.2.840.113549.1.9.3 contentType (PKCS #9)
    SET (1 elem)
      OBJECT IDENTIFIER 1.2.840.113549.1.7.1 data (PKCS #7)
  SEQUENCE (2 elem)
    OBJECT IDENTIFIER 1.2.840.113549.1.9.5 signingTime (PKCS #9)
    SET (1 elem)
      UTCTime 2021-05-26 19:03:42 UTC
  SEQUENCE (2 elem)
    OBJECT IDENTIFIER 1.2.840.113549.1.9.52 cmsAlgorithmProtection (RFC 6211)
    SET (1 elem)
      SEQUENCE (2 elem)
        SEQUENCE (2 elem)
          OBJECT IDENTIFIER 2.16.840.1.101.3.4.2.1 sha-256 (NIST Algorithm)
          NULL
        [1] (2 elem)
          OBJECT IDENTIFIER 1.2.840.113549.1.1.11 sha256WithRSAEncryption (PKCS #1)
          NULL
  SEQUENCE (2 elem)
    OBJECT IDENTIFIER 1.2.840.113549.1.9.4 messageDigest (PKCS #9)
    SET (1 elem)
      OCTET STRING (32 byte) E2BB4AD28C95B99E9EDEF70662AFE825AF477680F4867B59833AA05313D8F4C0

I understand that the OCTET STRING is the messageDigest (the SHA-256 hash) of what I am trying to sign, which in this case is a PDF document signed using PDFBox. The code I'm using to sign is the following:

public byte[] signPKCS7(InputStream content) throws IOException,SignedBytesException {
        try {
            if (SigUtils.checkCertificateUsage((X509Certificate) certificateChain[0])) {
                CMSSignedDataGenerator signGenerator = new CMSSignedDataGenerator();
                X509Certificate userCert = (X509Certificate) this.certificateChain[0];
                ContentSigner mySigner = new CustomSigner(invoke,String.valueOf(userCert.getSerialNumber()),sad);
                signGenerator.addSignerInfoGenerator(
                        new JcaSignerInfoGeneratorBuilder(new JcaDigestCalculatorProviderBuilder().build())
                                .build(mySigner, userCert));
                signGenerator.addCertificates(new JcaCertStore(Arrays.asList(certificateChain)));
                CMSProcessableInputStream msg = new CMSProcessableInputStream(content);
                CMSSignedData signedData = signGenerator.generate(msg, false);
                return signedData.getEncoded();
            } else {
                    throw new IOException("Unable to sign pdf. Certificate usage not appropriate for request");
            }
        } catch (GeneralSecurityException | CMSException | OperatorCreationException e) {
            logger.error(e.getMessage());
            throw new RuntimeException("unable to sign pdf!", e);
        }
    }

I have also calculated the SHA-256 of the document I am trying to sign, and the result is the following:

0622971147486E1900037EFF229D921D14F5B51AAC7171729B2B66F81CDF6585

So my question is: is the message digest from the ASN.1 structure supposed to be the same as the one I calculated? If so, how do I arrive at that result? When I walk through the ASN.1 structure with the following code, I do not get the same value:

private byte[] getMessageDigest(byte[] signatures) throws IOException {
        ASN1InputStream input = new ASN1InputStream(signatures);
        byte[] bytesToSign = null;
        ASN1Primitive p;
        while ((p = input.readObject()) != null) {
            if (p instanceof ASN1Set) {
                ASN1Set set = ASN1Set.getInstance(p);
                ASN1Sequence asn1 = ASN1Sequence.getInstance(set.getObjectAt(3));
                ASN1Set setOcter = ASN1Set.getInstance(asn1.getObjectAt(1));
                ASN1OctetString octstr = ASN1OctetString.getInstance(setOcter.getObjectAt(0));
                bytesToSign = octstr.getOctets();
            }
        }
        return bytesToSign;
    }

I then use the following code to convert the bytes to hex:

private  String bytesToHex(byte[] bytes) {
        char[] hexChars = new char[bytes.length * 2];
        for (int j = 0; j < bytes.length; j++) {
            int v = bytes[j] & 0xFF;
            hexChars[j * 2] = HEX_ARRAY[v >>> 4];
            hexChars[j * 2 + 1] = HEX_ARRAY[v & 0x0F];
        }
        return new String(hexChars);
    }

I get the following result

E2BB4AD28C95B99E9EDEF70662AFE825AF477680F4867B59833AA05313D8F4C0

This is the OCTET STRING from the ASN.1 dump, but it's not the hash of the document. That OCTET STRING also changes every time, so I assume it's not a plain message digest of my file. So what exactly is it, and can I get the SHA-256 of the content I'm sending to be signed?

Solution

In Short

The document hash is not calculated from the original PDF you want to sign. That PDF is first prepared for signing by applying certain changes, and the hash is then calculated from this prepared PDF, excluding a placeholder gap reserved to later hold the signature container.

In Detail

To create an integrated PDF signature, certain changes have to be applied to the PDF:

The holder of the to-be-integrated signature is an AcroForm form field in the PDF. If the PDF does not contain an empty, unused signature field (or no existing field shall be used), a new signature field has to be added to the PDF.
A signature form field may have a visualization, a widget annotation, which represents the signature on some page of the document itself. If such a visualization is desired, a matching annotation has to be added to the PDF.
Information describing the mode and other details of signing has to be added to the PDF. Thus, the value of the chosen signature field has to be set to a new dictionary object in the PDF with these signature details; there are two special entries here, the ByteRange and the Contents. Both are initially set to blank values of appropriate size.
A marker is added to the PDF root AcroForm object indicating that the PDF is signed.

With these additions the PDF is stored. Thereafter the position of the Contents value in the file is fixed and the blank value of the ByteRange value is patched to an array of four integers, the start offset and size of the file segment before the Contents value and the start offset and size of the file segment thereafter.
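The ByteRange arithmetic described above can be illustrated with a minimal sketch. The class name ByteRangeSketch, the method byteRangeFor, and the offsets in the usage note are hypothetical, chosen only for illustration; the sketch assumes the excluded gap spans the whole Contents value including its delimiters.

```java
// Hypothetical helper, not part of PDFBox or any other library.
final class ByteRangeSketch {
    private ByteRangeSketch() {}

    /**
     * Builds the four ByteRange integers for a stored PDF.
     *
     * @param fileLength    total length of the prepared file in bytes
     * @param contentsStart offset of the first byte of the Contents value
     * @param contentsEnd   offset of the first byte after the Contents value
     * @return {start1, length1, start2, length2}
     */
    static int[] byteRangeFor(int fileLength, int contentsStart, int contentsEnd) {
        if (contentsStart < 0 || contentsEnd < contentsStart || contentsEnd > fileLength) {
            throw new IllegalArgumentException("Contents offsets do not fit the file");
        }
        return new int[] {
            0, contentsStart,                      // segment before the Contents value
            contentsEnd, fileLength - contentsEnd  // segment after the Contents value
        };
    }
}
```

For example, for a 1000-byte prepared file whose Contents value occupies offsets 400 to 599, byteRangeFor(1000, 400, 600) yields the array 0, 400, 600, 400.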

Then the bytes of these segments of the file are hashed and a CMS signature container signing this document hash is generated which in turn is injected into the Contents value.
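To reproduce the value found in the messageDigest attribute, one would hash only the segments named by the ByteRange of the prepared (signed) file, not the original PDF. The following is a minimal sketch using only the JDK; the class name SignedRangesDigest and its methods are hypothetical, and it assumes the whole file is already in memory and that SHA-256 is the digest algorithm in use.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of PDFBox or Bouncy Castle.
final class SignedRangesDigest {
    private SignedRangesDigest() {}

    /**
     * Hashes the file segments listed in byteRange (offset/length pairs),
     * skipping everything in between, i.e. the Contents placeholder.
     */
    static byte[] sha256OverByteRange(byte[] pdf, int[] byteRange) {
        if (byteRange.length == 0 || byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("ByteRange must hold offset/length pairs");
        }
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (int i = 0; i < byteRange.length; i += 2) {
                // Feed one segment: byteRange[i] is the offset, byteRange[i + 1] the length.
                digest.update(pdf, byteRange[i], byteRange[i + 1]);
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    /** Upper-case hex rendering, matching the format of the dumps above. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }
}
```

In PDFBox the array should be obtainable from the signature dictionary of the signed file via PDSignature.getByteRange(); treat that as an assumption to check against the PDFBox version in use. Passing it together with the bytes of the signed file to sha256OverByteRange should then give the value from the messageDigest attribute, provided the signature was created over exactly those ranges.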

In your case the hash you find in the to-be-signed attributes,

E2BB4AD28C95B99E9EDEF70662AFE825AF477680F4867B59833AA05313D8F4C0

is the hash over those two segments of the prepared file, which will almost always differ from the hash over the original PDF, as in your case, where that is

0622971147486E1900037EFF229D921D14F5B51AAC7171729B2B66F81CDF6585