How to encapsulate the H.264 bitstream of video file in C++

I'm trying to convert a video file (.mp4) to a Dicom file.
I have succeeded to do it by storing single images (one per frame of the video) in the Dicom,
but the result is a too large file, it's not good for me.
Instead I want to encapsulate the H.264 bitstream as it is stored in the video file, into the Dicom file.
I've tried to get the bytes of the file as follows:

std::ifstream inFile(file_name, std::ifstream::binary);

inFile.seekg(0, inFile.end);
std::streampos length = inFile.tellg();
inFile.seekg(0, inFile.beg);

std::vector<unsigned char> bytes(length);

inFile.read((char*)&bytes[0], length);

but I think I have missed something like encapsulating for the read bytes because the result Dicom file was a black image.

In python I would use pydicom.encaps.encapsulate function for this purpose:
https://pydicom.github.io/pydicom/dev/reference/generated/pydicom.encaps.encapsulate.html

with open(videofile, 'rb') as f:
    dataset.PixelData = encapsulate([f.read()])

Is there anything in C ++ that is equivalent to the encapsulate function?
or any different way to get the encapsulated pixel data of video at one stream and not frame by frame?

This is the code of initializing the Dcmdataset, using the bytes extracted:

VideoFileStream* vfs = new VideoFileStream();
vfs->setFilename(file_name);
if (!vfs->open())
    return false;

DcmDataset* dataset = new DcmDataset();
dataset->putAndInsertOFStringArray(DCM_SeriesInstanceUID, dcmGenerateUniqueIdentifier(new char[100], SITE_SERIES_UID_ROOT));
dataset->putAndInsertOFStringArray(DCM_SOPInstanceUID, dcmGenerateUniqueIdentifier(new char[100], SITE_INSTANCE_UID_ROOT));
dataset->putAndInsertOFStringArray(DCM_StudyInstanceUID, dcmGenerateUniqueIdentifier(new char[100], SITE_STUDY_UID_ROOT));
dataset->putAndInsertOFStringArray(DCM_MediaStorageSOPInstanceUID, dcmGenerateUniqueIdentifier(new char[100], SITE_UID_ROOT));
dataset->putAndInsertString(DCM_MediaStorageSOPClassUID, UID_VideoPhotographicImageStorage);
dataset->putAndInsertString(DCM_SOPClassUID, UID_VideoPhotographicImageStorage);
dataset->putAndInsertOFStringArray(DCM_TransferSyntaxUID, UID_MPEG4HighProfileLevel4_1TransferSyntax);
dataset->putAndInsertOFStringArray(DCM_PatientID, "987655");
dataset->putAndInsertOFStringArray(DCM_StudyDate, "20050509");
dataset->putAndInsertOFStringArray(DCM_Modality, "ES");
dataset->putAndInsertOFStringArray(DCM_PhotometricInterpretation, "YBR_PARTIAL_420");
dataset->putAndInsertUint16(DCM_SamplesPerPixel, 3);
dataset->putAndInsertUint16(DCM_BitsAllocated, 8);
dataset->putAndInsertUint16(DCM_BitsStored, 8);
dataset->putAndInsertUint16(DCM_HighBit, 7);
dataset->putAndInsertUint16(DCM_Rows, vfs->height());
dataset->putAndInsertUint16(DCM_Columns, vfs->width());
dataset->putAndInsertUint16(DCM_CineRate, vfs->framerate());
dataset->putAndInsertUint16(DCM_FrameTime, 1000.0 * 1 / vfs->framerate());
const Uint16* arr = new Uint16[]{ 0x18,0x00, 0x63, 0x10 };  
dataset->putAndInsertUint16Array(DCM_FrameIncrementPointer, arr, 4);
dataset->putAndInsertString(DCM_NumberOfFrames, std::to_string(vfs->numFrames()).c_str());
dataset->putAndInsertOFStringArray(DCM_FrameOfReferenceUID, dcmGenerateUniqueIdentifier(new char[100], SITE_UID_ROOT));
dataset->putAndInsertUint16(DCM_PixelRepresentation, 0);
dataset->putAndInsertUint16(DCM_PlanarConfiguration, 0);
dataset->putAndInsertOFStringArray(DCM_ImageType, "ORIGINAL");
dataset->putAndInsertOFStringArray(DCM_LossyImageCompression, "01");
dataset->putAndInsertOFStringArray(DCM_LossyImageCompressionMethod, "ISO_14496_10");
dataset->putAndInsertUint16(DCM_LossyImageCompressionRatio, 30);
dataset->putAndInsertUint8Array(DCM_PixelData, (const Uint8 *)bytes.data(), length);

DJ_RPLossy repParam;
dataset->chooseRepresentation(EXS_MPEG4HighProfileLevel4_1, &repParam);
dataset->updateOriginalXfer();

DcmFileFormat fileformat(dataset); 
OFCondition status = fileformat.saveFile("C://temp//videoTest", EXS_LittleEndianExplicit);

Solution

The trick is to redirect the value of the attribute PixelData to a file stream. With this, the video is loaded in chunks and on demand (i.e. when the attribute is accessed). But you have to create the whole structure explicitly, that is:

The Pixel Data element
The Pixel Sequence with...
...the offset table
...a single item containing the contents of the MPEG file

Code

// set length to the size of the video file
DcmInputFileStream dcmFileStream(videofile.c_str(), 0);
DcmPixelSequence* pixelSequence = new DcmPixelSequence(DCM_PixelSequenceTag));
DcmPixelItem* offsetTable = new DcmPixelItem(DCM_PixelItemTag);
pixelSequence->insert(offsetTable);
DcmPixelItem* frame = new DcmPixelItem(DCM_PixelItemTag);
frame->createValueFromTempFile(dcmFileStream.newFactory(), OFstatic_cast(Uint32, length), EBO_LittleEndian);
pixelSequence->insert(frame);
DcmPixelData* pixelData = new DcmPixeldata(DCM_PixelData);
pixelData->putOriginalRepresentation(EXS_MPEG4HighProfileLevel4_1, nullptr, pixelSequence);
dataset->insert(pixelData, true);
DcmFileFormat fileformat(dataset); 
OFCondition status = fileformat.saveFile("C://temp//videoTest");

Note that you "destroy" the compression if you save the file in VR Implicit Little Endian.

As mentioned above and obvious in the code, the whole MPEG file is wrapped into a single item in the PixelData. This is DICOM conformant but you may want to encapsulate single frames each in one item.

Note : No error handling presented here