Lua - Ability to encode JPEG using DCT (discrete cosine transform)

I’m looking for a library and/or guide that would allow me to encode an image with DCT (discrete cosine transform) so I can place it in a basic 1.0 PDF file. (FYI, I’m using https://git.catseye.tc/pdf.lua/ to create the PDF.)

I’ve searched the internet but couldn’t find anything. Is anyone on SO aware of something in Lua that can encode a JPEG with DCT?

Update:

Based on feedback, here’s some additional information on my ask

If you open up a PDF file, the stored JPEG data appears in an image XObject. Here is an example:

14 0 obj
<<
/Intent/RelativeColorimetric
/Type/XObject
/ColorSpace/DeviceGray
/Subtype/Image
/Name/X
/Width 2988
/BitsPerComponent 8
/Length 134030
/Height 2286
/Filter/DCTDecode
>>
stream (binary data) endstream

The /Type/XObject and /Subtype/Image entries show that this is an image. The key entry is the /Filter value, DCTDecode, which indicates JPEG data (JPXDecode would indicate JPEG 2000, which also works). The data I need goes between stream and endstream.
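To make the DCTDecode/JPXDecode distinction concrete: the filter describes data that is already in that file format, so a script can choose it from the file's leading signature bytes. A JPEG starts with the SOI marker FF D8 (followed by the next marker's FF), and a JPEG 2000 JP2 file starts with a fixed 12-byte signature box. This is a minimal sketch; `pick_filter` is a hypothetical name. Note that JPXDecode needs PDF 1.5 or later, so for a basic 1.0 file only DCTDecode applies.

```python
# Signature bytes at the very start of each file type.
JPEG_SOI = b"\xff\xd8\xff"  # JPEG Start Of Image marker, then the next marker's 0xFF
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"  # JPEG 2000 (JP2) signature box


def pick_filter(data: bytes) -> str:
    """Return the PDF /Filter name suited to embedding `data` unchanged."""
    if data.startswith(JPEG_SOI):
        return "/DCTDecode"   # available since PDF 1.0
    if data.startswith(JP2_SIGNATURE):
        return "/JPXDecode"   # requires PDF 1.5 or later
    raise ValueError("not a JPEG or JP2 file; it would need re-encoding first")


if __name__ == "__main__":
    print(pick_filter(b"\xff\xd8\xff\xe0" + b"\x00" * 16))  # /DCTDecode
```

Anything else (PNG, BMP, ...) has to be re-encoded as JPEG first, which is the step the question is really about.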

I’m looking for help with how to get an image converted into the DCT format that this needs.

Solution

The prime difference for DCT/JPEG in PDF is that the JPEG must be "baseline", much as it was in 1992 (see also https://ia801003.us.archive.org/5/items/pdf320002008/PDF32000_2008.pdf#page=42). That is what MS Paint (or any command-driven graphics app) saves as a "simple" JPEG, not any exotic type. So the everyday JPEG that MS Paint produces when converting from PNG or any other complex format is exactly the same /DCTDecode object once a PDF writer has imported it.
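If you want to check in code whether a JPEG is baseline before embedding it, you can walk its marker segments up to the Start Of Frame (SOF) marker: SOF0 (FF C0) is baseline DCT, while SOF2 (FF C2) is progressive. The sketch below assumes a well-formed file, and `read_sof` is a hypothetical helper. The width, height and component count it returns are the values the XObject's /Width, /Height and /ColorSpace must agree with.

```python
import struct


def read_sof(jpeg: bytes) -> tuple[int, int, int, bool]:
    """Return (width, height, components, is_baseline) from a JPEG's SOF segment."""
    if not jpeg.startswith(b"\xff\xd8"):
        raise ValueError("missing SOI marker; not a JPEG")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xFF:           # fill byte before a marker: skip it
            pos += 1
            continue
        (seg_len,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        # SOF0..SOF15 carry the frame header, except DHT (C4), JPG (C8), DAC (CC)
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            _precision, height, width, comps = struct.unpack(
                ">BHHB", jpeg[pos + 4:pos + 10])
            return width, height, comps, marker == 0xC0
        if marker == 0xDA:           # Start Of Scan reached without a frame header
            break
        pos += 2 + seg_len           # skip the marker (2 bytes) plus its segment
    raise ValueError("no SOF marker found")


if __name__ == "__main__":
    # Minimal synthetic header: SOI + SOF0 for a 1123 x 794, 3-component image
    sof0 = b"\xff\xc0" + struct.pack(">HBHHB", 17, 8, 794, 1123, 3) + bytes(9)
    print(read_sof(b"\xff\xd8" + sof0))  # (1123, 794, 3, True)
```

A False in the last position means the file should be re-saved as a plain baseline JPEG before being wrapped.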

So if we export the image from the PDF, we get the JPEG back (not the source PNG). To check that they are identical, copy and paste it out or use an extractor. The image.jpg used for my command-line wrap as a PDF is 5,757 bytes and the image extracted from the PDF is also 5,757 bytes, so we can expect a match.

Check that they are identical binary files (what goes in comes out, which is very rare for a PDF):

C:\Apps\Programming\pdf demo>fc /B input.jpg extracted.jpg
Comparing files input.jpg and EXTRACTED.JPG
FC: no differences encountered
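The "extractor" step can also be scripted. Here is a minimal sketch that pulls the raw bytes of each /DCTDecode stream out of a PDF by trusting the dictionary's /Length, so the result can be compared byte-for-byte with the source JPEG the way fc /B does. It assumes an uncompressed file like the one built here: the image dictionary has no nested << >>, and /Length is a direct number rather than an indirect reference. The function name is hypothetical.

```python
import re

# A dictionary with no nested << >>, followed by the "stream" keyword and its EOL.
STREAM_DICT = re.compile(rb"<<((?:(?!<<|>>).)*)>>\s*stream(?:\r\n|\n)", re.DOTALL)
LENGTH = re.compile(rb"/Length\s+(\d+)")


def extract_dct_streams(pdf: bytes) -> list[bytes]:
    """Return the raw data of every /DCTDecode stream, sliced using its /Length."""
    images = []
    pos = 0
    while (match := STREAM_DICT.search(pdf, pos)) is not None:
        pos = match.end()
        entries = match.group(1)
        length = LENGTH.search(entries)
        if b"/DCTDecode" not in entries or length is None:
            continue
        # Assumes /Length is a direct integer, not an indirect "n 0 R" reference.
        end = pos + int(length.group(1))
        images.append(pdf[pos:end])
        pos = end  # skip the binary data so it is never scanned for "<<"
    return images
```

For a lossless wrap, comparing `extract_dct_streams(pdf_bytes)[0]` with the bytes of input.jpg should give True, which is the same thing fc /B reports as "no differences encountered".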

So to make a single-page PDF from an image, you simply need a header:

%PDF-1.7
%ANSI

1 0 obj <</Type/Catalog/Pages 2 0 R>> endobj
2 0 obj <</Type/Pages/Count 1/Kids [ 3 0 R ]>> endobj
3 0 obj <</Type/Page/MediaBox [ 0 0 841.5 594.75 ]/Rotate 0/Resources 4 0 R/Contents 5 0 R/Parent 2 0 R>> endobj
4 0 obj <</XObject <</Img1 6 0 R>>>> endobj
5 0 obj <</Length 61>>
stream
1 0 0 -1 -0 594.75 cm 841.5 0 0 -594.75 0 594.75 cm /Img1 Do
endstream
endobj
6 0 obj <</Type/XObject/Subtype/Image/ColorSpace/DeviceRGB/BitsPerComponent 8/Filter/DCTDecode
/Width 1123/Height 794/Length 202537 >>stream

Here a Windows command line, or any other scripting language, can write that last line with the correct values. Then comes a trailer, which is where it can get messy, so as much of the tail as possible was moved into the head to keep the trailer writing minimal. I have done similar command-line embedding for video and audio, so DCT (JPEG) images should not be a problem (although I prefer lossless, pixel-perfect PNG, and that's much harder).
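As a sketch of writing that last line with the correct values, the function below produces the object 6 dictionary from the JPEG's dimensions, component count and file size. Mapping 1 component to /DeviceGray and 3 to /DeviceRGB is standard PDF. Other counts are rejected here to keep it short. The function name is hypothetical. Note that the header above hard-codes /DeviceRGB, so a greyscale JPEG would need /DeviceGray instead.

```python
COLOR_SPACES = {1: b"/DeviceGray", 3: b"/DeviceRGB"}


def image_dict_lines(width: int, height: int, components: int, length: int) -> bytes:
    """Object 6's opening lines, ending with "stream" + LF, ready for the JPEG bytes."""
    try:
        color_space = COLOR_SPACES[components]
    except KeyError:
        raise ValueError(f"unsupported component count: {components}") from None
    # length must be the exact JPEG file size in bytes (what %%~zI gives in batch)
    return (
        b"6 0 obj <</Type/XObject/Subtype/Image/ColorSpace%s"
        b"/BitsPerComponent 8/Filter/DCTDecode\n"
        b"/Width %d/Height %d/Length %d >>stream\n"
        % (color_space, width, height, length)
    )
```

For the example above, `image_dict_lines(1123, 794, 3, 202537)` reproduces the last two header lines exactly.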

Here is a matching trailer for the header above:

endstream
endobj
xref
0 7
0000000000 65535 f 
0000000016 00000 n 
0000000061 00000 n 
0000000115 00000 n 
0000000228 00000 n 
0000000272 00000 n 
0000000380 00000 n 

trailer
<</Size 7/Info <</Producer (Cmd2PDF)>>/Root 1 0 R>>
startxref
203076
%%EOF
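The xref offsets in this trailer are simply the byte positions of each `N 0 obj` line in the header. This sketch rebuilds the header byte-for-byte and measures them. It assumes the header was saved with LF-only line endings. With CRLF endings every offset would shift.

```python
# The header above, assuming LF ("\n") line endings throughout.
HEADER_LINES = [
    b"%PDF-1.7",
    b"%ANSI",
    b"",
    b"1 0 obj <</Type/Catalog/Pages 2 0 R>> endobj",
    b"2 0 obj <</Type/Pages/Count 1/Kids [ 3 0 R ]>> endobj",
    b"3 0 obj <</Type/Page/MediaBox [ 0 0 841.5 594.75 ]/Rotate 0"
    b"/Resources 4 0 R/Contents 5 0 R/Parent 2 0 R>> endobj",
    b"4 0 obj <</XObject <</Img1 6 0 R>>>> endobj",
    b"5 0 obj <</Length 61>>",
    b"stream",
    b"1 0 0 -1 -0 594.75 cm 841.5 0 0 -594.75 0 594.75 cm /Img1 Do",
    b"endstream",
    b"endobj",
    b"6 0 obj <</Type/XObject/Subtype/Image/ColorSpace/DeviceRGB"
    b"/BitsPerComponent 8/Filter/DCTDecode",
    b"/Width 1123/Height 794/Length 202537 >>stream",
]
HEADER = b"".join(line + b"\n" for line in HEADER_LINES)


def object_offsets(data: bytes, count: int) -> list[int]:
    """Byte offset of each "N 0 obj" line (objects 1..count), as xref expects."""
    # Search for "\nN 0 obj" so that e.g. "1 0 obj" never matches inside "11 0 obj".
    return [data.index(b"\n%d 0 obj" % n) + 1 for n in range(1, count + 1)]


if __name__ == "__main__":
    print(object_offsets(HEADER, 6))  # [16, 61, 115, 228, 272, 380]
    print(len(HEADER))                # 521, where the JPEG bytes begin
```

These match the six `n` entries in the xref table, so the trailer stays valid as long as the header is not edited (only the dimensions and /Length change, and the example keeps their digit counts the same).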

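You simply need to ensure the startxref value is correct: it is the byte offset of the xref keyword, i.e. the size of everything written before the trailer's xref table.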
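Here is a sketch of that calculation plus the matching tail. It assumes LF line endings and the tail layout shown above: a newline, then endstream and endobj, then the xref table. Each xref entry must be exactly 20 bytes, including a two-character end of line (" \n" here). `build_tail` is a hypothetical name, and the optional /Info entry is left out.

```python
# Tail text before the xref table: newline, then close the image stream and object.
STREAM_END = b"\nendstream\nendobj\n"


def build_tail(offsets: list[int], header_len: int, jpeg_len: int) -> bytes:
    """Return the bytes that follow the JPEG data: stream end, xref and trailer."""
    # "xref" starts right after header + JPEG + STREAM_END
    startxref = header_len + jpeg_len + len(STREAM_END)
    size = len(offsets) + 1  # objects 1..n plus the free entry for object 0
    entries = [b"0000000000 65535 f \n"]
    entries += [b"%010d 00000 n \n" % off for off in offsets]
    return (
        STREAM_END
        + b"xref\n0 %d\n" % size
        + b"".join(entries)
        + b"trailer\n<</Size %d/Root 1 0 R>>\n" % size
        + b"startxref\n%d\n%%%%EOF\n" % startxref  # "%%%%" renders as "%%"
    )
```

With the 521-byte header above (LF endings) and the 202,537-byte JPEG, this gives startxref 203076, the same value as the hand-written trailer.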

So the working procedure is: first use any graphics app to prepare the image and note its width, height and length (file size); put those values at the end of the header and the resulting offset in the trailer; then, briefly:

copy /b 8bitHead.txt + 8bit.jpg + 8bitTail.txt 8bitColour.pdf

Since JPEG is a binary compressed encoding, you can't use plain-text copy and paste: it can destroy the high (8th) bit of each byte and corrupt the JPEG, which makes it useless for building the file in a purely textual fashion. The JPEG therefore needs to be sandwiched in binary between the two text parts, hence copy /b.
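The same binary sandwich in a portable form is sketched below. It behaves like `copy /b 8bitHead.txt + 8bit.jpg + 8bitTail.txt 8bitColour.pdf`. Every file is opened in binary mode ("rb"/"wb"), so no byte of the JPEG is translated or dropped. The function name is hypothetical.

```python
import shutil


def sandwich(parts: list[str], output: str) -> int:
    """Concatenate files byte-for-byte (like copy /b a + b + c out); return its size."""
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:   # binary mode: no newline or encoding changes
                shutil.copyfileobj(src, out)
        return out.tell()                   # total bytes written
```

Usage: `sandwich(["8bitHead.txt", "8bit.jpg", "8bitTail.txt"], "8bitColour.pdf")`.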

Edit

I gave a fairly complex value above for object 5, which can be simplified. Say we have an image to be scaled to 500 pt by 477 pt and we want it centred: we can offset it by half of the extra width and half of the extra height, which simplifies the operator to W 0 0 H dx/2 dy/2 cm, where dx is the horizontal white space (page width minus W) and dy is the vertical white space (page height minus H).

5 0 obj <</Length 61>> stream
500.000 0 0 477.000 170.750 58.875 cm /Img1 Do               
endstream
endobj
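Both points can be checked numerically. In PDF, `a b c d e f cm` maps a point (x, y) to (a·x + c·y + e, b·x + d·y + f), and a later cm acts on points before the earlier ones. So the two operators in the original object 5 compose to a plain `841.5 0 0 594.75 0 0 cm`, and the centring offsets follow the dx/2, dy/2 rule. The helper names are hypothetical.

```python
Matrix = tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def concat(first: Matrix, second: Matrix) -> Matrix:
    """Combined matrix of "first cm" followed by "second cm" (p' = p x second x first)."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = second
    return (
        a2 * a1 + b2 * c1,
        a2 * b1 + b2 * d1,
        c2 * a1 + d2 * c1,
        c2 * b1 + d2 * d1,
        e2 * a1 + f2 * c1 + e1,
        e2 * b1 + f2 * d1 + f1,
    )


def centred_cm(page_w: float, page_h: float, img_w: float, img_h: float) -> str:
    """Content-stream line that scales /Img1 to img_w x img_h pt, centred on the page."""
    dx, dy = page_w - img_w, page_h - img_h   # total white space on each axis
    return f"{img_w:.3f} 0 0 {img_h:.3f} {dx / 2:.3f} {dy / 2:.3f} cm /Img1 Do"


if __name__ == "__main__":
    original = concat((1, 0, 0, -1, -0.0, 594.75), (841.5, 0, 0, -594.75, 0, 594.75))
    print(original)                           # (841.5, 0.0, 0.0, 594.75, 0.0, 0.0)
    print(centred_cm(841.5, 594.75, 500, 477))
    # 500.000 0 0 477.000 170.750 58.875 cm /Img1 Do
```

On the 841.5 x 594.75 page, the 500 x 477 image leaves 341.5 pt and 117.75 pt of white space, hence offsets of 170.75 and 58.875.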

Edit 2

For a different question I revisited the methods needed to use a simpler batch file to automate adding a single pixel-perfect JPEG. It is not much different from the above and needs some spit and polish for production, but it shows how to automate this for various source images and could be extended to a set of images in a loop. It's a starting point.

@echo off
set "filename=%~f1"
if "%~1"=="" (echo Usage: %~nx0 image.jpg& exit /b 1)

REM clean up after any failed run (paths are quoted in case %temp% contains spaces)
if exist "%temp%\output1.txt" del "%temp%\output1.txt"
if exist "%temp%\output2.txt" del "%temp%\output2.txt"
if exist "%temp%\output.pdf" del "%temp%\output.pdf"

REM we could write a text header here, but it's faster to copy one prepared earlier
copy /b header.txt "%temp%\output1.txt" >nul

REM write the current image's /Width and /Height (via WIA in a JScript helper), then /Length
@echo fsObj = new ActiveXObject("Scripting.FileSystemObject");var ARGS = WScript.Arguments;var img=new ActiveXObject("WIA.ImageFile");var filename=ARGS.Item(0);img.LoadFile(filename);WScript.StdOut.Write("/Width "+img.Width+"/Height "+img.Height);>"%temp%\dimimg.js"
@cscript //nologo "%temp%\dimimg.js" "%filename%">>"%temp%\output1.txt"
for %%I in ("%filename%") do @echo /Length %%~zI^>^>>>"%temp%\output1.txt"
echo stream>>"%temp%\output1.txt"

REM append the JPEG bytes unchanged (binary copy), then close the stream
copy /b "%temp%\output1.txt"+"%filename%" "%temp%\output2.txt" >nul
echo/>>"%temp%\output2.txt"
echo endstream>>"%temp%\output2.txt"
echo endobj>>"%temp%\output2.txt"

REM prep the trailer: startxref is the size of everything written so far
for %%I in ("%temp%\output2.txt") do set "startxref=%%~zI"
copy /b "%temp%\output2.txt"+trailer.txt "%temp%\output.pdf" >nul
echo %startxref%>>"%temp%\output.pdf"
echo %%%%EOF>>"%temp%\output.pdf"

REM clean up the intermediate files and open the result
del "%temp%\output1.txt" "%temp%\output2.txt" "%temp%\dimimg.js"
start "" "%temp%\output.pdf"

A working demo set can be found here: https://github.com/GitHubRulesOK/MyNotes/blob/master/jpgTOpdf.zip