Tags: pdf, text-processing, watermark, information-hiding, data-hiding

Watermarking in PDF documents


I am starting the first year of my master's degree, and my project is about digital watermarking in PDF documents.

I have started reading some papers, but I noticed that there is not much substantial work on hiding information in PDF documents.

I am currently reading an article entitled "Blind digital watermarking in PDF documents using Spread Transform Dither Modulation", published very recently by Bitar W. A et al. I found this paper very interesting: the authors hid a message in the x-coordinates in PDF documents. However, I think applying the authors' method is very difficult, since they did not provide any MATLAB code online. I am also wondering why the proposed hiding technique is not applied to the y-coordinate values as well.

Has anyone worked on projects related to the above paper? I am just looking for some important references to start with.


Solution

  • I'm not into this type of research, but after seeing your question I downloaded the paper and read it.

    In my eyes the technique does not live up to its promise. The article explains at the start:

    The main idea behind this technique [Digital Watermarking] is that once a careful user detects the presence of the hidden message, he should be unable to remove that message without strongly altering the watermarked document.

    The article assumes that all such a user can do is randomly change the x coordinates a bit, i.e. apply some noise. It calls its proposed information-hiding method robust, and it is indeed quite robust against such noise attacks as long as they do not disturb the appearance too much.

    But there is a different, very straightforward method for such a careful user to remove the message without strongly altering the document (at least not in a negative sense), a method that does not apply random noise:

    There is a natural way to arrange the x coordinates of text on a line: let each glyph take the space matching its width according to the font information! If need be, the line can additionally be shortened or lengthened by applying character spacing. The result: the offsets applied to hide information are completely lost, and the document's appearance probably even improves (as natural, untampered character distances are used).

    Even if the font in question has been changed by removing its glyph width information, taking the average width of all occurrences of each glyph as its width should serve as a good approximation.
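
    To make the idea concrete, here is a minimal Python sketch of the re-layout described above. It is only an illustration under assumptions of my own, not code from the paper and not tied to any PDF library: a line is assumed to be already extracted as a list of (character, x coordinate) pairs, widths are assumed to be already scaled to the same units as the coordinates, and the names `relayout` and `estimate_widths` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed representation: (character, x coordinate of the glyph origin).
Glyph = Tuple[str, float]


def estimate_widths(lines: List[List[Glyph]]) -> Dict[str, float]:
    """Fallback if the font's width information is missing: use the average
    observed advance (distance to the next glyph) of each character.
    Characters that only ever appear last on a line get no estimate."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for line in lines:
        for (ch, x), (_, next_x) in zip(line, line[1:]):
            totals[ch] += next_x - x
            counts[ch] += 1
    return {ch: totals[ch] / counts[ch] for ch in totals}


def relayout(line: List[Glyph], widths: Dict[str, float],
             keep_line_length: bool = True) -> List[Glyph]:
    """Place each glyph directly after the previous one, using its width.

    With keep_line_length=True, a uniform character spacing is added so the
    last glyph stays at its original x coordinate. Either way, the individual
    per-glyph offsets are discarded."""
    if not line:
        return []
    start = line[0][1]
    # Distance from the first to the last glyph origin with natural widths.
    natural = sum(widths[ch] for ch, _ in line[:-1])
    spacing = 0.0
    if keep_line_length and len(line) > 1:
        spacing = ((line[-1][1] - start) - natural) / (len(line) - 1)
    out: List[Glyph] = []
    x = start
    for i, (ch, _) in enumerate(line):
        out.append((ch, x))
        if i < len(line) - 1:
            x += widths[ch] + spacing
    return out
```

    Calling `relayout(line, widths)` with widths taken from the font, or `relayout(line, estimate_widths(all_lines))` when they are missing, yields positions that depend only on the glyph widths and the line's end points. In a real PDF the uniform spacing would correspond to the character spacing setting (the `Tc` operator), and the widths would have to be scaled by the font size first.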