I know how to find the maximum possible capacity of an image given the resolution, image type, and bits per pixel for hiding. How do I find the estimated message size?
Say the image is 100 x 200 pixels and is an 8-bit paletted image, and we are hiding with 4-bit LSB. What would the estimated message size be?
The estimated message length is the total length of 1s and 0s you will embed. This is composed of the header (optional) and message stream.
This depends on the size of the message and how you hide it. Generally, you want to ask what form your message takes when you convert it to 1s and 0s (message stream). The numbers 0-255 and ASCII characters can be represented by 8 bits each. The most straightforward examples include:
plain text: number of characters x 8
binary (black and white) image: height x width
grayscale image: height x width x 8
colour image: height x width x 24
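The four cases above can be sketched as a small helper. The function name, the type labels, and the keyword arguments are illustrative choices, not part of any library.

```python
# Bits per character or per pixel for each secret type (labels are hypothetical).
BITS_PER_UNIT = {"text": 8, "binary": 1, "grayscale": 8, "colour": 24}

def stream_bits(kind, *, chars=0, height=0, width=0):
    """Return the length in bits of the uncompressed message stream."""
    if kind == "text":
        return chars * BITS_PER_UNIT["text"]
    return height * width * BITS_PER_UNIT[kind]

print(stream_bits("text", chars=11))                    # 88
print(stream_bits("grayscale", height=151, width=256))  # 309248
```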
You can also decide to compress your message before embedding with, for example, Huffman coding. This should convert your message to fewer bits than the above examples, but you will also need to include your decoding table in your message so that the recipient can decode it. Overall, Huffman compression results in a shorter total length, table included, even for messages of just a few hundred characters.
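To get a feel for the saving, here is a rough sketch that computes Huffman code lengths and compares the total against 8 bits per character. The table cost of 16 bits per distinct symbol (8 for the symbol, 8 for its code length) is an assumption made for illustration; a real format may store the table differently.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built on text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tiebreak, {symbol: depth}); the unique tiebreak
    # keeps Python from ever comparing the dictionaries.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def huffman_total_bits(text, table_bits_per_symbol=16):
    """Compressed payload plus an assumed table cost per distinct symbol."""
    lengths = huffman_code_lengths(text)
    freq = Counter(text)
    payload = sum(freq[s] * lengths[s] for s in lengths)
    return payload + len(lengths) * table_bits_per_symbol

sample = "the quick brown fox jumps over the lazy dog " * 8  # 352 characters
print(len(sample) * 8, huffman_total_bits(sample))
```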
Speaking of metadata, in many cases, embedding just your message stream is not enough. For a message stream which is shorter than the maximum message capacity, you have 3 options on what to do with the remaining space:
fill it with random 1s and 0s (which introduces the most distortion to the cover image),
do nothing, or
stretch the message so that it takes up as much of the maximum message capacity as possible, e.g. matrix encoding in the F5 steganographic system.
If you decide to do nothing, you may need to tell the recipient how much to read to extract the whole message, so that they do not carry on reading gibberish. You can state either the total number of bits in the message stream, or how many pixels to read until all the information is extracted. For the former option, the tendency is to devote 32 bits to the message length, but that can be overkill. You can either set a more practical limit, or adopt a more dynamic approach.
A practical limit would be 24 bits, if you assume you will never use a bigger cover than a 1920x1200 grayscale image with 4-bit LSB embedding (the maximum capacity is 1920x1200x4 = 9216000 bits, which is less than 2^24 = 16777216).
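The same check can be done for any cover size. This is a minimal sketch; the function name is made up, and it assumes one embeddable sample per pixel (grayscale or paletted) unless `channels` says otherwise.

```python
def length_field_bits(width, height, k, channels=1):
    """Smallest fixed field width (in bits) that can hold any message
    length up to the full capacity of the cover."""
    capacity = width * height * channels * k  # maximum embeddable bits
    return capacity.bit_length()

print(length_field_bits(1920, 1200, 4))  # 24
print(length_field_bits(100, 200, 4))    # 17
```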
A more dynamic approach would be to work out the minimum number of bits needed to represent the message length, e.g. 8 bits for a message length of up to 255, 9 bits for up to 511, etc. Then encode this number as a 5-bit value, followed by the message length. For example, if the message length is 3546 bits, using 32 bits to encode the length, it becomes 00000000000000000000110111011010. But with the dynamic approach, it is 01100 110111011010, where 01100 is binary for 12, which says to read the 12 following bits to obtain the message length. That is 17 bits instead of 32.
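A short sketch of this variable-width length field, with hypothetical function names. It assumes the length is at least 1 and fits in 31 bits, since the prefix is only 5 bits wide.

```python
def encode_length(n_bits):
    """Return a 5-bit width prefix followed by the length in that many bits."""
    if n_bits < 1:
        raise ValueError("message length must be at least 1 bit")
    width = n_bits.bit_length()
    if width > 31:
        raise ValueError("length does not fit a 5-bit width prefix")
    return format(width, "05b") + format(n_bits, "b")

def decode_length(bits):
    """Return (message length, number of header bits consumed)."""
    width = int(bits[:5], 2)
    return int(bits[5:5 + width], 2), 5 + width

field = encode_length(3546)
print(field[:5], field[5:])  # 01100 110111011010
print(decode_length(field))  # (3546, 17)
```

The decoder returns how many bits it consumed so that the caller knows where the next header field starts.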
If your program handles both text and images as the secret message, you'll also need to tell the recipient the type of the secret. If you are only ever going to use the four types above, I would encode that using two bits: plain text = 00, binary image = 01, grayscale image = 10, colour image = 11.
If the secret is an image, you'll also need to provide its height and width. 16x2 bits is the general tendency here. But similarly to the message length above, you can use something more practical. For example, if you expect no image to be wider or taller than 2047 pixels, 11x2 bits will be enough to encode this information.
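The type code and the optional dimensions can be produced along these lines. The names are illustrative, and `dim_bits` is the width chosen for each dimension (16 by default, 11 for the tighter limit).

```python
# Two-bit codes for the four secret types listed above.
TYPE_CODES = {"text": "00", "binary": "01", "grayscale": "10", "colour": "11"}

def type_and_size_fields(kind, height=None, width=None, dim_bits=16):
    """Return (type code, height field, width field) as bit strings.
    The two size fields are empty for plain text."""
    code = TYPE_CODES[kind]
    if kind == "text":
        return code, "", ""
    limit = 2 ** dim_bits
    if not (0 < height < limit and 0 < width < limit):
        raise ValueError("dimension does not fit in dim_bits")
    return code, format(height, f"0{dim_bits}b"), format(width, f"0{dim_bits}b")

print(type_and_size_fields("text"))
print(type_and_size_fields("grayscale", 151, 256, dim_bits=11))
```

Note that 2047 is the largest value an 11-bit field can hold; a dimension of 2048 would need 12 bits.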
If you embed your secret in more than one LSB per pixel, your message length may not be divisible by that number. Take, for example, a message length of 301 when you embed with 4-bit LSB. In this case, you need to pad the message with 3 more junk 1s or 0s so that its length becomes divisible by 4. Now, 304 is your reported message stream length, but after you extract it, you can discard the last 3 bits. It is logical to assume you will never embed in more than 7-bit LSB, so the padding is at most 6 and devoting 3 bits to it is enough.
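The padding rule is a one-liner. In this sketch, `k` stands for the number of LSBs used per pixel and the function name is made up.

```python
def padding_bits(stream_len, k):
    """Number of junk bits to append so that the length is a multiple of k."""
    return (-stream_len) % k

pad = padding_bits(301, 4)
print(pad, 301 + pad)       # 3 304
print(format(pad, "03b"))   # 011, the 3-bit padding field
```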
Depending on what you choose to include in the metadata, you can stitch all of these together and call the result the header.
Let's do a couple of examples to see this in action. We fix the header format in this order: message length, secret type, padding, height, width (the last two only if necessary).
Example 1: 11 characters of plain text, hidden with 4-bit LSB embedding.
The message stream is 11 x 8 = 88 bits.
88 mod 4 = 0, so the padding is 000 (3 bits).
The message length is 88 = 00111 1011000 (12 bits).
Secret type is text, so 00 (2 bits).
Estimated message length: Header + message stream = (12 + 2 + 3) + 88 = 105 bits.
Example 2: a 151 x 256 grayscale image, hidden with 3-bit LSB embedding.
The message stream is 151 x 256 x 8 = 309248 bits.
309248 mod 3 = 2, so the padding is 3-2 = 1 = 001 (3 bits).
The message length, including the padding bit, is 309248 + 1 = 309249 = 10011 1001011100000000001 (24 bits).
Secret type is grayscale image, so 10 (2 bits).
Secret is image, so adding the width and height using 2 16-bit numbers (32 bits).
Estimated message length: Header + message stream = (24 + 2 + 3 + 32) + 309249 = 309310 bits.
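Both worked examples can be reproduced with a short function. This is only a sketch of the header layout used in these examples (dynamic length field, 2-bit type, 3-bit padding, optional 2 x 16-bit dimensions); the function and parameter names are made up.

```python
def estimate_total_bits(stream_len, k, is_image, dim_bits=16):
    """Header plus padded message stream, in bits.

    stream_len: uncompressed message stream length in bits
    k:          number of LSBs used per pixel
    is_image:   True if height and width fields are needed
    """
    padding = (-stream_len) % k           # junk bits to reach a multiple of k
    reported = stream_len + padding       # length stored in the header
    length_field = 5 + reported.bit_length()
    header = length_field + 2 + 3 + (2 * dim_bits if is_image else 0)
    return header + reported

print(estimate_total_bits(11 * 8, 4, is_image=False))        # 105
print(estimate_total_bits(151 * 256 * 8, 3, is_image=True))  # 309310
```

Comparing the result with the cover's maximum capacity tells you whether the secret fits.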