I have the following three data URL images. All of them display the same image when opened as a URL, but two of them have PHP code or JavaScript code appended at the end.
How can I remove this malicious code from base64 data URL images that come from users I don't trust?
base64 dataURL image (safe):

base64 dataURL 2 image (PHP code injected):

base64 dataURL 3 image (JavaScript code injected):

You can see the embedded text by decoding the base64 with an online tool such as https://www.base64decode.org/
I allow users to upload an image to my server, and I convert that image to a base64 data URL.
As the three data URLs above show, they all render the same image, but their base64 text differs because of the text embedded in the image.
I am using Go on the backend to save the image. I use the following HTML to convert the image to a base64 data URL.
<input type='file' onchange="readURL(this);" />
<img id="blah" src="#" alt="your image" />
<script>
  function readURL(input) {
    if (input.files && input.files[0]) {
      var reader = new FileReader();
      reader.onload = function (e) {
        document.getElementById("blah").src = e.target.result;
      };
      reader.readAsDataURL(input.files[0]);
    }
  }
</script>
My concern is that text which does not belong in an image should not be there.
The data URLs above render the same image, yet their base64 differs because of the extra data inside.
I want to recover the actual image's base64 from the two malicious data URLs above.
Suppose user B uploads an image and I receive "base64 dataURL 3"; I want the base64 data URL of the original image instead.
How can this be done?
ImageMagick's convert -strip <in> <out>
will do it. It also removes other extraneous data (EXIF, embedded thumbnails, etc.), so make sure that behavior is what you want.
$ xxd img.jpg | tail -n 3
00000280: 647f ffd9 3c73 6372 6970 743e 616c 6572 d...<script>aler
00000290: 7428 2768 656c 6c6f 2729 3b3c 2f73 6372 t('hello');</scr
000002a0: 6970 743e 0a ipt>.
$ convert -strip img.jpg img2.jpg
$ xxd img2.jpg | tail -n 3
00000260: 383a 2ebd 4c00 32c8 1ba4 0064 6d3f 229f 8:..L.2....dm?".
00000270: 9001 90a7 e4c8 a1d3 eff9 0019 1800 0647 ...............G
00000280: ffd9
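In the first dump the script sits right after the JPEG end-of-image marker (ff d9), and in the second dump the file ends at that marker. A rough way to check for this from Go is sketched below. The function name trailingAfterEOI and the sample bytes are made up for illustration, and searching for the last ff d9 is only a heuristic (a payload could contain those bytes itself), so treat it as a diagnostic rather than a sanitizer.

```go
package main

import (
	"bytes"
	"fmt"
)

// trailingAfterEOI returns the bytes that follow the last JPEG
// end-of-image marker (0xFF 0xD9), or nil if no marker is found.
// Hypothetical helper: a heuristic check, not a full JPEG parser.
func trailingAfterEOI(data []byte) []byte {
	eoi := []byte{0xff, 0xd9}
	i := bytes.LastIndex(data, eoi)
	if i < 0 {
		return nil
	}
	return data[i+len(eoi):]
}

func main() {
	// Not a real JPEG: just start/end markers plus an appended script,
	// mimicking the tail of the first xxd dump.
	img := []byte{0xff, 0xd8, 0x64, 0x7f, 0xff, 0xd9}
	img = append(img, []byte("<script>alert('hello');</script>\n")...)

	extra := trailingAfterEOI(img)
	fmt.Printf("%d trailing bytes: %q\n", len(extra), extra)
}
```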
Regardless, if you never try to execute the images, the embedded code does nothing. At the very least, though, it is wasted space in your image files.
To do this from Go, use the Go ImageMagick bindings and call StripImage.