I'm trying to copy EXIF tags from one JPEG to another, which has no metadata. I tried to do what is described in this comment.
My idea is to copy everything from the tags source file until the first ffdb (excluded), then copy everything from the image source file (which has no tags) starting from the first ffdb (included). The resulting file is corrupt (missing SOS marker).
A full reproducer, including the suggestion by Luatic, is available at https://go.dev/play/p/9BLjuZk5qlr. Just run it in a directory containing a test.jpg file with tags.
This is the draft Go code to do so.
func copyExif(from, to string) error {
	if err := os.Rename(to, to+"~"); err != nil {
		return err
	}
	//defer os.Remove(to + "~")
	tagsSrc, err := os.Open(from)
	if err != nil {
		return err
	}
	defer tagsSrc.Close()
	imageSrc, err := os.Open(to + "~")
	if err != nil {
		return err
	}
	defer imageSrc.Close()
	dest, err := os.Create(to)
	if err != nil {
		return err
	}
	defer dest.Close()
	// copy from tagsSrc until ffdb, excluded
	buf := make([]byte, 1000000)
	n, err := tagsSrc.Read(buf)
	if err != nil {
		return err
	}
	x := 0
	for i := 0; i < n-1; i++ {
		if buf[i] == 0xff && buf[i+1] == 0xdb {
			x = i
			break
		}
	}
	_, err = dest.Write(buf[:x])
	if err != nil {
		return err
	}
	// skip ffd8 from imageSrc, then copy the rest (there are no tags here)
	skip := []byte{0, 0}
	_, err = imageSrc.Read(skip)
	if err != nil {
		return err
	}
	_, err = io.Copy(dest, imageSrc)
	if err != nil {
		return err
	}
	return nil
}
Checking the result files, the code seems to do what I described above.
[Screenshot: top left, the source for tags; bottom left, the source for image; right, the result.]
Does anybody know what I'm missing? Thank you.
This turns out to be more difficult than expected. I referred to this resource, which explains the general structure of a JPEG file as a stream of segments, the only exception being the "Entropy-Coded Segment" (ECS), which holds the actual image data.
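A minimal sketch of that segment structure, assuming the whole file is already in memory: apart from standalone markers such as SOI, EOI and RSTn, each segment is 0xFF, a marker byte, and a big-endian 2-byte length that counts itself but not the marker, so a parser can jump from segment to segment without looking inside them. The helper headerMarkers and the sample bytes are made up for illustration; it stops at SOS because the ECS that follows has no length field.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// headerMarkers (a hypothetical helper) walks an in-memory JPEG from SOI up to
// and including the first SOS segment and returns the marker byte of each segment.
func headerMarkers(data []byte) ([]byte, error) {
	if len(data) < 2 || data[0] != 0xFF || data[1] != 0xD8 {
		return nil, errors.New("missing SOI (FF D8)")
	}
	var markers []byte
	pos := 2
	for {
		if pos+4 > len(data) || data[pos] != 0xFF {
			return nil, fmt.Errorf("malformed segment header at offset %d", pos)
		}
		marker := data[pos+1]
		// Big-endian length: counts its own 2 bytes, but not the FF xx marker.
		length := int(binary.BigEndian.Uint16(data[pos+2 : pos+4]))
		if length < 2 || pos+2+length > len(data) {
			return nil, fmt.Errorf("invalid length %d at offset %d", length, pos)
		}
		markers = append(markers, marker)
		if marker == 0xDA {
			// SOS: the entropy-coded data that follows has no length field.
			return markers, nil
		}
		pos += 2 + length
	}
}

func main() {
	// Made-up bytes: SOI, APP1 ("Exif\0\0"), DQT, SOS, two bytes of scan data, EOI.
	data := []byte{
		0xFF, 0xD8,
		0xFF, 0xE1, 0x00, 0x08, 'E', 'x', 'i', 'f', 0x00, 0x00,
		0xFF, 0xDB, 0x00, 0x03, 0x00,
		0xFF, 0xDA, 0x00, 0x03, 0x01,
		0x12, 0x34,
		0xFF, 0xD9,
	}
	markers, err := headerMarkers(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("% X\n", markers) // E1 DB DA
}
```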
> My idea is to copy everything from the tags source file until the first ffdb (excluded), then copy everything from the image source file (which has no tags) starting from the first ffdb (included). The resulting file is corrupt (missing SOS marker).
This makes very strong assumptions about JPEG files that don't hold in general. First of all, ffdb can very well appear somewhere inside a segment: for example, the EXIF APP1 segment often embeds a JPEG thumbnail, which contains its own ffdb. The ordering of segments is also very loose, so you have no guarantee what comes before or after ffdb (the DQT segment, which defines the quantization tables). Even if it did somehow happen to work most of the time, it would still be a very brittle, unreliable solution.
The proper approach is to iterate over all the segments, copying only metadata segments from the file providing the metadata and only non-metadata segments from the file providing the image data.
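As a simplified, in-memory illustration of that idea, the sketch below assumes that all metadata segments of interest appear before the first SOS. That is the usual place for APPn segments, but the JPEG standard also allows such segments between scans, which the full program below handles and this sketch does not. Everything from the image file's first SOS onward is copied verbatim, which sidesteps parsing the entropy-coded data. splitHeader and mergeMetadata are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// splitHeader returns the complete segments (marker + length + payload) between
// SOI and the first SOS, plus the offset at which the SOS segment starts.
func splitHeader(data []byte) (segs [][]byte, sosOffset int, err error) {
	if len(data) < 2 || data[0] != 0xFF || data[1] != 0xD8 {
		return nil, 0, errors.New("missing SOI")
	}
	pos := 2
	for {
		if pos+4 > len(data) || data[pos] != 0xFF {
			return nil, 0, errors.New("malformed segment header")
		}
		if data[pos+1] == 0xDA {
			return segs, pos, nil
		}
		length := int(binary.BigEndian.Uint16(data[pos+2 : pos+4]))
		if length < 2 || pos+2+length > len(data) {
			return nil, 0, errors.New("invalid segment length")
		}
		segs = append(segs, data[pos:pos+2+length])
		pos += 2 + length
	}
}

// mergeMetadata builds SOI + metadata segments of metaJPEG + non-metadata
// header segments of imageJPEG + everything from imageJPEG's first SOS onward.
func mergeMetadata(metaJPEG, imageJPEG []byte, isMeta func(marker byte) bool) ([]byte, error) {
	metaSegs, _, err := splitHeader(metaJPEG)
	if err != nil {
		return nil, err
	}
	imageSegs, sosOffset, err := splitHeader(imageJPEG)
	if err != nil {
		return nil, err
	}
	out := []byte{0xFF, 0xD8}
	for _, s := range metaSegs {
		if isMeta(s[1]) {
			out = append(out, s...)
		}
	}
	for _, s := range imageSegs {
		if !isMeta(s[1]) {
			out = append(out, s...)
		}
	}
	return append(out, imageJPEG[sosOffset:]...), nil
}

func main() {
	isMeta := func(marker byte) bool { return marker == 0xE1 } // APP1 only
	// Made-up files: metaJPEG has APP1 + DQT, imageJPEG has only DQT.
	metaJPEG := []byte{0xFF, 0xD8, 0xFF, 0xE1, 0x00, 0x04, 0x01, 0x02, 0xFF, 0xDB, 0x00, 0x03, 0x09, 0xFF, 0xDA, 0x00, 0x02, 0xAA, 0xFF, 0xD9}
	imageJPEG := []byte{0xFF, 0xD8, 0xFF, 0xDB, 0x00, 0x03, 0x07, 0xFF, 0xDA, 0x00, 0x02, 0xBB, 0xFF, 0xD9}
	out, err := mergeMetadata(metaJPEG, imageJPEG, isMeta)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// FF D8 FF E1 00 04 01 02 FF DB 00 03 07 FF DA 00 02 BB FF D9
	fmt.Printf("% X\n", out)
}
```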
What complicates this is that the ECS does not follow the segment conventions: it has no length field, so its end can only be found by scanning. Thus after reading SOS (Start of Scan), we need to skip to the end of the ECS by finding the next segment tag: 0xFF followed by a byte that is neither data (0x00, which marks a literal 0xFF byte in the compressed data) nor a "restart marker" (0xD0-0xD7).

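The same scanning rule, sketched on a byte slice instead of a bufio.Reader. It assumes data starts right after the SOS header; ecsEnd is a hypothetical name, and the sample bytes are made up.

```go
package main

import "fmt"

// ecsEnd returns the index of the 0xFF that starts the first marker ending the
// entropy-coded segment: 0xFF followed by a byte that is neither 0x00 (a stuffed
// 0xFF data byte) nor 0xD0-0xD7 (a restart marker). Returns -1 if none is found.
func ecsEnd(data []byte) int {
	for i := 0; i+1 < len(data); i++ {
		if data[i] != 0xFF {
			continue
		}
		next := data[i+1]
		if next == 0x00 || (next >= 0xD0 && next <= 0xD7) {
			i++ // both bytes belong to the scan data; skip the second one too
			continue
		}
		return i
	}
	return -1
}

func main() {
	// Scan data containing a stuffed FF 00 and a restart marker FF D0, then EOI.
	data := []byte{0x12, 0xFF, 0x00, 0x34, 0xFF, 0xD0, 0x56, 0xFF, 0xD9}
	fmt.Println(ecsEnd(data)) // 7
}
```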
For testing, I used this image with EXIF metadata. My test command looked as follows:
cp exif.jpg exif_stripped.jpg && exiftool -All= exif_stripped.jpg && go run main.go exif.jpg exif_stripped.jpg
I used exiftool to strip the metadata, and then tested the Go program by re-adding it. Using exiftool exif_stripped.jpg (or an image viewer of your choice), I then viewed the metadata and compared it against the output of exiftool exif.jpg. (Side note: you could probably make this Go program obsolete entirely by using exiftool, e.g. exiftool -tagsFromFile exif.jpg exif_stripped.jpg.)
The program I wrote replaces EXIF metadata, comments, and copyright notices. I added a simple command-line interface for testing. If you want to copy only EXIF metadata, simply change the isMetaTagType function to:
func isMetaTagType(tagType byte) bool { return tagType == exif }
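package main

import (
	"bufio"
	"errors"
	"io"
	"os"
)

// Marker bytes (the byte following 0xFF)
const (
	soi       = 0xD8 // Start Of Image
	eoi       = 0xD9 // End Of Image
	sos       = 0xDA // Start Of Scan
	exif      = 0xE1 // APP1: EXIF (XMP also uses APP1)
	copyright = 0xEE // APP14 (Adobe also uses APP14 for color-transform info, so treating it as metadata may alter colors)
	comment   = 0xFE // COM
)

// isMetaTagType reports whether segments with this marker are treated as metadata.
func isMetaTagType(tagType byte) bool {
	// Adapt as needed
	return tagType == exif || tagType == copyright || tagType == comment
}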
// copySegments reads a JPEG stream from src (SOI through EOI) and writes the
// segments for which filterSegment returns true to dst. SOI and EOI themselves
// are not written.
func copySegments(dst *bufio.Writer, src *bufio.Reader, filterSegment func(tagType byte) bool) error {
	var buf [2]byte
	_, err := io.ReadFull(src, buf[:])
	if err != nil {
		return err
	}
	if buf != [2]byte{0xFF, soi} {
		return errors.New("expected SOI")
	}
	for {
		_, err := io.ReadFull(src, buf[:])
		if err != nil {
			return err
		}
		if buf[0] != 0xFF {
			return errors.New("invalid tag type")
		}
		if buf[1] == eoi {
			// Hacky way to check for EOF: nothing may follow EOI
			n, err := src.Read(buf[:1])
			if err != nil && err != io.EOF {
				return err
			}
			if n > 0 {
				return errors.New("EOF expected after EOI")
			}
			return nil
		}
		isSOS := buf[1] == sos
		filter := filterSegment(buf[1])
		if filter {
			if _, err = dst.Write(buf[:]); err != nil {
				return err
			}
		}
		// Read the big-endian length field
		if _, err = io.ReadFull(src, buf[:]); err != nil {
			return err
		}
		if filter {
			if _, err = dst.Write(buf[:]); err != nil {
				return err
			}
		}
		// Note: the length includes its own 2 bytes but not the tag, so subtract 2
		length := (uint16(buf[0]) << 8) | uint16(buf[1])
		if length < 2 {
			return errors.New("invalid segment length")
		}
		tagLength := length - 2
		if filter {
			_, err = io.CopyN(dst, src, int64(tagLength))
		} else {
			_, err = src.Discard(int(tagLength))
		}
		if err != nil {
			return err
		}
		if isSOS {
			// Skip (or copy) the ECS: advance to the next tag `FF xx` where xx is
			// neither 0x00 (stuffed data byte) nor 0xD0-0xD7 (restart marker)
			// See https://stackoverflow.com/questions/2467137/parsing-jpeg-file-format-format-of-entropy-coded-segments-ecs
			for {
				next, err := src.Peek(2)
				if err != nil {
					return err
				}
				if next[0] == 0xFF {
					data, rstMrk := next[1] == 0, next[1] >= 0xD0 && next[1] <= 0xD7
					if !data && !rstMrk {
						break
					}
				}
				if filter {
					if err = dst.WriteByte(next[0]); err != nil {
						return err
					}
				}
				if _, err = src.Discard(1); err != nil {
					return err
				}
			}
		}
	}
}
// copyMetadata writes a new JPEG to outImagePath made of the metadata segments
// of metadataImagePath followed by the non-metadata segments of imagePath.
func copyMetadata(outImagePath, imagePath, metadataImagePath string) error {
	outFile, err := os.Create(outImagePath)
	if err != nil {
		return err
	}
	defer outFile.Close()
	writer := bufio.NewWriter(outFile)

	imageFile, err := os.Open(imagePath)
	if err != nil {
		return err
	}
	defer imageFile.Close()
	imageReader := bufio.NewReader(imageFile)

	metaFile, err := os.Open(metadataImagePath)
	if err != nil {
		return err
	}
	defer metaFile.Close()
	metaReader := bufio.NewReader(metaFile)

	if _, err = writer.Write([]byte{0xFF, soi}); err != nil {
		return err
	}
	// Copy the metadata segments first: the EXIF specification places the
	// APP1 segment directly after SOI, before any image data
	if err = copySegments(writer, metaReader, isMetaTagType); err != nil {
		return err
	}
	// Copy all non-metadata segments, including the entropy-coded image data
	err = copySegments(writer, imageReader, func(tagType byte) bool {
		return !isMetaTagType(tagType)
	})
	if err != nil {
		return err
	}
	if _, err = writer.Write([]byte{0xFF, eoi}); err != nil {
		return err
	}
	// Flush the writer, otherwise the last buffered writes (including the EOI) won't get written!
	if err = writer.Flush(); err != nil {
		return err
	}
	return outFile.Close()
}
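// replaceMetadata rewrites toPath with the metadata of fromPath. The original
// is kept as toPath+"~" until the copy succeeds and restored if it fails.
func replaceMetadata(toPath, fromPath string) error {
	copyPath := toPath + "~"
	if err := os.Rename(toPath, copyPath); err != nil {
		return err
	}
	if err := copyMetadata(toPath, copyPath, fromPath); err != nil {
		// Put the original image back instead of deleting it
		if restoreErr := os.Rename(copyPath, toPath); restoreErr != nil {
			return errors.Join(err, restoreErr)
		}
		return err
	}
	return os.Remove(copyPath)
}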
func main() {
	if len(os.Args) < 3 {
		println("args: FROM TO")
		os.Exit(2)
	}
	err := replaceMetadata(os.Args[2], os.Args[1])
	if err != nil {
		println("replacing metadata failed: " + err.Error())
		os.Exit(1)
	}
}
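One caveat about copying only EXIF: marker 0xE1 is APP1, which XMP metadata also uses, so a marker-only check like isMetaTagType selects XMP packets too. Telling them apart requires looking at the identifier at the start of the payload ("Exif\0\0" for EXIF, "http://ns.adobe.com/xap/1.0/\0" for XMP). The sketch below shows only that check; app1Kind is a hypothetical helper, and using it in copySegments would mean reading the payload before deciding whether to write the marker and length, which the program above does not do.

```go
package main

import (
	"bytes"
	"fmt"
)

// Identifier strings at the start of an APP1 payload (after the length field).
var (
	exifID = []byte("Exif\x00\x00")
	xmpID  = []byte("http://ns.adobe.com/xap/1.0/\x00")
)

// app1Kind (hypothetical) classifies an APP1 payload by its identifier.
func app1Kind(payload []byte) string {
	switch {
	case bytes.HasPrefix(payload, exifID):
		return "exif"
	case bytes.HasPrefix(payload, xmpID):
		return "xmp"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(app1Kind([]byte("Exif\x00\x00MM\x00*")))                        // exif
	fmt.Println(app1Kind([]byte("http://ns.adobe.com/xap/1.0/\x00<x:xmpmeta"))) // xmp
	fmt.Println(app1Kind([]byte("something else")))                             // other
}
```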