Possible Duplicate:
ffmpeg: videos before and after conversion aren't the same length
Recently, I've been trying to use FFmpeg in an application that requires very accurate timing (millisecond resolution). Unfortunately, I was surprised to find that FFmpeg's manipulation functionality returns somewhat inaccurate results.
Here is the output of 'ffmpeg':
ffmpeg version 0.11.1 Copyright (c) 2000-2012 the FFmpeg developers
built on Jul 25 2012 19:55:05 with gcc 4.2.1 (Apple Inc. build 5664)
configuration: --enable-gpl --enable-shared --enable-pthreads --enable-libx264 --enable-libmp3lame
libavutil 51. 54.100 / 51. 54.100
libavcodec 54. 23.100 / 54. 23.100
libavformat 54. 6.100 / 54. 6.100
libavdevice 54. 0.100 / 54. 0.100
libavfilter 2. 77.100 / 2. 77.100
libswscale 2. 1.100 / 2. 1.100
libswresample 0. 15.100 / 0. 15.100
libpostproc 52. 0.100 / 52. 0.100
Now, let's assume I want to rip the audio track of 'foo.mov'. Here is the relevant output of 'ffmpeg -i foo.mov':
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'foo.mov':
Metadata:
major_brand : qt
minor_version : 0
compatible_brands: qt
creation_time : 2012-07-24 23:16:08
Duration: 00:00:40.38, start: 0.000000, bitrate: 805 kb/s
Stream #0:0(und): Video: h264 (Baseline) (avc1 / 0x31637661), yuv420p, 480x360, 733 kb/s, 24.46 fps, 29.97 tbr, 600 tbn, 1200 tbc
Metadata:
rotate : 90
creation_time : 2012-07-24 23:16:08
handler_name : Core Media Data Handler
Stream #0:1(und): Audio: aac (mp4a / 0x6134706D), 44100 Hz, mono, s16, 63 kb/s
Metadata:
creation_time : 2012-07-24 23:16:08
handler_name : Core Media Data Handler
As you probably noticed, the video file's duration is 00:00:40.38. Using the following command, I ripped its audio track:
'ffmpeg -i foo.mov foo.wav'
Output:
Output #0, wav, to 'foo.wav':
Metadata:
major_brand : qt
minor_version : 0
compatible_brands: qt
creation_time : 2012-07-24 23:16:08
encoder : Lavf54.6.100
Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s
Metadata:
creation_time : 2012-07-24 23:16:08
handler_name : Core Media Data Handler
Stream mapping:
Stream #0:1 -> #0:0 (aac -> pcm_s16le)
Press [q] to stop, [?] for help
size=3482kB time=00:00:40.42 bitrate= 705.6kbits/s
video:0kB audio:3482kB global headers:0kB muxing overhead 0.001290%
As you can see, the output file is longer than the input file.
Another example is audio (and video) trimming. Let's assume I would like to use FFmpeg to trim an audio file. I used the following command:
'ffmpeg -t 00:00:10.000 -i foo.wav trimmed_foo.wav -ss 00:00:25.000'
Output:
[wav @ 0x10180e800] max_analyze_duration 5000000 reached at 5015510
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'foo.wav':
Duration: 00:00:40.42, bitrate: 705 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s
Output #0, wav, to 'trimmed_foo.wav':
Metadata:
encoder : Lavf54.6.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le -> pcm_s16le)
Press [q] to stop, [?] for help
size=864kB time=00:00:10.03 bitrate= 705.6kbits/s
video:0kB audio:864kB global headers:0kB muxing overhead 0.005199%
Again, the output file is 30 milliseconds longer than I expected.
I tried for a long time to research the issue, without any success. When I use Audacity for the same task, it does it very accurately!
Does anyone have any idea how to solve this problem?
TL;DR: FFmpeg and your iOS device are the wrong tools for your needs.
There are a host of problems to cover, so in no particular order:
Neither FFmpeg nor the underlying codecs you're working with are designed for the sort of time resolution you want. 40 ms is one frame at 25 fps, which just isn't much in the context of most video and audio files. Hyper-accurate timing isn't a design feature of common audio codecs, like your source AAC data, and FFmpeg follows suit.
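To put rough numbers on that granularity (a sketch, assuming the 44100 Hz AAC audio and the 25 fps figure from above): an AAC frame holds 1024 samples, so cut points can only land about every 23 ms, and a video frame is coarser still.

```shell
# Back-of-the-envelope frame granularity:
# an AAC frame is 1024 samples, so at 44100 Hz a cut lands every ~23.2 ms;
# a video frame at 25 fps is 40 ms.
awk 'BEGIN { printf "AAC frame:   %.1f ms\n", 1024 / 44100 * 1000 }'
awk 'BEGIN { printf "video frame: %.0f ms\n", 1000 / 25 }'
```

That 23–40 ms quantum is about the size of the drift you're seeing, which is no coincidence.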
Don't do any transcoding! If you want to change the data as little as possible... don't change it. You can use ffmpeg -i in.mov -c:a copy out.m4a
to extract the audio stream exactly as-is instead of transcoding it to WAV.
Use FFprobe instead of FFmpeg to get file information. FFmpeg only prints cursory information about input and output files as a side effect of its logging; FFprobe is usually bundled with FFmpeg and is specifically designed to extract information in a convenient form. Use ffprobe -show_streams -show_format in.mov
to get information.
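Once you have that flat key=value output, individual fields are easy to pull out with standard tools. A sketch with canned FFprobe output (the duration value here is made up for illustration; normally you'd pipe ffprobe -show_format in.mov in instead):

```shell
# Canned ffprobe -show_format output, so the example is self-contained.
probe_output='[FORMAT]
filename=foo.mov
duration=40.376000
[/FORMAT]'

# Pull out just the duration field.
printf '%s\n' "$probe_output" | sed -n 's/^duration=//p'
```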
Increase your -analyzeduration
! You might've noticed the note about max_analyze_duration reached
in your output. Per the docs, that's how many microseconds of the file are actually read before FFmpeg estimates the total length. Again, for most purposes, knowing the length of a file to microsecond accuracy isn't feasible or desirable, and it is expensive. If you want hyper-accuracy, make sure that parameter is set much higher, probably longer than your actual input.
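Since the option takes microseconds, it's easy to get the units wrong. A sketch that composes the flag for a full minute of analysis (the 60 s figure is an arbitrary choice that comfortably covers your ~40 s file, not a recommendation):

```shell
# -analyzeduration is in microseconds: 60 s -> 60000000.
analyze_us=$(awk 'BEGIN { printf "%d", 60 * 1000000 }')
echo "ffmpeg -analyzeduration $analyze_us -i foo.mov ..."
```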
Be a bit more careful with your option placement. This is fairly minor, but I thought I should bring it up in case you're unaware. Many of FFmpeg's options behave differently depending on where they're given relative to the input and output, notably the -ss
that you're using. You have it after the input, which is where you want it, but you also have the output-only option -t
at the beginning, which is... weird. The more natural way to order that command would be:
ffmpeg -i foo.wav -ss 00:00:25.000 -t 00:00:10.000 trimmed_foo.wav
All the timing options accept input in seconds (including fractional seconds), so you don't have to prepend everything with 00:00:.
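So -ss 25 -t 10 would mean exactly the same thing. If you ever need to convert the long form back to plain seconds in a script, it's one awk call (a sketch):

```shell
# HH:MM:SS.mmm -> seconds
ts='00:00:25.000'
printf '%s\n' "$ts" | awk -F: '{ printf "%g\n", $1*3600 + $2*60 + $3 }'
```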
Distinguish between container length and actual stream length. I don't use Audacity, but I wouldn't be surprised if it showed extreme accuracy because it was lying to you about what it was doing. Actually trimming audio or video data with millisecond accuracy would require not merely choosing which frames from the input are included in the output (which is accurate to 40 ms at 25 fps!) but changing frame data to insert silence at the end. Far easier would be to just trim based on frame inclusion, then put the hyper-accurate length in the container file metadata. Some playback software might actually cut off based on that number, but again, most AV software just isn't designed for that level of accuracy. I would be curious to see what FFmpeg shows as the length of a file trimmed by Audacity.
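To make that concrete: padding out the ~30 ms you're off by, at 44100 Hz, means synthesizing on the order of a thousand individual samples of silence. That's sample-level surgery, not frame selection (rough arithmetic, using the numbers from the question):

```shell
# Samples of silence needed to pad 30 ms at 44100 Hz.
awk 'BEGIN { printf "%d samples\n", 0.030 * 44100 }'
```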
That's all that springs to mind for now, but I'm happy to give more feedback once you've had a chance to incorporate some of the above. My guess would be that this sort of accuracy is required for research purposes, in which case, happy researching!