When compressing a set of images with libx264, why does frame rate affect final output size?

I'm using ffmpeg to encode a set of images as a short timelapse video with the libx264 codec. On my first attempt, I encoded it at 30 FPS using:

ffmpeg -r 30 -pattern_type glob -i "*.jpg" -vcodec libx264 -crf 30 -pix_fmt yuv420p output.mp4

With 60 frames, that gives me a 163 KB file that's 2 seconds long. Then I realized I needed it to be slower, so I re-ran the same command, but changed -r to 2. Now I have a file that's 30 seconds long, but the size jumped to 891 KB! The video quality looks perceptually the same.

How do I encode at a slower frame rate, without the final file size ballooning?

Notes: Some theories I had, and things I checked. First, to make sure ffmpeg wasn't duplicating frames in the longer version, I checked the I/P/B counts. The 30 FPS file had:

[libx264 @ 0x7f9b26001c00] frame I:1     Avg QP:30.67  size: 44649
[libx264 @ 0x7f9b26001c00] frame P:15    Avg QP:31.19  size:  5471
[libx264 @ 0x7f9b26001c00] frame B:44    Avg QP:31.45  size:   767

The 2 FPS file had:

[libx264 @ 0x7fcd32842200] frame I:1     Avg QP:21.29  size: 90138
[libx264 @ 0x7fcd32842200] frame P:15    Avg QP:22.48  size: 33686
[libx264 @ 0x7fcd32842200] frame B:44    Avg QP:26.29  size:  6674

So, the I/P/B counts are identical, but the QP is much lower for the 2 FPS file. To offset that, I tried increasing -crf for the 2 FPS file until the size was about the same, but that just gave me a very blurry video (I had to go to crf=40). I tried messing with -minrate, -maxrate, and -bt; none helped. I'm guessing there is some x264 codec setting which is frame rate dependent, but I'm at a loss trying to figure out which one (from what I understand, constant bitrate is affected by frame rate but CRF should not be, but maybe I'm misunderstanding it).

Solution

The CRF mode aims to obtain and maintain a certain quality level in its encoded output. If the same set of frames is shown at 25 fps, each frame lasts 40 milliseconds and a viewer will not fully appreciate transient features, so encoders like x264/x265 compress those frames more aggressively. On the other hand, if shown at 2 fps, each frame is visible for half a second, so there's less leeway to skimp on preserving perceptual quality.
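The numbers in the question are consistent with this. As a rough check, here is a small shell sketch that multiplies the frame counts by the average frame sizes from the two libx264 logs and computes the per-frame display time. The variable names are only for illustration, and the logged sizes are rounded averages, so the totals are approximate:

```shell
# Frame counts and average frame sizes (bytes) copied from the libx264
# log lines in the question: 1 I-frame, 15 P-frames, 44 B-frames.
total_30fps=$(( 1*44649 + 15*5471 + 44*767 ))
total_2fps=$(( 1*90138 + 15*33686 + 44*6674 ))

# How long each frame stays on screen, in milliseconds.
ms_per_frame_25fps=$(( 1000 / 25 ))
ms_per_frame_2fps=$(( 1000 / 2 ))

echo "30 fps encode: $total_30fps bytes of frame data"
echo " 2 fps encode: $total_2fps bytes of frame data"
echo "frame duration: $ms_per_frame_25fps ms at 25 fps, $ms_per_frame_2fps ms at 2 fps"
```

The totals come out to 160462 and 889084 bytes, which match the reported 163 KB and 891 KB files once container overhead is added. So the growth comes entirely from larger (lower-QP) frames, not from extra frames.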

For x264, the commit that implements this frame-duration logic has the following message:

VFR/framerate-aware ratecontrol, part 2

MB-tree and qcomp complexity estimation now consider the duration of a frame in their calculations. This is very important for visual optimizations, as frames that last longer are inherently more important quality-wise. Improves VFR-aware PSNR as much as 1-2db on extreme test cases, ~0.5db on more ordinary VFR clips (e.g. deduped anime episodes).

WARNING: This change redefines x264's internal quality measurement. x264 will now scale its quality based on the framerate of the video due to the aforementioned frame duration logic. That is, --crf X will give lower quality per frame for a 60fps video than for a 30fps one. This will make --crf closer to constant perceptual quality than previously. The "center" for this change is 25fps: that is, videos lower than 25fps will go up in quality at the same CRF and videos above will go down. This choice is completely arbitrary.

Note that to take full advantage of this, x264 must encode your video at the correct framerate, with the correct timestamps.
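If the goal is the small file rather than the extra per-frame quality, one possible workaround is to encode at the original 30 fps and then stretch the timestamps with a stream copy, so x264 never sees the 2 fps rate. The sketch below is untested against the asker's images. It assumes ffmpeg's `-itsscale` input option (which rescales input timestamps) is acceptable for this purpose, and the names `ts_scale`, `encode_then_retime`, `fast.mp4` and `slow.mp4` are hypothetical:

```shell
# Hypothetical helper: timestamp scale factor needed to go from the
# encode frame rate ($1) to the desired playback frame rate ($2).
ts_scale() {
    awk -v enc="$1" -v play="$2" 'BEGIN { print enc / play }'
}

# Encode with the question's original 30 fps command, then copy the
# stream into a new file with every timestamp multiplied by 30/2 = 15.
# No re-encoding happens in the second step, so the frame data is unchanged.
encode_then_retime() {
    ffmpeg -r 30 -pattern_type glob -i "*.jpg" -vcodec libx264 -crf 30 -pix_fmt yuv420p fast.mp4 &&
    ffmpeg -itsscale "$(ts_scale 30 2)" -i fast.mp4 -c copy slow.mp4
}
```

Running `encode_then_retime` in the image directory should produce a `slow.mp4` that plays at 2 fps with roughly the size of the 30 fps encode. The trade-off is the one the commit message describes: each frame keeps the lower quality chosen for a 30 fps video, which may or may not be noticeable when it stays on screen for half a second.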