How to insert frames to compensate for frames lost during capture

My original clip is 22:47 long. I captured it to AVI with the Ut Video lossless codec at 29.97 fps, with 16-bit unsigned PCM audio, using VirtualDub with the VHScrCap driver. VirtualDub, MPC and PotPlayer all play the captured file visibly too fast; the audio pitch is right for the first 3-4 minutes but too high for the rest of the video. The captured file is 19:06 long (confirmed by MediaInfo), shorter than the original 22:47. The cause seems to be that more frames are lost when capturing large HD frames.
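As a rough sanity check on those durations, here is a minimal Python sketch. It assumes that the whole shortfall comes from frames that never made it into the file and that the source really ran at the nominal 29.97 fps; the helper name is only illustrative.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'MM:SS' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

FPS = 30000 / 1001  # nominal 29.97 fps (assumed to be the true source rate)

original = to_seconds("22:47")  # duration of the original clip
captured = to_seconds("19:06")  # duration reported for the captured file

missing_seconds = original - captured
missing_frames = round(missing_seconds * FPS)
kept_fraction = captured / original

print(f"missing: {missing_seconds} s, roughly {missing_frames} frames")
print(f"share of frames that made it into the file: {kept_fraction:.1%}")
```

Under those assumptions, about one frame in six is missing on average, and since the first minutes play correctly, the loss must be concentrated later in the capture.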

Regular encoding

Encoding captured clip to mp4:

ffmpeg -ss 3.25 -i input.avi -map 0:0 -map 0:1 -threads 0 -c:v libx264 -profile:v main \
-preset:v medium -level 3.1 -x264opts crf=26.0 -aspect 16:9 -t 1112.69 \
-y -f mp4 -vf "crop=1432:808:4:46, hqdn3d=1.5:1.5:6:6, \
scale=1216:684, pad=1280:720:32:18" -c:a ac3 -ac 2 -ar 48000 -b:a 160k \
output.mp4

The output is 18:32 long and the frame rate is still 29.97 fps. The audio pitch is fine for the first 2 minutes and far too high for the rest of the video.

Trying to correct

I tried to correct this in three steps: (1) encode a video stream slowed down to 23.976 fps and extract the audio as a WAV stream, (2) slow down the audio, which also lowers its pitch, and (3) remux video and audio.

(1) The video is slowed down and the audio is extracted as WAV:

ffmpeg -ss 3.25 -i input.avi -threads 0 \
-c:v libx264 -profile:v main -preset:v medium -level 3.1 -x264opts crf=26.0 \
-aspect 16:9 -t 1390.862 -an -y -f mp4 -r 24000/1001 \
-vf "crop=1432:808:4:46, hqdn3d=1.5:1.5:6:6, scale=1216:684, pad=1280:720:32:18, \
setpts=1.25*PTS" video_out.mp4  \
-t 1112.69 -y -vn -f wav  audio_out.wav

(2) The wav audio stream is then slowed down with lower pitch with sox:

sox --norm audio_out.wav audio_out-24.wav speed 0.8

(3) The two streams are then remuxed with:

ffmpeg -i video_out.mp4 -i audio_out-24.wav -map 0:0 -map 1:0 -c:v copy \
-c:a ac3 -ac 2 -af aresample=resampler=soxr -ar 48000 -b:a 160k \
final_output.mp4

This time, the video duration (23:10) is closer to the original, the pitch is OK for the whole video except for the first 2-3 minutes, where it is (predictably) too low.
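The arithmetic behind the three steps can be checked with a short Python sketch. It only uses numbers quoted in this post; the variable names are mine. It shows that the single 1.25 factor is larger than the average stretch the file needs, which fits a result that is slightly too long overall, and no constant factor can suit both the good first minutes and the damaged remainder.

```python
src_fps = 30000 / 1001  # 29.97 fps, the capture rate
dst_fps = 24000 / 1001  # 23.976 fps, the rate used for the corrected video

stretch = src_fps / dst_fps      # factor used in setpts (about 1.25)
audio_speed = 1 / stretch        # factor used in sox speed (about 0.8)
stretched = 1112.69 * stretch    # length in seconds after stretching the trimmed clip

original = 22 * 60 + 47          # 22:47 in seconds
captured = 19 * 60 + 6           # 19:06 in seconds
needed_average = original / captured  # average stretch the whole file would need

print(f"applied stretch {stretch:.3f}, audio speed {audio_speed:.3f}")
print(f"stretched duration {stretched:.2f} s")
print(f"average stretch actually needed {needed_average:.3f}")
```

A uniform stretch spreads the correction evenly, whereas the frames were lost unevenly, so the fix really has to be applied per frame rather than per file.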

My sense is that (1) the capture log and ffprobe give frame-by-frame information showing the 'instantaneous' real frame rate, and (2) ffmpeg does not use that information when encoding, although it presumably could be used to restore the correct frame rate by inserting duplicate or interpolated frames. I suspect I can get the information in (1), but I have no clue how to do (2).
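To make idea (2) concrete, here is a minimal Python sketch of timestamp-driven frame duplication. It assumes the real capture time of every stored frame can be recovered (from the capture log, for instance); that list is a hypothetical input, and the function name is made up for illustration. For each tick of a constant-rate timeline it picks the most recent captured frame, so a gap in the capture is filled with repeats of the last good frame.

```python
from bisect import bisect_right

def cfr_frame_map(capture_times, fps):
    """Map each tick of a constant-frame-rate timeline to a stored frame index.

    capture_times: sorted real capture times in seconds, one per stored
    frame (hypothetical input, e.g. parsed from a capture log).
    """
    if not capture_times:
        return []
    start, end = capture_times[0], capture_times[-1]
    n_out = int(round((end - start) * fps)) + 1
    mapping = []
    for n in range(n_out):
        t = start + n / fps
        # Last stored frame captured at or before t (epsilon absorbs rounding).
        i = bisect_right(capture_times, t + 1e-9) - 1
        mapping.append(max(i, 0))
    return mapping

# Toy example at 10 fps: the frames due at 0.2 s and 0.3 s were dropped.
times = [0.0, 0.1, 0.4, 0.5]
print(cfr_frame_map(times, 10))  # [0, 1, 1, 1, 2, 3]
```

With a mapping like this the video returns to its real timeline, so the audio would not need any speed or pitch change. Replacing the repeats with interpolated frames would be a refinement of the same mapping.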

If someone familiar with this type of issue could give me some advice and point me in the right direction, I would really appreciate it.

Solution

Well, if anyone is interested, here is where I stand.

I am not sure this is THE answer, but it is my answer for now. I found that trying to correct and improve a poorly captured video is not a good idea, so I now concentrate on avoiding frame loss during capture in the first place. An easy way to tell whether a capture is good is to watch the number of inserted frames against the total frames captured (I capture with VirtualDub, which displays both in real time); aim for zero inserted frames. Here is what I do:

Restart the computer to clear out old processes that would otherwise be running during the capture.
Look for any unnecessary processes in Windows Task Manager and kill them.
Experiment with the size of the frame you capture; what works depends on the processing power of your CPU. I found that I shouldn't try to capture 1920x1080 (on an Intel i7-3770K, which is probably above average), but I can do 1280x720.
I set my capture frame rate to 23.976 fps (the NTSC film rate), which is less demanding than 29.97 fps.
Select an encoder that is lossless and needs as little processing power as possible. I use UT Video Codec YUV420 for video and uncompressed PCM for audio. With those settings you need plenty of disk space for the capture: it can take 20 GB for an hour. (I do the compression separately, with a script that uses ffmpeg to encode that 20 GB+ video into a 500 MB file.)
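For anyone sizing their disk, here is a rough Python estimate of what those settings imply. It assumes 8-bit YUV 4:2:0 (1.5 bytes per pixel), decimal gigabytes, and ignores the audio track; the 20 GB per hour figure is simply the one observed above.

```python
width, height = 1280, 720
fps = 24000 / 1001        # 23.976 fps capture rate
bytes_per_pixel = 1.5     # 8-bit YUV 4:2:0: full-size luma plus quarter-size chroma planes

# Size of one hour of completely uncompressed video.
raw_bytes_per_hour = width * height * bytes_per_pixel * fps * 3600
raw_gb_per_hour = raw_bytes_per_hour / 1e9

observed_gb_per_hour = 20  # figure quoted in this post
implied_ratio = raw_gb_per_hour / observed_gb_per_hour
avg_mbit_per_s = observed_gb_per_hour * 1e9 * 8 / 3600 / 1e6

print(f"uncompressed: {raw_gb_per_hour:.1f} GB/hour")
print(f"implied lossless compression ratio: {implied_ratio:.1f}:1")
print(f"average write rate: {avg_mbit_per_s:.1f} Mbit/s")
```

The write rate is the number to compare against what the capture disk can sustain; if the disk cannot keep up, frames are likely to be dropped no matter how fast the CPU is.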

With those precautions, I can capture these videos with virtually no lost frames, and playback is smooth.

For further study: I have been wondering whether trading a lower frame rate for a higher resolution could be a good trade-off, for example capturing at 20 fps instead of 23.976 and then finding a way to add frames later in a way that does not jar the eye. (I assume that would be done with AviSynth's ConvertFPS() function rather than ffmpeg.) I have not experimented with this method yet.
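To illustrate what "adding frames" could mean, here is a small Python sketch of a plain linear blend between neighbouring source frames. It is only my own illustration of the idea and I do not claim it matches what ConvertFPS() does internally; the function name and the returned tuple layout are invented for this example.

```python
import math

def blend_plan(n_src, fps_in, fps_out):
    """Plan a linear-blend frame rate conversion.

    Returns one (i, j, w) tuple per output frame, meaning
    output = (1 - w) * src[i] + w * src[j].
    """
    n_out = math.floor((n_src - 1) * fps_out / fps_in) + 1
    plan = []
    for n in range(n_out):
        pos = n * fps_in / fps_out           # position on the source frame axis
        i = min(math.floor(pos), n_src - 1)  # source frame at or before pos
        j = min(i + 1, n_src - 1)            # next source frame, clamped at the end
        w = pos - i if j > i else 0.0        # weight of the later frame
        plan.append((i, j, round(w, 3)))
    return plan

# Six source frames captured at 20 fps, converted to 23.976 fps.
for row in blend_plan(n_src=6, fps_in=20, fps_out=24000 / 1001):
    print(row)
```

Forcing w to 0 everywhere turns this into plain frame duplication, which keeps each frame sharp but shows a small stutter at every repeat; blending smooths the motion at the cost of some ghosting.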