Specifying parameters to create videos for ffmpeg's concat demuxer (to avoid a large re-encode)

Thursday, 19 April 2018


ffmpeg can be used to concatenate files together:

If you have media files with exactly the same codec and codec parameters you can concatenate them [...]

(emphasis mine) My intention¹ is to produce media files with the same codec and parameters so that I can take advantage of concat without incurring a long re-encode.
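For reference, the concat step itself might be sketched in Python (the language of the cutting script mentioned in this post) as shown here. The helper names and file names are hypothetical. The list-file syntax and the `-f concat -safe 0 -i list.txt -c copy` options are the ones the concat demuxer documentation describes: each line is `file '...'`, and an embedded `'` is written as `'\''`.

```python
from pathlib import Path


def concat_list_entry(path: str) -> str:
    """Format one line of a concat-demuxer list file."""
    # An embedded ' is written as '\'' (close quote, escaped quote, reopen quote).
    return "file '" + path.replace("'", "'\\''") + "'"


def concat_list_text(paths: list[str]) -> str:
    """Return the full list-file contents, one clip per line, in join order."""
    return "".join(concat_list_entry(p) + "\n" for p in paths)


def write_concat_list(paths: list[str], list_file: str) -> None:
    Path(list_file).write_text(concat_list_text(paths), encoding="utf-8")


def concat_command(list_file: str, output: str) -> list[str]:
    # -f concat: read the input as a concat-demuxer list.
    # -safe 0:   allow absolute paths or unusual characters in the list.
    # -c copy:   stream copy, i.e. no re-encode.
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
            "-c", "copy", output]
```

After writing the list, the command could be run with `subprocess.run(concat_command("list.txt", "joined.mkv"), check=True)`. This only joins cleanly when every listed file really does share the same codec parameters.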

Preamble:

I have a file from which I would like to cut out and keep the useful parts. I have written a Python script that finds the keyframe nearest to each desired cut point and cuts there, since when doing a stream copy ffmpeg can only split on I-frames:

Using -ss as input option together with -c:v copy might not be accurate since ffmpeg is forced to only use/split on i-frames.
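The keyframe-snapping idea could be sketched in Python roughly as follows. This is not the author's script. It assumes the keyframe timestamps have already been listed one per line, for example with `ffprobe -v error -select_streams v:0 -skip_frame nokey -show_entries frame=pts_time -of csv=p=0 input.mkv` (older builds name the field `pkt_pts_time`). The helper names are hypothetical.

```python
import bisect


def parse_keyframe_times(ffprobe_output: str) -> list[float]:
    """Parse one timestamp per line; skips blanks and 'N/A' entries."""
    times = []
    for line in ffprobe_output.splitlines():
        field = line.strip().split(",")[0]  # csv output may carry extra fields
        if field and field != "N/A":
            times.append(float(field))
    return sorted(times)


def nearest_keyframe(keyframes: list[float], target: float) -> float:
    """Return the keyframe closest to target; ties go to the earlier one."""
    if not keyframes:
        raise ValueError("no keyframes found")
    i = bisect.bisect_left(keyframes, target)
    # Only the keyframes either side of the insertion point can be nearest.
    candidates = keyframes[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: (abs(t - target), t))


def stream_copy_cut(src: str, start: float, end: float, dst: str) -> list[str]:
    # -ss before -i seeks in the input; with -c copy the output can only
    # start on a keyframe, hence snapping `start` first. -t is the duration.
    return ["ffmpeg", "-ss", f"{start:.3f}", "-i", src,
            "-t", f"{end - start:.3f}", "-c", "copy", dst]
```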

As it happens, the splits aren't happening at exactly the right moment, but they are close enough for now that I can focus on another part of the problem. If I use the concat demuxer at this point, the different parts get joined together perfectly. So far, so good!

However, I would like smooth transitions between these segments, so I have split each segment further: its short ends can then be used to create crossfade transitions without re-encoding the entire set of files.

A basic diagram would probably help illustrate this:

  [111AAAA111BBBBB111111CCCCCCC1111DDDDD111]   | (original file)
     [AAAA] [BBBBB]    [CCCCCCC]  [DDDDD]      | (desired clips extracted)
[AAA] [A][B] [BBB] [B][C] [CCCCC] [C][D] [DDDD]| (split ends from clips)
      [AAA][ab][BBB][bc][CCCCC][cd][DDDD]      | (transitions between short ends)
            [AAAabBBBbcCCCCCcdDDDD]            | (intended output)
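The splitting scheme in the diagram can be expressed as simple interval arithmetic. The following sketch is one possible reading of it. It assumes every transition has the same fixed duration `fade` (in seconds) and that clips are given as hypothetical `(name, start, end)` tuples. The first clip keeps only a tail, the last clip keeps only a head, and each tail is paired with the following clip's head. For the bodies to be stream-copied, each body start would still need to land on a keyframe.

`crossfade_command` makes further assumptions. It requires a newer ffmpeg than this post, because the `xfade` filter arrived in 4.3, and it uses `acrossfade` for the audio. The `aac` audio codec is a guess at what the source uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    label: str
    start: float  # seconds in the original file
    end: float


def split_clip(name, start, end, fade, head=True, tail=True):
    """Split one clip into an optional head, a body and an optional tail.

    Heads and tails are `fade` seconds long and only they get re-encoded
    (as halves of a transition); the body is meant to be stream-copied.
    """
    body_start = start + fade if head else start
    body_end = end - fade if tail else end
    if body_start > body_end:
        raise ValueError(f"clip {name} is too short for its transitions")
    pieces = []
    if head:
        pieces.append(Piece(f"{name}-head", start, body_start))
    pieces.append(Piece(f"{name}-body", body_start, body_end))
    if tail:
        pieces.append(Piece(f"{name}-tail", body_end, end))
    return pieces


def plan(clips, fade):
    """Split every clip and pair each tail with the next clip's head."""
    last = len(clips) - 1
    split = [split_clip(name, s, e, fade, head=i > 0, tail=i < last)
             for i, (name, s, e) in enumerate(clips)]
    transitions = [(split[i][-1], split[i + 1][0]) for i in range(last)]
    return split, transitions


def crossfade_command(tail_file, head_file, fade, output):
    # Assumes FFmpeg >= 4.3 for xfade. Both inputs are `fade` seconds long,
    # so offset=0 fades across the whole pair and the result is `fade` long.
    graph = (f"[0:v][1:v]xfade=transition=fade:duration={fade}:offset=0[v];"
             f"[0:a][1:a]acrossfade=d={fade}[a]")
    # Codecs here are assumptions; they must match the stream-copied bodies.
    return ["ffmpeg", "-i", tail_file, "-i", head_file,
            "-filter_complex", graph, "-map", "[v]", "-map", "[a]",
            "-c:v", "libx264", "-c:a", "aac", output]
```

With clip A spanning 10 to 14 s and `fade=1.0`, for example, A splits into a body from 10 to 13 s and a tail from 13 to 14 s. This mirrors `[AAA][A]` in the diagram.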

Problem:

This is where I've gotten to. When I use ffmpeg's concat demuxer to join the clips above, I get significant video and audio artifacts on playback. My guess is that the codec parameters do not match, which is the prerequisite noted at the top of this question. Checking the video streams with ffprobe gives:

$ ffprobe -i ab-transition.mkv 2>&1 | grep 'Stream.*Video' ; ffprobe -i B.mkv 2>&1 | grep 'Stream.*Video'
Stream #0:0: Video: h264 (Main), yuv420p(tv, bt709/bt709/iec61966-2-1), 1280x720, SAR 1:1 DAR 16:9, 62.50 fps, 62.50 tbr, 1k tbn, 120 tbc (default)
Stream #0:0: Video: h264 (Main), yuv420p(tv, bt709/bt709/iec61966-2-1), 1280x720 [SAR 1:1 DAR 16:9], 62.50 fps, 62.50 tbr, 1k tbn, 125 tbc (default)

(I have omitted the audio stream output because the streams ostensibly have the same parameters, yet the audio is not joined correctly either.)

There are differences. I used the -show_streams option to get more detailed information, which is available at http://pastebin.com/4vcnDYtj (a single blank line separates the two outputs). Diffing the two outputs gives:

7c7
< codec_time_base=1/120
---
> codec_time_base=1/125
70,71c70,71
< start_pts=12
< start_time=0.012000
---
> start_pts=11
> start_time=0.011000
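Comparing `-show_streams` dumps by hand gets tedious once there are many clips. Here is a small Python sketch with hypothetical helper names. It parses ffprobe's default `[STREAM] ... [/STREAM]` key=value output and reports, stream by stream, the keys whose values differ. The optional `ignore` set is for fields expected to differ between any two clips, such as durations. Which fields belong there is a judgement call.

```python
def parse_show_streams(text: str) -> list[dict[str, str]]:
    """Parse `ffprobe -show_streams` default output into one dict per stream."""
    streams: list[dict[str, str]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[STREAM]":
            current = {}
        elif line == "[/STREAM]":
            if current is not None:
                streams.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return streams


def diff_streams(a: dict[str, str], b: dict[str, str], ignore=frozenset()):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted((a.keys() | b.keys()) - set(ignore))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def diff_files(streams_a, streams_b, ignore=frozenset()):
    """Compare streams pairwise by position; returns {position: differences}."""
    if len(streams_a) != len(streams_b):
        raise ValueError("files have different numbers of streams")
    result = {}
    for i, (a, b) in enumerate(zip(streams_a, streams_b)):
        differences = diff_streams(a, b, ignore)
        if differences:
            result[i] = differences
    return result
```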

Update:

I have found options and matched parameters for everything that I can see except the codec time base (tbc). Is there a setting which will allow me to set codec_time_base (tbc)? Setting -r has no effect.
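What follows is one possible reading of the tbc numbers, offered as an interpretation rather than a confirmed diagnosis. For H.264, ffprobe's tbc is commonly twice the frame rate signalled in the bitstream, because the codec counts two ticks (fields) per frame. Under that assumption, B.mkv's 125 tbc matches its 62.5 fps. The transition's 120 tbc, however, corresponds to 60 fps. That would hint that 60 fps timing information was written into the transition's bitstream even though the container reports 62.50 fps. The arithmetic, as a small Python sketch with hypothetical helpers and `ticks_per_frame=2` assumed:

```python
from fractions import Fraction

# Assumption: H.264 streams report two codec ticks per frame.
H264_TICKS_PER_FRAME = 2


def tbc_from_time_base(codec_time_base: str) -> Fraction:
    """'1/120' (ffprobe's codec_time_base) -> 120 ticks per second."""
    return 1 / Fraction(codec_time_base)


def fps_from_tbc(tbc, ticks_per_frame: int = H264_TICKS_PER_FRAME) -> Fraction:
    """Frame rate implied by a tbc value under the ticks-per-frame assumption."""
    return Fraction(tbc) / ticks_per_frame


def tbc_from_fps(fps, ticks_per_frame: int = H264_TICKS_PER_FRAME) -> Fraction:
    """tbc value expected for a given frame rate under the same assumption."""
    return Fraction(fps) * ticks_per_frame
```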

Update 2: Fearing this question was too niche for Super User, I asked it on the ffmpeg-user mailing list. Unfortunately, -time_base is not an appropriate encoder option in this case:

This is an option for FFmpeg-internal encoders that you try to use for an external encoder (x264).

More unfortunately still, when I asked about general feasibility, the reply was:

I don't think this is possible.

I have asked for clarification, and about the possibilities offered by the original encoding software (in this case OBS). OBS is potentially less flexible in its options than ffmpeg because it has to match the format specifications of the live-stream consumer (Twitch). I have yet to receive a reply from the mailing list, but I have asked in the OBS forums as well.

More crucially: will controlling for these parameters allow me to use ffmpeg's concat demuxer to join these clips without a long encode? Many thanks in advance.

(I realise this is a wall-of-text-and-a-half, so suggestions for additions, subtractions or clarifications are of course welcome. I would link to more official information, but with fewer than 10 reputation I cannot include more than 2 links!)

1: For more context, see my related question: How to efficiently and automatedly join video clips using short transitions?
