Tuesday, 29 August 2017

mp4 - Best settings for FFmpeg with NVENC


I'm using FFmpeg with GPU support (NVENC) to convert files from my satellite receiver (SD, MPEG-2 .ts files) into H.264 .mp4 files.


Here is the command I'm using:


ffmpeg -i "e:\input.ts" -vcodec h264_nvenc -preset slow -level 4.1 -qmin 10 -qmax 52 "e:\output.mp4"

But the quality is not as good as expected, and the full power of my system is not being used:




Only 11% GPU and 30% CPU usage.


Question: What improvements can I make to get better quality at the same file size, and to use more of the computing power of my GeForce GTX 1080?


I found a few parameters from 林正浩 that I could change, but -preset slow should already be the best-quality setting, right?



Answer



Here is a rough guide to tuning the encoder:


We'll start from the basics. It would be a mistake to jump to the conclusion that a quick barrage of options will suddenly improve the output without first understanding the desired objectives and expectations:


1. Start by understanding the encoder's options.


For NVENC-based encoders, start by learning the options each encoder accepts. (Note that I'm on Linux, which is why I pipe the output to xclip to copy the codec options to the clipboard before pasting them here.)


(a). For the H.264 encoder:


ffmpeg -hide_banner -h encoder=h264_nvenc | xclip -sel clip

Output:


Encoder h264_nvenc [NVIDIA NVENC H.264 encoder]:
General capabilities: delay
Threading capabilities: none
Supported pixel formats: yuv420p nv12 p010le yuv444p yuv444p16le bgr0 rgb0 cuda
h264_nvenc AVOptions:
-preset E..V.... Set the encoding preset (from 0 to 11) (default medium)
default E..V....
slow E..V.... hq 2 passes
medium E..V.... hq 1 pass
fast E..V.... hp 1 pass
hp E..V....
hq E..V....
bd E..V....
ll E..V.... low latency
llhq E..V.... low latency hq
llhp E..V.... low latency hp
lossless E..V....
losslesshp E..V....
-profile E..V.... Set the encoding profile (from 0 to 3) (default main)
baseline E..V....
main E..V....
high E..V....
high444p E..V....
-level E..V.... Set the encoding level restriction (from 0 to 51) (default auto)
auto E..V....
1 E..V....
1.0 E..V....
1b E..V....
1.0b E..V....
1.1 E..V....
1.2 E..V....
1.3 E..V....
2 E..V....
2.0 E..V....
2.1 E..V....
2.2 E..V....
3 E..V....
3.0 E..V....
3.1 E..V....
3.2 E..V....
4 E..V....
4.0 E..V....
4.1 E..V....
4.2 E..V....
5 E..V....
5.0 E..V....
5.1 E..V....
-rc E..V.... Override the preset rate-control (from -1 to INT_MAX) (default -1)
constqp E..V.... Constant QP mode
vbr E..V.... Variable bitrate mode
cbr E..V.... Constant bitrate mode
vbr_minqp E..V.... Variable bitrate mode with MinQP (deprecated)
ll_2pass_quality E..V.... Multi-pass optimized for image quality (deprecated)
ll_2pass_size E..V.... Multi-pass optimized for constant frame size (deprecated)
vbr_2pass E..V.... Multi-pass variable bitrate mode (deprecated)
cbr_ld_hq E..V.... Constant bitrate low delay high quality mode
cbr_hq E..V.... Constant bitrate high quality mode
vbr_hq E..V.... Variable bitrate high quality mode
-rc-lookahead E..V.... Number of frames to look ahead for rate-control (from 0 to INT_MAX) (default 0)
-surfaces E..V.... Number of concurrent surfaces (from 0 to 64) (default 0)
-cbr E..V.... Use cbr encoding mode (default false)
-2pass E..V.... Use 2pass encoding mode (default auto)
-gpu E..V.... Selects which NVENC capable GPU to use. First GPU is 0, second is 1, and so on. (from -2 to INT_MAX) (default any)
any E..V.... Pick the first device available
list E..V.... List the available devices
-delay E..V.... Delay frame output by the given amount of frames (from 0 to INT_MAX) (default INT_MAX)
-no-scenecut E..V.... When lookahead is enabled, set this to 1 to disable adaptive I-frame insertion at scene cuts (default false)
-forced-idr E..V.... If forcing keyframes, force them as IDR frames. (default false)
-b_adapt E..V.... When lookahead is enabled, set this to 0 to disable adaptive B-frame decision (default true)
-spatial-aq E..V.... set to 1 to enable Spatial AQ (default false)
-temporal-aq E..V.... set to 1 to enable Temporal AQ (default false)
-zerolatency E..V.... Set 1 to indicate zero latency operation (no reordering delay) (default false)
-nonref_p E..V.... Set this to 1 to enable automatic insertion of non-reference P-frames (default false)
-strict_gop E..V.... Set 1 to minimize GOP-to-GOP rate fluctuations (default false)
-aq-strength E..V.... When Spatial AQ is enabled, this field is used to specify AQ strength. AQ strength scale is from 1 (low) - 15 (aggressive) (from 1 to 15) (default 8)
-cq E..V.... Set target quality level (0 to 51, 0 means automatic) for constant quality mode in VBR rate control (from 0 to 51) (default 0)
-aud E..V.... Use access unit delimiters (default false)
-bluray-compat E..V.... Bluray compatibility workarounds (default false)
-init_qpP E..V.... Initial QP value for P frame (from -1 to 51) (default -1)
-init_qpB E..V.... Initial QP value for B frame (from -1 to 51) (default -1)
-init_qpI E..V.... Initial QP value for I frame (from -1 to 51) (default -1)
-qp E..V.... Constant quantization parameter rate control method (from -1 to 51) (default -1)
-weighted_pred E..V.... Set 1 to enable weighted prediction (from 0 to 1) (default 0)
-coder E..V.... Coder type (from -1 to 2) (default default)
default E..V....
auto E..V....
cabac E..V....
cavlc E..V....
ac E..V....
vlc E..V....

(b). For the HEVC/H.265 encoder:


ffmpeg -hide_banner -h encoder=hevc_nvenc | xclip -sel clip

Output:


Encoder hevc_nvenc [NVIDIA NVENC hevc encoder]:
General capabilities: delay
Threading capabilities: none
Supported pixel formats: yuv420p nv12 p010le yuv444p yuv444p16le bgr0 rgb0 cuda
hevc_nvenc AVOptions:
-preset E..V.... Set the encoding preset (from 0 to 11) (default medium)
default E..V....
slow E..V.... hq 2 passes
medium E..V.... hq 1 pass
fast E..V.... hp 1 pass
hp E..V....
hq E..V....
bd E..V....
ll E..V.... low latency
llhq E..V.... low latency hq
llhp E..V.... low latency hp
lossless E..V.... lossless
losslesshp E..V.... lossless hp
-profile E..V.... Set the encoding profile (from 0 to 4) (default main)
main E..V....
main10 E..V....
rext E..V....
-level E..V.... Set the encoding level restriction (from 0 to 186) (default auto)
auto E..V....
1 E..V....
1.0 E..V....
2 E..V....
2.0 E..V....
2.1 E..V....
3 E..V....
3.0 E..V....
3.1 E..V....
4 E..V....
4.0 E..V....
4.1 E..V....
5 E..V....
5.0 E..V....
5.1 E..V....
5.2 E..V....
6 E..V....
6.0 E..V....
6.1 E..V....
6.2 E..V....
-tier E..V.... Set the encoding tier (from 0 to 1) (default main)
main E..V....
high E..V....
-rc E..V.... Override the preset rate-control (from -1 to INT_MAX) (default -1)
constqp E..V.... Constant QP mode
vbr E..V.... Variable bitrate mode
cbr E..V.... Constant bitrate mode
vbr_minqp E..V.... Variable bitrate mode with MinQP (deprecated)
ll_2pass_quality E..V.... Multi-pass optimized for image quality (deprecated)
ll_2pass_size E..V.... Multi-pass optimized for constant frame size (deprecated)
vbr_2pass E..V.... Multi-pass variable bitrate mode (deprecated)
cbr_ld_hq E..V.... Constant bitrate low delay high quality mode
cbr_hq E..V.... Constant bitrate high quality mode
vbr_hq E..V.... Variable bitrate high quality mode
-rc-lookahead E..V.... Number of frames to look ahead for rate-control (from 0 to INT_MAX) (default 0)
-surfaces E..V.... Number of concurrent surfaces (from 0 to 64) (default 0)
-cbr E..V.... Use cbr encoding mode (default false)
-2pass E..V.... Use 2pass encoding mode (default auto)
-gpu E..V.... Selects which NVENC capable GPU to use. First GPU is 0, second is 1, and so on. (from -2 to INT_MAX) (default any)
any E..V.... Pick the first device available
list E..V.... List the available devices
-delay E..V.... Delay frame output by the given amount of frames (from 0 to INT_MAX) (default INT_MAX)
-no-scenecut E..V.... When lookahead is enabled, set this to 1 to disable adaptive I-frame insertion at scene cuts (default false)
-forced-idr E..V.... If forcing keyframes, force them as IDR frames. (default false)
-spatial_aq E..V.... set to 1 to enable Spatial AQ (default false)
-temporal_aq E..V.... set to 1 to enable Temporal AQ (default false)
-zerolatency E..V.... Set 1 to indicate zero latency operation (no reordering delay) (default false)
-nonref_p E..V.... Set this to 1 to enable automatic insertion of non-reference P-frames (default false)
-strict_gop E..V.... Set 1 to minimize GOP-to-GOP rate fluctuations (default false)
-aq-strength E..V.... When Spatial AQ is enabled, this field is used to specify AQ strength. AQ strength scale is from 1 (low) - 15 (aggressive) (from 1 to 15) (default 8)
-cq E..V.... Set target quality level (0 to 51, 0 means automatic) for constant quality mode in VBR rate control (from 0 to 51) (default 0)
-aud E..V.... Use access unit delimiters (default false)
-bluray-compat E..V.... Bluray compatibility workarounds (default false)
-init_qpP E..V.... Initial QP value for P frame (from -1 to 51) (default -1)
-init_qpB E..V.... Initial QP value for B frame (from -1 to 51) (default -1)
-init_qpI E..V.... Initial QP value for I frame (from -1 to 51) (default -1)
-qp E..V.... Constant quantization parameter rate control method (from -1 to 51) (default -1)
-weighted_pred E..V.... Set 1 to enable weighted prediction (from 0 to 1) (default 0)
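The two listings above spell some options differently. For example, h264_nvenc shows -spatial-aq while hevc_nvenc shows -spatial_aq. Option sets also change between FFmpeg builds. Rather than trusting a pasted listing, you can query your own build.

Below is a minimal sketch that extracts option names from the help text. It assumes an ffmpeg binary on PATH, and the helper names encoder_help and option_names are hypothetical.

```python
from __future__ import annotations

import re
import subprocess

# Option lines start with "-name" (flush left in the paste above, indented in a
# real terminal); value lines such as "slow E..V...." have no leading dash.
_OPTION_RE = re.compile(r"^\s*-([A-Za-z0-9_-]+)\s")


def encoder_help(encoder: str) -> str:
    """Return the help text of `ffmpeg -hide_banner -h encoder=<encoder>`.

    Needs an ffmpeg binary on PATH. stdout and stderr are concatenated so the
    sketch does not depend on which stream a given build prints to.
    """
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-h", f"encoder={encoder}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout + result.stderr


def option_names(help_text: str) -> set[str]:
    """Collect the option names (without the leading '-') from help text."""
    names: set[str] = set()
    for line in help_text.splitlines():
        match = _OPTION_RE.match(line)
        if match:
            names.add(match.group(1))
    return names
```

For instance, checking whether "spatial-aq" is in option_names(encoder_help("h264_nvenc")) tells you whether your build accepts the hyphenated spelling before you put it in a script.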

2. Understand the hardware's limitations, and stick to sane defaults first before applying options:


Refer to this answer for the hardware limitations you'll run into with NVENC, especially for HEVC encodes on Pascal.


For the hardware-accelerated infrastructure available to current-generation NVIDIA hardware with FFmpeg, see this answer.


Then, using that information, proceed to the next step.


3. Syntax is critical:


Here is the order in which you have to pass arguments to FFmpeg:


(a). Call up the binary.


(b). Pass any arguments meant for FFmpeg itself (such as -loglevel) before declaring inputs.


(c). If you're using hardware-accelerated decoding, such as cuvid, declare it here along with any specific arguments it requires. Hardware decoders have specific constraints, such as expected input resolutions and supported codecs. In production, determine and validate the need for a hardware-accelerated decoder beforehand, because a failure at this stage results in a failed encode and is unrecoverable. In fact, the mpv developers have said this repeatedly: don't rely on hardware-accelerated decoding for mission-critical content delivery.


(d). Declare your inputs. For streams, use the URL and prepend any extra flags (such as buffer sizes) as needed. For local resources (on an accessible filesystem), use the absolute file path.


(e). Optionally, insert a filter. This is needed for operations such as resizing, pixel format conversion, deinterlacing, etc. Depending on the filter in use, a hardware-based decoder (as described in step (c)) will introduce constraints that your filter must be able to handle, or else your encode will fail.


(f). Call up the appropriate video and audio encoders and pass the necessary arguments to them, such as mappings, bitrates, encoder presets, etc. For bitrates, make sure your desired values are set via the -b:v, -maxrate:v and -bufsize:v options. Do not leave these blank. This is a good starting point on why these values matter. As always, start by specifying a preset. See the notes at the end of this answer on the performance impact of presets with this particular encoder.


(g). Although FFmpeg can deduce the required output format from the extension of the output file, it is recommended to declare the output format explicitly (through the -f option). That way, extra options can be passed to the underlying muxer if needed, as is often the case with streaming formats such as HLS, MPEG-TS and DASH.


(h). The absolute path to the output file.
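If you drive FFmpeg from a script, the ordering rules (a) to (h) can be encoded once so that arguments never land in the wrong slot.

Below is a minimal sketch. The function build_ffmpeg_argv and its parameter names are hypothetical. The option values are copied from the MPEG-2 example in this answer, and -preset slow is taken from the question.

```python
from __future__ import annotations


def build_ffmpeg_argv(
    global_opts: list[str],
    decoder_opts: list[str],
    input_path: str,
    filter_graph: str | None,
    encoder_opts: list[str],
    output_format: str,
    output_path: str,
    binary: str = "ffmpeg",
) -> list[str]:
    """Assemble an ffmpeg argument list in the order (a) to (h)."""
    argv = [binary]                           # (a) the binary
    argv += global_opts                       # (b) options such as -loglevel
    argv += decoder_opts                      # (c) hardware decoder, before its input
    argv += ["-i", input_path]                # (d) the input
    if filter_graph:                          # (e) optional filter chain
        argv += ["-filter:v", filter_graph]
    argv += encoder_opts                      # (f) encoders, bitrates, presets
    argv += ["-f", output_format]             # (g) explicit output format (muxer)
    argv.append(output_path)                  # (h) the output file, always last
    return argv


argv = build_ffmpeg_argv(
    global_opts=["-loglevel", "debug"],
    decoder_opts=["-hwaccel", "cuvid", "-c:v", "mpeg2_cuvid"],
    input_path=r"e:\input.ts",
    filter_graph="hwupload_cuda,scale_npp=w=1920:h=1080:interp_algo=lanczos",
    encoder_opts=["-c:v", "h264_nvenc", "-preset", "slow",
                  "-b:v", "4M", "-maxrate:v", "5M", "-bufsize:v", "8M"],
    output_format="mp4",
    output_path=r"e:\output.mp4",
)
print(" ".join(argv))
```

Passing the list to subprocess.run(argv, check=True) runs FFmpeg without a shell. Neither \ continuations nor Windows quoting rules come into play.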


With your example above, quoted as:


ffmpeg -i "e:\input.ts" -vcodec h264_nvenc -preset slow -level 4.1 -qmin 10 -qmax 52 "e:\output.mp4"

You can raise the output quality in several ways:
- Specify proper bitrates (through the -b:v, -maxrate:v and -bufsize:v settings).
- Enable adaptive quantization. Spatial and temporal AQ are both supported, but only one can be used at a time.
- Optionally, and separately, enable weighted prediction (which disables B-frame support).
- If needed, add a filter for a proper downscale and resize.

The example below shows a snippet handling MPEG-TS input encoded in MPEG-2:


ffmpeg -loglevel debug -threads 4 -hwaccel cuvid -c:v mpeg2_cuvid -i "e:\input.ts" \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:interp_algo=lanczos \
-c:v h264_nvenc -b:v 4M -maxrate:v 5M -bufsize:v 8M -profile:v main \
-level:v 4.1 -rc:v vbr_hq -rc-lookahead:v 32 \
-spatial-aq:v 1 -aq-strength:v 15 -coder:v cabac \
-f mp4 "e:\output.mp4"

Warning: weighted prediction (-weighted_pred) cannot be enabled at the same time as adaptive quantization. Attempting to do so will result in an encoder initialization failure.


The snippet above assumes that the input file is an MPEG-2 stream. If that's not the case, analyze the file first and then switch to the matching CUVID decoder:


ffprobe -i e:\input.ts

If it's H.264/AVC, modify the snippet as shown below:


ffmpeg -loglevel debug -threads 4 -hwaccel cuvid -c:v h264_cuvid -i "e:\input.ts" \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:interp_algo=lanczos \
-c:v h264_nvenc -b:v 4M -maxrate:v 5M -bufsize:v 8M -profile:v main \
-level:v 4.1 -rc:v vbr_hq -rc-lookahead:v 32 -spatial-aq:v 1 \
-aq-strength:v 15 -coder:v cabac \
-f mp4 "e:\output.mp4"
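Picking the decoder by eye from ffprobe's output works for one file. For a batch from a satellite receiver, the choice can be scripted.

Below is a minimal sketch, assuming ffprobe is on PATH. The helper names and the CUVID_DECODERS table are hypothetical. The table lists CUVID decoder names I am confident FFmpeg provides, but check ffmpeg -decoders for what your build actually includes.

```python
from __future__ import annotations

import subprocess

# ffprobe codec_name -> FFmpeg CUVID decoder (verify against `ffmpeg -decoders`).
CUVID_DECODERS = {
    "mpeg1video": "mpeg1_cuvid",
    "mpeg2video": "mpeg2_cuvid",
    "mpeg4": "mpeg4_cuvid",
    "h264": "h264_cuvid",
    "hevc": "hevc_cuvid",
    "vc1": "vc1_cuvid",
    "vp8": "vp8_cuvid",
    "vp9": "vp9_cuvid",
}


def probe_video_codec(path: str) -> str:
    """Return the codec_name of the first video stream (needs ffprobe on PATH)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def cuvid_decoder_for(codec_name: str) -> str | None:
    """Map a codec_name to a CUVID decoder, or None for software decoding."""
    return CUVID_DECODERS.get(codec_name)
```

A None result is the signal to drop -hwaccel cuvid and the -c:v ..._cuvid pair and let FFmpeg decode in software. This is in line with the advice above not to depend on hardware decoding for content you cannot afford to lose.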

I have noticed that enabling either the adaptive quantization OR the weighted prediction options for NVENC may introduce stability issues, particularly with certain device and driver combinations. Where possible, consider using B-frames (no more than 3) combined with the generic option -refs:v set to 16 or thereabouts, instead of toggling on AQ or weighted prediction:


ffmpeg -loglevel debug -threads 4 -hwaccel cuvid -c:v h264_cuvid -i "e:\input.ts" \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:interp_algo=lanczos \
-c:v h264_nvenc -b:v 4M -maxrate:v 5M -bufsize:v 8M -profile:v main \
-level:v 4.1 -rc:v vbr_hq -rc-lookahead:v 32 -refs:v 16 \
-bf:v 3 -coder:v cabac \
-f mp4 "e:\output.mp4"
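Because several of these toggles exclude each other, a script that builds commands can reject bad combinations before the encoder fails to initialize.

Below is a minimal sketch. The class NvencTuning and the functions conflicts and tuning_args are hypothetical. The three rules encode the constraints stated in this answer: one AQ method at a time, no weighted prediction with AQ, and weighted prediction disabling B-frames. They have not been checked against NVIDIA's documentation. The option spellings follow the h264_nvenc listing earlier in this answer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NvencTuning:
    """Hypothetical container for the quality toggles discussed above."""
    spatial_aq: bool = False
    temporal_aq: bool = False
    weighted_pred: bool = False
    b_frames: int = 0


def conflicts(t: NvencTuning) -> list[str]:
    """List the combinations this answer warns against (empty list if none)."""
    found = []
    if t.spatial_aq and t.temporal_aq:
        found.append("only one AQ method (spatial or temporal) can be used at a time")
    if t.weighted_pred and (t.spatial_aq or t.temporal_aq):
        found.append("weighted prediction cannot be combined with adaptive quantization")
    if t.weighted_pred and t.b_frames > 0:
        found.append("weighted prediction disables B-frame support")
    return found


def tuning_args(t: NvencTuning) -> list[str]:
    """Translate the toggles into h264_nvenc options, refusing known conflicts."""
    problems = conflicts(t)
    if problems:
        raise ValueError("; ".join(problems))
    args: list[str] = []
    if t.spatial_aq:
        args += ["-spatial-aq:v", "1"]
    if t.temporal_aq:
        args += ["-temporal-aq:v", "1"]
    if t.weighted_pred:
        args += ["-weighted_pred:v", "1"]
    if t.b_frames:
        args += ["-bf:v", str(t.b_frames)]
    return args
```

NvencTuning(b_frames=3) corresponds to the B-frame route recommended just above. NvencTuning(spatial_aq=True) corresponds to the AQ route.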

With Turing in particular, you may also benefit from using B-frames as references, as shown below (note the -b_ref_mode:v middle toggle):


ffmpeg -loglevel debug -threads 4 -hwaccel cuvid -c:v h264_cuvid -i "e:\input.ts" \
-filter:v hwupload_cuda,scale_npp=w=1920:h=1080:interp_algo=lanczos \
-c:v h264_nvenc -b:v 4M -maxrate:v 5M -bufsize:v 8M -profile:v main \
-level:v 4.1 -rc:v vbr_hq -rc-lookahead:v 32 -refs:v 16 \
-bf:v 3 -coder:v cabac -b_ref_mode:v middle \
-f mp4 "e:\output.mp4"
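All of the commands above use -b:v 4M -maxrate:v 5M -bufsize:v 8M. One common way to sanity-check such a triple is to ask two questions. How many seconds of peak-rate video does the buffer hold (bufsize / maxrate)? How far above the average may the bitrate spike (maxrate / b:v)?

Below is a minimal sketch of that arithmetic. The helper names are hypothetical, and it only understands the plain k/M/G suffixes.

```python
from __future__ import annotations

# Only the plain decimal suffixes are handled here (FFmpeg also accepts others).
_SUFFIXES = {"k": 1e3, "K": 1e3, "M": 1e6, "G": 1e9}


def parse_rate(value: str) -> float:
    """Convert "4M" / "500k" / "4000000" into bits per second."""
    if value and value[-1] in _SUFFIXES:
        return float(value[:-1]) * _SUFFIXES[value[-1]]
    return float(value)


def vbv_summary(b_v: str, maxrate: str, bufsize: str) -> dict[str, float]:
    """Relate -b:v, -maxrate:v and -bufsize:v to each other."""
    average = parse_rate(b_v)
    peak = parse_rate(maxrate)
    buffer_bits = parse_rate(bufsize)
    if peak < average:
        raise ValueError("-maxrate:v should not be lower than -b:v")
    return {
        # Seconds of video the buffer can hold when the encoder runs at maxrate.
        "buffer_seconds_at_maxrate": buffer_bits / peak,
        # How far above the average the bitrate is allowed to spike.
        "peak_to_average": peak / average,
    }


print(vbv_summary("4M", "5M", "8M"))
```

For the values used above, the buffer holds 8 / 5 = 1.6 s at the peak rate, and the peak is 1.25 times the average. Whether those numbers suit your delivery target is a judgement call, not something the sketch decides.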

An extra note on thread counts (passed to ffmpeg via the -threads option):


Beyond a certain threshold, more encoder threads increase latency and the encoder's memory footprint. Because of the increased encode delay, quality degradation from higher thread counts is more pronounced in constant-bitrate mode and in the near-constant-bitrate mode known as VBV (video buffering verifier). Keyframes need more data than other frame types to avoid pulsing, poor-quality keyframes.


Zero-delay (sliced-thread) mode has no delay, but it further worsens multi-threaded quality in supported encoders.


It's therefore wise to limit thread counts on encodes where latency matters, as in the long term these drawbacks outweigh the perceived increase in encoder throughput.


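Since you're on Windows, remove the \ line continuations from the commands above and put each command on a single line. Alternatively, replace each trailing \ with ^ to keep the line breaks in cmd.exe. I wrote and tested these commands on a Unix box.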
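If you copy several of these commands, joining the lines by hand gets tedious.

Below is a minimal sketch, using a hypothetical helper named to_single_line, that collapses backslash-newline continuations into spaces. It leaves backslashes inside Windows paths such as e:\input.ts alone. It assumes no path ends in a backslash right before a line break.

```python
import re

# A trailing backslash followed by a newline is a POSIX-shell line continuation,
# which cmd.exe does not understand. Stray spaces around it are tolerated.
_CONTINUATION = re.compile(r"[ \t]*\\[ \t]*\r?\n[ \t]*")


def to_single_line(command: str) -> str:
    """Join a backslash-continued Unix command into one line for cmd.exe."""
    return _CONTINUATION.sub(" ", command.strip())


unix_cmd = (
    'ffmpeg -loglevel debug -i "e:\\input.ts" \\\n'
    '-c:v h264_nvenc -b:v 4M -maxrate:v 5M -bufsize:v 8M \\\n'
    '-f mp4 "e:\\output.mp4"'
)
print(to_single_line(unix_cmd))
```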


Notes on the performance impact of presets, and on interlaced encoding:


For high-throughput, low-latency performance, use either the llhp or the llhq preset. This is most useful for workloads such as live streaming, where broad compatibility with a wide variety of devices is expected. For such workloads, performance-crippling features such as B-frames can be omitted altogether, trading a higher bitrate for higher throughput. Higher presets (such as the default, medium) have rapidly diminishing returns in output quality while significantly slowing the encoder down. The quality difference between llhp and llhq, as measured by Netflix's VMAF, is virtually negligible, yet the former's performance boost (over ~30% on my test bed) is definitely appreciable.


For the llhp and llhq presets, as well as for other presets, you can also override the built-in rate-control method by passing one of the -rc:v values exposed in the encoder options. For example, for constant-bitrate encoding you can specify -rc:v cbr, which is significantly faster than the cbr_ld_hq rate-control method and brings an additional ~20% boost to throughput. Note that the selected preset has the greatest impact on throughput. The preset options you can optionally override (such as the rate-control method in use) come second.


Consider your encoding workflow and adjust as necessary. Your mileage will definitely vary based on your source content, the filter chains in use, specific platform configuration variables (such as your GPU and driver versions), etc.
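The ~30% (llhp over llhq) and ~20% (cbr over cbr_ld_hq) figures above come from one test bed. Because results vary, it is worth timing the variants on your own hardware and content.

Below is a minimal sketch. The helper names are hypothetical, and it assumes you pass complete ffmpeg argument lists. Throughput is taken as inversely proportional to wall-clock encode time for the same input.

```python
from __future__ import annotations

import subprocess
import time


def time_encode(argv: list[str]) -> float:
    """Run one ffmpeg command (argument list, no shell); return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def throughput_gain_percent(baseline_seconds: float, candidate_seconds: float) -> float:
    """Percent more throughput for the candidate than for the baseline.

    For the same input, a run taking 100 s versus 130 s has 30% more throughput.
    """
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("timings must be positive")
    return (baseline_seconds / candidate_seconds - 1.0) * 100.0
```

For example, time the same input once with -preset llhq and once with -preset llhp. Writing to a scratch file, or discarding the output with -f null -, keeps disk speed out of the comparison. Then call throughput_gain_percent(llhq_seconds, llhp_seconds). Repeat each run a few times, since a single timing is noisy.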


In the same breath, note that NVIDIA has explicitly disabled interlaced encoding on Turing across all tiers, even on GTX 16-series cards such as the GTX 1650, which uses the older Volta NVENC encoder. If you require interlaced encoding support, switch to Pascal or older SKUs instead.

