Tags: ffmpeg, subtitle, android-ffmpeg, video-subtitles

Font Size Mismatch When Using Fallback Fonts in FFmpeg ASS Subtitles


I’m facing a font size mismatch in ASS subtitles rendered by FFmpeg when English text is combined with Arabic or Persian characters.

Steps to Reproduce:

1. Subtitle File: Here’s an example of the ASS subtitle file:

[Script Info]
Title: Generated ASS
ScriptType: v4.00+
WrapStyle: 2
ScaledBorderAndShadow: yes
YCbCr Matrix: none
PlayResX: 1920
PlayResY: 1080

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default, ABeeZee, 40, &H00FFFFFF, &H000000FF, &H00000000, &H00000000, -1, 0, 0, 0, 100, 100, 0, 0, 1, 1, 0, 2, 10, 10, 10, 1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:01.00,0:00:05.00,Default,,0,0,0,,Hello! This is English text.
Dialogue: 0,0:00:05.00,0:00:10.00,Default,,0,0,0,,سلام! This is mixed Arabic/English text.
2. Command Used:
ffmpeg -i input.mp4 -vf "ass=subtitle.ass" output.mp4
3. Screenshots: The two screenshots of the issue show:

• Text rendered with the main font (ABeeZee) is correct.

• Text rendered with the fallback font (Cinema) appears larger or smaller.

4. Debug Logs:
[Parsed_ass_0 @ 0xb4000079f04f8210] libass API version: 0x1701000
[Parsed_ass_0 @ 0xb4000079f04f8210] libass source: commit: 0.17.1-0-ge8ad72accd3a84268275a9385beb701c9284e5b3
[Parsed_ass_0 @ 0xb4000079f04f8210] Shaper: FriBidi 1.0.13 (SIMPLE) HarfBuzz-ng 8.0.1 (COMPLEX)
[Parsed_ass_0 @ 0xb4000079f04f8210] Using font provider fontconfig
[Parsed_ass_0 @ 0xb4000079f04f8210] Added subtitle file: '/data/***/ass_file.ass' (4 styles, 29 events)
[Parsed_ass_0 @ 0xb4000079f04f8210] fontselect: (ABeeZee, 400, 0) -> /data/***/ABeeZee_regular.ttf, 0, ABeeZee-Regular
[Parsed_ass_0 @ 0xb4000079f04f8210] Glyph 0x633 not found, selecting one more font for (ABeeZee, 400, 0)
[Parsed_ass_0 @ 0xb4000079f04f8210] fontselect: (ABeeZee, 400, 0) -> /data/***/Cinema.ttf, 0, Cinema

Question:

How can I ensure that text rendered with fallback fonts has the same size as the main font? Are there specific settings in FFmpeg or the ASS subtitle format that can address this issue? Alternatively, is there a way to manually adjust the scaling of fallback fonts in FFmpeg?


Solution

  • This issue stems partly from the font and partly from how ASS renderers size text. Your fallback font, Cinema (by MaryamSoft), has an unusually large win-descender.

    [Figure: Line height box and baseline]

    The red line is the baseline on which letters "sit". Below the baseline is the win-descender, and above it is the win-ascender. The ascender space holds ascenders (the parts of letters that extend above the x-height), while the win-ascender is the line that sets the actual upper limit of the font's rendering area. The win-descender serves the same purpose below the baseline.

    Because Cinema's win-descender is so unreasonably large, it adds unnecessary space below the glyphs, and as a result the font appears smaller than intended.

    These metrics are crucial to how ASS renderers size a font. Say the text has a font size of 40px in ASS (as in your example); that size is then applied as follows:

    • The text is rendered and constrained by the lines defined by the usWinAscent and usWinDescent values from the font's OS/2 table.
    • The rendering is then scaled so that the total line height (from win-ascender to win-descender) equals 40px, so-called "real dimension sizing". In the image above, the blue rectangles illustrate this, and both are 40px tall. (This differs from CSS, for example, where sizing is nominal: the font size sets the size of the em square, i.e. unitsPerEm.)

    (This is, of course, a simplification of the whole process, meant only to illustrate the main idea.)
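    A minimal Python sketch of the real-dimension rule, assuming the renderer scales exactly usWinAscent + usWinDescent to the font size (the simplification just mentioned). The metric values are hypothetical, chosen only to show how a large win-descender shrinks the glyphs; they are not the real ABeeZee or Cinema values:

```python
def ass_em_px(font_size, units_per_em, win_ascent, win_descent):
    """Em size in pixels under real-dimension sizing.

    The renderer scales the font so that usWinAscent + usWinDescent spans
    font_size pixels; the em square (unitsPerEm) only gets its share of that.
    """
    return font_size * units_per_em / (win_ascent + win_descent)


# Hypothetical metrics for illustration, NOT the real ABeeZee or Cinema values.
fonts = {
    "typical font": dict(units_per_em=1000, win_ascent=900, win_descent=250),
    "huge win-descender": dict(units_per_em=1000, win_ascent=900, win_descent=700),
}

for name, metrics in fonts.items():
    # With CSS-style nominal sizing, the em would be 40px for both fonts.
    print(f"{name}: em = {ass_em_px(40, **metrics):.1f}px at font size 40")
# typical font: em = 34.8px at font size 40
# huge win-descender: em = 25.0px at font size 40
```

    With the same ASS font size, the larger win-descender alone shrinks the em from about 35px to 25px, which matches the "smaller than intended" symptom.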

    It is technically possible to partly mitigate this in ASS alone: apply a larger \fs override tag inline to the fallback text, artificially adjust the line spacing, set a rotation origin (\org) with a large x coordinate, and rotate the text slightly with \frz. However, I discourage such hacks. Beyond being hacks, they worsen rendering performance and introduce a tiny distortion in the text, although, to be honest, it is indistinguishable without a side-by-side comparison.
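    If you still want to try just the \fs part of that workaround, the override size can be derived from the font metrics. This is a hypothetical sketch: the helper names, the metric values, and the use of the U+0600 to U+06FF block to detect Arabic/Persian runs are assumptions, and it does nothing about the extra line spacing or the \org/\frz adjustments:

```python
import re

# Assumption: the Arabic block U+0600-U+06FF covers the Arabic/Persian letters in
# your text; extend the class (e.g. U+FB50-U+FDFF, U+FE70-U+FEFF) for presentation forms.
# Whitespace between Arabic words is kept inside one run so words are not split.
ARABIC_RUN = re.compile(r"[\u0600-\u06FF]+(?:\s+[\u0600-\u06FF]+)*")


def em_fraction(units_per_em, win_ascent, win_descent):
    """Share of the ASS font size that one em receives under real-dimension sizing."""
    return units_per_em / (win_ascent + win_descent)


def compensated_size(base_fs, main, fallback):
    """Integer \\fs value that renders fallback glyphs at roughly the main font's em size."""
    return round(base_fs * em_fraction(*main) / em_fraction(*fallback))


def wrap_arabic_runs(text, base_fs, fallback_fs):
    """Surround every Arabic-script run with \\fs overrides, restoring base_fs afterwards."""
    return ARABIC_RUN.sub(
        lambda m: f"{{\\fs{fallback_fs}}}{m.group(0)}{{\\fs{base_fs}}}", text
    )


# Hypothetical (unitsPerEm, usWinAscent, usWinDescent) tuples, not the real fonts' metrics.
main_font = (1000, 900, 250)
fallback_font = (1000, 900, 700)

fs = compensated_size(40, main_font, fallback_font)
print(fs)  # 56
print(wrap_arabic_runs("سلام! This is mixed Arabic/English text.", 40, fs))
# {\fs56}سلام{\fs40}! This is mixed Arabic/English text.
```

    The resulting text would go into the Dialogue line's Text field in place of the plain mixed string. Because the larger \fs also enlarges the line box, the line spacing issue mentioned above remains.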

    A better approach is to modify the font so that its usWinAscent and usWinDescent values are more reasonable. Note that sTypoAscender and sTypoDescender are themselves part of the OS/2 table; if the font has no OS/2 table at all, the renderer falls back to other metrics (typically the ascender and descender in the hhea table), so adjust those instead. Be careful not to overdo it: if the values are too low, parts of the glyphs may be clipped.
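    One way to apply this fix is with the fontTools Python library (an assumption; any font editor can change these fields). The sketch sets usWinAscent and usWinDescent to the font-wide bounding box stored in the head table, which should be the smallest values that do not clip any glyph. If Cinema's glyphs themselves reach far below the baseline, the descender will stay large. The function names are hypothetical:

```python
def clip_safe_win_metrics(head_y_max, head_y_min):
    """Smallest usWinAscent/usWinDescent that still cover every glyph's bounding box.

    usWinDescent is stored as a positive distance below the baseline, while
    head.yMin is negative for glyphs that descend, hence the sign flip.
    """
    return max(head_y_max, 0), max(-head_y_min, 0)


def tighten_win_metrics(src_path, dst_path):
    """Write a copy of the font with win metrics reduced to the glyph bounding box."""
    # Third-party dependency (pip install fonttools), imported lazily so the
    # pure helper above stays usable without it.
    from fontTools.ttLib import TTFont

    font = TTFont(src_path)
    head, os2 = font["head"], font["OS/2"]  # KeyError if the font has no OS/2 table
    asc, desc = clip_safe_win_metrics(head.yMax, head.yMin)
    print(f"usWinAscent {os2.usWinAscent} -> {asc}, usWinDescent {os2.usWinDescent} -> {desc}")
    os2.usWinAscent, os2.usWinDescent = asc, desc
    font.save(dst_path)  # write to a new file; keep the original font untouched
```

    Usage, with hypothetical file names: `tighten_win_metrics("Cinema.ttf", "Cinema-fixed.ttf")`, then make the modified font available to libass instead of the original.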

    The best solution is to choose a better-designed fallback font. It is also crucial that both fonts (main and fallback) have a similar win-ascender to win-descender ratio. Alternatively, you can use a single font with extensive script and language support, such as Noto Sans. The base Noto Sans covers a wide range of glyphs and can be extended with companion families for other scripts (for example, Arabic and Japanese).
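    To compare candidate fallback fonts before picking one, a small check like the following may help. It is a sketch with an arbitrary 5% tolerance (not a standard): em_fraction tracks how large glyphs appear at the same ASS font size, and ascent_share tracks where the baseline sits in the line box. The read_vmetrics helper assumes fontTools is installed, and the metric values are hypothetical:

```python
from typing import NamedTuple


class VMetrics(NamedTuple):
    """Vertical metrics from a font's head and OS/2 tables, in font units."""
    units_per_em: int
    win_ascent: int
    win_descent: int

    @property
    def em_fraction(self) -> float:
        # Share of the ASS font size the em receives under real-dimension sizing.
        return self.units_per_em / (self.win_ascent + self.win_descent)

    @property
    def ascent_share(self) -> float:
        # Fraction of the line box that lies above the baseline.
        return self.win_ascent / (self.win_ascent + self.win_descent)


def relative_diff(a: float, b: float) -> float:
    return abs(a - b) / max(a, b)


def similar_metrics(main: VMetrics, fallback: VMetrics, tolerance: float = 0.05) -> bool:
    """True if glyph size and baseline position roughly agree between the two fonts."""
    return (relative_diff(main.em_fraction, fallback.em_fraction) <= tolerance
            and relative_diff(main.ascent_share, fallback.ascent_share) <= tolerance)


def read_vmetrics(path: str) -> VMetrics:
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools
    font = TTFont(path)
    os2 = font["OS/2"]
    return VMetrics(font["head"].unitsPerEm, os2.usWinAscent, os2.usWinDescent)


main = VMetrics(units_per_em=1000, win_ascent=900, win_descent=250)
print(similar_metrics(main, VMetrics(1000, 900, 700)))  # False
print(similar_metrics(main, VMetrics(1000, 920, 240)))  # True
```

    With real files this would look like `similar_metrics(read_vmetrics("ABeeZee_regular.ttf"), read_vmetrics("candidate.ttf"))`, where "candidate.ttf" is a placeholder for whichever fallback you are evaluating.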