Text in ASS subtitles is displayed smaller than it should be


I add subtitles to a video with ffmpeg in two different ways. The first way uses the drawtext filter, and everything works perfectly. Here is the command:

ffmpeg -i ./input.mp4 -vf "drawtext=text='reise':fontfile=../fonts/Audiowide-Regular.ttf:fontsize=55:fontcolor=white:x=0:y=683" -codec:a copy ./output.mp4

The second way uses an ASS subtitle file. With this approach I get smaller letters and the wrong y position for the text. Below is the content of the ASS subtitle file:


[Script Info]
Title: Advanced Highlighted Subtitle Example
ScriptType: v4.00+
WrapStyle: 0
PlayResX: 1048
PlayResY: 750

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Audiowide Regular,55,&HFFFFFF,&H00FFFFFF,&H00000000,&H00000000,1,0,0,0,100,100,0,0,0,0,0,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text

Dialogue: 0,00:00:0.00,00:00:2.38,Default,,0,0,0,,{\pos(0,683)\an4}reise

And the command for the second approach:

ffmpeg -i ./input.mp4 -vf "ass=../subtitles.ass:fontsdir=../fonts/Audiowide-Regular.ttf"  ../output.mp4

Both commands use the same video, the same font and the same text. The problem is that with the ASS file the text is much smaller and misplaced.

[Image: result using ffmpeg drawtext]
[Image: result using ASS subtitles]

The numbers on the axes indicate sizes in pixels. As you can see, in the second image the text is much smaller and has the wrong y coordinate. It looks like the wrong scaling is being applied. What is wrong with my ASS file configuration?

I have tried the solution from the Universal Media Server forum (https://www.universalmediaserver.com/forum/viewtopic.php?t=5907), which is to remove PlayResX/Y, but it doesn't work. I also measured the text width in several other ways (for example, as HTML rendered in a browser and on a canvas), so I'm pretty sure that drawtext renders the correct width. The problem is related to the ASS subtitle file. Also, if I use popular fonts like Arial, the deviation is much smaller.


Solution

  • FFmpeg's drawtext filter interprets the font size differently than ASS renderers (like Libass). The drawtext filter uses the font's nominal size (in pixels), scaling it according to the units per em. In contrast, ASS renderers scale by the font's real dimensions, which they determine by adding the ascender value and the negated descender value from the font's tables (such as OS/2 and hhea).

    So to match the size between FFmpeg's drawtext and ASS, we need a way to calculate the real dimension size (used by ASS) from the nominal size (used by drawtext). Let's first calculate the base size of the font that each one uses for scaling.

    For the nominal size, we need to read unitsPerEm from the font header table (head); for the Audiowide font it is 2048. For the real dimension size, we need the ascender and descender values, which can be found in the hhea table; for the Audiowide font the ascender is 2027 and the descender is -584.

    So then:

    Nominal size = unitsPerEm = 2048
    Real dimension size = ascender - descender = 2027 - (-584) = 2611

    So the real dimension size is bigger than the nominal size by a scale factor:

    Scale = Real dimension size / Nominal size = 2611 / 2048 ≈ 1.2749

    So we need to multiply the original font size (55) by the scale factor: 55 * 1.2749 ≈ 70.12

    Secondly, note that drawtext positions text by its top-left corner, whereas the \an4 alignment tag that you used in ASS corresponds to left-middle alignment. To match the positions, you should use \an7 (left-top alignment) in ASS.

    Thirdly, drawtext aligns text to the highest glyph (for historic reasons) instead of baseline plus ascent (how it's typically done), but you can change this by setting y_align=font in the drawtext filter.

    Here is the corrected ASS v4+ script file:

    [Script Info]
    Title: Advanced Highlighted Subtitle Example
    ScriptType: v4.00+
    WrapStyle: 0
    PlayResX: 1048
    PlayResY: 750
    
    [V4+ Styles]
    Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
    Style: Default,Audiowide,70.12,&H00FFFFFF,&H00FFFFFF,&H00000000,&H00000000,-1,0,0,0,100,100,0,0,1,0,0,2,10,10,10,1
    
    [Events]
    Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    
    Dialogue: 0,00:00:0.00,00:00:2.38,Default,,0,0,0,,{\pos(0,683)\an7}reise
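
If you generate such events programmatically, the Dialogue line can be assembled along the following lines. This is only a minimal sketch: `ass_timestamp` and `dialogue_line` are hypothetical helper names, and it assumes the drawtext x/y values are meant as the top-left corner of the text, hence the \an7 tag.

```python
def ass_timestamp(seconds: float) -> str:
    """Format seconds as an ASS timestamp, H:MM:SS.cc (centiseconds)."""
    total_cs = round(seconds * 100)
    hours, rem = divmod(total_cs, 360_000)
    minutes, rem = divmod(rem, 6_000)
    secs, cs = divmod(rem, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{cs:02d}"


def dialogue_line(text: str, x: int, y: int, start: float, end: float,
                  style: str = "Default") -> str:
    """Build a Dialogue event positioned by its top-left corner (\\an7)."""
    # Override block: left-top alignment plus an absolute position.
    override = f"{{\\an7\\pos({x},{y})}}"
    return (f"Dialogue: 0,{ass_timestamp(start)},{ass_timestamp(end)},"
            f"{style},,0,0,0,,{override}{text}")


# Same text, position and timing as in the script above.
print(dialogue_line("reise", 0, 683, 0.0, 2.38))
```

The text is inserted as-is, so characters that have a meaning in ASS (such as braces or backslashes) would need extra handling.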
    

    And the corrected FFmpeg command:

    ffmpeg -i ./input.mp4 -vf "drawtext=text='reise':fontfile=../fonts/Audiowide-Regular.ttf:fontsize=55:fontcolor=white:x=0:y=683:y_align=font" -codec:a copy ./output.mp4
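
For the ASS side, the matching command could be assembled as in the sketch below. `ass_burn_command` is a hypothetical helper name, the paths are the ones from the question, and it assumes the fonts live in the ../fonts directory; note that the filter's fontsdir option takes a directory, not a single font file.

```python
import shlex


def ass_burn_command(video, subtitles, fonts_dir, output):
    """Return the ffmpeg argv list that burns an ASS file into a video.

    Assumes simple paths: characters that are special in a filtergraph
    (':', ',', quotes, backslashes) are not escaped here.
    """
    # fontsdir points at a directory that contains the font file(s).
    video_filter = f"ass={subtitles}:fontsdir={fonts_dir}"
    return ["ffmpeg", "-i", video, "-vf", video_filter,
            "-codec:a", "copy", output]


cmd = ass_burn_command("./input.mp4", "../subtitles.ass", "../fonts", "./output.mp4")
print(shlex.join(cmd))  # shell-ready form of the argv list
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.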

    Below is an example of how to read the metric values from a font using FreeType (which both drawtext and the ASS renderer Libass use under the hood for rendering fonts) in Python:

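    import freetype
    
    # Load the font file with FreeType.
    face = freetype.Face('path/to/your/fontfile.ttf')
    
    # These metrics are expressed in font design units, not pixels.
    units_per_em = face.units_per_EM
    ascender = face.ascender
    descender = face.descender  # negative: extends below the baseline
    
    print(f"Units per EM: {units_per_em}")
    print(f"Ascender: {ascender}")
    print(f"Descender: {descender}")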
    

    The result for the Audiowide font should be:

    Units per EM: 2048
    Ascender: 2027
    Descender: -584
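
With these three values, the conversion from a drawtext size to an ASS size can be sketched as follows. `ass_font_size` is a hypothetical helper name, and the sketch assumes the ASS renderer scales by ascender minus descender exactly as described above.

```python
def ass_font_size(drawtext_size, units_per_em, ascender, descender):
    """Convert a drawtext font size to the ASS size that renders equally large.

    Assumption: ASS scales by (ascender - descender) while drawtext scales
    by units_per_em, so the size is multiplied by the ratio of the two.
    """
    scale = (ascender - descender) / units_per_em
    return drawtext_size * scale


# Audiowide metrics as printed above.
size = ass_font_size(55, units_per_em=2048, ascender=2027, descender=-584)
print(f"{size:.2f}")
```

The same function works for any font once its metrics have been read with FreeType, which is why fonts whose ascender minus descender is close to unitsPerEm show only a small deviation.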
    

    You can install the required library with:

    python -m pip install freetype-py