Search code examples
htmlpowerbipowerquerydecodem

Decoding HTML symbol decimal numbers into actual symbols in Power BI


The data has HTML values inside text:

col1
-------------------------------------------------------------
Drell-Yan Process Background Estimation Using eμ Method
Expressions of constant π
Computational Analysis of Protein β-Structure
δ13C and 14C Measurements in Aerosol Particles

I need to get actual symbols instead of all the HTML decimal values.


Solution

  • Html.Table in powerquery-m can decode HTML decimal values into visible symbols:

    let
        Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("PY7BCsIwDIZfJcyrg03nUHaa06MwEA8ydyg1arFrR5KCvr1l4C7JId/3/+m65EBobXpVDlryGplhr/T7ST64OxxZzKDEeAcXNu4JeAtZtioXu01ZwQnl5e9Jv+yS42ek6EaQwT9Axy3KCfzxMqsmrvHDGGRKVBbqOL5sJiW2Cxo3G0VZpWehoCUQTu582Vb5ugFDkBdNfEJxJAZ0whD9Gsmzt9AqEqMtctL3Pw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [col1 = _t]),
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"col1", type text}}),
        #"Added Custom" = Table.AddColumn(#"Changed Type", "HtmlTable", each Html.Table([col1],{{"HtmlDecoded",":root"}})),
        #"Expanded HtmlTable" = Table.ExpandTableColumn(#"Added Custom", "HtmlTable", {"HtmlDecoded"}, {"HtmlDecoded"})
    in
        #"Expanded HtmlTable"
    

    HTML symbols decoded in another column

    Or in place:

    let
        Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("PY7BCsIwDIZfJcyrg03nUHaa06MwEA8ydyg1arFrR5KCvr1l4C7JId/3/+m65EBobXpVDlryGplhr/T7ST64OxxZzKDEeAcXNu4JeAtZtioXu01ZwQnl5e9Jv+yS42ek6EaQwT9Axy3KCfzxMqsmrvHDGGRKVBbqOL5sJiW2Cxo3G0VZpWehoCUQTu582Vb5ugFDkBdNfEJxJAZ0whD9Gsmzt9AqEqMtctL3Pw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [col1 = _t]),
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"col1", type text}}),
        Decoded = Table.TransformColumns(#"Changed Type", {{"col1", each Table.FirstValue(Html.Table(_,{{"HtmlDecoded",":root"}})) }} )
    in
        Decoded
    

    HTML symbols decoded in place