javascript regex string

Extract all matching font-family attributes from string


I'm trying to extract the values of all declarations in an HTML snippet that match the pattern font-family:"...".

Example input:

`<body lang=EN-ZA style='tab-interval:36.0pt;word-wrap:break-word'>
<!--StartFragment--><span style='font-size:14.0pt;line-height:107%;
font-family:"Comic Sans MS";:"Times New Roman";
:"Times New Roman";mso-font-kerning:0pt;mso-ligatures:none;
mso-ansi-language:EN-ZA;mso-fareast-language:EN-ZA;mso-bidi-language:AR-SA'>
Test Font 1
 </span><span
style='font-size:14.0pt;line-height:107%;font-family:"Boucherie Block";
:"Times New Roman";:"Times New Roman";
mso-font-kerning:0pt;mso-ligatures:none;mso-ansi-language:EN-ZA;mso-fareast-language:
EN-ZA;mso-bidi-language:AR-SA'>Test Font 2 </span><!--EndFragment-->
</body>`

Required output:

Comic Sans MS

Boucherie Block

I've tried using the following regex:

var tmpStr = targetText.match('font-family:"(.*)";');


But this includes the font after the semicolon (Times New Roman), which I'm not interested in, and it doesn't return the other font family, which should be Boucherie Block. Any tips would be appreciated. If there's another way to get the required output without using regex, I'm open to that; the main thing is to get both fonts out of the string.


Solution

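  • I would use a negated character class ([^"]*) so the capture stops at the first closing quote, and pass a regex literal with the g flag instead of a string so that every occurrence is matched.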

    const targetText = `<body lang=EN-ZA style='tab-interval:36.0pt;word-wrap:break-word'>
    <!--StartFragment--><span style='font-size:14.0pt;line-height:107%;
    font-family:"Comic Sans MS";:"Times New Roman";
    :"Times New Roman";mso-font-kerning:0pt;mso-ligatures:none;
    mso-ansi-language:EN-ZA;mso-fareast-language:EN-ZA;mso-bidi-language:AR-SA'>
    Test Font 1
     </span><span
    style='font-size:14.0pt;line-height:107%;font-family:"Boucherie Block";
    :"Times New Roman";:"Times New Roman";
    mso-font-kerning:0pt;mso-ligatures:none;mso-ansi-language:EN-ZA;mso-fareast-language:
    EN-ZA;mso-bidi-language:AR-SA'>Test Font 2 </span><!--EndFragment-->
    </body>`
    
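    // A regex literal with the g flag lets matchAll iterate over every occurrence;
    // [^"]* stops the capture at the first closing double quote.
    const matches = targetText.matchAll(/font-family:"([^"]*)";/g);
    
    // Each match is an array; index 1 holds the captured font name.
    console.log([...matches].map(e => e[1]));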
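
To see why the original attempt overshoots, here is a minimal sketch comparing the three patterns side by side. The `sample` string is a hypothetical, shortened stand-in for the input in the question, and the variable names `greedy`, `lazy`, and `negated` are only illustrative. A string passed to `match` is converted to a regex without the `g` flag, so only the first occurrence is returned, and the greedy `.*` extends to the last `";` on the same line.

```javascript
// Hypothetical shortened input, modeled on the snippet in the question.
const sample = `<span style='font-family:"Comic Sans MS";:"Times New Roman";
mso-font-kerning:0pt'>A</span><span style='font-family:"Boucherie Block";
:"Times New Roman";'>B</span>`;

// Greedy, no g flag: .* runs to the last '";' on the line, and only the
// first occurrence is reported.
const greedy = sample.match(/font-family:"(.*)";/)[1];

// Lazy quantifier: .*? stops at the first '";' after the opening quote.
const lazy = [...sample.matchAll(/font-family:"(.*?)";/g)].map(m => m[1]);

// Negated class: [^"]* cannot cross a double quote at all.
const negated = [...sample.matchAll(/font-family:"([^"]*)";/g)].map(m => m[1]);

console.log(greedy);  // Comic Sans MS";:"Times New Roman
console.log(lazy);    // [ 'Comic Sans MS', 'Boucherie Block' ]
console.log(negated); // [ 'Comic Sans MS', 'Boucherie Block' ]
```

Both the lazy and the negated-class variants give the required output here. The negated class is arguably the safer choice, because it can never match past a double quote regardless of what follows.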