php regex regex-group

Extract useful data from HTML with regex & PHP


I want to extract parts of some raw HTML data in PHP, and I believe regex is the best way to do this.

 <div style=\"white-space: nowrap; margin: 10px\"><div style=\"white-space: nowrap; padding: 3px;\"><div style=\"width: 60px; height: 32px; vertical-align: top; display: inline-block;\"><div style=\"position: relative; width: 48px; height: 32px; vertical-align: top; display: inline-block; border: 2px solid rgb(255, 255, 255);\"><div style=\"position: absolute; width: 48px; height: 32px; vertical-align: top; display: inline-block; background-size: contain; background-image: url(https://steamcdn-a.akamaihd.net/apps/570/icons/econ/sockets/gem_stat.30d7935c1f0a1b9e8e28c691c2bd28f7d5f471bc.png)\"></div></div></div><div style=\"vertical-align: top; display: inline-block; margin-left: 12px padding: 2px\"><span style=\"font-size: 18px; white-space: normal; color: rgb(255, 255, 255)\">Team First Blood, Tower or Roshan: 5</span><br><span style=\"font-size: 12px\">TI8 Rune</span></div></div><div style=\"white-space: nowrap; padding: 3px;\"><div style=\"width: 60px; height: 32px; vertical-align: top; display: inline-block;\"><div style=\"position: relative; width: 48px; height: 32px; vertical-align: top; display: inline-block; border: 2px solid rgb(255, 255, 255);\"><div style=\"position: absolute; width: 48px; height: 32px; vertical-align: top; display: inline-block; background-size: contain; background-image: url(https://steamcdn-a.akamaihd.net/apps/570/icons/econ/sockets/gem_stat.30d7935c1f0a1b9e8e28c691c2bd28f7d5f471bc.png)\"></div></div></div><div style=\"vertical-align: top; display: inline-block; margin-left: 12px padding: 2px\"><span style=\"font-size: 18px; white-space: normal; color: rgb(255, 255, 255)\">Gem Carriers Killed: 0</span><br><span style=\"font-size: 12px\">Inscribed Gem</span></div></div><div style=\"white-space: nowrap; padding: 3px;\"><div style=\"width: 60px; height: 32px; vertical-align: top; display: inline-block;\"><div style=\"position: relative; width: 48px; height: 32px; vertical-align: top; display: inline-block; border: 2px solid rgb(255, 255, 
255);\"><div style=\"position: absolute; width: 48px; height: 32px; vertical-align: top; display: inline-block; background-size: contain; background-image: url(https://steamcdn-a.akamaihd.net/apps/570/icons/econ/sockets/gem_stat.30d7935c1f0a1b9e8e28c691c2bd28f7d5f471bc.png)\"></div></div></div><div style=\"vertical-align: top; display: inline-block; margin-left: 12px padding: 2px\"><span style=\"font-size: 18px; white-space: normal; color: rgb(255, 255, 255)\">Kill Assists: 2220</span><br><span style=\"font-size: 12px\">Inscribed Gem</span></div></div><div style=\"white-space: nowrap; padding: 3px;\"><div style=\"width: 60px; height: 32px; vertical-align: top; display: inline-block;\"><div style=\"position: relative; width: 48px; height: 32px; vertical-align: top; display: inline-block; border: 2px solid rgb(255, 255, 255);\"><div style=\"position: absolute; width: 48px; height: 32px; vertical-align: top; display: inline-block; background-size: contain; background-image: url(https://steamcdn-a.akamaihd.net/apps/570/icons/econ/sockets/gem_stat.30d7935c1f0a1b9e8e28c691c2bd28f7d5f471bc.png)\"></div></div></div><div style=\"vertical-align: top; display: inline-block; margin-left: 12px padding: 2px\"><span style=\"font-size: 18px; white-space: normal; color: rgb(255, 255, 255)\">Kills: 964</span><br><span style=\"font-size: 12px\">Inscribed Gem</span></div></div><div style=\"white-space: nowrap; padding: 3px;\"><div style=\"width: 60px; height: 32px; vertical-align: top; display: inline-block;\"><div style=\"position: relative; width: 48px; height: 32px; vertical-align: top; display: inline-block; border: 2px solid rgb(255, 255, 255);\"><div style=\"position: absolute; width: 48px; height: 32px; vertical-align: top; display: inline-block; background-size: contain; background-image: url(https://steamcdn-a.akamaihd.net/apps/570/icons/econ/sockets/gem_stat.30d7935c1f0a1b9e8e28c691c2bd28f7d5f471bc.png)\"></div></div></div><div style=\"vertical-align: top; display: 
inline-block; margin-left: 12px padding: 2px\"><span style=\"font-size: 18px; white-space: normal; color: rgb(255, 255, 255)\">Heroes Killed Inside Smoke: 160</span><br><span style=\"font-size: 12px\">Inscribed Gem</span></div></div></div>

I know each item has a background URL like:

background-image: url(https://steamcdn-a.akamaihd.net/apps/570/icons/econ/sockets/gem_stat.30d7935c1f0a1b9e8e28c691c2bd28f7d5f471bc.png)
background-image: url(https://steamcdn-a.akamaihd.net/apps/570/icons/econ/sockets/--this_will_be_varied--.png)
background-image: url() or empty...

and each span contains a string name, possibly followed by a number:

<span style=\"font-size: 12px\">Inscribed Gem</span>
<span style=\"font-size: 18px; white-space: normal; color: rgb(255, 255, 255)\">Kills: 964</span>
<span style=\"font-size: 18px; white-space: normal; color: rgb(255, 255, 255)\">Kill Assists: 2220</span>

I want to extract the url() part and the string part. Later I will extract the URL address itself, or set it to null if it is empty, because each string part is preceded by a URL (or an empty URL).


Solution

  • use a positive lookahead to search for text preceding a </span

    ((url\([\.\:\/\-\w]*\))|[\w:, 0-9]*(?=<\/span))
    

    This regex works for all the examples you gave.
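
    As a rough sketch of how this could be used from PHP, the snippet below runs the regex with preg_match_all on a shortened sample in the same shape as the question's HTML. The variable names ($html, $tokens, $items) and the grouping step are assumptions for illustration, not part of the original answer. Because the second alternative uses `*`, it can also produce empty matches right before `</span`, so those are filtered out here.

```php
<?php
// Shortened sample shaped like the question's HTML (escaped quotes kept as-is).
$html = <<<'HTML'
<div style=\"background-image: url(https://steamcdn-a.akamaihd.net/apps/570/icons/econ/sockets/gem_stat.30d7935c1f0a1b9e8e28c691c2bd28f7d5f471bc.png)\"></div><span style=\"font-size: 18px\">Kills: 964</span><br><span style=\"font-size: 12px\">Inscribed Gem</span><div style=\"background-image: url()\"></div><span style=\"font-size: 18px\">Kill Assists: 2220</span><br><span style=\"font-size: 12px\">Inscribed Gem</span>
HTML;

// The regex from the answer, wrapped in PHP delimiters.
$pattern = '/((url\([\.\:\/\-\w]*\))|[\w:, 0-9]*(?=<\/span))/';
preg_match_all($pattern, $html, $matches);

// Drop the zero-length matches that the `*` quantifier allows.
$tokens = array_values(array_filter($matches[0], 'strlen'));

// Hypothetical grouping step: each url(...) token starts a new item,
// and the span texts that follow it are attached to that item.
$items = [];
foreach ($tokens as $token) {
    if (strncmp($token, 'url(', 4) === 0) {
        // An empty url() becomes null, as the question asks.
        $url = ($token === 'url()') ? null : substr($token, 4, -1);
        $items[] = ['url' => $url, 'texts' => []];
    } elseif ($items) {
        $items[count($items) - 1]['texts'][] = $token;
    }
}

print_r($items);
```

    With this sample, $items should hold two entries: the first with the gem_stat URL and the texts "Kills: 964" and "Inscribed Gem", the second with a null URL and the texts "Kill Assists: 2220" and "Inscribed Gem". If the real markup is less regular than the examples, an HTML parser such as DOMDocument may be more robust than a regex.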