I need to extract the href values inside the HTML body tag using a regular expression:
<html>
<head>
</head>
<body class="directory">
<input id="search" type="text" placeholder="Search" autocomplete="off" />
<div id="wrapper">
<h1><a href="/">~</a> / <a href="/public">public</a> / <a href="/public/img">img</a> / <a href="/public/img/events">events</a> / <a href="/public/img/events/poster">poster</a> / </h1>
<ul id="files" class="view-tiles"><li><a href="/public/img/events" class="" title=".."><span class="name">..</span><span class="size"></span><span class="date"></span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-1.PNG" class="" title="2018-09-26-1.PNG"><span class="name">2018-09-26-1.PNG</span><span class="size">1406471</span><span class="date">2018-9-16 18:37:23</span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-2.PNG" class="" title="2018-09-26-2.PNG"><span class="name">2018-09-26-2.PNG</span><span class="size">530859</span><span class="date">2018-9-16 18:37:44</span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-3.PNG" class="" title="2018-09-26-3.PNG"><span class="name">2018-09-26-3.PNG</span><span class="size">551409</span><span class="date">2018-9-16 18:38:24</span></a></li>
<li><a href="/public/img/events/poster/test" class="" title="test"><span class="name">test</span><span class="size">0</span><span class="date">2018-10-4 20:16:58</span></a></li></ul>
</div>
</body>
<html>
I want to have a list that contains
/public/img/events/poster/2018-09-26-1.PNG and
/public/img/events/poster/2018-09-26-2.PNG and
/public/img/events/poster/2018-09-26-3.PNG.
The expression I used:
/[<body\sclass="directory">].+[<li><a\shref\s*=\s*\"]([^">]+)\"\s+[class].+[<\/body>]/g
However, I got this result:
<ul id="files" class="view-tiles"><li><a href="/public/img/events" class="" title=".."><span class="name">..</span><span class="size"></span><span class="date"></span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-1.PNG" class="" title="2018-09-26-1.PNG"><span class="name">2018-09-26-1.PNG</span><span class="size">1406471</span><span class="date">2018-9-16 18:37:23</span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-2.PNG" class="" title="2018-09-26-2.PNG"><span class="name">2018-09-26-2.PNG</span><span class="size">530859</span><span class="date">2018-9-16 18:37:44</span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-3.PNG" class="" title="2018-09-26-3.PNG"><span class="name">2018-09-26-3.PNG</span><span class="size">551409</span><span class="date">2018-9-16 18:38:24</span></a></li>
<li><a href="/public/img/events/poster/test" class="" title="test"><span class="name">test</span><span class="size">0</span><span class="date">2018-10-4 20:16:58</span></a></li></ul>
Can someone guide me please?
You can use this regex:
/<li[^>]*>[^<]*<a[^>]*href="([^"]+)"/g
and then access the ([^"]+) capturing group through match[1], as follows (assuming you are using JavaScript):
var myString = `<html>
<head>
</head>
<body class="directory">
<input id="search" type="text" placeholder="Search" autocomplete="off" />
<div id="wrapper">
<h1><a href="/">~</a> / <a href="/public">public</a> / <a href="/public/img">img</a> / <a href="/public/img/events">events</a> / <a href="/public/img/events/poster">poster</a> / </h1>
<ul id="files" class="view-tiles"><li><a href="/public/img/events" class="" title=".."><span class="name">..</span><span class="size"></span><span class="date"></span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-1.PNG" class="" title="2018-09-26-1.PNG"><span class="name">2018-09-26-1.PNG</span><span class="size">1406471</span><span class="date">2018-9-16 18:37:23</span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-2.PNG" class="" title="2018-09-26-2.PNG"><span class="name">2018-09-26-2.PNG</span><span class="size">530859</span><span class="date">2018-9-16 18:37:44</span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-3.PNG" class="" title="2018-09-26-3.PNG"><span class="name">2018-09-26-3.PNG</span><span class="size">551409</span><span class="date">2018-9-16 18:38:24</span></a></li>
<li><a href="/public/img/events/poster/test" class="" title="test"><span class="name">test</span><span class="size">0</span><span class="date">2018-10-4 20:16:58</span></a></li></ul>
</div>
</body>
<html>`;
var myRegexp = /<li[^>]*>[^<]*<a[^>]*href="([^"]+)"/g;
var match = myRegexp.exec(myString);
while (match != null) {
// matched text: match[0]
// match start: match.index
// capturing group n: match[n]
console.log(match[1])
match = myRegexp.exec(myString);
}
Credits to this answer for the code example.
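If the goal is an actual list rather than console output, the same pattern can be collected into an array. This is a minimal sketch, assuming an environment that supports String.prototype.matchAll (ES2020). The helper name extractHrefs, the shortened sample string, and the .PNG filter are illustrative assumptions, not part of the original answer:

```javascript
// Hypothetical helper: returns every href found in an <li><a ...> pair.
function extractHrefs(html) {
  var linkRegex = /<li[^>]*>[^<]*<a[^>]*href="([^"]+)"/g;
  // matchAll yields one match object per hit; index 1 is the capture group.
  return Array.from(html.matchAll(linkRegex), function (m) { return m[1]; });
}

// Shortened sample (assumption: same shape as the listing in the question).
var sample =
  '<ul><li><a href="/public/img/events" class="">..</a></li>\n' +
  '<li><a href="/public/img/events/poster/2018-09-26-1.PNG" class="">a</a></li>\n' +
  '<li><a href="/public/img/events/poster/test" class="">test</a></li></ul>';

var hrefs = extractHrefs(sample);
// Keep only the image files (assumption: they are identified by a .PNG suffix).
var pngs = hrefs.filter(function (h) { return /\.png$/i.test(h); });
console.log(pngs);
```

The filter step is what drops the parent-directory link and the test folder, which the regex alone would still return.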
The author asked to include a match for the body tag:
Just curious: how do I update the expression if I want to limit the matching range to the body tag? I updated the expression as below, but got no result. ]>.]>[^<]]href="([^"]+)".</body[^>]*>
There is only so much you can do with a regex, and in general it is not recommended to do advanced HTML parsing with regexes. Your approach runs into problems with the line breaks and with the fact that you want to match multiple <li>s in a single body.
Also, by HTML convention, <li>s are only allowed in the body.
If you still want to do so, break it down into two steps and match the body first:
var myString = `<html>
<head>
<!-- Not valid HTML, just for testing -->
<ul id="files" class="view-tiles"><li><a href="/public/img/events" class="" title=".."><span class="name">..</span><span class="size"></span><span class="date"></span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-1.PNG" class="" title="2018-09-26-1.PNG"><span class="name">2018-09-26-1.PNG</span><span class="size">1406471</span><span class="date">2018-9-16 18:37:23</span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-2.PNG" class="" title="2018-09-26-2.PNG"><span class="name">2018-09-26-2.PNG</span><span class="size">530859</span><span class="date">2018-9-16 18:37:44</span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-3.PNG" class="" title="2018-09-26-3.PNG"><span class="name">2018-09-26-3.PNG</span><span class="size">551409</span><span class="date">2018-9-16 18:38:24</span></a></li>
<li><a href="/public/img/events/poster/test" class="" title="test"><span class="name">test</span><span class="size">0</span><span class="date">2018-10-4 20:16:58</span></a></li></ul>
</head>
<body class="directory">
<input id="search" type="text" placeholder="Search" autocomplete="off" />
<div id="wrapper">
<h1><a href="/">~</a> / <a href="/public">public</a> / <a href="/public/img">img</a> / <a href="/public/img/events">events</a> / <a href="/public/img/events/poster">poster</a> / </h1>
<ul id="files" class="view-tiles"><li><a href="/public/img/events" class="" title=".."><span class="name">..</span><span class="size"></span><span class="date"></span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-1.PNG" class="" title="2018-09-26-1.PNG"><span class="name">2018-09-26-1.PNG</span><span class="size">1406471</span><span class="date">2018-9-16 18:37:23</span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-2.PNG" class="" title="2018-09-26-2.PNG"><span class="name">2018-09-26-2.PNG</span><span class="size">530859</span><span class="date">2018-9-16 18:37:44</span></a></li>
<li><a href="/public/img/events/poster/2018-09-26-3.PNG" class="" title="2018-09-26-3.PNG"><span class="name">2018-09-26-3.PNG</span><span class="size">551409</span><span class="date">2018-9-16 18:38:24</span></a></li>
<li><a href="/public/img/events/poster/test" class="" title="test"><span class="name">test</span><span class="size">0</span><span class="date">2018-10-4 20:16:58</span></a></li></ul>
</div>
</body>
<html>`;
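// [^>]* keeps the opening tag from running past its own ">"; [\s\S]*? spans line breaks
var bodyRegex = /<\s*body[^>]*>([\s\S]*?)<\s*\/body\s*>/;
var bodyString = bodyRegex.exec(myString)[1]; // group 1: the body's inner HTML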
var myRegexp = /<li[^>]*>[^<]*<a[^>]*href="([^"]+)"/g;
var match = myRegexp.exec(bodyString);
while (match != null) {
// matched text: match[0]
// match start: match.index
// capturing group n: match[n]
console.log(match[1])
match = myRegexp.exec(bodyString);
}
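The two steps can also be wrapped in a function that returns an empty list when no body is found, instead of throwing on a null match. This is only a sketch: the name hrefsInBody and the small test document are made up for illustration, and the usual caveat about parsing HTML with regexes still applies.

```javascript
// Hypothetical helper combining both steps; returns [] if there is no <body>.
function hrefsInBody(html) {
  // Step 1: isolate the body's inner HTML.
  // [\s\S] matches across line breaks, which a plain dot does not.
  var bodyMatch = /<\s*body[^>]*>([\s\S]*?)<\s*\/body\s*>/i.exec(html);
  if (bodyMatch === null) {
    return [];
  }
  // Step 2: collect the href of each <li><a ...> pair inside the body only.
  var linkRegex = /<li[^>]*>[^<]*<a[^>]*href="([^"]+)"/g;
  var result = [];
  var m;
  while ((m = linkRegex.exec(bodyMatch[1])) !== null) {
    result.push(m[1]);
  }
  return result;
}

// Illustrative document with one link in the head and one in the body.
var doc =
  '<html><head><ul><li><a href="/in-head">x</a></li></ul></head>\n' +
  '<body class="directory">\n' +
  '<ul><li><a href="/public/img/events/poster/test">test</a></li></ul>\n' +
  '</body></html>';

console.log(hrefsInBody(doc));
console.log(hrefsInBody('<p>no body here</p>'));
```

Only the link inside the body is returned for the first call, and the second call yields an empty array rather than an exception.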