I'm trying to make a rudimentary scraper for a subreddit I like.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document</title>
</head>
<body>
<h1>Place Holder</h1>
<p id="name"></p>
</body>
<script>
    // API for get requests
    let fetchRes = fetch('https://www.reddit.com/r/meme/top.json');
    fetchRes
        .then(res => res.json())
        .then(d => {
            console.log(d)
            // for testing purposes, use the following:
            // console.log(d.data.children[0].data.id);
            for (let i = 0; i < d.data.children.length; i++) {
                x = d.data.children[i].data.url
                document.getElementById("name").append("<br><img src='" + x + "'>")
                console.log(d.data.children[i].data.url);
            }
        })
</script>
</html>
If my code is:
document.getElementById("name").innerHTML = x
Then it will grab the latest post/image and place it perfectly into my page. HOWEVER, if I keep the .append, it just lists all the img urls instead of properly embedding them. What am I doing wrong?
Once you have the response, you can convert the d.data.children array into an array of img elements using the map array method. Each item of the array becomes a newly created img element whose src is set to the value of its data.url property.
Then just append that array of img elements to the target container. Note that I used the spread syntax to pass the array as a list of arguments.
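As for why the original code printed the markup instead of rendering it: append() inserts string arguments as plain text nodes and does not parse them as HTML, whereas element nodes are inserted as they are. The sketch below illustrates the difference. The helper name toImages and its doc parameter are hypothetical, introduced here only so the function can also be exercised outside a browser.

```javascript
// Hypothetical helper: turns the listing's children into <img> elements.
// `doc` defaults to the page's document; it is a parameter only so the
// function can be tried with a stand-in object outside a browser.
function toImages(children, doc = document) {
  return children.map(child => {
    const img = doc.createElement('img');
    img.src = child.data.url;
    return img;
  });
}

// In the page (assuming the container has id "name", as in the question):
//   const container = document.getElementById('name');
//   container.append("<img src='...'>");            // a string: shown as literal text
//   container.append(...toImages(d.data.children)); // nodes: rendered as images
```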
For the record, there is no vulnerability to exploit here, because we are creating plain img elements via document.createElement() and only setting their src property to a harmless string. In the worst-case scenario you'll get broken images, but nothing more.
But since I might be wrong and overlooking some scenarios, I also added a function that verifies the provided URL is legitimate and points at a plausible picture. If a URL is not considered safe, a fallback URL is used for the picture instead.
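// API for get requests
let fetchRes = fetch('https://www.reddit.com/r/meme/top.json');
fetchRes
  .then(res => res.json())
  .then(d => {
    // Turn every post into an <img> element
    const images = d.data.children.map(c => {
      const img = document.createElement('img');
      const src = c.data.url;
      // Use the post's url only if it passes the check, else the fallback picture
      if (isSafeUrl(src)) {
        img.src = src;
      } else {
        img.src = "./fallback.jpg";
      }
      return img;
    });
    // Spread the array so each element is passed as its own argument
    document.getElementById('container').append(...images);
  });

// Returns true only for http(s) urls whose path ends with an image extension
function isSafeUrl(url) {
  try {
    const parsedUrl = new URL(url);
    if (!['http:', 'https:'].includes(parsedUrl.protocol)) {
      return false;
    }
    return /\.(jpg|jpeg|png|gif|webp|bmp)$/i
      .test(parsedUrl.pathname);
  } catch (e) {
    // new URL() throws when the string is not a valid absolute url
    return false;
  }
}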
<body>
<h1>Place Holder</h1>
<p id="container"></p>
</body>
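To get a feel for what the check accepts and rejects, here is a small sketch that runs isSafeUrl over a handful of inputs. The function is copied from the snippet above so the block runs on its own; the sample URLs are made up for illustration and are not taken from the actual Reddit response.

```javascript
// Copy of isSafeUrl from the snippet above, so this block is self-contained.
function isSafeUrl(url) {
  try {
    const parsedUrl = new URL(url);
    if (!['http:', 'https:'].includes(parsedUrl.protocol)) {
      return false;
    }
    return /\.(jpg|jpeg|png|gif|webp|bmp)$/i.test(parsedUrl.pathname);
  } catch (e) {
    return false;
  }
}

// Hypothetical sample inputs, invented for this illustration.
const samples = [
  'https://i.redd.it/example.jpg',           // http(s) + image extension
  'https://i.redd.it/example.PNG?width=640', // query string is not part of pathname
  'https://www.reddit.com/gallery/abc123',   // no image extension
  "javascript:alert('XSS')",                 // wrong scheme
  'data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=', // wrong scheme
  'not a url',                               // new URL() throws
];

const results = samples.map(url => isSafeUrl(url));
console.log(results); // [true, true, false, false, false, false]
```

Note that the extension test also rejects legitimate posts whose url does not end in an image extension (galleries or videos, for example), which then get the fallback picture.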
Exploring potential security risks:
The problem with using an unknown string as the value of the src property of an <img> element is that a URL may also contain JavaScript code, which would be evaluated by the page in its own origin, thus delivering a so-called XSS attack:
https://owasp.org/www-community/attacks/xss/
Because it thinks the script came from a trusted source, the malicious script can access any cookies, session tokens, or other sensitive information retained by the browser and used with that site. These scripts can even rewrite the content of the HTML page.
How can a URL contain JavaScript? In two ways that I know of: via a javascript: URL or via a data: URL.
Both cases are correctly ignored by modern browsers. The first case is even invalid because, by specification, it doesn't apply to <img>.
However, the function isSafeUrl checks each URL and accepts it only if its scheme is http:// or https://, thus preventing both of those dangers.
Only one caveat remains: the server hosting those pictures can monitor the clients loading them, and there's nothing you can do to prevent that short of adding a proxy, either for caching or to decouple the identity of the actual user consuming your page from the requests.
What if the measures used here are not enough and you want more?
Although the measures taken so far are pretty solid, there's something more that could be added:
https://i.redd.it/
But I really think that would be going too far.
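If you did want to go that far anyway, one possible reading of the URL above is to accept images only from a known host. That reading is an assumption, and so are the names ALLOWED_HOSTS and isAllowedHost in this minimal sketch:

```javascript
// Assumption: only pictures served over https from these hosts are accepted.
const ALLOWED_HOSTS = ['i.redd.it'];

// Hypothetical stricter check, meant to be used in addition to isSafeUrl.
function isAllowedHost(url) {
  try {
    const { protocol, hostname } = new URL(url);
    // Compare the full hostname so that look-alikes such as
    // "i.redd.it.evil.example" are rejected.
    return protocol === 'https:' && ALLOWED_HOSTS.includes(hostname);
  } catch (e) {
    // Not a valid absolute URL
    return false;
  }
}
```

Comparing the parsed hostname, instead of testing whether the string merely starts with or contains the host, keeps look-alike domains out.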
Here's a snippet showing an attempt to inject the alert('XSS') JavaScript statement into the src of an <img>, via both a javascript: URL and a data: URL. Note that in the second case the code is encoded in base64. As you can see by clicking the buttons, nothing happens, whereas a successful injection would have shown an alert.
const threats = [
  "javascript:alert('XSS')",
  "data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4="
];
function inject(index) {
  const img = document.getElementById('bogus');
  img.src = threats[index];
}
button {
  cursor: pointer;
}
<button onclick="inject(0)">Inject javascript: URL</button>
<button onclick="inject(1)">Inject data: URL</button>
<img id="bogus">