
Save an image from a URL with cURL and file_put_contents in PHP


I want to save a picture from a remote server to my site. I write the text in the TinyMCE editor and insert an image hosted on the remote server there. Next, I need to save this picture to my own server. To do this, I extract the link to the picture from the text:

    preg_match('/<img(.*)src(.*)=(.*)"(.*)"/U', $text, $result);
    $url =  array_pop($result);

Next, I fetch the file with cURL and write it to my server with file_put_contents.

    $headers = array();
    $headers[] = 'Content-Type: image/jpeg';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL,  $url ) ;
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    $image = curl_exec($ch);
    curl_close($ch);

    file_put_contents('myfolder/image.jpg', $image);

As a result, instead of an image I get a text file 'myfolder/image.jpg', 16 kb in size, containing the text "Bad URL timestamp".

curl_getinfo returns [content_type] => text/plain [http_code] => 403

But if I assign $url manually for CURLOPT_URL, for example

    $url = 'https://scontent.ftbs4-1.fna.fbcdn.net/v/t1.0-9/39900479_1856467244440953_5986986678919626752_n.jpg?_nc_cat=0&oh=6262ebe636e7328f0471af2820fd4050&oe=5C03BEC7';

then the file is successfully copied.

curl_getinfo returns [content_type] => image/jpeg [http_code] => 200 

What am I doing wrong?

This is the content of $_POST:

    Array (
      [id] => 143
      [title] => Topic
      [description] => description
      [text] => <!DOCTYPE html> <html> <head> </head> <body> <p>Hello</p> <p><img src="https://scontent.ftbs4-1.fna.fbcdn.net/v/t1.0-9/39900479_1856467244440953_5986986678919626752_n.jpg?_nc_cat=0&amp;oh=6262ebe636e7328f0471af2820fd4050&amp;oe=5C03BEC7" alt="" width="776" height="776" /></p> </body> </html>
    )

Full PHP code:

    <?php
    //print_r($_POST);

    preg_match_all('/<img[^>]+>/i', $_POST['text'], $result);

    foreach ($result as $img_tag) {
        foreach ($img_tag as $tag) {
            preg_match('/<img(.*)src(.*)=(.*)"(.*)"/U', $tag, $regexResult);
            $img_link = array_pop($regexResult);
            $file_name = basename($img_link);

            //$img_link = 'https://scontent.ftbs4-1.fna.fbcdn.net/v/t1.0-9/39900479_1856467244440953_5986986678919626752_n.jpg?_nc_cat=0&oh=6262ebe636e7328f0471af2820fd4050&oe=5C03BEC7';

            $headers = array();
            $headers[] = 'Content-Type: image/jpeg';
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_URL, $img_link);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch, CURLOPT_HEADER, 0);
            curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
            curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
            $html = curl_exec($ch);
            curl_close($ch);

            $targetPath = '/folder/'.$_POST['id'].'/';

            file_put_contents($targetPath.$file_name, $html);
        }
    }
    ?>

Solution

  • In your $_POST, the value of the img src attribute arrives HTML-escaped: special characters such as & are encoded as &amp;.

    If you open this URL in the browser, you get the same error: https://scontent.ftbs4-1.fna.fbcdn.net/v/t1.0-9/39900479_1856467244440953_5986986678919626752_n.jpg?_nc_cat=0&amp;oh=6262ebe636e7328f0471af2820fd4050&amp;oe=5C03BEC7.

    You can reverse this escaping with html_entity_decode. If I change this line, the cURL request works:

    $img_link = html_entity_decode(array_pop($regexResult));
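
The fix above could be folded into the whole flow roughly like this. This is only a sketch: the helper names extractImageUrls, fileNameFromUrl and downloadImage are made up for illustration, the src pattern is a slightly stricter variant of the one in the question, and the HTTP status check and the use of parse_url for the file name are additions that the question does not contain.

```php
<?php
// Hypothetical helper (not from the question): return the decoded src
// of every <img> tag found in an HTML string.
function extractImageUrls(string $html): array
{
    $urls = [];
    // Same two-step idea as in the question: find the tags, then the src value.
    preg_match_all('/<img[^>]+>/i', $html, $tags);
    foreach ($tags[0] as $tag) {
        if (preg_match('/\bsrc\s*=\s*"([^"]*)"/i', $tag, $m)) {
            // Attribute values are HTML-escaped, so &amp; has to become & again.
            $urls[] = html_entity_decode($m[1]);
        }
    }
    return $urls;
}

// Hypothetical helper: build the local file name from the URL path only,
// so the query string (?_nc_cat=...) does not end up in the name.
function fileNameFromUrl(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    return basename(is_string($path) ? $path : '');
}

// Hypothetical helper: download $url and write it to $target,
// but only when the server answers with HTTP 200.
function downloadImage(string $url, string $target): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($body === false || $status !== 200) {
        return false; // e.g. the 403 "Bad URL timestamp" response
    }
    return file_put_contents($target, $body) !== false;
}
```

With the $_POST['text'] shown in the question, extractImageUrls should return the URL with plain & separators, which is the form that returned 200 when it was assigned by hand. Checking the status code before writing means an error page is not saved under a .jpg name.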