Search code examples
perlhttp-headerswgetlwpcontent-disposition

Using content-disposition with LWP::UserAgent on a Google Drive document


I'm trying to save a Google Drive shared file using its actual filename received from the server (content-disposition):

I tried to analyze the header:

use strict;
use warnings;
use Data::Dumper;
use LWP::UserAgent qw( );
my $str = 'https://drive.google.com/file/d/0B6vqTWO9kmdmdzk5ejhDSXgzMDg/view?usp=sharing';
$str =~ /file\/d\/(\w+)/;
my $url = 'https://drive.google.com/uc?export=download&id='.$1;
my $ua = LWP::UserAgent->new();
my $response = $ua->head($url)->{'_headers'};
print Dumper( $response );

I got this:

$VAR1 = bless( {
'client-ssl-cert-subject' => '/C=US/ST=California/L=Mountain View/O=Google Inc/CN=*.googleusercontent.com',
'connection' => 'close',
'date' => 'Mon, 19 Oct 2015 14:45:45 GMT',
'content-type' => 'text/html; charset=UTF-8',
'x-guploader-uploadid' => 'AEnB2UrQJIoJUIIhWnKz9HAlW_2XKApLe_0IDMZjS0gGQOMdRaF68Od2xsxssp7mBdQP9kNrjvDueWUP5pSa1eHbprSjbPvfbA',
'alternate-protocol' => '443:quic,p=1',
'expires' => 'Mon, 19 Oct 2015 14:45:45 GMT',
'::std_case' => {
   'client-ssl-socket-class' => 'Client-SSL-Socket-Class',
   'client-ssl-cert-subject' => 'Client-SSL-Cert-Subject',
   'client-ssl-cipher' => 'Client-SSL-Cipher',
   'client-peer' => 'Client-Peer',
   'x-guploader-uploadid' => 'X-GUploader-UploadID',
   'alternate-protocol' => 'Alternate-Protocol',
   'alt-svc' => 'Alt-Svc',
   'client-ssl-cert-issuer' => 'Client-SSL-Cert-Issuer',
   'client-date' => 'Client-Date',
   'client-response-num' => 'Client-Response-Num'
 },
'client-ssl-cert-issuer' => '/C=US/O=Google Inc/CN=Google Internet Authority G2',
'server' => 'UploadServer',
'client-date' => 'Mon, 19 Oct 2015 14:45:55 GMT',
'client-ssl-socket-class' => 'IO::Socket::SSL',
'client-ssl-cipher' => 'ECDHE-ECDSA-AES128-GCM-SHA256',
'client-peer' => '2a00:1450:400c:c08::84:443',
'alt-svc' => 'quic=":443"; p="1"; ma=604800',
'cache-control' => 'private, max-age=0',
'client-response-num' => 1
}, 'HTTP::Headers' );

I expected to find the content-disposition header within the above. On the other hand, wget correctly gives the filename:

#wget --content-disposition "https://drive.google.com/uc?export=download&id=0B6vqTWO9kmdmdzk5ejhDSXgzMDg"
--2015-10-19 20:21:37--  https://drive.google.com/uc?export=download&id=0B6vqTWO9kmdmdzk5ejhDSXgzMDg
Resolving drive.google.com (drive.google.com)... 2a00:1450:400c:c04::64, 74.125.206.139, 74.125.206.101, ...
Connecting to drive.google.com (drive.google.com)|2a00:1450:400c:c04::64|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-0s-2s-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/unidk5uvfpl9kut1rl3hb5lqcvis8vdq/1445263200000/06380472059566149580/*/0B6vqTWO9kmdmdzk5ejhDSXgzMDg?e=download [following]
Warning: wildcards not supported in HTTP.
--2015-10-19 20:21:37--  https://doc-0s-2s-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/unidk5uvfpl9kut1rl3hb5lqcvis8vdq/1445263200000/06380472059566149580/*/0B6vqTWO9kmdmdzk5ejhDSXgzMDg?e=download
Resolving doc-0s-2s-docs.googleusercontent.com (doc-0s-2s-docs.googleusercontent.com)... 2a00:1450:400c:c08::84, 74.125.140.132
Connecting to doc-0s-2s-docs.googleusercontent.com (doc-0s-2s-docs.googleusercontent.com)|2a00:1450:400c:c08::84|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16 [text/plain]
Saving to: ‘testdoc.txttestdoc.txt’

testdoc.txttestdoc.txt                              100%[====================================================================================================================>]      16  --.-KB/s   in 0s

2015-10-19 20:21:38 (550 KB/s) - ‘testdoc.txttestdoc.txt’ saved [16/16]

How can I get the correct filename from the server using perl?


Solution

  • The difference is that you're not doing a HEAD with wget. If you look at the status of the response in your Perl code you're getting

    503 Service Unavailable
    

    which could refer to many things, but in this case means that HEAD isn't supported. Change that for a GET and all is well

    use strict;
    use warnings;
    
    use feature 'say';
    
    use LWP::UserAgent;
    
    my $str = 'https://drive.google.com/file/d/0B6vqTWO9kmdmdzk5ejhDSXgzMDg/view?usp=sharing';
    die unless $str =~ m{file/d/(\w+)};
    
    my $url = "https://drive.google.com/uc?export=download&id=$1";
    
    my $ua = LWP::UserAgent->new;
    
    my $resp = $ua->get($url);
    
    say $resp->status_line;
    
    say $resp->header('Content-Disposition');
    

    output

    200 OK
    attachment;filename="testdoc.txt";filename*=UTF-8''testdoc.txt