I am trying to work on user input such as
to obtain an https link for the input, such as
I am trying to do this in the least manual, cleanest way possible, so I can upload my script somewhere and show it to people without being ashamed of its low quality. This means:
s/^http:/https:/
substitution. So far I have found two solutions, but each of them has flaws.
Run a parse query on {{canonicalurl:user_input_here}}, using the canonicalurl magic word. However, it gives only http links, not https.
```perl
#!/usr/bin/perl
use strict;
use warnings;
use MediaWiki::API;
use Data::Dumper;

my $mw = MediaWiki::API->new();
$mw->{config}->{api_url} = 'https://en.wikipedia.org/w/api.php';

# Ask the parser to expand the canonicalurl magic word for the page "Hello".
my $info_ref = $mw->api( {
    action => 'parse',
    prop   => 'text',
    text   => '{{canonicalurl:Hello}}',
} ) or die $mw->{error}->{code} . ': ' . $mw->{error}->{details};

# The expanded result comes back as rendered HTML.
my $html = $info_ref->{parse}{text}{'*'};
print Dumper $html;
```
Use an info query. However, it does not work for sections: for example, the input "Foo#bar" produces a URL linking to "Foo".
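```perl
#!/usr/bin/perl
use strict;
use warnings;
use MediaWiki::API;

my $mw = MediaWiki::API->new();
$mw->{config}->{api_url} = 'https://en.wikipedia.org/w/api.php';

# Return the full URL for a title using prop=info with inprop=url.
# iwurl=1 makes the API also return URLs for interwiki titles.
# (No empty prototype: "sub get_url_by_title()" would reject the argument.)
sub get_url_by_title {
    my $title = shift;
    my $info_ref = $mw->api( {
        action => 'query',
        prop   => 'info',
        inprop => 'url',
        iwurl  => 1,
        titles => $title,
    } ) or die $mw->{error}->{code} . ': ' . $mw->{error}->{details};

    if (exists $info_ref->{query}{pages}) {
        # pages is a hash keyed by page ID; only one title was requested.
        return (values %{ $info_ref->{query}{pages} })[0]->{fullurl};
    }
    elsif (exists $info_ref->{query}{interwiki}) {
        # interwiki is a list of { title, iw, url } entries.
        return $info_ref->{query}{interwiki}[0]{url};
    }
    return;
}
```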
A canonical URL is the form of URL that is canonical for the wiki. In Wikimedia's current configuration that is http (I wouldn't be surprised if that changes one day). What you can use instead is {{fullurl:Pagename}}. It returns a protocol-relative URL starting with "//" if both http and https are valid; otherwise it returns a normal, absolute URL.
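If you go the {{fullurl:...}} route, the only post-processing left is to give a protocol-relative result a scheme. A minimal Perl sketch, assuming you have already extracted the URL string from the parser output; the helper name `to_https` is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a {{fullurl:...}} result into an absolute https URL.
# A protocol-relative URL ("//host/path") means both http and https are valid,
# so prefixing "https:" is safe. Any other URL is already absolute and is
# returned unchanged, because the wiki did not offer https for it.
sub to_https {
    my ($url) = @_;
    return $url =~ m{^//} ? "https:$url" : $url;
}

print to_https('//en.wikipedia.org/wiki/Hello'), "\n";
# prints https://en.wikipedia.org/wiki/Hello
```

Unlike a blanket s/^http:/https:/, this only upgrades URLs that the wiki itself reports as valid under both schemes.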
The info query (your second method) may be better, since it does not invoke the parser (which is a little less work for the servers, although that doesn't really matter). It's always possible to just append the section anchor (or whatever the part after the # sign is called nowadays) to the end of the URL afterwards.
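A rough Perl sketch of that approach: split the user input at the first '#', look up the URL for the title part (for example with the info query from the question), then append the fragment. The helper names are hypothetical, and the anchor encoding is simplified to replacing spaces with underscores; MediaWiki's real anchor encoding also escapes other characters, so treat that part as an assumption to check against your wiki.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split "Foo#bar baz" into ("Foo", "bar baz").
# The fragment is undef when the input contains no '#'.
sub split_title_fragment {
    my ($input) = @_;
    my ($title, $fragment) = split /#/, $input, 2;
    return ($title, $fragment);
}

# Hypothetical helper: append a section anchor to a page URL.
# Simplification: only spaces are converted to underscores; MediaWiki's
# own anchor encoding handles more characters than this.
sub append_fragment {
    my ($url, $fragment) = @_;
    return $url unless defined $fragment && length $fragment;
    (my $anchor = $fragment) =~ s/ /_/g;
    return "$url#$anchor";
}

my ($title, $fragment) = split_title_fragment('Foo#bar baz');
# In the real script, $url would come from get_url_by_title($title);
# it is hard-coded here so the sketch runs without network access.
my $url = 'https://en.wikipedia.org/wiki/' . $title;
print append_fragment($url, $fragment), "\n";
# prints https://en.wikipedia.org/wiki/Foo#bar_baz
```

Splitting with a limit of 2 keeps any further '#' characters inside the fragment rather than discarding them.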