Can someone please explain the following sed command?
title=$(wget -q -O - https://twitter.com/intent/user?user_id=$ID | sed -n 's/^.*<title>\(.*\) on Twitter<.title>.*$/\1/p')
printf "%s\n" "$title"
I tried (and failed terribly) to adapt it, because I thought I understood what the code was doing. I modified it to the following:
data-user-id=$(wget -q -O - https://twitter.com/$Username | sed -n 's/^.*"data-user-id">\([^<]*\)<.*$/\1/p')
printf "%s\n" "$data-user-id"
It errors out, presumably because the syntax is wrong somewhere, but I'm trying to understand what is going on so I can write my own variant of it.
P.S. I can't just use the API for this due to how everything needs to be configured.
Give this a try:
wget -q -O - https://twitter.com/"${Username}" | sed -n '/data-screen-name=.'"${Username}"'".*data-user-id=/I {s/^.*data-screen-name=.'"${Username}"'".*data-user-id="\([0-9]*\)".*$/\1/Ip;q}'
128700677
data-user-id is present on several lines, so you need to select the line where data-screen-name matches the username.
sed uses regular expressions; there are 2 good tutorials to start with:
A different sed script with a different output:
Username="StackOverflow"
wget -q -O - https://twitter.com/"${Username}" | sed -n '/data-screen-name=.'"${Username}"'".*data-user-id=/I {p;q}'
data-screen-name="StackOverflow" data-name="Stack Overflow" data-user-id="128700677"
-n instructs sed not to print anything, except when the p command is used.
. means any single char.
* applies to the previous char in the regex and means zero or more occurrences of that char.
.* means zero or more of any char.
/data-screen-name=.'"${Username}"'".*data-user-id=/ selects lines that contain data-screen-name=, followed by any one char (.), then StackOverflow, then a " char, then zero or more of any char (.*), then data-user-id=.
The I after the closing / means ignore case.
{p;q} are the commands executed when the above regex matches.
p prints the current line.
q exits the sed script.
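To experiment with these pieces without hitting the network, here is a minimal sketch that runs the same address and {p;q} block against a few made-up lines. The sample variable and its contents are assumptions modeled on the output line shown above, not real page content, and the I modifier on the address is a GNU sed extension.

```shell
# Hypothetical sample input (not real page content), modeled on the output line above.
sample='<div data-screen-name="Other" data-user-id="111">
<div data-screen-name="StackOverflow" data-name="Stack Overflow" data-user-id="128700677">
<div data-screen-name="stackoverflow" data-user-id="999">'

# Lower case on purpose, to show that the I modifier ignores case.
Username="stackoverflow"

# -n: no automatic printing; p: print the matching line; q: stop after the first match.
matched=$(printf '%s\n' "$sample" |
  sed -n '/data-screen-name=.'"${Username}"'".*data-user-id=/I {p;q}')

printf '%s\n' "$matched"
```

Only the second sample line is printed: the first does not match the screen name, and q stops sed before it reaches the third.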
The first sed script at the top contains an additional s/regex/replacement/ to clean up the line.
The additional elements used:
^ means the start of the line.
$ means the end of the line.
\( ... \) are used to define a group.
"\([0-9]*\)" is a group made of only digits, surrounded by 2 " chars which are not part of the group. It is the first group found in the regex, so it can be referenced in the replacement part with \1.
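As a rough sketch of how the group and \1 work together, the substitution can be tried on a single line and its result stored in a variable, as the question attempts. The line variable is an assumption copied from the sample output above. Note that shell variable names cannot contain hyphens, so data-user-id=$(...) is not an assignment, and "$data-user-id" expands $data followed by the literal text -user-id; the name user_id is used here instead.

```shell
# Hypothetical input line, copied from the sample output shown earlier.
line='data-screen-name="StackOverflow" data-name="Stack Overflow" data-user-id="128700677"'

# Replace the whole line with the digits captured by the first group (\1).
# -n plus the p flag prints the line only if the substitution succeeded.
user_id=$(printf '%s\n' "$line" |
  sed -n 's/^.*data-user-id="\([0-9]*\)".*$/\1/p')

printf '%s\n' "$user_id"
```

Because ^.* and .*$ swallow everything before and after the quoted digits, the replacement leaves only the captured group on the line.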