I'm trying to extract information from a bunch of ovpn files in order to update my server list. I found a way to extract the information with sed and that all works, but I'm stuck when I try to extract the data needed to build the directory structure.
What I have is a set of files inside a folder, for example:
ch101.tcp443.ovpn
ch101.udp1194.ovpn
ch102.nordvpn.com.tcp443.ovpn
ch102.nordvpn.com.udp1194.ovpn
ch102.tcp443.ovpn
ch102.udp1194.ovpn
Now I want to extract information to build the directory structure, so I wrote a regex to extract all the info I need.
It works on all the files I have and takes the data from the file name. So from "ch101.udp1194.ovpn" it extracts "ch101" and "udp" into groups 1 and 2.
But when I try to make it work with sed, it fails. I tried to break it down into steps, but even with only the first group, looking for "ch101", it doesn't work:
echo 'ch101.udp1194.ovpn' | sed -rn 's/^([a-z\-]+\d{1,4})/\1/p'
What did I miss? I'm not a sed expert, but I have found similar expressions that work, while this one doesn't.
My final goal is to create a directory and store in it all the information I need, so:
for i in /opt/ovpn/*.ovpn ; do
    [ -f "$i" ] || continue
    FIRST_ARG=$(echo "$i" | sed ...) # extract ch101
    SECOND_ARG=$(echo "$i" | sed ...) # extract udp
    FIRST_ARG_TEXT=$(echo "$FIRST_ARG" | sed ...) # extract text from FIRST_ARG
    FIRST_ARG_NUM=$(echo "$FIRST_ARG" | sed ...) # extract num from FIRST_ARG
    FIRST_ARG_NUM_4FORMAT=$(printf '%04i\n' "$FIRST_ARG_NUM") # 4 digits for FIRST_ARG_NUM
    mkdir "/opt/somedir/$FIRST_ARG_TEXT$FIRST_ARG_NUM_4FORMAT$SECOND_ARG"
    cp ........
done
So from ch101.udp1194.ovpn I'll end up with a directory named
ch0101udp
Maybe it's not the best or cleanest way, but it seems simple to me and it's the most my knowledge can achieve.
Any idea or question is welcome.
PS: I'm on busybox 1.30, so this must be sh, not bash.
A couple of problems: sed does not support a lot of the character class escape sequences like \d, so you need to specify them as [0-9].
As well, you're trying to replace the matched sequence with itself, so there would be no change in the output. You need to have .* to catch the stuff around it.
Something like this would work for your first group:
sed -En 's/^([a-z\-]+[0-9]{1,4}).*/\1/p'
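The same idea can be stretched to pull out both pieces in one pass. This is only a sketch: split_name is a made-up helper name, and the pattern assumes every file name looks like the samples in the question (server name, optional extra dotted parts, protocol followed by a port, then .ovpn).

```shell
#!/bin/sh
# split_name is a hypothetical helper, named here only for illustration.
# It prints "<server> <protocol>" for a bare file name (no leading path).
# Assumption: names follow the pattern seen in the question's samples.
split_name() {
    printf '%s\n' "$1" |
        sed -En 's/^([a-z-]+[0-9]{1,4})\.(.*\.)?([a-z]+)[0-9]+\.ovpn$/\1 \3/p'
}

split_name 'ch101.udp1194.ovpn'             # prints: ch101 udp
split_name 'ch102.nordvpn.com.tcp443.ovpn'  # prints: ch102 tcp
```

The optional (.*\.)? group swallows any extra parts such as nordvpn.com, which is why the protocol ends up in \3 rather than \2. A name that does not fit the pattern prints nothing, because of -n together with the p flag.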
But really what you should be doing is using a proper program to do this. Not sure if it's available on Busybox but awk could do everything you're looking for:
echo 'ch101.udp1194.ovpn' | awk -F. '{a=$1; b=$(NF-1); gsub(/[0-9]/, "", a); gsub(/[0-9]/, "", b); gsub(/^[a-z-]+/, "", $1); printf("%s%04d%s\n", a, $1, b)}'
Output from your sample data:
ch0101tcp
ch0101udp
ch0102tcp
ch0102udp
ch0102tcp
ch0102udp
An explanation:
awk -F. '{
    a=$1;                          # assign the first field to a
    b=$(NF-1);                     # assign the second last field to b
    gsub(/[0-9]/, "", a);          # remove numbers from a
    gsub(/[0-9]/, "", b);          # remove numbers from b
    gsub(/^[a-z-]+/, "", $1);      # remove letters from the first field
    printf("%s%04d%s\n", a, $1, b) # output in desired format, one name per line
}'
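To tie this back to the loop in the question, here is a rough sketch for plain sh. The names dir_name and make_dirs are invented for the example, and what gets copied into each directory is left open because the question does not say. One detail to watch: the loop variable holds the full path, so the directory part has to be stripped before awk sees it, otherwise the first field would be /opt/ovpn/ch101 rather than ch101.

```shell
#!/bin/sh
# dir_name and make_dirs are hypothetical helpers for this sketch.

# Turn a bare file name such as ch101.udp1194.ovpn into ch0101udp.
dir_name() {
    printf '%s\n' "$1" | awk -F. '{
        a = $1; b = $(NF-1)
        gsub(/[0-9]/, "", a)        # letters of the server name
        gsub(/[0-9]/, "", b)        # protocol without the port
        gsub(/^[a-z-]+/, "", $1)    # server number
        printf("%s%04d%s\n", a, $1, b)
    }'
}

# Create one directory under $2 for every .ovpn file found in $1.
make_dirs() {
    for f in "$1"/*.ovpn; do
        [ -f "$f" ] || continue
        name=$(dir_name "${f##*/}")   # ${f##*/} drops the leading path
        mkdir -p "$2/$name"
        # copy whatever you need into "$2/$name" here
    done
}

# Example call, using the paths from the question:
# make_dirs /opt/ovpn /opt/somedir
```

mkdir -p is used so that the two files that map to the same name (ch102.tcp443.ovpn and ch102.nordvpn.com.tcp443.ovpn both give ch0102tcp) do not cause an error on the second pass.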