I'm not used to bash script syntax. I'm trying to read a file and, for each line, keep only the part of the string before the delimiter '/', then write it to a new file if the word has a particular length.

I've downloaded a dictionary, but its format does not meet my expectations. Since there are 84,000 words, I don't really want to manually remove what's after the '/' for each word. I thought it would be easy, and I followed a couple of ideas from similar questions on this site, but I seem to be missing something somewhere because it still doesn't work: I can't get the length right. The file Test_Input.txt contains one word per line. Here's the code:
```bash
#!/usr/bin/bash
filename="Test_Input.txt"
while read -r line
do
sub= echo $line | cut -d '/' -f1
length= echo ${#sub}
if $length >= 4 && $length <= 10;
then echo $sub >> Test_Output.txt
fi
done < "$filename"
```
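Several items:

- `sub= echo $line | cut -d '/' -f1` does not assign anything to `sub`. Bash treats `sub=` as a temporary, empty environment variable for the `echo` command, so the word is printed to the terminal and `sub` stays empty. To capture a command's output, use command substitution: either backticks or, preferably, `sub=$(...)`, as in `sub=$(echo "$line" | cut -d '/' -f1)`. The same problem affects `length= echo ${#sub}`.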
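- The condition of an `if` clause needs to be enclosed in single or double brackets, like this: `if [[ $length -ge 4 ]] && [[ $length -le 10 ]];`. Without them, bash tries to run `$length` as a command and reads `>` and `<` as redirections, not comparisons.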
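- `<=` and `>=` are not numeric comparison operators in `[ ]` or `[[ ]]` (inside `[[ ]]`, `<` and `>` compare strings, not numbers). Use `-ge` for "greater or equal" and `-le` for "less or equal". Alternatively, an arithmetic test such as `(( length >= 4 && length <= 10 ))` accepts the familiar operators.
- If a line does not contain any `/` characters, `sub` will contain the whole line in your version. This might not be what you want, so I'd advise also adding the `-s` flag to `cut`, which suppresses lines that contain no delimiter.
- You don't need `somevar=$(echo $someothervar)`. Just use `somevar=$someothervar`, for example `length=${#sub}`.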
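To see these points in action, here is a small demo script. It is a sketch that uses one made-up dictionary line, `apple/XY`, instead of the real file; `sub` and `length` match the question's names, while `plain` and `strict` are illustrative only.

```bash
#!/usr/bin/env bash
# Demo of the pitfalls above on one hypothetical dictionary line.
line="apple/XY"

# Wrong: `sub=` only sets an empty variable in echo's environment.
# The pipeline prints "apple" to the terminal; sub is never assigned here.
sub= echo "$line" | cut -d '/' -f 1
printf 'wrong form: sub=[%s]\n' "$sub"

# Right: command substitution stores the pipeline's output in sub.
sub=$(echo "$line" | cut -d '/' -f 1)
printf 'with $(...): sub=[%s]\n' "$sub"

# No echo is needed to get a string's length.
length=${#sub}

# Numeric tests: -ge/-le inside [[ ]], or ordinary operators inside (( )).
if [[ $length -ge 4 ]] && [[ $length -le 10 ]]; then
    echo "[[ ]]: length $length is within 4..10"
fi
if (( length >= 4 && length <= 10 )); then
    echo "(( )): same result"
fi

# Without -s, cut passes a line with no delimiter through unchanged;
# with -s, that line is suppressed and the result is empty.
plain=$(echo "noslash" | cut -d '/' -f 1)
strict=$(echo "noslash" | cut -s -d '/' -f 1)
printf 'without -s: [%s], with -s: [%s]\n' "$plain" "$strict"
```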
Here's a version that works:
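```bash
#!/usr/bin/env bash
filename="Test_Input.txt"

# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
while IFS= read -r line
do
    # -s: output nothing for lines without a '/'; -f 1: keep the part before the first '/'.
    sub=$(echo "$line" | cut -s -d '/' -f 1)
    length=${#sub}
    if [[ $length -ge 4 ]] && [[ $length -le 10 ]]; then
        echo "$sub"
    fi
done < "$filename" > Test_Output.txt   # opened once; overwritten on each run instead of appended to
```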
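If the per-line `echo | cut` pipeline turns out to be slow on the full 84,000-word file, bash's own parameter expansion can do the split without starting an external process for every line. This is a swapped-in technique, not part of the fix above; the function name `filter_words` is hypothetical and the sample words are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads word/flags lines on stdin and prints the part
# before the first '/' when it is 4 to 10 characters long.
filter_words() {
    local line word
    # `|| [[ -n $line ]]` also handles a last line with no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # Like `cut -s`: skip lines that contain no '/'.
        [[ $line == */* ]] || continue
        # ${line%%/*} deletes the longest suffix that starts with '/',
        # i.e. everything from the first '/' onwards.
        word=${line%%/*}
        if (( ${#word} >= 4 && ${#word} <= 10 )); then
            printf '%s\n' "$word"
        fi
    done
}

# Demo on made-up sample lines; only "apple" and "banana" pass the filter.
filter_words <<'EOF'
apple/XY
ab/C
noslash
hippopotamus/Z
banana/ABC
EOF
```

On the real file this would be called as `filter_words < Test_Input.txt > Test_Output.txt`. Since no subshell or `cut` process is started per line, it should scale better than the loop above on a large dictionary.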
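Of course, you could also just use `sed`:

```bash
sed -n -r '/^[^\/]{4,10}\// s;/.*$;;p' Test_Input.txt > Test_Output.txt
```

Explanation:

- `-n`: Don't print anything unless explicitly marked for printing.
- `-r`: Use extended regular expressions (`-E` also works, in both GNU and BSD sed).
- `/<searchterm>/ <operation>`: Select the lines that match a pattern and apply the operation to them:
  - `^[^\/]{4,10}\/`: from the beginning of the line, there should be between 4 and 10 non-slash characters, followed by a slash. The slash inside the brackets is escaped too, because GNU sed ends the address at the first unescaped `/`, even inside a bracket expression.
  - `s;/.*$;;p`: replace everything from the first slash to the end of the line with nothing, then print. Using `;` as the delimiter of the `s` command means the `/` in its pattern needs no escaping.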
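Before running on the whole dictionary, the `sed` filter can be checked on a few made-up lines. The sketch below uses the form `[^\/]`, escaping the slash inside the bracket expression, which GNU sed needs when `/` delimits the address. For comparison it also runs an `awk` filter, a swapped-in alternative that splits each line on `/`; `sample` is an illustrative variable name.

```bash
#!/usr/bin/env bash
# Made-up lines in the dictionary's word/flags format.
sample=$'apple/XY\nab/C\nnoslash\nhippopotamus/Z\nbanana/ABC\nhouse'

# The sed filter, reading stdin instead of Test_Input.txt.
printf '%s\n' "$sample" | sed -n -r '/^[^\/]{4,10}\// s;/.*$;;p'

# awk alternative: -F/ splits on '/', NF > 1 requires that a '/' is present,
# and length($1) checks the part before the first '/'.
printf '%s\n' "$sample" | awk -F/ 'NF > 1 && length($1) >= 4 && length($1) <= 10 { print $1 }'
```

Both commands print `apple` and `banana`: `ab` is too short, `hippopotamus` too long, and `noslash` and `house` contain no `/`.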