I generated this text file (leap.log) using sed and awk:
Template_frcmod
MASS
Pd 0.000 0.000
BOND
Pd-c
Pd-3e
c-Pd
4p-ca
o-3e
n-3e
Pd-4e
3p-ca
o-4e
n-4e
ANGLE
Pd-c-Pd
Pd-3e-o
Pd-3e-n
Pd-1c-Pd
c-Pd-4p
c-Pd-3e
c-Pd-1c
c-Pd-3p
c-Pd-4e
4p-ca-ca
4p-Pd-3e
4p-Pd-1c
o-3e-n
3e-n-c3
3e-Pd-1c
ca-4p-ca
Pd-4e-o
Pd-4e-n
1c-Pd-4e
3p-ca-ca
3p-Pd-4e
o-4e-n
4e-n-c3
ca-3p-ca
DIHE
Pd-4p-ca-ca
Pd-3e-n-c3
c-Pd-3e-o
c-Pd-3e-n
c-Pd-4e-o
c-Pd-4e-n
4p-Pd-3e-o
4p-Pd-3e-n
o-3e-n-c3
o-3e-Pd-1c
n-3e-Pd-1c
ca-4p-ca-ca
ca-ca-4p-ca
Pd-3p-ca-ca
Pd-4e-n-c3
1c-Pd-4e-o
1c-Pd-4e-n
3p-Pd-4e-o
3p-Pd-4e-n
o-4e-n-c3
ca-3p-ca-ca
ca-ca-3p-ca
IMPROPER
NONBON
Now I have a problem with one-character atom names. In this line:
c-Pd-4p
and in all other similar lines (those containing one-character atom names), "c" must become two characters, "c " (with a trailing space):
c -Pd-4p
Likewise, in this line:
4e-n-c3
"n" must become "n ":
4e-n -c3
and in this line, "Pd-c" must become "Pd-c ".
In short, every one-character atom name must be padded to two characters with a space.
When I try to change "c" to "c ", "1c" also becomes "1c ":
Pd-1c-Pd
--> Pd-1c -Pd
but I don't want to change two-character atom names; they must stay the same.
When I try this command:
awk 'BEGIN{FS="-"}{ if(length($2) == 1 ) $2= $2" " } {print $0}' leap.log
the "-" signs disappear. How can I append a space to every one-character atom name?
Expected results (the comments are just for this question; the real file will not have comments):
Template_frcmod
MASS
Pd 0.000 0.000
BOND
Pd-c #Also the last "c" must be "c "
Pd-3e
c -Pd
4p-ca
o -3e
n -3e
Pd-4e
3p-ca
o -4e
n -4e
ANGLE
Pd-c -Pd
Pd-3e-o
Pd-3e-n
Pd-1c-Pd
c -Pd-4p
c -Pd-3e
c -Pd-1c
c -Pd-3p
c -Pd-4e
4p-ca-ca
4p-Pd-3e
4p-Pd-1c
o -3e-n
3e-n -c3
3e-Pd-1c
ca-4p-ca
Pd-4e-o
Pd-4e-n
1c-Pd-4e
3p-ca-ca
3p-Pd-4e
o -4e-n
4e-n -c3
ca-3p-ca
DIHE
Pd-4p-ca-ca
Pd-3e-n -c3
c -Pd-3e-o #Also the last "o" must be "o "
c -Pd-3e-n #Also the last "n" must be "n "
c -Pd-4e-o #Also the last "o" must be "o "
c -Pd-4e-n #Also the last "n" must be "n "
4p-Pd-3e-o #Also the last "o" must be "o "
4p-Pd-3e-n #Also the last "n" must be "n "
o -3e-n -c3
o -3e-Pd-1c
n -3e-Pd-1c
ca-4p-ca-ca
ca-ca-4p-ca
Pd-3p-ca-ca
Pd-4e-n -c3
1c-Pd-4e-o
1c-Pd-4e-n
3p-Pd-4e-o
3p-Pd-4e-n
o -4e-n -c3
ca-3p-ca-ca
ca-ca-3p-ca
IMPROPER
NONBON
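Why the dashes disappear in the attempt from the question: in awk, assigning to any field (here $2) makes awk rebuild $0 from its fields joined by the output field separator OFS, which defaults to a single space. Lines whose $2 is not modified are printed unchanged, which is why only some lines lose their dashes. The small sketch below reproduces this on one sample line and shows that also setting OFS="-" keeps the dashes. It still only checks $2, though; the first and last names need a loop over all fields (1..NF), as in the second script below.

```shell
# Reproduce the attempt from the question on one sample line (4e-n-c3),
# where $2 ("n") has length 1 and therefore gets modified.
line='4e-n-c3'

# FS="-" only: modifying $2 rebuilds $0 joined by the default OFS (a single space)
no_ofs=$(printf '%s\n' "$line" | awk 'BEGIN{FS="-"}{ if(length($2) == 1) $2 = $2 " " } {print $0}')

# FS=OFS="-": the rebuilt record is joined by "-" again
with_ofs=$(printf '%s\n' "$line" | awk 'BEGIN{FS=OFS="-"}{ if(length($2) == 1) $2 = $2 " " } {print $0}')

printf 'FS only:    [%s]\n' "$no_ofs"
printf 'FS and OFS: [%s]\n' "$with_ofs"
```

The first line prints [4e n  c3] (dashes replaced by spaces), the second [4e-n -c3].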
Assumptions:
- all "-" delimited strings with length()==1 are to have a space (" ") appended to the end of the field

One awk idea (strips leading white space):
awk '
/-/ { n=split($1,arr,"-")                # split field #1 into arr[] array based on "-" delimiter
      x=delim=""
      for (i=1;i<=n;i++) {               # loop through array
          # piece together our new field
          x=x delim arr[i] ( length(arr[i]) == 1 ? " " : "")
          delim="-"
      }
      $1=x                               # replace field #1 with value in variable "x"
    }
1
' leap.log
Another awk idea (maintains leading white space):
awk '
BEGIN { FS=OFS="-" }                     # define input/output field delimiter == "-"
NF>1  { for (i=1;i<=NF;i++) {            # if more than one "-" delimited field then ...
            old=$i
            gsub(/ /,"",old)             # strip any (leading) spaces from field
            if (length(old) == 1)        # if length() == 1 then ...
                $i=$i " "                # append space to current field
        }
      }
1
' leap.log
These both generate:
Template_frcmod
MASS
Pd 0.000 0.000
BOND
Pd-c
Pd-3e
c -Pd
4p-ca
o -3e
n -3e
Pd-4e
3p-ca
o -4e
n -4e
ANGLE
Pd-c -Pd
Pd-3e-o
Pd-3e-n
Pd-1c-Pd
c -Pd-4p
c -Pd-3e
c -Pd-1c
c -Pd-3p
c -Pd-4e
4p-ca-ca
4p-Pd-3e
4p-Pd-1c
o -3e-n
3e-n -c3
3e-Pd-1c
ca-4p-ca
Pd-4e-o
Pd-4e-n
1c-Pd-4e
3p-ca-ca
3p-Pd-4e
o -4e-n
4e-n -c3
ca-3p-ca
DIHE
Pd-4p-ca-ca
Pd-3e-n -c3
c -Pd-3e-o
c -Pd-3e-n
c -Pd-4e-o
c -Pd-4e-n
4p-Pd-3e-o
4p-Pd-3e-n
o -3e-n -c3
o -3e-Pd-1c
n -3e-Pd-1c
ca-4p-ca-ca
ca-ca-4p-ca
Pd-3p-ca-ca
Pd-4e-n -c3
1c-Pd-4e-o
1c-Pd-4e-n
3p-Pd-4e-o
3p-Pd-4e-n
o -4e-n -c3
ca-3p-ca-ca
ca-ca-3p-ca
IMPROPER
NONBON
NOTE: with the first awk script, the entries under DIHE lose their leading white space.
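If you prefer the first approach but need to keep the leading white space, one option is to splice the rewritten token back into $0 with match() and substr() instead of assigning to $1, so awk never rebuilds the record with OFS. This is a sketch, assuming (as the first script does) that the "-" delimited names form the first non-blank token on each line; pad_atoms is a hypothetical wrapper name, not part of either answer.

```shell
# Hypothetical helper: pad one-character "-" delimited atom names with a trailing space,
# keeping leading white space and everything after the first token untouched.
pad_atoms() {
    awk '
    /-/ && match($0, /[^ \t]+/) {        # locate the first non-blank token (same text as $1)
        n = split(substr($0, RSTART, RLENGTH), arr, "-")
        x = delim = ""
        for (i = 1; i <= n; i++) {       # rebuild the token, padding 1-char names
            x = x delim arr[i] (length(arr[i]) == 1 ? " " : "")
            delim = "-"
        }
        # splice the new token back in rather than assigning $1,
        # which would rebuild $0 with OFS and drop the leading white space
        $0 = substr($0, 1, RSTART - 1) x substr($0, RSTART + RLENGTH)
    }
    1
    ' "$@"
}

# quick demo on two sample lines (the first one indented)
printf '   c-Pd-4e-n\nPd-1c-Pd\n' | pad_atoms
```

On the real file this would be called as pad_atoms leap.log > new.log (output file name is just an example). Lines without a "-", such as "Pd 0.000 0.000", pass through unchanged.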