I am trying to incorporate two functions into my awk command. I want to use tolower to add a lowercased copy of Col1 as a new Col2 (so the information in Col1 appears in two columns, Col1 and Col2, with the values in lower case in Col2), and I want to add a counter that runs from 1 to N, starting at the opening marker and ending at the closing marker that I have.
The data (tab-separated) currently looks like this:
<s>
He PRP -
could MD -
tell VB -
she PRP -
was VBD -
teasing VBG -
him PRP -
. . .
</s>
<s>
He PRP -
kept VBD -
his PRP$ -
eyes NNS -
closed VBD -
, , -
but CC -
he PRP -
could MD -
feel VB -
himself PRP -
smiling VBG -
. . .
</s>
The ideal output would be like this:
<s>
He he PRP 1
could could MD 2
tell tell VB 3
she she PRP 4
was was VBD 5
teasing teasing VBG 6
him him PRP 7
. . . 8
</s>
<s>
He he PRP 1
kept kept VBD 2
his his PRP$ 3
eyes eyes NNS 4
closed closed VBD 5
, , , 6
but but CC 7
he he PRP 8
could could MD 9
feel feel VB 10
himself himself PRP 11
smiling smiling VBG 12
. . . 13
</s>
The 2-step awk that I am trying, which does not work, is this:
Step 1:
awk '!NF{$0=x}1' input | awk '{$1=$1; print "<s>\n" $0 "\t.\n</s>"}' RS= FS='\n' OFS='\t-\n' > output
Here, I do not know how to turn the "-" into a counter.
Step 2 (which directly gives me an error):
awk '{print $1 "\t" '$1 = tolower($1)' "\t" $2 "\t" $3}' input > output
Any suggestions on 1. how to solve the lowercasing and the counter and 2. whether it is possible to combine these two steps?
Thank you in advance
I would do something like:
$ awk 'BEGIN{FS=OFS="\t"} NF>1{$1=$1 FS tolower($1); $4=++f} NF==1{f=0}1' file
<s>
He he PRP - 1
could could MD - 2
tell tell VB - 3
she she PRP - 4
was was VBD - 5
teasing teasing VBG - 6
him him PRP - 7
. . . . 8
</s>
<s>
He he PRP - 1
kept kept VBD - 2
his his PRP$ - 3
eyes eyes NNS - 4
closed closed VBD - 5
, , , - 6
but but CC - 7
he he PRP - 8
could could MD - 9
feel feel VB - 10
himself himself PRP - 11
smiling smiling VBG - 12
. . . . 13
</s>
That is, set $1 and $4 on the lines that are not <s> or </s> markers, and reset the counter otherwise (yes, I know it is resetting twice, but I cannot think of something neater right now). Then the final 1 prints each line normally.
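The same one-liner can be spread over several lines with comments, which may make each rule easier to follow. This is only a reformatted sketch of the command above; the wrapper name number_tokens and the short sample input are made up for illustration.

```shell
# number_tokens is a hypothetical wrapper name, used only for this illustration
number_tokens() {
  awk '
    BEGIN { FS = OFS = "\t" }    # read and write tab-separated fields
    NF > 1 {                     # token line (more than one field)
      $1 = $1 FS tolower($1)     # append a lowercased copy right after column 1
      $4 = ++f                   # store the running counter in a new field
    }
    NF == 1 { f = 0 }            # marker line (<s> or </s>): restart the counter
    1                            # always-true pattern: print the current line
  ' "$@"
}

# Shortened sample input, tab-separated like the data in the question
printf '<s>\nHe\tPRP\t-\ncould\tMD\t-\n</s>\n<s>\nHe\tPRP\t-\n</s>\n' | number_tokens
```

Because the rules NF > 1 and NF == 1 never match the same line, their order does not matter here.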
Note that you are playing a lot with print and the delimiters. It is best to just change the fields and let print happen automatically upon a true condition (1), using the given field separators. A kind of model-view-controller : )
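In the same spirit, if the counter should replace the "-" column instead of being appended after it (as in the ideal output in the question), it should be enough to assign to $3 rather than $4. This is a sketch under the assumption that every token line has exactly three tab-separated fields; the name replace_with_counter and the sample input are hypothetical.

```shell
# replace_with_counter is a hypothetical wrapper name, used only for this illustration
replace_with_counter() {
  awk '
    BEGIN { FS = OFS = "\t" }    # read and write tab-separated fields
    NF > 1 {
      $1 = $1 FS tolower($1)     # column 1 followed by its lowercased copy
      $3 = ++f                   # overwrite the third input field with the counter
    }
    NF == 1 { f = 0 }            # marker line: restart the counter
    1                            # print the line using OFS
  ' "$@"
}

# Shortened sample input, tab-separated like the data in the question
printf '<s>\nHe\tPRP\t-\n.\t.\t.\n</s>\n' | replace_with_counter
```

Only the fields are changed; the printing and the separators are still left to awk.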