I'd like to identify lines that match a certain pattern and move those entire lines to a specific part of a file, thereby rearranging the file contents. I would prefer a Notepad++ solution, but if you think that is too complex, then a Cygwin shell (awk) or JSFiddle solution also works.
I will make my point with the example below.
I have a pattern that is
"col<variable space>stat<variable space>col ( axx,bvb,ccc) on mr.dan" (<some word> confidence)
e.g.
"col stat col ( a123,b6949,c4433) on Mr.Randy" (Low confidence)
"col stat col ( a1fddf23, b6ff949,c4433 ) on John.Doe " (Low confidence)
"col stat col ( ax ) on John.Dane " (Ok confidence)
"col stat col ( axdf,fsdds ) on Jane.Dame " ( Fair confidence )
What it should do
1. Replace the (<word> confidence) part with a ";" at the end of the line (I can manage this part and don't need help here).
2. Rearrange the lines by the col ( axdf,fsdds ) part, which has the pattern col\s+(\s*word1\s*,\s*word2\s*,\s*wordN\s*)\s*on\s*word.word\s*
The lines need to be rearranged so that the ones with one word, col ( word ), come on top, followed by the ones with two words, col ( word1, word2 ), and so on, in ascending order of the number of words in the col ( word ) expression.
So the output of the above should be
col stat col ( ax ) on John.Dane ; # 1 word in col (word) expr
col stat col ( axdf,fsdds ) on Jane.Dame ; # 2 words in col (word) expr
col stat col ( a1fddf23, b6ff949,c4433 ) on John.Doe ; # 3 words in col (word) expr
col stat col ( a123,b6949,c4433) on Mr.Randy;
What I did
I could get the 1st part done using
"\s*\((\s*(\w+)*\s*Confidence\))
replace with ;
I need help with the 2nd part: rearranging the lines by the col ( word ) expression.
The logical pseudocode for Notepad++ would be to first isolate the word list in each of those col expressions in separate buffers. Next, you count the number of words in each buffer and sort the buffers. Based on the buffer order, you line up the expressions.
I am also open to JSFiddle or shell script regex / awk.
This can't be done with Notepad++. I suggest using a script; here is an example Perl script that does the job.
The whole file is read into memory, which will be a problem if the file is very large.
#!/usr/bin/perl
use Modern::Perl;

# Read input file into an array
my $file_in = 'file.txt';
open my $fh, '<', $file_in or die "unable to open '$file_in': $!";
my @lines = <$fh>;
close $fh;

# Replace everything from the last quote to the end of line with a semicolon,
# then remove the remaining quotes
my @unsorted = map { s/"[^"]*$/;/; s/"//g; $_ } @lines;

# Use a Schwartzian transform for sorting
my @sorted =
    # remove the number of words
    map { $_->[0] }
    # sort on number of words
    sort { $a->[1] <=> $b->[1] }
    # add number of words
    map {
        # list of words inside parentheses
        my ($words) = $_ =~ /\(([^)]+)\)/;
        # split to get the number of words
        my @w = split /,/, $words;
        # add this number as second element in array
        [$_, scalar @w]
    }
    @unsorted;

# Write into output file
my $file_out = 'file_out.txt';
open my $fh_out, '>', $file_out or die "unable to open '$file_out': $!";
say $fh_out $_ for @sorted;
close $fh_out;
Output file:
col stat col ( ax ) on John.Dane ;
col stat col ( axdf,fsdds ) on Jane.Dame ;
col stat col ( a123,b6949,c4433) on Mr.Randy;
col stat col ( a1fddf23, b6ff949,c4433 ) on John.Doe ;
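If Modern::Perl is not installed (it is a CPAN module, not part of core Perl), the counting and sorting step can be tried on its own with core pragmas only. The following is just a minimal sketch, not a replacement for the script above: the helper names count_words and sort_by_word_count are made up for illustration, the sample lines are the already cleaned lines from the question, and the original index is used as a second sort key so that lines with the same number of words keep their input order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: number of comma-separated words inside the first
# parenthesised list of a line; returns 0 if the line has no such list.
sub count_words {
    my ($line) = @_;
    my ($list) = $line =~ /\(([^)]*)\)/ or return 0;
    # count only the entries that contain a non-blank character
    return scalar grep { /\S/ } split /,/, $list;
}

# Hypothetical helper: sort lines by ascending word count; ties are
# broken by the original position, so equal counts keep input order.
sub sort_by_word_count {
    my @lines = @_;
    # pair each index with the word count of its line
    my @keyed = map { [ $_, count_words($lines[$_]) ] } 0 .. $#lines;
    return map { $lines[ $_->[0] ] }
           sort { $a->[1] <=> $b->[1] or $a->[0] <=> $b->[0] } @keyed;
}

# Sample data: the cleaned lines from the question (quotes and the
# confidence part already replaced by a semicolon)
my @sample = (
    'col stat col ( a123,b6949,c4433) on Mr.Randy;',
    'col stat col ( a1fddf23, b6ff949,c4433 ) on John.Doe ;',
    'col stat col ( ax ) on John.Dane ;',
    'col stat col ( axdf,fsdds ) on Jane.Dame ;',
);

print "$_\n" for sort_by_word_count(@sample);
```

With these four sample lines, this should print them in the same order as the output file shown above. A line without a parenthesised list gets a count of 0 and therefore sorts to the top instead of triggering an undefined-value warning.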