I am processing a whitelist in bash, and I would like to do the parsing with awk.
The whitelist.txt has a format like
module:function(arguments)
and in one line multiple arguments or multiple functions can be specified, in a comma-separated list format.
Multiple functions can be enclosed in square brackets, while arguments are enclosed in round brackets, like this:
some_module_name:[func1(arg1,arg2),func2]
Arguments can be enclosed in double quotes if they contain white space, like this:
some_module:[function-alpha(arg1,"argument 2 has spaces"),function-beta]
At the right-hand end of any line there can be some flags, like offline_install=true, paralell_run=true and so on (separated by white space). Those will be stored separately in an array afterwards.
How can I use awk to transform the input file format:
module1:function1(alpha,beta),function2 offline_install=true paralell_run=true
module2:[function-alpha(arg1,"argument 2 is inside quotes",arg3),function-beta]
into the output file format:
module1:function1(alpha)
module1:function1(beta)
module1:function2
module2:function-alpha(arg1)
module2:function-alpha("argument 2 is inside quotes")
module2:function-alpha(arg3)
module2:function-beta
Specifically, I want to:
Here's another example, if it helps. whitelist.txt:
control_service:(stop=apache,"disable=apache (everywhere)")
database:add_redo_log_groups
database:check_data_consistency paralell_run=true
sas_certificate
p/a/g/config.sh:[function1(status,"start firewall [1]","stop [all] firewall"),func2] offline_install=true paralell_run=true
control_server:stop,disable
database:kill_sessions paralell_run=true
my_module
my_module:[func1,func2]
output_file.txt:
control_service:(stop=apache)
control_service:("disable=apache (everywhere)")
database:add_redo_log_groups
database:check_data_consistency paralell_run=true
sas_certificate
p/a/g/config.sh:function1(status) offline_install=true paralell_run=true
p/a/g/config.sh:function1("start firewall [1]") offline_install=true paralell_run=true
p/a/g/config.sh:function1("stop [all] firewall") offline_install=true paralell_run=true
p/a/g/config.sh:func2 offline_install=true paralell_run=true
control_server:stop
control_server:disable
database:kill_sessions paralell_run=true
my_module
my_module:func1
my_module:func2
I've tried different approaches, but so far, I have not been able to generate the correct output. Any help with the awk script would be greatly appreciated.
Here is a start showing you how to encode/decode the problematic characters inside the quoted strings so you can then identify and/or split on those characters outside the quoted strings:
$ cat tst.awk
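{
    print "---------"
    printf "$0 = %s\n", $0

    # Everything before the first ":" is the module; the rest holds functions, args and flags.
    module = gensub(/:.*/,"",1,$0)
    fns_args = substr($0,length(module)+2)
    printf "module = %s\n", module
    printf "raw fns_args = %s\n", fns_args

    # Hide the special characters that appear inside quoted strings.
    encoded_fns_args = encode(fns_args)
    printf "encoded fns_args = %s\n", encoded_fns_args

    # Only bracketed function lists are handled so far:
    # a[1] is the list inside [...], a[2] is whatever follows it (the flags).
    if ( match(encoded_fns_args,/\[(.*)]\s*(.*)/,a) ) {
        encoded_args = a[2]
        decoded_args = decode(encoded_args)
        printf "decoded_args = %s\n", decoded_args

        # Splitting on every comma also separates the arguments of one function;
        # that still has to be dealt with in the full script.
        n = split(a[1],encoded_fns,/,/)
        for ( i=1; i<=n; i++ ) {
            encoded_fn = encoded_fns[i]
            decoded_fn = decode(encoded_fn)
            printf "decoded_fn = %s\n", decoded_fn
        }
    }
}

# Replace [ ] ( ) , and " inside each quoted string with @-codes.
function encode(str, a) {
    # Escape literal "@" first so the codes below stay unambiguous.
    gsub(/[@]/,"@A",str)
    # Each pass encodes one quoted string; its quotes become @G, so the loop ends.
    while ( match(str,/([^"]*)("[^"]*")(.*)/,a) ) {
        gsub(/[[]/,"@B",a[2])
        gsub(/[]]/,"@C",a[2])
        gsub(/[(]/,"@D",a[2])
        gsub(/[)]/,"@E",a[2])
        gsub(/[,]/,"@F",a[2])
        gsub(/["]/,"@G",a[2])
        str = a[1] a[2] a[3]
    }
    return str
}

# Undo encode(): restore the codes in reverse order, "@A" last.
function decode(str) {
    gsub(/@G/,"\"",str)
    gsub(/@F/,",",str)
    gsub(/@E/,")",str)
    gsub(/@D/,"(",str)
    gsub(/@C/,"]",str)
    gsub(/@B/,"[",str)
    gsub(/@A/,"@",str)
    return str
}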
$ awk -f tst.awk whitelist.txt
---------
$0 = control_service:(stop=apache,"disable=apache (everywhere)")
module = control_service
raw fns_args = (stop=apache,"disable=apache (everywhere)")
encoded fns_args = (stop=apache,@Gdisable=apache @Deverywhere@E@G)
---------
$0 = database:add_redo_log_groups
module = database
raw fns_args = add_redo_log_groups
encoded fns_args = add_redo_log_groups
---------
$0 = database:check_data_consistency paralell_run=true
module = database
raw fns_args = check_data_consistency paralell_run=true
encoded fns_args = check_data_consistency paralell_run=true
---------
$0 = sas_certificate
module = sas_certificate
raw fns_args =
encoded fns_args =
---------
$0 = p/a/g/config.sh:[function1(status,"start firewall [1]","stop [all] firewall"),func2] offline_install=true paralell_run=true
module = p/a/g/config.sh
raw fns_args = [function1(status,"start firewall [1]","stop [all] firewall"),func2] offline_install=true paralell_run=true
encoded fns_args = [function1(status,@Gstart firewall @B1@C@G,@Gstop @Ball@C firewall@G),func2] offline_install=true paralell_run=true
decoded_args = offline_install=true paralell_run=true
decoded_fn = function1(status
decoded_fn = "start firewall [1]"
decoded_fn = "stop [all] firewall")
decoded_fn = func2
---------
$0 = control_server:stop,disable
module = control_server
raw fns_args = stop,disable
encoded fns_args = stop,disable
---------
$0 = database:kill_sessions paralell_run=true
module = database
raw fns_args = kill_sessions paralell_run=true
encoded fns_args = kill_sessions paralell_run=true
---------
$0 = my_module
module = my_module
raw fns_args =
encoded fns_args =
---------
$0 = my_module:[func1,func2]
module = my_module
raw fns_args = [func1,func2]
encoded fns_args = [func1,func2]
decoded_args =
decoded_fn = func1
decoded_fn = func2
The above uses GNU awk for various extensions (gensub(), the third argument to match(), and \s in regexps) and is not intended to be the full script you need; it's just a [big] start showing you a way to solve the problem.
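As one possible way to get from that start to the requested output, here is a minimal sketch that swaps the encode/decode step for a character-by-character scan: it tracks whether it is inside double quotes or inside a (...) or [...] group, and only splits on separators found outside both. It sticks to POSIX awk features, so GNU awk is not required. The names expand_whitelist and split_top are made up for illustration, and the sketch assumes that the module name contains no colon, that flags are separated by single spaces, that quotes are balanced, and that flags are kept on every output line as in the second example of the question.

```shell
# Hypothetical helper: expand each whitelist line into one line per
# function/argument pair. Reads the named files, or stdin if none are given.
expand_whitelist() {
    awk '
    # Split s on the single character sep, but only where sep sits outside
    # double quotes and outside any (...) or [...] group. Returns the count.
    function split_top(s, sep, out,    n, i, c, inq, depth, cur) {
        split("", out)                       # clear the result array portably
        n = 0; cur = ""; inq = 0; depth = 0
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c == "\"") inq = !inq
            else if (!inq && (c == "(" || c == "[")) depth++
            else if (!inq && (c == ")" || c == "]")) depth--
            if (c == sep && !inq && depth == 0) { out[++n] = cur; cur = "" }
            else cur = cur c
        }
        out[++n] = cur
        return n
    }
    {
        p = index($0, ":")
        if (p == 0) { print; next }          # bare module name: nothing to expand
        module = substr($0, 1, p - 1)

        # The first top-level space ends the function spec; the rest are flags.
        n = split_top(substr($0, p + 1), " ", parts)
        spec = parts[1]
        flags = ""
        for (k = 2; k <= n; k++)
            if (parts[k] != "") flags = flags " " parts[k]

        if (spec ~ /^\[.*\]$/)               # drop the optional [ ... ] wrapper
            spec = substr(spec, 2, length(spec) - 2)

        nf = split_top(spec, ",", fns)       # one entry per function
        for (i = 1; i <= nf; i++) {
            fn = fns[i]
            q = index(fn, "(")
            if (q > 0 && fn ~ /\)$/) {       # function with an argument list
                name = substr(fn, 1, q - 1)
                na = split_top(substr(fn, q + 1, length(fn) - q - 1), ",", args)
                for (j = 1; j <= na; j++)
                    print module ":" name "(" args[j] ")" flags
            } else
                print module ":" fn flags
        }
    }' "$@"
}
```

Called as expand_whitelist whitelist.txt > output_file.txt, this is meant to reproduce the output of the second example. Escaped quotes inside arguments, tabs between flags, and nested argument lists are not handled, so treat it as a starting point rather than a finished parser.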