I would like to reformat a pure ASCII file, test.txt, containing (just a sample of 10 lines out of several hundred):
{0.91, 0.87, -69.79,
-0.3149, 0.05}, {0.9392,
1.089, 69, -0.31,
0.052}, {-0.8768, 0.7025,
69.80, -0.314, 0.053},
{0.930, -1.2638750861516, 69.79,
0.314, 0.05301}, {0.9367,
-1.368063705085268, 69.79962, -0.31,
0.052}, {0.946, -1.644,
69.7, 0.3, 0.052}
to a final file, test_processed.txt, containing (for the same sample):
0.91, 0.87, -69.79, -0.3149, 0.05
0.9392, 1.089, 69, -0.31, 0.052
-0.8768, 0.7025, 69.80, -0.314, 0.053
0.930, -1.2638750861516, 69.79, 0.314, 0.05301
0.9367, -1.368063705085268, 69.79962, -0.31, 0.052
0.946, -1.644, 69.7, 0.3, 0.052
That is, a plain CSV file, with each line containing exactly the five fields within the original pairs of matched braces.
I tried fiddling with gawk and regexes, but wasn't able to figure out how to manage this. I have the feeling that tweaking awk's RS and ORS variables might help, but I could not get any further...
Using gnu-awk, you may use this awk command, using RS to match anything between {...} and then remove the starting {, the ending } and newlines:
awk -v RS='{[^}]+}' 'RT{gsub(/^{|}$|\n */, "", RT); print RT}' file
0.91, 0.87, -69.79, -0.3149, 0.05
0.9392, 1.089, 69, -0.31, 0.052
-0.8768, 0.7025, 69.80, -0.314, 0.053
0.930, -1.2638750861516, 69.79, 0.314, 0.05301
0.9367, -1.368063705085268, 69.79962, -0.31, 0.052
0.946, -1.644, 69.7, 0.3, 0.052
How it works:
-v RS='{[^}]+}': Sets the record separator to a regex that matches {...}
RT: Checks that RT is not empty. RT is set to the string from the input that was matched by the RS pattern.
{...}: Is the action block in awk
gsub(/^{|}$|\n */, "", RT): Removes the starting {, the ending } and any line break followed by 0 or more spaces from RT
print RT: Prints the modified RT