I have a piece of data (data.txt) that, due to a user error, looks like this:
4,48
4485
4,49
4495
4,5
4505
4,51
4,6
11445
11,45
The pattern is this: whenever there is a comma, trailing 0s have been dropped. For example, 4450 was improperly changed to 4,45, 4600 to 4,6, and 11450 to 11,45.
So, two actions should be performed when a comma is found: remove the comma, and append 0s so that exactly three digits follow the position where the comma was.
The end result should be:
4480
4485
4490
4495
4500
4505
4510
4600
11445
11450
How could I use a regex with sed (or another program) to get this result?
My current workaround is to split the data into two files, dataa.txt:
4,48
4485
4,49
4495
4,5
4505
4,51
4,6
11445
11,45
and datab.txt:
4,5
4,6
For the first file:
$ sed -E 's/(\,[0-9][0-9])/\10/g;s/\,//g' dataa.txt
and for the second file:
$ sed -E 's/(\,[0-9])/\100/g;s/\,//g' datab.txt
Then, concatenate the files. It would be better to do this without these extra steps (splitting and concatenating).
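The two sed commands above can probably be folded into a single pass, so no splitting is needed, by anchoring each pattern to the end of the line so that it only matches its own digit count. A minimal sketch, assuming every comma line ends with one or two digits after the comma (as in the sample); fix_commas is a hypothetical helper name:

```shell
# fix_commas is a hypothetical name: it reads lines on stdin and writes the fixed lines to stdout.
fix_commas() {
  # One digit after the comma: drop the comma and append "00" (4,5 -> 4500).
  # Two digits after the comma: drop the comma and append "0" (4,48 -> 4480).
  # \100 and \10 are back-reference \1 followed by literal zeros.
  # Lines without a comma match neither pattern and pass through unchanged.
  sed -E 's/,([0-9])$/\100/; s/,([0-9]{2})$/\10/'
}

printf '%s\n' 4,48 4485 4,5 4,6 11445 11,45 | fix_commas
```

With these sample lines it should print 4480, 4485, 4500, 4600, 11445 and 11450, one per line.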
There are very good solutions using awk (thank you!), and one is reproduced below:
$ awk '{gsub(/,/, ""); printf "%.4s\n", $0 * 1000}' data.txt
But it does not work with 5-digit numbers (recognizable by the number of digits to the left of the comma): "%.4s" keeps only the first four characters, so 11445 becomes 1144 and 11,45 becomes 1145. Handling them would also require splitting the data.
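The truncation comes from treating each line as a number and cutting the printed result to four characters. A sketch of an awk alternative that works on the text instead: split each line on the comma and right-pad the part after it with zeros to three digits, so lines without a comma (including 5-digit ones like 11445) are never modified. fix_commas_awk is a hypothetical helper name:

```shell
# fix_commas_awk is a hypothetical name: it reads lines on stdin and writes the fixed lines to stdout.
fix_commas_awk() {
  # -F, splits each line on the comma; lines without one have NF == 1 and are printed as-is.
  # For "11,45": $1 = "11", $2 = "45"; "45" "000" gives "45000", substr keeps "450", result "11450".
  awk -F, 'NF == 2 { $0 = $1 substr($2 "000", 1, 3) } { print }'
}

printf '%s\n' 4,48 4485 4,5 11445 11,45 | fix_commas_awk
```

This should print 4480, 4485, 4500, 11445 and 11450, one per line.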
How could we achieve the end result, without splitting the data?
First make sure there are enough digits after the comma by appending 000. Then keep only the first three digits after the comma and remove the comma:
$ sed -r 's/(,.*)/\1000/; s/,(...).*/\1/' data.txt
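Note: in the replacement, \1000 is the back-reference \1 (the text matched by the first group, i.e. the comma and everything after it) followed by a literal 000. Since sed back-references are single digits (\1 to \9), it cannot be read as a reference to group 100.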
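To see the two steps separately, and to check the whole command against the expected result, the sample data can be fed in with printf instead of reading data.txt. A quick sketch; the comments show what GNU sed should print (-E is the more portable spelling of -r, and GNU sed accepts both):

```shell
# Step 1 alone: append "000" after the comma and whatever follows it.
printf '%s\n' 4,5 4,48 | sed -r 's/(,.*)/\1000/'
# 4,5000
# 4,48000

# Both steps on the sample data from the question.
printf '%s\n' 4,48 4485 4,49 4495 4,5 4505 4,51 4,6 11445 11,45 |
  sed -r 's/(,.*)/\1000/; s/,(...).*/\1/'
# 4480 4485 4490 4495 4500 4505 4510 4600 11445 11450 (one per line)
```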