I have the following text file with fields separated with spaces and in some lines some fields are empty.
DISTRIBUTION MINIMUM_SYSTEM_REQUIREMENTS BASED_ON IMAGE_SIZE LATEST_RELEASE_YEAR FOUNDER
------------------ ------------------------------------ ---------------- ------------------------ ----------------------- --------------------
Absolute Linux CPU: Intel/AMD 64bit RAM: 64 MB Slackware 2020 Absolute Linux Team
Alpine Linux RAM: 128MB BusyBox, musl 2022 LEAF Project members
antiX CPU: Intel/AMD X86, RAM 256MB 700 MB Base, 1GB Full 2020 Anticapitalista
I'd like to add ,
or |
as separator based on the beginning position of each header, since if I replace with a regex, more than 2 spaces with |
, then the empty fields become in incorrect position or I get incorrect number of fields.
What I tried so far is read the first line and store all headers within an array. But currently, I'm getting wrong headers position.
headers=("DISTRIBUTION" "MINIMUM SYSTEM REQUIREMENTS" "BASED ON" "IMAGE SIZE" "LATEST RELEASE YEAR" "FOUNDER")
firstline=$(head -n 1 input.txt)
for w in "${headers[@]}"; do
echo $firstline| grep -b -o "$w"
done
0:DISTRIBUTION
13:MINIMUM SYSTEM REQUIREMENTS
41:BASED ON
50:IMAGE SIZE
61:LATEST RELEASE YEAR
81:FOUNDER
I'm getting the headers positions 0,13,41,50,61,81
and the correct is 0,20,57,74,99,123
.
May be someone could help me with this in bash or with awk that I think could be easier but don't know how to implement it. Thanks
The output I'm looking for would be like below
DISTRIBUTION |MINIMUM SYSTEM REQUIREMENTS |BASED ON |IMAGE SIZE |LATEST RELEASE YEAR |FOUNDER
-------------------|------------------------------------|-- -------------|------------------------|-----------------------|--------------------
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB |Slackware | |2020 |Absolute Linux Team
Alpine Linux |RAM: 128MB |BusyBox, musl | |2022 |LEAF Project members
antiX |CPU: Intel/AMD X86, RAM 256MB | |700 MB Base, 1GB Full |2020 |Anticapitalista
UPDATE
Outputs for the kind solutions provided.
markp's output
DISTRIBUTION |MINIMUM_SYSTEM_REQUIREMENTS |BASED_ON |IMAGE_SIZE |LATEST_RELEASE_YEAR |FOUNDER
------------------|------------------------------------|----------------|------------------------|-----------------------|--------------------
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB | Slackware | | 2020 | Absolute Linux Team
Alpine Linux |RAM: 128MB | BusyBox, musl | | 2022 | LEAF Project members
antiX |CPU: Intel/AMD X86, RAM 256MB | | 700 MB Base, 1GB Full | 2020 | Anticapitalista
Ed Morton's output:
DISTRIBUTION |MINIMUM_SYSTEM_REQUIREMENTS |BASED_ON |IMAGE_SIZE |LATEST_RELEASE_YEAR |FOUNDER ||
------------------|------------------------------------|----------------|------------------------|-----------------------|--------------------||
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB | Slackware | | 2020 | Absolute Linux Te||
Alpine Linux |RAM: 128MB | BusyBox, musl | | 2022 | LEAF Project memb||s
antiX |CPU: Intel/AMD X86, RAM 256MB | | 700 MB Base, 1GB Full | 2020 | Anticapitalista||
kvantour's output
DISTRIBUTION |MINIMUM_SYSTEM_REQUIREMENTS |BASED_ON |IMAGE_SIZE |LATEST_RELEASE_YEAR |FOUNDER
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB | Slackware | | 2020 | Absolute Linux Tea|m
Alpine Linux |RAM: 128MB | BusyBox, musl | | 2022 | LEAF Project membe|r
antiX |CPU: Intel/AMD X86, RAM 256MB | | 700 MB Base, 1GB Full | 2020 | Anticapitalista
tshiono' output
DISTRIBUTION |MINIMUM SYSTEM REQUIREMENTS |BASED ON |IMAGE SIZE |LATEST RELEASE YEAR |FOUNDER
------------------|------------------------------------|-- -------------|------------------------|-----------------------|--------------------
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB | Slackware | | 2020 | Absolute Linux Team
Alpine Linux |RAM: 128MB | BusyBox, musl | | 2022 | LEAF Project members
antiX |CPU: Intel/AMD X86, RAM 256MB | | 700 MB Base, 1GB Full | 2020 | Anticapitalista
Differences in last field (note, didn't let me show the image, only the link)
One awk
idea where we replace characters with pipes:
awk '
function addpipe(line) {
offset=0
for (i=1;i<n;i++) { # loop through separator array using lengths to break line into chunks and piece back together with a pipe
line= substr(line,1,offset+length(a[i])) "|" substr(line,offset+length(a[i])+2)
offset=offset + length(a[i]) +1
}
print line
}
FNR==1 { header=$0; next }
FNR==2 { n=split($0,a,".") # split separator line on periods
addpipe(header)
}
{ addpipe($0) }
' file
This generates:
DISTRIBUTION |MINIMUM SYSTEM REQUIREMENTS |BASED ON |IMAGE SIZE |LATEST RELEASE YEAR |FOUNDER
------------------|------------------------------------|-- -------------|------------------------|-----------------------|--------------------
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB |Slackware | |2020 |Absolute Linux Team
Alpine Linux |RAM: 128MB |BusyBox, musl | |2022 |LEAF Project members
antiX |CPU: Intel/AMD X86, RAM 256MB | |700 MB Base, 1GB Full |2020 |Anticapitalista