I have a text file that looks like the one below. The fields are separated by spaces and, as you can see, the delimiter is sometimes doubled or tripled, so consecutive delimiters should be treated as a single delimiter. I also want to convert the date into the MySQL timestamp format.
889468 216 -rw-r--r-- 1 wls1 wls1 217868 Nov 1 00:42 /home/wls1/1800WLS610Entry_20191031194242110_C0NTRA.jpg
2889469 228 -rw-r--r-- 1 wls1 wls1 231092 Nov 1 01:21 /home/wls1/1800WLS610Entry_20191031202145570_FPP3360.jpg
2889471 196 -rw-r--r-- 1 wls1 wls1 197452 Nov 1 01:55 /home/wls1/1800WLS610Entry_20191031205544650_0NLY.jpg
2889470 196 -rw-r--r-- 1 wls1 wls1 199512 Nov 1 01:55 /home/wls1/1800WLS610Entry_20191031205544720_C0NTRACT.jpg
2889472 236 -rw-r--r-- 1 wls1 wls1 240152 Nov 1 01:57 /home/wls1/1800WLS610Entry_20191031205719060_KSK6973.jpg
2889473 232 -rw-r--r-- 1 wls1 wls1 236876 Nov 1 01:57 /home/wls1/1800WLS610Entry_20191031205748650_KSK6973.jpg
2889474 224 -rw-r--r-- 1 wls1 wls1 229292 Nov 1 04:22 /home/wls1/1800WLS610Entry_20191031232239000_0NLY.jpg
2889475 228 -rw-r--r-- 1 wls1 wls1 230476 Nov 1 04:28 /home/wls1/1800WLS610Entry_20191031232853120_0NLY.jpg
2889477 224 -rw-r--r-- 1 wls1 wls1 228708 Nov 1 04:31 /home/wls1/1800WLS610Entry_20191031231809320_C0NTRACT.jpg
2889476 216 -rw-r--r-- 1 wls1 wls1 219104 Nov 1 04:31 /home/wls1/1800WLS610Entry_20191031233143530_CTP75.jpg
I need to extract the full path of the file, the timestamp, and the username of the owner, so that the resulting file looks like the one below. The delimiter should be a single tab character, and the date field should be converted into a MySQL timestamp.
/home/wls1/1800WLS610Entry_20191031194242110_C0NTRA.jpg wls1 2019-11-01 00:42:00
/home/wls1/1800WLS610Entry_20191031202145570_FPP3360.jpg wls1 2019-11-01 01:21:00
/home/wls1/1800WLS610Entry_20191031205544650_0NLY.jpg wls1 2019-11-01 01:55:00
/home/wls1/1800WLS610Entry_20191031205544720_C0NTRACT.jpg wls1 2019-11-01 01:55:00
/home/wls1/1800WLS610Entry_20191031205719060_KSK6973.jpg wls1 2019-11-01 01:57:00
/home/wls1/1800WLS610Entry_20191031205748650_KSK6973.jpg wls1 2019-11-01 01:57:00
/home/wls1/1800WLS610Entry_20191031232239000_0NLY.jpg wls1 2019-11-01 04:22:00
/home/wls1/1800WLS610Entry_20191031232853120_0NLY.jpg wls1 2019-11-01 04:28:00
/home/wls1/1800WLS610Entry_20191031231809320_C0NTRACT.jpg wls1 2019-11-01 04:31:00
/home/wls1/1800WLS610Entry_20191031233143530_CTP75.jpg wls1 2019-11-01 04:31:00
To accomplish the above, I have been trying to use cat and cut as such:
cat text.txt | cut -d ' ' -f 12,25,27,28,29
I vary the argument to the -f option to tell cut which columns I want, but I see that it won't treat consecutive spaces as a single delimiter.
The above cat/cut statement yields the following:
1 217868 1 00:42
wls1 Nov 1 01:21 /home/wls1/1800WLS610Entry_20191031202145570_FPP3360.jpg
wls1 Nov 1 01:55 /home/wls1/1800WLS610Entry_20191031205544650_0NLY.jpg
wls1 Nov 1 01:55 /home/wls1/1800WLS610Entry_20191031205544720_C0NTRACT.jpg
wls1 Nov 1 01:57 /home/wls1/1800WLS610Entry_20191031205719060_KSK6973.jpg
wls1 Nov 1 01:57 /home/wls1/1800WLS610Entry_20191031205748650_KSK6973.jpg
wls1 Nov 1 04:22 /home/wls1/1800WLS610Entry_20191031232239000_0NLY.jpg
wls1 Nov 1 04:28 /home/wls1/1800WLS610Entry_20191031232853120_0NLY.jpg
wls1 Nov 1 04:31 /home/wls1/1800WLS610Entry_20191031231809320_C0NTRACT.jpg
wls1 Nov 1 04:31 /home/wls1/1800WLS610Entry_20191031233143530_CTP75.jpg
So, the above is a step in the right direction.
But notice that top line? The file size is one character less in that line and so it messed it up. Also, I am uncertain how to re-arrange the order of the columns and re-format the time stamp.
Thanks in advance for your help!
If you want to start with the provided file text.txt, please try the following:
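# Map month abbreviations to month numbers.
declare -A m2n=([Jan]=1 [Feb]=2 [Mar]=3 [Apr]=4 [May]=5 [Jun]=6 [Jul]=7 [Aug]=8 [Sep]=9 [Oct]=10 [Nov]=11 [Dec]=12)
while IFS= read -r line; do
    # Take everything from character 73 onward as the path (relies on fixed column widths).
    fname="$(cut -c 73- <<< "$line")"
    # Split on runs of whitespace: 4=owner, 7=month, 8=day, 9=HH:MM.
    read -r -a ary <<< "$line"
    # The listing has no year, so the current year is assumed.
    date=$(printf "%04d-%02d-%02d" "$(date +%Y)" "${m2n[${ary[7]}]}" "${ary[8]}")
    time="${ary[9]}:00"
    # Tabs separate the fields; a space joins date and time into one timestamp.
    printf "%s\t%s\t%s %s\n" "$fname" "${ary[4]}" "$date" "$time"
done < "text.txt"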
Result:
/home/wls1/1800WLS610Entry_20191031194242110_C0NTRA.jpg wls1 2019-11-01 00:42:00
/home/wls1/1800WLS610Entry_20191031202145570_FPP3360.jpg wls1 2019-11-01 01:21:00
/home/wls1/1800WLS610Entry_20191031205544650_0NLY.jpg wls1 2019-11-01 01:55:00
/home/wls1/1800WLS610Entry_20191031205544720_C0NTRACT.jpg wls1 2019-11-01 01:55:00
/home/wls1/1800WLS610Entry_20191031205719060_KSK6973.jpg wls1 2019-11-01 01:57:00
/home/wls1/1800WLS610Entry_20191031205748650_KSK6973.jpg wls1 2019-11-01 01:57:00
/home/wls1/1800WLS610Entry_20191031232239000_0NLY.jpg wls1 2019-11-01 04:22:00
/home/wls1/1800WLS610Entry_20191031232853120_0NLY.jpg wls1 2019-11-01 04:28:00
/home/wls1/1800WLS610Entry_20191031231809320_C0NTRACT.jpg wls1 2019-11-01 04:31:00
/home/wls1/1800WLS610Entry_20191031233143530_CTP75.jpg wls1 2019-11-01 04:31:00
Note that the columns are not visually aligned because the filenames vary in length.
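To illustrate why the loop above is not thrown off by repeated spaces, here is a small sketch in bash. The sample line and its padding are made up for illustration (the path `/home/wls1/sample.jpg` is hypothetical); the point is only that `read -a` splits on runs of whitespace, and that `cut` can be made to behave the same way if the spaces are squeezed with `tr -s` first.

```shell
#!/bin/bash
# A made-up line with runs of spaces, in the style of the listing above.
line=' 889468  216 -rw-r--r--   1 wls1     wls1       217868 Nov  1 00:42 /home/wls1/sample.jpg'

# read -a splits on runs of whitespace, so the field indices stay stable.
read -r -a ary <<< "$line"
echo "owner=${ary[4]} month=${ary[7]} day=${ary[8]} time=${ary[9]}"

# Alternative: strip leading spaces and squeeze repeated ones, then cut works.
# (A leading space would otherwise produce an empty first field.)
fields=$(sed 's/^ *//' <<< "$line" | tr -s ' ' | cut -d ' ' -f 5,8,9,10)
echo "$fields"
```

Both approaches assume that the owner name and the other leading fields contain no spaces.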
A potential problem with the script above is the year. The listing does not contain the year, so the script assumes the current one; you may need to add a conditional branch, especially when the listing spans a year boundary.
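One possible shape for such a branch is sketched below. The function name `infer_year` is hypothetical, and the logic rests on an assumption that may not hold for your data: every file is less than one year old, so a month later than the current month must belong to the previous year. The optional second and third arguments exist only so the function can be tried with a fixed "now".

```shell
#!/bin/bash
# infer_year MONTH [NOW_YEAR NOW_MONTH]
# Prints the year guessed for a listing entry whose month number is MONTH.
# Assumption: the file is less than one year old.
infer_year() {
    local month=$((10#$1))                       # force base 10 so "08" is not read as octal
    local now_year=${2:-$(date +%Y)}
    local now_month=$((10#${3:-$(date +%m)}))
    if (( month > now_month )); then
        echo $(( now_year - 1 ))                 # e.g. a Nov entry read in January
    else
        echo "$now_year"
    fi
}

infer_year 11 2020 1     # November entry, listing read in January 2020
infer_year 11 2019 11    # November entry, listing read in November 2019
```

In the loop, `"$(date +%Y)"` could then be replaced with `"$(infer_year "${m2n[${ary[7]}]}")"`.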
If you can go back to the original files and run the find command directly over them, please try this instead:
find /home/wls1 -type f -name "*.jpg" -printf "%p\t%u\t%TY-%Tm-%Td %TH:%TM:%.2TS\n"
which will bring you the desired output.
Hope this helps.