I have a log with millions of lines that look like this:
1482364800 bunch of stuff 172.169.49.138 252377 + many other things
1482364808 bunch of stuff 128.169.49.111 131177 + many other things
1482364810 bunch of stuff 2001:db8:0:0:0:0:2:1 124322 + many other things
1482364900 bunch of stuff 128.169.49.112 849231 + many other things
1482364940 bunch of stuff 128.169.49.218 623423 + many other things
It's so big that I can't really read it into memory for Python to parse, so I want to zgrep out only the items I need into another, smaller file, but I'm not very good with grep. In Python I would normally gzip.open(log.gz), then pull out data[0], data[4], data[5] to a new file, so my new file only has the epoch, IP, and date (the IP can be IPv4 or IPv6).
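For reference, this is roughly what I'd do in Python (a minimal sketch; log.gz and small.log are placeholder names, and the field indices assume the whitespace-separated layout of the sample lines above):

import gzip

# Stream the gzipped log line by line; nothing is loaded whole into memory.
with gzip.open("log.gz", "rt") as src, open("small.log", "w") as dst:
    for line in src:
        data = line.split()
        # data[0] = epoch, data[4] = IP (v4 or v6), data[5] = the trailing number
        dst.write(f"{data[0]} {data[4]} {data[5]}\n")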
Expected contents of the new file:
1482364800 172.169.49.138 252377
1482364808 128.169.49.111 131177
1482364810 2001:db8:0:0:0:0:2:1 124322
1482364900 128.169.49.112 849231
1482364940 128.169.49.218 623423
How do I do this with zgrep?
Thanks
I'm on OSX and maybe that is the issue, but I couldn't get zgrep to filter out columns (zgrep matches whole lines, it doesn't select fields), and zcat kept adding a .Z to the end of the .gz (the OSX zcat is the old compress(1) tool, which only looks for .Z files; gzcat or gzip -dc is the gzip-aware equivalent). Here's what I ended up doing:
awk '{print $1,$5,$6}' <(gzip -dc /path/to/source/Largefile.log.gz) | gzip > /path/to/output/Smallfile.log.gz
This let me extract the three columns I needed ($1, $5, $6 — the epoch, the IP, and the number in the sample above) from the large file into a small one while keeping both the source and the destination compressed.
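A quick sanity check is to decompress the first few lines of Smallfile.log.gz the same way (gzip -dc piped into head) and compare them against the expected output above.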