Tags: bash, awk, grep, cut

How to get the first n rows of a csv file with a specific column value?


I have a CSV file from Kaggle that looks like this:

ip,app,device,os,channel,click_time,attributed_time,is_attributed
83230,3,1,13,379,2017-11-06 14:32:21,,0
17357,3,1,19,379,2017-11-06 14:33:34,,1
35810,3,1,13,379,2017-11-06 14:34:12,,0
45745,14,1,13,478,2017-11-06 14:34:52,,0
161007,3,1,13,379,2017-11-06 14:35:08,,1
18787,3,1,16,379,2017-11-06 14:36:26,,0
103022,3,1,23,379,2017-11-06 14:37:44,,0
114221,3,1,19,379,2017-11-06 14:37:59,,0

Now I want to fetch the first 200 rows whose "is_attributed" is 1. How can I do that with "cut" and other utilities?


Solution

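  • When the column layout doesn't change (so "is_attributed" is always the last field), a simple regexp match is enough. The pattern keeps the header line, which starts with "ip,", and every row that ends in ",1"; head then keeps the header plus the first 200 matching rows. Replace file.csv with the name of your file:

    grep -E '^ip,|,1$' file.csv | head -n 201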
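  • If you would rather test the column itself than the end of the line, a field-based match with awk is one possible alternative to the grep pattern above. This is only a sketch under some assumptions: "is_attributed" is the eighth comma-separated field, no field contains an embedded comma, and at least one row is requested. The file sample.csv and the function name first_attributed are hypothetical, made up for illustration.

```shell
# Hypothetical sample file built from rows shown in the question.
cat > sample.csv <<'EOF'
ip,app,device,os,channel,click_time,attributed_time,is_attributed
83230,3,1,13,379,2017-11-06 14:32:21,,0
17357,3,1,19,379,2017-11-06 14:33:34,,1
35810,3,1,13,379,2017-11-06 14:34:12,,0
161007,3,1,13,379,2017-11-06 14:35:08,,1
EOF

# first_attributed FILE N (hypothetical helper, assumes N >= 1):
# print the header line, then the first N rows whose 8th field equals 1.
# awk exits as soon as N rows were printed, so the rest of the file is not read.
first_attributed() {
    awk -F, -v n="$2" '
        NR == 1 { print; next }                    # always keep the header
        $8 == 1 { print; if (++count >= n) exit }  # keep matches, stop after n
    ' "$1"
}

first_attributed sample.csv 200
```

    Unlike the grep version, this does not need head, because awk stops by itself once it has printed the requested number of matching rows.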