
Combining only specified data into a single csv file


I am familiar with combining csv files using cat, and with doing so while selecting specific rows.

What I need to know, though, is how to combine only specified columns, starting at a specified row, from each csv file. The csv files I am using are kind of wild, but they all share the same format. I have no control over how they are generated and need to figure out how to combine a couple hundred of them (hopefully not manually).

Example of the data:

| Column1      | Column3 | Column4      | Column5 | Column6      | Column7 | Column8 | Column9 | Column10     | Column11 |
|--------------|---------|--------------|---------|--------------|---------|---------|---------|--------------|----------|
| garbage data |         | garbage data | garbage |              |         |         | garbage |              |          |
| garbage data |         | garbage data |         |              |         |         |         |              |          |
| garbage data |         | garbage data |         |              |         |         |         |              |          |
| garbage data |         | garbage data |         |              |         |         |         |              |          |
| garbage data |         | garbage data |         | garbage      | garbage |         |         |              |          |
| garbage data |         | garbage data |         | good data 1  |         |         |         | good data 1  | garbage  |
| garbage data |         | garbage data |         | good data 2  |         |         |         | good data 2  | garbage  |
| garbage data |         | garbage data |         | good data 3  |         |         |         | good data 3  | garbage  |
| garbage data |         | garbage data |         | good data 4  |         |         |         | good data 4  | garbage  |
| garbage data |         | garbage data |         | good data 5  |         |         |         | good data 5  | garbage  |
| garbage data |         | garbage data |         | good data 6  |         |         |         | good data 6  | garbage  |
| garbage data |         | garbage data |         | good data 7  |         |         |         | good data 7  | garbage  |
| garbage data |         | garbage data |         | good data 8  |         |         |         | good data 8  | garbage  |
| garbage data |         | garbage data |         | good data 9  |         |         |         | good data 9  | garbage  |
| garbage data |         | garbage data |         | good data 10 |         |         |         | good data 10 | garbage  |

EDIT: The desired output is Columns 6 and 10, from row 6 (where "good data" begins) down to the end of each file (files are 1000 to 2000 rows each).

EDIT 2: Desired Output

| Column10     | Column6      |
|--------------|--------------|
| good data 1  | good data 1  |
| good data 2  | good data 2  |
| good data 3  | good data 3  |
| good data 4  | good data 4  |
| good data 5  | good data 5  |
| good data 6  | good data 6  |
| good data 7  | good data 7  |
| good data 8  | good data 8  |
| good data 9  | good data 9  |
| good data 10 | good data 10 |

All feedback is most welcome.


Solution

  • Use sed and cut:

    sed '1,6d' file | cut -f6,10
    
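    • sed '1,6d' deletes lines 1 through 6, so the output starts at line 7 (adjust the range if the good data starts on a different line)
    • cut -f6,10 extracts fields 6 and 10; cut splits on tabs by default, so add -d',' if the files are comma-separated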

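    To process all csv files in one go, use GNU sed's -s option so that line numbers restart for each file (without it, sed treats the files as one stream and only the first file loses its first six lines):

    sed -s '1,6d' *.csv | cut -f6,10 > output.csv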
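
cut always prints fields in the order they appear in the file, so it cannot produce the Column10-then-Column6 order shown in the desired output. An awk variant can do the row skip, the column selection and the reordering in one step. The following is only a sketch under some assumptions: the files are comma-separated, no field contains a quoted comma, and fields 6 and 10 are the ones wanted. The function name combine_columns is made up for illustration.

```shell
#!/bin/sh
# combine_columns OUT FILE...
# Hypothetical helper: skips the first 6 lines of every input file and
# writes fields 10 and 6 (in that order) to OUT, comma-separated.
# Assumes plain comma-separated input with no quoted, embedded commas.
combine_columns() {
    out=$1
    shift
    # FNR is the line number within the current file, so the skip is
    # applied to each file rather than once for the whole input.
    awk -F',' -v OFS=',' 'FNR > 6 { print $10, $6 }' "$@" > "$out"
}
```

It could be called as `combine_columns combined.txt *.csv`. Giving the output a name that does not match `*.csv` (or writing it to another directory) avoids the result being read back in as input on a later run.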