Search code examples
csvtextmergeterminal

MacOS - Combine multiple text files into one spreadsheet with one file per a column?


Using MacOS terminal, is there away to take a directory of text files and have them all combined into one spreadsheet (either CSV or Numbers format) so:

  • Every file is in a separate column
  • Every line of the txt file is in a separate row.
  • The files are placed in the spreadsheet in alphabetical order (using the first letter of file name of the text file).

Example 1: here is how my text files look before combining: Example of text files

Example 2: here is how my text files should look in a spreadsheet after combining: enter image description here

(These examples are a partial extract. I actually have 100s of files to combine).


Steps I have tried:

  1. I searched Stack Overflow for an answer, but all the other questions about this task use Python or Panda. I would prefer a solution that could be done directly from MacOS terminal without needing to install packages like Python or Panda.

  2. From researching, I believe the paste command could be used:

    paste -d '\t' *.txt > ^0-merged.csv

However, when I try this, it produces the following error message: paste: Too many open files. It also produces a CSV file that is completely blank.


Solution

  • Ruby is part of MacOS.

    Given:

    head -n 3 *.txt
    ==> GOOD THINGS IN LIFE.txt <==
    Art
    Fun
    Hugs
    
    ==> IN THE BACKYARD.txt <==
    Hose
    Tree
    Soil
    
    ==> KITCHEN CUPBOARD ESSENTIALS.txt <==
    Tea
    Rice
    Milk
    
    ==> KNITTING STITCHES.txt <==
    Rib
    Dip
    Seed
    
    # and the rest of your lines in each case...
    

    You can do:

    ruby -e '
    a=[]
    ARGV.sort.each{|fn|
        a<<[fn]+File.open(fn).read.split(/\R/)
    }
    a.transpose.each{|sa|
        puts sa.join(",")
    }
    ' *.txt
    

    Prints:

    GOOD THINGS IN LIFE.txt,IN THE BACKYARD.txt,KITCHEN CUPBOARD ESSENTIALS.txt,KNITTING STITCHES.txt
    Art,Hose,Tea,Rib
    Fun,Tree,Rice,Dip
    Hugs,Soil,Milk,Seed
    Earth,Fence,Salt,Tile
    Honor,Porch,Pesto,Linen
    Space,Patio,Flour,Cable
    Sport,Grass,Honey,Wicker
    Intelligence,Wading Pool,Baking Powder,Knotted Boxes
    Innovation,Welcome Mat,Vegetable Oil,Chinese Wave
    Confidence,Back Stoop,Tomato Paste,Checkerboard
    Good Deeds,Fruit Tree,Black Pepper,Herringbone
    Creativity,Downspout,Baking Soda,Stockinette
    Education,Birdbath,Ketchup,Garter
    Kindness,Terrace,Surer,Waffle
    Integrity,Planter,Sugar,Puri Ridge
    Faith,Carport,Coffee,Netted
    Friends,Flowerbed,Cinnamon,Elongated
    Respect,Shovel,Cheese,Farrow Rib
    People,Hedges,Bread,Plaited
    Yourself,Rocks,Olive Oil,Clamshell
    Happiness,Lawnmower,Crackers,Bamboo
    Heart,Hot Tub,Pasta,English Rib
    Religion,Garden,Scissors,Basket
    Wisdom,Stoop,Garlic,Raspberry
    

    If you want a 'proper' csv with quoted fields that works better with Excel, you can use the CSV module included with Ruby:

    ruby -r csv -e '
    a=[]
    ARGV.sort.each{|fn|
        a<<[fn]+File.open(fn).read.split(/\R/)
    }
    a=a.transpose
    puts CSV.generate(**{headers:true, quote_empty:true, force_quotes:true}){|csv|
        csv<<a[0]
        a[1..].each{|row|
            csv<<row
        }
    }
    ' *.txt
    

    Prints:

    "GOOD THINGS IN LIFE.txt","IN THE BACKYARD.txt","KITCHEN CUPBOARD ESSENTIALS.txt","KNITTING STITCHES.txt"
    "Art","Hose","Tea","Rib"
    "Fun","Tree","Rice","Dip"
    "Hugs","Soil","Milk","Seed"
    "Earth","Fence","Salt","Tile"
    "Honor","Porch","Pesto","Linen"
    "Space","Patio","Flour","Cable"
    "Sport","Grass","Honey","Wicker"
    "Intelligence","Wading Pool","Baking Powder","Knotted Boxes"
    "Innovation","Welcome Mat","Vegetable Oil","Chinese Wave"
    "Confidence","Back Stoop","Tomato Paste","Checkerboard"
    "Good Deeds","Fruit Tree","Black Pepper","Herringbone"
    "Creativity","Downspout","Baking Soda","Stockinette"
    "Education","Birdbath","Ketchup","Garter"
    "Kindness","Terrace","Surer","Waffle"
    "Integrity","Planter","Sugar","Puri Ridge"
    "Faith","Carport","Coffee","Netted"
    "Friends","Flowerbed","Cinnamon","Elongated"
    "Respect","Shovel","Cheese","Farrow Rib"
    "People","Hedges","Bread","Plaited"
    "Yourself","Rocks","Olive Oil","Clamshell"
    "Happiness","Lawnmower","Crackers","Bamboo"
    "Heart","Hot Tub","Pasta","English Rib"
    "Religion","Garden","Scissors","Basket"
    "Wisdom","Stoop","Garlic","Raspberry"
    

    Comments:

    It also puts the file name at the top of every column. Is there any way to omit the file name? Also, it seems to treat Uppercase A-Z and lowercase a-z as separate (e.g. so file names with A-Z will come first and then file names with a-z) Thanks!

    If you have files of different length, you can pad the end of the shorter files so that you still have a proper matrix to transpose:

    ruby -r csv -e '
    a=[]
    ARGV.sort_by{|s| s.downcase}.each{|fn|
        a<<File.open(fn).read.split(/\R/)
    }
    max_length=a.max_by{|sa| sa.length}.length
    a.each.with_index{|sa,i| 
        if sa.length<max_length then a[i].concat [""]*(max_length-sa.length) end }
    a=a.transpose
    puts CSV.generate(**{headers:true, quote_empty:true, force_quotes:true}){|csv|
        csv<<a[0]
        a[1..].each{|row|
            csv<<row
        }
    }
    ' *.txt