Search code examples
pythoncsvarcpy

How to copy unique individuals from a csv file into new files


I am very new to python so please bear with me. I have a csv file that looks like this:

Animal Locations.

I am trying to loop through the file and for every unique individual there are, create a new csv file and copy rows over. I was successful in doing this for one animal but I am having trouble creating the syntax for a more generic approach. Here is what I currently have:

import arcpy
import csv
from csv import DictReader

WS = arcpy.env.workspace = raw_input("Where if your workspace")
infile = raw_input("where is your file?") 
outfile = raw_input("What is your outfile name?") 
arcpy.env.overwriteOutput = True


with open(infile, "r") as csvFile, open(outfile, "w") as out, open("outfile2.csv", "w") as out2:
    reader = csv.DictReader(csvFile)
    writer = csv.writer(out)
    writer.writerow(reader.fieldnames)
    for row in reader:
         if row["Animal"] == "1":
            values = [row[field] for field in reader.fieldnames]
            writer.writerow(values)

Solution

  • To write each Animal into its own CSV file, you would need to open a different file for each type of animal. This could be done by using a dictionary to store the file object and csv writer object for each animal. At the end, this could then be used to close all the files correctly:

    import csv
    
    output_csvs = {}    # e.g. {'1' : [file_object, csv_object]}
    
    with open('input.csv', 'rb') as f_input:
        csv_reader = csv.reader(f_input)
        header = next(csv_reader)
    
        for row in csv_reader:
            animal = row[0]
    
            if animal in output_csvs:
                output_csvs[animal][1].writerow(row)
            else:
                f_output = open('animal_{}.csv'.format(animal), 'wb')
                csv_output = csv.writer(f_output)
                output_csvs[animal] = [f_output, csv_output]
                csv_output.writerow(header)
                csv_output.writerow(row)
    
    for csv_file, csv_writer in output_csvs.values():
        csv_file.close()
    

    This would give you a set of output CSV files named according to the animal, e.g. animal_1.csv


    Alternatively, if the data is small enough to be read into memory, it could be sorted by animal and output one block at a time by making use of Python's itertools.groupby() function:

    from itertools import groupby
    import csv
    
    with open('input.csv', 'rb') as f_input:
        csv_reader = csv.reader(f_input)
        header = next(csv_reader)
    
        for animal, group in groupby(sorted(csv_reader), lambda x: x[0]):
            with open('animal_{}.csv'.format(animal), 'wb') as f_output:
                csv_output = csv.writer(f_output)
                csv_output.writerow(header)
                csv_output.writerows(group)
    

    Using sorted() ensures that all animals of the same kind are grouped together. If this is already the case in the data, sorting is not needed.


    To access these files you could use glob.glob():

    import matplotlib.pyplot as plt            
    import glob
    
    for animal_filename in glob.glob('animal_*.csv'):
        with open(animal_filename, 'rb') as f_input:
            csv_input = csv.reader(f_input)
            heading = next(csv_input)
            x, y = [], []
    
            for row in csv_input:
                x.append(int(row[1]))
                y.append(int(row[2]))
    
            fig, ax = plt.subplots()
            plt.title(animal_filename)
            ax.scatter(x, y)
    
    plt.show()