Search code examples
pythoncsvpandasis-empty

Create empty csv file with pandas


I am interacting through a number of csv files and want to append the mean temperatures to a blank csv file. How do you create an empty csv file with pandas?

for EachMonth in MonthsInAnalysis:
    TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
    MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
    with open('my_csv.csv', 'a') as f:
        df.to_csv(f, header=False)

So in the above code how do I create the my_csv.csv prior to the for loop?

Just a note I know you can create a data frame then save the data frame to csv but I am interested in whether you can skip this step.

In terms of context I have the following csv files:

enter image description here

Each of which have the following structure:

enter image description here

The Day column reads up to 30 days for each file.

I would like to output a csv file that looks like this:

enter image description here

But obviously includes all the days for all the months.

My issue is that I don't know which months are included in each analysis hence I wanted to use a for loop that used a list that has that information in it to access the relevant csvs, calculate the mean temperature then save it all into one csv.

Input as text:

    Unnamed: 0  AirTemperature  AirHumidity SoilTemperature SoilMoisture    LightIntensity  WindSpeed   Year    Month   Day Hour    Minute  Second  TimeStamp   MonthCategorical    TimeOfDay
6   6   18  84  17  41  40  4   2016    1   1   6   1   1   10106   January Day
7   7   20  88  22  92  31  0   2016    1   1   7   1   1   10107   January Day
8   8   23  1   22  59  3   0   2016    1   1   8   1   1   10108   January Day
9   9   23  3   22  72  41  4   2016    1   1   9   1   1   10109   January Day
10  10  24  63  23  83  85  0   2016    1   1   10  1   1   10110   January Day
11  11  29  73  27  50  1   4   2016    1   1   11  1   1   10111   January Day

Solution

  • I would do it this way: first read up all your CSV files (but only the columns that you really need) into one DF, then make groupby(['Year','Month','Day']).mean() and save resulting DF into CSV file:

    import glob
    import pandas as pd
    
    fmask = 'MonthlyDataSplit/Day/Day*.csv'
    df = pd.concat((pd.read_csv(f, sep=',', usecols=['Year','Month','Day','AirTemperature']) for f in glob.glob(fmask)))
    df.groupby(['Year','Month','Day']).mean().to_csv('my_csv.csv')
    

    and if want to ignore the year:

    import glob
    import pandas as pd
    
    fmask = 'MonthlyDataSplit/Day/Day*.csv'
    df = pd.concat((pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob(fmask)))
    df.groupby(['Month','Day']).mean().to_csv('my_csv.csv')
    

    Some details:

    (pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob('*.csv'))
    

    will generate tuple of data frames from all your CSV files

    pd.concat(...)
    

    will concatenate them into resulting single DF

    df.groupby(['Year','Month','Day']).mean()
    

    will produce wanted report as a data frame, which might be saved into new CSV file:

    .to_csv('my_csv.csv')