I have a list of CSV filenames (which is in another CSV file called CSV_file_1). However, I want to put additional two columns in CSV_file_1, in which the row values will come from the thousands of individual CSV files.
CSV_file_1 contents are as follows:
1.csv
2.csv
3.csv
In the thousands of files I have in another folder, it contains values that I want to put in CSV_file_1. Example, 1.csv contains the following rows:
LATITUDE : ;13.63345
LONGITUDE : ;123.207083
2.csv contains the following rows:
LATITUDE : ;13.11111
LONGITUDE : ;123.22222
3.csv contains the following rows:
LATITUDE : ;13.22222
LONGITUDE : ;123.11111
and so on.
The result that I want to have for CSV_file_1 is as follows:
FILENAME: LATITUDE: LONGITUDE:
1.csv 13.63345 123.207083
2.csv 13.11111 123.22222
3.csv 13.22222 123.11111
I already managed to have my CSV_file_1 but without the LATITUDE AND LONGITUDE yet (which will come from individual files which are delimited as shown above).
My code is like this:
import pandas as pd
import glob
print(glob.glob("D:/2021/*.csv"))
#list of all the filenames collated and put in CSV_file_1
CSV_file_1 = pd.DataFrame(glob.glob("D:/2021/*.csv"))
#creating blank columns in CSV_file_1
CSV_file_1 ['Latitude'] = ""
CSV_file_1 ['Longitude'] = ""
#here im trying to access each file in the given folder(file name must correspond to the row in CSV_file_1), extract the data (latitude and longitude) and copy it to CSV_file_1
import csv
with open('D:/2021/*.csv','rt')as file:
data = csv.reader(file)
for row in file:
if glob.glob("D:/2021/*.csv") = CSV_file_1['FILENAME']:
CSV_file_1.iloc[i] ['LATITUDE:'] ==file.iloc[i]
CSV_file_1.to_csv('D:/2021/CSV_file_1.csv', index = False)
but I get invalid syntax.
if glob.glob("D:/2021/*.csv") = CSV_file_1['FILENAME']:
^
SyntaxError: invalid syntax
I am a python newbie so I would like to seek help to fix my code.
If I understand your problem correctly I think your approach is a little bit complex. I implemented a script that is creating the desired output.
First, the CSV file with the names of the other files is read directly into the first column of the data frame. Then, the file names are used to extract the longitude and latitude from each file. For this, I created a function, which you can see in the first part of the script. In the end, I add the extracted values to the data frame and store it in a file in the desired format.
import pandas as pd
import csv
# Function that takes
def get_lati_and_long_from_csv(csv_path):
with open(csv_path,'rt') as file:
# Read csv file content to list of rows
data = list(csv.reader(file, delimiter =';'))
# Take values from row zero and one
latitude = data[0][1]
longitude = data[1][1]
return (latitude, longitude)
def main():
# Define path of first csv file
csv_file_1_path = "CSV_file_1.csv"
# Read data frame from csv file and create correct column name
CSV_file_1 = pd.read_csv(csv_file_1_path, header=None)
CSV_file_1.columns = ['FILENAME:']
# Create list of files to read the coordinates
list_of_csvs = list(CSV_file_1['FILENAME:'])
# Define empty lists to add the coordinates
lat_list = []
lon_list = []
# Iterate over all csv files and extract longitude and latitude
for csv_path in list_of_csvs:
lat, lon = get_lati_and_long_from_csv(csv_path)
lat_list.append(lat)
lon_list.append(lon)
# Add coordinates to the data frame
CSV_file_1['Latitude:'] = lat_list
CSV_file_1['Longitude:'] = lon_list
# Save final data frame to csv file
CSV_file_1.to_csv(csv_file_1_path+'.out', index = False, sep='\t')
if __name__ == "__main__":
main()
Test input file content:
1.csv
2.csv
3.csv
Test output file content:
FILENAME: Latitude: Longitude:
1.csv 13.63345 123.207083
2.csv 13.11111 123.22222
3.csv 13.22222 123.11111
EDIT:
If your files do not contain any other data, I would suggest simplifying things and removing pandas as it is not needed. The following main()
function produces the same result but uses only the CSV module.
def main():
# Define path of first csv file
csv_file_1_path = "CSV_file_1.csv"
# Read file to list containing the paths of the other csv files
with open(csv_file_1_path,'rt') as file:
list_of_csvs = file.read().splitlines()
print(list_of_csvs)
# Define empty lists to add the coordinates
lat_list = []
lon_list = []
# Iterate over all csv files and extract longitude and latitude
for csv_path in list_of_csvs:
lat, lon = get_lati_and_long_from_csv(csv_path)
lat_list.append(lat)
lon_list.append(lon)
# Combine the three different lists to create the rows of the new csv file
data = list(zip(list_of_csvs, lat_list, lon_list))
# Create the headers and combine them with the other rows
rows = [['FILENAME:', 'Latitude:', 'Longitude:']]
rows.extend(data)
# Write everything to the final csv file
with open(csv_file_1_path + '.out','w') as file:
csv_writer = csv.writer(file, dialect='excel', delimiter='\t')
csv_writer.writerows(rows)