I have a ~4GB pipe-delimited txt file that I am trying to import into MongoDB, but mongoimport only accepts JSON, CSV, and TSV files. Below is my code so far.
import pandas as pd
import csv
from pymongo import MongoClient
url = "mongodb://localhost:27017"
client = MongoClient(url)
# Creating Database Office
db = client.Office
# Creating Collection Customers
customers = db.Customers
filename = "Names.txt"
data_df = pd.read_fwf(filename, sep="|", engine="python", encoding="latin-1")
fileout = "Names.csv"
output = data_df.to_csv(fileout, sep=",")
print("Finished")
fin = open("Names.csv", "r")
file_csv = csv.reader(fin)  # csv.reader needs the file object, not a string
customers.insert_many(file_csv)
The input file "Names.txt" looks like this:
Reg|Name|DOB|Friend|Nationality|Profession
1122|Sam|01/01/2001|John|USA|Lawyer
2456|George|05/10/1999|Pit|Canada|Engineer
5645|Brad|02/06/2000|Adam|UK|Doctor
The requirement is: if the provided text file is already a CSV, import it into MongoDB directly; if it is pipe delimited (or uses any other delimiter), convert it to a CSV first and only then import it (a delimiter-detection sketch follows the table below). When I import the CSV file I get in fileout into MongoDB manually, the result looks like this:
col1  col2
id    Reg|Name|DOB|Friend|Nationality|Profession
1     1122|Sam|01/01/2001|John|USA|Lawyer
2     2456|George|05/10/1999|Pit|Canada|Engineer
3     5645|Brad|02/06/2000|Adam|UK|Doctor
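Since the input's delimiter is not known in advance, Python's csv.Sniffer can guess it from a sample of the file, which covers the "CSV or pipe or any other delimiter" requirement. A minimal sketch, assuming the same Names.txt as above:

import csv

filename = "Names.txt"  # same input file as above

with open(filename, "r", encoding="latin-1", newline="") as f:
    sample = f.read(4096)  # a few KB is enough for sniffing
    dialect = csv.Sniffer().sniff(sample, delimiters=",|\t;")
    print("Detected delimiter:", repr(dialect.delimiter))

The detected delimiter can then be passed to csv.DictReader, or to pd.read_csv via sep=dialect.delimiter.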
What I want to achieve is shown below. I produced that result with two sed commands (a streaming Python equivalent is sketched after the table). First I replaced any "," already present in the txt file with "-", so existing commas survive the conversion:
sed -i 's/,/-/g' Names.txt
then I replaced the pipe delimiter with ",":
sed -i 's/|/,/g' Names.txt
col1  col2  col3    col4        col5    col6         col7
id    Reg   Name    DOB         Friend  Nationality  Profession
1     1122  Sam     01/01/2001  John    USA          Lawyer
2     2456  George  05/10/1999  Pit     Canada       Engineer
3     5645  Brad    02/06/2000  Adam    UK           Doctor
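For reference, the same two substitutions can be done in Python without sed, streaming line by line so the ~4GB file is never held in memory at once. A sketch, assuming the same filenames as above; unlike sed -i, it writes a converted copy instead of editing in place:

with open("Names.txt", "r", encoding="latin-1") as src, \
        open("Names.csv", "w", encoding="latin-1") as dst:
    for line in src:
        line = line.replace(",", "-")      # sed -i 's/,/-/g'
        dst.write(line.replace("|", ","))  # sed -i 's/|/,/g'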
I know that the code is not doing anything, but I can't figure out how to make it work.
I am new to programming in general, and I have searched through various answers to this and related questions on the site, but none fits my needs.
UPDATE
import csv
import json
from pymongo import MongoClient

url = "mongodb://localhost:27017"
client = MongoClient(url)
db = client.Office
customer = db.Customer
jsonArray = []

# Read every pipe-delimited row into a list of dicts
with open("Names.txt", "r") as csv_file:
    csv_reader = csv.DictReader(csv_file, dialect='excel', delimiter='|', quoting=csv.QUOTE_NONE)
    for row in csv_reader:
        jsonArray.append(row)

# Serialize the whole array at once, then insert it
jsonString = json.dumps(jsonArray, indent=1, separators=(",", ":"))
jsonfile = json.loads(jsonString)
customer.insert_many(jsonfile)
This is the new code I came up with after getting some ideas from the comments. But now the only problem is that I get this error:
Traceback (most recent call last):
File "E:\Anaconda Projects\Mongo Projects\Office Tool\csvtojson.py", line 16, in <module>
jsonString = json.dumps(jsonArray, indent=1, separators=(",", ":"))
File "C:\Users\Predator\anaconda3\lib\json\__init__.py", line 234, in dumps
return cls(
File "C:\Users\Predator\anaconda3\lib\json\encoder.py", line 201, in encode
chunks = list(chunks)
MemoryError
Finally found the solution.
The MemoryError above came from json.dumps building the entire multi-gigabyte array as one giant string; dumping one row at a time avoids that. I tested the code below on a 5GB file, and although slow it still works. It imports all data from a pipe-delimited txt file into MongoDB.
import csv
import json
from pymongo import MongoClient

url_mongo = "mongodb://localhost:27017"
client = MongoClient(url_mongo)
db = client.Office
customer = db.Customer
jsonArray = []
file_txt = "Text.txt"
rowcount = 0

# Read every pipe-delimited row into a list of dicts
with open(file_txt, "r") as txt_file:
    csv_reader = csv.DictReader(txt_file, dialect="excel", delimiter="|", quoting=csv.QUOTE_NONE)
    for row in csv_reader:
        rowcount += 1
        jsonArray.append(row)

# Dump and insert one row at a time instead of serializing the whole array
for i in range(rowcount):
    jsonString = json.dumps(jsonArray[i], indent=1, separators=(",", ":"))
    jsonfile = json.loads(jsonString)
    customer.insert_one(jsonfile)
print("Finished")
Thank you all for your ideas!