Consider the following file (stock_prices.csv) in CSV format:
Symbol,Price
RY,96.61
NA,58.69
BNS,80.35
When using the odo function in Python to insert the CSV file into a SQLite database table, the NA ticker appears to be inserted as a None value.
from odo import odo, dshape
input_csv = 'stock_prices.csv'
output_sqlite = 'sqlite:///stocks.db::stock_prices'
ds = dshape('var * {Symbol: string, Price: float64}')
odo(input_csv, output_sqlite, dshape=ds)
Here is the code I used to query the SQLite database.
DB_PATH = 'stocks.db'
cn = sqlite3.connect(DB_PATH)
c = cn.cursor()
c.execute("SELECT * FROM stock_prices")
for row in c.fetchall():
print(row)
Results were as follows:
('RY', 96.61)
(None, 58.69)
('BNS', 80.35)
While I could update each row whose Symbol is None with 'NA', I would rather insert the row correctly the first time.
Note: I am using odo function because for my actual project the files I will be inserting into the table are as large as several gigabytes and contain around 15-20 columns. odo seemed to me to be the quickest way for me to accomplish what I need to do in a short period of time.
One way to get around this is to read the file in using pandas
and specifying na_filter=False
.
>>> import pandas as pd
>>> df = pd.read_csv('stock_prices.csv',na_filter=False)
>>> df
Ticker Price
0 RY 96.61
1 NA 58.69
2 BNS 80.35
And import:
odo(df, output_sqlite, dshape=ds)
Results:
>>> for row in c.fetchall():
... print(row)
...
(u'RY', 96.61)
(u'NA', 58.69)
(u'BNS', 80.35)