So I'm trying import a .csv file directly into mysql using load data local infile
command but I'm running into fields that are remnants an excel formula field and I don't know how to get rid of them. The content for the first field in the example below has an =
in front of it.
The table structure is setup to allow the first field to be a VARCHAR(100)
but I'd like to make it an INT
if possible. Here is a sample of the csv content that is being uploaded.
"MID","DBA Name","Partner ID","Partner Name","Sub Partner ID","Sub Partner Name","Active Months","Bonus Amount","Bonus Applied Date","Partner Percentage","Partner Share","Total Payment"
="0008788014065741","company2","7968","me,"11839","Joe Blow","0","$50.00","","","","$350.64"
="0008788014065756","company2","7968","you","11839","Joe Blow","0","$50.00","","","","$294.60"
And here is the mysql load command that I'm using to import the data:
sql = """
LOAD DATA LOCAL INFILE '%(upload)s' IGNORE INTO TABLE `%(table)s`
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\\r\\n'
IGNORE 1 LINES ;
""" % {"upload": file, "table": report}
self.db.query( sql )
Is there something that can be done on the import in terms of a regex or something? I don't know, I'm just grasping at straws here...
Thanks for the input!
You can do it two ways with LOAD DATA INFILE
.
First read first field values as is and then strip equal sign =
and double quotes from it in SET
clause. Also you most likely want to do other transformations while you're loading data like:
NULL
s when your fields are emptyLOAD DATA LOCAL INFILE '/path/to/your/file.csv'
IGNORE INTO TABLE table_name
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(@MID, DBAName, PartnerID, PartnerName, SubPartnerID, SubPartnerName, ActiveMonths,
@BonusAmount, @BonusAppliedDate, @PartnerPercentage, @PartnerShare, @TotalPayment)
SET MID = TRIM(BOTH '"' FROM SUBSTR(@MID, 2)), -- here we get rid of equal sign and double quotes
BonusAmount = TRIM(LEADING '$' FROM NULLIF(@BonusAmount, '')),
BonusAppliedDate = NULLIF(@BonusAppliedDate, ''),
PartnerPercentage = NULLIF(@PartnerPercentage, ''),
PartnerShare = TRIM(LEADING '$' FROM NULLIF(@PartnerShare, '')),
TotalPayment = TRIM(LEADING '$' FROM NULLIF(@TotalPayment, ''))
Second approach is to leverage LINES STARTING BY
clause
LOAD DATA LOCAL INFILE '/path/to/your/file.csv'
IGNORE INTO TABLE table_name
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n' STARTING BY '='
IGNORE 1 LINES
(MID, DBAName, PartnerID, PartnerName, SubPartnerID, SubPartnerName, ActiveMonths, @BonusAmount, BonusAppliedDate, PartnerPercentage, PartnerShare, @TotalPayment)
SET BonusAmount = TRIM(LEADING '$' FROM NULLIF(@BonusAmount, '')),
BonusAppliedDate = NULLIF(@BonusAppliedDate, ''),
PartnerPercentage = NULLIF(@PartnerPercentage, ''),
PartnerShare = TRIM(LEADING '$' FROM NULLIF(@PartnerShare, '')),
TotalPayment = TRIM(LEADING '$' FROM NULLIF(@TotalPayment, ''))
Now if your target table schema looks something like
CREATE TABLE table_name
(
MID BIGINT,
DBAName VARCHAR(100),
PartnerID INT,
PartnerName VARCHAR(100),
SubPartnerID INT,
SubPartnerName VARCHAR(100),
ActiveMonths INT,
BonusAmount DECIMAL(19, 2),
BonusAppliedDate DATE,
PartnerPercentage DECIMAL(3, 2),
PartnerShare DECIMAL(19, 2),
TotalPayment DECIMAL(19, 2)
);
Then after loading with either method here is what we get in the table
mysql> select * from table_name; +---------------+----------+-----------+-------------+--------------+----------------+--------------+-------------+------------------+-------------------+--------------+--------------+ | MID | DBAName | PartnerID | PartnerName | SubPartnerID | SubPartnerName | ActiveMonths | BonusAmount | BonusAppliedDate | PartnerPercentage | PartnerShare | TotalPayment | +---------------+----------+-----------+-------------+--------------+----------------+--------------+-------------+------------------+-------------------+--------------+--------------+ | 8788014065741 | company2 | 7968 | me | 11839 | Joe Blow | 0 | 50.00 | NULL | NULL | NULL | 350.64 | | 8788014065756 | company2 | 7968 | you | 11839 | Joe Blow | 0 | 50.00 | NULL | NULL | NULL | 294.60 | +---------------+----------+-----------+-------------+--------------+----------------+--------------+-------------+------------------+-------------------+--------------+--------------+ 2 rows in set (0.00 sec)