Search code examples
mysqldatetimemariadbpreprocessor

Reformat input DATE data


I have a rather large (over 200,000 records) file that I am inserting into a table with 59 columns. The data contains several DATETIME fields. The input dates are in the form '10/06/2019 10:45:58'. How can I input these fields as DATETIME (or maybe just DATE, since the TIME field is irrelevant for my purposes)?

If I set the various date fields to type DATETIME, the fields come up as 0000-00-00 00:00:00 after loading. That's because the database does not know what to do with the input format.

I see two different approaches, but each has issues:

Preprocessing:
I create a script to detect the date fields using Regex and reformat them to the expected form using something like Perl's DateTime::Format::DBI. There is a risk here since the records include freeform TEXT fields that may contain embedded commas and quotes. Positively identifying the DATE fields is difficult just because of scale.

Post-processing:
Create the table with the date fields as VARCHAR and use the STR_TO_DATE SQL function populate the date columns.

INSERT INTO mytable(DATELastDetected, DATEFirstDetected) 
  SELECT STR_TO_DATE(LastDetected, '%c/%e/%Y %H:%i'), 
  STR_TO_DATE(FirstDetected, '%c/%e/%Y %H:%i') 
from mytable;

Third Option?
I've wondered whether I might specify the expected format of the input data for that DATE columns in the CREATE TABLE statement, which would render the whole discussion moot. I've seen another question that mentions the use of DATEFORMAT in a CREATE TABLE statement, but I have not found the right syntax to use.

Appreciate any thoughts.


Solution

  • @ben-personick answered it with his comment. Here's what my Load statement looks like:

        LOAD DATA INFILE '/opt/mysql/work/report.csv'
            INTO TABLE `my_db`.`tbl_reportdata`
        CHARACTER SET utf8mb4
        FIELDS TERMINATED BY ','
            OPTIONALLY ENCLOSED BY '"'
            ESCAPED BY '"'
            LINES TERMINATED BY '\r\n'
            IGNORE 1
            LINES (`IP`,
    [...]
                    `OS`,
                    @FirstDetectedVar,# This field is defined as DATETIME
                    @LastDetectedVar, # This field is defined as DATETIME
    [...]
                    `Category`)
            set
            `FirstDetected` = str_to_date(@FirstDetectedVar, '%m/%d/%Y %H:%i'),
            `LastDetected` = str_to_date(@LstDetectedVar, '%m/%d/%Y %H:%i');
    

    I figured the answer was out there. Hopefully this working example will help someone else.