Search code examples
mysqlcsvyahoo-finance

Update MySQL table with CSV data containing literal values in an unsupported format


Someone recently deleted a question that I thought could be quite helpful to the community. I paraphrase:

I am obtaining a CSV file from the Yahoo! Finance API, with which I then wish to update the following MySQL table:

CREATE TABLE yahoo.static (
  symbol         VARCHAR(10) NOT NULL,
  exchange       VARCHAR(200),
  name           VARCHAR(300),
  capitalization DOUBLE,
  div_pay_date   DATE,
  book_value     DOUBLE,
  float_shares   BIGINT UNSIGNED,
  PRIMARY KEY (symbol)
);

The CSV file has the following format:

"AAUKF","AAUKF","Other OTC","AAUKF","ANGLO AMERICAN OR","AAUKF",29.271B,"AAUKF","26-Apr-12","AAUKF",26.69,"AAUKF",  1134107000

Particular problems include:

  • repeated, superfluous values;

  • suffixes (e.g. K, M and B) that indicate order of magnitude; and

  • dates not formatted in a supported literal format.

How can I update the table from such a CSV file?


Solution

  • Within the LOAD DATA command, one can assign columns to user variables and then perform appropriate operations accordingly:

    LOAD DATA INFILE '/tmp/ystaticB.csv'
        REPLACE
        INTO TABLE yahoo.static
        CHARACTER SET utf8              -- or however the file is encoded
        FIELDS
            TERMINATED BY ','
            OPTIONALLY ENCLOSED BY '"'  -- quotes appear to be optional
        LINES
            TERMINATED BY '\r\n'        -- or however newlines are encoded
        (symbol,@z,exchange,@z,name,@z,@c,@z,@d,@z,book_value,@z,float_shares)
        SET
            capitalization = @c * POW(10, CASE RIGHT(@c,1)
                               WHEN 'K' THEN 3
                               WHEN 'M' THEN 6
                               WHEN 'B' THEN 9
                             END),
            div_pay_date   = STR_TO_DATE(@d, '%e-%b-%y')