Search code examples
sql-serversql-server-2008t-sqlcsvcsv-import

Importing Maxmind CSV into SQL Server


I have downloaded the GeoLiteCountry CSV file from Maxmind - http://www.maxmind.com/app/geolitecountry. Using the format given to me as standard (so that this can become an automated task) I am attempting import all the data into a table.

I created a new table IPCountries2 which has columns exactly matching the columns provided:

FromIP       varchar(50),
ToIP         varchar(50),
BeginNum     bigint,
EndNum       bigint,
CountryCode  varchar(50),
CountryName  varchar(250)

Using the various chunks of code I could find, I was unable to get it working using the field terminator and row terminator:

BULK
INSERT CSVTest
FROM 'c:\csvtest.txt'
WITH
(
    FIELDTERMINATOR = '","',
    ROWTERMINATOR = '\n'
)
GO

The result of this was a single row inserted, all correct except the last one had overflowed with the next lines (presumably the whole database if I didn't have a limit). Also, the first cell had a quote at the start.

I looked around and found something called a format file (never used these). Made one which looks like:

10.0
6
1   SQLCHAR     0   50  "," 1   FromIP      ""
2   SQLCHAR     0   50  "," 2   ToIP        ""
3   SQLBIGINT   0   19  "," 3   BeginNum    ""
4   SQLBIGINT   0   19  "," 4   EndNum      ""
5   SQLCHAR     0   50  "," 5   CountryCode ""
6   SQLCHAR     0   250 "\n"    6   CountryName ""

but this errors on the bigint lines:

Msg 4867, Level 16, State 1, Line 1
Bulk load data conversion error (overflow) for row 1, column 3 (BeginNum).

It does that 10 times and then stops because of maximum error count.

I was able to get the first method working if I took it into Excel and re-saved, this removed the quotes. However, I don't want to rely on this method as I want this to update automatically every week and not have to open and re-save manually.

I don't mind which of the two methods I use ultimately, just so long as it works with a clean file. I had a look at their documentation but they only have code for PHP or MS Access.

Edit

Some lines from the CSV file:

"1.0.0.0","1.0.0.255","16777216","16777471","AU","Australia"
"1.0.1.0","1.0.3.255","16777472","16778239","CN","China"
"1.0.4.0","1.0.7.255","16778240","16779263","AU","Australia"
"1.0.8.0","1.0.15.255","16779264","16781311","CN","China"
"1.0.16.0","1.0.31.255","16781312","16785407","JP","Japan"
"1.0.32.0","1.0.63.255","16785408","16793599","CN","China"
"1.0.64.0","1.0.127.255","16793600","16809983","JP","Japan"
"1.0.128.0","1.0.255.255","16809984","16842751","TH","Thailand"
"1.1.0.0","1.1.0.255","16842752","16843007","CN","China"
"1.1.1.0","1.1.1.255","16843008","16843263","AU","Australia"
"1.1.2.0","1.1.63.255","16843264","16859135","CN","China"
"1.1.64.0","1.1.127.255","16859136","16875519","JP","Japan"
"1.1.128.0","1.1.255.255","16875520","16908287","TH","Thailand"

Update

After some persisting I was able to get things working 95% with the original method (without the format document). However, it was changed slightly to look like so:

BULK INSERT IPCountries2
FROM 'c:\Temp\GeoIPCountryWhois.csv'
WITH
(
    FIELDTERMINATOR = '","',
    ROWTERMINATOR = '"'
)
GO

Everything goes in the right fields as they should, the only issue I have is in the first column there is a quote at the beginning. Some sample data:

FromIP  ToIP    BeginNum    EndNum  CountryCode Country
 "2.21.248.0    2.21.253.255    34994176    34995711    FR  France
 "2.21.254.0    2.21.254.255    34995712    34995967    EU  Europe
 "2.21.255.0    2.21.255.255    34995968    34996223    NL  Netherlands

Solution

  • Success. Searching around and some help from another forum finally got me to my solution. For those in need of a similar solution, keep reading:

    I ended up using the format file method - whether it would be possible to use fieldterminators and row terminators I'm not sure.

    My SQL code looks like:

    CREATE TABLE #TempTable
    (
        DuffColumn  varchar(50),
        FromIP      varchar(50),
        ToIP        varchar(50),
        BeginNum    bigint,
        EndNum      bigint,
        CountryCode varchar(50),
        CountryName varchar(250)
    )
    
    BULK
    INSERT #TempTable
    FROM 'c:\Temp\GeoIPCountryWhois.csv'
    WITH
    (
        FORMATFILE = 'C:\Temp\format.fmt'
    )
    
    INSERT INTO IPCountries2 (FromIP, ToIP, BeginNum, EndNum, CountryCode, Country)
        SELECT FromIP, ToIP, BeginNum, EndNum, CountryCode, CountryName FROM #TempTable
    

    As found in my research, it was necessary to have a useless column which simply captured the first quote.

    My format file looks like:

    10.0
    7
    1   SQLCHAR     0   1   ""      1   DuffColumn  ""
    2   SQLCHAR     0   50  "\",\"" 2   FromIP      ""
    3   SQLCHAR     0   50  "\",\"" 3   ToIP        ""
    4   SQLCHAR     0   19  "\",\"" 4   BeginNum    ""
    5   SQLCHAR     0   19  "\",\"" 5   EndNum      ""
    6   SQLCHAR     0   50  "\",\"" 6   CountryCode ""
    7   SQLCHAR     0   250 "\"\n"  7   CountryName ""
    

    To note, despite eventually being stored as a BIGINT, BeginNum and EndNum are both passed in as SQLCHARS, otherwise the insert does an odd multiplication on the numbers (something about reading it as bytes rather than digits, I didn't entirely understand it).

    And that's about it. The last thing to automate this script fully is to truncate the table first so as to clear out old records. However that might not be to everyones needs.