I have downloaded the GeoLiteCountry
CSV file from Maxmind - http://www.maxmind.com/app/geolitecountry. Using the format given to me as standard (so that this can become an automated task) I am attempting import all the data into a table.
I created a new table IPCountries2
which has columns exactly matching the columns provided:
FromIP varchar(50),
ToIP varchar(50),
BeginNum bigint,
EndNum bigint,
CountryCode varchar(50),
CountryName varchar(250)
Using the various chunks of code I could find, I was unable to get it working using the field terminator and row terminator:
BULK
INSERT CSVTest
FROM 'c:\csvtest.txt'
WITH
(
FIELDTERMINATOR = '","',
ROWTERMINATOR = '\n'
)
GO
The result of this was a single row inserted, all correct except the last one had overflowed with the next lines (presumably the whole database if I didn't have a limit). Also, the first cell had a quote at the start.
I looked around and found something called a format file (never used these). Made one which looks like:
10.0
6
1 SQLCHAR 0 50 "," 1 FromIP ""
2 SQLCHAR 0 50 "," 2 ToIP ""
3 SQLBIGINT 0 19 "," 3 BeginNum ""
4 SQLBIGINT 0 19 "," 4 EndNum ""
5 SQLCHAR 0 50 "," 5 CountryCode ""
6 SQLCHAR 0 250 "\n" 6 CountryName ""
but this errors on the bigint lines:
Msg 4867, Level 16, State 1, Line 1
Bulk load data conversion error (overflow) for row 1, column 3 (BeginNum).
It does that 10 times and then stops because of maximum error count.
I was able to get the first method working if I took it into Excel and re-saved, this removed the quotes. However, I don't want to rely on this method as I want this to update automatically every week and not have to open and re-save manually.
I don't mind which of the two methods I use ultimately, just so long as it works with a clean file. I had a look at their documentation but they only have code for PHP or MS Access.
Edit
Some lines from the CSV file:
"1.0.0.0","1.0.0.255","16777216","16777471","AU","Australia"
"1.0.1.0","1.0.3.255","16777472","16778239","CN","China"
"1.0.4.0","1.0.7.255","16778240","16779263","AU","Australia"
"1.0.8.0","1.0.15.255","16779264","16781311","CN","China"
"1.0.16.0","1.0.31.255","16781312","16785407","JP","Japan"
"1.0.32.0","1.0.63.255","16785408","16793599","CN","China"
"1.0.64.0","1.0.127.255","16793600","16809983","JP","Japan"
"1.0.128.0","1.0.255.255","16809984","16842751","TH","Thailand"
"1.1.0.0","1.1.0.255","16842752","16843007","CN","China"
"1.1.1.0","1.1.1.255","16843008","16843263","AU","Australia"
"1.1.2.0","1.1.63.255","16843264","16859135","CN","China"
"1.1.64.0","1.1.127.255","16859136","16875519","JP","Japan"
"1.1.128.0","1.1.255.255","16875520","16908287","TH","Thailand"
Update
After some persisting I was able to get things working 95% with the original method (without the format document). However, it was changed slightly to look like so:
BULK INSERT IPCountries2
FROM 'c:\Temp\GeoIPCountryWhois.csv'
WITH
(
FIELDTERMINATOR = '","',
ROWTERMINATOR = '"'
)
GO
Everything goes in the right fields as they should, the only issue I have is in the first column there is a quote at the beginning. Some sample data:
FromIP ToIP BeginNum EndNum CountryCode Country
"2.21.248.0 2.21.253.255 34994176 34995711 FR France
"2.21.254.0 2.21.254.255 34995712 34995967 EU Europe
"2.21.255.0 2.21.255.255 34995968 34996223 NL Netherlands
Success. Searching around and some help from another forum finally got me to my solution. For those in need of a similar solution, keep reading:
I ended up using the format file method - whether it would be possible to use fieldterminators and row terminators I'm not sure.
My SQL code looks like:
CREATE TABLE #TempTable
(
DuffColumn varchar(50),
FromIP varchar(50),
ToIP varchar(50),
BeginNum bigint,
EndNum bigint,
CountryCode varchar(50),
CountryName varchar(250)
)
BULK
INSERT #TempTable
FROM 'c:\Temp\GeoIPCountryWhois.csv'
WITH
(
FORMATFILE = 'C:\Temp\format.fmt'
)
INSERT INTO IPCountries2 (FromIP, ToIP, BeginNum, EndNum, CountryCode, Country)
SELECT FromIP, ToIP, BeginNum, EndNum, CountryCode, CountryName FROM #TempTable
As found in my research, it was necessary to have a useless column which simply captured the first quote.
My format file looks like:
10.0
7
1 SQLCHAR 0 1 "" 1 DuffColumn ""
2 SQLCHAR 0 50 "\",\"" 2 FromIP ""
3 SQLCHAR 0 50 "\",\"" 3 ToIP ""
4 SQLCHAR 0 19 "\",\"" 4 BeginNum ""
5 SQLCHAR 0 19 "\",\"" 5 EndNum ""
6 SQLCHAR 0 50 "\",\"" 6 CountryCode ""
7 SQLCHAR 0 250 "\"\n" 7 CountryName ""
To note, despite eventually being stored as a BIGINT, BeginNum and EndNum are both passed in as SQLCHARS, otherwise the insert does an odd multiplication on the numbers (something about reading it as bytes rather than digits, I didn't entirely understand it).
And that's about it. The last thing to automate this script fully is to truncate the table first so as to clear out old records. However that might not be to everyones needs.