I have a CSV file encoded as UTF-8, which I downloaded off IMDb.com. I would like to import this data into SSMS 2016 (or 2014) using the Import Wizard. Here is a sample of what the CSV looks like (note the director of Dallas Buyers Club is Jean-Marc Vallée):
"position","const","created","modified","description","Title","Title type","Directors","You rated","IMDb Rating","Runtime (mins)","Year","Genres","Num. Votes","Release Date (month/day/year)","URL"
"38","tt1636826","Tue Feb 16 00:00:00 2016","","","Project X","Feature Film","Nima Nourizadeh","6","6.7","88","2012","comedy, crime","155628","2012-03-01","http://www.imdb.com/title/tt1636826/"
"39","tt0119528","Tue Feb 16 00:00:00 2016","","","Liar Liar","Feature Film","Tom Shadyac","6","6.8","86","1997","comedy, fantasy, romance","217817","1997-03-18","http://www.imdb.com/title/tt0119528/"
"40","tt0790636","Tue Feb 16 00:00:00 2016","","","Dallas Buyers Club","Feature Film","Jean-Marc Vallée","7","8.0","117","2013","biography, drama","321602","2013-09-07","http://www.imdb.com/title/tt0790636/"
I select Flat File Source in the Import Wizard, select my file, and go with the default options (while adding a " as the text qualifier). However, this is an example of what I'm seeing: https://i.sstatic.net/nL4n8.jpg
The diacritic character é is being turned into Ã©. I tried selecting Unicode next to "Locale" in the Import Wizard, but that converted everything to Chinese characters and placed it all in a single cell.
Any idea what is going on here?
Change the code page on the flat file connection to 65001 (UTF-8), and make sure the affected columns use the Unicode string data type, DT_WSTR.
This link has more detailed, step-by-step directions for the process: https://www.mssqltips.com/sqlservertip/3119/import-utf8-unicode-special-characters-with-sql-server-integration-services/
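To see why the code page matters, both symptoms from the question can be reproduced outside the wizard. The following is a minimal Python sketch, not what the Import Wizard runs internally. It assumes the wizard's default code page was 1252 (ANSI Latin I), which is typical on Western-locale machines, and that the "Unicode" checkbox makes the file be read as UTF-16.

```python
# The director's name as it appears in the CSV sample.
name = "Jean-Marc Vallée"

# The file is UTF-8, so é is stored as two bytes: 0xC3 0xA9.
utf8_bytes = name.encode("utf-8")

# Assumption: the default code page is 1252. Each of the two bytes is
# then read as a separate character, which gives the garbled name.
as_cp1252 = utf8_bytes.decode("cp1252")
print(as_cp1252)  # Jean-Marc VallÃ©e

# Assumption: ticking "Unicode" reads the file as UTF-16. Every pair of
# ASCII bytes is fused into one code unit, and for lowercase letters
# those code units land in the CJK ideograph range.
header_bytes = "position".encode("utf-8")
as_utf16 = header_bytes.decode("utf-16-le")
print(len(as_utf16))  # 4 characters instead of 8

# Decoding with the matching encoding (code page 65001) restores the text.
as_utf8 = utf8_bytes.decode("utf-8")
print(as_utf8)  # Jean-Marc Vallée
```

Under the UTF-16 reading the quote and comma delimiters are fused into other characters as well, which would explain why everything ended up in a single cell.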