c# winforms c#-4.0 c#-3.0 fastercsv

StreamReader not reading Japanese characters from CSV file


I am reading a CSV file with a StreamReader. The fields are enclosed in double quotes and contain Japanese characters. The Japanese characters are not read correctly; they show up as unreadable Unicode characters instead. I tried different encoding types, but none of them worked for me. Please share an idea or another solution for this issue. Is there a better way to do it?

// Requires: using System; using System.Data; using System.IO; using System.Text;
public DataTable ReadDataFromCSV(string path, char delim)
{
    DataTable dt = new DataTable();

    // A char can never be empty, so treat the default value '\0' as "no delimiter given".
    if (delim == '\0')
    {
        delim = ',';
    }

    try
    {
        // Check that the file exists before opening it.
        if (File.Exists(path))
        {
            using (TextReader sr = new StreamReader(path, Encoding.UTF8))
            {
                string fulltext = sr.ReadToEnd();
                string[] arrRows = fulltext.Split('\n');
                string[] arrColumnNames = arrRows[0].Replace('"', ' ').Trim().Split(delim);

                // Add one string column per header field, including the last one.
                for (int n = 0; n < arrColumnNames.Length; n++)
                {
                    arrColumnNames[n] = arrColumnNames[n].Trim();
                    dt.Columns.Add(new DataColumn(arrColumnNames[n], typeof(string)));
                }

                for (int i = 1; i < arrRows.Length; i++)
                {
                    // Skip blank lines, such as the one after the final newline.
                    if (arrRows[i].Trim().Length == 0)
                    {
                        continue;
                    }

                    string[] arrColumnValues = arrRows[i].Replace('"', ' ').Trim().Split(delim);
                    DataRow row = dt.NewRow();
                    for (int j = 0; j < arrColumnNames.Length; j++)
                    {
                        // Rows with fewer fields than the header get an empty string.
                        row[arrColumnNames[j]] = j < arrColumnValues.Length
                            ? arrColumnValues[j].Trim()
                            : "";
                    }
                    dt.Rows.Add(row);
                }
            }
        }
    }
    catch (Exception ex)
    {
        Console.Write("ERROR: " + ex.Message);
    }

    return dt;
}

[Screenshot of the output; image not included]


Solution

  • There are lots of diamonds visible in your screenshots (the U+FFFD replacement character that a decoder emits for bytes it cannot decode), so the only thing that's crystal clear is that the text file is not encoded in UTF-8. You should very strongly consider getting in touch with the programmer who generated the file and asking for a fix. Not using a Unicode encoding these days, particularly for a language like Japanese, which has many encodings with none of them dominant, is a huge mistake. It was so bad that the language got its own word for the misery it caused: mojibake.

    Possible code pages:

    • 932: Shift-JIS on Windows
    • 20932, 51932: EUC-JP on Unix
    • 50220, 50221, 50222: ISO-2022-JP
    • several EBCDIC code pages, none of which you should put up with.
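
If you cannot get the file fixed, one way to try the candidates above is to decode strictly as UTF-8 first and fall back to a legacy code page only when the bytes are not valid UTF-8. The following is a minimal sketch, not a definitive implementation: the class `CsvEncodingSketch` and the method `ReadAllTextWithFallback` are hypothetical names, and 932 in the usage line is only a guess at the real encoding that you would swap for another code page from the list if the output still looks wrong.

```csharp
using System;
using System.IO;
using System.Text;

static class CsvEncodingSketch
{
    // Hypothetical helper: returns the file's text decoded as UTF-8 when the
    // bytes are valid UTF-8, otherwise decoded with fallbackCodePage.
    public static string ReadAllTextWithFallback(string path, int fallbackCodePage)
    {
        // Needed on .NET Core / .NET 5+ so that legacy code pages such as 932
        // are available. On the classic .NET Framework this line is not
        // required and can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] bytes = File.ReadAllBytes(path);
        string text;
        try
        {
            // throwOnInvalidBytes: true makes the decoder throw instead of
            // silently substituting U+FFFD (the "diamond") for bad bytes.
            var strictUtf8 = new UTF8Encoding(false, true);
            text = strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: assume the caller's guess, e.g. 932 (Shift-JIS).
            text = Encoding.GetEncoding(fallbackCodePage).GetString(bytes);
        }

        // GetString keeps a leading byte order mark as U+FEFF; drop it.
        if (text.Length > 0 && text[0] == '\uFEFF')
        {
            text = text.Substring(1);
        }
        return text;
    }
}
```

In the question's method this would replace the `StreamReader` read, for example `string fulltext = CsvEncodingSketch.ReadAllTextWithFallback(path, 932);`. The strict UTF-8 check only tells you that the file is not UTF-8; it cannot tell you which legacy encoding it really is, so the fallback code page remains an assumption until the producer of the file confirms it.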