c# ssis etl flat-file script-task

C# script task to remove quadruple quotes before loading .CSV file


I have a fairly basic SSIS package that will load a .csv file into a SQL table. However, when the package is attempting to read the .csv source in the data flow task I receive the error message: "The column delimiter for column 'X' was not found. An error occurred while processing file "file.csv" on data row 'Y'."

In this case, a few rows out of thousands contain a string within quadruple quotes, that is, quotes nested inside an already quoted field, e.g. "Jane "Jill" Doe." Manually removing the inner quotes from these rows in UltraEdit works, but I am trying to automate these packages. A derived column did not work because the failure happens when the flat file source parses the delimiters, before any downstream transformation runs.

Turns out I need a script task to remove the quadruple quotes before the package can load the file properly. The code below (which I pieced together from various sources) is accepted by SSIS as errorless but encounters a DTS Script Task Runtime Error upon execution:

#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
#endregion

namespace ST_a881d570d1a6495e84824a72bd28f44f
{
[Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
    public void Main()
    {
        // TODO: Add your code here
        var fileContents = System.IO.File.ReadAllText(@"C:\\File.csv");

        fileContents = fileContents.Replace("<body>", "<body onload='jsFx();' />");
        fileContents = fileContents.Replace("</body>", "</body>");

        System.IO.File.WriteAllText(@"C:\\File.csv", fileContents);

    }

    #region ScriptResults declaration
    /// <summary>
    /// This enum provides a convenient shorthand within the scope of this class for setting the
    /// result of the script.
    /// 
    /// This code was generated automatically.
    /// </summary>
    enum ScriptResults
    {
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    };
    #endregion

}
}

The alternative script I have is:

{
string filepath = (string)Dts.Variables[@C:\\"File.csv"].Value;
var fileContents = System.IO.File.ReadAllText(filepath);
fileContents = fileContents.Replace("\"\"", "");

System.IO.File.WriteAllText(@C:\\"File.csv", fileContents);

}

What am I doing wrong?


Solution

  • The following C# example reads a CSV file line by line, removes any double quotes embedded within double-quoted text, and then writes the modified contents back to the same file. The regex matches a double quote only when it is not preceded by a comma or the start of the line and is not followed by a comma or the end of the line, and replaces each match with an empty string. You're probably already doing this, but make sure the variable holding the file path is listed in the ReadOnlyVariables field of the Script Task.

    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;
    
    
    // Place the code below inside Main() of the Script Task.
    string filePath = Dts.Variables["User::FilePath"].Value.ToString();
    
    List<string> outputRecords = new List<string>();
    if (File.Exists(filePath))
    {
        // Read every line, stripping embedded quotes along the way
        using (StreamReader rdr = new StreamReader(filePath))
        {
            string line;
            while ((line = rdr.ReadLine()) != null)
            {
                if (line.Contains(","))
                {
                    // Remove double quotes that sit between text, i.e. not at the
                    // start/end of the line and not directly next to a comma
                    line = Regex.Replace(line, "(?<!(,|^))\"(?!($|,))", "");
                }
                outputRecords.Add(line);
            }
        }
    
        // Write the cleaned records back to the same file (overwrite, not append)
        using (StreamWriter sw = new StreamWriter(filePath, false))
        {
            foreach (string s in outputRecords)
                sw.WriteLine(s);
        }
    }
    
    // Report the outcome using the ScriptResults enum from the generated template
    Dts.TaskResult = (int)ScriptResults.Success;
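
To check the pattern without running the package, the same regex can be applied to a few in-memory rows in a small console program. This is only a sketch and not part of the original answer: the names `QuoteCleaner` and `StripInnerQuotes` and the sample rows are hypothetical, chosen purely for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteCleaner
{
    // Same pattern as in the Script Task code: a double quote that is not
    // preceded by a comma or the start of the line, and not followed by a
    // comma or the end of the line.
    private static readonly Regex InnerQuote =
        new Regex("(?<!(,|^))\"(?!($|,))");

    // Returns the line with every matching (embedded) quote removed.
    public static string StripInnerQuotes(string line)
    {
        return InnerQuote.Replace(line, "");
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample rows; the first mirrors the example in the question.
        string[] samples =
        {
            "1,\"Jane \"Jill\" Doe\",NY",    // embedded quotes between text
            "2,\"John Smith\",CA",            // already well formed
            "3,\"Jane \"Jill\", Doe\",TX"     // embedded quote next to a comma
        };

        foreach (string row in samples)
        {
            Console.WriteLine(QuoteCleaner.StripInnerQuotes(row));
        }
    }
}
```

The first row comes out as `1,"Jane Jill Doe",NY` and the second is unchanged. The third row shows a limitation of this approach: the quote after `Jill` is followed by a comma, so the pattern treats it as a field boundary and leaves it in place, which means such a row would still need separate handling before the flat file source can parse it.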