SSIS: How to Remove Occasional Quotes and Replace The Column Delimiter

Recently I was assigned a task of helping a developer to cope with an unusual input file processing.

The file we needed to load into SQL Server had double quotes surrounding those fields that contained comma(s).

So a short excerpt from it would resemble something like:

123,ABC
456,"D,E,F"
"7,89",GHS

Since SSIS’s Flat File Connection does not support alternating text qualifiers, the chief difficulty stems from the fact if we even only remove the quotes, the file structure becomes broken because the extra commas would break such a field into additional columns. I proposed to replace the comma as our delimiter to the vertical pipe (|) character thus leaving commas intact. The quotes become then unnecessary.

Initially we wanted to pre-process the file by using a PowerShell script batch, but while testing it turned out that unless you use .Net objects in PowerShell it cannot operate on relatively large files, we were getting an out of memory errors from the PowerShell when using Get-Content or similar methods. The next attempt was to use the Data Flow Task’s Script Component set as a Transformation. It worked, here is how:

We created a new DFT to just reformat the input file

Transforming_DFT

The payload is basically implemented inside the Script Transformation, but I am showing how the input is set first that came from the flat file set to be consumed as a single row. I want to stress this out – it is important to configure your Flat File Source to “see” the input file as a single column file:

InputCols

The next step was to add the output that will be our modified row:

Input_And_Output

So nothing fancy is going on here and we are now ready to code:

 1: ' Microsoft SQL Server Integration Services user script component
 2: ' This is your new script component in Microsoft Visual Basic .NET
 3: ' ScriptMain is the entrypoint class for script components
 4:  
 5: Imports System
 6: Imports System.Data
 7: Imports System.Math
 8: Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
 9: Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
 10:  
 11: Public Class ScriptMain
 12:     Inherits UserComponent
 13:  
 14:     Public Overrides Sub UnTransformedInput_ProcessInputRow(ByVal Row As UnTransformedInputBuffer)
 15:         '
 16:         ' Unquote and replace the comma to pipe (|) to become the delimiter
 17:         '
 18:  
 19:         Dim raw_string As String = String.Empty
 20:         Dim seacrhedPos As Int32
 21:         Dim firstCommaPos As Int32
 22:         Dim nextQuotePos As Int32
 23:         Dim section_to_modify As String
 24:  
 25:         raw_string = Row.UnTransformedOrgRow
 26:  
 27:         'MesssageBox to see the initial value
 28:         System.Windows.Forms.MessageBox.Show(raw_string)
 29:  
 30:         If raw_string.Length > 0 Then
 31:  
 32:             seacrhedPos = raw_string.IndexOf(",""", 0)
 33:  
 34:             While seacrhedPos <> -1
 35:                 firstCommaPos = raw_string.IndexOf(",", seacrhedPos + 1)
 36:  
 37:                 nextQuotePos = raw_string.IndexOf(""",", firstCommaPos + 1)
 38:  
 39:                 If nextQuotePos > 0 Then
 40:                     section_to_modify = raw_string.Substring(seacrhedPos + 1, raw_string.Length - seacrhedPos - (raw_string.Length - nextQuotePos))
 41:  
 42:                     section_to_modify = section_to_modify.Replace(",", "~").Replace("""", "")
 43:  
 44:                     'Replace 
 45:                     raw_string = raw_string.Substring(0, seacrhedPos + 1) & section_to_modify & _
 46:                     raw_string.Substring(nextQuotePos + 1, raw_string.Length - nextQuotePos - 1)
 47:  
 48:                 Else
 49:                     section_to_modify = raw_string.Substring(seacrhedPos + 1, raw_string.Length - seacrhedPos - 1)
 50:  
 51:                     section_to_modify = section_to_modify.Replace(",", "~").Replace("""", "")
 52:  
 53:                     'Replace 
 54:                     raw_string = raw_string.Substring(0, seacrhedPos + 1) & section_to_modify
 55:  
 56:                 End If
 57:  
 58:                 'Get the next set
 59:                 seacrhedPos = raw_string.IndexOf(",""", 0)
 60:             End While
 61:  
 62:             raw_string = raw_string.Replace(",", "|").Replace("~", ",")
 63:  
 64:         End If
 65:  
 66:         'MesssageBox to see the final value
 67:         System.Windows.Forms.MessageBox.Show(raw_string)
 68:  
 69:         Row.TransformedRow = raw_string
 70:     End Sub
 71:  
 72: End Class
 73:  

This code will pop up a message box showing the pre-processed record, and then the end result, you can safely remove these two MessageBox.Show calls before productionizing your package.

The end result of this DFT will be a new flat file in this format:

123|ABC
456|D,E,F
7,89|GHS
Advertisements

Posted on September 19, 2011, in Uncategorized. Bookmark the permalink. Leave a comment.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: