Search code examples
sql-serverssisbidsbiml

Create table before the dataflow in BIML


I am using BIML and BIDSHelper to create SSIS package. I am trying to import data from csv to sql server. I want to create table in the destination database before the dataflow happens. Here is my code:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>       
    <OleDbConnection Name="CM_OLE" 
                     ConnectionString="Data Source=(localdb)\projects;Initial Catalog=test;Integrated Security=SSPI;Provider=SQLNCLI11">
    </OleDbConnection>
    <FlatFileConnection
            Name="FF Source"
            FileFormat="FFF Source"
            FilePath="F:\test.csv"
            CreateInProject="false" />
</Connections>
<FileFormats>
    <FlatFileFormat
            Name="FFF Source"
            CodePage="1252"
            RowDelimiter="CRLF"
            ColumnNamesInFirstDataRow="true"
            IsUnicode="false"
            FlatFileType="Delimited"
            TextQualifer="_x0022_"
            HeaderRowsToSkip="0">
        <Columns>               
            <Column Name="Column1" Length="50" InputLength="50" MaximumWidth="50" DataType="AnsiString"  ColumnType="Delimited"  CodePage="1252" Delimiter="," TextQualified="true" />
            <Column Name="Column2" Precision="10" Scale="2"  DataType="Decimal"  ColumnType="Delimited"  CodePage="1252" Delimiter="CRLF" TextQualified="true"  />
        </Columns>
    </FlatFileFormat>
</FileFormats>  
<Packages>      
    <Package ConstraintMode="Linear" Name="NumericParsingFromFlatFileInsertIdentity">
        <Tasks> 
            <ExecuteSQL Name="Create table sometablename" ConnectionName="CM_OLE">
                 <DirectInput>
                      CREATE TABLE sometablename(column1 varchar(50) NOT NULL, column2 varchar(10,2) NOT NULL);
                      GO 
                 </DirectInput>
            </ExecuteSQL>
            <Dataflow Name="DFT Source">
                <Transformations>
                    <FlatFileSource ConnectionName="FF Source" Name="FF Source" />
                    <OleDbDestination ConnectionName="CM_OLE" Name="OLEDB DST">
                        <ExternalTableOutput Table="sometablename"></ExternalTableOutput>
                    </OleDbDestination>                     
                </Transformations>
            </Dataflow>         
        </Tasks>
    </Package>
</Packages>

When I try to generate package it says cannot execute query select * from sometablename invalid object name. I understand that table sometablename doesnot exist so it throws the error. So, How could I create the table automatically? I have read the series BI Thoughts and Theories. Part 2 shows way to create table. My understanding is that at the end it also create ExecuteSQl to create the table. I am confused how to run table creation script before the dataflow or what other alternative BIML has to offer?

Thanks in advance


Solution

  • It seems what you're trying to do is not possible with BIML.

    SSIS dataflows require ALL external column metadata to be available at design time. There is no way around this, so the Biml compiler is required to query the data source to get this information, which is then emitted into the package. BIDS/SSDT does this validation constantly as you are working. Biml does it only at build time.

    The purpose of ValidateExternalMetadata=false is actually for SSIS to refrain from checking that the external columns defined in the dataflow metadata still match the external data source during the validation phase when the package is run. But at design/build time, we still need that metadata to exist so that we can create the external column metadata in the first place. To be clear, this is true both for native BIDS/SSDT and for Biml.

    ValidateExternalMetadata was provided by the SSIS team for scenarios such as dynamically creating tables or files that will match a predetermined schema. Usually you would have the schema prebuilt on your dev environment (which you build against) and then dynamically create the same schema on production as it's needed. Disabling validation means that you can do the dynamic creation as part of the same package that reads from or loads into those dynamically created objects.

    We do recognize that there's a need to do builds without having the schema materialized in Dev either. One of the things we're looking at doing in a future release is an "Offline Metadata" feature that would allow you to use Biml to declare your dataflow metadata without having to retrieve it at build time. There would be some scripting work on the user's part to construct the metadata to match what it will look like at run time, but if they get that right, scenarios like yours will be enabled.

    What you could do is add the ValidateExternalMetadata="false" to your OLE DB Destination. Create the table manually on your development environment and then generate the package.

    It should execute without problems on any other environment because you set ValidateExternalMetadata to false.