home


SSIS Generic file importer
Author Nigel Rivett

Generic SSIS file importer


Objective

To create a system to efficiently import csv files which is resistent to changes in the file structure.
Also to minimise the changes needed for a new file format - this should be controlled by table data.


Method


SSIS is an efficient method of importing data to a database.
I would expect the import throughput to be of the order of a Gigabyte per minute in a fairly low specified system.

It is a product that has become to be considered a part of sql server and a developer would be expected to have skills in the product.
It is best to import from files as:
It makes the system easy to test as it just needs a file store and files
Tests are easily repeatable.
There is no to requirement for the source and destination systems to be available at the same time.
The import does not impact the source system
The source system can be in control of the file export and schedule
The destination system can be in control of the file import and processing
The import can be rerun on failure
There are automatic backups of the import data for recovery
There are a few problems with the SSIS product though:

It is used to control processing which complicates the system and causes contention issues
It needs to be configured for the source and destinations at design time
Packages tend to proliferate with import requirements.
There is a tendency to attempt uncontrolled parallel processing which can impact performance and memory usage

This implementation describes a generic importer which will create a package at run time to import from any file.
It imports into a staging table consisting of 100 * 8000 character columns named c001,...,c100 plus a file ID to form a clustered index.
Following the import, the processing SP will need to interpret the columns perform the processing and delete the staging data.
Note: It would be easy to convert the importer to import into different tables with different datatypes and relevant names.
Note: Similarly the process could be used to output tables to files using a definition table
Note: The process can be extended to encompass larger (text) datatypes.

There are many means of creating a product to perform this task but this utilises the methods with which a sql server developer would be expected to have
expertise. It therefore just uses SQL and SSIS to perform the import.
A possibility would be to create a script task or BIML to perform the import but this requires expertise in the product and probably takes more time to get reasonable performance.

This method creates a configured package to import the data at run time.
As such it will prevent issues with process control and give a consistent approach to file import including logging and error handling.
Release control does not need to consider SSIS packages apart from the initial release.
Note: the same product can be easily altered to import from any source and to any destination.
It will mean that you never need to create another file import package.

This article is split into several parts as a development tutorial but the final product can be used and the article treated as documentation.
The objective is to take entries from a table that contains a description of the data to be imported
Use this with a data template table to create the package data to import the file into a generic table.

Tables

PackageTemplate - Static package data and rows to be replaced
FileType - Definition of the file
FileControl - Files to be imported with FileType_id
DataImport - generic table with 100 columns of varchar(8000)

Stored Procedures

s_PackageData - Output the package data



Steps to develop the product

Proof of concept

Can a package be created and executed.

Create an import package: text file import
Execute the package to import data
Import the package xml data into a table
Export the package data to a file
Execute the created package to import data

Find the configurable components of a package

Create simple packages and compare for differences that need to be configured
Single text column import
Two text column import

Package template table

Create a simple package with the required import processing

Import single column to staging table
Set file ID
Log result

Import the package to a table
Identify the configurable rows and data items that need to be replaced
Populate a table with replacement rows for each item that needs to be updated
Create a template table with place holders for the replacement items

File type table

Defines the file types and the column data types to be imported

FileControl table

Defines the files to be imported and references the file type table.
Allocates a FileID

Import configuration table

For the import define the columns that need to be imported

Package generation

SP to take the template and configuration and output the output package data
Input: file type, file id
Output: package data

Package execution control

Input: FileID
Call package generation SP
Create import package
Execute import package



home