To create a system to efficiently import csv files which is resistent to changes in the file structure.
Also to minimise the changes needed for a new file format - this should be controlled by table data.
SSIS is an efficient method of importing data to a database.
I would expect the import throughput to be of the order of a Gigabyte per minute in a fairly low specified system.
It is a product that has become to be considered a part of sql server and a developer would be expected to have skills in the product.
It is best to import from files as:
It makes the system easy to test as it just needs a file store and files
Tests are easily repeatable.
There is no to requirement for the source and destination systems to be available at the same time.
The import does not impact the source system
The source system can be in control of the file export and schedule
The destination system can be in control of the file import and processing
The import can be rerun on failure
There are automatic backups of the import data for recovery
There are a few problems with the SSIS product though:
It is used to control processing which complicates the system and causes contention issues
It needs to be configured for the source and destinations at design time
Packages tend to proliferate with import requirements.
There is a tendency to attempt uncontrolled parallel processing which can impact performance and memory usage
PackageTemplate - Static package data and rows to be replaced
FileType - Definition of the file
FileControl - Files to be imported with FileType_id
DataImport - generic table with 100 columns of varchar(8000)
s_PackageData - Output the package data
Proof of concept
Can a package be created and executed.
Create an import package: text file import
Execute the package to import data
Import the package xml data into a table
Export the package data to a file
Execute the created package to import data
Find the configurable components of a package
Create simple packages and compare for differences that need to be configured
Single text column import
Two text column import
Package template table
Create a simple package with the required import processing
Import single column to staging table
Set file ID
Log result
Import the package to a table
Identify the configurable rows and data items that need to be replaced
Populate a table with replacement rows for each item that needs to be updated
Create a template table with place holders for the replacement items
File type table
Defines the file types and the column data types to be imported
FileControl table
Defines the files to be imported and references the file type table.
Allocates a FileID
Import configuration table
For the import define the columns that need to be imported
Package generation
SP to take the template and configuration and output the output package data
Input: file type, file id
Output: package data
Package execution control
Input: FileID
Call package generation SP
Create import package
Execute import package