Thursday, March 8, 2012

Bug in Flat File Connection Object: Risky!!!!

I was wondering if any others on this list are getting a bug with the Flat File connection object --where in both csv (coma delimited) or flat files (tab delimited) where strange characters(like two bold vertical lines) are appearing in the file viewer while setting up a connection forcing the CRLF to end in the wrong place. They look like two bold vertical lines

Im testing the same files with both the old DTS from SQL Server 2000 and getting no issues which tells me there is a bug in SSIS. Strange thing the bug does not happen with every file I receive, only a few. I’m wondering if SSIS Flat File Connection object could be a little flaky with respect to how it reads files compared to DTS text connection object.

email me d2ba@.xtra.co.nz for a screen shot of tests

Thanks in advance

Dave

This is due to the way the SSIS flat file parser is designed. Missing delimiters break the parsing. See the following post fo more details:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=67416&SiteID=1

Thanks.

|||

Bob,

I had a read --all I can say is Risky to implement the flat file connection object this way in an ETL tool. Risky Risky Risky

The best workaround is to use DTS 2000 to import the flat files.Simply run the DTS 2000 package via SQL Server 2005 Agent using the backwards compatibily.

Once the data is in the SQL Server 2005 staging tables use SSIS for the rest of the package.

Cheers Dave

|||

I have to say I am puzzled by the term "risky" - the behaviour of the current flat file parser is entirely predictable and was introduced in part because the old DTS behaviour could introduce unexpected results - it was also much slower.

The two black characters you are seeing are probably undisplayable characters such as <CR> or <LF>. I expect they are in some lines in some files and not in others - but you might not be able to tell that in other editors as they are undisplayable. I would think it much more risky to behave as if they were not there or to make assumptions as to what was intended when a certain parsing was selected. And indeed, we have plenty of issues in our bug database and customer feedback concerned about the DTS parsing for these reasons.

That's not to say that we could not introduce more flexibility into the current design - although all the flexibility one could want is available through the script source component in 2005.

Perhaps there are some specific aspects of the older DTS parsing you would like to retain, and I would be glad to hear of those. We're always open to suggestions.

Donald

|||I also saw this. Nothing is perfect. I will prefer SSIS to DTS within six months no right now, when I've got as a dutie move hundreds of packages to SSIS, you know. Old habits...|||

Here is what I would like to see in the next release:Until then Im stuck with using DTS imported into SSIS for the dataflow task on certain file types we recieve--unless somebody can show me a script component that can to this. Sometimes I have first row as column names in file and other times I dont have column names in file.

the row delimiter used to be read before the column delimiter in the DTS world... it would be great to be able to set this as an explicitly selected option

Problem Statement:

I have a tab delimited flat file with say 60 columns. All columns can have null values. The file contains a blank tab for nulls.

Now, if a row has null value for last 10 columns, ideally, there should be 10 blank tabs padded to the end of that row. But, this is not the case with this file. There is line break after 50 columns of that row, in the above example.

Extrapolating this behavior, most of the rows have null values for last 'x' columns where 'x' is variable for each row.

When I try to customize a flat file connection for this file, SSIS is not padding up the missing tabs. In the example above, it should insert null in the last 10 columns for that row. Instead, and weirdly, it is continuing to read next row. So, column 1 of next row is inserted into 51 st columns of the above row.

Even Excel 2000 can handle this same file easily with the same delimiters (row - LF and column - tab).

Reason Why SSIS behaves this way:

The flat file parser parses data per-column using current column delimiters, and then advancing to the next one. The row delimiter is considered as nothing more then a column delimiter of the last column.

We considered adding what you are asking for; always check if we get the row delimiter in addition to checking for the current column delimiter, and if the row delimiter comes before expected column delimiter pad the remaining columns of that row.

We decided not to implement this by default for performance reasons. It is likely we will have it in the next version of the product, but only as an option that will need to be explicitly selected.

|||

I'm sorry I don't have time to craft a full script sample for you - perhaps someone out there in the community may be able to.

Is there a specific issue with writing the script that is causing problems? I may be able to advise.

Meanwhile - we'll continue to consider alternative parsing methods for future releases.

Donald

|||

Dave,

I am not an advanced user of SSIS at all BUT things that used to be simple in DTS are often more convoluted than I'd expect in SSIS. Indeed, I ran into various problems with their Flat File Connection object but I presumed that it was my lack of understanding of how to use it. They just seem to have re-architected the way some familiar things worked and have not done a very good job of telling people how they have changed.

Barkingdog.

No comments:

Post a Comment