Monday, March 19, 2012

BUG(?): Distinct + variables TSQL

Hi All,
the following TSQL statment doesn't return expected value:
declare @.tmp varchar(500)
set @.tmp = ''
select distinct @.tmp = @.tmp + ',' + Col
from ( select 'A' as Col union all select 'B' union all select 'A' ) x
select @.tmp
expected: ',A,B', returns: ',B'
This bug (?) relates to SQL2000 and SQL2005
Regards
Marcin Zacharzewskideclare @.tmp varchar(500)
set @.tmp = ''
select @.tmp = @.tmp + ',' + Col
from
( select 'A' as Col
union
select 'B'
union
select 'A' ) x
select @.tmp
Hope this helps.
mzacharzewski@.linksoft.pl wrote:
> Hi All,
> the following TSQL statment doesn't return expected value:
> declare @.tmp varchar(500)
> set @.tmp = ''
> select distinct @.tmp = @.tmp + ',' + Col
> from ( select 'A' as Col union all select 'B' union all select 'A' ) x
> select @.tmp
> expected: ',A,B', returns: ',B'
> This bug (?) relates to SQL2000 and SQL2005
> Regards
> Marcin Zacharzewski|||Thank you, for your help,
but I just wanted to warn everybody, that such a bug exists in SQL200x
- In my particular situation I found a different walkaround.
select @.tmp = @.tmp + Col from
( select distinct Col from some_table) x
But this unexpected behaviour caused me some problems with my dynamic
SQL.
Hope MS will patch it in following SPs.
Regards:
Marcin Zacharzewski
gandhimanisha@.gmail.com napisal(a):
> declare @.tmp varchar(500)
> set @.tmp = ''
> select @.tmp = @.tmp + ',' + Col
> from
> ( select 'A' as Col
> union
> select 'B'
> union
> select 'A' ) x
> select @.tmp
> Hope this helps.
>
> mzacharzewski@.linksoft.pl wrote:|||mzacharzewski@.linksoft.pl wrote:
> Thank you, for your help,
> but I just wanted to warn everybody, that such a bug exists in SQL200x
> - In my particular situation I found a different walkaround.
> select @.tmp = @.tmp + Col from
> ( select distinct Col from some_table) x
> But this unexpected behaviour caused me some problems with my dynamic
> SQL.
> Hope MS will patch it in following SPs.
> Regards:
> Marcin Zacharzewski
>
Officially, it's not a bug. The correct result of an assignment in a
SELECT statement that returns multiple rows is undefined, so it's
dangerous to rely on it in any case. In fact Books Online says only
that the "last" value returned should be assigned to the variable.
Arguably therefore the result of the query you posted is correct.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||>> the following TSQL statment doesn't return expected value:
That is not valid syntax and hence we shouldn't have any expectation.
It is unfortunate that such methods are widely used and even promoted by
some as valid shortcuts to generate concatenated lists of column values from
multiple rows.
Anith|||Thanks David,
I my opinion common sense/programming practices suggest this should
work and shame on MS, it doesn't...
There shouldn't be such an unexpected behaviour - without DISTINCT
everything works great, using DISTINCT SQL Server returns one value -
definitely an error should be raised instead. I still consider this as
a bug.
I used:
SELECT @.var = @.var + col FROM table
syntax very often before - instead of other more complicated/longer
statements and that always worked as expected.
TSQL programmer shouldn't waste his time on checking (books online)
whether sth is possible and will work as expected - he should rely on
his programming practice and errors reported instead.
In my particular case this bug made me lots of problems with a
complicated report - consisting of a few views definitions which are
dynamically constucted based on rules defined in a table..
Regards:
Marcin Zacharzewski ( MCT, MCDBA, MCSD, MCSE)
David Portas napisal(a):
> mzacharzewski@.linksoft.pl wrote:
> Officially, it's not a bug. The correct result of an assignment in a
> SELECT statement that returns multiple rows is undefined, so it's
> dangerous to rely on it in any case. In fact Books Online says only
> that the "last" value returned should be assigned to the variable.
> Arguably therefore the result of the query you posted is correct.
> --
> David Portas, SQL Server MVP
> Whenever possible please post enough code to reproduce your problem.
> Including CREATE TABLE and INSERT statements usually helps.
> State what version of SQL Server you are using and specify the content
> of any error messages.
> SQL Server Books Online:
> http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
> --|||mzacharzewski@.linksoft.pl wrote:
> Thanks David,
> I my opinion common sense/programming practices suggest this should
> work and shame on MS, it doesn't...
> There shouldn't be such an unexpected behaviour - without DISTINCT
> everything works great, using DISTINCT SQL Server returns one value -
> definitely an error should be raised instead. I still consider this as
> a bug.
I'll agree that MS should in future disallow these assignments
altogether. BOTH the queries you posted ought to return a syntax error.
It's always seemed obvious to me that there was something faulty about
the @.tmp = @.tmp + ',' + Col syntax in a query with multiple rows. The
result is a string that depends on the evaluation order. But in SQL
there is no way to control the evaluation order so the result is always
going to be unpredictable. Unfortunately, it has turned out that many
people don't find this so obvious. The lesson is that idiosyncratic
"features", however convenient, shouldn't be viewed as a substitute for
good logical design on the part of the developer.
There are some reliable alternatives in SQL Server 2000 and you can
Google for them. In 2005 we have some other options. The following are
proper aggregations (they can be grouped) and you can control the order
of concatenation so as to ensure a deterministic result.
CREATE TABLE tbl (col1 INT NOT NULL, col2 VARCHAR(10) NOT NULL, PRIMARY
KEY (col1,col2));
INSERT INTO tbl (col1,col2) VALUES (1,'ABC');
INSERT INTO tbl (col1,col2) VALUES (1,'DEF');
INSERT INTO tbl (col1,col2) VALUES (2,'GHI');
INSERT INTO tbl (col1,col2) VALUES (2,'JKL');
WITH t AS (
SELECT col1, col2,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS row_no
FROM tbl)
SELECT col1,
MAX(CASE WHEN row_no = 1 THEN col2 END)+
MAX(CASE WHEN row_no = 2 THEN ','+col2 ELSE '' END)+
MAX(CASE WHEN row_no = 3 THEN ','+col2 ELSE '' END)+
MAX(CASE WHEN row_no = 4 THEN ','+col2 ELSE '' END)
FROM t
GROUP BY col1 ;
SELECT DISTINCT col1,
SUBSTRING(
(SELECT ','+col2 AS [text()]
FROM tbl
WHERE col1 = T.col1
ORDER BY col2
FOR XML PATH( '' )
), 2,100) AS concat
FROM tbl AS T
ORDER BY col1 ;
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||(mzacharzewski@.linksoft.pl) writes:
> I my opinion common sense/programming practices suggest this should
> work and shame on MS, it doesn't...
> There shouldn't be such an unexpected behaviour - without DISTINCT
> everything works great, using DISTINCT SQL Server returns one value -
> definitely an error should be raised instead. I still consider this as
> a bug.
No matter what your opinion is, it is not a bug, just undefined behaviour.

> I used:
> SELECT @.var = @.var + col FROM table
> syntax very often before - instead of other more complicated/longer
> statements and that always worked as expected.
> TSQL programmer shouldn't waste his time on checking (books online)
Here I must take strong exception. If you think that reading Books
Online is a waste of time, then you have a serious problem.

> whether sth is possible and will work as expected - he should rely on
> his programming practice and errors reported instead.
Relying on programming practice can lead you seriously astray. Consider
this statment:
SELECT ...
FROM tbl
WHERE b <> 0
AND a/b > 1
A programer who is new to SQL but have done a lot of C/C++ following
his programming practice only would gladly assume this would shortcut
and be safe. But he would be very wrong on that point, because SQL does
not shortcut.
Different language has different practices, and a good prorgammer must
check the documentation for the tool he is currently using.
But I can agree that it would be a good thing if SELECT @.x = @.x + col FROM
produced a warning that you are on dangerous grounds. Or even produced an
error (depending on compatibility level). Or simply the behaviour would be
the one that everyone expects.
(The correct way of coding the above is:
SELECT ...
FROM tbl
WHERE CASE WHEN b <> 0 THEN a/b END > 1
)
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Hi Erland,
I can agree only with your last sentence:
>Erland Sommarskog napisal(a):
>...
> But I can agree that it would be a good thing if SELECT @.x = @.x + col FROM
> produced a warning that you are on dangerous grounds. Or even produced an
> error (depending on compatibility level). Or simply the behaviour would be
> the one that everyone expects.
thats what I meant - the problem is that SQL allows such a
sensible-looking statements and doesn't raise an error/warning.
You misunderstood me - reading SQL Books online is not a waste of time
- I use this great help really often, but I think a programmer
shouldn't waste his time bothering (and digging BOL) whether a
sensible-looking statement is valid - especially considering that:
select @.tmp = @.tmp + col from table
works well in any scenario, and I've been using this for years.
My programming rule tells me:
everythnig that is not an error/warning, and looks-sensible - is
allowed and should give definend/predicitble result.
BTW I don't understand your examples, WHERE clause is evaluated first,
so both your queries are allowed:
select * from table where b<>0 and a/b > 1
and
select * from table where CASE WHEN b <> 0 THEN a/b END > 1
in this particular example, the first one is (IMHO) even better:)
Of course you have to use CASE in order to get a/b value in SELECT
clause:
select CASE WHEN b <> 0 THEN a/b END from table
but I am sure you know it.
Regards:
Marcin Zacharzewski ( MCT, MCDBA, MCSD, MCSE)
Erland Sommarskog napisal(a):
> (mzacharzewski@.linksoft.pl) writes:
> No matter what your opinion is, it is not a bug, just undefined behaviour.|||mzacharzewski@.linksoft.pl wrote:
> I think a programmer
> shouldn't waste his time bothering (and digging BOL) whether a
> sensible-looking statement is valid - especially considering that:
> select @.tmp = @.tmp + col from table
> works well in any scenario, and I've been using this for years.
> My programming rule tells me:
> everythnig that is not an error/warning, and looks-sensible - is
> allowed and should give definend/predicitble result.
How can a string concatentation possibly give a defined and predictable
result if the concatenation order is not specified? Maybe you think
that if you include ORDER BY it will affect the result. But ORDER BY
only applies to multiple row result sets, not to the execution order of
a query. So while it may *seem* to work for you (most of the time),
this undocumented curiosity is just that. It isn't sensible at all in
my book and I would generally recommend you avoid it.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--

No comments:

Post a Comment