Is Distinct A Synonym To Order By In MS SQL Server ?
This afternoon, I was chatting with Antoine Gémis about T-SQL, and this subject came on the table.
He was asking if, in my opinion, using both a distinct and an order by in the same statement would make the query worst due to the fact that distinct is already performing a sort on his own.
Working mostly with Oracle, and being sure distinct is not doing any sort, this was my first answer : "As far as I know, distinct is not performing any sort".
But he invited me to check, which I did using this small query :
(
SELECT 1 union ALL
SELECT nm + 1 FROM CTE5 WHERE nm <5
)
SELECT DISTINCT * FROM CTE5;
nm
-----------
1
2
3
4
5
Well, the output seems sorted.
More than that, the explain plan is quite hopeless to me : "Sort (distinct sort)".
In the XML Plan I found those lines :
<OrderBy>
<OrderByColumn Ascending="true">
<ColumnReference Column="Recr1003" />
</OrderByColumn>
</OrderBy>
...
</Sort>
I told him I'll try to prove it wrong, as Tom Kytes did many many times to people who thought that group by made a sort prior to Oracle 10g.
My first guess was right, mixing inline top N queries with multiple distinct totally mess the way the datas are ordered.
Here are three executions of the same query, which have a distinct :
(
SELECT 1 union ALL
SELECT nm + 1 FROM CTE5 WHERE nm <5
)
SELECT DISTINCT
SR.nm, SR.nw
FROM
(
SELECT DISTINCT top 5 nm, NEWID() nw
FROM CTE5
ORDER BY NEWID() DESC
) AS SR;
nm nw
----------- ------------------------------------
5 8BDFFF62-5F80-4D37-B3FB-F8D79AFCD131
1 2A83E5BB-C76B-4D86-A3D3-D01958D558CE
3 81BEE874-1771-4377-941C-CCBCC82B0ACC
4 8A059912-B22B-4876-AECF-2F7A590A94DE
2 8666117D-9AE3-48B6-A810-1421B70F7240
nm nw
----------- ------------------------------------
5 DCE13676-FDE6-432E-8AE5-E970895B8958
4 D7DFDD7D-D15C-4438-958D-6469DDD12FFF
3 7E4700C2-851E-4A90-A31B-2EE29EC3B463
1 0F1ADBA1-52A0-47D8-A91E-2E428ED35ED1
2 2174F91F-EDB2-4473-B9DD-265A9BE7066D
nm nw
----------- ------------------------------------
1 EA5FB757-8386-4D95-9CAE-DD378CD62782
5 0562FCC0-2405-41C9-BA65-788759C26999
3 BE2F9938-1CC7-4893-9DE6-3E9AEAC85174
4 11D5F426-DB47-4DD5-8B9D-37E62725D10C
2 9DB2948C-B917-4180-A112-08A444EA1FA1
The output is obviously not sorted by nm.
Just to be sure, i tried to sort the newid values of the last output :
SELECT 5 , '0562FCC0-2405-41C9-BA65-788759C26999' union ALL
SELECT 3 , 'BE2F9938-1CC7-4893-9DE6-3E9AEAC85174' union ALL
SELECT 4 , '11D5F426-DB47-4DD5-8B9D-37E62725D10C' union ALL
SELECT 2 , '9DB2948C-B917-4180-A112-08A444EA1FA1'
ORDER BY 2 DESC;
nm nw
----------- ------------------------------------
1 EA5FB757-8386-4D95-9CAE-DD378CD62782
3 BE2F9938-1CC7-4893-9DE6-3E9AEAC85174
2 9DB2948C-B917-4180-A112-08A444EA1FA1
4 11D5F426-DB47-4DD5-8B9D-37E62725D10C
5 0562FCC0-2405-41C9-BA65-788759C269991
So, it's neither not sorted by the nw column.
What about the explain plan which seemed hopeless earlier ?
There is the same "Sort (distinct sort)" action at the inline view level, but there is a "Top" action after it.
I'm not sure about what is pertinent so this is a generous portion of the explain plan :
<OutputList>
<ColumnReference Column="Recr1003" />
<ColumnReference Column="Expr1004" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="5" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<Top RowCount="false" IsPercent="false" WithTies="false">
<TopExpression>
<ScalarOperator ScalarString="(5)">
<Const ConstValue="(5)" />
</ScalarOperator>
</TopExpression>
<RelOp AvgRowSize="27" EstimateCPU="0.000103147" EstimateIO="0.0112613" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="2" LogicalOp="Distinct Sort" NodeId="1" Parallel="false" PhysicalOp="Sort" EstimatedTotalSubtreeCost="0.0113709">
<OutputList>
<ColumnReference Column="Recr1003" />
<ColumnReference Column="Expr1004" />
</OutputList>
<MemoryFractions Input="1" Output="1" />
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRebinds="1" ActualRewinds="0" ActualRows="5" ActualEndOfScans="0" ActualExecutions="1" />
</RunTimeInformation>
<Sort Distinct="true">
<OrderBy>
<OrderByColumn Ascending="false">
<ColumnReference Column="Expr1004" />
</OrderByColumn>
<OrderByColumn Ascending="true">
<ColumnReference Column="Recr1003" />
</OrderByColumn>
</OrderBy>
So did I trick the SQL Server engine ? Maybe.
But I'm sure of one thing, with this query :
(
SELECT 1 union ALL
SELECT nm + 1 FROM CTE5 WHERE nm <5
)
SELECT DISTINCT
SR.nm, SR.nw
FROM
(
SELECT DISTINCT top 5 nm, NEWID() nw
FROM CTE5
ORDER BY NEWID() DESC
) AS SR
ORDER BY
SR.nm ASC;
nm nw
----------- ------------------------------------
1 74874923-A63C-4FAD-8CBD-A4A3466347F7
2 9C8268E5-C290-4182-B12D-B705E19EF30C
3 67AE7311-739A-4590-B378-38321F304FFC
4 117EDDC9-CD67-4243-A257-65D79B76460A
5 975C2593-8107-455E-BD56-AFB2724274BB
the dataset is always sorted, and the last action shown in the explain plan is :
<OrderBy>
<OrderByColumn Ascending="true">
<ColumnReference Column="Recr1003" />
</OrderByColumn>
</OrderBy>
...
</Sort>
To conclude, I'll use Tom Kytes words (I didn't ask the permission but I'm sure he would allow me to proceed) :
IF YOU WANT YOUR DATA SORTED IN SOME ORDER THAT YOU CAN RELY ON YOU WILL USE ORDER BY.
This entry is filed under SQLing. And tagged with cte, distinct, Explain plan, Oracle, sort, SQL Server, Tom Kyte, TSQL. You can follow any responses to this entry through RSS 2.0. You can leave a response, or trackback from your own site.
Well done Waldar! Nice pragmatic approach. So, SQL is a declarative language, you could be lucky with DISTINCT SORT if it's the outermost part of a simple execution plan, but with no garantee as you do not manage execution plan. If you want to sort, ask for it :-)