Is Distinct A Synonym To Order By In MS SQL Server ?

1 Comment

This afternoon, I was chatting with Antoine Gémis about T-SQL, and this subject came on the table.

He was asking if, in my opinion, using both a distinct and an order by in the same statement would make the query worst due to the fact that distinct is already performing a sort on his own.

Working mostly with Oracle, and being sure distinct is not doing any sort, this was my first answer : "As far as I know, distinct is not performing any sort".

But he invited me to check, which I did using this small query :

WITH CTE5(nm) AS
(
  SELECT 1 union ALL
  SELECT nm + 1 FROM CTE5 WHERE nm <5
)
SELECT DISTINCT * FROM CTE5;

nm
-----------
1
2
3
4
5



Well, the output seems sorted.

More than that, the explain plan is quite hopeless to me : "Sort (distinct sort)".
In the XML Plan I found those lines :

<Sort Distinct="true">
  <OrderBy>
    <OrderByColumn Ascending="true">
      <ColumnReference Column="Recr1003" />
    </OrderByColumn>
  </OrderBy>
...
</Sort>



I told him I'll try to prove it wrong, as Tom Kytes did many many times to people who thought that group by made a sort prior to Oracle 10g.

My first guess was right, mixing inline top N queries with multiple distinct totally mess the way the datas are ordered.

Here are three executions of the same query, which have a distinct :

WITH CTE5(nm) AS
(
  SELECT 1 union ALL
  SELECT nm + 1 FROM CTE5 WHERE nm <5
)
SELECT DISTINCT
    SR.nm, SR.nw
FROM
  (
    SELECT DISTINCT top 5 nm, NEWID() nw
    FROM CTE5
    ORDER BY NEWID() DESC
  ) AS SR;

nm          nw
----------- ------------------------------------
5           8BDFFF62-5F80-4D37-B3FB-F8D79AFCD131
1           2A83E5BB-C76B-4D86-A3D3-D01958D558CE
3           81BEE874-1771-4377-941C-CCBCC82B0ACC
4           8A059912-B22B-4876-AECF-2F7A590A94DE
2           8666117D-9AE3-48B6-A810-1421B70F7240

nm          nw
----------- ------------------------------------
5           DCE13676-FDE6-432E-8AE5-E970895B8958
4           D7DFDD7D-D15C-4438-958D-6469DDD12FFF
3           7E4700C2-851E-4A90-A31B-2EE29EC3B463
1           0F1ADBA1-52A0-47D8-A91E-2E428ED35ED1
2           2174F91F-EDB2-4473-B9DD-265A9BE7066D

nm          nw
----------- ------------------------------------
1           EA5FB757-8386-4D95-9CAE-DD378CD62782
5           0562FCC0-2405-41C9-BA65-788759C26999
3           BE2F9938-1CC7-4893-9DE6-3E9AEAC85174
4           11D5F426-DB47-4DD5-8B9D-37E62725D10C
2           9DB2948C-B917-4180-A112-08A444EA1FA1



The output is obviously not sorted by nm.

Just to be sure, i tried to sort the newid values of the last output :

SELECT 1 AS nm, 'EA5FB757-8386-4D95-9CAE-DD378CD62782' nw union ALL
SELECT 5      , '0562FCC0-2405-41C9-BA65-788759C26999'    union ALL
SELECT 3      , 'BE2F9938-1CC7-4893-9DE6-3E9AEAC85174'    union ALL
SELECT 4      , '11D5F426-DB47-4DD5-8B9D-37E62725D10C'    union ALL
SELECT 2      , '9DB2948C-B917-4180-A112-08A444EA1FA1'   
ORDER BY 2 DESC;

nm          nw
----------- ------------------------------------
1           EA5FB757-8386-4D95-9CAE-DD378CD62782
3           BE2F9938-1CC7-4893-9DE6-3E9AEAC85174
2           9DB2948C-B917-4180-A112-08A444EA1FA1
4           11D5F426-DB47-4DD5-8B9D-37E62725D10C
5           0562FCC0-2405-41C9-BA65-788759C269991



So, it's neither not sorted by the nw column.

What about the explain plan which seemed hopeless earlier ?
There is the same "Sort (distinct sort)" action at the inline view level, but there is a "Top" action after it.

I'm not sure about what is pertinent so this is a generous portion of the explain plan :

<RelOp AvgRowSize="27" EstimateCPU="2E-07" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="2" LogicalOp="Top" NodeId="0" Parallel="false" PhysicalOp="Top" EstimatedTotalSubtreeCost="0.0113711">
  <OutputList>
    <ColumnReference Column="Recr1003" />
    <ColumnReference Column="Expr1004" />
  </OutputList>
  <RunTimeInformation>
    <RunTimeCountersPerThread Thread="0" ActualRows="5" ActualEndOfScans="1" ActualExecutions="1" />
  </RunTimeInformation>
  <Top RowCount="false" IsPercent="false" WithTies="false">
    <TopExpression>
      <ScalarOperator ScalarString="(5)">
        <Const ConstValue="(5)" />
      </ScalarOperator>
    </TopExpression>
    <RelOp AvgRowSize="27" EstimateCPU="0.000103147" EstimateIO="0.0112613" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="2" LogicalOp="Distinct Sort" NodeId="1" Parallel="false" PhysicalOp="Sort" EstimatedTotalSubtreeCost="0.0113709">
      <OutputList>
        <ColumnReference Column="Recr1003" />
        <ColumnReference Column="Expr1004" />
      </OutputList>
      <MemoryFractions Input="1" Output="1" />
      <RunTimeInformation>
        <RunTimeCountersPerThread Thread="0" ActualRebinds="1" ActualRewinds="0" ActualRows="5" ActualEndOfScans="0" ActualExecutions="1" />
      </RunTimeInformation>
      <Sort Distinct="true">
        <OrderBy>
          <OrderByColumn Ascending="false">
            <ColumnReference Column="Expr1004" />
          </OrderByColumn>
          <OrderByColumn Ascending="true">
            <ColumnReference Column="Recr1003" />
          </OrderByColumn>
        </OrderBy>



So did I trick the SQL Server engine ? Maybe.

But I'm sure of one thing, with this query :

WITH CTE5(nm) AS
(
  SELECT 1 union ALL
  SELECT nm + 1 FROM CTE5 WHERE nm <5
)
SELECT DISTINCT
    SR.nm, SR.nw
FROM
  (
    SELECT DISTINCT top 5 nm, NEWID() nw
    FROM CTE5
    ORDER BY NEWID() DESC
  ) AS SR
ORDER BY
    SR.nm ASC;

nm          nw
----------- ------------------------------------
1           74874923-A63C-4FAD-8CBD-A4A3466347F7
2           9C8268E5-C290-4182-B12D-B705E19EF30C
3           67AE7311-739A-4590-B378-38321F304FFC
4           117EDDC9-CD67-4243-A257-65D79B76460A
5           975C2593-8107-455E-BD56-AFB2724274BB



the dataset is always sorted, and the last action shown in the explain plan is :

<Sort Distinct="false">
  <OrderBy>
    <OrderByColumn Ascending="true">
      <ColumnReference Column="Recr1003" />
    </OrderByColumn>
  </OrderBy>
...
</Sort>



To conclude, I'll use Tom Kytes words (I didn't ask the permission but I'm sure he would allow me to proceed) :

IF YOU WANT YOUR DATA SORTED IN SOME ORDER THAT YOU CAN RELY ON YOU WILL USE ORDER BY.


Share and Enjoy:
  • Digg
  • del.icio.us
  • Google Bookmarks
  • Facebook
  • TwitThis
  • blogmarks
  • email
  • Furl
  • LinkedIn

This entry is filed under SQLing. And tagged with , , , , , , , . You can follow any responses to this entry through RSS 2.0. You can leave a response, or trackback from your own site.

1 Response to “Is Distinct A Synonym To Order By In MS SQL Server ?”


  1. Antoine Gémis

    Well done Waldar! Nice pragmatic approach. So, SQL is a declarative language, you could be lucky with DISTINCT SORT if it's the outermost part of a simple execution plan, but with no garantee as you do not manage execution plan. If you want to sort, ask for it :-)