You Should Like Analytics

1 Comment

I'm not speaking of Google Analytics - even if it's quite awesome - I'm speaking of SQL analytic functions, or window functions.

I first met them back in 2002, when the firm I was working in upgraded their database from Oracle 7.3.4 to Oracle 8i. At first, I just didn't understand the syntax. It felt so different from the regular SQL I was knowing about. At this time I wasn't running after the cool SQL tricks, I was still a beginner and most of my job was to adjust some sheets within Business Objects.

I'm not going to write a tutorial or even explain the rules of those function. I'll translate an easy problem in both classic-SQL and analytic SQL, that I did answer too on the french-speaking Développez.com forum.The analytic functions I'm going to use are only implemented in Oracle as far as I know. But they are into the SQL:2008 standard so I'm fine with that as other DBMS will adapt to a similar syntax.

The problem is the following : you want to know for each course who are the youngest and the oldest student ?

WITH students AS
(
SELECT 1 AS stud_id, 'JAMES' AS name, date '1989-07-14' AS birthdate FROM dual union ALL
SELECT 2, 'JOHN'   , date '1991-02-07' FROM dual union ALL
SELECT 3, 'ROBERT' , date '1986-08-23' FROM dual union ALL
SELECT 4, 'MICHAEL', date '1983-12-02' FROM dual union ALL
SELECT 5, 'WILLIAM', date '1988-03-02' FROM dual union ALL
SELECT 6, 'DAVID'  , date '1984-05-12' FROM dual union ALL
SELECT 7, 'RICHARD', date '1984-06-06' FROM dual union ALL
SELECT 8, 'CHARLES', date '1984-09-07' FROM dual
),
course AS
(
SELECT 1 AS cour_id, 1 AS stud_id FROM dual union ALL
SELECT 2, 2 FROM dual union ALL
SELECT 1, 3 FROM dual union ALL
SELECT 3, 4 FROM dual union ALL
SELECT 2, 5 FROM dual union ALL
SELECT 2, 6 FROM dual union ALL
SELECT 1, 7 FROM dual union ALL
SELECT 1, 8 FROM dual
)
SELECT *
FROM
    students st
    INNER JOIN course co
      ON co.stud_id = st.stud_id

Here are just some datas to work with. I won't repeat the CTEs in the following queries, but they are necessary.

The classic SQL makes us first find what are the min and max birthdate by course - via a subquery - and then finding the students born at those dates :

SELECT
    sq.cour_id,
    sy.name AS youngest,
    so.name AS oldest
FROM
    (
    SELECT
        co.cour_id,
        min(st.birthdate) AS bmin,
        max(st.birthdate) AS bmax
    FROM
        students st
        INNER JOIN course co
          ON co.stud_id = st.stud_id
    GROUP BY
        co.cour_id
    ) sq
    INNER JOIN students so
      ON so.birthdate = sq.bmin
    INNER JOIN students sy
      ON sy.birthdate = sq.bmax
ORDER BY
    sq.cour_id ASC

As you can see, the DMBS have to scan the student table three times.
Now with analytic SQL it's just one scan and a shorter query :

SELECT
    co.cour_id,
    max(st.name) keep (dense_rank last  ORDER BY st.birthdate ASC) AS youngest,
    max(st.name) keep (dense_rank first ORDER BY st.birthdate ASC) AS oldest
FROM
    students st
    INNER JOIN course co
      ON co.stud_id = st.stud_id
GROUP BY
    co.cour_id
ORDER BY
    co.cour_id ASC

Is'nt it very hot ?

Share and Enjoy:
  • Digg
  • del.icio.us
  • Google Bookmarks
  • Facebook
  • TwitThis
  • blogmarks
  • email
  • Furl
  • LinkedIn

This entry is filed under SQLing. And tagged with , , , , , . You can follow any responses to this entry through RSS 2.0. You can leave a response, or trackback from your own site.

1 Response to “You Should Like Analytics”


  1. Oracle KEEP function Order By Keyword at Waldar’s SQLing and Datawarehousing Place

    [...] For those who don't have a clue about this function, I wrote a small post on this subject. [...]