TSQL Challenge #13 Break The Batches, Pacmann Arithmetics Around Case
TSQL Challenge #13
My previous post was a very long one and it took me several hours just to write it properly, plus many hours to do the findings (which was a part of my day job, but most of the tests were done at home).
Having zero comment is a little frustrating to be honest, I hope my fellow readers just wait for part two !
I don't have yet an ETA for this, I plan it to be released in September - don't hack this site if it doesn't happen, as everyone I have a life to deal with !
That's enough for the whining part, in fact I just wanted to say I put more energy than usual in it and I didn't even write about T-SQL Challenge #13 !
Well, the rules are hard to sum up in three sentences so feel free to read the whole explanation over the official website.
For the short version, you have to create set of invoices (having multiples rows between one and ten) that can't be split between multiple sets.
A set can't possess more than ten rows but have to possess the maximum number of invoices.
The set number is reinitialized to one for every batch.
Every rows and columns from this following dataset are known except the set column which is the one to be found :
------ ----------- ------------- ----------- ------ ---
1 10000001 20001 2009-01-01 50.00 1
2 10000001 20001 2009-01-02 50.00 1
3 10000001 20001 2009-01-03 50.00 1
4 10000001 20001 2009-01-04 50.00 1
5 10000001 20002 2009-01-01 50.00 1
6 10000001 20002 2009-01-02 50.00 1
7 10000001 20002 2009-01-03 50.00 1
8 10000001 20003 2009-01-01 50.00 1
9 10000001 20003 2009-01-02 50.00 1
10 10000001 20004 2009-01-01 50.00 2
11 10000001 20004 2009-01-02 50.00 2
12 10000001 20004 2009-01-03 50.00 2
13 10000001 20004 2009-01-04 50.00 2
14 10000001 20004 2009-01-05 50.00 2
15 10000001 20004 2009-01-06 50.00 2
16 10000001 20005 2009-01-01 50.00 3
17 10000001 20005 2009-01-02 50.00 3
18 10000001 20005 2009-01-13 50.00 3
19 10000001 20005 2009-01-14 50.00 3
20 10000001 20005 2009-01-15 50.00 3
21 10000001 20005 2009-01-06 50.00 3
22 10000001 20005 2009-01-07 50.00 3
23 10000001 20005 2009-01-08 50.00 3
24 10000001 20006 2009-01-01 50.00 3
25 10000002 20007 2009-01-01 50.00 1
26 10000002 20007 2009-01-02 50.00 1
27 10000002 20007 2009-01-03 50.00 1
28 10000002 20008 2009-01-01 50.00 1
29 10000002 20008 2009-01-02 50.00 1
30 10000002 20008 2009-01-03 50.00 1
If you want to test your query, I'm providing this script that creates and populates a T-SQL Challenge #13 rules compatible table of one million rows.
It's in plain SQL (I took it as a challenge inside the challenge) but it's not very fast : on my home computer (which is very decent) it took about 15 minutes to create the table.
Feel free to share any comment regarding this script.
You have up to the 14th of September (GMT time) to send your query, following the new terms and guidelines.
Happy SQLing !
Pacmann Arithmetics
Pacmann opened his blog quite recently.
He writes about Oracle (he loves Oracle) and his articles are very interesting and filled up with SQL to illustrate his talkings.
They are written sometimes in English, sometimes in French or even in German.
Note that his titles are always in English.
I'm writing about his last entry (written in French), "We do not need this CASE instruction !".
Actually, you can understand the SQL parts.
He tries (and succeed) to recreate the CASE instruction about integers using arithmetics function.
He already did this for date functions in the comments of this topic.
I hope he will sum it up in a clean and well explained post that I could refer to !.
He warns you to do not try this at work if you do care about your job (that's the red bold framed text at the beginning).
He concludes with :
Under-performing, totally unreadable, useless... long life to cosmetic SQL !
But he made a huge mistake :
- Unreadable, well, I agree.
- Useless, not if you love mathematics but probably for an SQL developer.
- Under-performing ! PACMANN, come on ! Show me your tests and results !
I ran them... Here's the first query :
declare
a integer := 7;
b integer := 3;
c integer := 4;
d integer := 2;
x integer;
begin
FOR i IN 1..1e8
loop
x := CASE WHEN a>= 2 AND (b = 6 OR c = -1) AND d <> 5 THEN 5 ELSE 0 END;
end loop;
end;
/
16.40 seconds
-- arithmetics query
-- same declare
begin
FOR i IN 1..1e8
loop
x := trunc(1 - abs(a - 2 - abs(a-2)) / (abs(a - 2 - abs(a-2)) + 1)) *
(trunc(1 - abs(b - 6) / (abs(b - 6) + 1)) + trunc(1 - abs(c + 1) / (abs(c + 1 ) + 1)) - trunc(1 - abs(b - 6) / (abs(b - 6) + 1)) * trunc(1 - abs(c + 1) / (abs(c + 1 ) + 1))) *
(1 - trunc(1 - abs(d - 5) / (abs(d - 5) + 1))) * 5;
end loop;
end;
/
3.33 seconds
Ok, five times faster ! Can we conclude anything ?
Not now, let's run another test :
declare
a integer;
b integer;
c integer;
d integer;
x integer;
begin
FOR i IN 1..1e7
loop
a := round(dbms_random.value(-10, 10));
b := round(dbms_random.value(-10, 10));
c := round(dbms_random.value(-10, 10));
d := round(dbms_random.value(-10, 10));
x := CASE WHEN a>= 2 AND (b = 6 OR c = -1) AND d <> 5 THEN 5 ELSE 0 END;
end loop;
end;
/
46.27 seconds
-- arithmetics query
-- same declare
begin
FOR i IN 1..1e7
loop
a := round(dbms_random.value(-10, 10));
b := round(dbms_random.value(-10, 10));
c := round(dbms_random.value(-10, 10));
d := round(dbms_random.value(-10, 10));
x := trunc(1 - abs(a - 2 - abs(a-2)) / (abs(a - 2 - abs(a-2)) + 1)) *
(trunc(1 - abs(b - 6) / (abs(b - 6) + 1)) + trunc(1 - abs(c + 1) / (abs(c + 1 ) + 1)) - trunc(1 - abs(b - 6) / (abs(b - 6) + 1)) * trunc(1 - abs(c + 1) / (abs(c + 1 ) + 1))) *
(1 - trunc(1 - abs(d - 5) / (abs(d - 5) + 1))) * 5;
end loop;
end;
/
61.72 seconds
With random values now arithmetics are one and a half slower.
Hard to conclude anything here...
For the fun I tried on SQL Server, my biggest issue was the lack of the Oracle trunc function, you have to cast your number into another format which is not performing very well :
DECLARE @A INTEGER;
DECLARE @B INTEGER;
DECLARE @C INTEGER;
DECLARE @D INTEGER;
DECLARE @X INTEGER;
SET @I = 0;
SET @A = 7;
SET @B = 3;
SET @C = 4;
SET @D = 2;
WHILE @I <1e7
BEGIN
SET @X = CASE WHEN @A>= 2 AND (@B = 6 OR @C = -1) AND @D <> 5 THEN 5 ELSE 0 END;
SET @I = @I + 1;
END;
7.50 seconds
-- I don't how how to improve the timing, SET STATISTICS TIME ON times every query inside the while loop
-- Arithmetic query
-- Same declare
WHILE @I <1e7
BEGIN
SET @X = CAST(1.0 - abs(@A - 2.0 - abs(@A-2.0)) / (abs(@A - 2.0 - abs(@A-2.0)) + 1.0) AS INT) *
(CAST(1.0 - abs(@B - 6.0) / (abs(@B - 6.0) + 1.0) AS INT) + CAST(1.0 - abs(@C + 1.0) / (abs(@C + 1.0 ) + 1.0) AS INT) - CAST(1.0 - abs(@B - 6.0) / (abs(@B - 6.0) + 1.0) AS INT) * CAST(1.0 - abs(@C + 1.0) / (abs(@C + 1.0 ) + 1.0) AS INT)) *
(1.0 - CAST(1.0 - abs(@D - 5.0) / (abs(@D - 5.0) + 1.0) AS INT)) * 5.0;
SET @I = @I + 1;
END;
43.50 seconds
-- To test the performance, I also tried with the floor function, note that the results will be false
-- Arithmetic wrong query
-- Same declare
WHILE @I <1e7
BEGIN
SET @X = floor(1.0 - abs(@A - 2 - abs(@A-2)) / (abs(@A - 2 - abs(@A-2)) + 1.0)) *
(floor(1.0 - abs(@B - 6.0) / (abs(@B - 6.0) + 1.0)) + floor(1.0 - abs(@C + 1.0) / (abs(@C + 1.0 ) + 1.0)) - floor(1.0 - abs(@B - 6.0) / (abs(@B - 6.0) + 1.0)) * floor(1.0 - abs(@C + 1.0) / (abs(@C + 1.0 ) + 1.0))) *
(1.0 - floor(1.0 - abs(@D - 5.0) / (abs(@D - 5.0) + 1.0))) * 5.0;
SET @I = @I + 1;
END;
43.50 seconds
On SQL Server, the difference is very notable !
Against what I thought, the overhead of casting instead of flooring is non-existent.
I didn't ran the random value test (because random values are not fun to write in T-SQL).
I know those little tests aren't complete and therefore looking unprofessional, but I wrote them only to hand some gas over to Pacmann for him to look about it, it's his idea after all !
This entry is filed under SQLing. And tagged with Arithmetic, Challenges, Oracle, Pacmann, TSQL. You can follow any responses to this entry through RSS 2.0. You can leave a response, or trackback from your own site.
In fact, this time Oracle is the winner.
With the fixed integers test including the case statement, I ran 1e8 loops in Oracle in 16.4 seconds against 1e7 loops in SQL Server in 7.5 seconds.
Hi Waldar !
You're right, that's the way we should always do it... and of course Uncle Tom would agree with you !
"You assume : i don't care
Test, test, test, show me results and evidence !"
(Right, it's probably not an exact quote :))
But i'm a bit sad that once again, your results show that SQL Server performs better :) :) :)