NoCOUG’s First SQL Challenge !
I was recently randomly browsing Laurent Schneider's blog, and saw a link about a riddle to be solved in plain SQL. The challenge is clear, and very shiny solutions were already found by Laurent Schneider, Craig Martin, Rob van Wijk and Vadim Tropashko. You can find links to these solutions on the comments of Chen Shapira related topic.
I had the idea not to find what is the results of N*val and proba^N when properly self cross joined, but first to see how the faces are distributed when N throws have been done.
This allows me to cross join each face with the number of throw, and finding the final self value is becoming a basic multiplication. But i had problem with probabilities.
My math skills being far from what they used to be, i did a lot of google and manual searching on a spreadsheet, trying to find something. And manual finding won here, with a three sided dice (hmm, well, entity) i saw the developpement of (x+y+z)^n. Now i had a clearer way of finding stuff on wikipedia, this led me to Pascal's triangle, Binomial theorem, Multinomial theorem and finaly Multinomial Distribution.
My SQL was looking like this (Iggy Fernandez made some change in the form of the query, so i'll keep them) :
-- Compute factorials
f AS
(
SELECT 0 AS l,
1 AS f
FROM DUAL
UNION ALL
SELECT LEVEL AS l,
ROUND( EXP( SUM (LN (LEVEL)) OVER (ORDER BY LEVEL ASC))) AS f
FROM DUAL
CONNECT BY LEVEL <= :n
),
-- Compute multinomial coefficients
m AS
(
SELECT
f1.l AS l1,
f2.l AS l2,
f3.l AS l3,
f4.l AS l4,
f5.l AS l5,
f6.l AS l6,
f7.f / (f1.f * f2.f * f3.f * f4.f * f5.f * f6.f) AS m
FROM
f f1
CROSS JOIN f f2
CROSS JOIN f f3
CROSS JOIN f f4
CROSS JOIN f f5
CROSS JOIN f f6
CROSS JOIN (SELECT f FROM f WHERE l = :n) f7
WHERE
f1.l + f2.l <= :n
AND f1.l + f2.l + f3.l <= :n
AND f1.l + f2.l + f3.l + f4.l <= :n
AND f1.l + f2.l + f3.l + f4.l + f5.l <= :n
AND f1.l + f2.l + f3.l + f4.l + f5.l + f6.l = :n
)
-- Compute probabilities
SELECT
face_value,
SUM (probability) AS probability
FROM
(
SELECT
d1.face_value * m.l1 + d2.face_value * m.l2 + d3.face_value * m.l3 + d4.face_value * m.l4 + d5.face_value * m.l5 + d6.face_value * m.l6 AS face_value,
POWER (d1.probability, m.l1) * POWER (d2.probability, m.l2) * POWER (d3.probability, m.l3) * POWER (d4.probability, m.l4) * POWER (d5.probability, m.l5) * POWER (d6.probability, m.l6) * m.m AS probability
FROM
m
CROSS JOIN (SELECT face_value, probability FROM die WHERE face_id = 1) d1
CROSS JOIN (SELECT face_value, probability FROM die WHERE face_id = 2) d2
CROSS JOIN (SELECT face_value, probability FROM die WHERE face_id = 3) d3
CROSS JOIN (SELECT face_value, probability FROM die WHERE face_id = 4) d4
CROSS JOIN (SELECT face_value, probability FROM die WHERE face_id = 5) d5
CROSS JOIN (SELECT face_value, probability FROM die WHERE face_id = 6) d6
) SR
GROUP BY
face_value
ORDER BY
face_value ASC
The Compute Factorial part is not ANSI SQL, it is Oracle SQL. As I ran most of my test at home on SQL Server 2008 Express, one can use this CTE subquery instead :
(
SELECT 0 AS l,
cast(1 AS float(53)) AS f
UNION ALL
SELECT l+1,
cast((l+1)*f AS float(53))
FROM f
WHERE l+1 <= :n
)
My results seems good, but the query is far from matching Alberto Dell'Era's solution in execution time, with an awesome O(N*log(N)) complexity.
But i do have some valid points, not using Fourier transforms, according to Iggy Fernandez, make my probabilities more accurate due to less number computations, and it still a polynomial complexity : O(N^k), k being the number of faces of the dice. Also my query can run in more RDBMS !
If you check at low value of N, results are coming almost instantaneously (say, N <= 20). The highest i checked was N=72, execution time on a quite good computer was around 11 minutes.
I have a follow-up to write about this subject : with the same algorithm, i tried to fake the cross join - one of the bottleneck of the query - with more restrictive conditions and a shifting strategy. The less I can tell is that I had very interesting results.
This entry is filed under SQLing. And tagged with cross join, NoCOUG, Oracle, SQL, TSQL. You can follow any responses to this entry through RSS 2.0. You can leave a response, or trackback from your own site.
[...] seventh solution—by Fabien Contaminard from France—was based on the multinomial probability distribution, an [...]
[...] Craig Martin (USA), Rob van Wijk (Netherlands), Vadim Tropashko (USA), Alberto Dell’Era (Italy), Fabien Contaminard (France), Cd-MaN (Romania), and André Araujo (Australia) for their original solutions. [...]
[...] My own solution was somewhere between O(n^(k-1)) and O(n^k) complexity, with k being the number of faces of the dice. I couldn’t run test past 72 dices throws, I was already on the ten minutes mark ! [...]
[...] one can learn a lot from those Challenges, that's why i like them. My last "wow" thing is about NoCoug's [...]
[...] « NoCOUG’s First SQL Challenge ! [...]