TSQL Challenge #11 Using Recursive CTE
Here is my proposal about the TSQL Challenge #11.
Given a product table, and a coupon table, we have to find within one or two coupons which combination gives the best discount, with additional rules :
- the discount price can not be less than 70% of the original price
- the total amount of the discount can not exceed 30$
Here are the products :
-- ------- ---------
1 PROD 1 100,00
2 PROD 2 220,00
3 PROD 3 15,00
4 PROD 4 70,00
5 PROD 5 150,00
And the coupons :
-- ----------- ------ ----------
1 CP 1 : -15$ 15 0
2 CP 2 : -5$ 5 0
3 CP 3 : -10% 10 1
4 CP 4 : -12$ 12 0
Regular joins
My first query was using regular joins : cross joins the products and the coupons, filter on the two rules, and make it an outer joins on the other coupons to calculate every requested fields :
(
SELECT
P.ID AS ID,
P.NAME AS NAME,
P.PRICE AS PRICE,
C1.ID AS C1_ID,
C1.NAME AS C1_NAME,
case C1.IS_PERCENT
when 1 then P.PRICE * (100-C1.VALUE)/100
when 0 then P.PRICE - C1.VALUE
end AS C1_PRICE,
P.PRICE - case C1.IS_PERCENT when 1 then P.PRICE * (100-C1.VALUE)/100 when 0 then P.PRICE - C1.VALUE end AS DISC1,
case C1.IS_PERCENT when 1 then P.PRICE * (100-C1.VALUE)/100 when 0 then P.PRICE - C1.VALUE end / P.PRICE AS RATE1
FROM
@T AS P
INNER JOIN @C AS C1
ON P.PRICE - case C1.IS_PERCENT when 1 then P.PRICE * (100-C1.VALUE)/100 when 0 then P.PRICE - C1.VALUE end <= 30
AND case C1.IS_PERCENT when 1 then P.PRICE * (100-C1.VALUE)/100 when 0 then P.PRICE - C1.VALUE end / P.PRICE>= 0.7
), D2 AS
(
SELECT
D1.ID AS ID,
D1.NAME AS NAME,
D1.PRICE AS PRICE,
D1.C1_NAME,
D1.DISC1,
D1.RATE1,
D1.C1_PRICE,
C2.NAME AS C2_NAMES,
case C2.IS_PERCENT
when 1 then D1.C1_PRICE * (100-C2.VALUE)/100
when 0 then D1.C1_PRICE - C2.VALUE
end AS C2_PRICE,
D1.PRICE - case C2.IS_PERCENT when 1 then D1.C1_PRICE * (100-C2.VALUE)/100 when 0 then D1.C1_PRICE - C2.VALUE end AS DISC2,
case C2.IS_PERCENT when 1 then D1.C1_PRICE * (100-C2.VALUE)/100 when 0 then D1.C1_PRICE - C2.VALUE end / D1.PRICE AS RATE2,
ROW_NUMBER() over(PARTITION BY D1.ID ORDER BY COALESCE(D1.PRICE - case C2.IS_PERCENT when 1 then D1.C1_PRICE * (100-C2.VALUE)/100 when 0 then D1.C1_PRICE - C2.VALUE end ,D1.DISC1) DESC) AS RN
FROM
D1
LEFT OUTER JOIN @C AS C2
ON D1.C1_ID <> C2.ID
AND D1.PRICE - case C2.IS_PERCENT when 1 then D1.C1_PRICE * (100-C2.VALUE)/100 when 0 then D1.C1_PRICE - C2.VALUE end <= 30
AND case C2.IS_PERCENT when 1 then D1.C1_PRICE * (100-C2.VALUE)/100 when 0 then D1.C1_PRICE - C2.VALUE end / D1.PRICE>= 0.7
)
SELECT
D2.ID,
D2.NAME,
D2.PRICE,
coalesce(D2.C2_PRICE, D2.C1_PRICE) AS DISC_PRICE,
coalesce(D2.DISC2, D2.DISC1) AS TOT_DISC,
(1-coalesce(D2.RATE2, D2.RATE1))*100 AS RATE,
D2.C1_NAME + coalesce(' + ' + D2.C2_NAMES, '') AS COUPON_NAMES
FROM
D2
WHERE
D2.RN = 1
ORDER BY
D2.ID ASC,
coalesce(D2.DISC2, D2.DISC1) DESC;
The query produces the expected results but I don't like it : that's why I'm not commenting it.
It is a one-shot statement that can't be easily reused if the boss change his mind.
Also, with 58 lines the query is fairly long, and the same formulas are coded twice.
And when you code the same thing over and over, you should think about recursiveness.
First Recursive CTE
I know what field I'm expecting in the final output so all of these, or at last a way to calculate them, should be found into the CTE.
I need the initial values, the final price, and the coupons used. The total discount and rate can be calculated in the outer query.
So here his the anchor part of the CTE :
(
SELECT
-- Initial product value, won't be modified inside the CTE
T.ID, T.NAME, T.PRICE,
-- Init of query results, will be modified inside the CTE
T.PRICE ,
CAST('' AS NVARCHAR(1000)),
-- Init of CTE-needed values, will be modified inside the CTE, won't be querried on
0, 1
FROM @T AS T
-- No filter has we want to track every products
UNION ALL
The C_ID column will be used to store the last coupon id used. It is asked to not reuse the same coupon, but the maximum of them being two, keeping the last one will be ok.
The LVL column will be used to count the number of coupons used. It'll allow me to stop the recursive thing after two.
The DISC_PRICE column is evaluated depending of the coupons being either a dollar value or a percent value.
The COUPON_NAMES column is storing the coupons names on itself.
I also have to filter on the rules, so I'll have to evaluate the discount price and rate within the CTE to filter on them :
D.ID, D.NAME, D.PRICE,
-- Coupon formula depending on the IS_PERCENT
CASE C.IS_PERCENT WHEN 1 THEN D.DISC_PRICE * (100 - C.VALUE)/100 WHEN 0 THEN D.DISC_PRICE - C.VALUE END,
CAST(D.COUPON_NAMES + ' + ' + C.NAME AS NVARCHAR(1000)),
C.ID,
D.LVL + 1
FROM
DISC AS D
INNER JOIN @C AS C
-- Don't use the same coupon twice
ON C.ID <> D.C_ID
WHERE
2 >= D.LVL
AND 30 >= D.PRICE - CASE C.IS_PERCENT WHEN 1 THEN D.DISC_PRICE * (100 - C.VALUE)/100 WHEN 0 THEN D.DISC_PRICE - C.VALUE END
AND 70 <= 100 - ((D.PRICE - CASE C.IS_PERCENT WHEN 1 THEN D.DISC_PRICE * (100 - C.VALUE)/100 WHEN 0 THEN D.DISC_PRICE - C.VALUE END)*100/D.PRICE)
)
Ok now I've got my recursive CTE, producing every possible combination within the rules.
To keep only the best results, I'm using those elements :
- an analytic function to sort the best price per product
- an order by on this function and the product
- a Top 5 query to only keep the best results
The final outer query is :
ID,
NAME,
-- Conversion of the results for desired output
CAST(PRICE AS VARCHAR(6)) + '$' AS PRICE,
CAST(DISC_PRICE AS VARCHAR(6)) + '$' AS DISC_PRICE,
CAST(PRICE - DISC_PRICE AS VARCHAR(6)) + '$' AS TOT_DISC,
CAST(100*(PRICE - DISC_PRICE)/PRICE AS VARCHAR(6)) + '%' AS RATE,
SUBSTRING(COUPON_NAMES, 4, 100) AS COUPON_NAMES
FROM DISC
ORDER BY
-- Sorting by the lower price per product and by product
-- This combined to the top provide only the good result set without subquery
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY DISC_PRICE ASC) ASC,
ID ASC;
Second Recursive CTE
Ok my query is cool but I know how the boss works. They ask you for a thing, and then changes their mind and modify rules on the fly.
So I decided to write this very same query but being a lot more parameterizable.
I wanted my query to work with :
- More products
- More coupons
- The application of more than two coupons
- Different maximum discount and maximum rate
For my query to work with more products, I just have to count them in the TOP query.
And this is allowed in SQL.
For my query to work with different number of coupons, maximum discount and rate, I just have to use parameters.
To avoid any truncation in the COUPON_NAMES column, I'll use the LVL column of the CTE to make the good substring.
The hard part was to make it work with more coupons AND to not reuse anyone of them.
I thought about an XML column to use string unaggregration but you can't make a CTE inside a CTE (I tried).
Maybe I could used some king of array table but I just don't know how to do that.
Finally, I thought using some kind of base-2 mathematics.
If I'm using coupons #2 and #5, I can store the value 10010.
For each new coupon, I just have to sum this value with a nth power of ten, n being the id of the coupon minus one.
Unfortunately, this method works with 19 coupons max, after the number used for storage is too big to be evaluated.
Maybe I could improve with more maths or some BITAND function, but considering I'm already off bounds I'll stick to that solution.
My final-submitted parameterizable query is :
-- They are configured according to the challenge
DECLARE @MAX_COUPON INT;
DECLARE @MAX_DISC DECIMAL(5,2);
DECLARE @MAX_RATE DECIMAL(5,2);
SELECT @MAX_COUPON = 2 ;
SELECT @MAX_DISC = 30.00;
SELECT @MAX_RATE = 70.00;
-- Product table
DECLARE @T TABLE (ID INT IDENTITY, NAME NVARCHAR(20),PRICE MONEY);
INSERT INTO @T (NAME,PRICE) VALUES ('PROD 1',100);
INSERT INTO @T (NAME,PRICE) VALUES ('PROD 2',220);
INSERT INTO @T (NAME,PRICE) VALUES ('PROD 3', 15);
INSERT INTO @T (NAME,PRICE) VALUES ('PROD 4', 70);
INSERT INTO @T (NAME,PRICE) VALUES ('PROD 5',150);
-- Coupon table
DECLARE @C TABLE (ID INT IDENTITY, NAME NVARCHAR(20), VALUE INT, IS_PERCENT BIT) ;
INSERT INTO @C (NAME,VALUE,IS_PERCENT) VALUES ('CP 1 : -15$',15,0);
INSERT INTO @C (NAME,VALUE,IS_PERCENT) VALUES ('CP 2 : - 5$', 5,0);
INSERT INTO @C (NAME,VALUE,IS_PERCENT) VALUES ('CP 3 : -10%',10,1);
INSERT INTO @C (NAME,VALUE,IS_PERCENT) VALUES ('CP 4 : -12$',12,0);
-- Query
WITH DISC (ID, NAME, PRICE, DISC_PRICE, COUPON_NAMES, C_ID, LVL) AS
(
SELECT
T.ID, T.NAME, T.PRICE,
T.PRICE ,
CAST('' AS NVARCHAR(1000)),
CAST(0 AS BIGINT), 1
FROM @T AS T
UNION ALL
SELECT
D.ID, D.NAME, D.PRICE,
CASE C.IS_PERCENT WHEN 1 THEN D.DISC_PRICE * (100 - C.VALUE)/100 WHEN 0 THEN D.DISC_PRICE - C.VALUE END,
CAST(D.COUPON_NAMES + ' + ' + C.NAME AS NVARCHAR(1000)),
D.C_ID + POWER(CAST(10 AS BIGINT), C.ID-1),
D.LVL + 1
FROM
DISC AS D
INNER JOIN @C AS C
ON SUBSTRING(REVERSE(CAST(D.C_ID AS VARCHAR(1000))), C.ID, 1) <> '1'
WHERE
@MAX_COUPON>= D.LVL
AND @MAX_DISC >= D.PRICE - CASE C.IS_PERCENT WHEN 1 THEN D.DISC_PRICE * (100 - C.VALUE)/100 WHEN 0 THEN D.DISC_PRICE - C.VALUE END
AND @MAX_RATE <= 100 - ((D.PRICE - CASE C.IS_PERCENT WHEN 1 THEN D.DISC_PRICE * (100 - C.VALUE)/100 WHEN 0 THEN D.DISC_PRICE - C.VALUE END)*100/D.PRICE)
)
SELECT TOP (SELECT COUNT(*) FROM @T)
ID,
NAME,
CAST(PRICE AS VARCHAR(6)) + '$' AS PRICE,
CAST(DISC_PRICE AS VARCHAR(6)) + '$' AS DISC_PRICE,
CAST(PRICE - DISC_PRICE AS VARCHAR(6)) + '$' AS TOT_DISC,
CAST(100*(PRICE - DISC_PRICE)/PRICE AS VARCHAR(6)) + '%' AS RATE,
SUBSTRING(COUPON_NAMES, 4, 23*(LVL-1)-3) AS COUPON_NAMES
FROM DISC
ORDER BY
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY DISC_PRICE ASC) ASC,
ID ASC;
You can insert more products, more coupons, and play with the parameters with this same query !
This is a 36 lines query, once and half shorter than the ones using joins, and I find it a lot more readable.
That was another very good challenge from the TC team !
This entry is filed under SQLing. And tagged with Challenges, cte, order by, outer join, TSQL. You can follow any responses to this entry through RSS 2.0. You can leave a response, or trackback from your own site.