Why is this column order in my non-clustered index better for my query?How should row-specific metadata be...
Single-row INSERT...SELECT much slower than separate SELECT
How IPsec tunnel mode work without GRE
What kind of places would goblins live in a fantasy setting with strong states?
Not a Long-Winded Riddle
Need help with a circuit diagram where the motor does not seem to have any connection to ground. Error with diagram? Or am i missing something?
Why didn't Tom Riddle take the presence of Fawkes and the Sorting Hat as more of a threat?
Plausible reason for gold-digging ant
Stuck on a Geometry Puzzle
How much light is too much?
Eww, those bytes are gross
Do authors have to be politically correct in article-writing?
What makes papers publishable in top-tier journals?
Possible issue with my W4 and tax return
What species should be used for storage of human minds?
How to not let the Identify spell spoil everything?
How to create a label containing values from different layers in QGIS
The No-Straight Maze
How to deal with an underperforming subordinate?
Word for something that's always reliable, but never the best?
Why did Ylvis use "go" instead of "say" in phrases like "Dog goes 'woof'"?
Book where a space ship journeys to the center of the galaxy to find all the stars had gone supernova
How do you funnel food off a cutting board?
Why is 'diphthong' not pronounced otherwise?
Website seeing my Facebook data?
Why is this column order in my non-clustered index better for my query?
How should row-specific metadata be created handled for an outer join view?Clustered index always better than Non-Clustered index?Parent-Child Tree Hierarchical ORDERSHOWPLAN does not display a warning but “Include Execution Plan” does for the same querySQL Server suggestion to create nonclustered index - on 2 columns, reverseddeteriorating stored procedure running timeswhere to add a column with low cardinality, decimal type into an index?Is the WHERE-JOIN-ORDER-(SELECT) rule for index column order wrong?Optimize delete query in SQL Server 2008 R2 SP1Speed up INSERT procedure
I am working on a query for a table that contains movie tickets. The database holds 380k rows. A rows represents a showing of the movie (which cinema, when, how many tickets and at what price among other things).
I need to compute a few totals for every row: Admissions Paid
, Admissions Revenue
, Admissions Free
and Total Admissions
.
For a given row Admissions Paid
is the sum of all tickets for that movie up until that point where price>0
. The other 3 columns are computed similarly.
I wrote a query and created an index:
SELECT [ID]
,[cinema_name]
,[movie_title]
,[price]
,[quantity]
,[start_date_time]
,* --I need all the columns for reporting
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price=0) as [Admissions Free]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Paid]
,(select SUM(quantity*price)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Revenue]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time) as [Total Admissions]
FROM [movies] o
I created the following index which brought the query time down to 5 minutes:
CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[movie_title] ASC,
[start_date_time] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
But this index brought the query time down to 1:30:
CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[start_date_time] ASC,
[movie_title] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
So my question is: why? From my understanding, it makes more sense to first gather all the movie titles and then look at the start times because there are more start times then there are movies. Distinct movies
: 51, distinct start_date_times
: 8786
Doesn't the underling B-Tree not cut off more branches if it eliminates the unnecessary start_date_times
first?
Here are the execution plans:
The first picture shows the execution plan for the index with movie_title
first, the other picture shows start_date_time
first.
sql-server sql-server-2014
New contributor
|
show 2 more comments
I am working on a query for a table that contains movie tickets. The database holds 380k rows. A rows represents a showing of the movie (which cinema, when, how many tickets and at what price among other things).
I need to compute a few totals for every row: Admissions Paid
, Admissions Revenue
, Admissions Free
and Total Admissions
.
For a given row Admissions Paid
is the sum of all tickets for that movie up until that point where price>0
. The other 3 columns are computed similarly.
I wrote a query and created an index:
SELECT [ID]
,[cinema_name]
,[movie_title]
,[price]
,[quantity]
,[start_date_time]
,* --I need all the columns for reporting
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price=0) as [Admissions Free]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Paid]
,(select SUM(quantity*price)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Revenue]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time) as [Total Admissions]
FROM [movies] o
I created the following index which brought the query time down to 5 minutes:
CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[movie_title] ASC,
[start_date_time] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
But this index brought the query time down to 1:30:
CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[start_date_time] ASC,
[movie_title] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
So my question is: why? From my understanding, it makes more sense to first gather all the movie titles and then look at the start times because there are more start times then there are movies. Distinct movies
: 51, distinct start_date_times
: 8786
Doesn't the underling B-Tree not cut off more branches if it eliminates the unnecessary start_date_times
first?
Here are the execution plans:
The first picture shows the execution plan for the index with movie_title
first, the other picture shows start_date_time
first.
sql-server sql-server-2014
New contributor
1
Could you provide execution plans for both indexing options?
– vonPryz
9 hours ago
1
Subqueries are too strange... It seems they can be converted to window-type SUM().
– Akina
9 hours ago
@SabinBio o is the outer table. I edited the question.
– dakes
9 hours ago
1
The key is hereDistinct movies: 51, distinct start_date_times: 8786
; some hints could be at the logical reads and in the execution plan(s). For an index, it's very important the first column and how selective it is
– Sabin Bio
9 hours ago
2
Could you post a link to the individual plans via Paste The Plan? You can anonymise information using Sentry One's Plan Explorer if need be.
– hot2use
4 hours ago
|
show 2 more comments
I am working on a query for a table that contains movie tickets. The database holds 380k rows. A rows represents a showing of the movie (which cinema, when, how many tickets and at what price among other things).
I need to compute a few totals for every row: Admissions Paid
, Admissions Revenue
, Admissions Free
and Total Admissions
.
For a given row Admissions Paid
is the sum of all tickets for that movie up until that point where price>0
. The other 3 columns are computed similarly.
I wrote a query and created an index:
SELECT [ID]
,[cinema_name]
,[movie_title]
,[price]
,[quantity]
,[start_date_time]
,* --I need all the columns for reporting
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price=0) as [Admissions Free]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Paid]
,(select SUM(quantity*price)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Revenue]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time) as [Total Admissions]
FROM [movies] o
I created the following index which brought the query time down to 5 minutes:
CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[movie_title] ASC,
[start_date_time] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
But this index brought the query time down to 1:30:
CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[start_date_time] ASC,
[movie_title] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
So my question is: why? From my understanding, it makes more sense to first gather all the movie titles and then look at the start times because there are more start times then there are movies. Distinct movies
: 51, distinct start_date_times
: 8786
Doesn't the underling B-Tree not cut off more branches if it eliminates the unnecessary start_date_times
first?
Here are the execution plans:
The first picture shows the execution plan for the index with movie_title
first, the other picture shows start_date_time
first.
sql-server sql-server-2014
New contributor
I am working on a query for a table that contains movie tickets. The database holds 380k rows. A rows represents a showing of the movie (which cinema, when, how many tickets and at what price among other things).
I need to compute a few totals for every row: Admissions Paid
, Admissions Revenue
, Admissions Free
and Total Admissions
.
For a given row Admissions Paid
is the sum of all tickets for that movie up until that point where price>0
. The other 3 columns are computed similarly.
I wrote a query and created an index:
SELECT [ID]
,[cinema_name]
,[movie_title]
,[price]
,[quantity]
,[start_date_time]
,* --I need all the columns for reporting
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price=0) as [Admissions Free]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Paid]
,(select SUM(quantity*price)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Revenue]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time) as [Total Admissions]
FROM [movies] o
I created the following index which brought the query time down to 5 minutes:
CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[movie_title] ASC,
[start_date_time] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
But this index brought the query time down to 1:30:
CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[start_date_time] ASC,
[movie_title] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
So my question is: why? From my understanding, it makes more sense to first gather all the movie titles and then look at the start times because there are more start times then there are movies. Distinct movies
: 51, distinct start_date_times
: 8786
Doesn't the underling B-Tree not cut off more branches if it eliminates the unnecessary start_date_times
first?
Here are the execution plans:
The first picture shows the execution plan for the index with movie_title
first, the other picture shows start_date_time
first.
sql-server sql-server-2014
sql-server sql-server-2014
New contributor
New contributor
edited 24 mins ago
MDCCL
6,84331745
6,84331745
New contributor
asked 10 hours ago
dakesdakes
183
183
New contributor
New contributor
1
Could you provide execution plans for both indexing options?
– vonPryz
9 hours ago
1
Subqueries are too strange... It seems they can be converted to window-type SUM().
– Akina
9 hours ago
@SabinBio o is the outer table. I edited the question.
– dakes
9 hours ago
1
The key is hereDistinct movies: 51, distinct start_date_times: 8786
; some hints could be at the logical reads and in the execution plan(s). For an index, it's very important the first column and how selective it is
– Sabin Bio
9 hours ago
2
Could you post a link to the individual plans via Paste The Plan? You can anonymise information using Sentry One's Plan Explorer if need be.
– hot2use
4 hours ago
|
show 2 more comments
1
Could you provide execution plans for both indexing options?
– vonPryz
9 hours ago
1
Subqueries are too strange... It seems they can be converted to window-type SUM().
– Akina
9 hours ago
@SabinBio o is the outer table. I edited the question.
– dakes
9 hours ago
1
The key is hereDistinct movies: 51, distinct start_date_times: 8786
; some hints could be at the logical reads and in the execution plan(s). For an index, it's very important the first column and how selective it is
– Sabin Bio
9 hours ago
2
Could you post a link to the individual plans via Paste The Plan? You can anonymise information using Sentry One's Plan Explorer if need be.
– hot2use
4 hours ago
1
1
Could you provide execution plans for both indexing options?
– vonPryz
9 hours ago
Could you provide execution plans for both indexing options?
– vonPryz
9 hours ago
1
1
Subqueries are too strange... It seems they can be converted to window-type SUM().
– Akina
9 hours ago
Subqueries are too strange... It seems they can be converted to window-type SUM().
– Akina
9 hours ago
@SabinBio o is the outer table. I edited the question.
– dakes
9 hours ago
@SabinBio o is the outer table. I edited the question.
– dakes
9 hours ago
1
1
The key is here
Distinct movies: 51, distinct start_date_times: 8786
; some hints could be at the logical reads and in the execution plan(s). For an index, it's very important the first column and how selective it is– Sabin Bio
9 hours ago
The key is here
Distinct movies: 51, distinct start_date_times: 8786
; some hints could be at the logical reads and in the execution plan(s). For an index, it's very important the first column and how selective it is– Sabin Bio
9 hours ago
2
2
Could you post a link to the individual plans via Paste The Plan? You can anonymise information using Sentry One's Plan Explorer if need be.
– hot2use
4 hours ago
Could you post a link to the individual plans via Paste The Plan? You can anonymise information using Sentry One's Plan Explorer if need be.
– hot2use
4 hours ago
|
show 2 more comments
2 Answers
2
active
oldest
votes
The first index does look like a better fit for the query. Please provide the actual execution plans.
I would try using window functions instead of the four correlated subqueries. Or a single correlated subquery (with OUTER APPLY
) and see which of the two indexes is used.
Both ideas are to coerce the optimizer to use a single index scan to gather the rolling sums instead of 4 (that both your plans do).
It would also be worth checking and comparing the two execution plans, when asking for all the columns and when asking for only the columns in the index:
using window functions:
-- window functions
SELECT
-- m.*,
movie_title, start_date_time,
price, quantity,
SUM(CASE WHEN price = 0 THEN quantity ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Free],
SUM(CASE WHEN price > 0 THEN quantity ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Paid],
SUM(CASE WHEN price > 0 THEN quantity * price ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Revenue],
SUM(quantity)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Total Admissions]
FROM
[movies] AS m ;
*: If there is a UNIQUE
constraint on (movie_title, start_date_time)
, then you could use ROWS
instead of RANGE
for the window frames (it's usually more efficient). From the comments, there is no such constraint and there could be many rows with same title and datetime, so RANGE
is required above.
using OUTER APPLY
:
-- using OUTER APPLY
SELECT
-- m.*,
m.movie_title, m.start_date_time,
m.price, m.quantity,
c.[Admissions Free],
c.[Admissions Paid],
c.[Admissions Revenue],
c.[Total Admissions]
FROM
[movies] AS m
OUTER APPLY
( SELECT
SUM(CASE WHEN i.price = 0 THEN i.quantity ELSE 0 END)
AS [Admissions Free],
SUM(CASE WHEN i.price > 0 THEN i.quantity ELSE 0 END)
AS [Admissions Paid],
SUM(CASE WHEN i.price > 0 THEN i.quantity * i.price ELSE 0 END)
AS [Admissions Revenue],
SUM(i.quantity)
AS [Total Admissions]
FROM [movies] AS i
WHERE i.movie_title = o.movie_title
AND i.start_date_time <= o.start_date_time
) AS c ;
This index may be a little better than the first one:
(
movie_title ASC,
start_date_time ASC
)
INCLUDE (price, quantity)
Thank you very much, this is indeed much much faster. Execution time for the first index is 3 sec while it's 4 sec with the second index. I added the execution plans to the post, btw. Can you maybe explain why this is so much faster?
– dakes
8 hours ago
I tested the query and while it is faster it's not what I need to show. I'll usePaid Admissions
as an example. Assume there are three shows starting at the same time. Your query computes differentPaid Admissions
for all three columns, but I need them to be the same, namely the sum of all paid admissions at that time. I assume this is because it looks at rows in the OVER part one after the other.
– dakes
6 hours ago
@dakes oh, I intended to add a note about that. Se the edit. (in short, useRANGE
notROWS
, for the window fames)
– ypercubeᵀᴹ
6 hours ago
I was torn which answer I should accept because @sepupic s answer actually answers my question, but this one was much more helpful in the long run. SO says "The bottom line is that you should accept the answer that you found to be the most helpful to you, personally.", so I accepted this one. I hope that's correct.
– dakes
5 hours ago
add a comment |
I agree with ypercubeᵀᴹ answer, the query should be rewritten.
Can you maybe explain why this is so much faster?
The query that use the second index
is faster only because it's executing in parallel
.
Try to add option(maxdop 1)
and the use of the first index will be faster.
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
dakes is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f230639%2fwhy-is-this-column-order-in-my-non-clustered-index-better-for-my-query%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
The first index does look like a better fit for the query. Please provide the actual execution plans.
I would try using window functions instead of the four correlated subqueries. Or a single correlated subquery (with OUTER APPLY
) and see which of the two indexes is used.
Both ideas are to coerce the optimizer to use a single index scan to gather the rolling sums instead of 4 (that both your plans do).
It would also be worth checking and comparing the two execution plans, when asking for all the columns and when asking for only the columns in the index:
using window functions:
-- window functions
SELECT
-- m.*,
movie_title, start_date_time,
price, quantity,
SUM(CASE WHEN price = 0 THEN quantity ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Free],
SUM(CASE WHEN price > 0 THEN quantity ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Paid],
SUM(CASE WHEN price > 0 THEN quantity * price ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Revenue],
SUM(quantity)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Total Admissions]
FROM
[movies] AS m ;
*: If there is a UNIQUE
constraint on (movie_title, start_date_time)
, then you could use ROWS
instead of RANGE
for the window frames (it's usually more efficient). From the comments, there is no such constraint and there could be many rows with same title and datetime, so RANGE
is required above.
using OUTER APPLY
:
-- using OUTER APPLY
SELECT
-- m.*,
m.movie_title, m.start_date_time,
m.price, m.quantity,
c.[Admissions Free],
c.[Admissions Paid],
c.[Admissions Revenue],
c.[Total Admissions]
FROM
[movies] AS m
OUTER APPLY
( SELECT
SUM(CASE WHEN i.price = 0 THEN i.quantity ELSE 0 END)
AS [Admissions Free],
SUM(CASE WHEN i.price > 0 THEN i.quantity ELSE 0 END)
AS [Admissions Paid],
SUM(CASE WHEN i.price > 0 THEN i.quantity * i.price ELSE 0 END)
AS [Admissions Revenue],
SUM(i.quantity)
AS [Total Admissions]
FROM [movies] AS i
WHERE i.movie_title = o.movie_title
AND i.start_date_time <= o.start_date_time
) AS c ;
This index may be a little better than the first one:
(
movie_title ASC,
start_date_time ASC
)
INCLUDE (price, quantity)
Thank you very much, this is indeed much much faster. Execution time for the first index is 3 sec while it's 4 sec with the second index. I added the execution plans to the post, btw. Can you maybe explain why this is so much faster?
– dakes
8 hours ago
I tested the query and while it is faster it's not what I need to show. I'll usePaid Admissions
as an example. Assume there are three shows starting at the same time. Your query computes differentPaid Admissions
for all three columns, but I need them to be the same, namely the sum of all paid admissions at that time. I assume this is because it looks at rows in the OVER part one after the other.
– dakes
6 hours ago
@dakes oh, I intended to add a note about that. Se the edit. (in short, useRANGE
notROWS
, for the window fames)
– ypercubeᵀᴹ
6 hours ago
I was torn which answer I should accept because @sepupic s answer actually answers my question, but this one was much more helpful in the long run. SO says "The bottom line is that you should accept the answer that you found to be the most helpful to you, personally.", so I accepted this one. I hope that's correct.
– dakes
5 hours ago
add a comment |
The first index does look like a better fit for the query. Please provide the actual execution plans.
I would try using window functions instead of the four correlated subqueries. Or a single correlated subquery (with OUTER APPLY
) and see which of the two indexes is used.
Both ideas are to coerce the optimizer to use a single index scan to gather the rolling sums instead of 4 (that both your plans do).
It would also be worth checking and comparing the two execution plans, when asking for all the columns and when asking for only the columns in the index:
using window functions:
-- window functions
SELECT
-- m.*,
movie_title, start_date_time,
price, quantity,
SUM(CASE WHEN price = 0 THEN quantity ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Free],
SUM(CASE WHEN price > 0 THEN quantity ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Paid],
SUM(CASE WHEN price > 0 THEN quantity * price ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Revenue],
SUM(quantity)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Total Admissions]
FROM
[movies] AS m ;
*: If there is a UNIQUE
constraint on (movie_title, start_date_time)
, then you could use ROWS
instead of RANGE
for the window frames (it's usually more efficient). From the comments, there is no such constraint and there could be many rows with same title and datetime, so RANGE
is required above.
using OUTER APPLY
:
-- using OUTER APPLY
SELECT
-- m.*,
m.movie_title, m.start_date_time,
m.price, m.quantity,
c.[Admissions Free],
c.[Admissions Paid],
c.[Admissions Revenue],
c.[Total Admissions]
FROM
[movies] AS m
OUTER APPLY
( SELECT
SUM(CASE WHEN i.price = 0 THEN i.quantity ELSE 0 END)
AS [Admissions Free],
SUM(CASE WHEN i.price > 0 THEN i.quantity ELSE 0 END)
AS [Admissions Paid],
SUM(CASE WHEN i.price > 0 THEN i.quantity * i.price ELSE 0 END)
AS [Admissions Revenue],
SUM(i.quantity)
AS [Total Admissions]
FROM [movies] AS i
WHERE i.movie_title = o.movie_title
AND i.start_date_time <= o.start_date_time
) AS c ;
This index may be a little better than the first one:
(
movie_title ASC,
start_date_time ASC
)
INCLUDE (price, quantity)
Thank you very much, this is indeed much much faster. Execution time for the first index is 3 sec while it's 4 sec with the second index. I added the execution plans to the post, btw. Can you maybe explain why this is so much faster?
– dakes
8 hours ago
I tested the query and while it is faster it's not what I need to show. I'll usePaid Admissions
as an example. Assume there are three shows starting at the same time. Your query computes differentPaid Admissions
for all three columns, but I need them to be the same, namely the sum of all paid admissions at that time. I assume this is because it looks at rows in the OVER part one after the other.
– dakes
6 hours ago
@dakes oh, I intended to add a note about that. Se the edit. (in short, useRANGE
notROWS
, for the window fames)
– ypercubeᵀᴹ
6 hours ago
I was torn which answer I should accept because @sepupic s answer actually answers my question, but this one was much more helpful in the long run. SO says "The bottom line is that you should accept the answer that you found to be the most helpful to you, personally.", so I accepted this one. I hope that's correct.
– dakes
5 hours ago
add a comment |
The first index does look like a better fit for the query. Please provide the actual execution plans.
I would try using window functions instead of the four correlated subqueries. Or a single correlated subquery (with OUTER APPLY
) and see which of the two indexes is used.
Both ideas are to coerce the optimizer to use a single index scan to gather the rolling sums instead of 4 (that both your plans do).
It would also be worth checking and comparing the two execution plans, when asking for all the columns and when asking for only the columns in the index:
using window functions:
-- window functions
SELECT
-- m.*,
movie_title, start_date_time,
price, quantity,
SUM(CASE WHEN price = 0 THEN quantity ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Free],
SUM(CASE WHEN price > 0 THEN quantity ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Paid],
SUM(CASE WHEN price > 0 THEN quantity * price ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Revenue],
SUM(quantity)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Total Admissions]
FROM
[movies] AS m ;
*: If there is a UNIQUE
constraint on (movie_title, start_date_time)
, then you could use ROWS
instead of RANGE
for the window frames (it's usually more efficient). From the comments, there is no such constraint and there could be many rows with same title and datetime, so RANGE
is required above.
using OUTER APPLY
:
-- using OUTER APPLY
SELECT
-- m.*,
m.movie_title, m.start_date_time,
m.price, m.quantity,
c.[Admissions Free],
c.[Admissions Paid],
c.[Admissions Revenue],
c.[Total Admissions]
FROM
[movies] AS m
OUTER APPLY
( SELECT
SUM(CASE WHEN i.price = 0 THEN i.quantity ELSE 0 END)
AS [Admissions Free],
SUM(CASE WHEN i.price > 0 THEN i.quantity ELSE 0 END)
AS [Admissions Paid],
SUM(CASE WHEN i.price > 0 THEN i.quantity * i.price ELSE 0 END)
AS [Admissions Revenue],
SUM(i.quantity)
AS [Total Admissions]
FROM [movies] AS i
WHERE i.movie_title = o.movie_title
AND i.start_date_time <= o.start_date_time
) AS c ;
This index may be a little better than the first one:
(
movie_title ASC,
start_date_time ASC
)
INCLUDE (price, quantity)
The first index does look like a better fit for the query. Please provide the actual execution plans.
I would try using window functions instead of the four correlated subqueries. Or a single correlated subquery (with OUTER APPLY
) and see which of the two indexes is used.
Both ideas are to coerce the optimizer to use a single index scan to gather the rolling sums instead of 4 (that both your plans do).
It would also be worth checking and comparing the two execution plans, when asking for all the columns and when asking for only the columns in the index:
using window functions:
-- window functions
SELECT
-- m.*,
movie_title, start_date_time,
price, quantity,
SUM(CASE WHEN price = 0 THEN quantity ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Free],
SUM(CASE WHEN price > 0 THEN quantity ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Paid],
SUM(CASE WHEN price > 0 THEN quantity * price ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Revenue],
SUM(quantity)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Total Admissions]
FROM
[movies] AS m ;
*: If there is a UNIQUE
constraint on (movie_title, start_date_time)
, then you could use ROWS
instead of RANGE
for the window frames (it's usually more efficient). From the comments, there is no such constraint and there could be many rows with same title and datetime, so RANGE
is required above.
using OUTER APPLY
:
-- using OUTER APPLY
SELECT
-- m.*,
m.movie_title, m.start_date_time,
m.price, m.quantity,
c.[Admissions Free],
c.[Admissions Paid],
c.[Admissions Revenue],
c.[Total Admissions]
FROM
[movies] AS m
OUTER APPLY
( SELECT
SUM(CASE WHEN i.price = 0 THEN i.quantity ELSE 0 END)
AS [Admissions Free],
SUM(CASE WHEN i.price > 0 THEN i.quantity ELSE 0 END)
AS [Admissions Paid],
SUM(CASE WHEN i.price > 0 THEN i.quantity * i.price ELSE 0 END)
AS [Admissions Revenue],
SUM(i.quantity)
AS [Total Admissions]
FROM [movies] AS i
WHERE i.movie_title = o.movie_title
AND i.start_date_time <= o.start_date_time
) AS c ;
This index may be a little better than the first one:
(
movie_title ASC,
start_date_time ASC
)
INCLUDE (price, quantity)
edited 6 hours ago
answered 8 hours ago
ypercubeᵀᴹypercubeᵀᴹ
76.9k11134214
76.9k11134214
Thank you very much, this is indeed much much faster. Execution time for the first index is 3 sec while it's 4 sec with the second index. I added the execution plans to the post, btw. Can you maybe explain why this is so much faster?
– dakes
8 hours ago
I tested the query and while it is faster it's not what I need to show. I'll usePaid Admissions
as an example. Assume there are three shows starting at the same time. Your query computes differentPaid Admissions
for all three columns, but I need them to be the same, namely the sum of all paid admissions at that time. I assume this is because it looks at rows in the OVER part one after the other.
– dakes
6 hours ago
@dakes oh, I intended to add a note about that. Se the edit. (in short, useRANGE
notROWS
, for the window fames)
– ypercubeᵀᴹ
6 hours ago
I was torn which answer I should accept because @sepupic s answer actually answers my question, but this one was much more helpful in the long run. SO says "The bottom line is that you should accept the answer that you found to be the most helpful to you, personally.", so I accepted this one. I hope that's correct.
– dakes
5 hours ago
add a comment |
Thank you very much, this is indeed much much faster. Execution time for the first index is 3 sec while it's 4 sec with the second index. I added the execution plans to the post, btw. Can you maybe explain why this is so much faster?
– dakes
8 hours ago
I tested the query and while it is faster it's not what I need to show. I'll usePaid Admissions
as an example. Assume there are three shows starting at the same time. Your query computes differentPaid Admissions
for all three columns, but I need them to be the same, namely the sum of all paid admissions at that time. I assume this is because it looks at rows in the OVER part one after the other.
– dakes
6 hours ago
@dakes oh, I intended to add a note about that. Se the edit. (in short, useRANGE
notROWS
, for the window fames)
– ypercubeᵀᴹ
6 hours ago
I was torn which answer I should accept because @sepupic s answer actually answers my question, but this one was much more helpful in the long run. SO says "The bottom line is that you should accept the answer that you found to be the most helpful to you, personally.", so I accepted this one. I hope that's correct.
– dakes
5 hours ago
Thank you very much, this is indeed much much faster. Execution time for the first index is 3 sec while it's 4 sec with the second index. I added the execution plans to the post, btw. Can you maybe explain why this is so much faster?
– dakes
8 hours ago
Thank you very much, this is indeed much much faster. Execution time for the first index is 3 sec while it's 4 sec with the second index. I added the execution plans to the post, btw. Can you maybe explain why this is so much faster?
– dakes
8 hours ago
I tested the query and while it is faster it's not what I need to show. I'll use
Paid Admissions
as an example. Assume there are three shows starting at the same time. Your query computes different Paid Admissions
for all three columns, but I need them to be the same, namely the sum of all paid admissions at that time. I assume this is because it looks at rows in the OVER part one after the other.– dakes
6 hours ago
I tested the query and while it is faster it's not what I need to show. I'll use
Paid Admissions
as an example. Assume there are three shows starting at the same time. Your query computes different Paid Admissions
for all three columns, but I need them to be the same, namely the sum of all paid admissions at that time. I assume this is because it looks at rows in the OVER part one after the other.– dakes
6 hours ago
@dakes oh, I intended to add a note about that. Se the edit. (in short, use
RANGE
not ROWS
, for the window fames)– ypercubeᵀᴹ
6 hours ago
@dakes oh, I intended to add a note about that. Se the edit. (in short, use
RANGE
not ROWS
, for the window fames)– ypercubeᵀᴹ
6 hours ago
I was torn which answer I should accept because @sepupic s answer actually answers my question, but this one was much more helpful in the long run. SO says "The bottom line is that you should accept the answer that you found to be the most helpful to you, personally.", so I accepted this one. I hope that's correct.
– dakes
5 hours ago
I was torn which answer I should accept because @sepupic s answer actually answers my question, but this one was much more helpful in the long run. SO says "The bottom line is that you should accept the answer that you found to be the most helpful to you, personally.", so I accepted this one. I hope that's correct.
– dakes
5 hours ago
add a comment |
I agree with ypercubeᵀᴹ answer, the query should be rewritten.
Can you maybe explain why this is so much faster?
The query that use the second index
is faster only because it's executing in parallel
.
Try to add option(maxdop 1)
and the use of the first index will be faster.
add a comment |
I agree with ypercubeᵀᴹ answer, the query should be rewritten.
Can you maybe explain why this is so much faster?
The query that use the second index
is faster only because it's executing in parallel
.
Try to add option(maxdop 1)
and the use of the first index will be faster.
add a comment |
I agree with ypercubeᵀᴹ answer, the query should be rewritten.
Can you maybe explain why this is so much faster?
The query that use the second index
is faster only because it's executing in parallel
.
Try to add option(maxdop 1)
and the use of the first index will be faster.
I agree with ypercubeᵀᴹ answer, the query should be rewritten.
Can you maybe explain why this is so much faster?
The query that use the second index
is faster only because it's executing in parallel
.
Try to add option(maxdop 1)
and the use of the first index will be faster.
answered 7 hours ago
sepupicsepupic
7,533819
7,533819
add a comment |
add a comment |
dakes is a new contributor. Be nice, and check out our Code of Conduct.
dakes is a new contributor. Be nice, and check out our Code of Conduct.
dakes is a new contributor. Be nice, and check out our Code of Conduct.
dakes is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f230639%2fwhy-is-this-column-order-in-my-non-clustered-index-better-for-my-query%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Could you provide execution plans for both indexing options?
– vonPryz
9 hours ago
1
Subqueries are too strange... It seems they can be converted to window-type SUM().
– Akina
9 hours ago
@SabinBio o is the outer table. I edited the question.
– dakes
9 hours ago
1
The key is here
Distinct movies: 51, distinct start_date_times: 8786
; some hints could be at the logical reads and in the execution plan(s). For an index, it's very important the first column and how selective it is– Sabin Bio
9 hours ago
2
Could you post a link to the individual plans via Paste The Plan? You can anonymise information using Sentry One's Plan Explorer if need be.
– hot2use
4 hours ago