Hi I am having a little trouble trying to find a simple explanation for bind aware cursor matching in oracle.. Is bind aware cursor matching basically having Oracle monitor a query with a bind variable over time and seeing if there's an increase in CPU when using some variables. Then from doing this it almost generates a more suitable execution plan say a full table scan then marks the query as bind aware then the next time the query is executed there is a choice of two execution plans? Any help will be greatly appreciated! Cheers!
In the simplest case, imagine that you have an ORDERS
table. In that table is a status
column. There are only a handful of status
values and some are very, very popular while others are very rare. Imagine that the table has 10 million rows. For our purposes, say that 93% are "COMPLETE", 5% are "CANCELLED", and the remaining 2% are spread between a 8 different statuses that track the order flow (INCOMPLETE, COMPLETE, IN FULFILLMENT, IN TRANSIT, etc.).
If you have the most basic statistics on your table, the optimizer knows that there are 10 million rows and 10 distinct statuses. It doesn't know that some status
values are more popular than others so it guesses that each status corresponds to 1 million rows. So when it sees a query like
SELECT *
FROM orders
WHERE status = :1
it guesses that it needs to fetch 1 million rows from the table regardless of the bind variable value so it decides to use a full table scan.
Now, a human comes along wondering why Oracle is being silly and doing a full table scan when he asks for the handful of orders
that are in an IN TRANSIT status-- clearly an index scan would be preferable there. That human realizes that the optimizer needs more information in order to learn that some status
values are more popular than others so that human decides to gather a histogram (there are options that cause Oracle to gather histograms on certain columns automatically as well but I'm ignoring those options to try to keep the story simple).
Once the histogram is gathered, the optimizer knows that the status
value is highly skewed-- there are lots of COMPLETED orders but very few IN TRANSIT orders. If it sees a query that is using literals rather than bind variables, i.e.
SELECT *
FROM orders
WHERE status = 'IN TRANSIT'
vs
SELECT *
FROM orders
WHERE status = 'COMPLETED'
then it is very easy for the optimizer to decide to use an index in the first case and table scan in the second. When you have a bind variable, though, the optimizer's job is more difficult-- how is it supposed to determine whether to use the index or to do a table scan...
Oracle's first solution was known as "bind variable peeking". In this approach, when the optimizer sees something like
SELECT *
FROM orders
WHERE status = :1
where it knows (because of the histogram on status
) that the query plan should depend on the value passed in for the bind variable, Oracle "peeks" at the first value that is passed in to determine how to optimize the statement. If the first bind variable value is 'IN TRANSIT', an index scan will be used. If the first bind variable value is 'COMPLETE`, a table scan will be used.
For a lot of cases, this works pretty well. Lots of queries really only make sense for either very popular or very rare values. In our example, it's pretty unlikely that anyone would ever really want a list of all 9 million COMPLETE orders but someone might want a list of the couple thousand orders in one of the various transitory states.
But bind variable peeking doesn't work well in other cases. If you have a system where the application sometimes binds very popular values and sometimes binds very rare values, you end up with a situation where application performance depends heavily on who happens to run a query first. If the first person to run the query uses a very rare value, the index scan plan will be generated and cached. If the second person to run the query uses the very common value, the cached plan will be used and you'll get an index scan that takes forever. If the roles are reversed, the second person uses the rare value, gets the cached plan that does a full table scan, and has to scan the entire table to get the couple hundred rows they're interested in. This sort of non-deterministic behavior tends to drive DBAs and developers mad because it can be maddingly hard to diagnose and can lead to rather odd explanations-- Tom Kyte has an excellent example of a customer that concluded they needed reboot the database in the afternoon if it rained Monday morning.
Bind aware cursor matching is the solution to the bind variable peeking problem. Now, when Oracle sees the query
SELECT *
FROM orders
WHERE status = :1
and sees that there is a histogram on status
that indicates that some values are more common than others, it is smart enough to make that cursor "bind aware". That means that when you bind a value of IN FULFILLMENT, the optimizer is smart enough to conclude that this is one of the rare values and give you the index plan. When you bind a value of COMPLETE, the optimizer is smart enough to conclude that this is one of the common values and give you the plan with the table scan. So the optimizer now knows of two different plans for the same query and when it sees a new bind value like IN TRANSIT, it checks to see whether that value is similar to others that it has seen before and either gives you one of the existing plans or creates another new plan. In this case, it would decide that IN TRANSIT is roughly as common as IN FULFILLMENT so it re-uses the plan with the index scan rather than generating a third query plan. This, hopefully, leads to everyone getting their preferred plan without having to generate and cache query plans every time a bind variable value changes.
Of course, in reality, there are lots of additional caveats, corner cases, considerations, and complications that I'm intentionally (and unintentionally) glossing over here. But that's the basic idea of what the optimizer is trying to accomplish.