General Database Question - Indexed ID field Performance vs Limit (Top) Performance (Salesforce)

I am troubleshooting (or at least identifying) possible causes of poor query performance within our Saleforce org. I suspect my question, however, will be applicable to most database engines.

We have fairly large volumes of data within our Salesforce org. Our problematic objects have rouhgly 15M records. We built an API on top of our org which pulls back data and we are really struggling with performance. Please know, that we understand there are things that can be done in the long term, but my question is more about just understanding the underlying query engine for databases.

The issue is this: We can run a query:

Select X From Account Where [IndexedField] Like 'value%' LIMIT 1000

for some reason, that query will outperform a query such as:

Select X From Account Where [Id] = 'IdValue'

both fields in the where clause are indexed, but for some reason, it takes a while for a single Id filter to return the result, where you might get the top X records faster.

Understanding that SFDC uses a hybrid EAV approach to their data, but their data is stored in an Oracle DB, I suspect this has to do with the generic way that Oracle performs their search.

Why might it take longer to find a single record by ID, than it would to find [X] records that meet a given criteria? I have looked at the query plans, both have low costs and obviously the ID has the lowest cost.

Need to explain to our UI guys the impact of searching across 15M records to the best of my ability.

Solution

The answer could be a number of things. Are the table statistics up to date? What hardware is the database running on? Performance tuning of an Oracle database on an X7 Exadata machine will be different compared to one running on an old V2. The version of Oracle you're running also has an impact on this, as there differences in the optimizer. Different versions of Oracle can handle identical queries in different ways, even if they are on identical hardware.

Why might it take longer to find a single record by ID, than it would to find [X] records that meet a given criteria? I have looked at the query plans, both have low costs and obviously the ID has the lowest cost.

Three words: Full Table Scan. Does your query gathering a single ID require a full table scan? The explain plan will tell you. A query that requires a full table scan can take longer than a query that takes advantage of an index scan. However, the inverse can also be true depending on your environment. The first query that gathers a range of values is obviously utilizing an index. Why isn't the second query also using that indexed column? That is likely part of the equation.

Here is a great AskTom answer that touches on this subject and explains in detail how index scans work, albeit the question is from the opposite perspective of why index scans can be slower than full table scans.

https://asktom.oracle.com/pls/asktom/asktom.search?tag=blocks-read-using-index-vs-full-table-scan