Search code examples
androidandroid-contentresolver

Debugging slow ContentResolver query calls spending lots of time in "context switch"


Occasionally Android system calls, such as ContentResolver.query(...), can take 1000x times longer than normal, for example ~10 seconds for a single call that might normally complete in ~10 milliseconds. My question is how to identify cases like this, understand why it's happening and resolve the problem.

The example below is retrieving the "user profile" on an Android system. On my Nexus 6 it normally takes ~10 milliseconds to do this. Older devices were similarly quick. Occasionally though it can 20+ seconds. Using traceview, it shows that virtually all the time is spent in "context switch":

enter image description here

My device unusually has ~7,000 contacts. For the "user profile" that I'm requesting, the result is only a single row. It's requested using Uri.withAppendedPath(Profile.CONTENT_URI, Data.CONTENT_DIRECTORY) - which I presume would be a common, optimized case. At the moment, the queries are done on the main thread, which may be complicating things... the Android documentation mentions that "you should do queries asynchronously on a separate thread". I will migrate to that in the future but I'm skeptical of that being the cause as it was working fine previously.

Another surprising factor is the strange consistency. I have multiple queries in the app. For a sequence of runs, the "user profile" will be consistently slow then start being fast again and I can't reproduce the issue. Likewise other contact queries that aren't the user profile, will be fast and then become slow. For the past six months, all the queries have been fast through the Nexus 5 and Nexus 6 devices - this only appeared in the last week.

The query:

Uri uri = Uri.withAppendedPath(Profile.CONTENT_URI, Data.CONTENT_DIRECTORY),
Log.d(TAG, "getOwner: " + uri.toString());
Cursor cur = mResolver.query(
    uri,
    PROJECTION,  // 6 columns
    null,
    null,
    Data.IS_PRIMARY + " DESC");
Log.d(TAG, "getOwner: query count -  " + cur.getCount());

Log Output:

// NOTE 18 second gap between log entries
19:20:33.134 D/ConnectActivity﹕ getOwner: URI: content://com.android.contacts/profile/data
19:20:51.779 D/ConnectActivity﹕ getOwner: query count -  1

I still find this behaviour surprising and don't understand why it occurs.


Solution

  • Levels of debugging and improvements (the last two are specifically for slow "context switch"):

    1. Query Improvements: query Uri, Projection size and column indices caching more details
    2. Query Filtering: reduce size of the query result through better filtering
    3. Asynchronous: slow queries shouldn't block the UI thread, use
    4. Observe Logs: around the slowdown to see if it has any clues more details on logs
    5. Traceview: to identify where the call is spending time
    6. Context Switch: explanation

    Query Improvements & Filtering

    In mosts cases, querying the right Uri, reducing the number of columns in the projection and caching column indices are the primary improvements: Getting name and email from contact list is very slow

    This example is for obtaining the name from the user profile:

    Cursor cur = mResolver.query(
        // Uri will only return user profile rather than everything in contacts
        Uri.withAppendedPath(Profile.CONTENT_URI, Data.CONTENT_DIRECTORY),
        // Projection only has 2 columns to reduce the data returned
        new String[] {
                Data.MIMETYPE,
                ContactsContract.Data.DISPLAY_NAME};
        // Filter matches on specific mimetype to reduce result size
        Data.MIMETYPE + "=?",
        // Filter matches on specific mimetype to reduce result size
        new String[]{StructuredName.CONTENT_ITEM_TYPE},
        // No sorting may improve performance
        null);
    

    Asynchronous

    The Android Documentation suggests using a CursorLoader:

    For the sake of clarity, the code snippets in this section call ContentResolver.query() on the "UI thread"". In actual code, however, you should do queries asynchronously on a separate thread. One way to do this is to use the CursorLoader class, which is described in more detail in the Loaders guide. Also, the lines of code are snippets only; they don't show a complete application.

    Observe Logs

    You should look at lot around the slowdown to both quantify it and see if it has only clues to the cause. ContentResolver.query(...) calls often take 10-100 milliseconds and anything within that range may be improved as above but is fairly expected. In the range 1-10 seconds, this may be caused by a context switch to other processes. When the context switch occurs, ART (default Runtime in Android 5.0), will output a log line indicating the context switch and the time taken for it:

    Traceview

    Android's traceview can be used to identify where time is being spent. The basic usage is to wrap the method you wish to profile as follows:

    Debug.startMethodTracing("trace-name");
    // Code to profile
    Debug.stopMethodTracing();
    

    Next pull the logs from the device (I believe this is the default path):

    adb pull /storage/emulated/legacy/dmtrace.trace
    

    Android Device Monitor can be used to open the file and view. From Android Studio, select Tools -> Android -> Android Device Monitor, then open the file. Often it is helpful to rearrange the Incl Real Time and Incl Cpu Time columns to the far left. These show the time taken by a method (and it's children methods) both as cpu time and real time (which also including blocking IO calls that aren't measured with cpu time in the app process - more details). The most suspicious entries are where Incl Real Time vastly exceeds the Incl Cpu Time. That means that means the method took a long time but didn't do much in your code, so it must have been blocked by something outside of the app process.

    You can see that a list methods and the time for each. Starting at "0 (top level)" click the triangle to see the "children" and open that. Each child will list a method number that you can use to find the method elsewhere in traceview and looks at their children, descending through the hierarchy. Looking for cases where Incl Real Time >> Incl Cpu Time, you will often find it ends with "(context switch)". This is the long blocking event that's slowing down your process.

    traceview of ContentResolver.query with slows context switch

    Context Switch

    Wikipedia description:

    In computing, a context switch is the process of storing and restoring the state (context) of a process or thread so that execution can be resumed from the same point at a later time. This enables multiple processes to share a single CPU and is an essential feature of a multitasking operating system. What constitutes the context is determined by the processor and the operating system.

    In the case of Android, this indicates the OS has chosen to preserve the state of your app, remove it from the CPU so that it can execute another process. When the context switch is over, the process may be reinstated on the CPU and resume execution.