google-analytics google-api google-analytics-api sampling

Google Analytics reporting - wider date range filters out the result


I am trying to look up a GA client ID that is stored in one custom dimension by filtering on the value of another custom dimension. When I change start-date=2019-01-01 to start-date=2016-01-01 or start-date=2006-01-01, the result I got with start-date=2019-01-01 disappears. Why does this happen? I would like to search across all users. Is there another way to find a user based on a dimension alone? I don't need any metrics.

https://ga-dev-tools.appspot.com/query-explorer/?start-date=2019-01-01&end-date=2019-01-28&metrics=ga%3Ausers&dimensions=ga%3Adimension16%2Cga%3Adimension65&filters=ga%3Adimension16%3D%3DUMM8SBTCS0U7HIZL&include-empty-rows=true

Java:

    DateRange dateRange = new DateRange();
    dateRange.setStartDate("2018-01-01");
    dateRange.setEndDate("2019-01-28");

    final Dimension euciDimension = new Dimension().setName("ga:dimension65");
    final Dimension gaDimension = new Dimension().setName("ga:dimension16");

    // sessionsMetrics was not defined in the original snippet;
    // ga:sessions is an assumption based on the variable name.
    final Metric sessionsMetrics = new Metric().setExpression("ga:sessions");

    ReportRequest request = new ReportRequest()
            .setViewId(VIEW_ID)
            .setDimensions(Arrays.asList(euciDimension, gaDimension))
            .setDateRanges(Arrays.asList(dateRange))
            .setMetrics(Arrays.asList(sessionsMetrics))
            .setPageSize(1000)
            .setIncludeEmptyRows(true)
            .setSamplingLevel("LARGE")
            .setFiltersExpression("ga:dimension16==XYZ");

    ArrayList<ReportRequest> requests = new ArrayList<ReportRequest>();
    requests.add(request);

    // Create the GetReportsRequest object.
    GetReportsRequest getReport = new GetReportsRequest()
            .setReportRequests(requests);

    // Call the batchGet method.
    GetReportsResponse response = service.reports().batchGet(getReport).execute();

Solution

  • Sampling

    About data sampling

    In data analysis, sampling is the practice of analyzing a subset of all data in order to uncover the meaningful information in the larger data set. For example, if you wanted to estimate the number of trees in a 100-acre area where the distribution of trees was fairly uniform, you could count the number of trees in 1 acre and multiply by 100, or count the trees in a half acre and multiply by 200 to get an accurate representation of the entire 100 acres.

    There is no way to disable data sampling in the Google Analytics API or on the website. The only way around it is to request smaller date ranges. A query covering the last 12 years will almost certainly be sampled, unless your site is less than a year old.
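One way to act on the "smaller date ranges" advice is to split the full period into consecutive chunks and issue one report request per chunk. The following is only a sketch using `java.time`: the class name `DateRangeChunker`, its nested `Range` type, and the `monthsPerChunk` parameter are hypothetical and not part of the Google client library.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits an inclusive date range into consecutive,
// non-overlapping chunks of at most monthsPerChunk months each.
class DateRangeChunker {

    // Inclusive start/end pair; LocalDate.toString() yields YYYY-MM-DD,
    // the format used for start and end dates in the question.
    static final class Range {
        public final LocalDate start;
        public final LocalDate end;

        Range(LocalDate start, LocalDate end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + ".." + end;
        }
    }

    static List<Range> split(LocalDate start, LocalDate end, int monthsPerChunk) {
        if (monthsPerChunk < 1) {
            throw new IllegalArgumentException("monthsPerChunk must be at least 1");
        }
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end must not be before start");
        }
        List<Range> chunks = new ArrayList<>();
        LocalDate chunkStart = start;
        while (!chunkStart.isAfter(end)) {
            // Last day of this chunk, clamped to the overall end date.
            LocalDate chunkEnd = chunkStart.plusMonths(monthsPerChunk).minusDays(1);
            if (chunkEnd.isAfter(end)) {
                chunkEnd = end;
            }
            chunks.add(new Range(chunkStart, chunkEnd));
            // The next chunk starts the day after this one ends.
            chunkStart = chunkEnd.plusDays(1);
        }
        return chunks;
    }
}
```

Each `Range` could then be copied into its own `DateRange` via `setStartDate(range.start.toString())` and `setEndDate(range.end.toString())`, with the rows from all responses merged afterwards. How small the chunks must be depends on the traffic of the view.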

    You can check the response to see whether your data is sampled, then keep shrinking the date range until it is no longer sampled.
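As far as I know, a Reporting API v4 response signals sampling through two fields on the report's data object, `samplesReadCounts` and `samplingSpaceSizes`, each holding one entry per date range and both absent when the result is unsampled. The sketch below takes those two lists as plain `List<Long>` arguments so it runs without the client library. The class `SamplingCheck` and its method names are hypothetical.

```java
import java.util.List;

// Hypothetical helper for interpreting the sampling fields of a report.
class SamplingCheck {

    // A report is treated as sampled when both fields are present and non-empty.
    static boolean isSampled(List<Long> samplesReadCounts, List<Long> samplingSpaceSizes) {
        return samplesReadCounts != null && !samplesReadCounts.isEmpty()
                && samplingSpaceSizes != null && !samplingSpaceSizes.isEmpty();
    }

    // Fraction of sessions actually read for the given date range index;
    // returns 1.0 when the report is not sampled.
    static double sampledFraction(List<Long> samplesReadCounts,
                                  List<Long> samplingSpaceSizes,
                                  int dateRangeIndex) {
        if (!isSampled(samplesReadCounts, samplingSpaceSizes)) {
            return 1.0;
        }
        long spaceSize = samplingSpaceSizes.get(dateRangeIndex);
        if (spaceSize <= 0) {
            // Guard against division by zero on unexpected input.
            return 1.0;
        }
        return (double) samplesReadCounts.get(dateRangeIndex) / spaceSize;
    }
}
```

With the Java client, the two lists would presumably come from `report.getData().getSamplesReadCounts()` and `report.getData().getSamplingSpaceSizes()` for each report in the response. If `isSampled` returns true, shorten the date range and query again.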

    Note on BigQuery: if you have access to it, you can export the data to a BigQuery account, which removes the sampling.

    Missing data

    If you only started sending a custom dimension yesterday, then the data for last week contains no values for that custom dimension, so no rows will be returned for it. There is no way to run analytics against data that did not exist at the time.