Search code examples
ssasmdx

Why would you want to use Process Data vs Process Full for processing a partition


As stated here : Process data will process data only without building aggregations or indexes. If there is data in the partition, it will be dropped before re-populating the partition with source data. On the other hand, for Process Full Analysis Services will also drop all data in the partition, and then process the partition. So, if attribute hierarchies connected to the partition are set to AttributeHierarchyOptimizedState = true Analysis Services will build indexes for the attribute hierarchy to improve query performance. I was able to confirm this behavior by running following query against Adventure Works database:

SELECT
DIMENSION_NAME, ATTRIBUTE_NAME, ATTRIBUTE_INDEXED,
ATTRIBUTE_COUNT_MIN, ATTRIBUTE_COUNT_MAX
FROM SystemRestrictSchema($system.discover_partition_dimension_stat
        ,DATABASE_NAME = 'AdventureWorksDW2014Multidimensional-EE'
        ,CUBE_NAME = 'Adventure Works'
        ,MEASURE_GROUP_NAME = 'Internet Sales'
        ,PARTITION_NAME = 'Internet_Sales_2013') 

When I Process data ATTRIBUTE_INDEXED column is set to false for all attributes. When I do process full ATTRIBUTE_INDEXED column is set to true for the most of the attributes.

So I was wondering why would someone want to process data only, and hence sacrifice query performance, to gain what? Thanks.


Solution

  • In a nutshell -- to reduce cube downtime and cube preparation time. At cost, as you stated, query performance degradation in period between ProcessData and ProcessIndex.
    Processing usually takes cube offline, at least on committing changes, which can take considerable amount of time. To reduce cube offline time due to processing, especially on big cubes, partitions (parts) of cube with updated data are processed with Process Date option. Having completed this processing, new data in cube is available for queries, but with performance penalty. After that Process Indexes on affected partitions is started, which builds all indexes and aggregations on the objects.
    From my experience, it is used and has sense on big projects only. Another finding - running Process Default on cube builds missing indexes and aggregations without specifying exact measure group etc.