data-processing dimensional-modeling

Trouble understanding a concept in Kimball's dimensional modeling


I have read through the "Behavior Tag Time Series" idea several times but still can't understand it.

Here is the explanation in the book, which still doesn't make sense to me: "Almost all text in a data warehouse is descriptive text in dimension tables. Data mining customer cluster analyses typically results in textual behavior tags, often identified on a periodic basis. In this case, the customers’ behavior measurements over time become a sequence of these behavior tags; this time series should be stored as positional attributes in the customer dimension, along with an optional text string for the complete sequence of tags. The behavior tags are modeled in a positional design because the behavior tags are the target of complex simultaneous queries rather than numeric computations."

Can anyone who knows this topic give me an easy example from day-to-day data processing work?


Solution

  • I think all this is saying is that you decide how many behavioural attributes you are going to track per customer and create a column for each one, potentially with a single column that consolidates them all (and possibly includes additional attributes that are not being tracked individually). So if you had 10 behavioural attributes being tracked (BH1-10) and another 2 not being individually tracked, your customer dimension might look something like this:

    CUSTOMER_SK  BH1  BH2  BH3  BH4  BH5  BH6  BH7  BH8  BH9  BH10  BH_SUMMARY
    1234         A    B    C    D    E    F    G    H    I    J     ABCDEFGHIJ-1-2
    5678         P    Q    R    S    T    U    V    W    X    Y     PQRSTUVWXY-9-7
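
To make the "complex simultaneous queries" part of the quote more concrete, here is a minimal sketch of how such a table could be queried. It assumes the two sample rows above loaded into an in-memory SQLite database; the table name `customer_dim` and the specific tag values being searched for are made up for illustration and are not from the book.

```python
import sqlite3

# Hypothetical in-memory copy of the customer dimension sketched above.
conn = sqlite3.connect(":memory:")
tag_cols = [f"BH{i}" for i in range(1, 11)]  # BH1 .. BH10
conn.execute(
    "CREATE TABLE customer_dim (CUSTOMER_SK INTEGER PRIMARY KEY, "
    + ", ".join(f"{col} TEXT" for col in tag_cols)
    + ", BH_SUMMARY TEXT)"
)

# The two sample rows: one tag per positional column plus the summary string.
rows = [
    (1234, *"ABCDEFGHIJ", "ABCDEFGHIJ-1-2"),
    (5678, *"PQRSTUVWXY", "PQRSTUVWXY-9-7"),
]
placeholders = ", ".join(["?"] * 12)
conn.executemany(f"INSERT INTO customer_dim VALUES ({placeholders})", rows)

# Constrain several positions at once with plain equality predicates.
both = conn.execute(
    "SELECT CUSTOMER_SK FROM customer_dim WHERE BH1 = 'A' AND BH3 = 'C'"
).fetchall()

# Pattern match on the consolidated string; '_' matches exactly one character.
pattern = conn.execute(
    "SELECT CUSTOMER_SK FROM customer_dim WHERE BH_SUMMARY LIKE '_Q_S%'"
).fetchall()

print(both, pattern)
```

The point of the positional layout is that each condition is an ordinary column filter, so combining several of them needs no self-joins or aggregation, while the summary column allows wildcard matching across the whole sequence.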