
Page metadata in keen.io


I have a question about best practices for attaching metadata to our keen.io pageview events. Internally we use 3 different keyword categories to identify a piece of content, and those keywords live in meta tags on every page. A good example would be something like this:

<meta name="namespace:tier1" content="Programming" />
<meta name="namespace:tier2" content="Web Development, Web Operations" />
<meta name="namespace:tier3" content="JavaScript, Analytics, jQuery, HTML, CSS" />

We want to be able to segment our users based on those tiers, and do queries like this:

  • See all traffic segmented by tier1 keywords
  • See the most popular tier2 keywords that belong to a specific tier1 keyword
  • ... and so on.

Here's my question: It seems like we could just send this metadata along with the pageview event, but we'll end up with a lot of redundant data that could live in a separate place. For example, if we scraped the keywords every day for our pages, we could index them by URL, and not have all that duplicate metadata in keen.io.

How would you approach this? Am I stuck in SQL land, and should I just not worry about the duplicate data?

A related question: our keywords are basically lists, and the keen.io documentation says that we should stay away from lists. Would I need to create a Metadata event for every single word then? It seems like overkill to send 10+ requests on every pageview.


Solution

  • Short answer: don't worry about the duplication. When it comes to event data, denormalization is your friend. Keen's query interface is most powerful when each event carries a lot of properties, effectively the state of the world at that time.

    Michelle wrote a guide to thinking about event data that contrasts it with relational data. Many of us (including me) have been stuck in SQL land before and have found this guide to be helpful :)

    As for lists, it's mostly lists of objects that you want to avoid. In this case your list is a list of strings, so you can still do a fair amount of querying against that property.

    For more information about Keen & lists of objects check out this SO question: Nested JSON Objects In Keen IO.
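
To make the denormalized approach concrete, here is a minimal sketch of how the three meta tags from the question could be folded into a single pageview event, with each tier stored as a list of strings. The function names (`parseKeywordList`, `buildPageviewEvent`) and the property names (`url`, `keywords.tier1`, and so on) are assumptions made up for illustration, not anything Keen requires, and the example URL is a placeholder. The sketch only builds the payload; sending it is left to whatever Keen client call you already use to record events.

```javascript
// Split a comma-separated meta "content" value into a clean list of strings.
function parseKeywordList(content) {
  return (content || "")
    .split(",")
    .map((keyword) => keyword.trim())
    .filter((keyword) => keyword.length > 0);
}

// Build one self-contained pageview event (hypothetical property names).
// `metaByName` maps meta tag names to their content strings; in a browser
// it could be filled from the page's <meta name="namespace:..."> tags.
function buildPageviewEvent(url, metaByName) {
  return {
    url: url,
    keywords: {
      tier1: parseKeywordList(metaByName["namespace:tier1"]),
      tier2: parseKeywordList(metaByName["namespace:tier2"]),
      tier3: parseKeywordList(metaByName["namespace:tier3"]),
    },
  };
}

// Values taken from the meta tags in the question; the URL is a placeholder.
const pageviewEvent = buildPageviewEvent("https://example.com/some-article", {
  "namespace:tier1": "Programming",
  "namespace:tier2": "Web Development, Web Operations",
  "namespace:tier3": "JavaScript, Analytics, jQuery, HTML, CSS",
});

console.log(JSON.stringify(pageviewEvent, null, 2));
```

Every pageview then carries its own copy of the keywords, so it takes one request per pageview rather than one per keyword, and the event still describes the page as it was tagged at the time of the visit even if the tags change later.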