
Populate fact table with foreign keys


I'm working on a project where I need to analyze Apache logs using SSAS. I've already loaded the data into a temporary table. I created dimension tables (primary key and attribute_name) and an empty fact table (a foreign key for each dimension table, plus fact_attribute), and created the relationships between them. Then I split the data from the temporary table into the dimension tables using

INSERT INTO DimIP (IP) SELECT DISTINCT RemoteHostName FROM tmp

...and so on.

Now I need to populate the Fact table with foreign keys, but I have no idea how to do this with a single query. I tried something like this:

INSERT INTO Facts (DimDateID, DimIPID, DimRefererID, DimRequestID, DimStatusCodeID, DimUserAgentID)
SELECT DimDate.ID WHERE (DimDate.Data = tmp.DateTime)
SELECT DimIP.ID WHERE (DimIP.IP = tmp.RemoteHostName)
SELECT DimReferer.ID WHERE (DimReferer.Referer = tmp.Referer)
SELECT DimRequest.ID WHERE (DimRequest.Request = tmp.Request)
SELECT DimStatusCode.ID WHERE (DimStatusCode.StatusCode = tmp.StatusCode)
SELECT DimUserAgent.ID WHERE (DimUserAgent.UserAgent = tmp.UserAgent)

But it doesn't work (it says the insert list contains fewer items than the select list); I probably can't use this syntax.

I tried doing it one by one, like this:

INSERT INTO Facts (DimDateID)
SELECT DimDate.ID WHERE (DimDate.Data = tmp.DateTime)

But sometimes it says that another column can't be NULL (e.g. DimUserAgentID), so the query fails; other times the query executes and reports "806000 rows affected", but nothing is inserted.

I'd appreciate your help, because I've already torn out half my hair and still don't know how to populate the fact table with foreign keys from the dimension tables.


Solution

  • I believe what you need to do is reference those other tables in your query. Below, I use tmp as the main driver of the query and look up each ID based on the logic you provided. The lookups are LEFT OUTER JOINs, which means that where no matching dimension row exists, NULL goes into your fact table (and the insert fails if that column is declared NOT NULL). If you'd rather keep such rows out of the fact table, substitute INNER JOIN for every occurrence. I also assumed your tables are all in the dbo schema.

    INSERT INTO
        dbo.Facts 
    (
        DimDateID
    ,   DimIPID
    ,   DimRefererID
    ,   DimRequestID
    ,   DimStatusCodeID
    ,   DimUserAgentID
    )
    SELECT
        DimDate.ID 
    ,   DimIP.ID 
    ,   DimReferer.ID
    ,   DimRequest.ID 
    ,   DimStatusCode.ID
    ,   DimUserAgent.ID 
    FROM
        dbo.tmp AS T
        LEFT OUTER JOIN
            dbo.DimDate 
            ON DimDate.Data = T.DateTime
        LEFT OUTER JOIN
            dbo.DimIP
            ON DimIP.IP = T.RemoteHostName
        LEFT OUTER JOIN
            dbo.DimReferer
            ON DimReferer.Referer = T.Referer
        LEFT OUTER JOIN
            dbo.DimRequest
            ON DimRequest.Request = T.Request
        LEFT OUTER JOIN
            dbo.DimStatusCode
            ON DimStatusCode.StatusCode = T.StatusCode
        LEFT OUTER JOIN
            dbo.DimUserAgent
            ON DimUserAgent.UserAgent = T.UserAgent
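
    Before choosing between LEFT OUTER JOIN and INNER JOIN, it may help to check how many tmp rows fail to find a match in each dimension. The query below is only a sketch: it reuses the table and column names from your question, still assumes the dbo schema, and has not been run against your data. Keep in mind that a NULL in tmp (for example, a missing Referer) never satisfies an = comparison between two columns, even if SELECT DISTINCT put a NULL row into the dimension, so those rows are counted as unmatched here.

    ```sql
    -- Count the tmp rows that have no matching row in each dimension.
    -- A LEFT OUTER JOIN leaves the dimension's ID as NULL when nothing matches.
    SELECT
        COUNT(*) AS TotalRows
    ,   SUM(CASE WHEN DimDate.ID       IS NULL THEN 1 ELSE 0 END) AS MissingDate
    ,   SUM(CASE WHEN DimIP.ID         IS NULL THEN 1 ELSE 0 END) AS MissingIP
    ,   SUM(CASE WHEN DimReferer.ID    IS NULL THEN 1 ELSE 0 END) AS MissingReferer
    ,   SUM(CASE WHEN DimRequest.ID    IS NULL THEN 1 ELSE 0 END) AS MissingRequest
    ,   SUM(CASE WHEN DimStatusCode.ID IS NULL THEN 1 ELSE 0 END) AS MissingStatusCode
    ,   SUM(CASE WHEN DimUserAgent.ID  IS NULL THEN 1 ELSE 0 END) AS MissingUserAgent
    FROM
        dbo.tmp AS T
        LEFT OUTER JOIN
            dbo.DimDate
            ON DimDate.Data = T.DateTime
        LEFT OUTER JOIN
            dbo.DimIP
            ON DimIP.IP = T.RemoteHostName
        LEFT OUTER JOIN
            dbo.DimReferer
            ON DimReferer.Referer = T.Referer
        LEFT OUTER JOIN
            dbo.DimRequest
            ON DimRequest.Request = T.Request
        LEFT OUTER JOIN
            dbo.DimStatusCode
            ON DimStatusCode.StatusCode = T.StatusCode
        LEFT OUTER JOIN
            dbo.DimUserAgent
            ON DimUserAgent.UserAgent = T.UserAgent;
    ```

    A non-zero count in any Missing column points at the dimension that would produce NULL foreign keys with LEFT OUTER JOIN, or dropped rows with INNER JOIN.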
    

    Finally, it seems you're missing something measurable, unless you're just counting rows in the Facts table.
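
    If counting rows really is the measure, a query along these lines would report requests per status code once Facts is populated. This is a sketch under the same assumptions as above (dbo schema, names taken from your question); the RequestCount alias is just an illustrative name.

    ```sql
    -- Each row in Facts represents one logged request,
    -- so COUNT(*) per dimension value acts as the measure.
    SELECT
        SC.StatusCode
    ,   COUNT(*) AS RequestCount
    FROM
        dbo.Facts AS F
        INNER JOIN
            dbo.DimStatusCode AS SC
            ON SC.ID = F.DimStatusCodeID
    GROUP BY
        SC.StatusCode
    ORDER BY
        RequestCount DESC;
    ```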