Search code examples
sitefinitysitefinity-5

Where in the Sitefinity database is content stored?


I've successfully migrated 1,000s of news items and other content from Sitefinity 5 to Wordpress after hours of excruciating analysis and sheer luck with guessing but have a few items that are still left over. Specifically the pages. I know a lot of the content is stored in very obscure ways but there has to be somebody who has done this before and can steer me in the right direction.

My research (and text-search against the DB) has found the page titles etc but when I search the content I get nothing. My gut tells me that the content is being stored in binary form, can anyone confirm if this is the case?

Sitefinity documentation is only helpful if you're a .net developer who has a site set up in Visual Studio (as far as I've seen).


Solution

  • This is probably the most obfuscated manner of storing content that I've ever encountered. After performing text searches against the database I've finally found where the content is stored but it's not a simple process to get it out.

    Pages' master record appears to be sf_page_node, there are related tables:

    • sf_object_data (page_id is related to sf_page_node.content_id)
    • sf_draft_pages (page_id is related to sf_page_node.content_id)
    • sf_page_data (content_id is related to sf_page_node.content_id)
    • sf_control_properties (control_id is related to sf_object_data.id)

    So you could get the info you need with a query like this:

    select * from 
    [sf_page_node]
    join sf_object_data on sf_page_node.content_id = sf_object_data.page_id
    join sf_control_properties on sf_object_data.id = sf_control_properties.control_id
    

    Other things to consider:

    • the parent_id field is related to the sf_page_node table, so if you're writing a script, be sure to query this as well
    • the page may have a banner image, you will pick up the "place_holder" value as 'BannerHolder' with a caption of "Image" The image may be stored as blobs in sf_media_content, you should handle this separately. The "nme" value of 'ImageId' will have a GUID in the "val" column. You can query sf_media_content with this value as "content_id" the actual binary data is stored in sf_chunks, they relate on "file_id"

    My revised query taking into account what I'll need to migrate content is below:

    select 
        original.content_id,
        original.url_name_,
        original.title_,
        parent.id,
        parent.url_name_,
        parent.title_,
        place_holder,
        sf_object_data.caption_,
        sf_control_properties.nme,
        val
    from [sf_page_node] original
    join sf_object_data on original.content_id = sf_object_data.page_id
    join sf_control_properties on sf_object_data.id = sf_control_properties.control_id
    join sf_page_node parent on original.parent_id = parent.id
    

    I hope this helps someone!