
How to get updated / lastmod value for static files for sitemap Gatsby


I've been using Gatsby and have been trying to create a sitemap with lastmod values for static pages (src/pages). I came across a code snippet in which someone ran the query below from their gatsby-config.js and was able to get the date each page was last modified.

allSitePage {
  nodes {
    path
    context {
      updated
    }
  }
}

I haven't been able to achieve the same result.

This is what I've tried so far. I assumed they were using React context, setting a context value within their JS files and updating it manually every time they edited a file.

const Updated = React.createContext('2021-11-29')

class IndexPage extends React.Component {
  render() {
    return (
      <div>
        {/* Example */}
      </div>
    )
  }
}

/* Also tried IndexPage.contextType = Updated */
IndexPage.useContext = Updated

export default IndexPage

I ran the query again, but the value does not show up in the GraphQL query result. This is the query I ran in the GraphQL playground.

query MyQuery {
  allSitePage {
    nodes {
      id
      context {
        updated
      }
    }
  }
}

This is what my whole data structure looks like within the GraphQL playground: [image: data structure in the GraphQL playground, specifically allSitePage]

How can I get or set an updated value that can be used in gatsby-config.js when creating a sitemap?


Solution

    1. Use gatsby-source-filesystem to source the static page files so you can query them as File nodes.

    In gatsby-config.js add the following:

    {
      resolve: "gatsby-source-filesystem",
      options: {
        name: "pages",
        path: "./src/pages/",
      },
    },
    

    Note: Whatever you use for the name property here will become what you filter on in the query.

    2. Add gatsby-transformer-gitinfo to your project. It adds git information from the latest commit as fields on the File nodes. (plugin docs)

    Just declare it underneath step 1 like so:

    {
      resolve: "gatsby-source-filesystem",
      options: {
        name: "pages",
        path: "./src/pages/",
      },
    },
    `gatsby-transformer-gitinfo`,
    

    We use this plugin because the modifiedTime/mtime/changeTime/ctime fields on Gatsby's File nodes are not reliable after a deploy: git does not record file timestamp metadata, so those values are overwritten at deploy time.

    3. Now we have all the necessary data added to GraphQL and can query it to see what it looks like:

    query MyQuery {
      site {
        siteMetadata {
          siteUrl
        }
      }
      allSitePage {
        nodes {
          path
        }
      }
      allFile(filter: {sourceInstanceName: {eq: "pages"}}) {
        edges {
          node {
            fields {
              gitLogLatestDate
            }
            name
          }
        }
      }
    }
    

    Note: sourceInstanceName should equal whatever you set for options.name in step 1.

    The query result should look something like this:

    {
      "data": {
        "site": {
          "siteMetadata": {
            "siteUrl": "https://www.example.com"
          }
        },
        "allSitePage": {
          "nodes": [
            {
              "path": "/dev-404-page/"
            },
            {
              "path": "/404/"
            },
            {
              "path": "/404.html"
            },
            {
              "path": "/contact/"
            },
            {
              "path": "/features/"
            },
            {
              "path": "/"
            },
            {
              "path": "/privacy/"
            },
            {
              "path": "/terms/"
            }
          ]
        },
        "allFile": {
          "edges": [
            {
              "node": {
                "fields": {
                  "gitLogLatestDate": "2021-12-09 23:18:29 -0600"
                },
                "name": "404"
              }
            },
            {
              "node": {
                "fields": {
                  "gitLogLatestDate": "2021-12-09 23:18:29 -0600"
                },
                "name": "contact"
              }
            },
            {
              "node": {
                "fields": {
                  "gitLogLatestDate": "2021-12-09 23:18:29 -0600"
                },
                "name": "privacy"
              }
            },
            {
              "node": {
                "fields": {
                  "gitLogLatestDate": "2021-12-07 19:11:12 -0600"
                },
                "name": "index"
              }
            },
            {
              "node": {
                "fields": {
                  "gitLogLatestDate": "2021-12-09 23:18:29 -0600"
                },
                "name": "terms"
              }
            },
            {
              "node": {
                "fields": {
                  "gitLogLatestDate": "2021-12-09 23:18:29 -0600"
                },
                "name": "features"
              }
            }
          ]
        }
      },
      "extensions": {}
    }
    

    4. Now let's put it all together in gatsby-config.js:

    {
      resolve: "gatsby-source-filesystem",
      options: {
        name: "pages",
        path: "./src/pages/",
      },
    },
    `gatsby-transformer-gitinfo`,
    {
      resolve: "gatsby-plugin-sitemap",
      options: {
        query: `{
          site {
            siteMetadata {
              siteUrl
            }
          }
          allSitePage {
            nodes {
              path
            }
          }
          allFile(filter: {sourceInstanceName: {eq: "pages"}}) {
            edges {
              node {
                fields {
                  gitLogLatestDate
                }
                name
              }
            }
          }
        }`,
        resolvePages: ({
          allSitePage: { nodes: sitePages },
          allFile: { edges: pageFiles }
        }) => {
          return sitePages.map(page => {
            const pageFile = pageFiles.find(({ node }) => {
              const fileName = node.name === 'index' ? '/' : `/${node.name}/`;
              return page.path === fileName;
            });
    
            return { ...page, ...pageFile?.node?.fields }
          })
        },
        serialize: ({ path, gitLogLatestDate }) => {
          return {
            url: path,
            lastmod: gitLogLatestDate
          }
        },
        createLinkInHead: true,
      },
    }
    

    Explanation of the gatsby-plugin-sitemap options:

    • query is self-explanatory

    • resolvePages

      • receives the result of the query, so we can use destructuring to assign the allSitePage nodes and allFile edges to variables.
      • maps over sitePages (allSitePage.nodes), using page.path to find the file with the matching name.
        • if the file name is index, match against / since the path is not /index/; otherwise wrap the file name in slashes.
      • returns each page object merged with the fields of the matching file, which contain the git info.
      • (the resulting array of objects from resolvePages becomes the allPages object in the plugin's code)
    • serialize

      • This function is called for each entry in allPages (the array of page objects returned by resolvePages), so we can again use destructuring to grab each page's path and gitLogLatestDate.
      • Return properties:
        • url: The plugin will use siteMetadata.siteUrl from the query and append whatever value is assigned here to generate the full URL.

        • lastmod: Expects a valid time value and always outputs a full ISO8601 string regardless of the format passed in.

          Note: Currently, there doesn't seem to be a way to format lastmod as date only through gatsby-plugin-sitemap. The plugin uses the sitemap library under the hood, and while some of that library's components have a lastmodDateOnly flag that chops off the time portion, gatsby-plugin-sitemap does not use them.

    • createLinkInHead: links the sitemap index in your site's head
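
To see the matching logic described above in isolation, here is a minimal sketch that runs the same resolvePages and serialize functions against a hand-written sample shaped like the query result. The sample data and the names queryResult, allPages, and entries are made up for illustration; in a real build the plugin calls these functions itself.

```javascript
// Sample data shaped like the GraphQL result (values are illustrative only).
const queryResult = {
  allSitePage: {
    nodes: [{ path: "/" }, { path: "/contact/" }, { path: "/404.html" }],
  },
  allFile: {
    edges: [
      { node: { name: "index", fields: { gitLogLatestDate: "2021-12-07 19:11:12 -0600" } } },
      { node: { name: "contact", fields: { gitLogLatestDate: "2021-12-09 23:18:29 -0600" } } },
    ],
  },
};

// Same logic as the resolvePages option in gatsby-config.js.
const resolvePages = ({
  allSitePage: { nodes: sitePages },
  allFile: { edges: pageFiles },
}) =>
  sitePages.map((page) => {
    const pageFile = pageFiles.find(({ node }) => {
      // "index" maps to the root path; every other file maps to /name/.
      const filePath = node.name === "index" ? "/" : `/${node.name}/`;
      return page.path === filePath;
    });
    // Spreading undefined is a no-op, so unmatched pages keep only their path.
    return { ...page, ...pageFile?.node?.fields };
  });

// Same logic as the serialize option.
const serialize = ({ path, gitLogLatestDate }) => ({
  url: path,
  lastmod: gitLogLatestDate,
});

const allPages = resolvePages(queryResult);
const entries = allPages.map(serialize);
console.log(entries);
```

In this sketch, /404.html has no matching file, so its lastmod is undefined. Because the match is made on the file name alone, a nested page such as src/pages/blog/post.js (path /blog/post/) would also end up without a lastmod.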

    5. Finally, run gatsby build and you should see the result show up in public/sitemap/sitemap-0.xml

    (gatsby-plugin-sitemap seems to have switched to this location recently, so the file could be elsewhere if you're using an older version of the plugin.)

    Result should look like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
            xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
            xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
        <url>
            <loc>https://www.example.com/contact/</loc>
            <lastmod>2021-12-10T05:18:29.000Z</lastmod>
        </url>
        <url>
            <loc>https://www.example.com/features/</loc>
            <lastmod>2021-12-10T05:18:29.000Z</lastmod>
        </url>
        <url>
            <loc>https://www.example.com/</loc>
            <lastmod>2021-12-08T01:11:12.000Z</lastmod>
        </url>
        <url>
            <loc>https://www.example.com/privacy/</loc>
            <lastmod>2021-12-10T05:18:29.000Z</lastmod>
        </url>
        <url>
            <loc>https://www.example.com/terms/</loc>
            <lastmod>2021-12-10T05:18:29.000Z</lastmod>
        </url>
    </urlset>
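
The lastmod values above are the git commit dates from the query result converted to UTC. As a rough illustration of that conversion (not the plugin's actual code), the hypothetical helper gitDateToIso below rewrites a date in the "YYYY-MM-DD HH:MM:SS ±HHMM" format into an ISO 8601 string and lets Date do the time zone arithmetic.

```javascript
// Hypothetical helper, not part of any plugin: converts a git date such as
// "2021-12-09 23:18:29 -0600" into a UTC ISO 8601 string.
function gitDateToIso(gitDate) {
  const match = /^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) ([+-]\d{2})(\d{2})$/.exec(gitDate);
  if (!match) {
    throw new Error(`Unexpected date format: ${gitDate}`);
  }
  const [, date, time, offsetHours, offsetMinutes] = match;
  // Build a standard ISO 8601 string first, e.g. "2021-12-09T23:18:29-06:00",
  // so that parsing does not depend on engine-specific date formats.
  return new Date(`${date}T${time}${offsetHours}:${offsetMinutes}`).toISOString();
}

console.log(gitDateToIso("2021-12-09 23:18:29 -0600")); // 2021-12-10T05:18:29.000Z
console.log(gitDateToIso("2021-12-07 19:11:12 -0600")); // 2021-12-08T01:11:12.000Z
```

This explains why a commit made late in the evening at -0600 shows up with the next day's date in the sitemap.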
    

    Extra, possibly helpful details - versions of everything I'm using:

    "gatsby": "^4.0.0",
    "gatsby-plugin-sitemap": "^5.3.0",
    "gatsby-source-filesystem": "^4.3.0",
    "gatsby-transformer-gitinfo": "^1.1.0",