Last updated on March 26, 2023

Generate a Dynamic Sitemap with Next.js

Building a sitemap for your web application is one of the most important steps you can take to improve the discoverability of your website on search engines like Google, Yahoo, and Bing. If you haven't heard of it yet, a sitemap is a blueprint that lists the URLs of your web application. It helps search engines understand your website's structure so that your pages can be crawled and indexed, and people can find your content on the web.

For the past couple of months, I have been maintaining my sitemap manually, adding each new page that I develop for my website. Even though this is a very simple task, I have made a couple of mistakes in the process, like writing a URL incorrectly or forgetting to add a new URL to the sitemap.

Now, what would happen when your web application starts to scale? This process of adding the new page URL manually to the sitemap will start to become a nightmare and could lead to even more mistakes than it did before. Because of this, I have decided to build a simple dynamic sitemap generator for my web application.

Sitemap Structure Review

Before we build our sitemap generator, we should understand the basic structure of a sitemap.xml file. The file is written in XML and is conventionally served from the route /sitemap.xml at the root of the website, which is where search engines commonly look for it. Let's now look at an example of a basic sitemap.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com</loc>
    <lastmod>2021-11-10</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/first</loc>
    <lastmod>2021-11-25</lastmod>
  </url>
</urlset>

Here, the first line is the XML declaration. After it comes the <urlset></urlset> tag, whose xmlns attribute references the sitemap protocol and which contains all of the URLs of our website that we want to expose to the search engines. Each of these URLs is added through a <url></url> tag, and inside each one we find two elements: the <loc></loc> tag, which contains the page URL, and the <lastmod></lastmod> tag, which contains the date the page was last modified.
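The <lastmod> value follows the W3C Datetime format, which accepts a plain date such as 2021-11-10 as well as a full timestamp. As a minimal sketch, assuming a hypothetical helper called toLastmod (it is not part of any library), a JavaScript Date could be converted to the date-only form like this:

```javascript
// Hypothetical helper: formats a Date as YYYY-MM-DD (in UTC) for a <lastmod> tag.
function toLastmod(date) {
  // toISOString() returns e.g. "2021-11-10T08:30:00.000Z"; keep only the date part.
  return date.toISOString().slice(0, 10);
}

console.log(toLastmod(new Date("2021-11-10T08:30:00Z"))); // 2021-11-10
```

The full value returned by toISOString() is also a valid W3C Datetime, so trimming it is a matter of preference rather than a requirement.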

More elements could be used and added to our sitemap.xml file. For this tutorial, we will only use the basic structure of this file but if you want to learn more about these elements, you can check out https://www.sitemaps.org/protocol.html.
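To give an idea of what those extra elements look like, here is a small sketch that builds a single <url> entry with the optional <changefreq> and <priority> tags from the sitemap protocol. The helper names escapeXml and buildUrlEntry are made up for this illustration. The sketch also entity-escapes the values, which the protocol requires for characters such as &.

```javascript
// Hypothetical helpers for illustration; the names do not come from any library.

// Escapes the characters that must be entity-encoded inside XML values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds one <url> entry; lastmod, changefreq and priority are optional.
function buildUrlEntry({ loc, lastmod, changefreq, priority }) {
  const tags = [`<loc>${escapeXml(loc)}</loc>`];
  if (lastmod) tags.push(`<lastmod>${escapeXml(lastmod)}</lastmod>`);
  if (changefreq) tags.push(`<changefreq>${escapeXml(changefreq)}</changefreq>`);
  if (priority !== undefined) tags.push(`<priority>${escapeXml(priority)}</priority>`);
  return `<url>${tags.join("")}</url>`;
}

console.log(
  buildUrlEntry({
    loc: "http://www.example.com/search?q=a&b=c",
    lastmod: "2021-11-25",
    changefreq: "monthly",
    priority: 0.8,
  })
);
```

We won't use these optional tags in the rest of the tutorial, but the same escaping idea applies to any URL that contains special characters.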

Build Sitemap Generator

Now that we understand the structure of a sitemap, let's create a new folder called scripts inside our Next.js project, and inside of it, a new file called generate-sitemap.mjs. This file will contain the logic of our sitemap generator. Its main objective is to read all of the content that our web application has and then build our sitemap.xml file based on it.

Let's start by adding the following code into our generate-sitemap.mjs file:

scripts/generate-sitemap.mjs
import { writeFileSync } from "fs";
 
(() => {
  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>
  `;
 
  writeFileSync("public/sitemap.xml", sitemap);
})();

Here, we created an immediately invoked function that contains a simple definition of our sitemap file structure, and then we call the writeFileSync function to write the content of our sitemap into the public/sitemap.xml file. Next.js serves the files inside the public folder from the root of the website, which makes the sitemap accessible through the route /sitemap.xml.

Right now, our sitemap does not have any <url></url> definitions inside the <urlset></urlset> tag, which means that it does not point search engines to any of our website URLs. To generate these <url></url> tags dynamically, we need to iterate over each one of our website pages, and we can do so by updating our generate-sitemap.mjs file with the following code:

scripts/generate-sitemap.mjs
import { writeFileSync } from "fs";
 
(() => {
  const pages = [
    '/about',
    '/blog/first-post',
    '/blog/second-post'
  ];
 
  const urlSet = pages.map((page) => {
    return `<url>
      <loc>http://www.example.com${page}</loc>
      <lastmod>${new Date().toISOString()}</lastmod>
    </url>`;
  }).join("");
 
  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urlSet}</urlset>
  `;
 
  writeFileSync("public/sitemap.xml", sitemap);
})();

Here, we added two new elements to our file. The first element is an array that contains all the URLs of our website (we will generate this dynamically in the following step) and the second element is an iteration over the new URLs array that will generate a <url></url> tag for each page of our web application.

To add these generated <url></url> tags into our sitemap, we need to append them into the sitemap string variable and we can do so by adding inside the <urlset></urlset> tag the code ${urlSet}.
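If you would like to check the generated string without touching the file system, one option is to move the string building into its own function. The following is only a sketch under that assumption: buildSitemap, baseUrl and lastmod are names I introduced for the example and are not part of the script above.

```javascript
// Hypothetical helper: builds the sitemap XML string from a list of routes.
// baseUrl and lastmod are passed in so the function has no side effects.
function buildSitemap(pages, baseUrl, lastmod) {
  const urlSet = pages
    .map((page) => `<url><loc>${baseUrl}${page}</loc><lastmod>${lastmod}</lastmod></url>`)
    .join("");

  return `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urlSet}</urlset>
`;
}

const xml = buildSitemap(
  ["/about", "/blog/first-post"],
  "http://www.example.com",
  "2021-11-25"
);
console.log(xml);
```

The result could then be handed to writeFileSync exactly as before, so the only change is that the XML is produced in one place that is easy to inspect.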

Now that the basic structure of our sitemap generator is complete, we need to replace the static URLs array we defined at the beginning of our file with an array that is generated dynamically based on the files our web application has. To do so, we will use a package called globby, whose main purpose here is to read the file system of our project and return all the file paths that match a set of patterns that we pass in as parameters.

Let's now add globby into our project by running the command npm install --save-dev globby. Once the package is installed, we can update our generate-sitemap.mjs file with the following code:

scripts/generate-sitemap.mjs
import { writeFileSync } from "fs";
import { globby } from "globby";
 
(async () => {
  const pages = await globby([
    "pages/*.js",
    "posts/*.md",
    "!pages/_*.js",
    "!pages/api"
  ]);
 
  const urlSet = pages.map((page) => {
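    // Turn the file path into a route, e.g. posts/first-post.md -> /blog/first-post.
    // "/blog" keeps the leading slash so the route joins cleanly with the domain.
    const path = page
      .replace("pages", "")
      .replace("index", "")
      .replace("posts", "/blog")
      .replace(".js", "")
      .replace(".md", "");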
 
    return `<url>
      <loc>http://www.example.com${path}</loc>
      <lastmod>${new Date().toISOString()}</lastmod>
    </url>`;
  }).join("");
 
  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urlSet}</urlset>
  `;
 
  writeFileSync("public/sitemap.xml", sitemap);
})();

We have now made several changes to our file. The first change is to our URLs array variable called pages, whose static list has been replaced by a call to the globby function. For this, we redefined the variable with the following code: const pages = await globby([]);.

Here, we added an await right before the globby function call and the reason for this is that globby is an asynchronous function and we need to wait for it to finish its execution before we can continue with the next step of our code. Since we can't use await in regular functions, we added the async declaration to our main function call.

Now, for globby to generate the pathnames we need from our web application, we defined some parameter rules which are:

  • pages/*.js: Match all the files inside the pages folder with the file extension .js.
  • posts/*.md: Match all the files inside the posts folder with the file extension .md.
  • !pages/_*.js: Skip all the files inside the pages folder with the file extension .js that start with _.
  • !pages/api: Skip all of the files inside the api folder.

These rules are evaluated together for each file inside our web application. A file is added to the final result only if it matches at least one of the regular patterns and none of the patterns that start with !. A good example is the file pages/_document.js. This file matches the pattern pages/*.js, but it is also excluded by !pages/_*.js, so it will not be added to the final URLs array.
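To make the include and exclude behaviour more concrete, here is a hand-written imitation of the four rules using regular expressions. This is not how globby works internally, and the names include, exclude and isListed are made up for the example. It only illustrates the idea that a file needs one match among the regular patterns and no match among the negated ones.

```javascript
// Simplified stand-in for the four rules above (NOT globby's implementation).
const include = [/^pages\/[^/]+\.js$/, /^posts\/[^/]+\.md$/];
const exclude = [/^pages\/_[^/]*\.js$/, /^pages\/api(\/|$)/];

// A file is kept when some include rule matches and no exclude rule matches.
function isListed(file) {
  return (
    include.some((rule) => rule.test(file)) &&
    !exclude.some((rule) => rule.test(file))
  );
}

const files = [
  "pages/index.js",
  "pages/_document.js",
  "pages/api/hello.js",
  "posts/first-post.md",
];

console.log(files.filter(isListed)); // [ 'pages/index.js', 'posts/first-post.md' ]
```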

The last change we made in the code is over the URLs array iteration. Here, we added a new variable called path and its main objective is to clean up the page URLs retrieved from globby by using some replace rules. To understand how this works, let's take a look at the following example:

// Array generated by globby
[
  'pages/index.js',
  'pages/about.js',
  'posts/first-post.md',
  'posts/second-post.md'
]
 
// Array passed through replace rules
[
  '/',
  '/about',
  '/blog/first-post',
  '/blog/second-post'
]
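One thing to keep in mind is that calling replace with a plain string only replaces the first occurrence it finds, wherever it is in the path. A post named posts/pages-and-index.md, for example, would lose the words "pages" and "index" from its URL. As a possible alternative, here is a sketch that uses anchored regular expressions so that only the leading folder and the trailing extension are touched. The helper name toRoute is hypothetical.

```javascript
// Hypothetical helper: converts a file path returned by globby into a route.
function toRoute(file) {
  return file
    .replace(/^pages\//, "/")      // pages/about.js      -> /about.js
    .replace(/^posts\//, "/blog/") // posts/first-post.md -> /blog/first-post.md
    .replace(/\.(js|md)$/, "")     // drop the file extension
    .replace(/\/index$/, "/");     // /index              -> /
}

console.log(toRoute("pages/index.js"));           // /
console.log(toRoute("pages/about.js"));           // /about
console.log(toRoute("posts/first-post.md"));      // /blog/first-post
console.log(toRoute("posts/pages-and-index.md")); // /blog/pages-and-index
```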

Deploy & Test

To wrap up this tutorial, we will add a postbuild script to our package.json file with the command "node ./scripts/generate-sitemap.mjs". npm runs a postbuild script automatically once the build script finishes, so if you are deploying your project with Vercel, the sitemap will be generated right after the build process. To test the sitemap generator, deploy your web application and, once the deployment is complete, open example.com/sitemap.xml to check that the sitemap was generated correctly (replace example.com with your website URL).

With this, our package.json file should look like this:

package.json
{
  "name": "nextjs-project",
  "private": true,
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "start": "next start",
    "postbuild": "node ./scripts/generate-sitemap.mjs"
  },
  "dependencies": {
    "next": "13.2.4",
    "react": "18.2.0",
    "react-dom": "18.2.0"
  },
  "devDependencies": {
    "globby": "^13.1.3"
  }
}
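If you prefer a quick check over reading the XML by eye, a small sanity test could list every <loc> value found in the generated file. The following is only a sketch: extractLocs is a hypothetical helper and the sample string stands in for the real file contents.

```javascript
// Hypothetical helper: pulls every <loc> value out of a sitemap string.
function extractLocs(xml) {
  return [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map((match) => match[1]);
}

// Sample input standing in for the contents of public/sitemap.xml.
const sample = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>`;

console.log(extractLocs(sample));
```

In a real project, the sample string could be replaced by reading public/sitemap.xml with readFileSync from the fs module after running the build.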