How To Do A Content Audit Part 1: The Quantitative Inventory

Overview

With all the buzz about content strategy, it can be dizzying trying to decide where to start. Inspired by Kristina Halvorson and Melissa Rach’s book, Content Strategy For The Web (second edition), this post describes one method of accomplishing the first step in the planning process: a Quantitative Inventory. As described so eloquently by the authors, this is an initial assessment of a site’s current content. Completely unbiased and objective, this phase of content planning is driven purely by numbers (of course, this version is slightly SEO-centric). Subsequent posts in this series will address the qualitative assessments: part 2 will describe how to do a best practices assessment for SEO, and part 3 will address the strategic assessment.

So, what’s the point of a quantitative assessment?

“If you don’t understand where you are now, you can’t create a plan to get you where you need to be.” – Kristina Halvorson

Step 1: Collect & Organize URLs

Depending on the size of your site, it might not be feasible to complete this inventory all at once. In this case, the authors suggest auditing only a sample of your content, but I prefer to do individual inventories by site section or directory. It might just be my OCD, but I like to identify every single piece of content out there, and segmenting the site makes this possible. If you are lucky enough to already have a list of all the URLs on your site, you can skip this step. If, like me, you have so many pages there is no way in hell you have all the content accounted for, continue reading.

  1. Export a list of URLs from your favorite analytics or crawling tool: Google Analytics, Site Catalyst, Screaming Frog (in crawl mode), even Google Webmaster Tools will do. Even better is a combination of all of the above, including a list directly from your server. Check with your IT guy or gal to see what the best way of completing this list is. In your analytics tool, make sure to view at least a three-month period to get a good view of all content.
  2. Once your list of URLs is complete and de-duped, create a new Excel document with them all in Column B. Sort them alphabetically, which should also order them by page and subdirectory. Label column A “ID Number.”
  3. The ID numbers will be used for quick reference by you and your team, but they require a little bit of thinking. The highest level, or homepage, should be labeled 1.0.0.0; add as many decimal places as your deepest level of directories requires. Hopefully you don’t have content hidden under more than 3 levels, and if you do, that is something to take note of. Keep the numbering system consistent and logical. Each folder has its own number, and the numbering starts back at 1 whenever there is a new folder.
    • For example:
      • /animal/mammal/dog/lab = 1.1.1.1
      • /animal/mammal/dog/pit-bull = 1.1.1.2
      • /animal/reptile/frog/poison-dart = 1.2.1.1
      • /animal/reptile/frog/tree-frog = 1.2.1.2
      • /animal/reptile/snake/anaconda = 1.2.2.1
      • /animal/reptile/snake/cobra = 1.2.2.2
      • /animal/reptile/snake/python = 1.2.2.3

It can get hairy when transforming many page and folder names into numbers, but just take your time and think about them logically. If you’re an Excel ninja, you might even be able to whip up a formula to do it for you.
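If a spreadsheet formula isn't your thing, the numbering can also be scripted. Here is a minimal Python sketch of one way to read the scheme above. The function name `assign_ids` is mine, and a few things are assumptions rather than rules from the book: siblings are numbered alphabetically, shallower pages are padded with zeros, and the homepage is skipped so you can label it by hand.

```python
from urllib.parse import urlparse


def assign_ids(urls):
    """Map each URL path to a dotted ID; numbering restarts at 1 in every folder."""
    # De-dupe the paths and drop the bare homepage (assumption: labeled by hand).
    unique_paths = {urlparse(u).path.strip("/") for u in urls} - {""}
    # Sort segment by segment so siblings are numbered alphabetically.
    split_paths = sorted(p.split("/") for p in unique_paths)
    depth = max((len(parts) for parts in split_paths), default=0)

    counters = {}  # parent folder (tuple of segments) -> {child name: number}
    ids = {}
    for parts in split_paths:
        numbers = []
        for i, name in enumerate(parts):
            siblings = counters.setdefault(tuple(parts[:i]), {})
            # First time this name appears under this parent: take the next number.
            siblings.setdefault(name, len(siblings) + 1)
            numbers.append(siblings[name])
        numbers += [0] * (depth - len(parts))  # pad shallower pages with zeros
        ids["/" + "/".join(parts)] = ".".join(str(n) for n in numbers)
    return ids
```

Fed the seven example paths above, this should reproduce the same IDs (for instance, /animal/reptile/snake/python comes out as 1.2.2.3). Paste the result into column A and spot-check a few rows before trusting it.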

Step 2: Collect Page Element Data

In this step, you’ll grab the basic page elements of each URL using Screaming Frog in List Mode. If you have never used List Mode before, it is a great way to get all the data Screaming Frog provides, but only for the pages you specify. This includes page titles, headings, their lengths, content types (HTML, PDF, etc.), canonical data, redirects, and meta data such as descriptions and keyword tags.

  1. Open up a text editor that permits saving the plain text file type (.txt), such as Notepad.
  2. Copy your URLs from column B, and paste them into the text editor. Save this file as .txt, and name it whatever you like, such as “URL List”.
  3. Open (or download) Screaming Frog.
  4. Under the Mode drop-down menu, select “List” and click “Select File” to upload your list of URLs.
  5. Hit Start and, when the crawl is complete, export the list. It should be in the same order as the list you uploaded, so you can now add your ID number column to this spreadsheet.
  6. Once you have the export, you can choose to delete a few columns which won’t provide very much value, such as:
    • Meta keyword length
    • H2 Length
    • Meta Refresh (if you know you don’t use them)

Screaming Frog SEO tool in List Mode
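As a safeguard in case the export order ever differs from your upload, you can match IDs to rows by URL instead of by position. The sketch below is only an illustration: `add_ids` is a hypothetical helper, I'm assuming the export's URL column is headed "Address", and the dropped column names are copied from the list above, so check them against the headers in your own export.

```python
import csv
import io


def add_ids(export_csv, ids_by_url,
            drop=("Meta Keyword Length", "H2 Length", "Meta Refresh")):
    """Prefix each exported row with its ID and drop low-value columns."""
    reader = csv.DictReader(io.StringIO(export_csv))
    rows = []
    for row in reader:
        kept = {k: v for k, v in row.items() if k not in drop}
        # Look the ID up by URL rather than trusting the row order.
        # "Address" is an assumed header name; URLs with no ID get a blank.
        rows.append({"ID Number": ids_by_url.get(row["Address"], ""), **kept})
    return rows
```

Any URL that comes back with a blank ID is worth a second look, since it usually means the crawler normalized or redirected the address.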

Step 3: Collect Traffic and Conversion Data

Now that you have the page element information, it’s time to separate the wheat from the chaff, so to speak. Use your analytics tool of choice to get some data on how the content currently performs, specifically in search engines. You can use tools specific to a search engine, like Google or Bing Webmaster Tools. Or, if you are using Site Catalyst or Google Analytics, create a custom segment that includes all major search engines. Add a few extra columns at the end to accommodate the following:

  • Impressions
  • Clicks
  • Click-Through Rate
  • Average Rank (if available)

Bonus Metrics:

  • Include another column for the top 3 referring (non branded) keywords for each page for an idea of how the content is found.
  • Any other metrics you think are important when examining your content, such as Bounce Rate, Exit Rate, or Time on Page.

Ultra Bonus Metric:

  • Unique Page Views – With this, you’ll identify pages which visitors are actively consuming, regardless of source. (This will be an important metric for Part 2: SEO Best Practices Assessment).
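If your tool gives you impressions and clicks but not click-through rate, the column is easy to fill in yourself: CTR is simply clicks divided by impressions. A small sketch (the function name is my own) that also avoids dividing by zero for pages that never appeared in search results:

```python
def click_through_rate(clicks, impressions):
    """CTR as a percentage, or None when a page had no impressions."""
    if impressions == 0:
        return None  # no impressions means CTR is undefined, not zero
    return round(100 * clicks / impressions, 2)
```

For example, 25 clicks on 1,000 impressions works out to a CTR of 2.5%.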

Step 4: Collect Moz Metrics

SEOmoz’s metrics are extremely beneficial, although the value of this data comes not from looking at any single page but from comparing pages against one another. Create five more columns for Page Authority, MozRank, MozTrust, Links, and Unique Linking Domains. You won’t need domain-level metrics since they will all be the same (unless you’re including multiple domains or microsites). Screaming Frog already provided some link metrics, but the more data sources the better. With the MozBar installed, visit each URL and collect the five data points. This step is time-consuming, but invaluable once you have it all in place.

If you want to save some time during this process, there is a great way to do it automatically within a Google Doc. Check out this post from SEER Interactive on how to automatically pull in Moz metrics for a list of URLs.

SEOmoz SEO Tools

Remember:

  • Page Authority predicts the likelihood of a single page to rank well, regardless of its content. The higher the Page Authority, the greater the potential for that individual page to rank well in search results.
  • MozRank represents a link popularity score. It reflects the importance of any given web page on the Internet. Pages earn MozRank by the number and quality of other pages that link to them. The higher the quality of the incoming links, the higher the MozRank.
  • MozTrust is SEOmoz’s global link trust score. It is similar to MozRank but rather than measuring link popularity, it measures link trust. Receiving links from sources which have inherent trust, such as the homepages of major university websites or certain governmental web pages, is a strong trust endorsement.
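Since these scores mean the most in comparison with your other pages, one option is to add a relative-standing column next to each metric. The sketch below is my own illustration, not anything Moz provides: for each page it computes the percentage of pages in the inventory that score strictly lower. The function name and the sample URLs in the usage note are hypothetical.

```python
from bisect import bisect_left


def percentile_ranks(scores):
    """For each URL, the percentage of pages in the inventory scoring strictly lower."""
    values = sorted(scores.values())
    total = len(values)
    # bisect_left returns how many sorted values are strictly below the score.
    return {url: round(100 * bisect_left(values, score) / total)
            for url, score in scores.items()}
```

Called with Page Authority values such as `{"/a": 20, "/b": 40, "/c": 40, "/d": 60}`, the page at /d outscores 75% of the inventory while /a outscores none of it.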

Step 5: Collect Performance Data

In this section, you’ll collect data on how each piece of content actually performs when your users try to access it. I like the page performance tool at Pingdom because they also provide a performance score on a scale of 0-100, which you can also utilize in your content audit. The number of server requests is also reported, which you can view in more detail below the scorecard.

The request numbers are great for identifying items you can trim to cut your load times. Load time is a crucial metric, because even if the content is great, no one will wait around forever to see it. I have noticed this metric can vary with the time of day, speed of connection, etc. To control for these variables, try to collect this data during “down times”, either early in the morning or late at night. If you can’t do that, just make sure you are gathering it within about the same 1 hour period of the day. You can always come back in 24 hours to pick up where you left off.

Page size here is reported in KB, which is more useful than the raw size reported by Screaming Frog (in bytes). Use the tool for your individual URLs, and make 4 additional columns in your inventory to collect the following:

  • Performance Grade
  • Requests
  • Load Time
  • Page Size

Pingdom page speed tool
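Because load time bounces around so much, one extra option (my suggestion, not something the steps above require) is to test each page a few times and record the median instead of a single reading. A minimal sketch, with a hypothetical function name:

```python
from statistics import median


def typical_load_time(samples_seconds):
    """Median of repeated load-time readings, so one slow test doesn't skew the row."""
    return round(median(samples_seconds), 2)
```

With three readings of 1.8, 2.1, and 6.4 seconds, the median is 2.1 seconds, where a plain average would be dragged up by the one slow run.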

That’s it! Now you have all your important metrics within easy reach. Use it to evaluate which pages receive significant traffic, which don’t, and which should. What pages are extremely slow? Do you have duplicate content? Technical problems? Looking at this completed inventory for only a few minutes will reveal some important observations.
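If the inventory is too big to eyeball, a short script can do a first pass over those questions. This is only a sketch under my own assumptions: the column names ("URL", "Title", "Load Time", "Clicks") are placeholders for whatever your spreadsheet uses, and the thresholds are arbitrary starting points to adjust for your site.

```python
from collections import Counter


def flag_pages(rows, slow_seconds=3.0, min_clicks=10):
    """Return {url: [issues]} for pages worth a closer look."""
    # Count titles so pages sharing a title can be flagged as possible duplicates.
    title_counts = Counter(row["Title"] for row in rows)
    flags = {}
    for row in rows:
        issues = []
        if title_counts[row["Title"]] > 1:
            issues.append("duplicate title")
        if row["Load Time"] > slow_seconds:  # hypothetical threshold
            issues.append("slow")
        if row["Clicks"] < min_clicks:  # hypothetical threshold
            issues.append("low search traffic")
        if issues:
            flags[row["URL"]] = issues
    return flags
```

Treat the output as a list of leads, not a verdict; a page with a shared title or few clicks may be perfectly fine once you look at it.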

For an example of a completed inventory, see this sample spreadsheet. Keep your inventory around for the next part of the series, where I’ll describe how to complete an SEO Best Practices Assessment.

If you enjoyed this post, see part 2 about how to do a qualitative content audit.

 

Comments

  1. Thanks for the useful post. On the purely quantitative side, I would just point out that for large sites it’s sometimes necessary to chunk / slice the inventory by things like topic, site, and layer.

  2. Harris,

    Your profile is very interesting to me :) I’m looking forward to seeing how the emerging discipline of content strategy fits with SEO and Analytics over the next few years. Looks like you have a healthy mix going on already! For me personally, the challenge I see is getting the resources for “doing the content” right within smaller organizations and companies. I have been playing the “wearer of all hats” for the last year or so and I can see that’s a role that can’t last. I’m excellent at communications, but I’m no data scientist. ;) I’m pushing hard with all the clients and agencies to pay attention to content strategy, but so far it’s been an uphill battle.

    Thanks for the great post!

    • Harris Schachter says:

      Hi Danielle, thanks for the kind words. It certainly can be challenging to be a one-man (or woman) army in digital marketing. However, I do think it is important to be well rounded, not to the point of exhaustion, but to know enough about each discipline that your strategies are all-encompassing. More importantly, you’ll approach each project with a firm grasp of what each player will want and need :)

      The lines are indeed blurring between disciplines, especially for content strategy and SEO. Not to mention, everything should be as data-driven as possible; that will really get clients on board if you can put a dollar sign next to it! Don’t be afraid to reach out if you have any burning questions. The SEO community takes care of its own.

  3. What a great idea, adding Pingdom loadtime/page speed to a content audit report!
