The Search Engine Whisperer: How to Use Metadata

Remember my last blog post about metadata? Did it make you wonder how exactly you could start putting it to work for your website? Thankfully, Google, Bing and Yahoo got together to create a shared vocabulary for web semantics called Schema.org. A very detailed guide to it can be found at schema.org.

Once you’ve added this very important information to your website in a machine-readable way, search engines can begin to understand which string of characters means what. They may even use that information more intelligently when deciding which pages come first, or at least when deciding how to display individual entries on the search engine results page.

This article will show you how to modify your HTML tags to add a bit of useful description to the various pieces of information on your website. That means I have to assume that you, the reader, have (or are willing to acquire) at least a basic knowledge of HTML. Trust me, HTML isn’t that crazy or hard to pick up, and it’s nowhere near as complicated as all the ‘real’ programming languages out there. It’s about as complicated as formatting a Word document. If you don’t have this sort of knowledge just yet, head over to this wonderful beginner’s guide at SitePoint. Don’t worry, I’ll wait…

For those of you who are well versed in the arcane ways of the web, or have taken the trouble to gain even a cursory understanding before continuing with this article, let’s start here: the humble <div> tag.

You may know what it is, but the machine doesn’t

It doesn’t really matter which tag you add this markup to, as long as that tag contains a piece of information with particular significance. For example, if the purpose of your website is to keep a list of books you really enjoyed reading (or didn’t), you’ll want to make sure the search engine understands that what it sees as a meaningless, random series of characters is actually a list of books.

One rule of thumb to keep in mind throughout this process: if your visitors can see, read and understand something, make sure the search engine robot can understand it, too.

For example:

<div>
  <h1>Outliers: The Story of Success</h1>
  <ul>
    <li>Author: Malcolm Gladwell</li>
    <li>Genre: Non-fiction</li>
    <li>Subject(s): Social Sciences</li>
  </ul>
</div>

This is information that any English-speaking human can understand at a glance, but that means nothing to a machine looking for the meaning behind these strings of characters. To a human Anglophone, this is a listing for a book. To a machine, it’s a series of HTML elements with strings of flat text interspersed.
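To make ‘flat text’ concrete, here is a minimal sketch using Python’s standard-library html.parser module. This is my own illustration; real search engine crawlers are far more sophisticated. It walks the unmarked-up listing and records every tag and text string it meets. Nothing in the result says ‘book’ or ‘author’ apart from the visible text itself.

```python
from html.parser import HTMLParser

# The unmarked-up book listing from the example above.
SNIPPET = """
<div>
  <h1>Outliers: The Story of Success</h1>
  <ul>
    <li>Author: Malcolm Gladwell</li>
    <li>Genre: Non-fiction</li>
    <li>Subject(s): Social Sciences</li>
  </ul>
</div>
"""


class FlatReader(HTMLParser):
    """Records what a naive parser 'sees': tag names and bare text."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        # Skip the whitespace-only strings between tags.
        if data.strip():
            self.texts.append(data.strip())


reader = FlatReader()
reader.feed(SNIPPET)
reader.close()
print(reader.tags)
# ['div', 'h1', 'ul', 'li', 'li', 'li']
print(reader.texts)
# ['Outliers: The Story of Success', 'Author: Malcolm Gladwell',
#  'Genre: Non-fiction', 'Subject(s): Social Sciences']
```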

Unless we tell it otherwise.

Itemscope

First, we use the ‘itemscope’ attribute to tell the browser or search engine robot reading this code that the element is an ‘item’ that means something, and that the program had better take special notice of it. It also tells the browser or search engine that everything inside the tag carrying the ‘itemscope’ attribute describes that item. Take this book list entry, for example:

<div itemscope>
  <h1>Outliers: The Story of Success</h1>
  <ul>
    <li>Author: Malcolm Gladwell</li>
    <li>Genre: Non-fiction</li>
    <li>Subject(s): Social Sciences</li>
  </ul>
</div>

Itemtype

Now that the itemscope flag is in play, we need to give the <div> an ‘itemtype’ that says what kind of item it is. The itemtype is a URL pointing to a type defined in the schema.org vocabulary, and that type determines which properties the browser or search engine robot should expect to find on the item. A full list of types, and the properties each one supports, can be found on the schema.org website.

<div itemscope itemtype="https://schema.org/Book">
  <h1>Outliers: The Story of Success</h1>
  <ul>
    <li>Author: Malcolm Gladwell</li>
    <li>Genre: Non-fiction</li>
    <li>Subject(s): Social Sciences</li>
  </ul>
</div>

Itemprop

So now that we’ve said what sort of item this is, we should start telling our friendly search engine robot a few things about it. The ‘itemprop’ attribute, whose value names the property that the tag’s contents describe (‘name’, ‘author’, ‘genre’ and so on), is how we tell the friendly Googlebot about our item. It’s show-and-tell, and our new robot overlords are invited.

<div itemscope itemtype="https://schema.org/Book">
  <h1 itemprop="name">Outliers: The Story of Success</h1>
  <ul>
    <li itemprop="author">Author: Malcolm Gladwell</li>
    <li itemprop="genre">Genre: Non-fiction</li>
    <li itemprop="about">Subject(s): Social Sciences</li>
  </ul>
</div>
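Here is roughly what those three attributes give a machine to work with. The sketch below is a deliberately simplified microdata reader written in Python with the standard-library html.parser. It is my own illustration, not how Googlebot actually works, and it handles only a single item with no nesting. It records the itemtype of the element carrying itemscope, then pairs each itemprop name with the text inside its element. The ‘@type’ key is just my own label for the result.

```python
from html.parser import HTMLParser

SNIPPET = """
<div itemscope itemtype="https://schema.org/Book">
  <h1 itemprop="name">Outliers: The Story of Success</h1>
  <ul>
    <li itemprop="author">Author: Malcolm Gladwell</li>
    <li itemprop="genre">Genre: Non-fiction</li>
    <li itemprop="about">Subject(s): Social Sciences</li>
  </ul>
</div>
"""


class FlatItemReader(HTMLParser):
    """Simplified reader for one un-nested microdata item (illustration only)."""

    def __init__(self):
        super().__init__()
        self.item = {}
        self.current_prop = None  # itemprop whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # boolean attributes like itemscope map to None
        if "itemscope" in attrs:
            # The element with itemscope starts the item; itemtype says what it is.
            self.item["@type"] = attrs.get("itemtype")
        elif "itemprop" in attrs:
            self.current_prop = attrs["itemprop"]
            self.item[self.current_prop] = ""

    def handle_endtag(self, tag):
        # Simplification: assumes property elements contain no child tags.
        self.current_prop = None

    def handle_data(self, data):
        if self.current_prop is not None:
            self.item[self.current_prop] += data.strip()


reader = FlatItemReader()
reader.feed(SNIPPET)
reader.close()
print(reader.item)
# {'@type': 'https://schema.org/Book', 'name': 'Outliers: The Story of Success',
#  'author': 'Author: Malcolm Gladwell', 'genre': 'Genre: Non-fiction',
#  'about': 'Subject(s): Social Sciences'}
```

Notice that the author comes back as ‘Author: Malcolm Gladwell’, label and all. Nesting another item inside the property is one way to separate the label from the name.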

A nest of properties

We can even nest these attributes to say more about an item that is itself the value of one of the properties. For instance, an ‘author’ is also a ‘person’, which has a number of properties of its own. You will, however, want to wrap each piece of information in its own tag. If you don’t want to affect the existing formatting of the page by adding a tag, you can use the <span> tag, which exists purely as a generic container and has no formatting of its own.

<div itemscope itemtype="https://schema.org/Book">
  <h1 itemprop="name">Outliers: The Story of Success</h1>
  <ul>
    <li itemprop="author" itemscope itemtype="https://schema.org/Person">
      <span itemprop="jobTitle">Author</span>:
      <span itemprop="name">Malcolm Gladwell</span>
    </li>
    <li itemprop="genre">Genre: Non-fiction</li>
    <li itemprop="about">Subject(s): Social Sciences</li>
  </ul>
</div>
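To see how nesting changes what a machine gets back, here is a simplified Python sketch, again using only the standard-library html.parser. It is my own illustration and not any search engine’s actual parser. When an element has both itemprop and itemscope, the reader starts a new item and stores it as the value of the parent item’s property. Text sitting directly inside an item but outside any itemprop, such as the colon after ‘Author’, is ignored. The ‘@type’ key and the class name are hypothetical labels of mine.

```python
from html.parser import HTMLParser

SNIPPET = """
<div itemscope itemtype="https://schema.org/Book">
  <h1 itemprop="name">Outliers: The Story of Success</h1>
  <ul>
    <li itemprop="author" itemscope itemtype="https://schema.org/Person">
      <span itemprop="jobTitle">Author</span>:
      <span itemprop="name">Malcolm Gladwell</span>
    </li>
    <li itemprop="genre">Genre: Non-fiction</li>
    <li itemprop="about">Subject(s): Social Sciences</li>
  </ul>
</div>
"""


class NestedItemReader(HTMLParser):
    """Simplified microdata reader that supports items nested in properties.

    Illustration only: ignores void elements, itemref, repeated properties
    and attribute-based values such as <time datetime>.
    """

    def __init__(self):
        super().__init__()
        self.items = []       # top-level items found on the page
        self.item_stack = []  # items currently open, innermost last
        self.open_tags = []   # per open element: (opened_item, text_prop)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if "itemscope" in attrs:
            item = {"@type": attrs.get("itemtype")}
            if prop and self.item_stack:
                # Nested item: it becomes the value of the parent's property.
                self.item_stack[-1][prop] = item
            else:
                self.items.append(item)
            self.item_stack.append(item)
            self.open_tags.append((True, None))
        elif prop and self.item_stack:
            self.item_stack[-1][prop] = ""
            self.open_tags.append((False, prop))
        else:
            self.open_tags.append((False, None))

    def handle_endtag(self, tag):
        if not self.open_tags:
            return
        opened_item, _ = self.open_tags.pop()
        if opened_item:
            self.item_stack.pop()

    def handle_data(self, data):
        # Text belongs to the innermost open element that is a text property.
        for opened_item, prop in reversed(self.open_tags):
            if opened_item:
                return  # directly inside an item, not inside a property
            if prop is not None:
                self.item_stack[-1][prop] += data.strip()
                return


reader = NestedItemReader()
reader.feed(SNIPPET)
reader.close()
print(reader.items[0]["author"])
# {'@type': 'https://schema.org/Person', 'jobTitle': 'Author', 'name': 'Malcolm Gladwell'}
```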

It’s about time

There are also other kinds of information that search engine robots aren’t well equipped to pick out from the surrounding text. For instance, an average search engine robot may not recognize a date or time of day buried in ordinary text. This isn’t because the robot is trying to give you the cold shoulder; it’s because it can’t read that information unless it’s specifically pointed out.

Consider this: what if we want to add the date the book was published to the book list item we’ve been marking up? In this case we need a special tag to tell the search engine robot that a particular piece of information should be read as a date or time: the <time> tag. Its datetime attribute holds the machine-readable value, while the text between the tags is what your human visitors see.

<div itemscope itemtype="https://schema.org/Book">
  <h1 itemprop="name">Outliers: The Story of Success</h1>
  <ul>
    <li itemprop="author" itemscope itemtype="https://schema.org/Person">
      <span itemprop="jobTitle">Author</span>:
      <span itemprop="name">Malcolm Gladwell</span>
    </li>
    <li itemprop="genre">Genre: Non-fiction</li>
    <li itemprop="about">Subject(s): Social Sciences</li>
    <li>Publishing date: <time itemprop="datePublished" datetime="2008-11-18">November 18, 2008</time></li>
  </ul>
</div>
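Here is why the datetime attribute matters. The Python sketch below uses the standard-library html.parser and datetime modules and is a simplification, not a real crawler. It looks for a <time> element carrying an itemprop, here assumed to be itemprop="datePublished", which is schema.org’s property for a publication date. It reads the value from the datetime attribute rather than from the visible text, and that value turns straight into a real date object that a machine can sort and compare.

```python
from datetime import date
from html.parser import HTMLParser

SNIPPET = """
<li>Publishing date:
  <time itemprop="datePublished" datetime="2008-11-18">November 18, 2008</time>
</li>
"""


class TimeReader(HTMLParser):
    """Collects itemprop values from <time> elements (illustration only)."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "time" and "itemprop" in attrs:
            # For <time>, microdata takes the value from the datetime
            # attribute (when present), not from the human-readable text.
            self.values[attrs["itemprop"]] = attrs.get("datetime")


reader = TimeReader()
reader.feed(SNIPPET)
reader.close()

published = date.fromisoformat(reader.values["datePublished"])
print(published)                     # 2008-11-18
print(published < date(2010, 1, 1))  # True
```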

If we happened to know that the book was published at exactly 9:00am on November 18, 2008, we could write the following:

<li>Publishing date: <time itemprop="datePublished" datetime="2008-11-18T09:00">November 18, 2008</time></li>

Keep in mind that you’ll need to format the date and time according to the ISO 8601 standard: a four-digit year, a dash, a two-digit month, a dash, a two-digit day, a capital ‘T’, and then the time as two-digit hours and two-digit minutes in 24-hour format, separated by a colon. Since that was a bit of a mouthful, I’ll show it to you like this:

YYYY-MM-DDTHH:MM

Or, for 4:28pm on July 12, 2012:

2012-07-12T16:28
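If your pages are generated by a script, you don’t have to assemble these strings by hand. The small sketch below is in Python, my choice here, though any language with a date library can do the same. It produces the example above two ways: with an explicit strftime pattern that mirrors YYYY-MM-DDTHH:MM, and with the built-in isoformat method, where timespec="minutes" drops the seconds.

```python
from datetime import datetime

# 4:28pm on July 12, 2012, as a Python datetime.
moment = datetime(2012, 7, 12, 16, 28)

# Spell out the YYYY-MM-DDTHH:MM pattern explicitly...
by_hand = moment.strftime("%Y-%m-%dT%H:%M")

# ...or let Python build it; timespec="minutes" omits the seconds.
built_in = moment.isoformat(timespec="minutes")

print(by_hand)   # 2012-07-12T16:28
print(built_in)  # 2012-07-12T16:28

# Dropped into a <time> tag:
print(f'<time datetime="{built_in}">July 12, 2012, 4:28pm</time>')
```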

It’s a wrap!

That should serve as a sufficient introduction to the wonderful world of schema.org metadata. Feel free to ask questions in the comments; I’ll do my very best to research and answer them.