Simon Miller Team : Web Development Tags : Technology Web Development Programming

Google CSE and Structured Data

Google Custom Search Engine (CSE) is a fantastic technology that lets any website, for the low price of $100 annually, use the power of Google search seamlessly within its own pages. For the annual charge you get API access to Google search results, delivered as XML data. This XML can easily be parsed and styled with XSL and presented in your website however you wish. For such a low cost of entry, it has saved developers hours and hours of work that would otherwise go into building bespoke search engines to do a similar job.

Google CSE is great for traditional searching and display of textual results with a hyperlink, but what if you want something a bit more fancy? More importantly, what if you need to filter search results by arbitrary values such as Category or Price? How about adding thumbnails? With the regular CSE implementation you are limited in this regard.

Recently I discovered that Google CSE supports a form of structured data, PageMaps, that is used only by CSE and not by Google's main search. Previously you were able to use microformats to mark up page content to some degree, but the control was limited. With PageMaps you have complete control over the data shown in your custom search results without affecting traditional Google search results.

The following example comes directly from Google’s article on PageMaps:

<html>
  <head>
    <!--
    <PageMap>
      <DataObject type="document">
        <Attribute name="title">ASP.NET for Dummies</Attribute>
        <Attribute name="author">John Smith</Attribute>
        <Attribute name="description">The one stop shop guide to .NET development.</Attribute>
        <Attribute name="page_count">314</Attribute>
        <Attribute name="rating">4</Attribute>
        <Attribute name="date">04/05/2011</Attribute>
      </DataObject>
      <DataObject type="thumbnail">
        <Attribute name="src" value="http://www.example.com/netbook.jpg" />
        <Attribute name="width" value="250" />
        <Attribute name="height" value="320" />
      </DataObject>
    </PageMap>
    -->
  </head>
</html>

By including an HTML comment block in the <head> of your page, you give Google CSE attributes that it will process and categorise, and that can then be used for display and filtering. You can see from the example that one data object holds information about a document and a second data object describes a thumbnail. Google CSE simply parses and stores this data for every page it indexes and makes it available in your XML results for presentation.
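Before relying on CSE to pick up a PageMap, it can help to check locally that the comment block is well-formed. The following is a minimal Python sketch for that purpose only; it is not how Google processes pages, and the names `PageMapExtractor`, `parse_pagemap` and `SAMPLE_PAGE` are made up for illustration. It assumes at most one data object per type on a page.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class PageMapExtractor(HTMLParser):
    """Collects the first HTML comment that contains a <PageMap> element."""

    def __init__(self):
        super().__init__()
        self.pagemap_xml = None

    def handle_comment(self, data):
        if self.pagemap_xml is None and "<PageMap>" in data:
            self.pagemap_xml = data.strip()


def parse_pagemap(html):
    """Return {data_object_type: {attribute_name: value}} for one page."""
    extractor = PageMapExtractor()
    extractor.feed(html)
    if extractor.pagemap_xml is None:
        return {}
    root = ET.fromstring(extractor.pagemap_xml)
    result = {}
    for data_object in root.findall("DataObject"):
        attributes = {}
        for attribute in data_object.findall("Attribute"):
            # A value may be given as value="..." or as the element text.
            value = attribute.get("value")
            if value is None:
                value = (attribute.text or "").strip()
            attributes[attribute.get("name")] = value
        # Assumption: one data object per type; a repeated type overwrites.
        result[data_object.get("type")] = attributes
    return result


SAMPLE_PAGE = """<html>
<head>
<!--
<PageMap>
  <DataObject type="document">
    <Attribute name="author">John Smith</Attribute>
    <Attribute name="date">04/05/2011</Attribute>
  </DataObject>
  <DataObject type="thumbnail">
    <Attribute name="src" value="http://www.example.com/netbook.jpg" />
  </DataObject>
</PageMap>
-->
</head>
</html>"""

pagemap = parse_pagemap(SAMPLE_PAGE)
print(pagemap["document"]["author"])   # John Smith
print(pagemap["thumbnail"]["src"])     # http://www.example.com/netbook.jpg
```

If the comment is not valid XML, `ET.fromstring` raises a `ParseError`, which is a quick signal that the markup needs fixing before it goes live.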

The real power comes when you manipulate these attributes. The example above defines attributes such as ‘title’, ‘author’, ‘rating’ and ‘date’. If we wanted the search result set to be sorted by ‘date’, all we need to do is simply append a sort parameter to the search URL:

https://www.google.com/cse?cx=12345:ABCDE&q=search+string&sort=date-date

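In this instance the first ‘date’ refers to the data type of the search result, that is, the type of the data object that holds the attribute. The second ‘date’ refers to the field name in the returned XML. For the PageMap shown earlier, where the date sits in the ‘document’ data object, the value would be document-date. Another example would be: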

https://www.google.com/cse?cx=12345:ABCDE&q=search+string&sort=date-date:a

This will sort by date in ascending order, thanks to the ‘:a’ suffix. Without it, results are sorted in descending order.
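If you build these URLs in code, the sort value can be put together from its parts. Here is a small Python sketch; the function name `build_sorted_search_url` and its parameters are hypothetical, and the `cx` value is the placeholder used in the URLs above.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.google.com/cse"


def build_sorted_search_url(cx, query, data_type, field, ascending=False):
    """Build a CSE URL whose sort value has the form type-field[:a]."""
    sort_value = f"{data_type}-{field}"
    if ascending:
        sort_value += ":a"
    params = {"cx": cx, "q": query, "sort": sort_value}
    # Keep ':' unescaped so the URL reads like the hand-written ones above.
    return f"{CSE_ENDPOINT}?{urlencode(params, safe=':')}"


print(build_sorted_search_url("12345:ABCDE", "search string", "date", "date"))
print(build_sorted_search_url("12345:ABCDE", "search string", "date", "date",
                              ascending=True))
```

The two calls reproduce the two URLs shown above; `urlencode` takes care of turning the space in the search string into a `+`.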

One of the other powerful uses of PageMap is filtering data to provide a sub-set of results. For example, if we wanted to return all documents authored by ‘John Smith’, simply append a more:pagemap: restriction to the search terms in the query like so:

https://www.google.com/cse?cx=12345:ABCDE&q=search+string+more:pagemap:document-author:John+Smith
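The same kind of helper can be sketched for filtering. This Python example assumes the more:pagemap: restriction is sent as part of the `q` value alongside the search terms, which is my reading of how the operator is passed rather than something confirmed here. The function name `build_filtered_search_url` is hypothetical.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.google.com/cse"


def build_filtered_search_url(cx, query, data_type, field, value):
    """Build a CSE URL that restricts results by a PageMap attribute."""
    restriction = f"more:pagemap:{data_type}-{field}:{value}"
    # Assumption: the restriction is appended to the search terms in q.
    params = {"cx": cx, "q": f"{query} {restriction}"}
    return f"{CSE_ENDPOINT}?{urlencode(params, safe=':')}"


print(build_filtered_search_url(
    "12345:ABCDE", "search string", "document", "author", "John Smith"))
```

How multi-word values such as ‘John Smith’ are matched is worth checking against Google's documentation before relying on this in production.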

There are many other things you can do with PageMap data beyond sorting and filtering, and this article is only the start of what you can achieve. Google's PageMaps documentation also covers applying bias to results and restricting results by range.