GoodRelations is a standardized vocabulary for product, price, and company data that (1) can be embedded into existing static and dynamic Web pages and (2) can be processed by other computers. This increases the visibility of your products and services in the latest generation of search engines, recommender systems, and other novel applications.
Martin Hepp
martin.hepp at ebusiness-unibw.org
Tue Aug 24 15:04:40 CEST 2010
Dear all:

Some shop applications, unfortunately, display the very same item at multiple URIs. This is problematic for Search Engine Optimization (SEO) and the Web of Data alike.

Examples and Causes
===================
There are two typical causes:

a) The navigation path of the shop system is used to create "clean" URIs:

    http://www.myshop.com/staplers/green_pocket_stapler123
    http://www.myshop.com/featured-items/green_pocket_stapler123

b) Parameters, e.g. those that control the language of the output, the preferred currency, the session ID (bad...), or the referrer. This case is more severe, because it can easily cause 10 - 100 duplicates per single page:

    http://www.myshop.com/staplers/green_pocket_stapler123
    http://www.myshop.com/staplers/green_pocket_stapler123?lang=en
    http://www.myshop.com/staplers/green_pocket_stapler123?currency=usd&lang=en
    http://www.myshop.com/staplers/green_pocket_stapler123?referrer=clicksale

Problems
========
1. If you embed GoodRelations data markup in RDFa syntax into your HTML/XHTML shop templates, this may cause a massive duplication of data elements for applications that are trying to consume your shop data.
2. This will reduce the findability of your items in GoodRelations-aware applications.
3. It spoils the Web of Data ("proliferation of URIs").
4. Your pages will receive a lower ranking in search engines, because the links to an item will be spread over multiple URIs.
5. Pages may even be banned from search engines ("duplication of content"); that is independent of whether you are using GoodRelations or not.
6. Crawlers will waste more resources crawling your site, consume more of your valuable bandwidth, and are more likely to use outdated cached versions of your pages.

Solutions
=========
1. The ideal solution is to aim for canonical URIs (one URI per product) as much as possible. This may not be easy for pattern a), but it is straightforward for case b), e.g. by using session cookies and / or HTTP redirects.

2. If that is not possible, you should use *absolute* instead of *relative* identifiers in RDFa for all major data elements ("about" attribute in RDFa). For example, in the template for the "product item" page, use

    <div about="http://www.myshop.com/staplers/green_pocket_stapler123#offering" typeof="gr:Offering">
    ...

instead of

    <div about="#offering" typeof="gr:Offering">
    ...

The effect will be that no matter from which URI the page was actually requested, the same RDF data will be extracted. It requires, though, that you can determine the canonical URI of the page at the time of the request.

3. The last option (the least powerful, yet still much better than doing nothing) is to add owl:sameAs statements that link the entity in the RDFa pattern to its canonical URI. The canonical URI should also be used for the foaf:page property, which is the crucial link from the data to the page from where the product can be ordered.

Example:

    <div about="#offering" typeof="gr:Offering">
      <div rel="owl:sameAs" resource="http://www.myshop.com/staplers/green_pocket_stapler123#offering"></div>
      <div property="rdfs:label" content="Cool green stapler for $8.99" xml:lang="en"></div>
      ...
      <div rel="foaf:page" resource="http://www.myshop.com/staplers/green_pocket_stapler123"></div>
    </div>

Best wishes

Martin Hepp

--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp at ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
* Project Main Page: http://purl.org/goodrelations/
* Quickstart Guide for Developers: http://bit.ly/quickstart4gr
* Vocabulary Reference: http://purl.org/goodrelations/v1
* Developer's Wiki: http://www.ebusiness-unibw.org/wiki/GoodRelations
* Examples: http://bit.ly/cookbook4gr
* Presentations: http://bit.ly/grtalks
* Videos: http://bit.ly/grvideos
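
Illustrative sketch for solutions 1 and 2, case b) only: one possible way to derive the canonical URI at request time by dropping parameters that do not identify the product, and then to emit the absolute "about" identifier. This is a rough Python sketch under stated assumptions, not part of any shop system or of GoodRelations itself. The parameter names in NON_IDENTIFYING_PARAMS and the function names canonical_uri, redirect_target, and offering_open_tag are hypothetical. Case a) (several navigation paths for one item) is not covered; it would most likely need a lookup from the product to its preferred path.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only change presentation or tracking, never
# which product is shown. Adjust the set to match your own shop system.
NON_IDENTIFYING_PARAMS = {"lang", "currency", "referrer", "sessionid"}


def canonical_uri(requested_uri: str) -> str:
    """Drop non-identifying query parameters and any fragment."""
    parts = urlsplit(requested_uri)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in NON_IDENTIFYING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def redirect_target(requested_uri: str):
    """Solution 1: return the URI to redirect to, or None if already canonical.

    Settings such as language or currency would then have to be kept
    elsewhere, e.g. in a session cookie.
    """
    canonical = canonical_uri(requested_uri)
    return None if canonical == requested_uri else canonical


def offering_open_tag(requested_uri: str) -> str:
    """Solution 2: opening RDFa tag with an absolute identifier."""
    about = canonical_uri(requested_uri) + "#offering"
    return f'<div about="{escape(about, quote=True)}" typeof="gr:Offering">'


if __name__ == "__main__":
    base = "http://www.myshop.com/staplers/green_pocket_stapler123"
    for uri in (base, base + "?lang=en", base + "?currency=usd&lang=en"):
        print(offering_open_tag(uri))
```

With this approach, all three example requests produce the same opening tag, so a consumer of the RDFa extracts one gr:Offering entity regardless of which variant of the URI was crawled.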