GoodRelations is a standardized vocabulary for product, price, and company data that can (1) be embedded into existing static and dynamic Web pages and that (2) can be processed by other computers. This increases the visibility of your products and services in the latest generation of search engines, recommender systems, and other novel applications.
Martin Hepp
martin.hepp at ebusiness-unibw.org
Fri Jun 24 10:47:39 CEST 2011
Dear all:

When you publish large amounts of GoodRelations data in dump files, e.g. in RDF/XML or N-Triples syntax, you should take measures against excessive traffic caused by crawlers and other clients.

The first set of measures is to properly support caching and avoid unnecessary requests. In particular, you should

1. properly set the lastmod element in your sitemap.xml, i.e., avoid using the date of creating the sitemap for all entries
2. properly configure caching information sent in the HTTP response header.

Second, if you host really significant amounts of data, you should limit the maximum download speed for certain resources and exclude bad bots. Here is a good resource on this for Apache environments:

http://www.whoopis.com/howtos/web-bandwidth-limit.html#bwmod

For instance, you could throttle the download speed for large .rdf files to 10 KB/sec:

    # RDF/XML files larger than 1MB go at 10k/sec max
    LargeFileLimit .rdf 1000 10000

Best wishes
Martin Hepp
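To illustrate point 1 above, here is a minimal Python sketch of one way to derive each sitemap entry's lastmod from the modification time of the dump file itself, rather than from the time the sitemap is generated. The function names lastmod_for and build_sitemap are made up for this example, and it assumes the dump files are available on the local file system of the machine that writes the sitemap.

```python
import os
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def lastmod_for(path):
    """Return the file's modification time as a W3C datetime string in UTC."""
    mtime = os.path.getmtime(path)
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_sitemap(entries):
    """Build sitemap XML from (url, local_path) pairs.

    Each <lastmod> is taken from the file's own modification time,
    not from the moment the sitemap is created.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url, path in entries:
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(url)}</loc>")
        lines.append(f"    <lastmod>{lastmod_for(path)}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

You would call it with your own URL and path pairs, for example build_sitemap([("http://www.example.com/dumps/products.rdf", "/var/www/dumps/products.rdf")]), where both the URL and the path are placeholders. A crawler that honors lastmod can then skip dump files that have not changed since its last visit.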