Warning: This tool or project is no longer maintained and kept available only for archival purposes. Since GoodRelations and schema.org have evolved significantly in the past years, the current status available on this page is unlikely to function as expected. We take no responsibility for any damage caused by the use of this outdated work, to the extent legally possible.

Due to a lack of resources, we are unable to provide support for this project outside of consulting projects or sponsored research. Please contact us if you can contribute resources to update and enhance this project.

GoodRelations - The Web Vocabulary for E-Commerce

This is the archive of the goodrelations discussion list

GoodRelations is a standardized vocabulary for product, price, and company data that (1) can be embedded into existing static and dynamic Web pages and (2) can be processed by other computers. This increases the visibility of your products and services in the latest generation of search engines, recommender systems, and other novel applications.

[goodrelations] Get 4 Billion Triples of Current GoodRelations RDF/XML Data for 20 Million Amazon Pages

Martin Hepp (UniBW) martin.hepp at ebusiness-unibw.org
Thu Jan 28 16:42:18 CET 2010


Hi Giovanni,

Two remarks:

First, the power of Virtuoso sponger cartridges is of course that they
create structured RDF data on the fly, fresh from the API. My intention
was simply to show researchers and practitioners a simple and cheap way
to get hold of a huge body of real-world GoodRelations data. Keep in
mind that the rest of the LOD data seems to account for a total of only 8
billion triples at the moment.

Second: Yes, I of course hope that Amazon will be among the early
adopters of GoodRelations, in particular given that there is now growing
evidence that this does two things at once:

1. It has a positive impact on ranking in current search engines, and
2. It paves the way for much better visibility in the evolving Web of
Linked Data.


And yes: If anyone has a high-level contact at Amazon, I would
appreciate being introduced. I have been approached by the Web
development leads of ca. ten large online retailers and am helping them
add GoodRelations to their pages. It would surprise me if Amazon wanted
to be a follower in that field.

Best wishes

Martin


Giovanni Tummarello wrote:
> Hi Martin,
>
> while I am all for the sitemap and finding the individual pages,
>
> the step going through the URIburner is perilous. I mean, when the data
> is explicitly there in RDFa then it's clear the producer wants it to be
> reused. When it's not, then you're into the scraping business (OK, the
> URIburner might use the direct APIs sometimes, but still).
>
> I think with BestBuy having RDFa and with the support by Google it
> shouldn't be long before we see Amazon put actual RDFa on their pages?
> (Does anyone know someone at Amazon? :-) )
>
> cheers
> Giovanni
>
> On Fri, Jan 15, 2010 at 3:08 PM, Martin Hepp (UniBW)
> <martin.hepp at ebusiness-unibw.org> wrote:
>   
>> Hi all,
>>
>> It seems there is a quick and easy way to get a full RDF/XML
>> representation of all 20 Million Amazon offers.
>>
>> Here is how it will likely work:
>>
>> 1. Take the Amazon sitemap index files, as given by
>> http://www.amazon.com/robots.txt
>>
>> # Sitemap files
>> Sitemap: http://www.amazon.de/sitemap_index_0.xml
>> Sitemap: http://www.amazon.de/sitemap_index_1.xml
>> Sitemap: http://www.amazon.de/sitemap_index_2.xml
>> Sitemap: http://www.amazon.de/sitemap_index_3.xml
>> Sitemap: http://www.amazon.de/sitemap-manual-index.xml
>> Sitemap: http://www.amazon.de/sitemap_wishlist_index.xml
>>
>>
>> 2. Take the individual sitemap files from all of those, e.g.
>>
>> http://www.amazon.de/sitemap_page_0.xml.gz from
>> http://www.amazon.de/sitemap_index_0.xml
>>
>> <sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
>> <sitemap>
>> <loc>http://www.amazon.de/sitemap_page_0.xml.gz</loc>
>> <lastmod>2006-10-16</lastmod>
>> </sitemap>
>>
>> 3. Now, for each of those ca. 20 Million entries given as <loc>
>> elements, e.g.
>>
>> http://www.amazon.com/Pull-Power-Semantic-Transform-Business/dp/1591842778/
>>
>> <url>
>> <loc>http://www.amazon.com/Pull-Power-Semantic-Transform-Business/dp/1591842778/</loc>
>> </url>
>>
>> use the URIburner service (http://uriburner.com/sparql/) to extract the
>> complete commercial meta-data in GoodRelations.
>>
> >> Note that not all URIs are current and that URIburner cannot produce
> >> GoodRelations data for all pages, but it can for the majority of the
> >> ca. 20 Million pages.
>>
> >> You will get, on average, 200 GoodRelations triples per Amazon page, so
> >> the total will be on the order of 4 billion!
>>
> >> If you want to check it for yourself, try
>>
>> select COUNT (*) WHERE
>> {?s ?p ?o.
>> FILTER (regex(?o, "^http://purl.org/goodrelations/v1#", "i") or
>> regex(?p, "^http://purl.org/goodrelations/v1#", "i"))
>> }
>>
>> against the URI
>>
>> http://www.amazon.com/Pull-Power-Semantic-Transform-Business/dp/1591842778/
>>
>> Important: Using URIburner on the full set of Amazon URIs will likely
>> impose a great load on the underlying server, operated by OpenLink
>> Software. If you want to use this option, in particular for commercial
>> purposes, please contact Kingsley Idehen before you start. His e-mail is
>> <kidehen at openlinksw.com>.
>>
>> Best wishes
>>
>> Martin Hepp
>>
>>
>> --
>> --------------------------------------------------------------
>> martin hepp
>> e-business & web science research group
>> universitaet der bundeswehr muenchen
>>
>> e-mail:  hepp at ebusiness-unibw.org
>> phone:   +49-(0)89-6004-4217
>> fax:     +49-(0)89-6004-4620
>> www:     http://www.unibw.de/ebusiness/ (group)
>>         http://www.heppnetz.de/ (personal)
>> skype:   mfhepp
>> twitter: mfhepp
>>
>> Check out GoodRelations for E-Commerce on the Web of Linked Data!
>> =================================================================
>>
>> Project page:
>> http://purl.org/goodrelations/
>>
>> Resources for developers:
>> http://www.ebusiness-unibw.org/wiki/GoodRelations
>>
>> Webcasts:
>> Overview - http://www.heppnetz.de/projects/goodrelations/webcast/
>> How-to   - http://vimeo.com/7583816
>>
>> Recipe for Yahoo SearchMonkey:
>> http://www.ebusiness-unibw.org/wiki/GoodRelations_and_Yahoo_SearchMonkey
>>
>> Talk at the Semantic Technology Conference 2009:
>> "Semantic Web-based E-Commerce: The GoodRelations Ontology"
>> http://www.slideshare.net/mhepp/semantic-webbased-ecommerce-the-goodrelations-ontology-1535287
>>
>> Overview article on Semantic Universe:
>> http://www.semanticuniverse.com/articles-semantic-web-based-e-commerce-webmasters-get-ready.html
>>
>> Tutorial materials:
>> ISWC 2009 Tutorial: The Web of Data for E-Commerce in Brief: A Hands-on Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
>> http://www.ebusiness-unibw.org/wiki/Web_of_Data_for_E-Commerce_Tutorial_ISWC2009
>>
>>
>>
>> _______________________________________________
>> goodrelations mailing list
>> goodrelations at ebusiness-unibw.org
>> http://ebusiness-unibw.org/cgi-bin/mailman/listinfo/goodrelations
>>
>>     
>
>   
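
For readers who want to try the recipe quoted above, here is a minimal
Python sketch of its offline parts: pulling the "Sitemap:" lines out of a
robots.txt body, reading <loc> entries from a sitemap index or a (possibly
gzipped) sitemap file, and building a count query for one page. All
function names are made up for illustration. The query is a
standard-syntax variant of the one in the quoted message (str() and ||
instead of a bare regex on IRIs and "or"), and passing the page URL as
default-graph-uri is only an assumption about how the URIburner endpoint
selects the page; neither has been checked against the live service.
Nothing here fetches anything, so it puts no load on OpenLink's server.

```python
import gzip
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard-syntax variant of the count query from the quoted message
# (an adaptation, not verified against URIburner).
COUNT_QUERY = """
SELECT (COUNT(*) AS ?n) WHERE {
  ?s ?p ?o .
  FILTER (regex(str(?o), "^http://purl.org/goodrelations/v1#", "i") ||
          regex(str(?p), "^http://purl.org/goodrelations/v1#", "i"))
}
""".strip()


def sitemap_urls_from_robots(robots_txt):
    """Step 1: return the URLs given on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def decode_sitemap(raw):
    """Turn downloaded bytes into text, gunzipping *.xml.gz payloads."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


def loc_values(xml_text):
    """Steps 2 and 3: return the text of every <loc> element.

    Works for both a sitemap index and an individual sitemap file,
    because the XML namespace is ignored when matching the tag name.
    """
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]


def count_request_url(page_url, endpoint="http://uriburner.com/sparql/"):
    """Build a GET URL for the count query (assumed parameter usage)."""
    params = {"default-graph-uri": page_url, "query": COUNT_QUERY}
    return endpoint + "?" + urlencode(params)


def estimate_triples(pages, triples_per_page):
    """Back-of-the-envelope total: pages times average triples per page."""
    return pages * triples_per_page
```

With the figures from the quoted message, estimate_triples(20_000_000, 200)
gives 4,000,000,000, which is where the "4 billion" in the subject line
comes from. If you do wire this up to real downloads, please heed the
note about contacting Kingsley Idehen first.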






