Python4Spongers

RDF "Spongers" are a powerful middleware architecture developed by OpenLink Software (the makers of Virtuoso) for creating rich RDF meta-data on demand.


The key idea is that the middleware consults public APIs or other data sources to collate the relevant RDF meta-data for a given URI.

Unfortunately, the development of such sponger components is still difficult for many programmers. On this page, I propose a simple skeleton for coding the core transformation in Python.


This still requires a wrapper so that the code can be used in a Virtuoso environment, but that should be doable.


If you have any questions or suggestions, please contact me at mheppATcomputerDOTorg.


Example:

#!/usr/bin/env python
# encoding: utf-8

""" py4spongers.py

Example of how the principle of OpenLink "Sponger" technology can be implemented in Python

Created by Martin Hepp on 2010-03-22. http://www.heppnetz.de/

This software is free software under the LGPL.

"""

import re
from rdflib import ConjunctiveGraph, Literal, Namespace

def rdf4uri(uri="", base_uri="http://example.com/uriburner/"):

   """
   This function returns the available RDF meta-data for the Web page identified by `uri` as a string containing RDF/XML.
   
   Input Parameters:
     uri : URI of the page
     base_uri : Base URI to be used for the RDF model
   
   Output:
     a string containing RDF/XML
   """
   # Step 1: Fetch entity identifier from URI
   
   # Amazon Example:
   # http://www.amazon.com/Apple-touch-Generation-NEWEST-MODEL/dp/B002M3SOBU/ref=sr_1_9?ie=UTF8&s=electronics&qid=1269264339&sr=8-9
   # http://www.amazon.com/Apple-touch-Generation-NEWEST-MODEL/dp/B002M3SOBU/
   # 
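   # We use a simple regex to extract the ID from the URI (needs to be adapted for each sponger)
   p = re.compile(r".*/dp/(\w*)/.*")
   m = p.match(uri)
   if m is None:
      # Not a product URI of the expected form
      raise ValueError("No product ID found in URI: %s" % uri)
   identifier = m.group(1)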

   # Step 2: Fetch meta-data for that data entity, e.g. via AMAZON API
   # contact API --> omitted in this example
   # In this example, we simply return static data
   
   # Step 3: Compile RDF Graph
   NS = Namespace(base_uri)
   GR = Namespace('http://purl.org/goodrelations/v1#') 
   RDFS = Namespace("http://www.w3.org/2000/01/rdf-schema#")
   RDF = Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#")
   
   g = ConjunctiveGraph()
   # static dummy data, to be replaced by real content from API
   
   g.add((NS[identifier+'#Product'],RDF['type'],GR['ProductOrServicesSomeInstancesPlaceholder']))
   g.add((NS[identifier+'#Product'],RDFS['label'],Literal('SampleProduct')))

   # Step 4: Return Graph as RDF/XML
   # Request RDF/XML explicitly; newer rdflib versions default to Turtle
   return g.serialize(format='xml')

if __name__ == '__main__':

   rdf_xml = rdf4uri(uri='http://www.amazon.com/Apple-touch-Generation-NEWEST-MODEL/dp/B002M3SOBU/')
   print(rdf_xml)

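In its current form, it returns a static pattern for each valid Amazon product URI; only the product ID in the subject URI varies. For the example URI above, the core of the output (without the enclosing rdf:RDF element and its namespace declarations) is:

 <rdf:Description rdf:about="http://example.com/uriburner/B002M3SOBU#Product">
   <rdf:type rdf:resource="http://purl.org/goodrelations/v1#ProductOrServicesSomeInstancesPlaceholder"/>
   <rdfs:label>SampleProduct</rdfs:label>
 </rdf:Description>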
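
The regex in Step 1 expects a slash after the product ID, so a URI that ends directly with the ID is not recognized. Below is a minimal sketch of a more tolerant extraction step. It assumes Python 3 and that the product ID is always the path segment following "/dp/". The function name extract_identifier is hypothetical and not part of the script above.

```python
import re
from urllib.parse import urlparse

# Assumption: the product ID is the path segment that follows "/dp/".
_DP_SEGMENT = re.compile(r"/dp/(\w+)(?:/|$)")


def extract_identifier(uri):
    """Return the product ID found after '/dp/' in the URI path, or None."""
    # Only look at the path, so the query string and fragment are ignored.
    path = urlparse(uri).path
    match = _DP_SEGMENT.search(path)
    return match.group(1) if match else None


if __name__ == '__main__':
    print(extract_identifier('http://www.amazon.com/Apple-touch-Generation-NEWEST-MODEL/dp/B002M3SOBU/'))
```

Returning None instead of raising leaves it to the caller to decide how to treat pages that are not product pages, for example by returning an empty graph.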