Agenda

  • Introduction to Semantic Web
  • Architecture and languages
  • Semantic Web - Data
  • Extensions
  • Summary
  • References


The Semantic Web is about…

  • Web Data Annotation
    • connecting (syntactic) Web objects, like text chunks, images, … to their semantic notion (e.g., this image is about Innsbruck, Dieter Fensel is a professor)
  • Data Linking on the Web (Web of Data)
    • global networking of knowledge through URIs, RDF, and SPARQL (e.g., connecting my calendar with my RSS feeds, my pictures, ...)
  • Data Integration over the Web
    • seamless integration of data based on different conceptual models (e.g., integrating data coming from my two favorite book sellers)

Web Data Annotating


Introduction

  • Data integration involves combining data residing in different sources and providing users with a unified view of these data
  • Data integration over the Web can be implemented as follows:
    • Export the data sets to be integrated as RDF graphs
    • Merge identical resources (i.e. resources having the same URI) from different data sets
    • Start making queries on the integrated data, queries that were not possible on the individual data sets.

Export first data set as RDF graph

  • For example, the following RDF graph contains information about the book “The Glass Palace” by Amitav Ghosh

Export second data set as RDF graph

  • Information about the same book, this time in French, is modeled in the RDF graph below

Merge identical resources from different data sets

    • Merge identical resources (i.e. resources having the same URI) from different data sets

    Start making queries on the integrated data

    • A user of the second dataset may ask queries like: “give me the title of the original book”
    • This information is not in the second dataset
    • However, it can be retrieved from the integrated dataset, in which the second dataset is connected with the first dataset
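The three integration steps can be sketched in plain Python, with each RDF graph modelled as a set of (subject, predicate, object) tuples. This is only an illustration: the URIs, the property names (`a:title`, `f:titre`, `f:original`) and the French title string are assumptions and are not taken from the original graphs.

```python
# Each graph is a set of (subject, predicate, object) triples.
# All URIs and property names below are illustrative assumptions.
BOOK = "http://example.org/book/glass-palace"        # shared URI of the original
BOOK_FR = "http://example.org/book/glass-palace-fr"  # URI of the French edition

dataset_en = {
    (BOOK, "a:title", "The Glass Palace"),
    (BOOK, "a:author", "http://example.org/person/ghosh"),
    ("http://example.org/person/ghosh", "a:name", "Amitav Ghosh"),
}
dataset_fr = {
    (BOOK_FR, "f:titre", "Le Palais des miroirs"),
    (BOOK_FR, "f:original", BOOK),  # points at the URI used in dataset_en
}

# Merging: identical URIs denote the same node, so a set union is enough.
merged = dataset_en | dataset_fr


def objects(graph, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def original_title(graph, translation):
    """Follow f:original from a translation, then read a:title of the original."""
    return {
        title
        for original in objects(graph, translation, "f:original")
        for title in objects(graph, original, "a:title")
    }


print(original_title(dataset_fr, BOOK_FR))  # set(): not answerable alone
print(original_title(merged, BOOK_FR))      # {'The Glass Palace'}
```

The query only succeeds on the merged graph because both data sets use the same URI for the original book.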

    Web Architecture

    • Things are denoted by URIs
    • Use them to denote things
    • Serve useful information at them
    • Dereference them

    Semantic Web Architecture

    • Give important concepts URIs
    • Each URI identifies one concept
    • Share these symbols between many languages
    • Support URI lookup

    “Semantic Web Language Layer Cake”

    Identifier, Resource, Representation

      URI, URN, URL

      • A Uniform Resource Identifier (URI) is a string of characters used to identify a name or a resource on the Internet
      • A URI can be a URL or a URN
      • A Uniform Resource Name (URN) defines an item's identity
      • the URN urn:isbn:0-395-36341-1 is a URI that specifies the identifier system, i.e. International Standard Book Number (ISBN), as well as the unique reference within that system and allows one to talk about a book, but doesn't suggest where and how to obtain an actual copy of it
      • A Uniform Resource Locator (URL) provides a method for finding it
      • the URL http://www.sti-innsbruck.at/ identifies a resource (STI's home page) and implies that a representation of that resource (such as the home page's current HTML code, as encoded characters) is obtainable via HTTP from a network host named www.sti-innsbruck.at
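As a small illustration, Python's standard `urllib.parse.urlsplit` can take the two example identifiers apart. The `describe` helper is hypothetical, and treating every non-`urn` scheme as a URL is a simplification made for this sketch.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Classify a URI by its scheme (simplified: 'urn' means URN, else URL)."""
    parts = urlsplit(uri)
    kind = "URN" if parts.scheme == "urn" else "URL"
    return kind, parts.scheme, parts.netloc


# The URN names a book but carries no network location.
print(describe("urn:isbn:0-395-36341-1"))        # ('URN', 'urn', '')
# The URL names a resource and says which host serves it over HTTP.
print(describe("http://www.sti-innsbruck.at/"))  # ('URL', 'http', 'www.sti-innsbruck.at')
```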

      XML Schema Definition (XSD)

      •   A grammar definition language
        • Like DTDs but better
          • Uses XML syntax
        • Defined by W3C
      • Primary features
        • Datatypes
          • e.g. integer, float, date, etc…
        • More powerful content models
          • e.g. namespace-aware, type derivation, etc…

      Introduction

      • The Resource Description Framework (RDF) provides a domain independent data model
      • Resource (identified by URIs)
        • Correspond to nodes in a graph
        • E.g.: 
          • http://www.w3.org/
          • http://example.org/#john
          • http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
      • Properties (identified by URIs)
        • Correspond to labels of edges in a graph
        • Binary relation between two resources
        • E.g.: 
          • http://www.example.org/#hasName 
          • http://www.w3.org/1999/02/22-rdf-syntax-ns#type
      • Literals
        • Concrete data values
        • E.g.: 
          • "John Smith", "1", "2006-03-07"

      Triple Data Model

      • Triple data model:
        <subject, predicate, object>
        • Subject : Resource or blank node
        • Predicate : Property
        • Object : Resource, literal or blank node
      • Example: 
        <ex:john, ex:father-of, ex:bill>
      • A statement (or triple) can be read as a logical formula P(x, y), where the binary predicate P relates the object x to the object y.
      • RDF offers only binary predicates (properties)
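A minimal sketch of the triple model in Python, reading each triple as a binary predicate P(x, y). The `Triple` type and the `holds` helper are illustrative names, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # resource or blank node
    predicate: str  # property
    obj: str        # resource, literal or blank node


graph = {Triple("ex:john", "ex:father-of", "ex:bill")}


def holds(triples, predicate, x, y):
    """P(x, y) is true exactly when the triple <x, P, y> is in the graph."""
    return Triple(x, predicate, y) in triples


print(holds(graph, "ex:father-of", "ex:john", "ex:bill"))  # True
print(holds(graph, "ex:father-of", "ex:bill", "ex:john"))  # False: direction matters
```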

      Graph Model

      • The triple data model can be represented as a graph
      • In the Artificial Intelligence community, such a graph is called a semantic net
      • Labeled, directed graphs
        • Nodes: resources, literals
        • Labels: properties
        • Edges: statements
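The mapping from triples to a labeled, directed graph can be sketched as an adjacency structure. The two triples below are assumed sample data.

```python
from collections import defaultdict

# Assumed sample statements; the second one has a literal as its object.
triples = [
    ("ex:john", "ex:father-of", "ex:bill"),
    ("ex:john", "ex:hasName", '"John Smith"'),
]

# Every statement becomes one directed edge, labeled with its property.
edges = defaultdict(list)
for s, p, o in triples:
    edges[s].append((p, o))

# Nodes are all subjects and objects (resources and literals).
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}

print(sorted(nodes))     # ['"John Smith"', 'ex:bill', 'ex:john']
print(edges["ex:john"])  # [('ex:father-of', 'ex:bill'), ('ex:hasName', '"John Smith"')]
```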

      Introduction

      • RDF Schema (RDFS) is a language for capturing the semantics of a domain, for example:
        • In RDF:
          <#john, rdf:type, #Student>
        • What is a “#Student”?
      • RDFS is a language for defining RDF types:
        • Define classes:
          “#Student is a class”
        • Relationships between classes:
          “#Student is a sub-class of #Person”
        • Properties of classes:
          “#Person has a property hasName”

      RDF Types

      • Classes:
        <#Student, rdf:type, rdfs:Class>
      • Class hierarchies:
        <#Student, rdfs:subClassOf, #Person>
      • Properties:
        <#hasName, rdf:type, rdf:Property>
      • Property hierarchies:
        <#hasMother, rdfs:subPropertyOf, #hasParent>
      • Associating properties with classes (a):
        • “The property #hasName only applies to #Person”
          <#hasName, rdfs:domain, #Person>
      • Associating properties with classes (b):
        • “The type of the property #hasName is xsd:string”
          <#hasName, rdfs:range, xsd:string>
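What such schema statements allow a reasoner to derive can be sketched with a tiny forward-chaining loop. It covers only a small subset of RDFS inference (sub-class transitivity, type inheritance and `rdfs:domain`), and the instance data (`#john`, `#mary`) is assumed for illustration.

```python
def rdfs_closure(triples):
    """Apply three simple RDFS-style rules until nothing new is derived."""
    inferred = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in inferred:
            if p == "rdfs:subClassOf":
                # Transitivity: s subClassOf o, o subClassOf o2 => s subClassOf o2
                new |= {(s, "rdfs:subClassOf", o2)
                        for s2, p2, o2 in inferred
                        if p2 == "rdfs:subClassOf" and s2 == o}
                # Inheritance: x type s, s subClassOf o => x type o
                new |= {(x, "rdf:type", o)
                        for x, p2, c in inferred
                        if p2 == "rdf:type" and c == s}
            elif p == "rdfs:domain":
                # Domain: x s y, s domain o => x type o
                new |= {(x, "rdf:type", o)
                        for x, p2, _ in inferred if p2 == s}
        changed = not new <= inferred
        inferred |= new
    return inferred


schema_and_data = {
    ("#Student", "rdfs:subClassOf", "#Person"),
    ("#hasName", "rdfs:domain", "#Person"),
    ("#john", "rdf:type", "#Student"),       # assumed instance data
    ("#mary", "#hasName", '"Mary Smith"'),   # assumed instance data
}
closure = rdfs_closure(schema_and_data)
print(("#john", "rdf:type", "#Person") in closure)  # True, via subClassOf
print(("#mary", "rdf:type", "#Person") in closure)  # True, via rdfs:domain
```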

      Example




      Introduction

      • RDFS has a number of limitations:
        • Only binary relations
        • No characteristics of properties, e.g. inverse, transitive, symmetric
        • No local range restrictions, e.g. for class Person, the property hasName has range xsd:string
        • No complex concept descriptions, e.g. Person is defined as the union of Man and Woman
        • No cardinality restrictions, e.g. a Person may have at most 1 name
        • No disjointness axioms, e.g. nobody can be both a Man and a Woman
      • The Web Ontology Language (OWL) is an ontology language, that is, a more expressive Vocabulary Definition Language for use with RDF. It supports reasoning about:
        • Class membership
        • Equivalence of classes
        • Consistency
        • Classification
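To make the consistency task concrete, the sketch below checks a disjointness axiom ("nobody can be both a Man and a Woman") against class assertions. It is plain Python rather than an OWL reasoner, and the individuals and class names are hypothetical.

```python
# Hypothetical class assertions: individual -> set of asserted classes.
types = {
    "ex:alice": {"ex:Woman", "ex:Person"},
    "ex:bob": {"ex:Man", "ex:Woman"},  # violates the axiom below
}

# One disjointness axiom: no individual may belong to both classes.
disjoint = [("ex:Man", "ex:Woman")]


def inconsistent_individuals(types, disjoint):
    """Return the individuals that are members of two disjoint classes."""
    return sorted(
        ind
        for ind, classes in types.items()
        if any(a in classes and b in classes for a, b in disjoint)
    )


print(inconsistent_individuals(types, disjoint))  # ['ex:bob']
```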

      Introduction (cont')

      • OWL is layered into languages of different expressiveness
        • OWL Lite: Classification Hierarchies, Simple Constraints
        • OWL DL: Maximal expressiveness while maintaining decidability
        • OWL Full: Very high expressiveness, loses decidability, all syntactic freedom of RDF
      • More expressive means harder to reason with
      • Different Syntaxes:
        • RDF/XML (Recommended for Serialization)
        • N3 (Recommended for Human readable Fragments)
        • Abstract Syntax (Clear Human Readable Syntax)

      Example: The Wine Ontology

      • An ontology describing the wine domain
      • One of the most widely used examples for OWL, referenced by the W3C
      • A wine agent associated with this ontology performs OWL queries using a web-based ontological mark-up language, that is, by combining a logical reasoner with an OWL ontology
      • The agent's operation can be described in three parts: consulting the ontology, performing queries, and outputting results

      Example: The Wine Ontology Schema

      Querying RDF

      • SPARQL
        • RDF Query language
        • Based on RDQL
        • Uses SQL-like syntax
      • Example:

      Queries

      • PREFIX
        • Prefix mechanism for abbreviating URIs
      • SELECT
        • Identifies the variables to be returned in the query answer
        • SELECT DISTINCT
        • SELECT REDUCED
      • FROM
        • Name of the graph to be queried
        • FROM NAMED
      • WHERE
        • Query pattern as a list of triple patterns
      • LIMIT
      • OFFSET
      • ORDER BY
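The effect of the solution modifiers can be sketched on a list of already matched solutions. This is a plain-Python illustration, not a SPARQL engine; the `select` helper and the sample rows are assumptions.

```python
# Assumed query solutions, one dict of variable bindings per match.
rows = [
    {"name": "Mary Smith"},
    {"name": "John Smith"},
    {"name": "Mary Smith"},
    {"name": "Ann Jones"},
]


def select(rows, variables, distinct=False, order_by=None, offset=0, limit=None):
    """Apply ORDER BY, projection, DISTINCT, OFFSET and LIMIT, in that order."""
    if order_by is not None:
        rows = sorted(rows, key=lambda r: r[order_by])
    # SELECT: keep only the requested variables.
    projected = [tuple(r[v] for v in variables) for r in rows]
    if distinct:
        # dict.fromkeys drops duplicates while preserving order.
        projected = list(dict.fromkeys(projected))
    end = None if limit is None else offset + limit
    return projected[offset:end]


print(select(rows, ["name"], distinct=True, order_by="name", offset=1, limit=2))
# [('John Smith',), ('Mary Smith',)]
```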

      Example Query 1

      “Return the full names of all people in the graph”

      result:

      fullName
      =================
      "John Smith"
      "Mary Smith"

      Example Query 2

      “Return the relation between John and Mary”

      result:

      p
      =================
      <http://example.org/#marriedTo>

      Example Query 3

      “Return the spouse of a person by the name of John Smith”

      result:

      y
      =================
      <http://example.org/#mary>
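How triple patterns produce results like the three above can be sketched with a small pattern matcher in plain Python. The graph is an assumption chosen to be consistent with the listed results; in particular the property name `fullName` is hypothetical, and the matcher is an illustration rather than a SPARQL implementation.

```python
EX = "http://example.org/#"

# Assumed graph, consistent with the results of the three example queries.
graph = [
    (EX + "john", EX + "fullName", '"John Smith"'),
    (EX + "mary", EX + "fullName", '"Mary Smith"'),
    (EX + "john", EX + "marriedTo", EX + "mary"),
]


def match(graph, patterns, binding=None):
    """Yield every variable binding that satisfies all triple patterns.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(graph, rest, candidate)


# Query 1: full names of all people.
q1 = [b["?fullName"] for b in match(graph, [("?x", EX + "fullName", "?fullName")])]
# Query 2: the relation between John and Mary.
q2 = [b["?p"] for b in match(graph, [(EX + "john", "?p", EX + "mary")])]
# Query 3: the spouse of a person named John Smith (two joined patterns).
q3 = [b["?y"] for b in match(graph, [
    ("?x", EX + "fullName", '"John Smith"'),
    ("?x", EX + "marriedTo", "?y"),
])]

print(q1)  # ['"John Smith"', '"Mary Smith"']
print(q2)  # ['http://example.org/#marriedTo']
print(q3)  # ['http://example.org/#mary']
```

Query 3 shows the join: the variable `?x` must take the same value in both patterns.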

      SPARQL and Rule languages

      • SPARQL
        • Query language for RDF triples
        • A protocol for querying RDF data over the Web
      • Rule languages (e.g. SWRL) 
        • Extend basic predicates in ontology languages with proprietary predicates
        • Based on different logics
          • Description Logic
          • Logic Programming

      Introduction

      • The Rule Interchange Format (RIF) is a set of dialects that enables rule exchange among different rule systems

      Goals

      • Exchange of Rules 
        • The primary goal of RIF is to facilitate the exchange of rules
      • Consistency with W3C specifications 
        • RIF is a W3C specification that builds on and extends the existing range of W3C specifications
        • Existing W3C technologies should fit well with RIF
      • Wide-scale adoption
        • Rule interchange becomes more effective the wider its adoption ("network effect")

      Architecture

      RIF Dialects

      • RIF Core
        • A language of definite Horn rules without function symbols (~ Datalog)
        • A language of production rules where conclusions are interpreted as assert actions
      • RIF BLD
        • A language that lies within the intersection of first-order and logic-programming systems
      • RIF FLD
        • A formalism for specifying all logic dialects of RIF
        • Its syntax and semantics are described through mechanisms that are commonly used for various logic languages (but rarely brought together)
      • RIF PRD
        • A formalism for specifying production rules
      • Other common specifications
        • RIF DTB – Defines data types and builtins supported by RIF
        • RIF OWL/RDF compatibility – Defines how OWL and RDF can be used within RIF
        • RIF XML data – Defines how XML can be used within RIF
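To give a feel for the Datalog-like rules that RIF Core is compared to, the sketch below evaluates two definite Horn rules by forward chaining until a fixpoint is reached. It uses plain Python instead of RIF syntax, and the `parent` and `ancestor` predicates and the facts are hypothetical.

```python
# Hypothetical facts, written as (predicate, arg1, arg2).
facts = {("parent", "ann", "bob"), ("parent", "bob", "cid")}


def ancestor_rules(known):
    """ancestor(x, y) :- parent(x, y).
    ancestor(x, z) :- parent(x, y), ancestor(y, z)."""
    derived = {("ancestor", x, y) for rel, x, y in known if rel == "parent"}
    derived |= {
        ("ancestor", x, z)
        for r1, x, y in known if r1 == "parent"
        for r2, y2, z in known if r2 == "ancestor" and y2 == y
    }
    return derived


def fixpoint(facts, rules):
    """Apply the rules repeatedly until no new fact is derived."""
    known = set(facts)
    while True:
        new = rules(known) - known
        if not new:
            return known
        known |= new


result = fixpoint(facts, ancestor_rules)
print(("ancestor", "ann", "cid") in result)  # True
```

Because the rules contain no function symbols, only finitely many facts can be derived and the loop terminates.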

      Semantic Web - Data

      • URIs are used to identify resources, not just things that exist on the Web but also, e.g., Tim Berners-Lee
      • RDF is used to make statements about resources in the form of triples 
      • With RDFS, resources can belong to classes (my Mercedes belongs to the class of cars) and classes can be subclasses or superclasses of other classes (vehicles are a superclass of cars, cabriolets are a subclass of cars)

      Dereferenceable URI

      • Disco is essentially a convenient way to present RDF metadata so that people can browse it, i.e. a presentation mechanism for RDF triples. All triples with the same subject are grouped on one page, and their predicates and objects form a table that can be browsed. When you click on an object, that object becomes the subject of the view, and all predicates and objects of that subject become visible.
      • The dereferenceable URI animation shows that the URI you provide must be dereferenceable: the resource identified by the URI must be retrievable from that URI

      Faceted DBLP

      • The search interface allows users to search computer science publications in the collection starting from a keyword and shows the result set along with a set of facets, e.g., distinguishing publication years, authors, or conferences. The animation shows that RDF metadata underlies the whole system: the different RDF predicates form the facets that the user can use to narrow down the result set. Note that the seminal paper on the WSMT comes first for a DBLP search for Dieter Fensel ;)

      Semantic Media Wiki

      • Semantic Media Wiki combines a Web 2.0 technology, namely wikis, with the Semantic Web. Users can add tags to the wiki data, which automatically generates RDF data. Information in the wiki can also be filled in by queries; in the example, the section on Knows is filled by the query <ask>[[ affiliation::DERI Innsbruck]]</ask>

      Legacy Systems

      Legacy Systems (cont')


      KIM platform

      • The KIM platform provides a novel infrastructure and services for:
        • automatic semantic annotation, 
        • indexing, 
        • retrieval of unstructured and semi-structured content.

      KIM Constituents

      • The KIM Platform includes:
        • Ontologies (PROTON + KIMSO + KIMLO) and KIM World KB
        • KIM Server – with a set of APIs for remote access and integration
        • Front-ends: Web-UI and plug-in for Internet Explorer.

      KIM Ontology (KIMO)

      • light-weight upper-level ontology
      • 250 NE classes
      • 100 relations and attributes
      • covers mostly NE classes, and ignores general concepts
      • includes classes representing lexical resources

      KIM KB

      • KIM KB consists of more than 80,000 entities (50,000 locations, 8,400 organization instances, etc.)
      • Each location has geographic coordinates and several aliases (usually including English, French, Spanish, and sometimes the local transcription of the location name) as well as co-positioning relations (e.g. subRegionOf)
      • The organizations have locatedIn relations to the corresponding Country instances. The additionally imported information about the companies consists of a short description, URL, reference to an industry sector, reported sales, net income, and number of employees.

      KIM is Based On…

      • KIM is based on the following open-source platforms: 
      • GATE – the most popular NLP and IE platform in the world, developed at the University of Sheffield. Ontotext is its biggest co-developer.
        www.gate.ac.uk and www.ontotext.com/gate
      • OWLIM – OWL repository, compliant with the Sesame RDF database from Aduna B.V.
        www.ontotext.com/owlim
      • Lucene – an open-source IR engine by Apache. jakarta.apache.org/lucene/

      KIM Platform – Semantic Annotation

      KIM platform – Semantic Annotation (contd')

      • The automatic semantic annotation is seen as a named-entity recognition (NER) and annotation process.
      • The traditional flat NE type sets consist of several general types (such as Organization, Person, Date, Location, Percent, Money). In KIM the NE type is specified by reference to an ontology.
      • The semantic descriptions of entities and the relations between them are kept in a knowledge base (KB) encoded in the KIM ontology and residing in the same semantic repository. Thus, for each entity reference in the text, KIM provides (i) a link (URI) to the most specific class in the ontology and (ii) a link to the specific instance in the KB. Each extracted NE is linked to its specific type information (thus Arabian Sea would be identified as Sea rather than as the traditional Location).
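The shape of such an annotation (a text span linked to a class URI and an instance URI) can be sketched with a simple alias lookup. This is not KIM's API or its recognition algorithm; the `Annotation` type, the `annotate` helper and all URIs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int         # character offset where the entity reference begins
    end: int           # character offset just past the entity reference
    class_uri: str     # most specific ontology class
    instance_uri: str  # specific instance in the knowledge base


# Hypothetical knowledge base: alias -> (class URI, instance URI).
KB = {
    "Arabian Sea": ("http://example.org/onto#Sea",
                    "http://example.org/kb#ArabianSea"),
}


def annotate(text, kb=KB):
    """Find every known alias in the text and link it to class and instance."""
    found = []
    for alias, (cls, inst) in kb.items():
        pos = text.find(alias)
        while pos != -1:
            found.append(Annotation(pos, pos + len(alias), cls, inst))
            pos = text.find(alias, pos + 1)
    return sorted(found, key=lambda a: a.start)


text = "Ships crossed the Arabian Sea."
for a in annotate(text):
    print(text[a.start:a.end], "->", a.class_uri, a.instance_uri)
```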

      KIM platform – Information Extraction

      •  KIM performs IE based on an ontology and a massive knowledge base.


      KIM platform - Browser Plug-in

      Linked Open Data

      • Linked Data is a method for exposing and sharing connected data via dereferenceable URIs on the Web
        • Use URIs to identify things that you expose to the Web as resources
        • Use HTTP URIs so that people can locate and look up (dereference) these things
        • Provide useful information about the resource when its URI is dereferenced
        • Include links to other, related URIs in the exposed data as a means of improving information discovery on the Web
      • Linked Open Data is an initiative to interlink open data sources
        • Open: Publicly available data sets that are accessible to everyone
        • Interlinked: Datasets have references to one another, allowing them to be used together

      Linked Open Data (cont')

        • Linked Open Data statistics:
          data sets: 121
          total number of triples: 13,112,409,691
          total number of links between data sets: 142,605,717

        Linked Open Data principles

        • Use URIs as names for things
          • anything, not just documents
          • you are not your homepage
          • information resources and non-information resources
        • Use HTTP URIs
          • globally unique names, distributed ownership
          • allows people to look up those names
        • Provide useful information in RDF
          • when someone looks up a URI
        • Include RDF links to other URIs
          • to enable discovery of related information
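Looking up an HTTP URI and asking for RDF can be sketched with the standard library. The URI is hypothetical, `application/rdf+xml` is just one possible RDF media type, and the request is only built here, not sent.

```python
import urllib.request


def build_lookup_request(uri, media_type="application/rdf+xml"):
    """Build an HTTP GET request that asks the server for an RDF representation."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


# Hypothetical URI naming a thing (a city), not a document about it.
req = build_lookup_request("http://example.org/id/innsbruck")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# GET http://example.org/id/innsbruck application/rdf+xml

# Sending the request would be: urllib.request.urlopen(req)
```

The `Accept` header lets one URI serve both people (HTML) and machines (RDF), depending on what the client asks for.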

        Linked Open Data - FOAF

        • Friend Of A Friend (FOAF) provides a way to create machine-readable pages about:
          • People
          • The links between them
          • The things they do and create
        • Anyone can publish a FOAF file on the web about themselves and this data becomes part of the Web of Data 

        • FOAF is connected to many other data sets, including
          • Data sets describing music and musicians (Audio Scrobbler, MusicBrainz)
          • Data sets describing photographs and who took them (Flickr)
          • Data sets describing places and their relationship (GeoNames)

        Linked Open Data - GeoNames

        • The GeoNames Ontology makes it possible to add geospatial semantic information to the Web of Data
        • We can utilize GeoNames location within the FOAF profile

        • GeoNames is also linked to more datasets
          • US Census Data
          • Movie Database (Linked MDB)
          • Extracted data from Wikipedia (DBpedia)
        • As our FOAF profile has been linked to GeoNames, and GeoNames is linked to DBpedia, we can ask some interesting queries over the Web of Data
          • What is the population of the city in which Dieter Fensel lives?
            => 117916 people
          • At which elevation does Dieter Fensel live?
            => 574m
          • Who is the mayor of the city in which Dieter Fensel lives?
            => Hilde Zach
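Answering such questions amounts to following links across the three data sets. The sketch below is plain Python over assumed triples: the URIs and the property names (`foaf:based_near`, `owl:sameAs`, `dbo:populationTotal`, `dbo:elevation`, `dbo:mayor`) are chosen for illustration, and only the three answer values come from the slide.

```python
# Assumed triples; only the three answer values are taken from the slide.
foaf_profile = {("ex:dieter", "foaf:based_near", "gn:innsbruck")}
geonames = {("gn:innsbruck", "owl:sameAs", "dbp:Innsbruck")}
dbpedia = {
    ("dbp:Innsbruck", "dbo:populationTotal", 117916),
    ("dbp:Innsbruck", "dbo:elevation", 574),
    ("dbp:Innsbruck", "dbo:mayor", "Hilde Zach"),
}

# The Web of Data, seen as one graph made of linked data sets.
web_of_data = foaf_profile | geonames | dbpedia


def follow(graph, start, path):
    """Follow a chain of predicates from a start node; return the end nodes."""
    nodes = {start}
    for predicate in path:
        nodes = {o for s, p, o in graph if s in nodes and p == predicate}
    return nodes


home = ["foaf:based_near", "owl:sameAs"]
print(follow(web_of_data, "ex:dieter", home + ["dbo:populationTotal"]))  # {117916}
print(follow(web_of_data, "ex:dieter", home + ["dbo:elevation"]))        # {574}
print(follow(web_of_data, "ex:dieter", home + ["dbo:mayor"]))            # {'Hilde Zach'}
```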

          Linked Open Data and Mobiles

          • The combination of Linked Open Data and mobile devices has triggered the emergence of new applications
          • One example is DBpedia Mobile, which uses the current GPS position of a mobile device to render a map containing information about nearby locations from the DBpedia dataset.
          • It exploits information coming from DBpedia, Revyu and Flickr data.
          • It provides a way to explore maps of cities and gives pointers to more information which can be explored

          Summary

          • Semantic Web is not a replacement of the current Web, it’s an evolution of it
          • Semantic Web is about:
            • annotation of data on the Web
            • data linking on the Web
            • data integration over the Web
          • Semantic Web aims at automating tasks currently carried out by humans
          • Semantic Web is becoming real (maybe not as we originally envisioned it, but it is)

          References

          •  T. Berners-Lee, J. Hendler, O. Lassila. The Semantic Web, Scientific American, 2001.
          • D. Fensel. Ontologies: A Silver Bullet for Knowledge Management and Electronic Commerce, 2nd edition, Springer 2003.
          • G. Antoniou and F. van Harmelen. A Semantic Web Primer, 2nd edition, The MIT Press 2008.
          • H. Stuckenschmidt and F. van Harmelen. Information Sharing on the Semantic Web, Springer 2004.
          • T. Berners-Lee. Weaving the Web, HarperCollins 2000.
          • T.R. Gruber. Toward principles for the design of ontologies used for knowledge sharing, Int. J. Hum.-Comput. Stud., vol. 43, no. 5-6, 1995.
          • http://en.wikipedia.org/wiki/Semantic_Web
          • http://www.w3.org/TR/rdf-primer/
          • http://en.wikipedia.org/wiki/Resource_Description_Framework
          • http://en.wikipedia.org/wiki/Linked_Data
          • http://linkeddata.org/
          • http://www.w3.org/TR/rdf-mt/
          • http://www.w3.org/People/Ivan/CorePresentations/RDFTutorial
          • http://www.opengeospatial.org/projects/groups/sensorweb
          • http://www.data.gov.uk/
