IBM 1620 data processing machine, 1962

Who is this?

The Web

The Web accompanies the transition from an industrial to an information society and provides the infrastructure for a new quality of information handling, regarding acquisition as well as provisioning:

  • high availability
  • high relevance
  • low cost

 

The Web penetrates society

  • Social contacts (social networking platforms, blogging, ...)
  • Economics (buying, selling, advertising, ...)
  • Administration (eGovernment)
  • Work life (information gathering and sharing)
  • Recreation (games, role play, creativity, ...)
  • Education (eLearning, Web as information system, ...)

The current Web

 Immensely successful.

  • Huge amounts of information and data.
  • Syntax standards for transfer of structured data.
  • Machine-processable, human-readable documents.
BUT:
  • Content/knowledge cannot be accessed by machines.
  • Meaning (semantics) of transferred data is not accessible.

Limitations of the Web

Too much information, with too little structure, made for human consumption
  • Content search is very simplistic
  • →  future requires better methods
Web content is heterogeneous
  • in terms of content
  • in terms of structure
  • in terms of character encoding
  • → future requires intelligent information integration
Humans can derive new (implicit) information from given pieces of information, but on the current Web we can only deal with syntax
  • → requires automated reasoning techniques

What Google does not find

There are many information needs that current search engines cannot satisfy:

  • Apartments for rent close to well-rated Thai restaurants
  • Bilingual English-German child care in Berlin reachable within 15 minutes of my place of work
  • Kid-friendly holiday destinations with culture and sports activities
  • Researchers working in south-east Asia on information retrieval topics
  • ERP service providers with offices in Vienna and Berlin
  • ...

We have subconsciously learned not to ask search engines such questions.

In principle, all the required knowledge is on the Web – most of it even in machine-readable form. However, without automated data integration, processing (and reasoning), we cannot obtain a useful answer.

What's the problem with the Web

  • inability to integrate and fuse information from different sources
  • lack of comprehensive background knowledge to interpret information found on the Web
  • current Web search is restricted to text in a particular language: there are many “smaller” languages with much less information available than in English

Basic ingredients for the Semantic Web

  • Open Standards for describing information on the Web
  • Methods for obtaining further information from such descriptions

 
We’ll talk about these matters in this course.

Data Models, Access & Integration

Data Integration
  • Enterprise Information Integration: sets of heterogeneous data sources appear as a single, homogeneous data source
  • Data Warehousing
    • Based on extract, transform, load (ETL)
    • Global-As-View (GAV)
  • Research
    • Mediators
    • Ontology-based
    • P2P
    • Web service-based
  • Data Web
    • URIs as entity identifiers
    • HTTP as data access protocol
    • Local-As-View (LAV)
Data Access
  • Object-relational mappings (ORM)
    • NeXT’s EOF / WebObjects
    • ADO.NET Entity Framework
    • Hibernate
  • Procedural APIs
    • ODBC
    • JDBC
  • Query Languages
    • Datalog, SQL
    • XPath/XQuery
    • SPARQL
  • Linked Data
    • de-referenceable URIs
    • RDF serialization
Data Models
  • RDBMS
    • organize data in relations, rows, cells
    • Oracle, DB2, MS-SQL

LOD Cloud May 2007

LOD = Linked Open Data

LOD Cloud October 2007

LOD Cloud February 2008

LOD Cloud September 2008

LOD Cloud March 2009

 
Colours indicate different domains; e.g.: orange = social networks

LOD Cloud September 2010

LOD Cloud September 2011

LOD Cloud August 2014

First update after 3 years – what do you think were the reasons?

 

 

 

 

 

LOD Cloud February 2017

The Web of Data

 
  • >70 billion facts
  • covering many different domains (life sciences, geo, user-generated content, government, bibliographic, ...)

Map to the Semantic Web

The Semantic Data Web Stack

Layers, from top to bottom:
  • User Interface & Applications
  • Trust
  • Proof
  • Unifying Logic
  • Ontology: OWL | Rules: RIF | Query: SPARQL
  • RDF-Schema
  • Data Interchange: RDF
  • XML
  • URI | Unicode
  • Crypto (runs vertically alongside the layers)

… also known as “layer cake”

How Mark A. Greenwood realised the Semantic Web:

URIs and Unicode

  • URI = Uniform Resource Identifier
  • Used to create globally unique names for resources
  • Every object with clear identity can be a resource
    • Books, places, organizations ...
  • In the books domain the ISBN serves the same purpose
  • IRIs: Unicode-aware extension of URIs (I = Internationalized)

Resource Description Framework – RDF

 Information is represented in RDF in triples (also called statements, facts):

  • Modeled on linguistic categories, but not always consistently
  • Allowed assignments:
    • Subject: URI or blank node
    • Predicate: URI (a.k.a. property)
    • Object: URI, blank node or literal
  • Node and edge labels should be unambiguous, so that the original graph can be reconstructed from a list of triples

 

RDF Schema

Not all triples make sense:

Cinema  AlbertEinstein  2012

How can we constrain the use of RDF? 

RDFS (S = “Schema”) allows one to define classes and properties and to restrict their use.

SPARQL – Query Language for RDF

SELECT * WHERE { jwebsp:John  foaf:knows  ?friend }
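The query above consists of a single triple pattern in which ?friend is a variable. As a rough sketch of how such a pattern is evaluated, the Python code below matches it against a small in-memory graph. The triples about jwebsp:John are made up for illustration and prefixed names are kept as plain strings; real SPARQL engines additionally resolve prefix declarations, join several patterns and much more.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

def match(pattern: Triple, triple: Triple) -> Optional[Dict[str, str]]:
    """Return variable bindings if the triple matches the pattern, else None."""
    bindings: Dict[str, str] = {}
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable occurring twice must be bound to the same term both times.
            if bindings.get(p_term, t_term) != t_term:
                return None
            bindings[p_term] = t_term
        elif p_term != t_term:
            # A constant in the pattern must match exactly.
            return None
    return bindings

def select_all(pattern: Triple, graph: Set[Triple]) -> List[Dict[str, str]]:
    """Evaluate SELECT * WHERE { pattern } for a single triple pattern."""
    results = []
    for triple in sorted(graph):  # sorted only to make the output order deterministic
        bindings = match(pattern, triple)
        if bindings is not None:
            results.append(bindings)
    return results

# Hypothetical example data
graph = {
    ("jwebsp:John", "foaf:knows", "jwebsp:Mary"),
    ("jwebsp:John", "foaf:knows", "jwebsp:Paul"),
    ("jwebsp:Mary", "foaf:knows", "jwebsp:John"),
}

print(select_all(("jwebsp:John", "foaf:knows", "?friend"), graph))
# [{'?friend': 'jwebsp:Mary'}, {'?friend': 'jwebsp:Paul'}]
```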

Web Ontology Language – OWL

  • OWL: acronym for Web Ontology Language, more easily pronounced than WOL
  • family of languages for authoring ontologies
  • W3C Recommendation since 2004, OWL 2 since 2009
  • Semantically corresponds to a fragment of first-order logic (FOL)

Features

  • Instantiation of classes by individuals
  • Concept hierarchies (taxonomies, inheritance): classes, terms
  • Binary relations between individuals: Properties, Roles
  • Properties of relations (e.g., range, transitive)
  • Data types (e.g. Numbers): concrete domains
  • Logical means of expression
  • Clear semantics!

RDFa Content Editor – RDFaCE

supports the automatic semantic annotation of texts

http://rdface.aksw.org/

Literature

  • Pascal Hitzler, Markus Krötzsch, Sebastian Rudolph: Foundations of Semantic Web Technologies, Chapman & Hall/CRC, 2009, 455 pages, hardcover, ISBN: 9781420090505, http://www.semantic-web-book.org
  • Amit Sheth, Krishnaprasad Thirunarayan: Semantics Empowered Web 3.0: Managing Enterprise, Social, Sensor, and Cloud-based Data and Services for Advanced Applications (Synthesis Lectures on Data Management), Morgan & Claypool Publishers, December 2012, ISBN: 1608457168
  • Tom Heath, Christian Bizer: Linked Data (Synthesis Lectures on the Semantic Web: Theory and Technology), Morgan & Claypool Publishers, 1st edition, February 2011, ISBN: 1608454304, http://linkeddatabook.com

Questions

All the corresponding questions for the Introduction are covered in the Questions part of the deck.

 

 

Motivation

How do you encode the piece of knowledge:
"The theory of relativity was discovered by Albert Einstein" 

<theory>
  <name>Theory of Relativity</name>
  <discoverer>Albert Einstein</discoverer>
</theory>

or

<person>
  <name>Albert Einstein</name>
  <discovered>Theory of Relativity</discovered>
</person>

or

<person name="Albert Einstein">
  <discovered>Theory of Relativity</discovered>
</person>

There is no unique way (in XML) to represent knowledge.
Information represented in such ways is not easy to integrate. (Why?)
RDF helps to solve this problem.

Goals

  • Understand the RDF data model, including
    • URI and IRI concepts
    • Triples
    • Resources
    • Literals
    • Blank nodes
    • Lists

Prerequisites

  • Basic understanding of Web technologies, data types

RDF Overview

  • RDF = Resource Description Framework
  • W3C Recommendation since 1999
  • RDF is a data model
    • Originally used for metadata for web resources, then generalized
    • Encodes structured information
    • Universal, machine-readable exchange format
  • Data structured in graphs
    • Vertices, edges

Parts of the RDF graph

  •  URIs
    • Used to reference resources unambiguously
  • Literals
    • Describe data values with no clear identity like "100 km/h"
  • Blank nodes
    • Facilitate existential quantification for an individual with certain properties without naming it

Example of an RDF graph

 

RDF Triple

 Components of an RDF triple:

  • Modeled using linguistic categories (but not always consistently)
  • Allowed assignments:
    • Subject: URI or blank node
    • Predicate: URI (a.k.a. property)
    • Object: URI, blank node or literal
  • Node and edge labels should be unambiguous,
    so that the original graph can be reconstructed from the list of triples
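The allowed assignments can be checked mechanically. The sketch below models the three kinds of RDF terms as small Python classes (the names IRI, BlankNode and Literal are our own choice for illustration, not a particular library's API) and tests whether a subject/predicate/object combination forms a valid triple.

```python
from dataclasses import dataclass
from typing import Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class BlankNode:
    label: str

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = XSD_STRING  # literals without an explicit datatype are strings

Term = Union[IRI, BlankNode, Literal]

def is_valid_triple(s: Term, p: Term, o: Term) -> bool:
    """Check the allowed assignments for subject, predicate and object."""
    subject_ok = isinstance(s, (IRI, BlankNode))          # URI or blank node
    predicate_ok = isinstance(p, IRI)                     # URI only
    object_ok = isinstance(o, (IRI, BlankNode, Literal))  # URI, blank node or literal
    return subject_ok and predicate_ok and object_ok

leipzig = IRI("http://dbpedia.org/resource/Leipzig")
label = IRI("http://www.w3.org/2000/01/rdf-schema#label")

print(is_valid_triple(leipzig, label, Literal("Leipzig")))  # True
print(is_valid_triple(Literal("Leipzig"), label, leipzig))  # False: literal as subject
print(is_valid_triple(leipzig, BlankNode("b1"), leipzig))   # False: blank node as predicate
```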

URI

  • URI = Uniform Resource Identifier
  • Used to create globally unique names for resources
  • Every object with a clear identity can be a resource
    • Books, places, organizations ...
  • In the books domain the ISBN serves the same purpose

URI Syntax

  • Extension of the URL concept
     
  • Not every URI denotes a web document, but the URL is often used as URI for web documents
     
  • Starts with a scheme, which is separated from the rest by ":"
    • examples: http, ftp, mailto, file
       
  • Typically hierarchical structure
    • [scheme:][//authority][path][?query][#fragment]
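Python's standard library can split a URI into these components. The URIs below are made-up examples; note that a mailto: URI has no //authority part, so everything after the colon ends up in the path.

```python
from urllib.parse import urlsplit

# Hypothetical example URI with all optional parts present
parts = urlsplit("http://example.org/people/alice?format=ttl#me")
print(parts.scheme)    # http
print(parts.netloc)    # example.org   (the authority)
print(parts.path)      # /people/alice
print(parts.query)     # format=ttl
print(parts.fragment)  # me

mail = urlsplit("mailto:alice@example.org")
print(mail.scheme, repr(mail.netloc), mail.path)  # mailto '' alice@example.org
```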

Self-defined URIs

  • Necessary if a resource has no URI yet or its URI is not known
  • Use HTTP URIs of your own website to avoid naming collisions
  • Makes it possible to publish documentation for the URI at this location
  • Example: http://jens-lehmann.org/foaf.rdf#i
  • Separation of URI for …
    • a resource (a real-world thing)
    • and its documentation (e.g. an HTML page)
    … with the help of URI references (with “#”-attached fragments) or content negotiation
  • Example: URI for Shakespeare's "Othello":
    • bad (why?): http://de.wikipedia.org/wiki/Othello
    • good: http://de.wikipedia.org/wiki/Othello#URI

IRIs

  • IRI = Internationalized Resource Identifier
  • Generalization of URI concept
  • IRIs can contain Unicode characters
  • Examples:
    • http://www.example.org/Wüste
    • http://www.example.org/사막
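Where a plain URI is required, an IRI can be mapped to one by encoding its non-ASCII characters as UTF-8 and percent-encoding the resulting bytes. The helper iri_to_uri below is a simplified sketch of that idea (the name is ours): it handles path, query and fragment, but not internationalized host names, which need a separate (IDNA) encoding.

```python
from urllib.parse import quote, unquote

# Characters left as they are: reserved and unreserved URI characters, plus "%"
# so that existing percent-escapes are not encoded a second time.
SAFE = ":/?#[]@!$&'()*+,;=%-._~"

def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 bytes of all other characters (simplified)."""
    return quote(iri, safe=SAFE)

print(iri_to_uri("http://www.example.org/Wüste"))
# http://www.example.org/W%C3%BCste
print(iri_to_uri("http://www.example.org/사막"))
# http://www.example.org/%EC%82%AC%EB%A7%89
print(unquote(iri_to_uri("http://www.example.org/사막")))
# http://www.example.org/사막
```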


Literals

  • Used to model data values
  • Representation as strings
  • Interpretation through datatype
  • Literals without datatype are treated as strings
  • Literals may never be the subject (source node) of an edge in an RDF graph
  • Edges may never be labeled with literals

Turtle Syntax

 
  • Language to serialize RDF Triples to strings
  • Turtle – Terse RDF Triple Language  
  • URIs in angle brackets: <http://dbpedia.org/resource/Leipzig>
  • Literals in quotes
    • "Leipzig"@de 
    • "51.333332"^^xsd:float
  • Triples are subject-predicate-object sentences terminated with a dot.
    <http://dbpedia.org/resource/Leipzig> <http://www.w3.org/2000/01/rdf-schema#label> "Leipzig"@de.
    
  • Whitespace and line breaks are ignored outside of identifiers
  • Status:  W3C Recommendation, http://www.w3.org/TR/turtle/

Turtle Abbreviations (1/2)

In Turtle one can use abbreviations

  • Syntax: @prefix abbr ':'  <URI> .
  • E.g. @prefix dbr:  <http://dbpedia.org/resource/> .

One can transform

<http://dbpedia.org/resource/Leipzig> <http://www.w3.org/2000/01/rdf-schema#label> "Leipzig"@de.

into

@prefix dbr: <http://dbpedia.org/resource/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
 
dbr:Leipzig rdfs:label "Leipzig"@de .
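Expanding a prefixed name is plain string concatenation of the namespace URI and the local part. The sketch below assumes a hand-written prefix table containing only the two prefixes declared above, and it ignores the escaping rules Turtle defines for local names.

```python
from typing import Dict

PREFIXES: Dict[str, str] = {
    "dbr": "http://dbpedia.org/resource/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def expand(prefixed_name: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Turn 'prefix:local' into the full URI."""
    prefix, colon, local = prefixed_name.partition(":")
    if not colon or prefix not in prefixes:
        raise ValueError(f"undeclared prefix in {prefixed_name!r}")
    return prefixes[prefix] + local

subject, predicate = expand("dbr:Leipzig"), expand("rdfs:label")
print(f'<{subject}> <{predicate}> "Leipzig"@de .')
# <http://dbpedia.org/resource/Leipzig> <http://www.w3.org/2000/01/rdf-schema#label> "Leipzig"@de .
```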

Turtle Abbreviations (2/2)

  • Triples with the same subject can be grouped together
    @prefix rdf: ...
    @prefix geo: ...
    dbr:Leipzig dbp:hasMayor dbr:Burkhard_Jung ;
                rdfs:label   "Leipzig"@de ;
                geo:lat      "51.333332"^^xsd:float ;
                geo:long     "12.383333"^^xsd:float .   
    
  • Even triples with the same subject and predicate can be grouped together
    @prefix dbr: ...
    @prefix dbp: ...
    dbr:Leipzig dbp:locatedIn dbr:Saxony, dbr:Germany;
                dbp:hasMayor  dbr:Burkhard_Jung .
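Both abbreviations are purely syntactic: a parser expands them back into individual triples. A minimal sketch, assuming the grouped statement has already been read into a dictionary that maps each predicate (separated by ";") to its list of objects (separated by ","):

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def expand_groups(subject: str, predicate_objects: Dict[str, List[str]]) -> List[Triple]:
    """Expand 'subject p1 o1, o2 ; p2 o3 .' into one triple per object."""
    return [
        (subject, predicate, obj)
        for predicate, objects in predicate_objects.items()
        for obj in objects
    ]

triples = expand_groups("dbr:Leipzig", {
    "dbp:locatedIn": ["dbr:Saxony", "dbr:Germany"],  # objects separated by ","
    "dbp:hasMayor": ["dbr:Burkhard_Jung"],           # predicate introduced by ";"
})
for s, p, o in triples:
    print(s, p, o, ".")
# dbr:Leipzig dbp:locatedIn dbr:Saxony .
# dbr:Leipzig dbp:locatedIn dbr:Germany .
# dbr:Leipzig dbp:hasMayor dbr:Burkhard_Jung .
```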
    

Literals II – Datatypes

  •  Example: xsd:decimal

Datatypes in RDF

  • So far: literals are untyped, treated as strings:
    "02" < "100" < "11" < "2"
  • Typing allows a better, i.e. semantic, interpretation of values
  • Datatypes are identified by URIs and can be chosen freely
  • Typically, XML Schema Datatypes (XSD) are used
  • Syntax: "data value"^^<datatype-URI>
  • rdf:HTML and rdf:XMLLiteral are the only predefined datatypes in RDF
    • Used for HTML and XML fragments
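The difference matters as soon as values are compared. In the sketch below, sorting the lexical forms as strings reproduces the untyped order shown above, while interpreting them as numbers (as a datatype such as xsd:integer would prescribe; Python's Decimal stands in for the XSD value space here) gives the numeric order and reveals that "02" and "2" denote the same value.

```python
from decimal import Decimal

values = ["2", "11", "100", "02"]

as_strings = sorted(values)               # untyped literals: lexicographic order
as_numbers = sorted(values, key=Decimal)  # typed literals: order of the numeric values

print(as_strings)                     # ['02', '100', '11', '2']
print(as_numbers)                     # ['2', '02', '11', '100']
print(Decimal("02") == Decimal("2"))  # True: different lexical forms, same value
```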

Example

 Graph:

 

 
Turtle:
 
 
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

dbr:Leipzig  geo:lat   "51.333332"^^xsd:float ;
             geo:long  "12.383333"^^xsd:float .

Language declaration

  • A language tag can only be attached to untyped literals
    Example:  
  • In RDF 1.0 the following literals were all different,
    but implementations typically treated them the same.
  • As of RDF 1.1 "Leipzig" is a shorthand for "Leipzig"^^xsd:string.

n-ary relations I

 Cooking with RDF

"For the preparation of mango chutney you need 450g of green mango, a teaspoon of cayenne pepper ..."

1st attempt to model this recipe:

@prefix ex: <http://example.org/> .
 
ex:Chutney ex:hasIngredient "450g green mango",
                            "1tsp Cayenne pepper" .

Not satisfying:

  • Ingredients and amounts coded as strings
  • Searching for recipes that contain green mango is not easily possible

 

n-ary relations II

 Cooking with RDF

“For the preparation of a mango chutney you need 450g of green mango, a teaspoon of Cayenne pepper …”

2nd attempt to model this recipe:

@prefix ex: <http://example.org/> .
 
ex:Chutney   ex:ingredient    ex:GreenMango;
             ex:amount        "450g" ;
             ex:ingredient    ex:CayennePepper;
             ex:amount        "1tsp" .

Even worse:

  • No unambiguous association between ingredient and amount is possible

n-ary relations III

 Problem: this is a genuinely ternary (three-place) relationship (cf. relational databases)

Recipe          Ingredient       Amount
Mango Chutney   green mango      450g
Mango Chutney   Cayenne pepper   1 tsp

  • Cannot be expressed directly in RDF, since RDF statements are binary relations
  • Solution: introduction of helper nodes

n-ary relations IV

Helper nodes in RDF:

As graph:

In Turtle Syntax:
 
@prefix ex: <http://example.org/> .
ex:Chutney             ex:hasIngredient ex:ChutneyIngredient1 .
ex:ChutneyIngredient1  ex:ingredient    ex:GreenMango;
                       ex:amount        "450g" .
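With a helper node per ingredient, each amount is attached to exactly one ingredient, so both questions raised by the earlier attempts become simple graph lookups. A sketch over plain Python tuples, assuming a second, hypothetical helper node ex:ChutneyIngredient2 for the Cayenne pepper (the slide only shows the first one) and treating literals as plain strings:

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

graph: Set[Triple] = {
    ("ex:Chutney", "ex:hasIngredient", "ex:ChutneyIngredient1"),
    ("ex:ChutneyIngredient1", "ex:ingredient", "ex:GreenMango"),
    ("ex:ChutneyIngredient1", "ex:amount", "450g"),
    # Hypothetical second helper node, not shown on the slide
    ("ex:Chutney", "ex:hasIngredient", "ex:ChutneyIngredient2"),
    ("ex:ChutneyIngredient2", "ex:ingredient", "ex:CayennePepper"),
    ("ex:ChutneyIngredient2", "ex:amount", "1tsp"),
}

def objects(g: Set[Triple], s: str, p: str) -> List[str]:
    """All objects o with (s, p, o) in the graph, sorted for stable output."""
    return sorted(o for (s2, p2, o) in g if s2 == s and p2 == p)

def ingredients_with_amounts(g: Set[Triple], recipe: str) -> Dict[str, str]:
    """Follow recipe -> helper node -> (ingredient, amount)."""
    result = {}
    for helper in objects(g, recipe, "ex:hasIngredient"):
        [ingredient] = objects(g, helper, "ex:ingredient")  # exactly one per helper node
        [amount] = objects(g, helper, "ex:amount")
        result[ingredient] = amount
    return result

def recipes_containing(g: Set[Triple], ingredient: str) -> List[str]:
    """Recipes that use the given ingredient (via any helper node)."""
    helpers = {s for (s, p, o) in g if p == "ex:ingredient" and o == ingredient}
    return sorted({s for (s, p, o) in g if p == "ex:hasIngredient" and o in helpers})

print(ingredients_with_amounts(graph, "ex:Chutney"))
# {'ex:GreenMango': '450g', 'ex:CayennePepper': '1tsp'}
print(recipes_containing(graph, "ex:GreenMango"))
# ['ex:Chutney']
```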

Blank nodes

  • Blank nodes can be used for resources which don't need to be named
  • Can be read as existential statements


Turtle Syntax:

@prefix ex: <http://example.org/> .
ex:Chutney ex:hasIngredient _:id1 .
_:id1 ex:ingredient   ex:GreenMango;
      ex:amount        "450g" .
# -----------------------------------------------------------
# can be shortened:
ex:Chutney ex:hasIngredient  [ ex:ingredient ex:GreenMango;
                               ex:amount     "450g" ] .
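The square-bracket notation is only shorthand: a parser invents a fresh blank node label and emits the same triples as the long form. A minimal sketch of that expansion step; the label scheme _:b0, _:b1, ... is an arbitrary choice, since blank node labels only need to be unique within one document.

```python
import itertools
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]

_labels: Iterator[int] = itertools.count()

def fresh_blank_node() -> str:
    """Return a blank node label not used before in this run."""
    return f"_:b{next(_labels)}"

def expand_brackets(subject: str, predicate: str,
                    pairs: List[Tuple[str, str]]) -> List[Triple]:
    """Expand 'subject predicate [ p1 o1 ; p2 o2 ] .' into plain triples."""
    node = fresh_blank_node()
    return [(subject, predicate, node)] + [(node, p, o) for p, o in pairs]

for triple in expand_brackets("ex:Chutney", "ex:hasIngredient",
                              [("ex:ingredient", "ex:GreenMango"),
                               ("ex:amount", '"450g"')]):
    print(*triple)
# ex:Chutney ex:hasIngredient _:b0
# _:b0 ex:ingredient ex:GreenMango
# _:b0 ex:amount "450g"
```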

Lists

  • General data structures for enumerating arbitrarily many resources
     
  • Distinction between
     
    • Container: adding new elements possible
      ordered and unordered container types
       
    • Collections: ordered list; adding new elements impossible
       
  • Can be modeled with previously presented tools, so no additional expressiveness

Types of Container

  • The list root node is assigned one of the following rdf:types:
     
    • rdf:Seq
      • Interpretation as ordered list, sequence
         
    • rdf:Bag
      • Interpretation as unordered set
      • Order encoded in RDF is not relevant
         
    • rdf:Alt
      • Set of alternatives
      • Usually only one list element relevant
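Internally, a container is an ordinary resource whose members are attached with the numbered membership properties rdf:_1, rdf:_2, and so on. The sketch below generates these triples; the container and member names in the example are hypothetical.

```python
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, str]

def container_triples(node: str, container_type: str, members: List[str]) -> List[Triple]:
    """Describe a container of type rdf:Seq, rdf:Bag or rdf:Alt."""
    if container_type not in ("Seq", "Bag", "Alt"):
        raise ValueError(f"unknown container type: {container_type}")
    triples = [(node, RDF + "type", RDF + container_type)]
    for i, member in enumerate(members, start=1):
        triples.append((node, f"{RDF}_{i}", member))  # rdf:_1, rdf:_2, ...
    return triples

for t in container_triples("ex:Members", "Seq", ["ex:Alice", "ex:Bob", "ex:Carol"]):
    print(*t)
```

For an rdf:Bag the same numbered properties are used; only the interpretation changes, since the numbering then carries no meaning.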

Container

Collections

 Idea: recursive partition of list into a head element and (possibly empty) rest list

Turtle Syntax (Shortened Notation with brackets)
 
@prefix ex: <http://example.org/> .
ex:AKSW ex:groupLeaders (ex:Sören ex:Jens ex:Axel) .
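Behind the bracket notation, a collection is a chain of blank nodes linked by rdf:first (the head element) and rdf:rest (the remaining list), terminated by rdf:nil. A sketch that builds this chain and reads it back; the blank node labels _:l0, _:l1, ... are arbitrary.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def collection_triples(subject: str, predicate: str, items: List[str]) -> List[Triple]:
    """Expand 'subject predicate (i1 i2 ...) .' into rdf:first/rdf:rest triples."""
    nodes = [f"_:l{i}" for i in range(len(items))]
    triples = [(subject, predicate, nodes[0] if nodes else "rdf:nil")]
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(nodes) else "rdf:nil"
        triples.append((nodes[i], "rdf:first", item))
        triples.append((nodes[i], "rdf:rest", rest))
    return triples

def read_collection(triples: List[Triple], head: str) -> List[str]:
    """Walk the rdf:rest chain from head and collect the rdf:first values."""
    first: Dict[str, str] = {s: o for s, p, o in triples if p == "rdf:first"}
    rest: Dict[str, str] = {s: o for s, p, o in triples if p == "rdf:rest"}
    items = []
    while head != "rdf:nil":
        items.append(first[head])
        head = rest[head]
    return items

triples = collection_triples("ex:AKSW", "ex:groupLeaders", ["ex:Sören", "ex:Jens", "ex:Axel"])
for t in triples:
    print(*t)
print(read_collection(triples, triples[0][2]))  # ['ex:Sören', 'ex:Jens', 'ex:Axel']
```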

Summary

  • Extensively supported standard for storing and exchanging data
     
  • Enables almost syntax-independent representation of distributed information in a graph-based data model
     
  • Pure RDF is strongly oriented towards individuals (instance data)
     
  • Almost no means of representing schema information
    • See RDF-Schema lecture

News in RDF 1.1

  • W3C Recommendation as of February 2014
     
  • Previous versions of RDF used the term “RDF URI Reference”
    instead of “IRI” and allowed additional characters:
    <, >, {, }, |, \, ^, `, " (double quote), and “ ” (space).
     
  • In IRIs, these characters must be percent-encoded as described in
    section 2.1 of [RFC3986].
     
  • Literals with language tag now also have a datatype IRI

Tasks & mini projects

This slide contains some suggestions for tasks and mini projects you can complete in addition to the multiple-choice self-assessment test in order to practice and prepare for an exam:

  • Explain the components of the RDF data model!
     
  • Create a small knowledge base in Turtle describing a domain of your choice (e.g. your family)!
     
  • Write an RDF resource describing yourself in Turtle with labels in two different languages, your birthday, and age!
     
  • Draw an RDF graph representing a recipe for cupcakes!
     
  • Create an RDF list of European countries! 


Exercise 1 - Turtle

  1. Create a small knowledge base in Turtle describing a domain of your choice (e.g. your family)

Exercise 2 - RDF resource description & RDF graph

 

  1. Write an RDF resource description describing yourself in Turtle with labels in two different languages, your birthday and age! 
     
  2. Draw an RDF graph representing a cooking recipe (e.g. cupcakes)!

Create a small knowledge base in Turtle
describing a domain of your choice (e.g. your family) (i)

@prefix ex:   <http://example.org/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:FamDoe  rdf:type          ex:Family ;
           ex:hasMember      ex:JoeDoe , ex:MaryDoe , ex:ChrisDoe ;
           ex:livesIn        ex:Bonn ;
           ex:streetAddress  "Römerstraße"@de .

ex:JoeDoe  rdf:type  ex:Person , ex:Father ;
           ex:name   "Joe Doe" ;
           ex:age    51 .

Create a small knowledge base in Turtle describing a domain of your choice (e.g. your family) (ii)

ex:MaryDoe   rdf:type  ex:Person , ex:Mother ;
             ex:name   "Mary Doe" ;
             ex:age    50 .

ex:ChrisDoe  rdf:type  ex:Person , ex:Child ;
             ex:name   "Chris Doe" ;
             ex:age    20 .

 

Write an RDF resource description describing yourself in Turtle with labels in two different languages, your birthday and age

 

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://www.example.com/> .

ex:Person1  rdf:type      ex:Person ;
            ex:fullName   "John Doe"@en ;
            ex:fullName   "Giovanni Doe"@it ;
            ex:birthdate  "1945-08-14"^^xsd:date ;
            ex:age        "68"^^xsd:integer .

Draw an RDF graph representing a recipe for cupcakes

Exercise 3

 

 

Create an RDF list of European countries:

@prefix ex:  <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

ex:Europe rdf:type ex:Continent ;
          ex:countries (ex:Germany ex:Malta ex:Italy …) .

 

 

Semantic Data Web Stack – RDF

Layers, from top to bottom:
  • User Interface & Applications
  • Trust
  • Proof
  • Unifying Logic
  • Ontology: OWL | Rules: RIF | Query: SPARQL
  • RDF-Schema
  • Data Interchange: RDF
  • XML
  • URI | Unicode
  • Crypto (runs vertically alongside the layers)

Goals

  • Syntaxes for RDF graphs
  • Almost all of them support the abbreviation of long URIs.
  • Different serialization formats:
    • Notation 3 (N3)
    • RDF/XML
    • JSON-LD
    • RDFa
    • Turtle, N-Triples

Prerequisites

  • Basic knowledge of the RDF stack
  • Understand the RDF data model
  • Basic understanding of XML (for the RDF/XML serialization)
  • Basic understanding of HTML (for RDFa)
  • All namespace prefixes used in the following slides resolve to the respective defaults from http://prefix.cc/

What are URIs?

  • URI = Uniform Resource Identifier
  • Used for worldwide, unique identification of resources
  • Every object (in the context of the application) may be a resource
    • As long as it has a unique identity
    • E.g. books, places, people, relations between those things, abstract concepts
  • Unique Identifiers were already used for other and more specific domains, e.g. ISBN for books or tax identification numbers for people
  • Extension of the URL concept:
    • Not every URI belongs to a webpage, but often a URL is used as a URI for web pages

Syntax of URIs

  • In 1994, Tim Berners-Lee published RFC 1630 on URIs
    • Starts with the URI scheme
    • Protocol (e.g. http, ftp, mailto) and hierarchy separated by ':'
      • Query parameters can be appended using a leading '?'
      • Fragment identifiers can be appended using a leading '#'

      protocol ":"  hierarchy  [ "?" query ] [ "#" fragment]

Self-defined URIs

  • Needed if a resource has no URI yet
  • Possible strategy to avoid overlapping URIs
    • Use HTTP URIs of your own webspace!
    • It is also possible to publish documentation of the URI at this place
    • E.g. http://jens-lehmann.org/foaf.rdf#i (a person, not a document)

Other Identification Systems

  •  IRI = Internationalized Resource Identifier
    • Generalization of URI, can contain Unicode characters
    • E.g. http://www.example.org/Wüste
  • URN = Uniform Resource Name
    • Subset of URIs, used to identify resources by freely chosen names
    • Intended for worldwide unique and persistent identification
    • E.g. urn:issn:0167-6423 (URN of the journal Science of Computer Programming)
  • ISBN = International Standard Book Number
    • E.g. ISBN 978-3-86680-192-9
  • ISSN = International Standard Serial Number
    • E.g. ISSN 1234-5678
  • DOI = Digital Object Identifier
    • E.g. DOI 10.1000/182

Expressiveness of RDF formats

 

Most popular formats

  • Various serialization formats exist for different purposes (cf. the Venn diagram on the previous slide):
    • N-Triples – a text format focusing on simple parsing
    • Turtle – a text format focusing on human readability
    • Notation 3 (N3) – a text format with advanced features beyond RDF
    • RDF/XML – the official XML serialization of RDF
    • JSON-LD – the official JSON serialization of RDF (supersedes earlier alternative approaches, e.g. RDF/JSON)
    • RDFa – a mechanism for embedding RDF in (X)HTML

Notation 3 (N3) 1/2

  • Designed for human readability 
  • Formal language
    • additional formulas, rules and variables
  • developed by Tim Berners-Lee et al. as a W3C team submission http://www.w3.org/TeamSubmission/n3/
  • MIME type text/n3, UTF-8 encoding
XML Notation:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"   
         xmlns:dc="http://purl.org/dc/elements/1.1/">
   <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">  
      <dc:title>Tony Benn</dc:title>  
      <dc:publisher>Wikipedia</dc:publisher>
   </rdf:Description>
</rdf:RDF>

Notation 3 (N3) 2/2

  
XML Notation:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"   
         xmlns:dc="http://purl.org/dc/elements/1.1/">
   <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">  
      <dc:title>Tony Benn</dc:title>  
      <dc:publisher>Wikipedia</dc:publisher>
   </rdf:Description>
</rdf:RDF>

Notation 3:

@prefix dc: <http://purl.org/dc/elements/1.1/>.
<http://en.wikipedia.org/wiki/Tony_Benn> dc:title "Tony Benn"; 
                                         dc:publisher "Wikipedia".

Features of N3 (1/2)

  • N3 is a formal language that goes beyond RDF
  • N3 is a superset of SPARQL graph patterns, Turtle and N-Triples
  • N3 is more powerful than Turtle and RDF/XML with respect to its formalism
  • N3 is based on a context-free grammar, which makes it easy to parse

Features of N3 (2/2)

  • Some featured concepts are:
    • variable
    • formula
    • set of universal variables of F
    • set of existential variables of F
    • set of statements of F
    • datatypes: string, integer
    • list, list elements
    • length of list
    • expression
    • set
  • Further information:
    • http://www.w3.org/2000/10/swap/grammar/n3-report.html
    • http://www.w3.org/DesignIssues/Notation3.html

Syntax of N3

  • Triple format: Subject Predicate Object.
  • Everything has to be identified by a URI
    • including relative URIs such as <#fragment>, …
    • … which are relative to the base URI, …
    • … which defaults to the document's URL.
  • Exception: _:node is a blank node ID
  • Exception: Object can be a literal
<#pat> <#knowsAbout> <http://www.w3.org/2000/10/swap/Primer> .
<#pat> <#hasBrother> _:ian .
_:ian <#age> 24 .

Abbreviations

  • A URI relative to the base URI can even be empty (<> denotes the base URI itself):
<> <http://purl.org/dc/elements/1.1/title> "RDF Serializations".
  • One can use prefixes to shorten the text
@prefix dc:  <http://purl.org/dc/elements/1.1/> .
<> dc:title  "RDF Serializations".
  • NOTE: when you use a prefix, you use a colon instead of a hash/slash between dc and title, and you don't use the <angle brackets> 
  • If you have several statements about the same subject, you can use a semicolon (;) to introduce a new predicate or a comma (,) to introduce another object for the same predicate
<> <#subsections>  <#RDF/XML>, <#JSON>, <#RDFa> ;
   <#madeBy>    "slidewiki.org" ;
   <#creatorOfThisDeck> <http://www.informatik.uni-leipzig.de/~auer/foaf.rdf>.

Defining types

  • You can define your own classes on demand, within the data:
@prefix : <#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:Person rdf:type rdfs:Class .
  • Note: we defined an empty prefix.
  • We can even abbreviate rdf:type with the keyword a
@prefix : <#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:Person a rdfs:Class .

Equivalence of vocabularies

  • When writing own vocabularies one often notices that an own concept is the same as in another vocabulary
  • N3 has a special and short mechanism to align vocabularies, i.e. "=" (shorthand for owl:sameAs)
:Woman = foo:FemaleAdult .
:Title a rdf:Property; = dc:title .
  • So one can align classes and properties pretty easily

Examples of Features

First example:
@prefix log: <http://www.w3.org/2000/10/swap/log#>.
@keywords.
@forAll x, y, z. {x hasParent y. y hasSister z} log:implies {x hasAunt z}.
Given the following data:
Joe hasParent Alan.
Alan hasSister Susie.
A reasoner could conclude:
Joe hasAunt Susie.
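A reasoner applies such a rule by matching its premise against the data and adding the instantiated conclusion until nothing new can be derived. A minimal forward-chaining sketch in Python that hard-codes this single rule (general N3 reasoners handle arbitrary rules and formulas):

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

def apply_aunt_rule(facts: Set[Triple]) -> Set[Triple]:
    """Apply {x hasParent y. y hasSister z} => {x hasAunt z} until a fixpoint is reached."""
    facts = set(facts)  # work on a copy, leave the input unchanged
    while True:
        derived = {
            (x, "hasAunt", z)
            for (x, p1, y) in facts if p1 == "hasParent"
            for (y2, p2, z) in facts if p2 == "hasSister" and y2 == y
        }
        new = derived - facts
        if not new:  # nothing new: fixpoint reached
            return facts
        facts |= new

data = {("Joe", "hasParent", "Alan"), ("Alan", "hasSister", "Susie")}
print(apply_aunt_rule(data) - data)  # {('Joe', 'hasAunt', 'Susie')}
```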

Examples continued

Second example:
@forAll :x, :y, :z, :w.
{ 
  :x :wrote :y.
  :y log:includes { :z :weather :w } .
  :x :livesIn :z .
} log:implies {
  :z :weather :w .
}.
Together with the data:
:Bob   :livesIn  :Boston.
:Bob   :wrote    { :Boston :weather :sunny }.
:Alice :livesIn  :Adelaide.
:Alice :wrote   { :Boston :weather :cold }.
a reasoner could conclude:
:Boston :weather :sunny.

Advantages and Disadvantages of N3

  • Advantages
    • Much more compact and readable than XML-based RDF
    • Possibility of defining types, variables and even formulas
    • More powerful than Turtle and N-Triples
  • Disadvantages?

Turtle Syntax

  • Turtle – Terse RDF Triple Language (subset of N3)
  • URIs in angle brackets: <http://dbpedia.org/resource/Berlin>
  • Literals in quotes:
    • "Berlin"@de 
    • "51.333332"^^xsd:float
  • A triple is terminated by a dot:
  • <http://dbpedia.org/resource/Leipzig> <http://www.w3.org/2000/01/rdf-schema#label> "Leipzig"@de .
  • White spaces and line breaks are ignored outside of identifiers
  • Status:  W3C Recommendation 25 February 2014 http://www.w3.org/TR/turtle/

Turtle Abbreviations (1/2)

  •  In Turtle one can use abbreviations
    • Syntax: @prefix abbr ':'  <URI> .
    • E.g. @prefix dbr: <http://dbpedia.org/resource/> .
  • One can transform
  • into    

Turtle Abbreviations (2/2)

  • Triples with the same subject can be grouped together
    @prefix rdf: 
    ...
    @prefix geo: 
    
    dbr:Berlin  dbpedia:country  dbpedia:Germany ;
                rdfs:label       "Berlin"@de .
    
  • Even triples with the same subject and predicate can be grouped together
    @prefix dbr: 
    ...
    @prefix dbp: 
    
    dbr:Leipzig dbp:locatedIn dbr:Saxony, dbr:Germany;
                dbp:hasMayor  dbr:Burkhard_Jung .
    

Advantages and Disadvantages of Turtle

  •  Advantages:
    • Concise, thus efficient to store
    • Easy to read for humans
  • Disadvantages:
    • Limited tool support so far (compared to RDF/XML)

N-Triples

  • N-Triples is a line-based, plain text format (http://www.w3.org/TR/n-triples/)
  • N-Triples is a subset of Turtle and Notation 3
    • Abbreviations and grouping not allowed
    • Originally limited to the ASCII character set (RDF 1.1 N-Triples uses UTF-8)
  • Tools that accept Turtle or Notation 3 input can therefore also read N-Triples
  • Don't confuse it with Notation 3: Notation 3 is a superset of Turtle and N-Triples
<http://www.w3.org/2001/sw/RDFCore/ntriples/> <http://purl.org/dc/elements/1.1/creator> "Dave Beckett".
<http://www.w3.org/2001/sw/RDFCore/ntriples/> <http://purl.org/dc/elements/1.1/creator> "Art Barstow".
<http://www.w3.org/2001/sw/RDFCore/ntriples/> <http://purl.org/dc/elements/1.1/publisher> <http://www.w3.org/>.
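Because every line holds exactly one complete triple, N-Triples can be read line by line. The regular-expression sketch below is deliberately simplified (no escape sequences inside IRIs or literals, no validation of IRIs or language tags), so it illustrates the idea rather than being a conforming parser.

```python
import re
from typing import List, Tuple

IRI = r"<[^>]*>"
BNODE = r"_:\S+"
LITERAL = r'"[^"\\]*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
# subject: IRI or blank node; predicate: IRI; object: IRI, blank node or literal
LINE = re.compile(
    rf"^\s*({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})\s*\.\s*$"
)

def parse_ntriples(text: str) -> List[Tuple[str, str, str]]:
    """Parse one triple per non-empty, non-comment line (simplified)."""
    triples = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"cannot parse line: {line!r}")
        triples.append((m.group(1), m.group(2), m.group(3)))
    return triples

sample = """\
<http://www.w3.org/2001/sw/RDFCore/ntriples/> <http://purl.org/dc/elements/1.1/creator> "Dave Beckett".
<http://www.w3.org/2001/sw/RDFCore/ntriples/> <http://purl.org/dc/elements/1.1/creator> "Art Barstow".
<http://www.w3.org/2001/sw/RDFCore/ntriples/> <http://purl.org/dc/elements/1.1/publisher> <http://www.w3.org/>.
"""
for s, p, o in parse_ntriples(sample):
    print(p, o)
```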

N-Quads

  •  Extends N-Triples with context
<subject> <predicate> <object> <context> .
  • <context> may denote (in state-of-the-art RDF Stores) the provenance of data
    • useful when linking datasets
<http://example.org/bob/foaf.rdf#me> <http://xmlns.com/foaf/0.1/homepage> <http://example.org/bob/> <http://example.org/bob/foaf.rdf> .
  • <context> can be a URI or a blank node (RDF 1.1 N-Quads does not allow literals here)
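Since the fourth component names the graph a triple belongs to, a store can keep track of where each triple came from. A sketch that groups quads by their context; the second quad and its source graph are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Quad = Tuple[str, str, str, str]
Triple = Tuple[str, str, str]

def group_by_context(quads: Iterable[Quad]) -> Dict[str, Set[Triple]]:
    """Split a list of quads into one set of triples per context (graph name)."""
    graphs: Dict[str, Set[Triple]] = defaultdict(set)
    for s, p, o, g in quads:
        graphs[g].add((s, p, o))
    return dict(graphs)

quads = [
    ("<http://example.org/bob/foaf.rdf#me>", "<http://xmlns.com/foaf/0.1/homepage>",
     "<http://example.org/bob/>", "<http://example.org/bob/foaf.rdf>"),
    # Hypothetical: the same person described in a second source
    ("<http://example.org/bob/foaf.rdf#me>", "<http://xmlns.com/foaf/0.1/name>",
     '"Bob"', "<http://example.org/alice/friends.rdf>"),
]
for context, triples in group_by_context(quads).items():
    print(context, len(triples))
```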

Why one should (not) use XML for RDF?

Why?

  • Better support of tools in many programming languages and environments
  • Wide spread of XML in both business and academia
  • RDF standard states that if  RDF data is published it should be available in RDF/XML http://www.w3.org/RDF/

Why not?

  • RDF/XML is complicated to understand due to the encoding of a graph in triples and finally in an XML tree
  • RDF/XML blows up files (might be mitigated by compression)
  • generates much overhead, since XML documents have to be parsed and the results additionally processed to obtain the RDF data
 

XML Syntax of RDF

  • Usage of XML namespaces to disambiguate tag names 
  • RDF tags have their own fixed namespace, usually abbreviated using prefix rdf
 
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns:dbp="http://dbpedia.org/property/"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">

  <rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig">
    <dbp:hasMayor rdf:resource="http://dbpedia.org/resource/Burkhard_Jung"/>
    <rdfs:label xml:lang="de">Leipzig</rdfs:label>
    <geo:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#float">51.3333</geo:lat>
    <geo:long rdf:datatype="http://www.w3.org/2001/XMLSchema#float">12.3833</geo:long>
  </rdf:Description>
</rdf:RDF>
 

XML Syntax: rdf:Description

  • Each rdf:Description element stands for a subject
    • URI is the value of rdf:about attribute
  • Each child element of rdf:Description stands for a predicate-object pair
    • Name of the child element is the predicate name
    • Value of rdf:resource is the URI of the object
 

<rdf:Description
rdf:about="http://dbpedia.org/resource/Leipzig">
<rdfs:label xml:lang="de">Leipzig</rdfs:label>
</rdf:Description>
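In this basic form, each rdf:Description yields one triple per child element. The sketch below extracts such triples with Python's xml.etree.ElementTree and prints them in N-Triples style. It assumes every description carries rdf:about and only covers flat descriptions with rdf:resource, xml:lang and rdf:datatype, not nesting, rdf:ID, xml:base or the other RDF/XML abbreviations; the input is a trimmed version of the Leipzig example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

def rdf(local: str) -> str:
    """ElementTree's '{namespace}local' form of an rdf: name."""
    return f"{{{RDF_NS}}}{local}"

def description_triples(rdfxml: str) -> List[Tuple[str, str, str]]:
    """Turn flat rdf:Description elements into N-Triples-style terms (simplified)."""
    root = ET.fromstring(rdfxml)
    triples = []
    for desc in root.findall(rdf("Description")):
        subject = "<" + desc.get(rdf("about")) + ">"  # assumes rdf:about is present
        for prop in desc:
            # Predicate URI = namespace URI + local element name
            namespace, _, local = prop.tag[1:].partition("}")
            predicate = "<" + namespace + local + ">"
            resource = prop.get(rdf("resource"))
            lang = prop.get(f"{{{XML_NS}}}lang")
            datatype = prop.get(rdf("datatype"))
            text = prop.text or ""  # note: quotes inside the text are not escaped here
            if resource is not None:
                obj = "<" + resource + ">"
            elif lang is not None:
                obj = f'"{text}"@{lang}'
            elif datatype is not None:
                obj = f'"{text}"^^<{datatype}>'
            else:
                obj = f'"{text}"'
            triples.append((subject, predicate, obj))
    return triples

doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dbp="http://dbpedia.org/property/"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig">
    <rdfs:label xml:lang="de">Leipzig</rdfs:label>
    <dbp:hasMayor rdf:resource="http://dbpedia.org/resource/Burkhard_Jung"/>
    <geo:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#float">51.3333</geo:lat>
  </rdf:Description>
</rdf:RDF>"""

for s, p, o in description_triples(doc):
    print(s, p, o, ".")
```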
 

XML Syntax: Abbreviations

  • Literals can be given as free-form text inside property elements
  • One subject can contain several property elements
  • The object of a property element can itself be described by a nested rdf:Description, e.g.
 
<rdf:Description
	rdf:about="http://dbpedia.org/resource/Leipzig">
	<dbp:name>Leipzig</dbp:name>
	<dbp:hasMayor>
		<rdf:Description
			rdf:about="http://dbpedia.org/resource/Burkhard_Jung">
			<dbp:name>Burkhard Jung</dbp:name>
		</rdf:Description>
	</dbp:hasMayor>
	<geo:lat rdf:datatype=".../XMLSchema#float">51.3333</geo:lat>
	<geo:long rdf:datatype=".../XMLSchema#float">12.3833</geo:long>
</rdf:Description>

XML Syntax: Attributes for Literals

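  • Literals can be expressed using XML attributes
  • Attribute names are interpreted as property URIs
  • Attribute values become literals without a datatype; typed literals still need property elements
  • The subject URI is given by rdf:about
<rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig"
          dbp:name="Leipzig">
          <geo:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#float">51.3333</geo:lat>
          <geo:long rdf:datatype="http://www.w3.org/2001/XMLSchema#float">12.3833</geo:long>
</rdf:Description>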

XML Syntax: Base URIs

  • Definition of a base URI, against which relative URIs are resolved
  • Relative URIs have no scheme part (such as http:)
  • Resolution is often string concatenation (base + relative), but there are more complicated cases (see RFC 3986)
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:property="http://dbpedia.org/property/"
    xml:base="http://dbpedia.org/resource/" >
  <rdf:Description rdf:about="Berlin">
    <property:country rdf:resource="Germany"/>
  </rdf:Description>
</rdf:RDF>
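Python's urllib.parse.urljoin follows RFC 3986 for the cases below, so it is a convenient way to check how relative URIs such as rdf:about="Berlin" resolve against an xml:base. A small sketch; the extra relative references are illustrative only:

```python
from urllib.parse import urljoin

base = "http://dbpedia.org/resource/"

# Simple case: resolution behaves like concatenation.
print(urljoin(base, "Berlin"))              # http://dbpedia.org/resource/Berlin
# Dot segments are removed, so this is not plain concatenation.
print(urljoin(base, "../ontology/Place"))   # http://dbpedia.org/ontology/Place
# Without a trailing slash, the last path segment of the base is replaced.
print(urljoin("http://dbpedia.org/resource", "Berlin"))   # http://dbpedia.org/Berlin
# A fragment-only reference keeps the whole base path.
print(urljoin("http://example.org/doc", "#x"))            # http://example.org/doc#x
```

The third case is the usual pitfall: a base URI without a trailing slash silently changes where relative URIs point.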

Advantages and Disadvantages of RDF/XML

  • Advantages:
    • Good tool support
    • Reuse of XML transformation tools via XSLT
    • Parsing via SAX and in-memory representation via DOM
  • Disadvantages:
    • Very long and hard to read

RDFa Syntax

  • RDFa = RDF in attributes
  • Developed to embed RDF into HTML and XML
  • Embedded triples can be accessed or extracted
  • IRIs can be used
    (XML and HTML are nowadays typically encoded in UTF-8)
  • RDFa = Microformats done right ;-)

Motivation

 presentation vs. semantics

On the left, what browsers see. On the right, what humans see. Can we bridge the gap so that browsers see more of what we see?

The schema.org Vocabulary

schema.org :

Example: Movie description.

What the user sees:
Avatar
Director: James Cameron (born August 16, 1954)
Science fiction
Trailer

Movie homepage in HTML

What the browser sees:

 
<div class="movie">
	<h1>Avatar</h1>
	<div class="director">
		Director: James Cameron (born August 16, 1954)
	</div>
	<span class="genre">Science fiction</span>
	<a href="../movies/avatar-theatrical-trailer.html">Trailer</a>
</div>

Movie homepage with RDFa

RDFa annotations using the schema.org vocabulary:

 
<div vocab="http://schema.org/" typeof="Movie">

	<h1 property="name">Avatar</h1>

	<div rel="director" typeof="Person">
                Director: <span property="name">James Cameron</span>

		(born <span property="birthDate" content="1954-08-16"> August 16, 1954</span>)
	</div>

	<span property="genre" xml:lang="en">Science fiction</span>

	<a href="../movies/avatar-theatrical-trailer.mp4" rel="trailer"> Trailer</a>
</div>

RDFa Visualised as a Graph

You can do this interactively with RDFa Play.


Social Data with schema.org

Example: reviews of a movie

schema.org in a Search Engine

Google calls these “Rich Snippets”.

  • Observe the rich appearance of the 3rd result.
  • Note: Annotation does not influence ranking.

CURIEs (1/2)

Short notation for URIs (Compact URIs)

prefix    ::= NCName

reference ::= ( ipath-absolute / ipath-rootless / ipath-empty ) [ "?" iquery ] [ "#" ifragment ]   (as defined in [RFC3987])

curie     ::=  [ [ prefix ] ':' ] reference

 

CURIEs (2/2)

RDFa requires the following context information:

  • the set of mappings from prefixes to URIs is provided by the current in-scope prefix declarations of the [current element] during parsing;
  • the mapping to use with the default prefix (e.g. :name) is the current default prefix mapping;
  • the mapping to use when there is no prefix is not defined for RDFa in HTML, which effectively prohibits the use of CURIEs that do not contain a colon
    • (however with default terms and a default vocabulary there are further convenience mechanisms)
  • the mapping to use with the '_' prefix is not explicitly stated, but since it is used to generate [bnode]s, its implementation needs to be compatible with the RDF definition.
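The resolution rules above can be sketched as a small Python function. It is a simplified illustration (the function name and argument shapes are hypothetical) and ignores the RDFa initial context, safe CURIEs in square brackets and other details of the RDFa 1.1 processing rules.

```python
def expand_curie(value, prefixes, vocab=None):
    """Simplified sketch of RDFa 1.1 CURIE/term resolution.

    prefixes: in-scope prefix mappings, e.g. {"dbr": "http://dbpedia.org/resource/"};
              the key "" holds the default prefix mapping (":name").
    vocab:    value of an in-scope @vocab attribute, used for bare terms.
    """
    if value.startswith("_:"):
        return value                      # blank node identifier, kept as-is
    if ":" in value:
        prefix, reference = value.split(":", 1)
        if prefix in prefixes:
            return prefixes[prefix] + reference
        return value                      # unknown prefix: treat as an absolute IRI
    if vocab is not None:
        return vocab + value              # bare term resolved against @vocab
    raise ValueError(f"cannot resolve {value!r}: no prefix and no @vocab in scope")

prefixes = {"dbr": "http://dbpedia.org/resource/",
            "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#"}
print(expand_curie("dbr:Leipzig", prefixes))
print(expand_curie("name", prefixes, vocab="http://schema.org/"))
```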

RDFa Attributes (1/3)

Some of the following attributes are reused from HTML; in particular, href and src are redundant with resource but convenient in HTML.

  • about a CURIE or IRI, used for stating what the data is about (a subject in RDF terminology);
  • content a CDATA string, for supplying machine-readable content for a literal (a literal object, in RDF terminology);
  • datatype a term or CURIE or absolute IRI representing a datatype, to express the datatype of a literal;
  • href a traditionally navigable IRI for expressing the partner resource of a relationship (a resource object, in RDF terminology);
  • inlist An attribute used to indicate that the object associated with a rel or property attribute on the same element is to be added to the list for that predicate. The value of this attribute must be ignored. Presence of this attribute causes a list to be created if it does not already exist.

RDFa Attributes (2/3)

  • prefix a white space separated list of prefix-name IRI pairs of the form NCName ':' ' '+ xsd:anyURI
  • property a white space separated list of terms or CURIEs or absolute IRIs, used for expressing relationships between a subject and either a resource object if given or some literal text (also a predicate);
  • rel a white space separated list of terms or CURIEs or absolute IRIs, used for expressing relationships between two resources (predicates in RDF terminology);
  • resource a CURIE or IRI for expressing the partner resource of a relationship that is not intended to be navigable (e.g., a “clickable” link) (also an object);
  • rev a white space separated list of terms or CURIEs or absolute IRIs, used for expressing reverse relationships between two resources (also predicates);

RDFa Attributes (3/3)

  • src an IRI for expressing the partner resource of a relationship when the resource is embedded (also a resource object);
  • typeof a white space separated list of terms or CURIEs or absolute IRIs that indicate the RDF type(s) to associate with a subject;
  • vocab an IRI that defines the mapping to use when a term is referenced in an attribute value.

RDFa Example

<html prefix="rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
	      rdfs: http://www.w3.org/2000/01/rdf-schema#
              xsd: http://www.w3.org/2001/XMLSchema#
              dbr: http://dbpedia.org/resource/
              geo: http://www.w3.org/2003/01/geo/wgs84_pos#">

<head>
	<title>Leipzig</title>
</head>

<body about="dbr:Leipzig">

	<h1 property="rdfs:label" xml:lang="de">Leipzig</h1>

	<p>
		Leipzig is a city in Germany. It is located at latitude
        	<span property="geo:lat" datatype="xsd:float">51.3333</span>
			and longitude
		<span property="geo:long" datatype="xsd:float">12.3833</span>.
	</p>
</body>
</html>
 

RDFa Lite

<p vocab="http://schema.org/" typeof="Person">
	
        My name is
    
	<span property="name">Sören Auer</span>
    
	and you can give me a ring via
	<span property="telephone">49-341-97-32367</span>
    
	or visit
	<a property="url" href="http://aksw.org/SoerenAuer.html"> my homepage</a>.
</p>
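To make the roles of vocab, typeof and property concrete, here is a heavily simplified RDFa Lite reader built on Python's html.parser. It is a hypothetical sketch, not a conforming RDFa processor: it assumes a single vocab, a single typeof subject (given the made-up blank node label _:b0) and no markup nested inside elements that carry property.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

class RDFaLiteSketch(HTMLParser):
    """Hypothetical, simplified RDFa Lite reader (see assumptions above)."""

    def __init__(self):
        super().__init__()
        self.vocab = ""
        self.subject = None
        self.pending = None      # (predicate IRI, tag) of an open property element
        self.text = []
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "vocab" in a:
            self.vocab = a["vocab"]
        if "typeof" in a:
            self.subject = "_:b0"
            self.triples.append((self.subject, RDF_TYPE, self.vocab + a["typeof"]))
        if "property" in a and self.subject is not None:
            predicate = self.vocab + a["property"]
            if "content" in a:               # machine-readable value takes precedence
                self.triples.append((self.subject, predicate, a["content"]))
            elif "href" in a:                # a link: the object is an IRI
                self.triples.append((self.subject, predicate, a["href"]))
            else:                            # otherwise use the element's text
                self.pending = (predicate, tag)
                self.text = []

    def handle_data(self, data):
        if self.pending is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.pending is not None and tag == self.pending[1]:
            self.triples.append((self.subject, self.pending[0], "".join(self.text)))
            self.pending = None

html_doc = """<p vocab="http://schema.org/" typeof="Person">
  My name is <span property="name">Sören Auer</span>
  and you can give me a ring via <span property="telephone">49-341-97-32367</span>
  or visit <a property="url" href="http://aksw.org/SoerenAuer.html">my homepage</a>.
</p>"""

parser = RDFaLiteSketch()
parser.feed(html_doc)
parser.close()
for triple in parser.triples:
    print(triple)
```

Text outside property elements ("My name is", "or visit") produces no triples, which is the point of RDFa: the human-readable prose and the machine-readable data share one document.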
 

Advantages and Disadvantages of RDFa (1/2)

  • Advantages
    • Integrated handling of human (HTML) and machine (RDF) representation
    • Re-uses a number of HTML features
    • "principles of interoperable metadata" are enforced by RDFa:
      • Publisher Independence – every web site can use its own representation
      • Data Reuse - data is not duplicated. Separate RDF and HTML sections are not required anymore for the same content.
      • Self Containment - RDF data is nevertheless separated from the HTML (sitting in special attributes, can be extracted)
      • Schema Modularity - attributes are reusable
      • Evolvability - additional fields can be added and XML transforms can extract the semantics of the data from an HTML file

Advantages and Disadvantages of RDFa (2/2)

 
  • Disadvantages
    • Readability is lower than with Turtle
    • Backwards compatibility with RDFa 1.0 adds some overhead.

JSON-LD

  • JSON = JavaScript Object Notation
  • JSON syntax is derived from JavaScript object literals; JSON data used to be interpreted via eval(), nowadays JSON.parse() is the safe way
  • Parsers also exist for most other programming languages
  • JSON-LD = JSON for Linking Data
  • W3C Recommendation of 2014
  • Plain JSON example:
{
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
}
  • How can we enrich this with semantics?
  • JSON-LD answer: add an RDF context!  

JSON-LD Contexts (1/2)

  • @context is a special keyword to make explicit the semantic context in which some JSON data is communicated.
  • The context includes, e.g., name-to-IRI mappings.
  • The @id keyword assigns IRIs to things.
{ "@context":
  { "name": "http://schema.org/name",
    "born": { "@id": "http://schema.org/birthDate",
              "@type": "http://www.w3.org/2001/XMLSchema#date" },
    "spouse": { "@id": "http://schema.org/spouse",
                "@type": "@id" }
  },
  "@id": "http://dbpedia.org/resource/John_Lennon",
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon" }

JSON-LD Contexts (2/2)

  • Note that the rest of the JSON is unchanged.
  • It is possible to centralise context definitions in external locations and point to them:
{
     "@context": "http://json-ld.org/contexts/person.jsonld",
     "@id": "http://dbpedia.org/resource/John_Lennon",
     "name": "John Lennon", 
     ...
 }
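The effect of an inline context, as in the first John Lennon example, can be illustrated by turning the document into N-Triples by hand. The Python sketch below assumes a single node object, an inline context and plain string values; the full procedure is defined by the JSON-LD Processing Algorithms (expansion and RDF conversion), which this does not implement. Keys without a context mapping are dropped, as in JSON-LD expansion.

```python
import json

# The first John Lennon example, with its inline context.
doc = json.loads("""
{ "@context": {
    "name": "http://schema.org/name",
    "born": { "@id": "http://schema.org/birthDate",
              "@type": "http://www.w3.org/2001/XMLSchema#date" },
    "spouse": { "@id": "http://schema.org/spouse", "@type": "@id" } },
  "@id": "http://dbpedia.org/resource/John_Lennon",
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon" }
""")

def to_ntriples(node):
    """Simplified: one node object, inline context, plain string values only
    (no escaping, no nested objects, no arrays, no @language)."""
    context = node["@context"]
    subject = node["@id"]
    lines = []
    for key, value in node.items():
        if key.startswith("@") or key not in context:
            continue  # keywords, and keys without a mapping, yield no triple
        term = context[key]
        if isinstance(term, str):
            iri, value_type = term, None
        else:
            iri, value_type = term["@id"], term.get("@type")
        if value_type == "@id":
            obj = f"<{value}>"                      # value is an IRI
        elif value_type is not None:
            obj = f'"{value}"^^<{value_type}>'      # typed literal
        else:
            obj = f'"{value}"'                      # plain string literal
        lines.append(f"<{subject}> <{iri}> {obj} .")
    return lines

for line in to_ntriples(doc):
    print(line)
```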

RDF/JSON

{
"http://dbpedia.org/resource/Leipzig" :
   { "http://dbpedia.org/property/hasMayor":
	[ { "type":"uri", "value":"http://dbpedia.org/resource/Burkhard_Jung" } ],

	"http://www.w3.org/2000/01/rdf-schema#label":
	[ { "type":"literal", "value":"Leipzig", "lang":"en" } ],

	"http://www.w3.org/2003/01/geo/wgs84_pos#lat":
	[ { "type":"literal", "value":"51.3333",
	    "datatype":"http://www.w3.org/2001/XMLSchema#float" } ],

	"http://www.w3.org/2003/01/geo/wgs84_pos#long":
	[ { "type":"literal", "value":"12.3833",
	    "datatype":"http://www.w3.org/2001/XMLSchema#float" } ]
  }
}
 

RDF/JSON Syntax

  • RDF/JSON nests the objects O of each predicate P under their subject S:

    { "S" : { "P" : [ O ] } }

  • type: one of "uri", "literal" or "bnode"; mandatory and written in lower case
  • value: the value of the object (a URI, a blank node identifier or the lexical form of a literal)
    • Best practice: render the whole URI
  • lang: language tag of a literal. Optional, but if present it must not be empty
  • datatype: datatype URI of a literal, optional.
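Building this { "S" : { "P" : [ O ] } } shape from a list of triples is a two-level dictionary build. A minimal Python sketch, assuming the objects are already given as RDF/JSON object dictionaries (the function name is hypothetical):

```python
import json

def to_rdf_json(triples):
    """Group (subject, predicate, object) triples into the RDF/JSON shape
    { "S": { "P": [ O, ... ] } }. Each object is an RDF/JSON object dict with
    "type" ("uri", "literal" or "bnode") and "value", plus optional
    "lang" or "datatype"."""
    result = {}
    for s, p, o in triples:
        result.setdefault(s, {}).setdefault(p, []).append(o)
    return result

leipzig = "http://dbpedia.org/resource/Leipzig"
triples = [
    (leipzig, "http://dbpedia.org/property/hasMayor",
     {"type": "uri", "value": "http://dbpedia.org/resource/Burkhard_Jung"}),
    (leipzig, "http://www.w3.org/2000/01/rdf-schema#label",
     {"type": "literal", "value": "Leipzig", "lang": "en"}),
    (leipzig, "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
     {"type": "literal", "value": "51.3333",
      "datatype": "http://www.w3.org/2001/XMLSchema#float"}),
]
print(json.dumps(to_rdf_json(triples), indent=2))
```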

Advantages and Disadvantages of JSON-LD

  • Advantages:
    • Compact data format to exchange data between applications
    • Very good tool support (almost every programming language supports JSON)
    • Less parsing and serialization overhead than XML
  • Disadvantages
    • RDF structures that go beyond key/value pairs (i.e. property/object pairs attached to a given subject) are not as easy to read for humans as in Turtle

IRI Serialization (1/2)

  • IRIs are a generalization of URIs that allows Unicode characters.
    • Defined in RFC 3987
  • Only the following formats are fully compatible with the IRI RFC:
    • RDFa
    • Notation 3
    • JSON-LD (and the obsolete RDF/JSON)
  • N-Triples & N-Quads, as originally defined, do not support IRIs
    • Both use 7-bit US-ASCII character encoding
    • http://www.w3.org/TR/rdf-testcases/#character
    • (The RDF 1.1 versions of N-Triples and N-Quads use UTF-8 and do allow IRIs.)
       

IRI Serialization (2/2)

  • RDF/XML & Turtle provide partial IRI support
    • Their grammar is not fully mapped to the IRI grammar
  • In RDF/XML, predicates must be written as XML element (or attribute) names, so not every IRI can be used as a predicate
  • Turtle is fully compatible only when using absolute IRIs

Syntax and Usage of data types

  • Difference between lexical space and value space
    • Lexical: "3.14", "+04.1300", "-2,5"
    • Value: 3.14, 4.13, -2.5
  • Untyped literals are treated as character strings
    • "02" < "100" < "11" < "2" (lexicographic order)
  • Typing allows values to be handled 'semantically'
    • Datatypes are identified by URIs
    • Syntax: "VALUE"^^datatype URI
    • In principle, any URI can be used as a datatype
  • Most commonly, XML Schema datatypes (XSD) are used
    • More complex values should be modelled using additional RDF properties
    • Example: "2,718 km" (a number combined with a unit)
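The difference between lexical and value space shows up directly when sorting. In this Python sketch, decimal.Decimal stands in for the value space of a numeric datatype (an assumption for illustration; it is not an XSD implementation):

```python
from decimal import Decimal

literals = ["2", "100", "02", "11"]

# Untyped: compared as character strings (lexicographic order).
print(sorted(literals))                # ['02', '100', '11', '2']

# Typed as a number: compared by value; "2" and "02" denote the same value.
print(sorted(literals, key=Decimal))   # ['2', '02', '11', '100']

# Different lexical forms, same value.
print(Decimal("+04.1300") == Decimal("4.13"))   # True
```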

Example for data type usage

Turtle:
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
dbr:Leipzig    geo:lat     "51.333332"^^xsd:float ,
               geo:long    "12.383333"^^xsd:float .
XML:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
<rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig">
   <geo:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#float">51.333332</geo:lat>
   <geo:long rdf:datatype="http://www.w3.org/2001/XMLSchema#float">12.383333</geo:long>
</rdf:Description>
</rdf:RDF>

Predefined Data Type

  • Besides rdf:langString (the datatype of language-tagged strings), rdf:HTML and rdf:XMLLiteral are the only datatypes predefined by RDF
  • They are used for arbitrary but well-balanced XML/HTML fragments

RDF/XML has the following special syntax for the rdf:XMLLiteral datatype:

<rdf:Description rdf:about="http://example.org/SemanticWeb">
<ex:Titel rdf:parseType="Literal">
  <b>Semantic Web</b>
  <br />
  Grundlagen
</ex:Titel>
</rdf:Description>

Language definitions (1/2)

Language tags can only be attached to literals without an explicit datatype

XML:

<rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig">
   <rdfs:label xml:lang="de">Leipzig</rdfs:label>
   <rdfs:label xml:lang="ru">Лейпциг</rdfs:label>
</rdf:Description>
Turtle:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://dbpedia.org/resource/Leipzig>
             rdfs:label     "Leipzig"@de ,
                            "Лейпциг"@ru .

Language definitions (2/2)

According to the RDF 1.1 specification, the 2nd literal has a different datatype (rdf:langString) from the 1st and 3rd (xsd:string), although implementations often handle them similarly. The 1st and 3rd are the same literal, because in RDF 1.1 a literal without datatype or language tag is shorthand for an xsd:string literal (in RDF 1.0 they were distinct).

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbr: <http://dbpedia.org/resource/> .
dbr:Leipzig    rdfs:label     "Leipzig" ,
                              "Leipzig"@de ,
                              "Leipzig"^^xsd:string .

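The RDF 1.1 rule can be sketched by normalising every literal to a (lexical form, datatype IRI, language tag) triple. The function name is hypothetical; language tags are lower-cased because they are case-insensitive (BCP 47).

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

def rdf11_literal(lexical, datatype=None, lang=None):
    """Normalise to (lexical form, datatype IRI, language tag) as in RDF 1.1."""
    if lang is not None:
        return (lexical, RDF_LANG_STRING, lang.lower())  # language-tagged string
    return (lexical, datatype or XSD_STRING, None)       # no datatype means xsd:string

first = rdf11_literal("Leipzig")                        # "Leipzig"
second = rdf11_literal("Leipzig", lang="de")            # "Leipzig"@de
third = rdf11_literal("Leipzig", datatype=XSD_STRING)   # "Leipzig"^^xsd:string

print(first == third)    # True
print(first == second)   # False
```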

Exercise 1 - dataset modeling

  1. Think of a little dataset that models a person with his/her personal information, such as contact information and a link to a homepage.

  2. Write it down in:
    1. RDF/XML
    2. JSON-LD
    3. Turtle
    4. N-Triples (i.e. no abbreviations)

Exercise 2

  1. Think of a little dataset that models something from your daily life
    1. It has to have at least 5 resources (with types from two different vocabularies) and 3 literals
  2. Write it down in:
    1. RDF/XML
    2. JSON-LD
    3. Turtle
    4. N-Triples (i.e. no abbreviations)
  3. Count the number of characters you need, excluding whitespace
  4. Try to compress your dataset as much as you can and tell us about your compression factor (baseline: N-Triples)
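For the counting and compression steps, a small Python helper may be handy. This sketch assumes whitespace is whatever str.isspace() reports and uses gzip as one possible compressor; the function names and the two-triple dataset are made up for illustration.

```python
import gzip

def char_count(text):
    """Number of characters, not counting whitespace (as defined by str.isspace)."""
    return sum(1 for ch in text if not ch.isspace())

def gzip_size(text):
    """Size in bytes of the UTF-8 encoded text after gzip compression."""
    return len(gzip.compress(text.encode("utf-8")))

def compression_factor(size, baseline_size):
    """How many times smaller than the baseline; values above 1 mean smaller."""
    return baseline_size / size

# Hypothetical two-triple dataset; N-Triples is the baseline.
ntriples = (
    '<http://example.org/family#John> <http://xmlns.com/foaf/0.1/name> "John" .\n'
    '<http://example.org/family#Jenny> <http://xmlns.com/foaf/0.1/name> "Jenny" .\n'
)
turtle = (
    "@prefix ex: <http://example.org/family#> .\n"
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
    'ex:John foaf:name "John" .\n'
    'ex:Jenny foaf:name "Jenny" .\n'
)

baseline = char_count(ntriples)
for label, text in [("N-Triples", ntriples), ("Turtle", turtle)]:
    n = char_count(text)
    print(f"{label}: {n} chars, {gzip_size(text)} bytes gzipped, "
          f"factor {compression_factor(n, baseline):.2f}")
```

For tiny datasets the prefix declarations can cost almost as much as they save; the gains of Turtle grow with the number of triples that reuse a namespace.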

Create a small dataset modelling some domain you know well (e.g. family relationships).

 
Here is a possible vocabulary:
  • In the lecture on RDFS we'll see how this can be modelled more formally.
  • Note: The top 100 namespaces can be found at: http://goo.gl/fU8JNS

Populate the dataset with at least 3 instances

Write it down in: RDF/XML

<?xml version="1.0" encoding="utf-8"?>

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
	     xmlns:ex="http://example.org/family#"
	     xmlns:foaf="http://xmlns.com/foaf/0.1/">

	<rdf:Description
		rdf:about="http://example.org/family#DoeFamily">

		<rdf:type rdf:resource="http://example.org/family#Family"/>

		<ex:familyName
			rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Doe</ex:familyName>

		<ex:streetAddress
			rdf:datatype="http://www.w3.org/2001/XMLSchema#string">High Street</ex:streetAddress>

		<ex:hasNeighbour rdf:resource="http://example.org/family#SmithFamily"/>
		<ex:livesIn rdf:resource="http://dbpedia.org/resource/Bonn"/>

		<ex:hasMember>
			<foaf:Person rdf:about="http://example.org/family#John"/>
		</ex:hasMember>

		<ex:hasMember>
			<foaf:Person rdf:about="http://example.org/family#Jenny"/>
		</ex:hasMember>

	</rdf:Description>

	<rdf:Description
		rdf:about="http://dbpedia.org/resource/Bonn">
		<rdf:type rdf:resource="http://dbpedia.org/ontology/Place"/>
	</rdf:Description>
  </rdf:RDF>
 

Write it down in: JSON-LD

Write it down in: Turtle

 

Write it down in: N-Triples (no abbreviations)

<http://example.org/family#DoeFamily> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/family#Family> .
<http://example.org/family#DoeFamily> <http://example.org/family#familyName> "Doe"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://example.org/family#DoeFamily> <http://example.org/family#streetAddress> "High Street"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://example.org/family#DoeFamily> <http://example.org/family#hasNeighbour> <http://example.org/family#SmithFamily> .
<http://example.org/family#DoeFamily> <http://example.org/family#livesIn> <http://dbpedia.org/resource/Bonn> .
<http://example.org/family#DoeFamily> <http://example.org/family#hasMember> <http://example.org/family#John> .
<http://example.org/family#DoeFamily> <http://example.org/family#hasMember> <http://example.org/family#Jenny> .
<http://dbpedia.org/resource/Bonn> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Place> .
<http://example.org/family#John> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/family#Jenny> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .

Number of characters needed, excluding whitespace

  • RDF/XML – 776 characters
  • JSON-LD – 515 characters
  • Turtle – 380 characters
  • N-Triples – 878 characters

Compressing Ontology

@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rel: <http://example.org/family#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
rel:Family a rdfs:Class .
rel:familyName a rdf:Property ;
    rdfs:domain rel:Family ;
    rdfs:range xsd:string .
rel:hasMember a rdf:Property ;
    rdfs:domain rel:Family ;
    rdfs:range foaf:Person .
rel:hasNeighbour a rdf:Property ;
    rdfs:domain rel:Family ;
    rdfs:range rel:Family .
rel:livesIn a rdf:Property ;
    rdfs:domain rel:Family ;
    rdfs:range dbo:Place .
rel:streetAddress a rdf:Property ;
    rdfs:domain rel:Family ;
    rdfs:range xsd:string .

 

Linked Data Stack – RDFS

(Figure: the Linked Data stack, with the layers URI/Unicode, XML, Data Interchange: RDF, RDF Schema, Query: SPARQL, Ontology: OWL, Rules: RIF, Unifying Logic, Proof, Trust, Crypto and User Interface & Applications)

Goals

  • Understand for which use cases RDF Schema is suited
  • Understand the semantics of RDF Schema
    • Be able to read RDF Schema
    • Be able to create your own RDF Schema
  • Know its limitations

Prerequisites

  • Basic knowledge of RDF stack
  • Know the RDF data model

What is RDF Schema?

We can use RDF triples to express facts like:

 ex:AlbertEinstein   ex:discovered   ex:TheoryOfRelativity .

But how can we refine such a fact?

  • How can we define that the predicate ex:discovered has a person as subject and a theory as object?
  • How can we express that Albert Einstein was a researcher and that every researcher is a human?

Such knowledge is called schema knowledge or terminological knowledge

RDF Schema gives us the possibility to model such knowledge

RDF Schema (short: RDFS)

  • is a part of the W3C RDF recommendation family
  • used for schema/terminological knowledge
  • itself is an RDF vocabulary (thus every RDF Schema graph is an RDF graph)
  • its vocabulary is generic (not bound to a specific application area)
  • allows specifying the semantics of user-defined RDF vocabularies

The Namespace of RDF Schema is http://www.w3.org/2000/01/rdf-schema#

(common prefix: rdfs)

Classes

A Class is a set of things (or entities). In RDF these things are identified by URIs.

The membership of an entity to a class is defined using the rdf:type property.

The fact that ex:MyBlueVWGolf is a member/instance of the class ex:Car can be expressed:

ex:MyBlueVWGolf   rdf:type   ex:Car .

A resource can belong to several classes:

ex:MyBlueVWGolf   rdf:type   ex:Car .
ex:MyBlueVWGolf rdf:type ex:GermanProduct .



Hierarchies of Classes

Classes can be arranged in hierarchies using the rdfs:subClassOf property.

Every ex:Car is an ex:MotorVehicle:

ex:Car   rdfs:subClassOf   ex:MotorVehicle .


Implicit Knowledge

Using the schema definition we are able to identify implicit knowledge.

ex:MyBlueVWGolf  rdf:type         ex:Car .
ex:Car rdfs:subClassOf ex:MotorVehicle .

implicitly contain the following statement as a logical consequence

ex:MyBlueVWGolf   rdf:type   ex:MotorVehicle .



Implicit Knowledge

The statements

ex:Car           rdfs:subClassOf   ex:MotorVehicle .
ex:MotorVehicle rdfs:subClassOf ex:Vehicle .

implicitly contain the following statement as a logical consequence

ex:Car   rdfs:subClassOf   ex:Vehicle .

We can see that rdfs:subClassOf is transitive.
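This kind of implicit knowledge can be derived mechanically. The Python sketch below applies two RDFS entailment rules (rdfs9 and rdfs11 in the RDF 1.1 Semantics numbering) until nothing new is produced. Triples are plain string tuples with prefixed names, an assumption made for readability; the function name is hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

def subclass_closure(triples):
    """Apply until fixpoint:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)"""
    closure = set(triples)
    while True:
        pairs = [(a, b) for a, p, b in closure if p == SUBCLASS_OF]
        new = {(a, SUBCLASS_OF, c) for a, b in pairs for b2, c in pairs if b == b2}
        new |= {(x, RDF_TYPE, b) for x, p, a in closure if p == RDF_TYPE
                for a2, b in pairs if a == a2}
        if new <= closure:          # nothing new: fixpoint reached
            return closure
        closure |= new

facts = {
    ("ex:MyBlueVWGolf", RDF_TYPE, "ex:Car"),
    ("ex:Car", SUBCLASS_OF, "ex:MotorVehicle"),
    ("ex:MotorVehicle", SUBCLASS_OF, "ex:Vehicle"),
}
for triple in sorted(subclass_closure(facts) - facts):
    print(triple)
```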

Defining an own Class

Every URI denoting a class is an instance of rdfs:Class. To define our own class, we write:

ex:Car   rdf:type   rdfs:Class .


Since rdfs:Class is itself a class, it is an instance of itself:

rdfs:Class   rdf:type   rdfs:Class .


Equivalence of Classes

To express the equivalence of two classes we can use

ex:Car         rdfs:subClassOf   ex:Automobile .
ex:Automobile rdfs:subClassOf ex:Car .

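By transitivity, these two statements lead to

ex:Car   rdfs:subClassOf   ex:Car .

This is consistent with rdfs:subClassOf being reflexive: RDFS semantics makes every class a subclass of itself anyway (rule rdfs10).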


Predefined RDFS Classes (1/2)

Besides rdfs:Class, there are several other predefined classes:
  • rdfs:Resource is the class of all things.
    It is the superclass of all classes.

     
  • rdf:Property is the class of all properties.
     
  • rdfs:Datatype is the class of all datatypes.
    (Every instance of this class is a subclass of rdfs:Literal.)

     
  • rdfs:Literal is the class of literal values
    such as strings or integers.
    (A typed literal is an instance of its datatype; the datatype itself is an instance of rdfs:Datatype.)
 

Predefined RDFS Classes (2/2)

  • rdf:langString is the class of language-tagged string literals.
    It is an instance of rdfs:Datatype and a subclass of rdfs:Literal.

     
  • rdf:XMLLiteral is the class of XML literal values. It is a subclass of rdfs:Literal and an instance of rdfs:Datatype.
     
  • rdf:Statement is the class of RDF statements.
    So every RDF triple can be seen as an instance of this class with an rdf:subject, an rdf:predicate and an rdf:object property.

     
  • rdfs:Container is the superclass of the RDF container classes (i.e. rdf:Bag, rdf:Seq, rdf:Alt).
     
Yes, rdf:Property, rdf:XMLLiteral, rdf:Statement, etc., are in the RDF vocabulary already, but only RDFS declares them to be classes.
 

Defining an own Property

Just as we can define classes, we can define new properties.

To express that there is a new property, we define it as an instance of the class rdf:Property.

ex:drives   rdf:type   rdf:Property .

With this new Property we can express that Max drives a VW Golf (not just any one, but a specific one).

ex:Max   ex:drives   ex:MyBlueVWGolf .


Hierarchies of Properties

Using the rdfs:subPropertyOf property we can define a hierarchy of properties.

ex:drives   rdfs:subPropertyOf   ex:controls .

(You see that a vocabulary is often an idealized model of the real world!)

Given this statement and the fact

ex:Max   ex:drives   ex:MyBlueVWGolf .

We can infer that

ex:Max   ex:controls   ex:MyBlueVWGolf .
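The inference from ex:drives to ex:controls corresponds to rule rdfs7, together with rdfs5 (transitivity of rdfs:subPropertyOf). A naive Python sketch, again with prefixed names as plain strings and a hypothetical function name:

```python
SUBPROPERTY_OF = "rdfs:subPropertyOf"

def subproperty_closure(triples):
    """Apply until fixpoint:
    rdfs5: (p subPropertyOf q), (q subPropertyOf r) => (p subPropertyOf r)
    rdfs7: (s p o), (p subPropertyOf q)             => (s q o)"""
    closure = set(triples)
    while True:
        pairs = [(p, q) for p, rel, q in closure if rel == SUBPROPERTY_OF]
        new = {(p, SUBPROPERTY_OF, r) for p, q in pairs for q2, r in pairs if q == q2}
        new |= {(s, q, o) for s, p, o in closure for p2, q in pairs if p == p2}
        if new <= closure:          # nothing new: fixpoint reached
            return closure
        closure |= new

facts = {
    ("ex:drives", SUBPROPERTY_OF, "ex:controls"),
    ("ex:Max", "ex:drives", "ex:MyBlueVWGolf"),
}
print(("ex:Max", "ex:controls", "ex:MyBlueVWGolf") in subproperty_closure(facts))  # True
```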

Range and Domain of Properties

Every property has a Domain and a Range that specify which class the subject or the object must have.

ex:Max   ex:drives   ex:MyBlueVWGolf .
^^^^^^               ^^^^^^^^^^^^^^^
Domain                    Range

Using the Properties rdfs:domain and rdfs:range we can define the Domain and Range of a Property.

ex:drives   rdfs:domain   ex:Person .
ex:drives   rdfs:range    ex:Vehicle .

The range can also be a datatype:

ex:hasAge   rdfs:range   xsd:nonNegativeInteger .

Important to understand:

  1. “must have” above is not a constraint in the sense of “if ex:MyBlueVWGolf is not an ex:Vehicle, then the RDF statement above is illegal”.
  2. It means “given that ex:MyBlueVWGolf is used with ex:drives, we know that it is an ex:Vehicle (in addition to whatever else it may be)”.
  3. Possibility (1) wouldn't make sense, as in RDF Schema there is no way of expressing that something is not an instance of some class.
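Point 2 is exactly what rules rdfs2 (domain) and rdfs3 (range) compute. The Python sketch below (hypothetical function name, prefixed names as strings) shows that they only add rdf:type triples and never reject data: ex:MyBlueVWGolf becomes an ex:Vehicle although nothing said so before.

```python
RDF_TYPE = "rdf:type"

def apply_domain_range(triples):
    """One pass of
    rdfs2: (p domain C), (s p o) => (s type C)
    rdfs3: (p range C),  (s p o) => (o type C)
    One pass is enough here because no domain or range is declared for
    rdf:type itself. Nothing is ever rejected."""
    domains = [(p, c) for p, rel, c in triples if rel == "rdfs:domain"]
    ranges = [(p, c) for p, rel, c in triples if rel == "rdfs:range"]
    inferred = set(triples)
    for s, p, o in triples:
        inferred.update((s, RDF_TYPE, c) for p2, c in domains if p == p2)
        inferred.update((o, RDF_TYPE, c) for p2, c in ranges if p == p2)
    return inferred

facts = {
    ("ex:drives", "rdfs:domain", "ex:Person"),
    ("ex:drives", "rdfs:range", "ex:Vehicle"),
    ("ex:Max", "ex:drives", "ex:MyBlueVWGolf"),   # no rdf:type given for either
}
for triple in sorted(apply_domain_range(facts) - facts):
    print(triple)
```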

Multiple Range Statements

The statements

ex:drives   rdfs:range   ex:Car .
ex:drives   rdfs:range   ex:Ship .

mean that every object of ex:drives has to be both an ex:Car and an ex:Ship!

If you want to express that the object of the Property has to be a car or a ship, a better expression would be

ex:Car     rdfs:subClassOf   ex:Vehicle .
ex:Ship rdfs:subClassOf ex:Vehicle .
ex:drives  rdfs:range   ex:Vehicle .

Implicit Knowledge

Once we define the domain and range of properties, we have to be aware that using these properties can lead to unintended consequences.

The schema

ex:isMarriedTo    rdfs:domain  ex:Person .
ex:isMarriedTo rdfs:range ex:Person .
ex:instituteAKSW rdf:type ex:Institution .

and the additional statement

ex:Max   ex:isMarriedTo   ex:instituteAKSW .

leads to the logical consequence

ex:instituteAKSW   rdf:type   ex:Person .

False Conclusion

Some people might be confused about the combination of specifying domain and range of a property and the hierarchy of the classes. So we want to look at an example:

ex:drives  rdfs:range       ex:Car .     # (1)
ex:Car     rdfs:subClassOf  ex:Vehicle . # (2)

These two triples do not entail the following relation

ex:drives   rdfs:range   ex:Vehicle .    # (3)

I.e. we do not gain new terminological knowledge. Still, for any concrete triple with the predicate ex:drives, we can infer that its object is of type ex:Vehicle, which has the same effect as if statement (3) were part of our schema.
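Running rules rdfs3 and rdfs9 to a fixpoint on statements (1) and (2) plus one data triple illustrates both halves of this point. A Python sketch with prefixed names as plain strings and a hypothetical function name:

```python
def range_subclass_closure(triples):
    """Apply until fixpoint:
    rdfs3: (p range C), (s p o)          => (o type C)
    rdfs9: (x type A), (A subClassOf B)  => (x type B)
    Neither rule can produce an rdfs:range triple."""
    closure = set(triples)
    while True:
        ranges = [(p, c) for p, rel, c in closure if rel == "rdfs:range"]
        subclasses = [(a, b) for a, rel, b in closure if rel == "rdfs:subClassOf"]
        new = {(o, "rdf:type", c) for s, p, o in closure for p2, c in ranges if p == p2}
        new |= {(x, "rdf:type", b) for x, rel, a in closure if rel == "rdf:type"
                for a2, b in subclasses if a == a2}
        if new <= closure:          # nothing new: fixpoint reached
            return closure
        closure |= new

schema = {("ex:drives", "rdfs:range", "ex:Car"),          # (1)
          ("ex:Car", "rdfs:subClassOf", "ex:Vehicle")}    # (2)
data = {("ex:Max", "ex:drives", "ex:MyBlueVWGolf")}
result = range_subclass_closure(schema | data)

print(("ex:drives", "rdfs:range", "ex:Vehicle") in result)       # False: (3) is not entailed
print(("ex:MyBlueVWGolf", "rdf:type", "ex:Vehicle") in result)   # True: same effect on the data
```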

Container Class

RDF Schema defines the rdfs:Container class which is a superclass for the Containers defined by RDF:

  • rdf:Seq
  • rdf:Bag
  • rdf:Alt

Container Membership

RDF Schema defines a class and a property for working with containers:

  • The rdfs:ContainerMembershipProperty class contains all properties that are used to state that a resource is a member of a container (e.g. rdf:_1, rdf:_2, …).
  • The rdfs:member property is a superproperty of all container membership properties.
    (So every instance of rdfs:ContainerMembershipProperty is an rdfs:subPropertyOf rdfs:member.)
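The membership properties follow a simple naming scheme (rdf:_n for a positive integer n without leading zeros), and the link to rdfs:member (rule rdfs12) can be sketched in Python. The helper names and example IRIs are hypothetical; full IRIs are used so that the rdf:_n pattern can be checked.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_MEMBER = "http://www.w3.org/2000/01/rdf-schema#member"

def seq_triples(container, members):
    """Describe `members` as an rdf:Seq using rdf:_1, rdf:_2, ..."""
    triples = [(container, RDF + "type", RDF + "Seq")]
    for i, member in enumerate(members, start=1):
        triples.append((container, f"{RDF}_{i}", member))
    return triples

def is_membership_property(predicate):
    """True for rdf:_n with n a positive integer without leading zeros."""
    if not predicate.startswith(RDF + "_"):
        return False
    n = predicate[len(RDF) + 1:]
    return n.isdigit() and not n.startswith("0")

def infer_member(triples):
    """Each rdf:_n is a subproperty of rdfs:member, so (c rdf:_n m) => (c rdfs:member m)."""
    return [(s, RDFS_MEMBER, o) for s, p, o in triples if is_membership_property(p)]

seq = seq_triples("http://example.org/steps",
                  ["http://example.org/a", "http://example.org/b"])
for triple in infer_member(seq):
    print(triple)
```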

Reification

How can we state in RDF
"The detective supposes that the butler killed the gardener." ?

ex:Detective   ex:supposes   "The butler killed the gardener" .
ex:Detective   ex:supposes   ex:theButlerKilledTheGardener .

Both ways are unsatisfactory. What we would like to talk about is the triple

ex:Butler   ex:killed   ex:Gardener .

itself.

Reification

With the class rdf:Statement, RDF Schema offers the possibility of reification. It has the following properties:

  • rdf:subject defining an rdfs:Resource which is the subject of the statement
  • rdf:predicate defining an rdf:Property which is the predicate of the statement
  • rdf:object defining an rdfs:Resource which is the object of the statement

Using reification we can treat an RDF triple as a resource and formulate facts about it (e.g. that the theory hasn't been proved).

ex:Detective  ex:supposes    _:theory .
_:theory      rdf:type       rdf:Statement .
_:theory      rdf:subject    ex:Butler .
_:theory      rdf:predicate  ex:killed .
_:theory      rdf:object     ex:Gardener .
_:theory      ex:hasState    "unproved" .

Note that the following statement is not a logical consequence of this:

ex:Butler   ex:killed   ex:Gardener .

If the RDF(S) specification said that it is a logical consequence, this wouldn't make sense – it would prevent us from being able to talk about unproved or wrong statements.
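Generating the reification triples is mechanical. In this Python sketch (hypothetical function name, prefixed names as strings) the graph describes the butler statement, but a membership test shows that the statement itself is not asserted.

```python
def reify(node, subject, predicate, obj):
    """Triples describing (subject, predicate, obj) as an rdf:Statement.
    The described triple itself is not added."""
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    ]

graph = [("ex:Detective", "ex:supposes", "_:theory")]
graph += reify("_:theory", "ex:Butler", "ex:killed", "ex:Gardener")
graph.append(("_:theory", "ex:hasState", "unproved"))

print(len(graph))                                            # 6
print(("ex:Butler", "ex:killed", "ex:Gardener") in graph)    # False: only described, not asserted
```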

Reification

(Figure: the reification example as a graph, linking ex:Detective via ex:supposes to a statement node of type rdf:Statement with subject ex:Butler, predicate ex:killed, object ex:Gardener and ex:hasState "unproven")

Supplementary Information

RDF Schema gives the possibility to add additional information to resources using the following Properties:

  • rdfs:label can be used to give a human readable name for a resource.
  • rdfs:comment for adding a longer comment or explanation.
  • rdfs:seeAlso points to a URI where additional information about the resource can be found.
  • rdfs:isDefinedBy points to a URI where the resource is defined.
    (rdfs:isDefinedBy is a subproperty of rdfs:seeAlso)
ex:VWGolf   rdfs:label        "VW Golf" .
ex:VWGolf   rdfs:comment   "The VW Golf is a popular German car..." .
ex:VWGolf   rdfs:seeAlso   wikipedia:VW_Golf .
ex:VWGolf   rdfs:isDefinedBy  ex2:VolkswagenDataset .

The advantage of using these properties is that the additional information is represented as structured RDF, too.

Define acid and base

We want to implement a system that calculates the amount of an acid or a base needed to neutralize a given solution.

Therefore, our system has to handle information about acids and bases, for which we define our own schema.

We start by defining acid and base as classes:

ex:Acid   rdf:type   rdfs:Class .
ex:Base   rdf:type   rdfs:Class .

Both are chemical compounds, so we define ex:ChemicalCompound as a class as well and make the other two its subclasses:

ex:ChemicalCompound   rdf:type   rdfs:Class .
ex:Acid rdfs:subClassOf ex:ChemicalCompound .
ex:Base rdfs:subClassOf ex:ChemicalCompound .

Define some instances

After that, we can add some acids and bases, for example:

  • hydrogen chloride (ex:HCl)
  • phosphoric acid (ex:H3PO4)
  • sodium hydroxide (ex:NaOH) and
  • calcium hydroxide (ex:Ca_OH2).

ex:HCl     rdf:type   ex:Acid .
ex:H3PO4   rdf:type   ex:Acid .
ex:NaOH    rdf:type   ex:Base .
ex:Ca_OH2  rdf:type   ex:Base .
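A minimal Python sketch (an illustration, not part of the lecture) of what the subclass declarations give us: applying the rule "if v rdf:type u and u rdfs:subClassOf x, then v rdf:type x" once to the triples above classifies every acid and base as an ex:ChemicalCompound. Triples are assumed to be plain string tuples, and the function name is hypothetical. One pass suffices here only because the hierarchy has a single level.

```python
triples = {
    ("ex:Acid", "rdfs:subClassOf", "ex:ChemicalCompound"),
    ("ex:Base", "rdfs:subClassOf", "ex:ChemicalCompound"),
    ("ex:HCl", "rdf:type", "ex:Acid"),
    ("ex:H3PO4", "rdf:type", "ex:Acid"),
    ("ex:NaOH", "rdf:type", "ex:Base"),
    ("ex:Ca_OH2", "rdf:type", "ex:Base"),
}


def infer_types_from_subclasses(graph):
    """One pass of: v rdf:type u, u rdfs:subClassOf x  =>  v rdf:type x.
    Returns only triples that are not already in the graph."""
    inferred = set()
    for (u, p1, x) in graph:
        if p1 != "rdfs:subClassOf":
            continue
        for (v, p2, cls) in graph:
            if p2 == "rdf:type" and cls == u:
                inferred.add((v, "rdf:type", x))
    return inferred - graph


for triple in sorted(infer_types_from_subclasses(triples)):
    print(triple)
```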

Intermediate result

The picture shows what we have done so far.

[Figure: ex:Acid and ex:Base are rdfs:subClassOf ex:ChemicalCompound; ex:HCl and ex:H3PO4 are rdf:type ex:Acid; ex:NaOH and ex:Ca_OH2 are rdf:type ex:Base]

Add the molar mass

After creating the classes, we want to add a typical chemical property, the molar mass. First, we define it as a property:

ex:hasMolarMass   rdf:type   rdf:Property .

Every chemical compound can have a molar mass, so this is the domain of the property:

ex:hasMolarMass   rdfs:domain   ex:ChemicalCompound .

The molar mass itself could be defined as a datatype of its own with some additional information. For our small example, however, we use an existing floating-point datatype, so the range of the ex:hasMolarMass property is:

ex:hasMolarMass   rdfs:range   xsd:float .

After defining the property we can use it:

ex:NaOH   ex:hasMolarMass   "39.9971"^^xsd:float .
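Note that rdfs:domain does not check the data; it produces new facts. The following Python sketch, again with triples as plain string tuples and with a hypothetical resource ex:Water that is not part of the lecture, shows that any resource using ex:hasMolarMass is inferred to be an ex:ChemicalCompound.

```python
schema = {
    ("ex:hasMolarMass", "rdfs:domain", "ex:ChemicalCompound"),
}
data = {
    ("ex:NaOH", "ex:hasMolarMass", "39.9971"),
    ("ex:Water", "ex:hasMolarMass", "18.015"),  # hypothetical, not in the lecture
}


def infer_from_domain(graph):
    """Rule: a rdfs:domain x, u a y  =>  u rdf:type x."""
    domains = {(a, x) for (a, p, x) in graph if p == "rdfs:domain"}
    return {
        (u, "rdf:type", x)
        for (u, a, _obj) in graph
        for (prop, x) in domains
        if a == prop
    }


print(sorted(infer_from_domain(schema | data)))
# [('ex:NaOH', 'rdf:type', 'ex:ChemicalCompound'), ('ex:Water', 'rdf:type', 'ex:ChemicalCompound')]
```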

Add additional Properties

As a general difference between acids and bases, an acid can donate protons and a base can accept protons. We therefore define two properties that store how many protons can be donated or accepted:

ex:canDonateProtons   rdf:type     rdf:Property .
ex:canDonateProtons   rdfs:domain  ex:Acid .
ex:canDonateProtons   rdfs:range   xsd:integer .

ex:canAcceptProtons   rdf:type     rdf:Property .
ex:canAcceptProtons   rdfs:domain  ex:Base .
ex:canAcceptProtons   rdfs:range   xsd:integer .

Using these new properties, we can distinguish acids and bases by the number of protons they can donate or accept:

ex:HCl     ex:canDonateProtons  "1"^^xsd:integer .
ex:H3PO4   ex:canDonateProtons  "3"^^xsd:integer .
ex:NaOH    ex:canAcceptProtons  "1"^^xsd:integer .
ex:Ca_OH2  ex:canAcceptProtons  "2"^^xsd:integer .

Define solution

Our program has to work with solutions that contain different amounts of chemical compounds. So we define the class ex:Solution and a property describing its mass:

ex:Solution  rdf:type     rdfs:Class .

ex:hasMass   rdf:type     rdf:Property .
ex:hasMass   rdfs:domain  ex:Solution .
ex:hasMass   rdfs:range   xsd:float .

Define ingredient

Our program has to work with solutions that contain different amounts of chemical compounds. However, a single property stating that a chemical compound is contained in a solution is not expressive enough, because we also need to give information about the amount of this compound.

So we define the class ex:Ingredient and a property describing the amount in percent:

ex:Ingredient  rdf:type     rdfs:Class .

ex:hasAmount   rdf:type     rdf:Property .
ex:hasAmount   rdfs:domain  ex:Ingredient .
ex:hasAmount   rdfs:range   xsd:float .

Additionally, this class needs two properties that connect it to an ex:Solution and to an ex:ChemicalCompound:

ex:isPartOf    rdf:type     rdf:Property .
ex:isPartOf    rdfs:domain  ex:Ingredient .
ex:isPartOf    rdfs:range   ex:Solution .

ex:contains    rdf:type     rdf:Property .
ex:contains    rdfs:domain  ex:Ingredient .
ex:contains    rdfs:range   ex:ChemicalCompound .
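To illustrate why the intermediate ex:Ingredient node helps, here is a Python sketch that collects the compounds of a solution together with their amounts by following ex:isPartOf and ex:contains. The resources ex:MySolution, ex:Ing1 and ex:Ing2 and their amounts are hypothetical, and the code assumes each ingredient has exactly one ex:contains and one ex:hasAmount triple.

```python
def ingredients_of(graph, solution):
    """Map each compound in `solution` to its amount (percent), following
    ex:isPartOf and ex:contains from the ex:Ingredient nodes.
    Assumes one ex:contains and one ex:hasAmount triple per ingredient."""
    result = {}
    for (ingredient, p, sol) in graph:
        if p == "ex:isPartOf" and sol == solution:
            compound = next(o for (s, q, o) in graph
                            if s == ingredient and q == "ex:contains")
            amount = next(float(o) for (s, q, o) in graph
                          if s == ingredient and q == "ex:hasAmount")
            result[compound] = amount
    return result


# Hypothetical knowledge base, not taken from the lecture
graph = {
    ("ex:MySolution", "rdf:type", "ex:Solution"),
    ("ex:Ing1", "rdf:type", "ex:Ingredient"),
    ("ex:Ing1", "ex:isPartOf", "ex:MySolution"),
    ("ex:Ing1", "ex:contains", "ex:HCl"),
    ("ex:Ing1", "ex:hasAmount", "5"),
    ("ex:Ing2", "rdf:type", "ex:Ingredient"),
    ("ex:Ing2", "ex:isPartOf", "ex:MySolution"),
    ("ex:Ing2", "ex:contains", "ex:H3PO4"),
    ("ex:Ing2", "ex:hasAmount", "2"),
}

print(ingredients_of(graph, "ex:MySolution"))
```

With a single direct property between solution and compound, there would be no node to which the amount could be attached.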

Using the schema

Our small schema can now express all the information a program needs to solve the following typical chemistry task:

You have 100 g of a solution that contains 20% phosphoric acid. How much of a 10% sodium hydroxide solution do you need to neutralize this solution?

We start by inserting the information from the task description into our knowledge base (which is based on our schema):

ex:GivenSolution      rdf:type      ex:Solution .
ex:GivenSolution      ex:hasMass    "100"^^xsd:float .

ex:PhosAcidIng20Perc  rdf:type      ex:Ingredient .
ex:PhosAcidIng20Perc  ex:hasAmount  "20"^^xsd:float .
ex:PhosAcidIng20Perc  ex:contains   ex:H3PO4 .
ex:PhosAcidIng20Perc  ex:isPartOf   ex:GivenSolution .

ex:SearchedSolution   rdf:type      ex:Solution .

ex:SodiumHyIng10Perc  rdf:type      ex:Ingredient .
ex:SodiumHyIng10Perc  ex:hasAmount  "10"^^xsd:float .
ex:SodiumHyIng10Perc  ex:contains   ex:NaOH .
ex:SodiumHyIng10Perc  ex:isPartOf   ex:SearchedSolution .

Using the schema

The program can then query the following information from our knowledge base:

ex:H3PO4  ex:hasMolarMass      "97.995"^^xsd:float .
ex:H3PO4  ex:canDonateProtons  "3"^^xsd:integer .

ex:NaOH   ex:hasMolarMass      "39.9971"^^xsd:float .
ex:NaOH   ex:canAcceptProtons  "1"^^xsd:integer .

Relying on all this information, modeled with our own schema, the program can easily compute the result of the task:

We need 244.9g of the sodium hydroxide solution for neutralizing the given phosphoric acid solution.
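The arithmetic behind this result can be sketched in Python as follows. It assumes that the percentages are mass fractions and that every donated proton is neutralized by an accepted one. The function and parameter names are hypothetical; the input values are the ones queried from the knowledge base above.

```python
def neutralizing_mass(solution_mass, acid_percent, acid_molar_mass, protons_donated,
                      base_percent, base_molar_mass, protons_accepted):
    """Mass (g) of base solution needed to neutralize the acid solution,
    assuming mass percentages and complete neutralization."""
    acid_mass = solution_mass * acid_percent / 100   # g of pure acid (H3PO4)
    acid_moles = acid_mass / acid_molar_mass         # mol of acid
    proton_moles = acid_moles * protons_donated      # mol of protons to neutralize
    base_moles = proton_moles / protons_accepted     # mol of pure base (NaOH)
    base_mass = base_moles * base_molar_mass         # g of pure base
    return base_mass / (base_percent / 100)          # g of the base solution


result = neutralizing_mass(100, 20, 97.995, 3, 10, 39.9971, 1)
print(round(result, 1))  # 244.9
```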

RDF and RDF Schema

RDF Schema extends RDF with a special vocabulary for terminological knowledge.

The picture shows the two kinds of knowledge: terminological knowledge, expressed with RDF Schema, and assertional knowledge, expressed with “normal” RDF.

[Figure: terminological knowledge (RDF Schema): ex:Car rdfs:subClassOf ex:Vehicle; ex:drives with rdfs:domain ex:Person and rdfs:range ex:Car. Assertional knowledge (RDF): ex:Max ex:drives ex:MyVWGolf, linked by rdf:type to the classes above]

Limitations of RDFS

RDF Schema can be used as a lightweight language for defining a vocabulary (also called ontology) used in RDF graphs.

However, RDF Schema is limited in what it can express in an ontology.

Missing Expressivity

RDF Schema cannot express the following:

  • It is not possible to express negation.
    For example, we cannot state that the domain of a property does not contain a certain class.
  • It is not possible to define cardinalities.
    For example, we cannot state that a person has either 0 or 1 ex:isMarriedTo relations.
  • It is not possible to combine classes (e.g. into a union).
    For example, we cannot express that the domain of a property should be one class or another. Instead, we have to create a new superclass of both and use it as the domain of the property.
  • It is not possible to define metadata of the schema.
    RDFS provides no vocabulary for important metadata such as the version of the schema.
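The third limitation can be seen in a short Python sketch. Under the RDFS semantics, two rdfs:domain triples for the same property do not mean "one class or the other": every subject of the property is inferred to belong to both classes. The property ex:neutralizes is a hypothetical example, and triples are plain string tuples.

```python
graph = {
    ("ex:neutralizes", "rdfs:domain", "ex:Acid"),  # hypothetical property
    ("ex:neutralizes", "rdfs:domain", "ex:Base"),
    ("ex:HCl", "ex:neutralizes", "ex:NaOH"),
}


def infer_from_domain(triples):
    """Rule: a rdfs:domain x, u a y  =>  u rdf:type x (for every stated domain)."""
    return {
        (u, "rdf:type", x)
        for (a, p, x) in triples
        if p == "rdfs:domain"
        for (u, b, _obj) in triples
        if b == a
    }


print(sorted(infer_from_domain(graph)))
# [('ex:HCl', 'rdf:type', 'ex:Acid'), ('ex:HCl', 'rdf:type', 'ex:Base')]
```

ex:HCl ends up as both an acid and a base (an intersection, not a union), which is why a common superclass has to be introduced instead.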

These are only the most important limitations of RDFS. Therefore, we will look at the more expressive OWL in another lecture.

Summary

  • RDF Schema (short: RDFS) can be used to express terminological knowledge by defining Classes and Properties.
  • The Classes and Properties can be arranged in hierarchies.
  • The Domain and Range of Properties can be defined.
  • RDF Schema allows the inference of knowledge that is only implicitly stated.
  • RDF Schema can be used to define a "lightweight" ontology but it is not as expressive as OWL.
  • The current standard, RDF Schema 1.1, only has minor changes over RDFS 1.0. 

References

Tasks & mini projects

This slide contains some suggestions for tasks and mini projects you can complete in addition to the multiple-choice self-assessment test in order to practice and prepare for an exam:

  1. Create a schema with which you can describe your family situation
    • Start with the needed classes and the first-degree relations (spouse, parents and children)
    • Continue with the second-degree relations (sister, brother, grandfather, ...)
    • How would you model gender? As a property of a "Person" superclass, or with two separate classes "Man" and "Woman"? Which would be better for defining the domain or range of properties like "isTheGrandfatherOf"?

Exercises 1 - Classes

  1. Create a graph representation of different courses presented by different people at a university. (For example, lectures or seminars as subclasses of courses are presented by professors or researchers as subclasses of person.)
  2. Create the Turtle representation and the RDF graph of this example.

Exercises 2 - Properties

  1. Create a graph representation of different properties and hierarchies of properties.
  2. Create the Turtle representation and the RDF graph of this example.

Exercises 3 - Lists



[Figure: controlled 3D RDF visualization (image: http://www.ebremer.com/system/files/story/images/nexus-dna-force-directedlayout-2011-05-15.jpg)]

Overview

  1. What is Semantics?
  2. What is Model-theoretic Semantics?
  3. Model-theoretic Semantics for RDF(S)
  4. What is Proof-theoretic Semantics?
  5. Proof-theoretic Semantics for RDF(S)
  6. References

Syntax and Semantics

Syntax: character strings without meaning
Semantics: meaning of the character strings


Semantics of Programming Languages

Semantics of Logic

Recall: Implicit Knowledge

 If an RDFS document contains the triples

u rdf:type ex:Textbook .

and

ex:Textbook rdfs:subClassOf ex:Book .

then

u rdf:type ex:Book .

is implicitly also the case: it is a logical consequence. We can also say it is deduced (deduction) or inferred (inference). We do not have to state this explicitly. Which statements are logical consequences is governed by the formal semantics.

Recall: Implicit Knowledge

From

ex:Textbook  rdfs:subClassOf  ex:Book .
ex:Book      rdfs:subClassOf  ex:PrintMedia .

The following is a logical consequence:

ex:Textbook rdfs:subClassOf ex:PrintMedia .

That is, rdfs:subClassOf is transitive.
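Both consequences can be computed mechanically. This Python sketch (an illustration with hypothetical function names) computes the transitive closure of rdfs:subClassOf pairs by repeated joining and then collects all classes of a resource. Reflexive pairs such as ex:Book rdfs:subClassOf ex:Book are deliberately left out.

```python
def subclass_closure(pairs):
    """Transitive closure of rdfs:subClassOf, given as (subclass, superclass) pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def all_types(asserted, pairs):
    """Asserted classes of a resource plus all of their (transitive) superclasses."""
    closure = subclass_closure(pairs)
    return set(asserted) | {sup for (sub, sup) in closure if sub in asserted}


pairs = {("ex:Textbook", "ex:Book"), ("ex:Book", "ex:PrintMedia")}
print(("ex:Textbook", "ex:PrintMedia") in subclass_closure(pairs))  # True
print(sorted(all_types({"ex:Textbook"}, pairs)))
# ['ex:Book', 'ex:PrintMedia', 'ex:Textbook']
```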

What Semantics Is Good For

Opinions differ. Here is one:

The Semantic Web requires shareable, declarative and computable semantics.

That is, the semantics must be a formal entity, which is clearly defined and automatically computable.

Ontology languages provide this by means of their formal semantics.

Semantic Web semantics is given by a relation—the logical consequence relation.

In other words …

 We capture the meaning of information

  • not by specifying its meaning (which is impossible)
  • by specifying how information interacts with other information

We describe the meaning indirectly through its effects.

Overview

  1. What is Semantics?
  2. What is Model-theoretic Semantics?
  3. Model-theoretic Semantics for RDF(S)
  4. What is Proof-theoretic Semantics?
  5. Proof-theoretic Semantics for RDF(S)

Model-theoretic Semantics

You need:

  • a language/syntax
  • a notion of a model for sentences of the language

Models

  • are made such that each sentence is either true or false w.r.t. a given model
  • if a sentence \(\alpha\) is true in a model \(M\), then we write \(M \models \alpha\).

Model-theoretic Semantics

Logical consequence

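  • \(\beta\) is a logical consequence of \(\alpha\) (written \(\alpha \models \beta\)), if \(\forall M: M \models \alpha \Rightarrow M \models \beta\).
  • if \(K\) is a set of sentences, we write \(K \models \beta\), if \(\forall M: (\forall \kappa \in K: M \models \kappa) \Rightarrow M \models \beta\).
  • if \(J\) is another set of sentences, we write \(K \models J\), if \(\forall \beta \in J: K \models \beta\).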

Note: the notation is overloaded.

Logical Consequence

Model Theory—(contrived) example (1/4)

Language

  • variables \(\ldots, w, x, y, z, \ldots\)
  • symbol \(\eta\)
  • allowed sentences: \(a \; \eta \; b\) for any variables \(a\), \(b\)

We want to know

  • What are the logical consequences of the set \(\{x \; \eta \; y, y \; \eta \; z\}\)?

To answer this we must say what the models in our semantics are.

Model Theory—(contrived) example (2/4)

Say, a model \(I\) of a set \(K\) of sentences consists of

  • a set \(C\) of cars and
  • a function \(I(\cdot)\), which maps each variable to a car in \(C\) such that, for each sentence \(a \; \eta \; b\) in \(K\), we have that \(I(a)\) has more horsepower than \(I(b)\).

We now claim that \(\{x \; \eta \; y, y \; \eta \; z\} \models x \; \eta \; z\).

Model Theory—(contrived) example (3/4)

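Proof: Consider any model \(M\) of \(\{x \; \eta \; y, y \; \eta \; z\}\). Since \(M \models \{x \; \eta \; y, y \; \eta \; z\}\), we know that

  • \(M(x)\) has more horsepower than \(M(y)\) and
  • \(M(y)\) has more horsepower than \(M(z)\).

Hence, \(M(x)\) has more horsepower than \(M(z)\), i.e. \(M \models x \; \eta \; z\).

This argument holds for all models of \(\{x \; \eta \; y, y \; \eta \; z\}\), therefore \(\{x \; \eta \; y, y \; \eta \; z\} \models x \; \eta \; z\).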
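The proof argues over all models. As a sanity check (not a proof), the following Python sketch enumerates every interpretation of \(x\), \(y\) and \(z\) over a small, hypothetical set of cars with made-up horsepower values and tests the entailment there. The real definition quantifies over every possible set \(C\) of cars, so a finite check like this can expose a counterexample, but a positive answer is no substitute for the proof above.

```python
from itertools import product

# Hypothetical, finite set C of cars with made-up horsepower values
HORSEPOWER = {"beetle": 50, "golf": 110, "van": 110, "mustang": 450}


def satisfies(interpretation, sentences):
    """True if every sentence (a, b), read as 'a eta b', holds: the car
    assigned to a has more horsepower than the car assigned to b."""
    return all(
        HORSEPOWER[interpretation[a]] > HORSEPOWER[interpretation[b]]
        for a, b in sentences
    )


def entails_over_cars(premises, conclusion, variables):
    """Check premises |= conclusion, restricted to interpretations into HORSEPOWER."""
    for cars in product(HORSEPOWER, repeat=len(variables)):
        interpretation = dict(zip(variables, cars))
        if satisfies(interpretation, premises) and not satisfies(interpretation, [conclusion]):
            return False  # a model of the premises that violates the conclusion
    return True


K = [("x", "y"), ("y", "z")]
print(entails_over_cars(K, ("x", "z"), ["x", "y", "z"]))  # True
print(entails_over_cars(K, ("z", "x"), ["x", "y", "z"]))  # False
```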

Model Theory—(contrived) example (4/4)

Say, a model \(I\) of a set \(K\) of sentences consists of

  • a set \(C\) of cars and
  • a function \(I(\cdot)\), which maps each variable to a car in \(C\) such that, for each sentence \(a \; \eta \; b\) in \(K\), we have that \(I(a)\) has more horsepower than \(I(b)\).

An interpretation \(I\) for our language consists of

  • a set \(C\) of cars and
  • a function \(I(\cdot)\), which maps each variable to a car in \(C\).

And that's it: unlike a model, an interpretation does not require any sentence to be true w.r.t. \(I\).

Overview

  1. What is Semantics?
  2. What is Model-theoretic Semantics?
  3. Model-theoretic Semantics for RDF(S)
  4. What is Proof-theoretic Semantics?
  5. Proof-theoretic Semantics for RDF(S)

Model-theoretic Semantics applied to RDF(S)

Language: valid RDF(S)
Sentences are triples; graphs are sets thereof.
Interpretations are given via sets and functions from language vocabularies to these sets.

Models are defined such that they capture the intended meaning of the RDF(S) vocabulary.
Three different notions: simple, RDF, and RDFS interpretations.

Simple Interpretations

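A simple interpretation \(\mathcal{I}\) of a given vocabulary \(V\) consists of:
  • \(\mathit{IR}\), a non-empty set of resources, alternatively called the domain or universe of discourse of \(\mathcal{I}\),
  • \(\mathit{IP}\), the set of properties of \(\mathcal{I}\) (which may overlap with \(\mathit{IR}\)),
  • \(\mathrm{I_{EXT}}\), a function assigning to each property a set of pairs from \(\mathit{IR}\), i.e. \(\mathrm{I_{EXT}} : \mathit{IP} \longrightarrow 2^{\mathit{IR} \times \mathit{IR}}\), where \(\mathrm{I_{EXT}}(p)\) is called the extension of property \(p\),
  • \(\mathrm{I_S}\), a function mapping URIs from \(V\) into the union of the sets \(\mathit{IR}\) and \(\mathit{IP}\), i.e. \(\mathrm{I_S} : V \longrightarrow \mathit{IR} \cup \mathit{IP}\),
  • \(\mathrm{I_L}\), a function from the typed literals in \(V\) into the set \(\mathit{IR}\) of resources, and
  • \(\mathit{LV}\), a particular subset of \(\mathit{IR}\), called the set of literal values, containing (at least) all untyped literals from \(V\).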

Simple Interpretation Function

 

The simple interpretation function \(\cdot^\mathcal{I}\) (written as exponent) is defined as follows:
  • every untyped literal \("\!\!a\!\!"\) is mapped to \(a\), formally \(("\!\!a\!\!")^{\mathcal{I}}=a\),
  • every untyped literal carrying language information \("\!\!a\!\!"\!\!@t\) is mapped to the pair \(\langle a, t \rangle\),
  • every typed literal \(l\) is mapped to \(\mathrm{I_L}(l)\), formally \(l^\mathcal{I}=\mathrm{I_L}(l)\), and
  • every URI \(u\) is mapped to \(\mathrm{I_S}(u)\), i.e. \(u^\mathcal{I}=\mathrm{I_S}(u)\).

Simple Interpretations

Simple Models

The truth value \((s \; p \; o \; .)^\mathcal{I}\) of a (grounded) triple \(s \; p \; o \; .\) is true iff \(s\), \(p\), and \(o\) are contained in \(V\) and \(\langle s^\mathcal{I}, o^\mathcal{I} \rangle \in \mathrm{I_{EXT}}(p^\mathcal{I})\).

What about blank nodes?

Say, \(A(\cdot)\) is a function from blank nodes to URIs. (These URIs need not be contained in the graph we are looking at.)

If, in a graph \(G\), we replace each blank node \(x\) by \(A(x)\), then we obtain the graph \(G^\prime\), which is called a grounding of \(G\).

We know how to do semantics for the grounded graph.

So define \(I \models G\) iff \(I \models G^\prime\) for at least one grounding \(G^\prime\) of \(G\).
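The following Python sketch evaluates this definition for a small, finite simple interpretation. It assumes graphs without literals and represents URIs and blank nodes as strings (blank nodes start with `_:`). It only tries groundings that use URIs from \(V\); since a triple containing a name outside \(V\) is false anyway, no grounding that could make the graph true is missed. All names and the toy interpretation are hypothetical.

```python
from itertools import product


def is_blank(term):
    return term.startswith("_:")


def triple_true(triple, interp):
    """Truth of a grounded triple (s, p, o) in a simple interpretation."""
    s, p, o = triple
    i_s, i_ext = interp["I_S"], interp["I_EXT"]
    if not all(name in i_s for name in triple):
        return False  # s, p and o must all be in the vocabulary V
    return (i_s[s], i_s[o]) in i_ext.get(i_s[p], set())


def satisfies_graph(interp, graph):
    """I |= G iff I |= G' for at least one grounding G' of G."""
    blanks = sorted({t for triple in graph for t in triple if is_blank(t)})
    vocabulary = list(interp["I_S"])
    for choice in product(vocabulary, repeat=len(blanks)):
        grounding = dict(zip(blanks, choice))  # plays the role of A(.)
        grounded = [tuple(grounding.get(t, t) for t in triple) for triple in graph]
        if all(triple_true(tr, interp) for tr in grounded):
            return True
    return False


# Toy interpretation with V = {ex:Bob, ex:Alice, ex:knows}
interp = {
    "I_S": {"ex:Bob": "bob", "ex:Alice": "alice", "ex:knows": "knows"},
    "I_EXT": {"knows": {("bob", "alice")}},
}
print(satisfies_graph(interp, [("ex:Bob", "ex:knows", "_:x")]))  # True
print(satisfies_graph(interp, [("_:x", "ex:knows", "ex:Bob")]))  # False
```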

Simple Entailment

A graph \(G\) simply entails a graph \(G^\prime\) if every simple interpretation that is a model of \(G\) is also a model of \(G^\prime\).

(Recall that a simple interpretation is a model of a graph \(G\) if it is a model of each triple in \(G\).)

It's really simple

Basically, \(G \models G^\prime\) iff \(G^\prime\) can be obtained from a subgraph of \(G\) by replacing some nodes with blank nodes.

It is really simple entailment.
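A brute-force Python sketch of this characterization (the interpolation lemma of the RDF Semantics specification): it searches for a mapping of the blank nodes of \(G^\prime\) to terms of \(G\) such that every mapped triple of \(G^\prime\) occurs in \(G\). Triples are string tuples, blank nodes start with `_:`, the search is exponential in the number of blank nodes, and all names are illustrative.

```python
from itertools import product


def is_blank(term):
    return term.startswith("_:")


def simply_entails(g, h):
    """G |= H iff the blank nodes of H can be mapped to terms of G
    so that every mapped triple of H is a triple of G."""
    g = set(g)
    blanks = sorted({t for triple in h for t in triple if is_blank(t)})
    terms = sorted({t for triple in g for t in triple})
    for choice in product(terms, repeat=len(blanks)):
        mapping = dict(zip(blanks, choice))
        if all(tuple(mapping.get(t, t) for t in triple) in g for triple in h):
            return True
    return False


G = {("ex:Bob", "ex:knows", "ex:Alice"), ("ex:Alice", "ex:knows", "ex:Carol")}
print(simply_entails(G, {("ex:Bob", "ex:knows", "_:x"), ("_:x", "ex:knows", "ex:Carol")}))  # True
print(simply_entails(G, {("_:x", "ex:knows", "_:x")}))  # False
```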

RDF Semantic Conditions

An RDF Interpretation of a vocabulary \(V\) is a simple interpretation of the vocabulary \(V \cup V_\mathrm{RDF}\) that additionally satisfies the following conditions:
  • \(x \in \mathit{IP}\) iff \(\langle x, \verb|rdf:Property|^\mathcal{I} \rangle \in \mathrm{I_{EXT}}(\verb|rdf:type|^\mathcal{I})\).
  • if \("\!\!s\!\!"\!\!\verb|^^rdf:XMLLiteral|\) is contained in \(V\) and \(s\) is a well-formed XML literal, then
    • \(\mathrm{I_L}("\!\!s\!\!"\!\!\verb|^^rdf:XMLLiteral|)\) is the XML value of \(s\);
    • \(\mathrm{I_L}("\!\!s\!\!"\!\!\verb|^^rdf:XMLLiteral|) \in \mathit{LV}\);
    • \(\langle \mathrm{I_L}("\!\!s\!\!"\!\!\verb|^^rdf:XMLLiteral|), \verb|rdf:XMLLiteral|^\mathcal{I} \rangle \in \mathrm{I_{EXT}}(\verb|rdf:type|^\mathcal{I})\)
  • if \("\!\!s\!\!"\verb|^^rdf:XMLLiteral|\) is contained in \(V\) and \(s\) is an ill-formed XML literal, then
    • \(\mathrm{I_L}("\!\!s\!\!"\!\!\verb|^^rdf:XMLLiteral|) \not \in \mathit{LV}\);
    • \(\langle \mathrm{I_L}("\!\!s\!\!"\!\!\verb|^^rdf:XMLLiteral|), \verb|rdf:XMLLiteral|^\mathcal{I} \rangle \not \in \mathrm{I_{EXT}}(\verb|rdf:type|^\mathcal{I})\)

RDF Axiomatic Triples

In addition, each RDF Interpretation has to evaluate the following triples to true:

rdf:type       rdf:type  rdf:Property .
rdf:subject    rdf:type  rdf:Property .
rdf:predicate  rdf:type  rdf:Property .
rdf:object     rdf:type  rdf:Property .
rdf:first      rdf:type  rdf:Property .
rdf:rest       rdf:type  rdf:Property .
rdf:value      rdf:type  rdf:Property .
rdf:_i         rdf:type  rdf:Property .
rdf:nil        rdf:type  rdf:List .

RDFS Semantic Conditions (1)

Define (for a given RDF Interpretation \(\mathcal{I}\)):

  • \(\mathrm{I_{CEXT}} : \mathit{IR} \longrightarrow 2^\mathit{IR}\); we define \(\mathrm{I_{CEXT}}(y)\) to contain exactly those elements \(x\) for which \(\langle x, y \rangle\) is contained in \(\mathrm{I_{EXT}}(\verb|rdf:type|^\mathcal{I})\). The set \(\mathrm{I_{CEXT}}(y)\) is then also called the (class) extension of \(y\).
  • \(\mathit{IC}=\mathrm{I_{CEXT}}(\verb|rdfs:Class|^\mathcal{I})\)
  • \(\mathit{IR}=\mathrm{I_{CEXT}}(\verb|rdfs:Resource|^\mathcal{I})\)
  • \(\mathit{LV}=\mathrm{I_{CEXT}}(\verb|rdfs:Literal|^\mathcal{I})\)
  • If \(\langle x, y\rangle \in \mathrm{I_{EXT}}(\verb|rdfs:domain|^\mathcal{I})\) and \(\langle u, v \rangle \in \mathrm{I_{EXT}}(x)\), then \(u \in \mathrm{I_{CEXT}}(y)\)
  • If \(\langle x, y\rangle \in \mathrm{I_{EXT}}(\verb|rdfs:range|^\mathcal{I})\) and \(\langle u, v \rangle \in \mathrm{I_{EXT}}(x)\), then \(v \in \mathrm{I_{CEXT}}(y)\)
  • \(\mathrm{I_{EXT}}(\verb|rdfs:subPropertyOf|^\mathcal{I})\) is reflexive and transitive on \(\mathit{IP}\).

RDFS Semantic Conditions (2)

  • If \(\langle x, y \rangle \in \mathrm{I_{EXT}}(\verb|rdfs:subPropertyOf|^\mathcal{I})\),
    then \(x, y \in \mathit{IP}\) and \(\mathrm{I_{EXT}}(x) \subseteq \mathrm{I_{EXT}}(y)\).
  • If \(x \in \mathit{IC}\),
    then \(\langle x, \verb|rdfs:Resource|^\mathcal{I} \rangle \in \mathrm{I_{EXT}}(\verb|rdfs:subClassOf|^\mathcal{I})\).
  • If \(\langle x, y \rangle \in \mathrm{I_{EXT}}(\verb|rdfs:subClassOf|^\mathcal{I})\),
    then \(x, y \in \mathit{IC}\) and \(\mathrm{I_{CEXT}}(x) \subseteq \mathrm{I_{CEXT}}(y)\).
  • \(\mathrm{I_{EXT}}(\verb|rdfs:subClassOf|^\mathcal{I})\) is reflexive and transitive on \(\mathit{IC}\).
  • If \(x \in \mathrm{I_{CEXT}}(\verb|rdfs:ContainerMembershipProperty|^\mathcal{I})\),
    then \(\langle x, \verb|rdfs:member|^\mathcal{I} \rangle \in \mathrm{I_{EXT}}(\verb|rdfs:subPropertyOf|^\mathcal{I})\).
  • If \(x \in \mathrm{I_{CEXT}}(\verb|rdfs:Datatype|^\mathcal{I})\),
    then \(\langle x, \verb|rdfs:Literal|^\mathcal{I} \rangle \in \mathrm{I_{EXT}}(\verb|rdfs:subClassOf|^\mathcal{I})\).

RDFS Axiomatic Triples (1)

Furthermore all of the following axiomatic triples must be satisfied:

rdf:type            rdfs:domain  rdfs:Resource .
rdfs:domain         rdfs:domain  rdf:Property .
rdfs:range          rdfs:domain  rdf:Property .
rdfs:subPropertyOf  rdfs:domain  rdf:Property .
rdfs:subClassOf     rdfs:domain  rdfs:Class .
rdf:subject         rdfs:domain  rdf:Statement .
rdf:predicate       rdfs:domain  rdf:Statement .
rdf:object          rdfs:domain  rdf:Statement .
rdfs:member         rdfs:domain  rdfs:Resource .
rdf:first           rdfs:domain  rdf:List .
rdf:rest            rdfs:domain  rdf:List .
rdfs:seeAlso        rdfs:domain  rdfs:Resource .
rdfs:isDefinedBy    rdfs:domain  rdfs:Resource .
rdfs:comment        rdfs:domain  rdfs:Resource .
rdfs:label          rdfs:domain  rdfs:Resource .
rdf:value           rdfs:domain  rdfs:Resource .

RDFS Axiomatic Triples (2)

Furthermore all of the following axiomatic triples must be satisfied:

rdf:type rdfs:range rdfs:Class .
rdfs:domain rdfs:range rdfs:Class .
rdfs:range rdfs:range rdfs:Class .
rdfs:subPropertyOf rdfs:range rdf:Property .
rdfs:subClassOf rdfs:range rdfs:Class .
rdf:subject rdfs:range rdfs:Resource .
rdf:predicate rdfs:range rdfs:Resource .
rdf:object rdfs:range rdfs:Resource .
rdfs:member rdfs:range rdfs:Resource .
rdf:first rdfs:range rdfs:Resource .
rdf:rest rdfs:range rdf:List .
rdfs:seeAlso rdfs:range rdfs:Resource .
rdfs:isDefinedBy rdfs:range rdfs:Resource .
rdfs:comment rdfs:range rdfs:Literal .
rdfs:label rdfs:range rdfs:Literal .
rdf:value rdfs:range rdfs:Resource .

RDFS Axiomatic Triples (3)

Furthermore all of the following axiomatic triples must be satisfied:

rdfs:ContainerMembershipProperty  rdfs:subClassOf     rdf:Property .
rdf:Alt                           rdfs:subClassOf     rdfs:Container .
rdf:Bag                           rdfs:subClassOf     rdfs:Container .
rdf:Seq                           rdfs:subClassOf     rdfs:Container .
rdfs:isDefinedBy                  rdfs:subPropertyOf  rdfs:seeAlso .
rdf:XMLLiteral                    rdf:type            rdfs:Datatype .
rdf:XMLLiteral                    rdfs:subClassOf     rdfs:Literal .
rdfs:Datatype                     rdfs:subClassOf     rdfs:Class .
rdf:_i                            rdf:type            rdfs:ContainerMembershipProperty .
rdf:_i                            rdfs:domain         rdfs:Resource .
rdf:_i                            rdfs:range          rdfs:Resource .

Overview

  1. What is Semantics?
  2. What is Model-theoretic Semantics?
  3. Model-theoretic Semantics for RDF(S)
  4. What is Proof-theoretic Semantics?
  5. Proof-theoretic Semantics for RDF(S)

Back to our contrived example

Say, a model \(I\) of a set \(K\) of sentences consists of

  • a set \(C\) of cars and
  • a function \(I(\cdot)\), which maps each variable to a car in \(C\) such that, for each sentence \(a \; \eta \; b\) in \(K\), we have that \(I(a)\) has more horsepower than \(I(b)\).

 

Can we find an algorithm that computes all logical consequences of a set of sentences?

Back to our contrived example

 

Algorithm input: set K of sentences.

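  1. The algorithm non-deterministically selects two sentences from \(K\). If the first sentence is \(a \; \eta \; b\) and the second sentence is \(b \; \eta \; c\), then it adds \(a \; \eta \; c\) to \(K\). That is,
    \(\verb|if| \quad a \; \eta \; b \in K\) and \(b \; \eta \; c \in K \quad \verb|then| \quad K \leftarrow K \cup \{a \; \eta \; c\}\)
  2. Repeat step 1 until no selection results in a change of \(K\).
  3. Output \(K\).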
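A deterministic Python version of this algorithm might look as follows. Instead of selecting pairs non-deterministically, it simply tries all pairs in every round. A sentence \(a \; \eta \; b\) is encoded as the tuple `(a, b)`, and the function name is hypothetical.

```python
def close(sentences):
    """Saturate K under the rule  a eta b, b eta c  =>  a eta c.
    A sentence 'a eta b' is encoded as the tuple (a, b)."""
    k = set(sentences)
    changed = True
    while changed:                    # step 2: repeat until K no longer changes
        changed = False
        for (a, b) in list(k):        # step 1: try every pair of sentences
            for (b2, c) in list(k):
                if b == b2 and (a, c) not in k:
                    k.add((a, c))
                    changed = True
    return k                          # step 3: output K


print(sorted(close({("x", "y"), ("y", "z")})))
# [('x', 'y'), ('x', 'z'), ('y', 'z')]
```

The loop terminates because only sentences over the finitely many variables occurring in \(K\) can ever be added.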

Back to our example

The algorithm produces only logical consequences: it is sound w.r.t. the model-theoretic semantics.

The algorithm produces all logical consequences: it is complete w.r.t. the model-theoretic semantics.

The algorithm always terminates.

The algorithm is non-deterministic.

What is the computational complexity of this algorithm?

What do we do again?

Recall: \(\beta\) is a logical consequence of \(\alpha\) (\(\alpha \models \beta\)), if for all \(M\) with \(M \models \alpha\), we also have \(M \models \beta\).

Implementing model-theoretic semantics directly is not feasible: We would have to deal with all models of a knowledge base. Since there are a lot of cars in this world, we would have to check a lot of possibilities.

Proof theory reduces model-theoretic semantics to symbol manipulation. It removes the models from the process.

Deduction rules

\(\verb|if| \quad a \; \eta \; b \in K\) and \(b \; \eta \; c \in K \quad \verb|then| \quad K \leftarrow K \cup \{a \; \eta \; c\}\)

is a so-called deduction rule. Such rules are usually written schematically as

\[\frac{a \; \eta \; b \quad b \; \eta \; c}{a \; \eta \; c}\]

Overview

  1. What is Semantics?
  2. What is Model-theoretic Semantics?
  3. Model-theoretic Semantics for RDF(S)
  4. What is Proof-theoretic Semantics?
  5. Proof-theoretic Semantics for RDF(S)

Notation

\(a\) and \(b\) can refer to arbitrary URIs (i.e. anything admissible for the predicate position in a triple),

\(\mathit{\_\!\!:\!\!n}\) will be used for the ID of a blank node,

\(u\) and \(v\) refer to arbitrary URIs or blank node IDs (i.e. any possible subject of a triple),

\(x\) and \(y\) can be used for arbitrary URIs, blank node IDs or literals (anything admissible for the object position in a triple), and

\(l\) may refer to any literal.

Simple Entailment Rules


\[\frac{u \quad a \quad x \quad .}{u \quad a \quad \mathit{\_\!\!:\!\!n} \quad .} \qquad \mathrm{se1}\]


\[\frac{u \quad a \quad x \quad .}{\mathit{\_\!\!:\!\!n} \quad a \quad x \quad .}\qquad \mathrm{se2}\]


\(\mathit{\_\!\!:\!\!n}\) must not be contained in the graph the rule is applied to.

RDF Entailment Rules

\[\frac{}{u \quad a \quad x \quad .} \qquad \mathrm{rdfax}\]
for all RDF axiomatic triples \(u \; a \; x \; .\)

 

\[\frac{u \quad a \quad l \quad .}{u \quad a \quad \mathit{\_\!\!:\!\!n} \quad .} \qquad \mathrm{lg}\]
where \(\mathit{\_\!\!:\!\!n}\) does not yet occur in the graph

 

\[\frac{u \quad a \quad y \quad .}{a \quad \verb|rdf:type| \quad \verb|rdf:Property| \quad .} \qquad \mathrm{rdf1}\]
\[\frac{u \quad a \quad l \quad .}{\mathit{\_\!\!:\!\!n} \quad \verb|rdf:type| \quad \verb|rdf:XMLLiteral| \quad .} \qquad \mathrm{rdf2}\]
where \(\mathit{\_\!\!:\!\!n}\) does not yet occur in the graph, unless it has been introduced by a preceding application of the \(\mathrm{lg}\) rule

RDFS Entailment Rules (1)

\[\frac{}{u \quad a \quad x \quad .} \qquad \mathrm{rdfax}\]
for all RDFS axiomatic triples \(u \; a \; x \; .\)

\[\frac{u \quad a \quad l \quad .}{\mathit{\_\!\!:\!\!n} \quad \verb|rdf:type| \quad \verb|rdfs:Literal| \quad .} \qquad \mathrm{rdfs1}\] with \(\mathit{\_\!\!:\!\!n}\) as usual

\[\frac{a \quad \verb|rdfs:domain| \quad x \quad . \qquad u \quad a \quad y \quad .}{u \quad \verb|rdf:type| \quad x \quad .} \qquad \mathrm{rdfs2}\]
\[\frac{a \quad \verb|rdfs:range| \quad x \quad . \qquad u \quad a \quad v \quad .}{v \quad \verb|rdf:type| \quad x \quad .} \qquad \mathrm{rdfs3}\]
\[\frac{u \quad a \quad x \quad .}{u \quad \verb|rdf:type| \quad \verb|rdfs:Resource| \quad .} \qquad \mathrm{rdfs4a}\]
\[\frac{u \quad a \quad v \quad .}{v \quad \verb|rdf:type| \quad \verb|rdfs:Resource| \quad .} \qquad \mathrm{rdfs4b}\]

RDFS Entailment Rules (2)

 

\[\frac{u \quad \verb|rdfs:subPropertyOf| \quad v \quad . \qquad v \quad \verb|rdfs:subPropertyOf| \quad x \quad .}{u \quad \verb|rdfs:subPropertyOf| \quad x \quad .}\quad \mathrm{rdfs5}\]

\[\frac{u \quad \verb|rdf:type| \quad \verb|rdf:Property| \quad .}{u \quad \verb|rdfs:subPropertyOf| \quad u \quad .} \quad \mathrm{rdfs6}\]

\[\frac{a \quad \verb|rdfs:subPropertyOf| \quad b \quad . \qquad u \quad a \quad y \quad .}{u \quad b \quad y \quad .}\quad \mathrm{rdfs7}\]

\[\frac{u \quad \verb|rdf:type| \quad \verb|rdfs:Class| \quad .}{u \quad \verb|rdfs:subClassOf| \quad \verb|rdfs:Resource| \quad .} \quad \mathrm{rdfs8}\]

\[\frac{u \quad \verb|rdfs:subClassOf| \quad x \quad . \qquad v \quad \verb|rdf:type| \quad u \quad .}{v \quad \verb|rdf:type| \quad x \quad .}\quad \mathrm{rdfs9}\]

\[\frac{u \quad \verb|rdf:type| \quad \verb|rdfs:Class| \quad .}{u \quad \verb|rdfs:subClassOf| \quad u \quad .} \quad \mathrm{rdfs10}\]

RDFS Entailment Rules (3)

 

\[\frac{u \quad \verb|rdfs:subClassOf| \quad v \quad . \qquad v \quad \verb|rdfs:subClassOf| \quad x \quad .}{u \quad \verb|rdfs:subClassOf| \quad x \quad .}\quad \mathrm{rdfs11}\]

\[\frac{u \quad \verb|rdf:type| \quad \verb|rdfs:ContainerMembershipProperty| \quad .}{u \quad \verb|rdfs:subPropertyOf| \quad \verb|rdfs:member| \quad .}\quad \mathrm{rdfs12}\]

\[\frac{u \quad \verb|rdf:type| \quad \verb|rdfs:Datatype| \quad .}{u \quad \verb|rdfs:subClassOf| \quad \verb|rdfs:Literal| \quad .}\quad \mathrm{rdfs13}\]

\[\frac{u \quad a \quad \mathit{\_\!\!:\!\!n} \quad .}{u \quad a \quad l \quad .}\quad \mathrm{gl}\]
where \(\mathit{\_\!\!:\!\!n}\) identifies a blank node introduced by an earlier weakening of the literal \(l\) via the rule \(\mathrm{lg}\)
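The rules can be read as a forward-chaining procedure: apply them repeatedly until no new triple appears. The following Python sketch does this for a subset of the rules only (rdfs2, rdfs3, rdfs5, rdfs7, rdfs9, rdfs11). It ignores blank nodes, the literal rules lg/gl and the axiomatic triples; the function name `rdfs_closure` and the example graph are hypothetical.

```python
# Minimal sketch: naive forward chaining over a SUBSET of the RDFS rules.
# Triples are (subject, predicate, object) string tuples.
TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"
SUBCLASS, SUBPROP = "rdfs:subClassOf", "rdfs:subPropertyOf"


def rdfs_closure(graph):
    closure = set(graph)
    while True:
        new = set()
        for (a, p1, x) in closure:
            for (u, p2, y) in closure:
                if p1 == DOMAIN and p2 == a:                      # rdfs2
                    new.add((u, TYPE, x))
                if p1 == RANGE and p2 == a:                       # rdfs3
                    new.add((y, TYPE, x))
                if p1 == SUBPROP and p2 == SUBPROP and u == x:    # rdfs5
                    new.add((a, SUBPROP, y))
                if p1 == SUBPROP and p2 == a:                     # rdfs7
                    new.add((u, x, y))
                if p1 == SUBCLASS and p2 == TYPE and y == a:      # rdfs9
                    new.add((u, TYPE, x))
                if p1 == SUBCLASS and p2 == SUBCLASS and u == x:  # rdfs11
                    new.add((a, SUBCLASS, y))
        if new <= closure:  # fixpoint: nothing new was derived
            return closure
        closure |= new


example = {
    ("ex:belongsTo", "rdfs:domain", "ex:Professor"),
    ("ex:Professor", "rdfs:subClassOf", "ex:Person"),
    ("ex:Einstein", "ex:belongsTo", "ex:Princeton"),
}
inferred = rdfs_closure(example) - example
for triple in sorted(inferred):
    print(triple)
# ('ex:Einstein', 'rdf:type', 'ex:Person')
# ('ex:Einstein', 'rdf:type', 'ex:Professor')
```

The first iteration derives the type Professor via rdfs2; only the second iteration can then apply rdfs9, which is why the loop runs until a fixpoint is reached.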

Completeness (1/2)

The deduction rules for simple and RDF entailment are sound and complete.

The deduction rules for RDFS entailment are sound.

Completeness (2/2)

The specification [2] states that they are also complete, but they are not:

ex:isHappilyMarriedTo   rdfs:subPropertyOf      _:bnode .
_:bnode                 rdfs:domain             ex:Person .
ex:Bob                  ex:isHappilyMarriedTo   ex:Alice .

has the logical consequence

ex:Bob  rdf:type  ex:Person .

but is not derivable using the RDFS deduction rules.

Which rule(s) need to be changed and how in order to fix this?

Complexity

Simple, RDF, and RDFS entailment are NP-complete problems.

If we disallow blank nodes, all three entailment problems can be decided in polynomial time [3].

Does RDFS semantics do what it should?

Does

ex:speaksWith   rdfs:domain       ex:Homo .
ex:Homo         rdfs:subClassOf   ex:Primates .

entail

ex:speaksWith   rdfs:domain   ex:Primates .

?

Efficient RDFS Reasoning

  


References

  1. Pascal Hitzler et al.: Foundations of Semantic Web Technologies. Chapman & Hall, 2010.
  2. Patrick Hayes: RDF Semantics. W3C Recommendation, 2004. http://www.w3.org/TR/rdf-mt/
  3. Herman J. ter Horst: Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary. Journal of Web Semantics 3(2-3): 79-115, 2005.

Model theoretic Semantics (1/2)

Recap the notions “theory”, “logical consequence”, and “equivalence” and decide whether the following statements are true. Provide an (informal) explanation for your claim.

 

Model theoretic Semantics (2/2)

For arbitrary theories T and S:

 

1- If a formula F is a tautology then T |= F holds, i.e. every theory entails at least all tautologies.

2- The bigger a logical theory is, the more models it has. This means, if T ⊆ S, then every model of T is also a model of S.

3- If ¬F ∈ T, then T |= F cannot hold.

 

 

Shared blank nodes

What about blank nodes?


The universe {Alice, Bob, Monica, Ruth} with:
I( ex:Alice )=Alice, I( ex:Bob )=Bob, IEXT(I( ex:hasChild ))={<Alice,Monica>,<Bob,Ruth> }

Source: https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-mt/index.html
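Blank nodes behave like existentially quantified variables: a graph is true in an interpretation if some assignment of its blank nodes to elements of the universe makes all triples true, and a blank node shared between triples must receive the same value in all of them. The Python sketch below brute-forces such assignments for the interpretation above; the test graphs `g1` and `g2` and all function names are chosen here for illustration.

```python
from itertools import product

# The interpretation from the slide.
UNIVERSE = ["Alice", "Bob", "Monica", "Ruth"]
I_IRI = {"ex:Alice": "Alice", "ex:Bob": "Bob"}                   # I(...) on IRIs
IEXT = {"ex:hasChild": {("Alice", "Monica"), ("Bob", "Ruth")}}   # IEXT(I(p))


def is_blank(term):
    return term.startswith("_:")


def is_true(graph):
    """True iff SOME assignment A of the graph's blank nodes to UNIVERSE
    makes every triple true; a shared blank node gets ONE value."""
    blanks = sorted({t for (s, _, o) in graph for t in (s, o) if is_blank(t)})
    for values in product(UNIVERSE, repeat=len(blanks)):
        A = dict(zip(blanks, values))

        def val(term):
            return A[term] if is_blank(term) else I_IRI[term]

        if all((val(s), val(o)) in IEXT[p] for (s, p, o) in graph):
            return True
    return False


g1 = [("ex:Alice", "ex:hasChild", "_:y")]
g2 = [("ex:Bob", "ex:hasChild", "_:y")]
print(is_true(g1), is_true(g2))   # True True
print(is_true(g1 + g2))           # False: no single child shared by Alice and Bob
```

Renaming the blank node in `g2` to `_:z` before combining the graphs makes the result true again; this is why merging RDF graphs requires standardizing blank nodes apart.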

Tasks

  • Entail triples from the following Graph
    • ex:Garfield rdf:type ex:Cat
    • ex:Cat rdfs:subClassOf ex:Animal
    • ex:hasPet rdfs:range ex:Animal
    • ex:hasPet rdfs:domain ex:Person
    • ex:Person rdfs:subClassOf ex:Animal
    • ex:hasPet rdfs:subPropertyOf ex:livesWith
    • ex:Judie ex:hasPet ex:Casimir

Linked Data Stack - OWL

[Figure: Semantic Web layer cake (Linked Data stack) with the layers URI/Unicode, XML, Data Interchange: RDF, RDF-Schema, Query: SPARQL, Ontology: OWL, Rules: RIF, Unifying Logic, Proof, Trust, Crypto, and User Interface & Applications]

Ontology—Philosophy

  •  Singular only (there are no "ontologies")
  • The "science of being"
  • Found in Aristotle (Socrates), Thomas Aquinas, Descartes, Kant, Hegel, Wittgenstein, Heidegger, Quine, ....

Ontology—Computer Science

Gruber (1993): An Ontology is a formal specification of a shared conceptualization of a domain of interest.
  • Machine interpretable
  • Based on consensus
  • Describes concepts
  • Related to a topic (subject matter)

Ontology—Practical, some Requirements

  • Instantiation of classes by individuals
  • Concept hierarchies (taxonomies, inheritance): classes, terms
  • Binary relations between individuals: Properties, Roles
  • Properties of relations (e.g., range, transitivity)
  • Data types (e.g., numbers): concrete domains
  • Logical means of expression
  • Clear semantics!

OWL—In General

  • OWL is the acronym for Web Ontology Language (more easily pronounced than WOL)
  • family of languages for authoring ontologies
  • W3C Recommendation since 2004, OWL 2 since 2009
  • semantically a fragment of first-order logic (FOL)
  • the W3C documents contain details that cannot all be covered here

RDF Schema as Ontology Language?

  • suitable for simple ontologies
  • Advantage: automatic inference is relatively efficient
  • But unsuitable for modeling complex knowledge
  • Therefore use more powerful languages such as
    • OWL
    • F-Logic
    • ... 

Individuals

Manchester Syntax

Individual: Einstein  Types: Professor

Turtle Syntax 

:Einstein rdf:type :Professor.

The head of an OWL document

Defining namespaces in the root
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
Turtle Syntax

The head of an OWL document

General Information

:bookstoreOntology rdf:type owl:Ontology;
rdfs:comment "Bookstore Ontology";
owl:versionInfo "1.2";
owl:imports <http://library.org/books>.

Turtle Syntax

The head of an OWL document

Inherited from RDFS:
  • rdfs:comment
  • rdfs:label
  • rdfs:seeAlso
  • rdfs:isDefinedBy

For versioning:
  • owl:versionInfo
  • owl:priorVersion
  • owl:backwardCompatibleWith
  • owl:incompatibleWith
  • owl:DeprecatedClass
  • owl:DeprecatedProperty

Also: owl:imports

OWL Documents

  • Consist of a set of Axioms
  • An axiom can be expressed as a set of RDF triples
  •  Use RDF / XML as a standard syntax
  • There are other formats that are often even easier to read and process
  • Simple example: the ontology http://my-ontology.org with two classes, Student and Person, where Student is a subclass of Person

OWL Documents

  • Several syntaxes for OWL according to the use case:
    • RDF/XML: Standard syntax / data exchange
    • OWL/XML: easier for XML tools
    • Turtle: easier to read and write RDF triples
    • MOS: easier to read and write DL ontologies
    • Functional: easier to see formal structure
  • Providing RDF/XML is mandatory when publishing ontologies; other syntaxes are optional
  • An OWL file consists of:
    • Header with general information
    • Rest with actual ontology

OWL Syntax—Turtle

@prefix : <http://my-ontology.org/>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@base <http://my-ontology.org/>.
<http://my-ontology.org/> rdf:type owl:Ontology.
:Person rdf:type owl:Class.
:Student rdf:type owl:Class ; rdfs:subClassOf :Person .
  • best all-round RDF syntax
  • concise and easy to read
  • not specifically designed for OWL, complex expressions impractical
  • very good tool support

OWL Syntax—Manchester


Prefix: rdfs = <http://www.w3.org/2000/01/rdf-schema#>
Prefix: owl = <http://www.w3.org/2002/07/owl#>
Prefix: rdf = <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
Ontology: <http://my-ontology.org/>
Class: Person
  SubClassOf: owl:Thing
Class: Student
  SubClassOf: Person

  • very easy to read and write
  • cumbersome to write some types of axioms
  • Description Logic: Student ⊑ Person
  • Functional syntax: SubClassOf(:Student :Person)
  • used for OBO (Open Biomedical Ontologies)

OWL Syntax—RDF/XML

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY owl "http://www.w3.org/2002/07/owl#" >
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#" >
  <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
]>
<rdf:RDF xmlns="http://my-ontology.org/"
     xml:base="http://my-ontology.org/"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:about="Person">
    <rdfs:subClassOf rdf:resource="&owl;Thing"/>
  </owl:Class>
  <owl:Class rdf:about="Student">
    <rdfs:subClassOf rdf:resource="Person"/>
  </owl:Class>
</rdf:RDF>
  • best tool support
  • very verbose and difficult to read

OWL Syntax—OWL/XML

For the complete syntax, see the W3C Recommendation.

  • specifically designed for OWL, thus simpler than RDF/XML
  • still very verbose
<?xml version="1.0"?>
<!DOCTYPE Ontology [
<!ENTITY owl "http://www.w3.org/2002/07/owl#" > [...] ]>
<Ontology [...] ontologyIRI="http://my-ontology.org/">
    <SubClassOf>
        <Class IRI="&my-ontology;Person"/>
        <Class IRI="&owl;Thing"/>
    </SubClassOf>
    <SubClassOf>
        <Class IRI="&my-ontology;Student"/>
        <Class IRI="&my-ontology;Person"/>
    </SubClassOf>
</Ontology>

Classes, Roles and Individuals

The three components of Ontology axioms (analogous to RDFS):

  • Individuals / objects
    • Concrete elements in modeled world
  • Classes / concepts
    • Sets of objects
  • Roles / Properties
    • Associates two individuals

Classes

Definition

:Professor rdf:type owl:Class .
Turtle Syntax
Class: Professor
Manchester Syntax

Predefined: owl:Thing, owl:Nothing

Abstract Roles (Turtle)

 Abstract roles are defined like classes

:belongsTo rdf:type owl:ObjectProperty .
Turtle Syntax

Domain and Range of abstract roles

:belongsTo rdf:type owl:ObjectProperty ;
    rdfs:domain :Person ;
    rdfs:range :Organisation .
Turtle Syntax

Abstract Roles (Manchester)

Abstract roles are defined like classes

 ObjectProperty: belongsTo
Manchester Syntax

Domain and Range of abstract roles

 ObjectProperty: belongsTo  Domain: Person  Range:  Organisation
Manchester Syntax

Concrete roles

Concrete roles have a data type in the range

DatatypeProperty: firstName 
Manchester Syntax

Domain and Range of concrete roles

DatatypeProperty: firstName
 Domain: Person
 Range: string 
Manchester Syntax  

Concrete roles

Concrete roles have a data type in the range

 :firstName rdf:type owl:DatatypeProperty.  
Turtle Syntax

Domain and Range of concrete roles

 :firstName rdf:type owl:DatatypeProperty;
         rdfs:domain :Person;
         rdfs:range  xsd:string.
 Turtle Syntax  

Individuals and roles

:Einstein           rdf:type             :Professor ;
                    :belongsTo           :Princeton,
                                         :Bern;
                    :firstName           "Albert"^^xsd:string .
Turtle Syntax

Roles are generally not functional.

Individuals and roles

Individual: Einstein
 Types: Professor
 Facts: belongsTo Princeton, belongsTo Bern, firstName "Albert"
Manchester Syntax

Roles are generally not functional.

Class Relations

 

owl:equivalentClass

It follows by inference that Book is a subclass of Publications.

    :Book                 rdf:type                    owl:Class;
    rdfs:subClassOf        :Publication.
    :Publication        rdf:type                    owl:Class;
    owl:equivalentClass :Publications.
    :Publications      rdf:type                    owl:Class .
  
Turtle Syntax
Class: Book
 SubClassOf: Publication

Class: Publication
 EquivalentTo: Publications

Class: Publications
Manchester Syntax

rdfs:subClassOf

:Professor           rdf:type          owl:Class;
                     rdfs:subClassOf   :FacultyStaff.

:FacultyStaff        rdf:type          owl:Class ;
                     rdfs:subClassOf   :Person .  
Turtle Syntax
Class: Professor
 SubClassOf: FacultyStaff


Class: FacultyStaff
 SubClassOf: Person  
Manchester Syntax 

It follows by inference that Professor is a subclass of Person.

owl:disjointWith


:Book rdf:type owl:Class; rdfs:subClassOf :Publication.
:FacultyStaff rdf:type owl:Class; owl:disjointWith :Publication.
:Professor rdf:type owl:Class; rdfs:subClassOf :FacultyStaff.
:Publication rdf:type owl:Class.

Turtle Syntax

Class: Professor
 SubClassOf: FacultyStaff
Class: Book
 SubClassOf: Publication
Class: FacultyStaff
 DisjointWith: Publication 

Manchester Syntax

It follows by inference that Professor and Book are also disjoint classes.
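The disjointness of Professor and Book follows because subclasses inherit the disjointness of their superclasses. A minimal Python sketch of this one inference pattern over named classes (the dictionary encoding and helper names are hypothetical); it is far from a complete OWL reasoner, so `False` only means "not derived by this pattern".

```python
# Asserted axioms from the example (named classes only).
SUBCLASS_OF = {"Professor": {"FacultyStaff"}, "Book": {"Publication"}}
DISJOINT = {frozenset({"FacultyStaff", "Publication"})}


def superclasses(cls):
    """cls itself plus all of its (transitive) asserted superclasses."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(SUBCLASS_OF.get(c, ()))
    return seen


def entailed_disjoint(c1, c2):
    # C1 ⊑ A, C2 ⊑ B and A disjoint with B  =>  C1 disjoint with C2
    return any(frozenset({a, b}) in DISJOINT
               for a in superclasses(c1) for b in superclasses(c2))


print(entailed_disjoint("Professor", "Book"))          # True
print(entailed_disjoint("Professor", "FacultyStaff"))  # False
```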

Individuals and class relations


:Book                    rdf:type          owl:Class ; 
                         rdfs:subClassOf   :Publication . 
:SemanticWebGrundlagen   rdf:type          :Book ;     
                         :autor            :MarkusKroetzsch,
                                           :PascalHitzler, 
                                           :SebastianRudolph,  
                                           :YorkSure .
Turtle Syntax

It follows by inference that SemanticWebGrundlagen is a Publication.

Individuals and class relations

Class: Book   SubClassOf: Publication
Individual: SemanticWebGrundlagen
 Types: Book
 Facts: autor PascalHitzler, autor MarkusKroetzsch,
        autor SebastianRudolph,       
        autor YorkSure   
 
Manchester Syntax

It follows by inference that SemanticWebGrundlagen is a Publication.

Role Relationships

Analogous to classes, roles have rdfs:subPropertyOf and owl:equivalentProperty. In addition, roles can be inverses (owl:inverseOf) of each other:

:examinedBy owl:inverseOf :examinerOf.

Turtle Syntax

ObjectProperty: examinedBy   InverseOf: examinerOf

Manchester Syntax 

Role Properties

  • Domain
  • Range
  • Transitivity, i.e. r(a, b) and r(b, c) imply r(a, c)
  • Symmetry, i.e. r(a, b) implies r(b, a)
  • Functionality, i.e. r(a, b) and r(a, c) imply b = c
  • Inverse functionality, i.e. r(a, b) and r(c, b) imply a = c
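In OWL these characteristics are axioms that every model must satisfy. To make the definitions concrete, the following Python sketch checks them on a single finite relation given as a set of pairs; the function names and the example relation are hypothetical.

```python
def is_transitive(r):
    # r(a, b) and r(b, d) imply r(a, d)
    return all((a, d) in r for (a, b) in r for (c, d) in r if b == c)


def is_symmetric(r):
    # r(a, b) implies r(b, a)
    return all((b, a) in r for (a, b) in r)


def is_functional(r):
    # r(a, b) and r(a, c) imply b = c
    return all(b == c for (a, b) in r for (x, c) in r if a == x)


def is_inverse_functional(r):
    # r(a, b) and r(c, b) imply a = c
    return all(a == c for (a, b) in r for (c, y) in r if b == y)


ancestor = {("ann", "bob"), ("bob", "cid"), ("ann", "cid")}
print(is_transitive(ancestor), is_symmetric(ancestor))  # True False
print(is_functional(ancestor))                           # False
```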

Relationships between individuals

 Individual: Faehnrich   SameAs: ProfessorFaehnrich  
Manchester Syntax
 :Faehnrich rdf:type :Professor; owl:sameAs :ProfessorFaehnrich .
Turtle Syntax

It follows by inference that ProfessorFaehnrich is a Professor.

Inequality of individuals is specified by owl:differentFrom.

Relationships between individuals

DifferentIndividuals: YorkSure, PascalHitzler, RudiStuder
Manchester Syntax
[ rdf:type owl:AllDifferent ;    
  owl:distinctMembers ( :PascalHitzler  :RudiStuder  :YorkSure) 
] .
Turtle Syntax

Task—Contradiction?

Class: Organisation
Class: Office  SubClassOf: wgs84:SpatialThing
ObjectProperty: hasOffice
 Domain: Organisation
 Range: Office
# imported axioms for latitude and longitude
ObjectProperty: wgs84:lat
 Domain: wgs84:SpatialThing
ObjectProperty: wgs84:long
 Domain: wgs84:SpatialThing
Individual: CompanyXY
 Types: Organisation
 Facts: hasOffice OfficeXY,
        wgs84:lat "41.8292928"^^xsd:double,
        wgs84:long "17.8282"^^xsd:double      
Manchester Syntax 

Enumerations/Nominals


Class: MoonOfMars  EquivalentTo: {Phobos, Deimos} 

Manchester Syntax


:MoonOfMars rdf:type owl:Class;
owl:equivalentClass
[
rdf:type owl:Class ;
owl:oneOf ( :Phobos :Deimos )
] .

Turtle Syntax


This states that Mars has exactly these two moons.

Logical class constructors

  • Logical AND (conjunction): owl:intersectionOf
  • Logical OR (disjunction): owl:unionOf
  • Logical NOT (negation): owl:complementOf
  • Used to construct complex classes from simple classes

Intersection

Class: MoonOfMars   EquivalentTo: Moon and ObjectNearMars 
Manchester Syntax
:MoonOfMars
  owl:equivalentClass
[
 rdf:type            owl:Class ;
 owl:intersectionOf  (:Moon :ObjectNearMars)
 ].
Turtle Syntax
It follows e.g. by inference that all moons of Mars are objects near Mars.

Union

Class: Boat   SubClassOf: SailBoat or MotorBoat
Manchester Syntax
:Boat  rdfs:subClassOf
 [
  rdf:type            owl:Class ;
  owl:unionOf  ( :SailBoat :MotorBoat )
 ] . 
Turtle Syntax

Disjoint Union

A professor can either be active or retired but not both at the same time.

<owl:Class rdf:about="#Professor">
 <rdfs:subClassOf>
  <owl:unionOf rdf:parseType="Collection">
   <owl:Class rdf:about="#Active"/>
   <owl:Class rdf:about="#Retired"/>
  </owl:unionOf>
 </rdfs:subClassOf>
</owl:Class>

RDF/XML Syntax

Class: Professor   DisjointUnionOf: Active, Retired 

Manchester Syntax
:Professor  owl:disjointUnionOf  ( :Active :Retired ) .
Turtle Syntax

Complement

<owl:Class rdf:ID="FacultyStaff">
 <rdfs:subClassOf>
  <owl:complementOf rdf:resource="#Publication"/>
 </rdfs:subClassOf>
</owl:Class>
<!-- semantically equivalent statement: -->
<owl:Class rdf:ID="FacultyStaff">
 <owl:disjointWith rdf:resource="#Publication"/>
</owl:Class> 
RDF/XML Syntax
Class: FacultyStaff   SubClassOf: not Publication 
Manchester Syntax
:FacultyStaff rdf:type owl:Class ;
 rdfs:subClassOf
 [
  rdf:type owl:Class ;
  owl:complementOf :Publication
 ].
Turtle Syntax

Modeling Example

Modeling task: all persons are either male or female.

:Male       rdfs:subClassOf  :Person .
:Female     rdfs:subClassOf  :Person ;
            owl:disjointWith  :Male .

:Person     owl:equivalentClass
[
 rdf:type owl:Class ;
 owl:unionOf ( :Male :Female )
]. 
Solution (Turtle Syntax)
Class: Male    SubClassOf: Person 
Class: Female   SubClassOf: Person   DisjointWith: Male 
Class: Person   EquivalentTo: Male or Female  
Solution (Manchester Syntax)

Role restrictions (allValuesFrom)

Restrictions are used to define complex classes via roles.

<owl:Class rdf:ID="Exam">   
 <rdfs:subClassOf>         
  <owl:Restriction>       
   <owl:onProperty rdf:resource="#examiner"/>             
    <owl:allValuesFrom rdf:resource="#Professor"/>    
  </owl:Restriction>     
 </rdfs:subClassOf> 
</owl:Class> 
RDF/XML Syntax
Class: Exam   SubClassOf: examiner only Professor
Manchester Syntax
:Exam   rdfs:subClassOf
[ rdf:type  owl:Restriction ;   owl:onProperty  :examiner ;
owl:allValuesFrom  :Professor ] .
Turtle Syntax
I.e. all examiners must be professors.

Role restrictions (someValuesFrom)

<owl:Class rdf:ID="Exam">     
 <rdfs:subClassOf>         
  <owl:Restriction>             
   <owl:onProperty rdf:resource="#examiner"/>             
    <owl:someValuesFrom rdf:resource="#Person"/>         
  </owl:Restriction>     
 </rdfs:subClassOf> 
</owl:Class> 
RDF/XML Syntax
Class: Exam   SubClassOf: examiner some Person
Manchester Syntax
:Exam   rdfs:subClassOf
[ rdf:type    owl:Restriction ;   owl:onProperty   :examiner ;
 owl:someValuesFrom   :Person ] .
Turtle Syntax
I.e. each exam must have at least one examiner who is a person.

Role restrictions (cardinalities)

Class: Exam   SubClassOf: examiner max 2
Manchester Syntax
:Exam      rdf:type     owl:Class ; 
           rdfs:subClassOf 
[rdf:type owl:Restriction ;                
 owl:onProperty :examiner ;        
 owl:maxCardinality "2"^^xsd:nonNegativeInteger] . 
Turtle Syntax

I.e. each exam can have at most two examiners.

Analogous to max: min and exactly.

Modeling Example

Modeling task: A performance requirement (of a software product) is a requirement, which is created by customers. It leads to a system requirement.
Class: PerformanceRequirement  SubClassOf: Requirement
 and (createdBy only Customer)
 and (leadsTo some SystemRequirement)
Solution (Manchester Syntax)

Role restrictions (hasValue)

<owl:Class rdf:ID="ExamAtFaehnrich">
 <owl:equivalentClass>
  <owl:Restriction>
   <owl:onProperty rdf:resource="#examiner"/>
    <owl:hasValue rdf:resource="#Faehnrich"/>
  </owl:Restriction>
 </owl:equivalentClass>
</owl:Class>
RDF/XML Syntax
Class: ExamAtFaehnrich   EquivalentTo: examiner value Faehnrich
Manchester Syntax
:ExamAtFaehnrich   owl:equivalentClass
 [rdf:type  owl:Restriction ;   owl:onProperty  :examiner ;
            owl:hasValue :Faehnrich ] .

Turtle Syntax

Domain and Range

ObjectProperty: belongsTo   Range: Organisation
Manchester Syntax
is equivalent to the following:
Class: owl:Thing   SubClassOf: belongsTo only Organisation
Manchester Syntax

Domain and Range

:belongsTo   rdf:type  owl:ObjectProperty ;  rdfs:range  :Organisation . 
Turtle Syntax

is equivalent to the following:
owl:Thing   rdfs:subClassOf     
 [ rdf:type   owl:Restriction ; owl:onProperty :belongsTo ; 
  owl:allValuesFrom   :Organisation ] . 
Turtle Syntax
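The equivalence can be checked with the description-logic semantics of the universal restriction, writing the axiom as \(\top \sqsubseteq \forall \text{belongsTo}.\text{Organisation}\):

\[
\begin{aligned}
\mathcal{I} \models \top \sqsubseteq \forall \text{belongsTo}.\text{Organisation}
&\iff \Delta^\mathcal{I} \subseteq \{x \in \Delta^\mathcal{I} \mid \forall y\,((x,y) \in \text{belongsTo}^\mathcal{I} \rightarrow y \in \text{Organisation}^\mathcal{I})\}\\
&\iff \forall x\,\forall y\,((x,y) \in \text{belongsTo}^\mathcal{I} \rightarrow y \in \text{Organisation}^\mathcal{I})
\end{aligned}
\]

The last line says exactly that every object of belongsTo is an Organisation, which is the meaning of the range axiom.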

Domain and Range

How do you model the domain D of a property p without using the Domain keyword?

Hint: Whenever p occurs at least once on something, this thing is of type D. For “at least once”, think of cardinality constraints or of existential quantification.

Domain and Range: Caution!

<owl:ObjectProperty rdf:ID="belongsTo">
 <rdfs:range rdf:resource="#Organisation"/>
 </owl:ObjectProperty>
<Number rdf:ID="Five">
 <belongsTo rdf:resource="#Primes"/>
</Number>       
RDF/XML Syntax
ObjectProperty:
 belongsTo   Range: Organisation
Individual: Five
 Types: Number
 Facts: belongsTo Primes
Manchester Syntax
:belongsTo    rdf:type         owl:ObjectProperty ;
              rdfs:range       :Organisation.
:Five         rdf:type         :Number;
              :belongsTo       :Primes. 
Turtle Syntax

It now follows that Primes is an organization!

Role Properties

<owl:ObjectProperty rdf:ID="hasColleague">
 <rdf:type rdf:resource="&owl;TransitiveProperty"/>
 <rdf:type rdf:resource="&owl;SymmetricProperty"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="hasProjectManager">
 <rdf:type rdf:resource="&owl;FunctionalProperty"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="isProjectManagerFor">
 <rdf:type rdf:resource="&owl;InverseFunctionalProperty"/>
</owl:ObjectProperty>
<Person rdf:ID="SoerenAuer">
<hasColleague rdf:resource="#SebastianTramp"/>
<hasColleague rdf:resource="#JensLehmann"/>
<isProjectManagerFor rdf:resource="#Triplify"/>
</Person>
<Project rdf:ID="OntoWiki">
 <hasProjectManager rdf:resource="#SoerenAuer"/>
 <hasProjectManager rdf:resource="#AuerSoeren"/>
</Project> 
RDF/XML Syntax

Conclusions from the example

  • SebastianTramp hasColleague SoerenAuer
  • SebastianTramp hasColleague JensLehmann
  • SoerenAuer owl:sameAs AuerSoeren
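These conclusions can be reproduced by hand-coding the two characteristics involved: computing the symmetric and transitive closure of hasColleague, and equating all values of the functional property hasProjectManager for the same subject. This Python sketch is not an OWL reasoner; the variable and function names are hypothetical.

```python
has_colleague = {("SoerenAuer", "SebastianTramp"), ("SoerenAuer", "JensLehmann")}
has_project_manager = {("OntoWiki", "SoerenAuer"), ("OntoWiki", "AuerSoeren")}


def symmetric_transitive_closure(r):
    closure = set(r)
    while True:
        new = {(b, a) for (a, b) in closure}               # symmetry
        new |= {(a, d) for (a, b) in closure
                for (c, d) in closure if b == c}            # transitivity
        if new <= closure:
            return closure
        closure |= new


colleagues = symmetric_transitive_closure(has_colleague)
print(("SebastianTramp", "SoerenAuer") in colleagues)   # True
print(("SebastianTramp", "JensLehmann") in colleagues)  # True

# Functional property: all values for the same subject denote the same individual.
same_as = {(b, c) for (a, b) in has_project_manager
           for (x, c) in has_project_manager if a == x and b != c}
print(sorted(same_as))  # [('AuerSoeren', 'SoerenAuer'), ('SoerenAuer', 'AuerSoeren')]
```

Symmetry combined with transitivity also yields pairs such as (SoerenAuer, SoerenAuer), which follow from the ontology as well.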

Negative Facts

Only possible since OWL 2.
 Individual: Bill
   Facts: not hasWife Mary
Manchester Syntax

 []  rdf:type               owl:NegativePropertyAssertion ;
     owl:sourceIndividual   :Bill ;
     owl:assertionProperty  :hasWife ;
     owl:targetIndividual   :Mary .
Turtle Syntax

Modeling Example

Modeling task
There are male and female persons. Persons have names.
:MalePerson    rdfs:subClassOf  :Person.
:FemalePerson  rdfs:subClassOf  :Person.
:hasName       rdf:type         owl:DatatypeProperty;
               rdfs:domain      :Person;
               rdfs:range       xsd:string.
:John          rdf:type         :MalePerson;
               :hasName         "John"^^xsd:string.
:Jill          rdf:type         :FemalePerson;
               :hasName         "Jill"^^xsd:string .
Solution (Turtle Syntax)

OWL 1 vs OWL 2 and History

  • OWL became W3C recommendation in 2004, OWL 2 in 2009
  • OWL 2 is a fully backward-compatible extension of OWL
  • different language subsets: OWL 1 species (Lite, DL, Full) vs. OWL 2 profiles
  • syntactic sugar (already possible but now easier to express)
    • disjoint union of classes
  • new expressivity
    • keys
    • property chains
    • richer datatypes, data ranges
    • qualified cardinality restrictions
    • asymmetric, reflexive, and disjoint properties
    • enhanced annotation capabilities

OWL 2 Full

  • Unlimited use of all OWL and RDF language elements (must be valid RDFS)
  •  Difficult: non-existent type separation (classes, roles, individuals), thus:
    • owl:Thing the same as rdfs:Resource
    • owl:Class the same as rdfs:Class
    • owl:DatatypeProperty subclass of owl:ObjectProperty
    • owl:ObjectProperty the same as rdf:Property

OWL 2 Profiles

  • Three profiles were added in OWL 2: OWL 2 QL, EL and RL.
  • Design principles for the profiles:
    • identify maximal OWL 2 sublanguages that are still implementable in polynomial time
    • in general, disallow negation and disjunction, because they complicate reasoning and are rarely needed

OWL QL

  • QL = Query Language  
  • based on the DL-Lite family of description logics
  • designed for query answering using SQL rewriting on top of relational databases
  • subclasses can only be class names or existentials with unrestricted filler  
  • superclasses can be class names, existentials or conjunctions with superclass filler (recursive) or negations with subclass filler

Allowed

\[\text{Fish} \sqsubseteq \text{Animal}\]

\[\exists \text{hasHouse}.\top \sqsubseteq \text{Landlord}\]

\[\text{Student} \sqsubseteq \text{Poor} \sqcap \exists \text{hasBike}.\top \sqcap \neg \text{Pupil}\]

Forbidden

\[\exists \text{hasHouse}.\text{Villa} \sqsubseteq \text{RichLandlord}\]
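The two bullet rules for OWL QL can be read as a small grammar. The following Python sketch checks axioms C ⊑ D against a simplified version of it; following the OWL 2 QL specification, the filler of an existential on the superclass side is restricted to a class name. The tuple encoding and the function names are hypothetical, and only the constructors mentioned here are covered.

```python
# Concepts as nested tuples (hypothetical encoding):
# ("name", A), ("top",), ("some", r, C), ("and", C1, C2, ...), ("not", C)
TOP = ("top",)


def cls(name):
    return ("name", name)


def is_class(c):
    return c[0] == "name" or c == TOP


def ql_subclass(c):
    # class name, or existential with unrestricted filler (∃r.⊤)
    return is_class(c) or (c[0] == "some" and c[2] == TOP)


def ql_superclass(c):
    if is_class(c):
        return True
    if c[0] == "some":      # ∃r.A with A a class name (or ⊤)
        return is_class(c[2])
    if c[0] == "and":       # conjunction of superclass expressions (recursive)
        return all(ql_superclass(d) for d in c[1:])
    if c[0] == "not":       # negation of a subclass expression
        return ql_subclass(c[1])
    return False            # e.g. disjunction or universal restriction


def ql_axiom(sub, sup):
    """C ⊑ D is accepted iff C is a QL subclass and D a QL superclass expression."""
    return ql_subclass(sub) and ql_superclass(sup)


print(ql_axiom(cls("Fish"), cls("Animal")))                                # True
print(ql_axiom(("some", "hasHouse", TOP), cls("Landlord")))                # True
print(ql_axiom(("some", "hasHouse", cls("Villa")), cls("RichLandlord")))  # False
```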


Details About OWL QL (1)

Forbidden


equality \[ \text{RichPerson} = \text{FamousPerson} \]

disjunctions \[ \text{Book} = \text{Ebook} \sqcup \text{PrintedBook} \]

universals \[ \forall \text{killed}.\text{Human} = \text{Murderer} \]

self \[\exists \text{loves}.\text{Self} = \text{NarcisticPerson} \] 

cardinalities \[ \geq 5 \text{wonFight}.\top = \text{SuccessfulBoxer} \]

Details About OWL QL (2)

Forbidden

keys

Class: Person
HasKey: hasSSN
property chains
ObjectProperty: hasGrandparent
SubPropertyChain: hasParent o hasParent
transitive properties
ObjectProperty: hasAncestor
Characteristics: Transitive
nominals
EquivalentClasses(
:MyBirthdayGuests
ObjectOneOf(:Bill :John :Mary) )
functional properties
ObjectProperty: hasHusband
Characteristics: Functional

OWL EL

  • based on the description logic \(\mathcal{EL}\)++; \(\mathcal{E}\) stands for full existential quantification
  • focus on terminological expressivity, used for light-weight ontologies
  • existential quantification is allowed but universal quantification is not; only rdfs:range (a special kind of universal) is allowed, with restrictions
  • property domains, class/property hierarchies, class intersections, disjoint classes/properties, property chains, Self, nominals (classes with enumerated individual members) and keys are fully supported
  • no inverse or symmetric properties, no disjunctions or negations

Examples

\[\exists \text{has}.\text{Sorrow} \sqsubseteq \exists \text{has}.\text{Liqueur}\]

\[\top \sqsubseteq \exists \text{hasParent}.\text{Person}\]

\[\text{German} \sqsubseteq \exists \text{knows}.\{\text{angela}\}\]

\[\text{hasParent} \circ \text{hasParent} \sqsubseteq \text{hasGrandparent}\]

OWL RL

  • RL = Rule Language
  • subclass axioms as rule-like implications with head (superclass) and body (subclass)
  • \(\text{DaysWithRain} \sqsubseteq \text{DaysWithWetStreet}\)

Protégé OWL editor

 http://protege.stanford.edu

Further Reading

Manchester Format

  • Convert the following using the Manchester Format
    • ex:Garfield rdf:type ex:Cat
    • ex:Cat rdfs:subClassOf ex:Animal
    • ex:hasPet rdfs:range ex:Animal
    • ex:hasPet rdfs:domain ex:Person
    • ex:Person rdfs:subClassOf ex:Animal
    • ex:hasPet rdfs:subPropertyOf ex:livesWith
    • ex:Judie ex:hasPet ex:Casimir

Entailment

  • Entail Axioms and Facts from Ontology
    • Individual: ex:Garfield
      • Types: ex:Cat
      • Facts: ex:knows ex:Odie, ex:livesWith ex:Casimir
    • Class: ex:Cat SubClassOf: ex:Animal
    • ObjectProperty: ex:hasPet
      • Range: ex:Animal
      • Domain: ex:Person
      • SubPropertyOf: ex:livesWith
    • Class: ex:Person SubClassOf: ex:Animal
      • DisjointWith: ex:Cat
    • Individual: ex:Judie
      • Facts: ex:hasPet ex:Casimir
    • ObjectProperty: ex:livesWith
      • InverseOf: ex:livesWith
      • SubPropertyOf: ex:knows
      • Characteristics: Transitive
  • Convert to Turtle Syntax
    • ObjectProperty: ex:livesWith
      • InverseOf: ex:livesWith
      • SubPropertyOf: ex:knows
      • Characteristics: Transitive
  • Give 3 Models of the ex:livesWith Property for: 1 person, 2 persons, infinite number of persons

Try Protégé OWL editor out!

There are two ways to use Protégé:

1. Download it on your desktop

2. Use it online

Source: http://protege.stanford.edu/

Protégé on your desktop

You can easily download and run Protégé OWL editor on your desktop.

Source: http://protege.stanford.edu/products.php#desktop-protege

Using Protégé online!

Sign up and make a new project

To do: create a class.

Introduction

 

Goals

Learn about

  • Semantics of OWL
  • Description Logics
  • Tableau reasoning

Prerequisites

  • Basic knowledge of Propositional Logic
  • Basic knowledge of First-Order Logic


Semantics of OWL

The semantics of OWL is based on Description Logics, which have a formal model-theoretic semantics.

   OWL DL corresponds to \(\mathcal{SHOIN}(D)\).

   OWL 2 DL corresponds to \(\mathcal{SROIQ}(D)\).

In this lecture we will explain the semantics of OWL DL by describing the description logic \(\mathcal{SHOIN}(D)\).

Description Logics

  • family of knowledge representation languages
  • usually fragments of First Order Logic (FOL)
  • in most cases decidable
  • comparatively expressive
  • originated from Semantic Networks
  • intuitive syntax
  • variable free (e.g. P ⊑ Q instead of ∀ x . p(x) → q(x))


Description Logics - Basic Components

Basic components:

  • concept names (atomic concepts), e.g. Student, Book, ...
  • role names, e.g. bornIn, worksFor, ...
  • individual names (individuals, objects), e.g. Steven, Mary, ...

The set of concept, role and individual names is often called the signature or vocabulary.

Description Logics - Knowledge Base

Usually a DL knowledge base consists of:

  • TBox \(\mathcal{T}\) : Information about concepts.
  • ABox \(\mathcal{A}\) : Information about individuals.

Additionally, in more expressive DLs:

  • RBox \(\mathcal{R}\) : Information about roles.

\(\mathcal{ALC}\) - Concepts

\(\mathcal{ALC}\) (Attributive Language with Complement) is the simplest propositionally closed DL (i.e. it subsumes propositional logic).

(Complex) \(\mathcal{ALC}\) concepts are inductively defined as follows:

  • every concept name is a concept,
  • \(\mathcal{\top}\) and \(\mathcal{\bot}\) are concepts,
  • if \(C\) and \(D\) are concepts and \(r\) is a role then the following are concepts:
    • \(\neg C\) (often called negation or complement)
    • \(C \sqcap D\) (often called conjunction, intersection or "and")
    • \(C \sqcup D\) (often called disjunction, union or "or")
    • \(\exists r.C\) (often called existential restriction)
    • \(\forall r.C\) (often called value restriction)

\(\mathcal{ALC}\) Concepts - Examples

\(\text{Person} \sqcap \exists \text{hasChild}.\top\)

  • Persons with children

\(\text{Animal} \sqcap \forall \text{eat}.\text{Vegetable}\)

  • Animals which only eat vegetables

\(\text{Professor} \sqcup \text{Student}\)

  • Professors or students

\(\text{Person} \sqcap \forall\text{bornIn}.\neg \text{EuropeanCountry}\)

  • Persons which are not born in a country in Europe

\(\mathcal{ALC}\) - TBox

A TBox (terminological box) consists of a finite set of terminological axioms.

Terminological axioms are essentially general concept inclusions (GCIs): given concepts \(C\) and \(D\), a GCI is written as

\[C \sqsubseteq D\] 

The concept equality can be expressed with

\[C \equiv D\]

which is an abbreviation for \(C \sqsubseteq D\) and \(D \sqsubseteq C\).

\(\mathcal{ALC}\) - ABox

An ABox consists of a finite set of assertional axioms of the following forms:

  • \(C(a)\), so-called concept assertions
  • \(r(a,b)\), so-called role assertions

\(\mathcal{ALC}\) - Semantics (1)

A formal definition of the model-theoretic semantics of \(\mathcal{ALC}\) is given by means of an interpretation \(\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{ \mathcal{I}})\) consisting of

  • a non-empty domain \(\Delta^{ \mathcal{I}}\)
  • a mapping \( \cdot^{ \mathcal{I} }\), which maps
    • every individual \(a\) to a domain element \(a^{\mathcal{I}}\in\Delta^{\mathcal{I}}\)
    • every concept name \(A\) to a subset of domain elements \(A^{\mathcal{I}}\subseteq\Delta^{\mathcal{I}}\)
    • every role \(r\) to a set of pairs of domain elements \(r^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}\times\Delta^{\mathcal{I}}\)

\(\mathcal{ALC}\) - Semantics (2)

  • Interpretation is extended to complex concepts by
    • \(\top^\mathcal{I}=\Delta^\mathcal{I}\)
    • \(\bot^\mathcal{I}=\emptyset\)
    • \((\neg C)^\mathcal{I}=\Delta^\mathcal{I} \setminus C^\mathcal{I} \)
    • \((C \sqcap D)^\mathcal{I} = C^\mathcal{I} \cap D^\mathcal{I} \)
    • \( (C \sqcup D)^\mathcal{I}=C^\mathcal{I} \cup D^\mathcal{I} \)
    • \((\exists r.C)^\mathcal{I}= \{x \in \Delta^\mathcal{I} | \exists y \in C^\mathcal{I} \text{ such that } (x,y) \in r^\mathcal{I} \}\)
    • \((\forall r.C)^\mathcal{I}= \{x \in \Delta^\mathcal{I} | \text{for all } y \in \Delta^\mathcal{I}: (x,y) \in r^\mathcal{I} \rightarrow y \in C^\mathcal{I} \} \)
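The extension rules above translate directly into a recursive evaluator over a finite interpretation. This is only an illustrative sketch: the tuple encoding of concepts, the class `Interpretation` and the toy domain are hypothetical and not part of any OWL library. Note how `kit`, which eats nothing, belongs to \(\forall \text{eat}.\text{Vegetable}\) vacuously.

```python
# Hypothetical tuple encoding of ALC concepts (illustration only):
#   "A"                 concept name
#   ("top",) ("bot",)   ⊤ and ⊥
#   ("not", C)  ("and", C, D)  ("or", C, D)
#   ("some", r, C) for ∃r.C    ("all", r, C) for ∀r.C


class Interpretation:
    def __init__(self, domain, concepts, roles):
        self.domain = set(domain)    # Δ^I
        self.concepts = concepts     # concept name -> subset of Δ^I
        self.roles = roles           # role name -> set of pairs (x, y)

    def ext(self, c):
        """Return the extension C^I according to the ALC semantics."""
        if isinstance(c, str):
            return set(self.concepts.get(c, set()))
        op = c[0]
        if op == "top":
            return set(self.domain)
        if op == "bot":
            return set()
        if op == "not":
            return self.domain - self.ext(c[1])
        if op == "and":
            return self.ext(c[1]) & self.ext(c[2])
        if op == "or":
            return self.ext(c[1]) | self.ext(c[2])
        pairs = self.roles.get(c[1], set())
        filler = self.ext(c[2])
        if op == "some":   # at least one r-successor lies in C^I
            return {x for (x, y) in pairs if y in filler}
        if op == "all":    # every r-successor lies in C^I (vacuously true if there is none)
            return {x for x in self.domain
                    if all(y in filler for (x2, y) in pairs if x2 == x)}
        raise ValueError(f"unknown constructor: {op!r}")


interp = Interpretation(
    domain={"ann", "bob", "rex", "tom", "kit", "carrot", "steak"},
    concepts={"Person": {"ann", "bob"}, "Animal": {"rex", "tom", "kit"},
              "Vegetable": {"carrot"}},
    roles={"hasChild": {("ann", "bob")},
           "eat": {("rex", "carrot"), ("tom", "steak")}},
)

print(interp.ext(("and", "Person", ("some", "hasChild", ("top",)))))  # {'ann'}
print(interp.ext(("and", "Animal", ("all", "eat", "Vegetable"))))     # rex and kit
```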

 

\(\mathcal{ALC}\) - Semantics (3)

  •  Interpretation is extended to axioms by
    • \(\mathcal{I} \models C \sqsubseteq D \text{ iff } C^\mathcal{I} \subseteq D^\mathcal{I} \)
    • \(\mathcal{I} \models C \equiv D \text{ iff } C^\mathcal{I} = D^\mathcal{I} \)
    • \(\mathcal{I} \models C(a) \text{ iff } a^\mathcal{I} \in C^\mathcal{I} \)
    • \(\mathcal{I} \models r(a,b) \text{ iff } (a^\mathcal{I},b^\mathcal{I}) \in r^\mathcal{I} \)

\(\mathcal{ALC}\) Knowledge Base - Example

TBox \(\mathcal{T}\) :

\( \begin{align} \text{Man} &\equiv \neg \text{Woman} \sqcap \text{Person}\\ \text{Woman} &\sqsubseteq \text{Person}\\ \text{Mother} &\equiv \text{Woman} \sqcap \exists \text{hasChild}.\top \\ \end{align} \)

ABox \(\mathcal{A}\) :

\( \begin{align} \text{Man(STEPHEN)}\\ \neg\text{Man(MONICA)}\\ \text{Woman(JESSICA)}\\ \text{hasChild(STEPHEN, JESSICA)}\\ \end{align} \)

\(\mathcal{ALC}\) in OWL

The \(\mathcal{ALC}\) operators correspond to the following OWL expressions:

\( \begin{align} \top &: \text{owl:Thing}\\ \bot &: \text{owl:Nothing}\\ \neg &: \text{owl:complementOf}\\ \sqcup &: \text{owl:unionOf}\\ \sqcap &: \text{owl:intersectionOf}\\ \exists &: \text{owl:someValuesFrom}\\ \forall &: \text{owl:allValuesFrom}\\ \end{align} \)


\(\mathcal{ALC}\) + Inverse Roles

Naming: \(\mathcal{ALCI}\)

A role can be

  • a role name \(r\), or
  • an inverse role \(r^-\)

The semantics of inverse roles is defined by

\[ (r^-)^\mathcal{I} = \{(y,x) | (x,y) \in r^\mathcal{I} \}\]

OWL construct: owl:inverseOf

\(\mathcal{ALC}\) + Role Hierarchy

Naming: \(\mathcal{ALCH}\)

For roles \(r,s\) 

  • a role inclusion axiom (RIA) is denoted by \(r \sqsubseteq s\)
  • \(r \equiv s\) is an abbreviation for \(r \sqsubseteq s\) and \(s \sqsubseteq r\)

An interpretation \(\mathcal{I}\) satisfies \(r \sqsubseteq s\), written \(\mathcal{I} \models r \sqsubseteq s\), iff \(r^\mathcal{I} \subseteq s^\mathcal{I}\).

OWL construct: rdfs:subPropertyOf

\(\mathcal{ALC}\) + Role Transitivity

Naming: \(\mathcal{ALC}\) + Transitivity = \(\mathcal{S}\)

For a role \(r\) 

  • a transitivity axiom is denoted by \(\text{Trans}(r)\)

An interpretation \(\mathcal{I}\) satisfies \(\text{Trans}(r)\) iff, for all \(x,y,z \in \Delta^\mathcal{I}\),

\[(x,y) \in r^\mathcal{I} \wedge (y,z) \in r^\mathcal{I} \rightarrow (x,z) \in r^\mathcal{I}.\] 

OWL construct: owl:TransitiveProperty

\(\mathcal{ALC}\) + Role Functionality

Naming: \(\mathcal{ALCF}\)

For a role \(r\) 

  • a functionality axiom is denoted by \(\text{Func}(r)\)

An interpretation \(\mathcal{I}\) satisfies \(\text{Func}(r)\) iff, for all \(x,y,z \in \Delta^\mathcal{I}\),

\[(x,y) \in r^\mathcal{I} \wedge (x,z) \in r^\mathcal{I} \rightarrow y=z.\] 

OWL construct: owl:FunctionalProperty
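On a finite interpretation, the conditions for inverse roles, role inclusions, transitivity and functionality can be checked directly on the sets of pairs. A minimal sketch; the role names and the toy data are hypothetical:

```python
def inverse(r):
    """(r^-)^I = {(y, x) | (x, y) in r^I}"""
    return {(y, x) for (x, y) in r}


def satisfies_inclusion(r, s):
    """I |= r ⊑ s  iff  r^I ⊆ s^I"""
    return r <= s


def satisfies_trans(r):
    """I |= Trans(r): (x,y) in r^I and (y,z) in r^I imply (x,z) in r^I"""
    return all((x, z) in r for (x, y) in r for (y2, z) in r if y == y2)


def satisfies_func(r):
    """I |= Func(r): (x,y) in r^I and (x,z) in r^I imply y = z"""
    first = {}
    for (x, y) in r:
        if first.setdefault(x, y) != y:   # x already has a different successor
            return False
    return True


has_parent = {("bob", "ann"), ("dora", "ann"), ("ann", "carl")}
has_child = inverse(has_parent)
has_ancestor = has_parent | {("bob", "carl"), ("dora", "carl")}

print(satisfies_inclusion(has_parent, has_ancestor))              # True
print(satisfies_trans(has_ancestor), satisfies_trans(has_parent))  # True False
print(satisfies_func(has_parent), satisfies_func(has_child))       # True False
```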

Simple vs. Complex Roles

Let \(\mathcal{R}\) be a role hierarchy and let \(\sqsubseteq^{*}_{\mathcal{R}}\) be its reflexive and transitive closure.

  • A role \(r\) is complex w.r.t. \(\mathcal{R}\), if there exists a role \(s\) such that \(\text{Trans}(s) \in \mathcal{R}\) and \(s \sqsubseteq^{*}_{\mathcal{R}} r\).
  • Otherwise, the role \(r\) is simple .

Example:

\[\mathcal{R}=\{ u \sqsubseteq r , r \sqsubseteq s , s \sqsubseteq t , q \sqsubseteq t , \text{Trans}(r) \}\]

Complex: \(r,s,t\)

Simple: \(u,q\)
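Which roles are complex can be computed by following the role inclusions upwards from every transitive role; this walks exactly the reflexive and transitive closure \(\sqsubseteq^{*}_{\mathcal{R}}\). A small sketch reproducing the example (the function name is ours):

```python
from collections import deque


def complex_roles(inclusions, transitive):
    """Roles r such that Trans(s) holds for some s with s ⊑*_R r."""
    supers = {}
    for sub, sup in inclusions:
        supers.setdefault(sub, set()).add(sup)
    result = set()
    queue = deque(transitive)          # reflexivity: each transitive role is complex itself
    while queue:
        r = queue.popleft()
        if r not in result:
            result.add(r)
            queue.extend(supers.get(r, ()))   # every super-role of r is complex, too
    return result


# R = {u ⊑ r, r ⊑ s, s ⊑ t, q ⊑ t, Trans(r)}
inclusions = [("u", "r"), ("r", "s"), ("s", "t"), ("q", "t")]
roles = {"u", "r", "s", "t", "q"}
complex_ = complex_roles(inclusions, {"r"})
simple = roles - complex_
print(sorted(complex_), sorted(simple))  # ['r', 's', 't'] ['q', 'u']
```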

\(\mathcal{ALC}\) + Unqualified Number Restrictions

Naming: \(\mathcal{ALCN}\)

For a simple role \(r\) and a natural number \(n\), number restrictions \(\geq n r, \leq n r, = n r\) are concepts whose semantics is defined as

\(\begin{align} (\geq n r)^\mathcal{I} &= \{x \in \Delta^\mathcal{I} | \#\{y \in \Delta^\mathcal{I} | (x,y) \in r^\mathcal{I} \} \geq n \}\\ (\leq n r)^\mathcal{I} &= \{x \in \Delta^\mathcal{I} | \#\{y \in \Delta^\mathcal{I} | (x,y) \in r^\mathcal{I} \} \leq n \}\\ (= n r)^\mathcal{I} &= \{x \in \Delta^\mathcal{I} | \#\{y \in \Delta^\mathcal{I} | (x,y) \in r^\mathcal{I} \} = n \}\\ \end{align}\)

OWL constructs:

\( \begin{align} \geq n r &= \text{owl:minCardinality}\\ \leq n r &= \text{owl:maxCardinality}\\ = n r &= \text{owl:cardinality} \end{align} \)

\(\mathcal{ALC}\) + Nominals

Naming: \(\mathcal{ALCO}\)

Let \(a_1, \ldots, a_n\) be individuals. A nominal \(\{a_1, \ldots, a_n\}\) is a concept whose semantics is defined as

\( (\{a_1, \ldots, a_n\})^\mathcal{I}=\{a_1^\mathcal{I}, \ldots, a_n^\mathcal{I}\} \)

OWL construct: owl:oneOf

Logical Reasoning

Deductive Reasoning

  • starts with the assertion of a general rule and proceeds from there to a guaranteed specific conclusion
  • "from the general rule to the specific application"   

Inductive Reasoning

  • begins with observations that are specific and limited in scope, and proceeds to a generalized conclusion that is likely, but not certain, in light of accumulated evidence
  • "from the specific to the general"

 Abductive Reasoning

  • begins with an incomplete set of observations and proceeds to the likeliest possible explanation for the set   

Logical Reasoning in Description Logics (1)

 

Logical Reasoning in Description Logics (2)

Let \(\mathcal{I}\) be an interpretation, \(\mathcal{T}\) a TBox, \(\mathcal{A}\) an ABox and \(\mathcal{K}=(\mathcal{T},\mathcal{A})\) a knowledge base. We say that

  • \(\mathcal{I}\) is a model for \(\mathcal{T}\), iff \(\mathcal{I} \models \alpha\) for every axiom \(\alpha \in \mathcal{T}\), written \(\mathcal{I} \models \mathcal{T}\). 
  • \(\mathcal{I}\) is a model for \(\mathcal{A}\), iff \(\mathcal{I} \models \alpha\) for every axiom \(\alpha \in \mathcal{A}\), written \(\mathcal{I} \models \mathcal{A}\). 
  • \(\mathcal{I}\) is a model for \(\mathcal{K}\), iff \(\mathcal{I} \models \mathcal{T}\) and \(\mathcal{I} \models \mathcal{A}\).
  • An axiom \(\alpha\) is entailed by \(\mathcal{K}\), written \(\mathcal{K} \models \alpha\), iff every model \(\mathcal{I}\) of \(\mathcal{K}\) is a model for \(\alpha\).

Reasoning Services (1)

 

Concept Satisfiability

\[ \mathcal{K} \not \models C \equiv \bot \]

The problem of checking whether \(C\) is satisfiable w.r.t. \( \mathcal{K} \), i.e. whether there exists a model \(\mathcal{I}\) of \( \mathcal{K} \)  such that \(C^{\mathcal{I}} \neq \emptyset \).

Subsumption

\[ \mathcal{K} \models C \sqsubseteq D\]

The problem of checking whether \(C\) is subsumed by \(D\) w.r.t. \(\mathcal{K}\), i.e. whether \(C^{\mathcal{I}} \subseteq D ^{\mathcal{I}}  \) in every model \(\mathcal{I}\) of  \(\mathcal{K}\) .

Satisfiability (Consistency)

\[ \mathcal{K} \not \models \top \sqsubseteq \bot\]

The problem of checking whether \(\mathcal{K}\) is consistent, i.e. whether it has a model.

Reasoning Services (2)

Instance Checking

\[ \mathcal{K} \models C(a)\]

The problem of checking whether the assertion \(C(a)\) is entailed by \(\mathcal{K}\), i.e. whether \(a^{\mathcal{I}} \in C^{\mathcal{I}}\) in every model \(\mathcal{I}\) of \(\mathcal{K}\).

Retrieval

\[\{a | \mathcal{K} \models C(a)\}\]

The problem of finding all individuals \(a\) which belong to concept \(C\) w.r.t. \(\mathcal{K}\), i.e. find all \(a\) for a given \(C\) such that \(a^{\mathcal{I}}\in C^{\mathcal{I}}\) in every model \(\mathcal{I}\) of \(\mathcal{K}\).

Realization

\[\{C | \mathcal{K} \models C(a)\}\]

The problem of finding all named classes \(C\) which an individual \(a\) belongs to w.r.t. \(\mathcal{K}\), i.e. find all \(C\) for a given \(a\) such that \(a^{\mathcal{I}}\in C^{\mathcal{I}}\) in every model \(\mathcal{I}\) of \(\mathcal{K}\).

Reasoning Services (3)

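All reasoning services can be reduced to checking the satisfiability (consistency) of a knowledge base:

Concept Satisfiability

\(\mathcal{K} \not \models C \equiv \bot \longleftrightarrow \mathcal{K} \cup \{C(a)\}\) is satisfiable, for an individual \(a\) not occurring in \(\mathcal{K}\).

Subsumption

\(\mathcal{K} \models C \sqsubseteq D \longleftrightarrow \mathcal{K} \cup \{(C \sqcap \neg D)(a)\}\) is unsatisfiable, for an individual \(a\) not occurring in \(\mathcal{K}\).

Instance Check

\(\mathcal{K} \models C(a) \longleftrightarrow \mathcal{K} \cup \{\neg C(a)\}\) is unsatisfiable.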

Tableau Algorithm

How can we prove the satisfiability of a concept?

(Remember: A concept is satisfiable if there exists a model \(\mathcal{I}\) satisfying it.) 

We need a constructive decision procedure for constructing models.

\(\longrightarrow\)  Tableau Algorithm  

Proof procedure:

  • transform a given concept into Negation Normal Form (NNF)
  • apply completion rules in arbitrary order as long as possible
  • the concept is satisfiable if, and only if, a clash-free tableau can be derived to which no completion rule is applicable

Sample Ontology for Tableau Algorithm

TBox \(\mathcal{T}\) :

\( \begin{align} \text{Man} &\equiv \neg \text{Woman} \sqcap \text{Person}\\ \text{Woman} &\sqsubseteq \text{Person}\\ \text{Mother} &\equiv \text{Woman} \sqcap \exists \text{hasChild}.\top \\ \end{align} \)

ABox \(\mathcal{A}\) :

\( \begin{align} \text{Man(STEPHEN)}\\ \neg\text{Man(MONICA)}\\ \text{Woman(JESSICA)}\\ \text{hasChild(STEPHEN, JESSICA)}\\ \end{align} \)

Let \(\mathcal{I}\) be an interpretation with:

\( \begin{align} \text{Man}^\mathcal{I}&=\{STEPHEN\}\\ \text{Woman}^\mathcal{I}&=\{JESSICA, MONICA\}\\ \text{Mother}^\mathcal{I}&=\{MONICA\}\\ \text{Person}^\mathcal{I}&=\{JESSICA, MONICA, STEPHEN\}\\ \text{hasChild}^\mathcal{I}&=\{\langle MONICA, STEPHEN \rangle, \langle STEPHEN, JESSICA \rangle\}\\ \end{align} \)

then it holds that

\( \mathcal{I} \models \mathcal{T} \text{ and } \mathcal{I}\models \mathcal{A} \)
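The claim \(\mathcal{I} \models \mathcal{T}\) and \(\mathcal{I} \models \mathcal{A}\) can be verified mechanically by computing each concept's extension with set operations. A sketch, assuming (as the slide implicitly does) that every individual name is interpreted as itself:

```python
domain = {"STEPHEN", "MONICA", "JESSICA"}
Man = {"STEPHEN"}
Woman = {"JESSICA", "MONICA"}
Mother = {"MONICA"}
Person = {"JESSICA", "MONICA", "STEPHEN"}
hasChild = {("MONICA", "STEPHEN"), ("STEPHEN", "JESSICA")}


def neg(c):
    """(¬C)^I = Δ^I minus C^I"""
    return domain - c


def exists(r, c):
    """(∃r.C)^I = {x | there is (x, y) in r^I with y in C^I}"""
    return {x for (x, y) in r if y in c}


tbox_ok = (
    Man == (neg(Woman) & Person)                       # Man ≡ ¬Woman ⊓ Person
    and Woman <= Person                                # Woman ⊑ Person
    and Mother == (Woman & exists(hasChild, domain))   # Mother ≡ Woman ⊓ ∃hasChild.⊤
)
abox_ok = (
    "STEPHEN" in Man                                   # Man(STEPHEN)
    and "MONICA" in neg(Man)                           # ¬Man(MONICA)
    and "JESSICA" in Woman                             # Woman(JESSICA)
    and ("STEPHEN", "JESSICA") in hasChild             # hasChild(STEPHEN, JESSICA)
)
print(tbox_ok, abox_ok)  # True True
```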

Negation Normal Form

A concept is in Negation Normal Form (NNF) if all occurrences of negations in it are in front of atomic concepts.

Every \(\mathcal{ALC}\) concept can be transformed into an equivalent one in NNF using the following rules:

\[ \begin{aligned} NNF(C) &= C, \text{ if } C \text{ is atomic }\\ NNF(\neg C) &= \neg C, \text{ if } C \text{ is atomic}\\ NNF(\neg \neg C) &= NNF(C) \\ NNF(C \sqcup D) &= NNF(C) \sqcup NNF(D) \\ NNF(C \sqcap D) &= NNF(C) \sqcap NNF(D) \\ NNF(\neg(C \sqcup D)) &= NNF(\neg C) \sqcap NNF(\neg D) \\ NNF(\neg(C \sqcap D)) &= NNF(\neg C) \sqcup NNF(\neg D) \\ NNF(\forall R.C) &= \forall R.NNF(C) \\ NNF(\exists R.C) &= \exists R.NNF(C) \\ NNF(\neg \forall R.C) &= \exists R.NNF(\neg C) \\ NNF(\neg \exists R.C) &= \forall R.NNF(\neg C) \\ \end{aligned} \]

Negation Normal Form - Example

Transform the concept

\[\neg (\neg (A \sqcup \neg B) \sqcap \neg C)\]

to an equivalent concept in negation normal form:

\[ \begin{aligned} &NNF(\neg (\neg (A \sqcup \neg B) \sqcap \neg C))\\ &= NNF(\neg \neg (A \sqcup \neg B)) \sqcup NNF(\neg \neg C)\\ &= NNF(A \sqcup \neg B) \sqcup NNF(C)\\ &= NNF(A \sqcup \neg B) \sqcup C\\ &= NNF(A) \sqcup NNF(\neg B) \sqcup C\\ &= A \sqcup \neg B \sqcup C\\ \end{aligned} \]
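The NNF rules translate one-to-one into a recursive function. A sketch using a hypothetical tuple encoding of concepts (concept names as strings, ("not", C), ("and", C, D), ("or", C, D), ("some", r, C), ("all", r, C)); like the rules above, it treats ⊤ and ⊥ as atomic and leaves a negation in front of them untouched:

```python
def nnf(c):
    """Push negations inwards until they only occur in front of atomic concepts."""
    if isinstance(c, str) or c[0] in ("top", "bot"):
        return c                                   # NNF(C) = C for atomic C
    op = c[0]
    if op in ("and", "or"):
        return (op, nnf(c[1]), nnf(c[2]))
    if op in ("some", "all"):
        return (op, c[1], nnf(c[2]))
    d = c[1]                                       # op == "not": inspect the negated concept
    if isinstance(d, str) or d[0] in ("top", "bot"):
        return c                                   # NNF(¬C) = ¬C for atomic C
    if d[0] == "not":
        return nnf(d[1])                           # NNF(¬¬C) = NNF(C)
    if d[0] == "or":                               # De Morgan
        return ("and", nnf(("not", d[1])), nnf(("not", d[2])))
    if d[0] == "and":
        return ("or", nnf(("not", d[1])), nnf(("not", d[2])))
    if d[0] == "all":                              # ¬∀R.C becomes ∃R.¬C
        return ("some", d[1], nnf(("not", d[2])))
    if d[0] == "some":                             # ¬∃R.C becomes ∀R.¬C
        return ("all", d[1], nnf(("not", d[2])))
    raise ValueError(f"unknown constructor: {d[0]!r}")


# ¬(¬(A ⊔ ¬B) ⊓ ¬C)
example = ("not", ("and", ("not", ("or", "A", ("not", "B"))), ("not", "C")))
print(nnf(example))  # ('or', ('or', 'A', ('not', 'B')), 'C'), i.e. A ⊔ ¬B ⊔ C
```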

Tableau algorithm for \(\mathcal{ALC}\) concept satisfiability

A tableau (completion graph) for an \(\mathcal{ALC}\) concept is a labeled directed graph \(G=\langle V,E,L\rangle\), where

  • each node \(x \in V\) is labeled with a set \(L(x)\) of concepts, and
  • each edge \(\langle x,y \rangle \in E\) is labeled with a set \(L(\langle x,y \rangle\)) of roles.

A completion graph \(G\)

  • contains a clash, if \(\{A,\neg A\} \subseteq L(x)\) for some atomic concept \(A\), or \(\bot \in L(x)\), or \(\neg \top \in L(x)\), for some node \(x\)
  • is complete, if no completion rule can be applied to it.

Completion Rules for \(\mathcal{ALC}\) concept satisfiability

\(\sqcap\)-rule if \( C \sqcap D \in L(v), \text{ for some } v \in V \text{ and } \{C, D\} \not \subseteq L(v) \)
then \(L(v):=L(v) \cup \{C, D\} \)
\(\sqcup\)-rule if \( C \sqcup D \in L(v), \text{ for some } v \in V \text{ and } \{C, D\} \cap L(v) = \emptyset \)
then \( \text{choose } X \in \{C, D\} \text{ and let } L(v) := L(v) \cup \{X\} \)
\(\exists\)-rule if \( \exists r.C \in L(v), \text{ for some } v \in V, \text{ and there is no } r\text{-successor } \) \( v' \text{ of } v \text{ such that } C \in L(v') \)
then \(V:= V \cup \{v'\} , E:=E \cup \{ \langle v,v' \rangle \}, L(v'):=\{C\}\) \( \text{ and } L(\langle v,v' \rangle):=\{r\}\) \(\text{ for a new vertex } v'\)
\(\forall\)-rule if \( v,v' \in V, v' \text{ is } r\text{-successor of } v, \forall r.C \in L(v) \text{ and } C \not \in L(v') \)
then \( L(v'):=L(v') \cup \{C\} \)
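The rules above can be sketched as a compact recursive procedure for concept satisfiability without a TBox. It is a common depth-first variant rather than a literal graph implementation: each node's label is first saturated with the \(\sqcap\)- and \(\sqcup\)-rules (backtracking over the \(\sqcup\) choices), and the \(\forall\)-rule is applied at the moment an \(\exists\)-successor is created. Because successors do not interact in \(\mathcal{ALC}\) without a TBox, they are checked independently. Input concepts must already be in NNF; the tuple encoding and function names are hypothetical.

```python
def is_op(c, op):
    return isinstance(c, tuple) and c[0] == op


def has_clash(label):
    """{A, ¬A} ⊆ L(x), ⊥ in L(x), or ¬⊤ in L(x)."""
    if ("bot",) in label or ("not", ("top",)) in label:
        return True
    return any(isinstance(c, str) and ("not", c) in label for c in label)


def satisfiable(label):
    """Decide satisfiability of a set of NNF concepts (the label of one node)."""
    label = set(label)
    changed = True
    while changed:                                   # ⊓-rule until nothing changes
        changed = False
        for c in list(label):
            if is_op(c, "and") and not {c[1], c[2]} <= label:
                label |= {c[1], c[2]}
                changed = True
    if has_clash(label):
        return False
    for c in label:                                  # ⊔-rule: try C, on failure try D
        if is_op(c, "or") and c[1] not in label and c[2] not in label:
            return satisfiable(label | {c[1]}) or satisfiable(label | {c[2]})
    for c in label:                                  # ∃-rule; ∀-rule applied to the new node
        if is_op(c, "some"):
            role, filler = c[1], c[2]
            successor = {filler} | {d[2] for d in label
                                    if is_op(d, "all") and d[1] == role}
            if not satisfiable(successor):
                return False
    return True


ex1 = ("or", ("and", "A", ("not", "A")), "B")                                # (A ⊓ ¬A) ⊔ B
ex2 = ("and", ("and", "A", ("some", "r", "B")), ("all", "r", ("not", "B")))  # A ⊓ ∃r.B ⊓ ∀r.¬B
print(satisfiable({ex1}), satisfiable({ex2}))  # True False
```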

Tableau algorithm for \(\mathcal{ALC}\) concept satisfiability - Example 1

We check whether \(C=(A \sqcap \neg A) \sqcup B\) is satisfiable. It is in NNF, so we can directly apply the tableau algorithm to

\[(A \sqcap \neg A) \sqcup B.\]

The only rule applicable is \(\sqcup-\text{rule}\). We have two possibilities. Firstly, we can try

\[L(x)=\{C, A \sqcap \neg A\}.\]

Then we can apply \(\sqcap-\text{rule}\) and obtain

\[L(x)=\{C, A \sqcap \neg A, A, \neg A\}.\]

We have obtained a clash, thus this choice was unsuccessful. Secondly, we can try

\[L(x)=\{C, B\}.\]

No further rule is applicable and there is no clash. Thus, \((A \sqcap \neg A) \sqcup B\) is satisfiable.

A model \(\mathcal{I}\) satisfying it is given by

\[\Delta^\mathcal{I}=\{x\}, A^\mathcal{I}=\emptyset, B^\mathcal{I}=\{x\}.\]

Tableau algorithm for \(\mathcal{ALC}\) concept satisfiability - Example 2

We check whether \(C=A \sqcap \exists r.B \sqcap \forall r.\neg B\) is satisfiable. It is in NNF, so we can directly apply the tableau algorithm to

\[C=A \sqcap \exists r.B \sqcap \forall r.\neg B.\]

An application of \(\sqcap-\text{rule}\) gives

\[L(x)=\{C, A, \exists r.B, \forall r.\neg B\}.\]

An application of \(\exists-\text{rule}\) gives

\(\begin{align*} L(x)&=\{C, A, \exists r.B, \forall r.\neg B\}\\ L(y)&=\{B\}\\ L(\langle x,y \rangle)&=\{r\} \end{align*} \)

An application of \(\forall-\text{rule}\) gives

\(\begin{aligned} L(x)&=\{C, A, \exists r.B, \forall r.\neg B\}\\ L(y)&=\{B, \neg B\}\\ L(\langle x,y \rangle)&=\{r\} \end{aligned} \)

We obtained a clash and no other choices are possible. Thus, \(A \sqcap \exists r.B \sqcap \forall r.\neg B\) is unsatisfiable and there exists no model.

Tableau algorithm for \(\mathcal{ALC}\) TBoxes

We extend the tableau algorithm to check satisfiability of \(\mathcal{ALC}\) TBoxes
  • An \(\mathcal{ALC}\) TBox contains only axioms (GCIs) of form \(C \sqsubseteq D\) (Note that axioms of form \(C \equiv D\) can be rewritten as \(C \sqsubseteq D\) and \(D \sqsubseteq C\))
  • Every GCI \(C \sqsubseteq D\) is equivalent to \(\top \sqsubseteq \neg C \sqcup D\)

We can internalize the whole TBox into a single axiom:
\[\mathcal{T}=\{C_i \sqsubseteq D_i | 1 \leq i \leq n\}\]

is equivalent to
\[ \top \sqsubseteq \underset{1 \leq i \leq n}{{\LARGE\sqcap}} (\neg C_i \sqcup D_i) \]

Let \(C_\mathcal{T}\) be (the NNF of) the concept on the right-hand side of this GCI; the tableau algorithm then gets one additional rule:

\(\mathcal{T}\)-rule if \( C_\mathcal{T} \,\,\not \in L(v), \text{ for some } v \in V\)
then \( L(v):=L(v) \cup \{C_\mathcal{T}\} \)

Tableau algorithm for \(\mathcal{ALC}\) TBoxes - Example

Suppose we have a TBox \(\mathcal{T}=\{A \sqsubseteq \exists r.A\}\) and want to check whether the concept \(A\) is satisfiable.

We start with

\[L(x)=\{A\}\]

The only rule applicable is \(\mathcal{T}-\text{rule}\), thus we obtain

\[L(x)=\{A, \neg A \sqcup \exists r.A\}\]

After applying the \(\sqcup-\text{rule}\), the first choice leads to a clash, thus we use the second part of the disjunction and get

\[L(x)=\{A, \neg A \sqcup \exists r.A, \exists r.A\}\]

We can apply the \(\exists-\text{rule}\) and get

\( \begin{align*} L(x)&=\{A, \neg A \sqcup \exists r.A, \exists r.A\}\\ L(y)&=\{A\} \end{align*} \)

At this point, the same steps apply to \(y\) and create yet another \(r\)-successor, and so on, so the algorithm would never terminate.

Solution: We need to discover cycles \(\Longrightarrow\) Blocking

Blocking for \(\mathcal{ALC}\)

Goal: Ensure termination of the tableau algorithm.
Solution: Detect cycles that might occur due to application of the \(\mathcal{T}-\text{rule}\).
Result: Completion graph is always finite.
 
 
 
 

Blocking:

A node \(v' \in V\) is directly blocked by a node \(v \in V\), if

  1. \(v\) is an ancestor of \(v'\)
  2. \(L(v') \subseteq L(v)\)
  3. there is no directly blocked node \(v''\) such that \(v''\) is an ancestor of \(v\)
     

A node \(v'\) is blocked, if either

  1. \(v'\) is directly blocked, or
  2. there is a directly blocked node \(v\) which is an ancestor of \(v'\)

Tableau algorithm for \(\mathcal{ALC}\) TBoxes With Blocking - Example

Suppose we have a TBox \(\mathcal{T}=\{A \sqsubseteq \exists r.A\}\) and want to check whether the concept \(A\) is satisfiable.

We obtain the clash-free tableau

\( \begin{align*} L(x)&=\{A, \neg A \sqcup \exists r.A, \exists r.A\}\\ L(y)&=\{A, \neg A \sqcup \exists r.A, \exists r.A\} \end{align*} \)

wherein \(y\) is directly blocked by \(x\).

We can get a finite model by taking into account that

  • blocked nodes do not represent elements in the model
  • an edge from a node \(v\) to a directly blocked node \(v'\) is represented in the model as "edge" from \(v\) to the node which directly blocks \(v'\) 

For our example we get

\[ \Delta^\mathcal{I}=\{x\}, A^\mathcal{I}=\{x\}, r^\mathcal{I}=\{\langle x,x \rangle\} \]
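The \(\mathcal{T}\)-rule and blocking can be added to a depth-first tableau sketch (self-contained again). Assumptions of this sketch: the TBox is passed as a list of NNF concepts \(\neg C_i \sqcup D_i\) that are added to every node, which has the same effect as adding \(C_\mathcal{T}\) and applying the \(\sqcap\)-rule; a node is tested for blocking only after its label is saturated; blocked nodes get no successors. Function names and the tuple encoding are hypothetical.

```python
def is_op(c, op):
    return isinstance(c, tuple) and c[0] == op


def has_clash(label):
    if ("bot",) in label or ("not", ("top",)) in label:
        return True
    return any(isinstance(c, str) and ("not", c) in label for c in label)


def saturate(label, tbox):
    """Apply the T-rule (add every TBox concept) and the ⊓-rule until nothing changes."""
    label = set(label) | set(tbox)
    todo = list(label)
    while todo:
        c = todo.pop()
        if is_op(c, "and"):
            for part in (c[1], c[2]):
                if part not in label:
                    label.add(part)
                    todo.append(part)
    return label


def satisfiable(label, tbox, ancestors=()):
    label = saturate(label, tbox)
    if has_clash(label):
        return False
    for c in label:                                  # ⊔-rule with backtracking
        if is_op(c, "or") and c[1] not in label and c[2] not in label:
            return (satisfiable(label | {c[1]}, tbox, ancestors)
                    or satisfiable(label | {c[2]}, tbox, ancestors))
    if any(label <= anc for anc in ancestors):       # directly blocked: L(v') ⊆ L(v)
        return True
    path = ancestors + (frozenset(label),)
    for c in label:                                  # ∃-rule; ∀-rule applied to the new node
        if is_op(c, "some"):
            successor = {c[2]} | {d[2] for d in label if is_op(d, "all") and d[1] == c[1]}
            if not satisfiable(successor, tbox, path):
                return False
    return True


tbox = [("or", ("not", "A"), ("some", "r", "A"))]    # A ⊑ ∃r.A, written as ¬A ⊔ ∃r.A
print(satisfiable({"A"}, tbox))                                        # True (terminates)
print(satisfiable({("and", "A", ("all", "r", ("not", "A")))}, tbox))  # False
```

Without the blocking test, the first call would keep creating successors labeled \(\{A, \neg A \sqcup \exists r.A, \exists r.A\}\), exactly as in the example before blocking was introduced.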

Tableau algorithm for \(\mathcal{ALC}\) TBoxes With Blocking - (In)finite Models

The TBox \(\{\mathit{Guard}\sqsubseteq \exists \mathit{shields}.\mathit{Guard}\}\) has a finite model (see previous slides).

What about the following TBox?

\[ \begin{align*}\mathit{Guard}&\sqsubseteq \exists \mathit{shields}.\mathit{Guard} \sqcap \leq 1 \mathit{shields}^-\\ \mathit{FirstGuard}&\sqsubseteq \mathit{Guard} \sqcap \leq 0 \mathit{shields}^- \end{align*} \]

The existence of a FirstGuard forces the existence of an infinite sequence of guards, each one shielding the next.

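I.e. there are TBoxes (here using inverse roles and number restrictions) w.r.t. which a concept such as FirstGuard is satisfiable, but only in infinite models.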

Complexity of DL Reasoning

http://www.cs.man.ac.uk/~ezolin/dl/: configure your DL and learn how complex standard reasoning tasks are

Summary

  • Semantics of OWL is based on Description Logics
  • Description Logics are decidable fragments of First Order Logic
  • Deductive reasoning in OWL is possible
  • The tableau algorithm is one of the most common reasoning procedures for OWL

TA-Description Logic

Translate the following statement into Description Logic:

a) SubClassOf(Prof Union(intersectionOf(Person University member) intersectionOf(Person Not(Doktorand))))

b) Prove the following query using the tableau algorithm:

SubClassOf(Professor Person)

TA-Propositional Logic

Tableaux Algorithm (Propositional Logic)

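Example 1: prove the following statement: ((q ∧ r) → (¬q ∨ r))

(1) negation: ¬((q ∧ r) → (¬q ∨ r))

(2) q ∧ r   (from 1)

(3) ¬(¬q ∨ r) = q ∧ ¬r   (from 1)

(4) q   (from 2)

(5) r   (from 2)

(6) q   (from 3)

(7) ¬r   (from 3)

(5) and (7) clash, so the negation is unsatisfiable and the statement is valid.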
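The propositional tableau method fits in a few lines: a formula is valid iff every branch of the tableau for its negation closes. A sketch with a hypothetical tuple encoding of formulas (("not", F), ("and", F, G), ("or", F, G), ("imp", F, G), atoms as strings), expanding one compound formula at a time; it checks, for example, \((q \wedge r) \rightarrow (\neg q \vee r)\):

```python
def closes(formulas):
    """True iff every branch of the tableau for this list of formulas closes."""
    for i, f in enumerate(formulas):
        rest = formulas[:i] + formulas[i + 1:]
        if isinstance(f, str) or (f[0] == "not" and isinstance(f[1], str)):
            continue                                   # literal: nothing to expand
        op = f[0]
        if op == "and":                                # F ∧ G: add both to the branch
            return closes(rest + [f[1], f[2]])
        if op == "or":                                 # F ∨ G: split into two branches
            return closes(rest + [f[1]]) and closes(rest + [f[2]])
        if op == "imp":                                # F → G: branches ¬F | G
            return closes(rest + [("not", f[1])]) and closes(rest + [f[2]])
        g = f[1]                                       # op == "not" with a compound g
        if g[0] == "not":
            return closes(rest + [g[1]])
        if g[0] == "and":
            return closes(rest + [("not", g[1])]) and closes(rest + [("not", g[2])])
        if g[0] == "or":
            return closes(rest + [("not", g[1]), ("not", g[2])])
        if g[0] == "imp":
            return closes(rest + [g[1], ("not", g[2])])
    # only literals left: the branch closes iff it contains some p and ¬p
    return any(("not", f) in formulas for f in formulas if isinstance(f, str))


def valid(formula):
    """A formula is valid iff the tableau for its negation closes."""
    return closes([("not", formula)])


print(valid(("imp", ("and", "q", "r"), ("or", ("not", "q"), "r"))))  # True
print(valid(("imp", ("or", "q", "r"), ("or", ("not", "q"), "r"))))   # False
```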

TA-Description Logic

Given the following knowledge base, prove the example query:

knowledge Base W: {¬P (E U) (E ∩¬D), (P ∩ ¬E)(a) Proof the following query:           (P ∩ ¬E)(a) (from knowledge base)

   P (a)

 ¬E (a)

  (¬   P (E U) (E ∩¬D)) (a) (from knowledge base)

            ¬P (a)   |  ((EU) ((E ∩ ¬D))(a)

             (E U) (a) | (E ∩¬D) (a)

              E (a)              

              E (a)

             U (a)

            ¬D (a)

Required material

Learning Objectives

  • Rule languages in the Semantic Web
  • Relationship between OWL and rules

The limits of OWL

Description logic concepts are insufficient as a query language:
  •  “Which pairs of individuals have a common parent?” 
  •  “Which people live with one of their parents?” 
  •  “Which pairs of (direct or indirect) descendants are there?” 
Relevant information cannot always be represented in an OWL ontology:
  • \(\forall x . \forall y . \forall z . \, \mathsf{brother} (y, z) \wedge \mathsf{father} (x, y) \to \mathsf{uncle} ( x, z) \) (works in OWL 2, with tricks)
  • \(\forall x. \, \mathsf{love} (x, x) \to \mathsf{narcissist} (x) \) (works in OWL 2)
OWL is unsuitable for programming:
  • OWL is decidable, so it cannot express every computation a program can perform (cf. the halting problem).
  • OWL is declarative, not procedural (it is not “executed”): certain (built-in) extensions are difficult to implement.

1/4: Logical rules

  • Implications in predicate logic
  • For example: \[F\to G \;\;\; (\equiv\;\neg F \vee G)\]
  • Logical extension of the knowledge base → static  
  • Open World
  • Declarative (descriptive)

2/4: Procedural Rules

  • e.g. Production Rules
  • "If X then Y else Z"
  • Machine-executable instructions  → dynamic
  • Operational (meaning = effect in execution)

3/4: Logic programming

  • e.g. Prolog, F-logic
  • man(X) <- person(X) AND NOT woman(X)
    
  • Approximation of logical semantics with operational aspects, built-ins possible
  • often closed world
  • “Semi-declarative” 

4/4: Inference rules of a calculus

  • e.g. rules for RDF semantics
  • rules not part of the knowledge base, “meta-rules” 
  • not a subject of this lecture

Which rule language to choose?

Rule languages are hardly compatible with each other!
→ choice of appropriate rule language is important

Possible criteria:
  • Clear specification of syntax and semantics?
  • Software tool support?
  • What expressivity do I need?
  • Complexity of the implementation? Performance?
  • Compatibility with existing languages such as OWL?
  • Declarative (description) or operational (programming) semantics?
  • ...

Summary of different rule language approaches

Logical rules (implications in predicate logic):
  • clearly defined, comprehensively researched, well-understood
  • highly compatible with OWL DL and RDF
  • undecidable without restrictions
Procedural rules (e.g. production rules):
  • many independent approaches, often only vaguely defined
  • Often used like programming languages, unclear relationship to OWL and RDF
  • efficient processing possible
Logic programming (e.g. Prolog, F-logic):
  • clearly defined, but many different approaches
  • partially compatible with OWL and RDF
  • Decidability and computational complexity depend strongly on the selected approach
Main topic of this lecture: predicate logic rules
(which are also the basis of logic programming)

Predicate logic as a rule language

  • Rules as implication formulas of predicate logic: \[\underbrace{A_1 \wedge A_2\wedge \ldots\wedge A_n}_{\textrm{Body}} \to \underbrace{H_{}}_{\mathrm{Head}}\] → Semantically equivalent to disjunction: \[ H\vee \neg A_1 \vee\neg A_2\vee \ldots\vee\neg A_n\]
  • Constants, variables, and function symbols allowed
  • Quantifiers for variables are often omitted:
    Understood as universally quantified variables (i.e. rule applies to all assignments)
  • Disjunction with several non-negated atoms
    → disjunctive rule: \[ \underbrace{A_1 \wedge A_2\wedge \ldots\wedge A_n}_{\textrm{Body}} \to \underbrace{H_1 \vee H_2 \vee \ldots\vee H_m}_{\mathrm{Head}}\]

Types of rules

Types of “rules” of predicate logic:
  • Clause: disjunction of atomic propositions or negated atomic propositions
  • Horn clause: clause with at most one non-negated atom
  • Definite clause: clause with exactly one non-negated atom
  • Fact: clause of a single non-negated atom
Examples:
  • Clause: \[\mathsf{Person}(x) \;\to\;\mathsf{Woman}(x) \vee \mathsf{Man}(x)\]
  • Definite clause: \[\mathsf{Man}(x) \wedge \mathsf{hasChild}(x,y) \;\to\;\mathsf{Father}(x)\]
  • Function symbol: \[\mathsf{hasBrother}(\mathsf{mother}(x),y) \;\to\;\mathsf{hasUncle}(x,y)\]
  • Horn clause (integrity constraint): \[\mathsf{Man}(x) \wedge \mathsf{Woman}(x) \;\to\;\]
    Think “\( \neg (\mathsf{Man}(x) \wedge \mathsf{Woman}(x)) \)”
  • Fact: \[\mathsf{Woman}(\mathsf{gisela})\]
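Since a rule \(A_1 \wedge \ldots \wedge A_n \to H_1 \vee \ldots \vee H_m\) is the clause whose negated atoms are the body atoms and whose non-negated atoms are the head atoms, the clause type depends only on the number of head atoms (and, for facts, on an empty body). A small sketch with atoms as plain strings (the function name is ours):

```python
def clause_types(body, head):
    """Classify A_1 ∧ ... ∧ A_n → H_1 ∨ ... ∨ H_m by its non-negated atoms (the head)."""
    types = ["clause"]
    if len(head) <= 1:
        types.append("Horn clause")          # at most one non-negated atom
    if len(head) == 1:
        types.append("definite clause")      # exactly one non-negated atom
        if not body:
            types.append("fact")             # nothing but a single non-negated atom
    return types


print(clause_types(["Person(x)"], ["Woman(x)", "Man(x)"]))       # ['clause']
print(clause_types(["Man(x)", "hasChild(x,y)"], ["Father(x)"]))  # ['clause', 'Horn clause', 'definite clause']
print(clause_types(["Man(x)", "Woman(x)"], []))                  # ['clause', 'Horn clause']
print(clause_types([], ["Woman(gisela)"]))                       # all four types
```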

Datalog

Restriction to Horn rules without function symbols

→ Datalog rules

Datalog

  • logical rule language, originally basis for deductive databases
  • Knowledge bases (“Datalog Programs”) of Horn clauses without function symbols
  • decidable
  • efficient for large amounts of data, overall complexity same as OWL Lite profile of OWL 1 (EXPTIME)

Semantics of rules

Standard semantics of predicate logic!
  • well known and well understood semantics
  • compatible with other predicate logic approaches (e.g. description logic)

Semantics of Datalog

Semantics defined by using logical models:
  • An interpretation \(\mathcal{I}\) with domain \(\Delta^{\mathcal{I}}\)
  • Evaluation of variables: a variable assignment \(\mathcal{Z}\) (mapping variables to \(\Delta^{\mathcal{I}}\))
  • Interpretation of terms and formulas under \(\mathcal{I}\) (and \(\mathcal{Z}\)):
    • Interpretation of a constant: \(a^{\mathcal{I},\mathcal{Z}} = a^{\mathcal{I}} \in \Delta^{\mathcal{I}}\)
    • Interpretation of a variable: \(x^{\mathcal{I},\mathcal{Z}} = \mathcal{Z}(x) \in \Delta^{\mathcal{I}}\)
    • Interpretation of an n-ary predicate: \(p^{\mathcal{I}} \subseteq (\Delta^{\mathcal{I}})^n\)
    • \(\mathcal{I}, \mathcal{Z} \models p(t_1, \ldots, t_n)\) if and only if \((t_1^{\mathcal{I},\mathcal{Z}}, \ldots, t_n^{\mathcal{I},\mathcal{Z}}) \in p^{\mathcal{I}}\)
    • \(\mathcal{I} \models B \to H\) iff for every variable assignment \(\mathcal{Z}\), either \(\mathcal{I}, \mathcal{Z} \models H\) or \(\mathcal{I}, \mathcal{Z} \not\models B\).
  • \(\mathcal{I}\) is a model for a rule set if and only if \(\mathcal{I} \models B \to H\) for all rules \(B \to H\) in this set 
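For Datalog rules with exactly one head atom, every rule set together with its facts has a least model, and the ground facts entailed are exactly those true in it. That model can be computed bottom-up by applying all rules until nothing new is derived. A naive sketch (no semi-naive optimisation), assuming safe rules (every head variable occurs in the body) and a hypothetical convention that variables start with `?`:

```python
def match(atom, fact, subst):
    """Extend the variable assignment `subst` so that `atom` matches the ground `fact`."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    result = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):                       # variable
            if result.setdefault(term, value) != value:
                return None
        elif term != value:                            # constants must match exactly
            return None
    return result


def least_model(facts, rules):
    """Naive bottom-up evaluation: apply all rules until no new fact is derived."""
    model = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            assignments = [{}]
            for atom in body:                          # join the body atoms one by one
                assignments = [s2 for s in assignments for fact in model
                               for s2 in [match(atom, fact, s)] if s2 is not None]
            for s in assignments:
                derived.add(tuple(s.get(t, t) for t in head))
        if derived <= model:
            return model
        model |= derived


facts = {("parent", "ann", "bob"), ("parent", "bob", "carl")}
rules = [
    (("ancestor", "?x", "?y"), [("parent", "?x", "?y")]),
    (("ancestor", "?x", "?z"), [("parent", "?x", "?y"), ("ancestor", "?y", "?z")]),
]
model = least_model(facts, rules)
print(sorted(f for f in model if f[0] == "ancestor"))
# [('ancestor', 'ann', 'bob'), ('ancestor', 'ann', 'carl'), ('ancestor', 'bob', 'carl')]
```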

Datalog in practice

Datalog in practice:
  • Various implementations available
  • Adjustments for the Semantic Web: data types from XML Schema, URIs/IRIs
Extensions of Datalog:
  • disjunctive Datalog allows disjunctions in heads
  • non-monotonic negation (no predicate logic semantics)
  • Integration of information from OWL ontologies (e.g. dl-programs, dlvhex)
    → loose coupling of OWL and Datalog (no common predicate logic semantics)

How can we combine OWL DL and Datalog?

SWRL – “Semantic Web Rule Language” 

  • Proposed extension for OWL with rules
  • Idea: Datalog rules with connection to OWL ontology
  • Symbols in rules can be OWL identifiers or new Datalog identifiers
  • Additional built-ins to process data types
  • several syntactic representations

Semantics of SWRL

OWL DL (Description Logic) and Datalog use the same interpretations:

  • OWL individuals are Datalog constants
  • OWL classes are unary Datalog predicates
  • OWL roles are binary Datalog predicates

→ An interpretation \( \mathcal{I} \) can simultaneously be a model for an OWL ontology and a set of Datalog rules

→ Inferences on OWL-Datalog combination possible

Example

Combined SWRL knowledge base (Datalog + description logic):

  1. Vegetarian(x) ∧ FishProduct(y) → doesNotLike(x,y)
  2. ordered(x,y) ∧ doesNotLike(x,y) → Unhappy(x)
  3. ordered(x,y) → Food(y)
  4. doesNotLike(x,z) ∧ Food(y) ∧ includes(y,z) → doesNotLike(x,y)
  5. → Vegetarian(Markus)
  6. Happy(x) ∧ Unhappy(x) →
  7. ∃ordered.ThaiCurry(Markus)
  8. ThaiCurry ⊑ ∃ includes.FishProduct 
We can conclude: Unhappy(Markus)
(Note: many of the above rules can actually be expressed as description logic axioms – not always intuitively, though.)
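For this particular knowledge base, the conclusion can be reproduced by naive forward chaining once the anonymous individuals required by (7) and (8) are given names. This is a sketch only: `curry1` and `fish1` are hypothetical Skolem constants, axioms (7) and (8) are instantiated by hand, and the code is not a general SWRL reasoner.

```python
kb = {
    ("Vegetarian", "Markus"),            # (5)
    ("ordered", "Markus", "curry1"),     # (7) Markus ordered some ThaiCurry, named curry1 here
    ("ThaiCurry", "curry1"),
    ("includes", "curry1", "fish1"),     # (8) that ThaiCurry includes some FishProduct, fish1
    ("FishProduct", "fish1"),
}


def unary(facts, p):
    return {f[1] for f in facts if f[0] == p and len(f) == 2}


def binary(facts, p):
    return {(f[1], f[2]) for f in facts if f[0] == p and len(f) == 3}


def apply_rules(facts):
    new = set()
    for x in unary(facts, "Vegetarian"):                 # (1)
        for y in unary(facts, "FishProduct"):
            new.add(("doesNotLike", x, y))
    for x, y in binary(facts, "ordered"):
        if (x, y) in binary(facts, "doesNotLike"):       # (2)
            new.add(("Unhappy", x))
        new.add(("Food", y))                             # (3)
    for x, z in binary(facts, "doesNotLike"):            # (4)
        for y, z2 in binary(facts, "includes"):
            if z == z2 and y in unary(facts, "Food"):
                new.add(("doesNotLike", x, y))
    return new


while True:                                              # forward chaining to a fixpoint
    new = apply_rules(kb) - kb
    if not new:
        break
    kb |= new

assert not unary(kb, "Happy") & unary(kb, "Unhappy")     # constraint (6) is not violated
print(("Unhappy", "Markus") in kb)  # True
```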

How hard is SWRL

  • Reasoning in OWL 1 DL is NEXPTIME-complete.
  • Reasoning in OWL 2 DL is N2EXPTIME-complete.
  • Reasoning in Datalog is EXPTIME-complete.

→ How hard is logical reasoning in SWRL?

Logical reasoning in SWRL is undecidable
(For OWL 1 and thus also for OWL 2).

Undecidability of SWRL

SWRL is undecidable

There is no algorithm that can draw all logical conclusions from every possible SWRL knowledge base, even if arbitrary (finite) amounts of computing time and memory are available.

 In practice, however, the following are possible:

  1. Algorithms that draw all conclusions from a subset of all SWRL knowledge bases
  2. Algorithms that draw a subset of the conclusions from all SWRL knowledge bases

Both are trivially possible if the respective subset is sufficiently small.

Description Logic Rules

Observation
Some SWRL rules can be expressed already in OWL 2 (i.e. the description logic \( \mathcal{SROIQ} \)).

  • Identification of these Description Logic Rules provides a decidable fragment of SWRL
  • Goal: Use “hidden” expressiveness of OWL 2
  • Can be processed directly by OWL 2 tools

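SROIQ at a glance (extends SHOIN)

Class expressions
  • Class names: \(A, B\)
  • Conjunction: \(C \sqcap D\)
  • Disjunction: \(C \sqcup D\)
  • Negation: \(\neg C\)
  • Existential role restriction: \(\exists R.C\)
  • Universal role restriction: \(\forall R.C\)
  • Self: \(\exists S.\mathsf{Self}\)
  • Greater-than: \(\geq n\, S.C\)
  • Less-than: \(\leq n\, S.C\)
  • Nominal: \(\{a\}\)
Roles
  • Role names: \(R, S, T\)
  • Simple roles: \(S, T\)
  • Inverse roles: \(R^-\)
  • Universal role: \(U\)
TBox (class axioms)
  • Inclusion: \(C \sqsubseteq D\)
  • Equivalence: \(C \equiv D\)
RBox (role axioms)
  • Inclusion: \(R_1 \sqsubseteq R_2\)
  • General inclusion: \(R_1 \circ \ldots \circ R_n \sqsubseteq R\)
  • Transitivity: \(\mathsf{Tra}(R)\)
  • Symmetry: \(\mathsf{Sym}(R)\)
  • Reflexivity: \(\mathsf{Ref}(R)\)
  • Irreflexivity: \(\mathsf{Irr}(S)\)
  • Disjointness: \(\mathsf{Dis}(S,T)\)
ABox (facts)
  • Class membership: \(C(a)\)
  • Role relationship: \(R(a,b)\)
  • Negated role relationship: \(\neg S(a,b)\)
  • Equality: \(a \approx b\)
  • Inequality: \(a \not\approx b\)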

Simple rules with SROIQ

All SROIQ axioms can be written as SWRL rules:
  • \( C \sqsubseteq D \) is C(x) → D(x) 
  • \( R \sqsubseteq S \) is R(x, y) → S(x, y) 
Some classes can be “dismantled” within rules:
  • Happy \(\sqcap\) Unhappy \(\sqsubseteq\) ⊥     corresponds to
    Happy(x) ∧ Unhappy(x) →
  • ∃placeOfResidence.∃liesIn.EUCountry \(\sqsubseteq\) EUCitizen      corresponds to
    placeOfResidence(x,y) ∧ liesIn(y,z) ∧ EUCountry(z) → EUCitizen(x)
SROIQ-role axioms provide additional rules:
  • hasMother ◦ hasBrother \(\sqsubseteq\) hasUncle 
    corresponds to
    hasMother(x,y) ∧ hasBrother(y,z) → hasUncle(x,z)

More rules

What about
doesNotLike(x,z) ∧ Food(y) ∧ includes(y,z) → doesNotLike(x,y)?

  • Rule head with two variables → not representable by subclass axiom
  • Rule body contains class expressions → not representable by subproperty axiom

Nevertheless, this rule can be represented in OWL 2!

More rules (II)

Simpler example: Man(x) ∧ hasChild(x,y) → fatherOf(x,y)
Idea
Replace Man(x) by a role atom, so that the rule can be represented as a general role inclusion with ◦.
Trick: with ∃R.Self we can convert classes into roles:
  • Auxiliary role RMan
  • Auxiliary axiom Man ≡ ∃RMan.Self
  • Intuition: “Men are the very things that have an RMan relationship with themselves.” 
With this auxiliary axiom the rule can be written as:
RMan ◦ hasChild \(\sqsubseteq \) fatherOf

More rules (III)

Example:
doesNotLike(x,z) ∧ Food(y) ∧ includes(y,z)→doesNotLike(x,y)
becomes 

\[Food \equiv \exists R_{Food}.\mathsf{Self}\]

\[doesNotLike \circ includes^{-} \circ R_{Food} \sqsubseteq doesNotLike\]
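On finite data one can check that one application of the role inclusion derives exactly what one application of the original rule derives: \(R_{Food}\) consists of self-loops on the instances of Food, and the role chain is ordinary relational composition. A sketch with made-up data:

```python
def compose(*relations):
    """Relational composition R_1 ∘ R_2 ∘ ... ∘ R_n."""
    result = relations[0]
    for rel in relations[1:]:
        result = {(x, z) for (x, y) in result for (y2, z) in rel if y == y2}
    return result


def inverse(rel):
    return {(y, x) for (x, y) in rel}


food = {"curry", "salad"}
does_not_like = {("markus", "fish"), ("anna", "nuts")}
includes = {("curry", "fish"), ("salad", "nuts"), ("sauce", "fish")}
r_food = {(f, f) for f in food}        # Food ≡ ∃R_Food.Self: one self-loop per Food instance

# doesNotLike ∘ includes⁻ ∘ R_Food ⊑ doesNotLike
via_chain = compose(does_not_like, inverse(includes), r_food)
# doesNotLike(x,z) ∧ Food(y) ∧ includes(y,z) → doesNotLike(x,y)
via_rule = {(x, y) for (x, z) in does_not_like for (y, z2) in includes
            if z == z2 and y in food}

print(via_chain == via_rule, sorted(via_chain))  # True [('anna', 'salad'), ('markus', 'curry')]
```

The non-food `sauce` is filtered out by the \(R_{Food}\) self-loops, which is exactly the role played by the atom Food(y) in the rule.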

More rules (IV)

Not so simple:
Vegetarian(x) ∧ FishProduct(y) → doesNotLike(x,y)
Idea
Connect disconnected parts of the rule body by the universal role U.
  • Auxiliary roles: RVegetarian and RFishProduct
  • Auxiliary axioms: Vegetarian ≡ ∃RVegetarian.Self and FishProduct ≡ ∃RFishProduct.Self
With these auxiliary axioms the rule can be written as:
\[R_{Vegetarian} \circ U \circ R_{FishProduct} \sqsubseteq doesNotLike\]

The boundaries of Description Logic Rules

Not all SWRL rules can be represented as description logic axioms!

Example:
ordered(x,y)∧doesNotLike(x,y) → Unhappy(x)
cannot be represented in SROIQ.

Possible transformations in the rule body at a glance
  • Reverse roles, for example contains(y,z) → \(\mathsf{contains}^-\)(z,y)
  • “Rolling up” side arms, e.g. 
    liesIn(y,z) ∧ EUCountry(z) → ∃liesIn.EUCountry(y)
  • Replace concepts by roles, e.g. Man(x) → RMan(x,x)
  • Convert chains into role inclusions (replace ∧ by ◦)

Description Logic Rules: Definition

Preparation: normalizing rules
  • For each occurrence of a constant a in the rule:
    Add {a}(x) to the body, where x is a new variable, and replace the occurrence of a by x.
  • Replace each atom R(x, x) by ∃R.Self(x).
Dependency graph of a rule: undirected graph with
  • Nodes = variables of the rule
  • Edges = role atoms of the rule body (without direction!)
A SWRL rule is a Description Logic Rule if:
  1. all atoms use SROIQ concepts and roles,
  2. the dependency graph of the normalized rule has no cycles
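The graph condition of this definition can be checked mechanically. The sketch below is a minimal illustration under assumptions of our own: atoms are encoded as hypothetical tuples, constants have already been replaced by variables, and condition 1 (only SROIQ concepts and roles) is not checked. It finds undirected cycles with union-find.

```python
# Sketch: check the graph condition of the DL-rule definition.
# Hypothetical encoding: ("C", "x") is a unary atom C(x), ("R", "x", "y") a role atom R(x,y).
# Assumption: constants were already replaced by variables (first normalization step).

def normalize(body):
    """Replace each role atom R(x, x) by the unary atom ∃R.Self(x)."""
    result = []
    for atom in body:
        if len(atom) == 3 and atom[1] == atom[2]:
            result.append((f"∃{atom[0]}.Self", atom[1]))
        else:
            result.append(atom)
    return result


def has_cycle(body):
    """Undirected cycle test with union-find: an edge whose endpoints are already
    connected closes a cycle (this also catches two role atoms on the same pair)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            v = parent[v]
        return v

    for atom in normalize(body):
        if len(atom) == 3:  # only role atoms are edges of the dependency graph
            root_a, root_b = find(atom[1]), find(atom[2])
            if root_a == root_b:
                return True
            parent[root_a] = root_b
    return False


def satisfies_graph_condition(body):
    # Condition 1 (only SROIQ concepts and roles) is NOT checked here.
    return not has_cycle(body)


# Vegetarian(x) ∧ FishProduct(y) → doesNotLike(x,y)
print(satisfies_graph_condition([("Vegetarian", "x"), ("FishProduct", "y")]))  # True
# ordered(x,y) ∧ doesNotLike(x,y) → Unhappy(x)
print(satisfies_graph_condition([("ordered", "x", "y"), ("doesNotLike", "x", "y")]))  # False
```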

Example

DL Rules in the SWRL knowledge base of the earlier example: 
  1. Vegetarian(x) ∧ FishProduct(y) → doesNotLike(x,y)
  2. ordered(x,y) → Food(y)
  3. doesNotLike(x,z) ∧ Food(y) ∧ includes(y,z) → doesNotLike(x,y)
  4. → Vegetarian(Markus)
  5. Happy(x) ∧ Unhappy(x) →
The rule ordered(x,y) ∧ doesNotLike(x,y) → Unhappy(x) from the same knowledge base is not a DL rule: its dependency graph contains a cycle.

Note: After conversion to \( \mathcal{SROIQ} \), description logic rules must still satisfy the restrictions on simple roles and regular RBoxes!
 

Conversion of DL rules to SROIQ (I)

Input: A Description Logic Rule
  1. Normalize the rule.
  2. For each pair of variables x and y:
    If x and y are not connected in the dependency graph, i.e. there is no path between x and y, add U(x,y) to the body.
  3. The rule head is now of the form D(z) or S(z,z').
    For each atom R(x,y) in the body:
    If the path in the dependency graph from z to y is shorter than the path from z to x, replace R(x,y) with R−(y,x).
  4. If the body contains an atom R(x,y) such that y does not occur in any other binary atom of the rule:
    • If the body contains n unary atoms C1(y),...,Cn(y), define \(E := C_1 \sqcap \ldots \sqcap C_n\) and remove C1(y),...,Cn(y) from the body. Otherwise define \(E := \top\).
    • Replace R(x,y) by ∃R.E(x).
    Repeat step 4 as long as there are such atoms R(x,y).
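Steps 2 and 3 are the graph-based parts of the conversion. The sketch below implements them under assumptions of our own: a hypothetical tuple encoding of atoms, and inverse roles written with an ASCII suffix "-" (R- stands for R−). Step 2 processes the pairs one after another, so each added U atom merges two components and cannot create a cycle. Step 3 uses breadth-first distances from the head variable z.

```python
from collections import deque


def add_universal_role(variables, body):
    """Step 2: add U(x, y) for every pair of variables that is not yet connected.
    Pairs are handled sequentially, so each U atom merges two components."""
    parent = {v: v for v in variables}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for atom in body:
        if len(atom) == 3:
            parent[find(atom[1])] = find(atom[2])
    body = list(body)
    for i, x in enumerate(variables):
        for y in variables[i + 1:]:
            if find(x) != find(y):
                body.append(("U", x, y))
                parent[find(x)] = find(y)
    return body


def orient_towards_head(body, z):
    """Step 3: replace R(x, y) by R-(y, x) whenever y is closer to z than x."""
    neighbours = {}
    for atom in body:
        if len(atom) == 3:
            neighbours.setdefault(atom[1], []).append(atom[2])
            neighbours.setdefault(atom[2], []).append(atom[1])
    dist, queue = {z: 0}, deque([z])
    while queue:                                   # breadth-first distances from z
        v = queue.popleft()
        for w in neighbours.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    result = []
    for atom in body:
        if len(atom) == 3 and dist[atom[2]] < dist[atom[1]]:
            role = atom[0]
            if role == "U":
                inverse = "U"                      # the universal role is symmetric
            elif role.endswith("-"):
                inverse = role[:-1]
            else:
                inverse = role + "-"
            result.append((inverse, atom[2], atom[1]))
        else:
            result.append(atom)
    return result


# Exercise rule after normalization; head professorOf(w,y), so z = w.
body = [("worksIn", "w", "x"), ("employment", "w", "z"), ("{PERMANENT}", "z"),
        ("Uni", "x"), ("PhDStudent", "y"), ("supervisedBy", "y", "w")]
print(orient_towards_head(add_universal_role(["w", "x", "z", "y"], body), "w"))
```

On the exercise rule, only supervisedBy(y,w) is flipped, to supervisedBy-(w,y). On the Vegetarian rule, step 2 adds U(x,y).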

Conversion of DL rules to SROIQ (II)

The rule can now be expressed in SROIQ:
  • If the rule head is unary, the rule has the form: 
    C1(x) ∧ ... ∧ Cn(x) → D(x) .
    Replace with \(C_1 \sqcap \ldots \sqcap C_n \sqsubseteq D\).
  • If the head is binary, then
    • For each unary atom C(z) in the body:
      Create a new axiom C ≡ ∃RC.Self (the role RC is new)
      and replace C(z) by RC(z,z) .
    • The rule now has the form
      R1(x,x2) ∧ ... ∧ Rn(xn,y) → S(x,y) .
      Replace with \(R_1 \circ \ldots \circ R_n \sqsubseteq S \).
Applying this transformation to the SWRL rules of a knowledge base does not change the satisfiability of the knowledge base.
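The last step, for a binary head, walks the chain from x to y and reads off the role names. The sketch below is a minimal illustration under stated assumptions. Unary atoms have already become self-loops R_C(z,z). The role atoms form a single path. Atoms use the same hypothetical tuple encoding as before, with inverse roles written as R-.

```python
def role_chain_axiom(body, head):
    """Turn R1(x,x2) ∧ ... ∧ Rn(xn,y) → S(x,y) into 'R1 ∘ ... ∘ Rn ⊑ S'.
    Self-loops at a variable are emitted before the edge that leaves it."""
    s, x, y = head
    remaining = list(body)
    chain, current = [], x
    while remaining:
        loops = [a for a in remaining if a[1] == current and a[2] == current]
        for a in loops:
            chain.append(a[0])
            remaining.remove(a)
        steps = [a for a in remaining if a[1] == current and a[2] != current]
        if not steps:
            break
        chain.append(steps[0][0])
        remaining.remove(steps[0])
        current = steps[0][2]
    if remaining or current != y:
        raise ValueError("role atoms do not form a single path from x to y")
    return " ∘ ".join(chain) + " ⊑ " + s


body = [("R1", "w", "w"), ("R2", "w", "w"), ("R3", "y", "y"), ("supervisedBy-", "w", "y")]
print(role_chain_axiom(body, ("professorOf", "w", "y")))
# R1 ∘ R2 ∘ supervisedBy- ∘ R3 ⊑ professorOf
```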

Rule Interchange Format (RIF)

  • adopted as a W3C standard on 22 June 2010
  • focus is on exchanging rules, not on a single format for all rule languages
  • a single language cannot meet the differing requirements of the various paradigms of rule usage
  • hence RIF is a family of languages, called dialects
  • RIF is uniform and extensible

RIF dialects

Focus is on two kinds of dialects:

  1. logic-based (e.g. predicate logic, logic programming)
  2. "Rules with actions" (e.g. production rules)

RIF provides a framework for defining your own dialects

RIF is compatible with RDF and OWL:

  • RIF semantics can be combined with OWL/RDF semantics
  • an RDF syntax for RIF is available

RIF documents

Document: Description
  • RIF-BLD (The Basic Logic Dialect): definite Horn clauses, standard predicate logic semantics
  • RIF-PRD (The Production Rule Dialect): covers a wide range of production rule systems
  • RIF Core (The Core Dialect): enables exchange between logic rule systems and production rule systems
  • RIF-FLD (The Framework for Logic Dialects): logic extensibility framework that minimizes the effort of defining new logic dialects
  • RIF-RDF+OWL (RDF and OWL Compatibility): combination of RIF with RDF or OWL
  • RIF-DTB (Datatypes and Built-Ins): data types, functions and predicates for RIF dialects
  • RIF + XML data: specifies how RIF can be combined with XML data sources (import, semantics)
  • RIF OWL 2 RL (OWL 2 RL in RIF): axiomatization of OWL 2 RL in RIF
  • RIF in RDF: reversible mapping of RIF to RDF
  • RIF-UCR (Use Cases and Requirements): collection of use cases
  • RIF Test (Test Cases): conformance testing for RIF implementations

RIF Core

RIF Core is the simplest RIF dialect

A core document consists of:

  • directives, such as imports and the setting of URI prefixes
  • a group of rules and facts

RIF Core Example

Document(
  Prefix(cpt <http://example.com/concepts#>)
  Prefix(person <http://example.com/people#>)
  Prefix(isbn <http://.../isbn/>)
  Group
  (
    Forall ?Buyer ?Book ?Seller (
      cpt:buy(?Buyer ?Book ?Seller) :- cpt:sell(?Seller ?Book ?Buyer)
    )
    cpt:sell(person:John isbn:000651409X person:Mary)
  )
)
From this, the following relationship can be derived:
cpt:buy(person:Mary isbn:000651409X person:John)
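A forward-chaining engine derives this fact by applying the rule until nothing new appears. The sketch below hard-codes the single rule. It uses a hypothetical tuple encoding of RIF atoms, (predicate, arg1, arg2, arg3), and is not the API of any RIF engine.

```python
def forward_chain(facts):
    """Naive fixpoint iteration for the single RIF rule
    Forall ?Buyer ?Book ?Seller (cpt:buy(?Buyer ?Book ?Seller) :- cpt:sell(?Seller ?Book ?Buyer))."""
    facts = set(facts)
    while True:
        derived = {("cpt:buy", buyer, book, seller)
                   for (pred, seller, book, buyer) in facts if pred == "cpt:sell"}
        new_facts = derived - facts
        if not new_facts:          # fixpoint reached: nothing new can be derived
            return facts
        facts |= new_facts


result = forward_chain({("cpt:sell", "person:John", "isbn:000651409X", "person:Mary")})
print(("cpt:buy", "person:Mary", "isbn:000651409X", "person:John") in result)  # True
```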

Expressiveness of RIF Core

  • based on Datalog
  • a common subset of RIF-BLD (Basic Logic Dialect) and RIF-PRD (Production Rule Dialect)
  • some extensions: datatypes and built-ins (RIF-DTB), IRIs
  • forward chaining is possible

Combination of RDF + RIF

Typical scenario:

  • the application data are available in RDF
  • the rules for the data are described by RIF
  • A RIF processor creates new relationships

RIF is compatible with RDF:

  • RDF triples are representable in RIF

Example in Turtle-based syntax

{
  ?x rdf:type p:Novel ;
     p:page_number ?n ;
     p:price [
       p:currency :Euro ;
       rdf:value ?z
     ] .
  ?n > "500"^^xsd:integer .
  ?z < "20.0"^^xsd:double .
}
=>
{ <me> p:buys ?x }

The same with RIF Presentation Syntax

Document(
  Prefix ...
  Group(
    Forall ?x ?n ?z (
      <me>[p:buys->?x] :- And(
        ?x[rdf:type->p:Novel]
        ?x[p:page_number->?n p:price->_abc]
        _abc[p:currency->:Euro rdf:value->?z]
        External(pred:numeric-greater-than(?n "500"^^xsd:integer))
        External(pred:numeric-less-than(?z "20.0"^^xsd:double))
      )
    )
  )
)

Discover new relationships ...

Forall ?x ?n ?z (
  <me>[p:buys->?x] :- And(
    ?x[rdf:type->p:Novel]
    ?x[p:page_number->?n p:price->_abc]
    _abc[p:currency->:Euro rdf:value->?z]
    External(pred:numeric-greater-than(?n "500"^^xsd:integer))
    External(pred:numeric-less-than(?z "20.0"^^xsd:double))
  )
)
in combination with:
<http://.../isbn/...>  a              p:Novel ;
                       p:page_number  "600"^^xsd:integer ;
                       p:price       [
                           rdf:value   "15.0"^^xsd:double ;
                           p:currency  :Euro
                       ] .
results in:
<me>  p:buys  <http://...isbn/...> .
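The same derivation can be traced in a few lines. This sketch assumes a hypothetical nested-dict encoding of the Turtle data: the blank-node price becomes a nested dict, and the typed literals become Python numbers, which stands in for the pred:numeric-* built-ins. The elided book IRI is kept as an opaque string.

```python
def buys(graph):
    """<me> p:buys ?x if ?x is a p:Novel with more than 500 pages and a
    price in :Euro below 20.0 (hypothetical encoding of the RIF rule)."""
    derived = set()
    for subject, props in graph.items():
        price = props.get("p:price", {})
        if ("p:Novel" in props.get("rdf:type", ())
                and props.get("p:page_number", 0) > 500
                and price.get("p:currency") == ":Euro"
                and price.get("rdf:value", float("inf")) < 20.0):
            derived.add(("<me>", "p:buys", subject))
    return derived


book = "<http://.../isbn/...>"  # elided IRI from the slide, treated as an opaque string
data = {book: {"rdf:type": ["p:Novel"], "p:page_number": 600,
               "p:price": {"rdf:value": 15.0, "p:currency": ":Euro"}}}
print(buys(data))  # {('<me>', 'p:buys', '<http://.../isbn/...>')}
```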

What's with OWL 2 RL?

OWL 2 RL stands for OWL Rule Language

OWL 2 RL can be seen as the intersection of RIF Core and OWL: the part of OWL 2 that can be implemented with RIF Core rules

  • Inferences in OWL 2 RL can be expressed with RIF rules
  • RIF Core engines can behave like OWL 2 RL engines
    • as described in the W3C document "OWL 2 RL in RIF", OWL 2 RL can be processed directly in RIF

Outlook: combination of RIF and SPARQL 1.1

Exercise

Convert the following rule into SROIQ axioms:
worksIn(w,x) ∧ employment(w,PERMANENT) ∧ Uni(x) ∧ PhDStudent(y) ∧ supervisedBy(y,w) → professorOf(w,y)
Next step:
Normalize the rule.

Exercise

Convert the following rule into SROIQ axioms:
worksIn(w,x) ∧ employment(w,z) ∧ {PERMANENT}(z) ∧ Uni(x) ∧ PhDStudent(y) ∧ supervisedBy(y,w) → professorOf(w,y)
Next step:
For each pair of variables x and y: if x and y are not connected in the dependency graph, i.e. there is no path between x and y, add U(x,y) to the body.

Exercise

Convert the following rule into SROIQ axioms:
worksIn(w,x) ∧ employment(w,z) ∧ {PERMANENT}(z) ∧ Uni(x) ∧ PhDStudent(y) ∧ supervisedBy(y,w) → professorOf(w,y)
Next step:
The rule head is now of the form D(z) or S(z,z'). For each atom R(x,y) in the body: if, in the dependency graph, the path from z to y is shorter than the path from z to x, replace R(x,y) with R−(y,x).

Exercise

Convert the following rule into SROIQ axioms:
worksIn(w,x) ∧ employment(w,z) ∧ {PERMANENT}(z) ∧ Uni(x) ∧ PhDStudent(y) ∧ supervisedBy−(w,y) → professorOf(w,y)
Next step:
If the body contains an atom R(x,y) such that y occurs in no other binary atom of the rule:
  • If the body contains n unary atoms C1(y),...,Cn(y), define \(E := C_1 \sqcap \ldots \sqcap C_n\) and remove C1(y),...,Cn(y) from the body. Otherwise define \(E := \top\).
  • Replace R(x,y) by ∃R.E(x).
Repeat this step as long as there are such atoms R(x,y).

Exercise

Convert the following rule into SROIQ axioms:
∃worksIn.Uni(w) ∧ ∃employment.{PERMANENT}(w) ∧ PhDStudent(y) ∧ supervisedBy−(w,y) → professorOf(w,y)
Next step:
For each unary atom C(z) in the body:
Create a new axiom C ≡ ∃RC.Self (the role RC is new) and replace C(z) by RC(z,z) .

Exercise

Convert the following rule into SROIQ axioms:
∃R1.Self ≡ ∃worksIn.Uni
∃R2.Self ≡ ∃employment.{PERMANENT}
∃R3.Self ≡ PhDStudent

R1(w,w) ∧ R2(w,w) ∧ R3(y,y) ∧ supervisedBy−(w,y) → professorOf(w,y)
Next step:
The rule now has the form R1(x,x2) ∧ ... ∧ Rn(xn,y) → S(x,y) .
Replace the rule with \(R_1 \circ \ldots \circ R_n \sqsubseteq S \).

Exercise: Solution

\[ \exists R_1.Self \equiv \exists worksIn.Uni \]\[ \exists R_2.Self \equiv \exists employment . \{ PERMANENT \} \]\[ \exists R_3.Self \equiv PhDStudent \]
\[ R_1 \circ R_2 \circ supervisedBy^{-} \circ R_3 \sqsubseteq professorOf \]

Summary

Predicate logic rules as extensions of OWL DL

  • Datalog as a well-known formalism 
  • Combination with OWL possible: SWRL
  • Semantics defined by logically extending OWL interpretations
  • Reasoning with SWRL combined with OWL DL is undecidable

Description Logic Rules

  • fragment of SWRL that is expressible in OWL 2
  • indirectly supported by all OWL 2 tools
  • definition and algorithm based on dependency graph

RIF (Rule Interchange Format)

  • W3C standard for exchanging rules
  • extensible family of languages

Also relevant:

  • SPARQL 1.1 entailment regime
  • conjunctive queries for OWL DL
  • DL-safe rules (variables can take only constants as values)

Mini Project

Describe with a Horn clause that authors of a joint article are co-authors. Use only the binary predicates coauthor and author. Is it a Description Logic Rule (please explain)? If yes, specify the Description Logic Rule.

Literature

Linked Data Stack - SPARQL

[Figure: Semantic Web layer stack: URI/Unicode, XML, Data Interchange: RDF, RDF-Schema, Ontology: OWL, Rules: RIF, Query: SPARQL, Unifying Logic, Proof, Trust, Crypto, User Interface & Applications]

Outline

  • About SPARQL
  • Basic SPARQL
    • Presentation
    • Hands-on
  • SPARQL in real-life
    • Presentation
    • Hands-on
  • Advanced SPARQL
    • Presentation
    • Hands-on 

What is SPARQL?

SPARQL stands for “SPARQL Protocol and RDF Query Language”.

In addition to the language, W3C has also defined:

  • The SPARQL Protocol for RDF specification: it defines the remote protocol for issuing SPARQL queries and receiving the results.
  • The SPARQL Query Results XML Format specification: it defines an XML document format for representing the results of SPARQL SELECT and ASK queries.

Query Languages for RDF and RDFS

There have been many proposals for RDF and RDFS query languages:

  • RDQL (http://www.w3.org/Submission/2004/SUBM-RDQL-20040109/)
  • ICS-FORTH RQL (http://139.91.183.30:9090/RDF/RQL/) and SeRQL (http://www.openrdf.org/doc/sesame/users/ch06.html)
  • SPARQL (http://www.w3.org/TR/rdf-sparql-query/)

In this course we will only cover SPARQL, which is the current W3C recommendation for querying RDF data.

SPARQL 1.1

Basic SPARQL Structures - Outline

  • A bit of RDF and Semantic Web
  • First glance at triple patterns
  • Components of a SPARQL query
    • Graph patterns
    • Types of queries
    • Modifiers

Triples

Triples are the statements about things (resources), using URIs and Literal values


Triples

Graph

Graphs with URIs

Prefixes

Vocabularies

  • Share concept of a domain
  • Utilize URIs as unique identifier
  • Define Properties and Classes, and more ....

Well known vocabularies

rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

rdfs: <http://www.w3.org/2000/01/rdf-schema#>

foaf: <http://xmlns.com/foaf/0.1/>

dbpedia: <http://dbpedia.org/resource/>

Triple Stores and SPARQL endpoints

  • A SPARQL endpoint exposes one or more graphs
  • it is accessed via HTTP
  • it expects the URL-encoded query in a parameter named "query", sent with GET or POST
  • graph names need not be related to the endpoint URL, although relating them is good practice
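A minimal sketch of such a GET request, built with Python's standard library but not sent. The DBpedia endpoint URL is the one used elsewhere in these slides. The Accept header requests the SPARQL JSON results format, and an endpoint may ignore it.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # DBpedia endpoint, as used elsewhere in these slides


def sparql_get_request(endpoint, query):
    """Build (without sending) a SPARQL protocol GET request: the query text
    is URL-encoded into the parameter named 'query'."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for SPARQL JSON results; the endpoint decides what it actually returns.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
print(req.get_method(), req.full_url)
# The endpoint decodes the parameter back into the original query text:
print(parse_qs(urlparse(req.full_url).query)["query"][0])
# urllib.request.urlopen(req) would send it (network access required).
```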
     

A simple Query


A slightly more complex Query




SELECT ?friend ?friendname WHERE { jwebsp:John foaf:knows ?friend. ?friend foaf:firstname ?friendname }


Structure of a SPARQL query

# prefix declarations
PREFIX ex: <http://example.com/resources/>
....
# query type            # projection                 # dataset definition
SELECT                        ?x ?y                             FROM ...

# graph pattern
WHERE {
    ?x a ?y
}

# query modifiers
ORDER BY ?y

Prefixes

Syntactic sugar to keep queries readable

Examples:

PREFIX  :          <http://example.com/base/>

PREFIX  foaf:     <http://xmlns.com/foaf/0.1/>

<http://xmlns.com/foaf/0.1/knows> == foaf:knows

<http://example.com/base/Tim> == :Tim

Query Types

SELECT 

  • returns a result table

ASK 

  • returns (boolean) true, if the pattern can be matched

CONSTRUCT

  • creates triples using templates

DESCRIBE

  • returns descriptions of resources

From clause

 Specifies which graphs should be considered by the endpoint.

  • if omitted, the so-called default graph of the endpoint is used.
  • if specified with FROM, the query is evaluated against the merge of all specified graphs.
  • if specified with FROM NAMED, the named graphs can be addressed in parts of the query (GRAPH keyword).

Graphs given in the FROM clause may be dereferenced (retrieved) by the SPARQL endpoint.

Solution modifiers

Change the result of a query 

LIMIT and OFFSET slice the result set, useful for pagination

  example: SELECT * WHERE {.....} LIMIT 10 

  --> display only 10 results

ORDER BY sorts the result set

 example: SELECT * WHERE {.....} ORDER BY ASC(...) LIMIT 10 

  --> display the first 10 of the sorted result set
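These modifiers apply in a fixed order: ORDER BY first, then OFFSET, then LIMIT. The sketch below shows that order on a hypothetical result set of 50 numbers, represented as a list of dicts.

```python
def order_offset_limit(solutions, key=None, offset=0, limit=None):
    """Solution modifiers in SPARQL order: ORDER BY, then OFFSET, then LIMIT."""
    rows = sorted(solutions, key=key) if key is not None else list(solutions)
    end = None if limit is None else offset + limit
    return rows[offset:end]


def by_n(row):
    return row["?n"]                                      # ORDER BY ASC(?n)


results = [{"?n": n} for n in range(50, 0, -1)]            # hypothetical results 50..1
page1 = order_offset_limit(results, by_n, offset=0, limit=20)   # first 20
page2 = order_offset_limit(results, by_n, offset=20, limit=20)  # next 20
print(page1[0]["?n"], page1[-1]["?n"], page2[0]["?n"])  # 1 20 21
```

Note that OFFSET 10 skips ten results, so the slice starts with the 11th one.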

Where clause

  • contains the graph patterns
  • graph patterns are combined conjunctively
  • a variable occurring in several patterns must be bound to the same value in all of them

Triple patterns

  • General form: a triple (s p o)
  • Variables may occur in any position
  • Variables are bound by the SPARQL endpoint

Triple patterns - Example

:John foaf:knows :Tim .
:John foaf:name "John" .
:Tim foaf:knows :John .

:Tim foaf:name "Tim" .

SELECT ?name WHERE {:John foaf:name ?name}
-->    "John"

SELECT ?friend WHERE {:John foaf:knows ?friend}
-->     :Tim

SELECT ?friend ?name WHERE {:John foaf:knows ?friend. :John foaf:name ?name}
-->     :Tim "John"

SELECT ?name WHERE {:John foaf:knows ?friend. ?friend foaf:name ?name}
-->    "Tim"


Triple patterns - Cartesian product

:John foaf:knows :Tim .
:John foaf:name "John" .
:Tim foaf:knows :John .
:Tim foaf:name "Tim" .

SELECT ?person ?friendsname WHERE {?person foaf:knows ?friend. ?somebody foaf:name ?friendsname}


:John "John"
:John "Tim"
:Tim "John"
:Tim "Tim"
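Matching triple patterns amounts to extending variable bindings pattern by pattern. The sketch below is a minimal in-memory evaluator for basic graph patterns, not how real endpoints are implemented. It assumes terms are plain strings and variables start with "?". It reproduces both the join through ?friend and the Cartesian product above.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that the triple pattern matches the triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                       # variable
            if extended.setdefault(term, value) != value:
                return None                            # already bound to another value
        elif term != value:                            # constants must match exactly
            return None
    return extended


def evaluate_bgp(patterns, graph):
    """Evaluate a basic graph pattern: patterns are joined conjunctively."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = [(":John", "foaf:knows", ":Tim"), (":John", "foaf:name", '"John"'),
         (":Tim", "foaf:knows", ":John"), (":Tim", "foaf:name", '"Tim"')]

print(evaluate_bgp([(":John", "foaf:knows", "?friend"), ("?friend", "foaf:name", "?name")], graph))
# [{'?friend': ':Tim', '?name': '"Tim"'}]
# Patterns without a shared variable produce the Cartesian product (2 x 2 = 4 rows):
print(len(evaluate_bgp([("?person", "foaf:knows", "?friend"),
                        ("?somebody", "foaf:name", "?friendsname")], graph)))  # 4
```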

Matching Resources

Match character by character

either with prefix or full <URI>

  • foaf:name   == <http://xmlns.com/foaf/0.1/name>

percent encoding of reserved characters (like space)

  • myns:John%20Doe is valid, while myns:John Doe (unencoded space) is a syntax error

case sensitive

  • foaf:name   != <http://xmlns.com/foaf/0.1/Name>

Matching Literals

Literals need to match for equality character-by-character

  • can have datatype: xsd:int, xsd:date
    • SPARQL engine may know interpretation of datatype 
    • for equality, it needs to match exactly
  • can have language tag


Filter

  • operate on graph patterns
  • testing values
  • most prominently: restrict Literal values 
    • string comparison
    • regular expressions
    • numeric comparators
  • type/language checks
  • evaluate to true, false or a type error; only solutions for which the filter evaluates to true are kept

Filter Overview

  • Logical: !, &&, || 
  • Math: +, -, *, /  
  • Comparison: =, !=, >, <, ...  
  • SPARQL tests: isURI, isBlank, isLiteral, bound 
  • SPARQL accessors: str, lang, datatype
  • Other: sameTerm, langMatches, regex
  • Vendor specific: prefixed like bif:contains

String Filtering

str() just the literal value, without datatype

regex() full regular expression

bif:contains (Virtuoso-specific) full-text search using a special index

String Filtering Example

:John :age 32 .
:John foaf:name "John"@en .
:Tim :age 20.

:Tim foaf:name "Tim"^^xsd:string .


SELECT ?friend {?friend foaf:name "Tim".}

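-->    empty (under RDF 1.0, "Tim" and "Tim"^^xsd:string are different terms; RDF 1.1 treats them as the same literal)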

SELECT ?friend {?friend foaf:name ?name.
FILTER (str(?name) = "Tim")}

-->    :Tim

SELECT ?friend {?friend foaf:name ?name. ?name bif:contains "im"}

-->    :Tim

Language and Datatype Filtering

lang(?x)  accessor to the language of a literal

langMatches(lang(?x),"en") tests whether a language tag matches a language range (here "en")

datatype(?x) accesses the datatype of the literal ?x

Numeric Filtering

:John :age 32 .
:John foaf:name "John"@en .
:Tim :age 20.

:Tim foaf:name "Tim"^^xsd:string .

SELECT ?friend WHERE {?friend :age ?age
FILTER (?age>25)}

-->     :John

Logical Operators

:John :age 32 . 
:John foaf:name "John"@en . 
:Tim :age 20.
:Tim foaf:name "Tim"^^xsd:string .
SELECT ?friend {?friend foaf:name ?name. ?friend :age ?age.
FILTER (str(?name) = "Tim" && ?age>25)}
-->    empty
SELECT ?friend {?friend foaf:name ?name. ?friend :age ?age.
FILTER (str(?name) = "Tim" || ?age>25)}
-->    :Tim
-->    :John
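Filter results follow SPARQL's three-valued logic. A comparison on an unbound variable raises a type error: && returns false if either side is false, and || returns true if either side is true. In all other cases an error propagates, and the solution is dropped. The sketch below models the error as a sentinel object of our own, not as an engine API.

```python
ERROR = object()  # sentinel for a SPARQL type error, e.g. comparing an unbound variable


def sparql_and(a, b):
    """&&: false wins over an error; otherwise an error propagates."""
    if a is False or b is False:
        return False
    if a is ERROR or b is ERROR:
        return ERROR
    return True


def sparql_or(a, b):
    """||: true wins over an error; otherwise an error propagates."""
    if a is True or b is True:
        return True
    if a is ERROR or b is ERROR:
        return ERROR
    return False


def greater_than(row, var, limit):
    return row[var] > limit if var in row else ERROR


def keep(row, combine):
    # FILTER keeps a solution only if the expression evaluates to exactly true
    return combine(row["?name"] == "Tim", greater_than(row, "?age", 25)) is True


unbound = [{"?friend": ":John", "?name": "John"}, {"?friend": ":Tim", "?name": "Tim"}]
bound = [{**unbound[0], "?age": 32}, {**unbound[1], "?age": 20}]

print([r["?friend"] for r in unbound if keep(r, sparql_or)])  # [':Tim']
print([r["?friend"] for r in bound if keep(r, sparql_or)])    # [':John', ':Tim']
print([r["?friend"] for r in bound if keep(r, sparql_and)])   # []
```

Without a pattern that binds ?age, the || query only returns :Tim. This is why the queries also match ?friend :age ?age.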

Optional values

  • Similar to left join in SQL
  • Allows querying for incomplete data
  • “Optional” takes a full graph pattern
  • Syntax {pattern1} OPTIONAL {optpattern}

Optional Example

:John foaf:knows :Tim .
:John foaf:name "John" .
:John foaf:phone "+123456" .
:Tim foaf:knows :John .

:Tim foaf:name "Tim" .

SELECT ?name ?phone
                        {?person foaf:name ?name. 
                         ?person foaf:phone ?phone}

--> "John" "+123456"

This is a bit unsatisfying: Tim is missing because he has no phone number

Optional Example

:John foaf:knows :Tim .
:John foaf:name "John" .
:John foaf:phone "+123456" .
:Tim foaf:knows :John .

:Tim foaf:name "Tim" .

SELECT ?name ?phone {?person foaf:name ?name.
                   OPTIONAL {?person foaf:phone ?phone}}
--> "John" "+123456"
--> "Tim"
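OPTIONAL behaves like a left outer join on solution mappings. The minimal sketch below uses dicts as solutions and ignores FILTERs inside OPTIONAL. It keeps Tim even though the phone pattern has no match for him.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[var] == value for var, value in a.items() if var in b)


def left_join(left, right):
    """OPTIONAL: keep every left solution; extend it with each compatible right
    solution, or keep it unchanged when there is none (like a SQL LEFT JOIN)."""
    result = []
    for sol in left:
        extensions = [{**sol, **opt} for opt in right if compatible(sol, opt)]
        result.extend(extensions or [sol])
    return result


names = [{"?person": ":John", "?name": "John"}, {"?person": ":Tim", "?name": "Tim"}]
phones = [{"?person": ":John", "?phone": "+123456"}]
for row in left_join(names, phones):
    print(row["?name"], row.get("?phone", "(unbound)"))
# John +123456
# Tim (unbound)
```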


Union

Syntax: {graph pattern} UNION {graph pattern}

Allows querying (partly) differing data structures

Union Example

:John rdf:type foaf:Person .
:John foaf:name "John" .
:Tim rdf:type foaf:Person .
:Tim foaf:name "Tim" .
:Jane rdf:type foaf:Person .

:Jane rdfs:label "Jane" .

SELECT ?name WHERE {?person a foaf:Person. ?person foaf:name ?name}

--> "John"
--> "Tim"

SELECT ?name WHERE {?person a foaf:Person.
           {?person foaf:name ?name} UNION{?person rdfs:label ?name}}

--> "John"
--> "Tim"
--> "Jane"

SELECT ?name WHERE {
           {?person foaf:name ?name. ?person a foaf:Person} UNION{?person rdfs:label ?name. ?person a foaf:Person. }}
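UNION simply collects the solutions of both alternatives, which are then joined with the rest of the group. The sketch below evaluates the second query in memory. It assumes our own string encoding of terms, with "a" written out as rdf:type.

```python
def match_pattern(pattern, graph):
    """All bindings for one triple pattern (terms starting with '?' are variables)."""
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:                       # no break: the whole triple matched
            results.append(binding)
    return results


def join(left, right):
    """Conjunction: merge every pair of compatible solutions."""
    return [{**a, **b} for a in left for b in right
            if all(b.get(var, value) == value for var, value in a.items())]


def union(left, right):
    """UNION: all solutions of both alternatives (duplicates are kept)."""
    return left + right


graph = [(":John", "rdf:type", "foaf:Person"), (":John", "foaf:name", "John"),
         (":Tim", "rdf:type", "foaf:Person"), (":Tim", "foaf:name", "Tim"),
         (":Jane", "rdf:type", "foaf:Person"), (":Jane", "rdfs:label", "Jane")]

persons = match_pattern(("?person", "rdf:type", "foaf:Person"), graph)
names = union(match_pattern(("?person", "foaf:name", "?name"), graph),
              match_pattern(("?person", "rdfs:label", "?name"), graph))
print([row["?name"] for row in join(persons, names)])  # ['John', 'Tim', 'Jane']
```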


Projection

SELECT * WHERE {.....}

  --> all variables mentioned in the graph patterns

SELECT ?s ?o WHERE {?s ?p ?o} 

  --> only the variables specified, in this case ?s and ?o

SELECT DISTINCT ........

  --> eliminates duplicates in the result

Count

a simple aggregate function

counts how often a variable is bound.

Example:

:John foaf:knows :Tim .
:John foaf:name "John" .
:Tim foaf:knows :John .

:Tim foaf:name "Tim" .

SELECT (COUNT(?person) AS ?count) WHERE {?person foaf:name ?name}

--> 2
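COUNT(?var) counts the solutions in which ?var is bound, while COUNT(*) counts all solutions. The sketch below makes the difference visible with an OPTIONAL phone number. The solutions are written out by hand as dicts.

```python
def count_bound(solutions, var):
    """COUNT(?var): number of solutions in which ?var is bound."""
    return sum(1 for row in solutions if var in row)


# Solutions of {?person foaf:name ?name OPTIONAL {?person foaf:phone ?phone}}
solutions = [{"?person": ":John", "?name": "John", "?phone": "+123456"},
             {"?person": ":Tim", "?name": "Tim"}]
print(count_bound(solutions, "?person"))  # 2
print(count_bound(solutions, "?phone"))   # 1
print(len(solutions))                     # COUNT(*) counts solutions: 2
```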

SPARQL in Real-Life - Outline

  • We use the previously acquired knowledge for
    • Exploring unknown data structures and vocabularies
    • Querying inconsistent data structures


Some public SPARQL endpoints

SPARQLer: general-purpose query endpoint for Web-accessible data

DBpedia: extensive RDF data from Wikipedia

DBLP: bibliographic data from computer science journals and conferences

LMDB: data from MDB - Movies data base (without html form)

World Factbook: country statistics from the CIA World factbook

About DBpedia

  • Crystallization point of the Semantic Web
  • Single most important data source
  • Community effort
  • Extract from the semi-structured information in Wikipedia
  • Non-curated content



Know your limits!

The DBpedia endpoint is popular and heavily used

Always add a LIMIT clause when constructing queries

Vocabulary Exploration

Exploration by examining instance data

  • Find descriptive information about the dataset
  • Use tools
  • Analyze the query dump
  • Dereference URI
  • Queries

Descriptive Information

Most datasets have publications describing them

Find papers about them using scholar.google.com


Tools: Relationship finder

http://www.visualdataweb.org/relfinder.php

Dereference URIs

 The Linked Data principles allow dereferencing URIs to get descriptions

Instance data on Leipzig

--> http://dbpedia.org/resource/Leipzig

Vocabulary information about foaf:name

--> http://xmlns.com/foaf/0.1/name


Querying

DESCRIBE <http://dbpedia.org/resource/Leipzig>

SELECT ?p ?o WHERE {<http://dbpedia.org/resource/Leipzig> ?p ?o}


?p queries

Query resources with a variable in the predicate position of a triple pattern

?p queries - Example

:John foaf:name "John".
:John rdfs:label "This is John".
:John foaf:phone "+12312".


SELECT ?p           ?o            WHERE {:John ?p ?o}

-->     foaf:name     "John"
-->     rdfs:label      "This is John"
-->     foaf:phone    "+12312"


Querying for Classes

Vocabularies define classes

  • foaf:Person
  • foaf:Document

rdf:type (abbreviated as a) associates an instance with a class

  • :John a foaf:Person == :John rdf:type foaf:Person


Querying for Classes - Example

:John a foaf:Person .
:Pluto a animals:Dog .

SELECT ?person WHERE {?person a foaf:Person}

-->            :John

SELECT ?class WHERE {?instance a ?class}

-->          foaf:Person
-->          animals:Dog



Types

Get all the possible types of concepts in DBpedia

Types A

SELECT distinct ?type

WHERE {
   ?e a ?type
}

Properties list

Get all the properties of the Actor class. Show also their titles

Properties list A

SELECT distinct ?p ?title 

WHERE {
   ?p rdfs:label ?title.
   ?e a <http://dbpedia.org/ontology/Actor>.
   ?e ?p ?v
}

Working with DBpedia page

 

Look through the DBpedia page of Ivan the Terrible. Which properties might you use to get the full list of Russian leaders?

Check your suggestions using the DBpedia endpoint.

Compare the number of results for different queries using the COUNT aggregate function.

Working with DBpedia page A

SELECT ?e
WHERE {
   ?e dcterms:subject category:Russian_leaders
}

SELECT ?e
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers
}

...

SELECT count(?e)

WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers .

...

Multiple patterns

 Change the previous query to show also the real name of the leader.

Multiple patterns A

SELECT ?e ?name
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers  .
   ?e dbpprop:name ?name

}   

Better version:

SELECT ?e ?name
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers  .
   ?e rdfs:label ?name
}

LIMIT

Show only the first 20 results. Then show the next 20.

Show twenty results, skipping the first ten.

LIMIT A

SELECT ?e ?name

WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers  .
   ?e rdfs:label ?name
}

LIMIT 20
OFFSET 10

FILTER

Filter the list and show only the results for Ivan_the_Terrible

FILTER A

SELECT ?e ?name
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers  .
   ?e rdfs:label ?name .

FILTER (?e = <http://dbpedia.org/resource/Ivan_the_Terrible>)
}

String Matching

 Show the list of all Russian leaders with the name "Ivan"

String matching A

SELECT ?e ?name
WHERE{
   ?e dbpprop:title dbpedia:List_of_Russian_rulers  .
   ?e rdfs:label ?name .

FILTER regex(?name, "ivan", "i")
}

Langmatching

Get a list of Russian leaders showing only Russian labels for the name.

Langmatching A

SELECT ?e ?name
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers  .
   ?e rdfs:label ?name .

FILTER (langMatches(lang(?name), "RU"))
}

Choosing properties to show

Rewrite the previous query to show the entry, the name, the name of predecessor and the name of successor. 

Choosing properties to show A

SELECT ?name ?predecessor_name ?successor_name
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers .
   ?e rdfs:label ?name .
   ?e dbpedia-owl:successor ?successor .
   ?successor rdfs:label ?successor_name .
   ?e dbpprop:predecessor ?predecessor .
   ?predecessor rdfs:label ?predecessor_name .
FILTER (langMatches(lang(?name), "EN") && langMatches(lang(?successor_name), "EN") && langMatches(lang(?predecessor_name), "EN"))
}

More practice

 Find the name of the Russian leader who was on the throne right before Catherine I ("Catherine I of Russia"@en)

Can you find other ways to do the same task?

More practice A

SELECT ?name
WHERE{
   ?e dbpprop:title dbpedia:List_of_Russian_rulers .
   ?e rdfs:label ?name .
   ?e dbpedia-owl:successor ?successor  .
   ?successor rdfs:label "Catherine I of Russia"@en
}

or:

SELECT (?name AS ?leader)
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers .
   ?e rdfs:label ?name .
   ?e dbpedia-owl:successor ?successor .
   ?successor  rdfs:label ?successor_name .
   FILTER (?successor_name = "Catherine I of Russia"@en)
}

OPTIONALs

Look at the page: http://dbpedia.org/page/Dmitry_of_Suzdal

Why is this leader not in the results of the previous queries?

Fix the problem.

OPTIONALs A

SELECT ?e ?name ?predecessor_name ?successor_name
WHERE {?e dbpprop:title dbpedia:List_of_Russian_rulers .
   ?e rdfs:label ?name .
   FILTER (langMatches(lang(?name), "EN")) .
   OPTIONAL {?e dbpedia-owl:successor ?successor .
     ?successor rdfs:label ?successor_name .
     FILTER (langMatches(lang(?successor_name), "EN"))  .
   }  .
   OPTIONAL {?e dbpprop:predecessor ?predecessor .
    ?predecessor rdfs:label ?predecessor_name .
    FILTER (langMatches(lang(?predecessor_name), "EN")) .
   }
}

UNIONs

Look at the http://dbpedia.org/page/Dmitry_of_Suzdal page more carefully. What can you say about successor and predecessor of the leader?

Fix the problem.

UNIONs A

SELECT ?e ?name ?predecessor_name ?successor_name
WHERE {?e dbpprop:title dbpedia:List_of_Russian_rulers .
   ?e rdfs:label ?name .
   FILTER (langMatches(lang(?name), "EN")) .
   OPTIONAL {{?e dbpedia-owl:successor ?successor} UNION  { ?e dbpprop:after ?successor } .
     ?successor rdfs:label ?successor_name .
     FILTER (langMatches(lang(?successor_name), "EN"))  .
   }  .
   OPTIONAL {{?e dbpprop:predecessor ?predecessor} UNION  { ?e dbpprop:before ?predecessor } .
    ?predecessor rdfs:label ?predecessor_name .
    FILTER (langMatches(lang(?predecessor_name), "EN")) .
   }
}

Final task

Show the list of actors who played together with Julia Roberts. For each result, also show the name of the movie and the director. Order the results by director and then by movie.

Final task A


SELECT ?director_name ?movie_name ?actor_name
WHERE {
   ?movie dbpedia-owl:starring <http://dbpedia.org/resource/Julia_Roberts> .
   ?movie dbpedia-owl:starring ?actor .
   ?movie rdfs:label ?movie_name .
   ?actor rdfs:label ?actor_name .
   ?movie dbpedia-owl:director ?director .
   ?director rdfs:label ?director_name .
   FILTER (?actor != <http://dbpedia.org/resource/Julia_Roberts>) .
   FILTER (langMatches(lang(?movie_name), "EN") && langMatches(lang(?actor_name), "EN") && langMatches(lang(?director_name), "EN")) .
}
ORDER BY ?director ?movie

Aggregate Functions

Aggregate functions similar to SQL were introduced with SPARQL 1.1

The most important ones: MIN, MAX, AVG, SUM, COUNT

GROUP BY groups the results; it is necessary when variables that are not aggregated are projected together with aggregates


Aggregate Functions - Example

:John :age 32 .
:John :gender :male .
:Tim :age 20.
:Tim :gender :male.
:Jane :gender :female.

:Jane :age 23.

SELECT (AVG(?age) AS ?avgAge) WHERE {?person :age ?age}

-->    25

SELECT ?gender (MIN(?age) AS ?minAge) WHERE {?person :age ?age. ?person :gender ?gender} GROUP BY ?gender

-->    :male  20
-->    :female 23
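Grouping partitions the solutions by the value of the group variable and then applies the aggregate to each partition. Without GROUP BY, all solutions form one implicit group. A minimal sketch over the example data, with solutions written as dicts:

```python
from collections import defaultdict


def aggregate(solutions, value_var, fn, group_var=None):
    """GROUP BY group_var (or one implicit group) and apply fn to the bound values."""
    groups = defaultdict(list)
    for row in solutions:
        if value_var in row:
            key = row.get(group_var) if group_var else None
            groups[key].append(row[value_var])
    return {key: fn(values) for key, values in groups.items()}


def avg(values):
    return sum(values) / len(values)


rows = [{"?person": ":John", "?age": 32, "?gender": ":male"},
        {"?person": ":Tim", "?age": 20, "?gender": ":male"},
        {"?person": ":Jane", "?age": 23, "?gender": ":female"}]
print(aggregate(rows, "?age", avg))                       # {None: 25.0}
print(aggregate(rows, "?age", min, group_var="?gender"))  # {':male': 20, ':female': 23}
```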

Other query types

CONSTRUCT

--> creates a graph by binding variables in a template

ASK

--> returns a boolean value indicating whether the pattern could be matched

DESCRIBE

--> returns a description of the given resources; its content depends on the endpoint


Other Query Types Examples

:John foaf:knows :Tim .
:John foaf:name "John" .
:John foaf:phone "+123456" .
:Tim foaf:knows :John .
:Tim foaf:name "Tim" .

CONSTRUCT {?person foaf:name ?name. ?person foaf:phone ?phone}
                   {?person foaf:name ?name. ?person foaf:phone ?phone}


--> :John foaf:name "John" .
--> :John foaf:phone "+123456" .


Other Query Types Examples

:John foaf:knows :Tim .
:John foaf:name "John" .
:John foaf:phone "+123456" .
:Tim foaf:knows :John .
:Tim foaf:name "Tim" .

DESCRIBE ?person WHERE {?person foaf:name ?name. ?person foaf:phone ?phone}
     
--> :John foaf:name "John" . 
--> :John foaf:phone "+123456" .


Other Query Types Examples

:John foaf:knows :Tim .
:John foaf:name "John" .
:John foaf:phone "+123456" .
:Tim foaf:knows :John .
:Tim foaf:name "Tim" .

ASK         {?person foaf:name ?name. ?person foaf:phone ?phone}
-->true


Named Graphs

Allow more control over which graph a triple comes from

SELECT *
FROM NAMED <http://mygraph.example/>
WHERE{ GRAPH ?g {?s ?p ?o}}

--> <http://mygraph.example/> <s> <p> <o>

Named Graphs - Example

SELECT ?g ?o ?p2 ?o2
FROM <http://www.w3.org/People/Berners-Lee/card.rdf>
FROM NAMED <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf>
{ ?s ?p ?o.
GRAPH ?g {?o ?p2 ?o2}  }

--> A huge list of triples

Subquery with from clause

Subqueries on the cheap:

  1. Write the query which you want to use as basis as a CONSTRUCT Query
  2. URL encode it
  3. Create the other query
  4. Put the endpoint for the first + the encoded query in the FROM clause of the other query.


--> SELECT * FROM <http://dbpedia.org/sparql?query=SELECT%20....>
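Steps 2 to 4 can be scripted with the standard library. The CONSTRUCT query below is a hypothetical example. Whether an endpoint actually fetches such a URL given in FROM depends on the engine.

```python
from urllib.parse import quote, unquote


def from_clause_source(endpoint, construct_query):
    """Percent-encode the CONSTRUCT query and append it to the endpoint URL,
    so the result can serve as a graph IRI in a FROM clause."""
    return endpoint + "?query=" + quote(construct_query, safe="")


inner = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 5"   # hypothetical step 1
source = from_clause_source("http://dbpedia.org/sparql", inner)
outer = f"SELECT * FROM <{source}> WHERE {{ ?s ?p ?o }}"       # steps 3 and 4
print(outer)
print(unquote(source.split("?query=", 1)[1]) == inner)  # True
```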



Negation

Question: How can we find all contacts that do NOT have a phone number?


Use a combination of OPTIONAL, ! (not) and bound!

Negation Example

:John foaf:knows :Tim .
:John foaf:name "John" .
:John foaf:phone "+123456" .
:Tim foaf:knows :John .

:Tim foaf:name "Tim" .

SELECT ?name ?phone {
              ?person foaf:name ?name.
               OPTIONAL {?person foaf:phone ?phone}
               FILTER (!bound(?phone))}

--> Tim
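The pattern works in two stages. OPTIONAL leaves ?phone unbound where no phone exists, and !bound(?phone) then keeps exactly those rows. The sketch below assumes, as a simplification of our own, at most one phone per person (dict input).

```python
def names_without_phone(names, phones):
    """{?person foaf:name ?name OPTIONAL {?person foaf:phone ?phone} FILTER (!bound(?phone))}"""
    rows = []
    for person, name in names.items():
        row = {"?person": person, "?name": name}
        if person in phones:            # the OPTIONAL part matched
            row["?phone"] = phones[person]
        rows.append(row)
    return [row["?name"] for row in rows if "?phone" not in row]  # !bound(?phone)


print(names_without_phone({":John": "John", ":Tim": "Tim"}, {":John": "+123456"}))  # ['Tim']
```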

Query Federation

The SERVICE keyword allows federation between multiple SPARQL endpoints

The endpoint that receives the query forwards the SERVICE parts to the remote endpoints and combines the results

Reasoning and SPARQL

Reasoning is not a SPARQL feature

Some reasoning can be simulated with SPARQL

Missing direct associations with parent classes can be queried with patterns like

{?sub rdfs:subClassOf ?parent .
?subsub rdfs:subClassOf ?sub...}

Property Paths

Either syntactic sugar:

   ?person foaf:knows/foaf:name ?name ==

   ?person  foaf:knows ?friend. ?friend foaf:name ?name 

Or explorative:

  ?x foaf:knows+/foaf:name ?name .

The SPARQL 1.1 Recommendation has further helpful examples.
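The `+` operator asks for every node reachable through one or more edges, which is a graph search rather than a fixed number of joins. The sketch below evaluates both paths from the slide over the earlier example data, in an in-memory encoding of our own. Breadth-first search reports each reachable node once, so the cycle John → Tim → John is harmless.

```python
from collections import deque


def one_or_more(graph, start, predicate):
    """Evaluate `start predicate+ ?x`: nodes reachable via one or more edges."""
    reached, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in graph:
            if s == node and p == predicate and o not in reached:
                reached.add(o)
                queue.append(o)
    return reached


def objects(graph, subject, predicate):
    return {o for s, p, o in graph if s == subject and p == predicate}


graph = [(":John", "foaf:knows", ":Tim"), (":John", "foaf:name", "John"),
         (":Tim", "foaf:knows", ":John"), (":Tim", "foaf:name", "Tim")]

# :John foaf:knows/foaf:name ?name  (sequence path)
print({n for f in objects(graph, ":John", "foaf:knows") for n in objects(graph, f, "foaf:name")})  # {'Tim'}
# :John foaf:knows+/foaf:name ?name  (John reaches himself through Tim)
print(sorted(n for f in one_or_more(graph, ":John", "foaf:knows")
             for n in objects(graph, f, "foaf:name")))  # ['John', 'Tim']
```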






Queries and Algebra

  • SPARQL queries are compiled into algebraic expressions for evaluation
  • SPARQL queries with identical result sets can perform differently, depending on how well the query can be optimized.

Examples:

select * {?s ?p ?o. FILTER (?p = foaf:name && ?o = "Angela Merkel"@en) }

select * {?s foaf:name "Angela Merkel"@en }


Algebra

PREFIX  foaf: <http://xmlns.com/foaf/spec/>

SELECT  *
WHERE
  { ?s foaf:name "Angela Merkel"@en }
compiles into
(base <http://example/base/>
  (prefix ((foaf: <http://xmlns.com/foaf/spec/>))
    (bgp (triple ?s foaf:name "Angela Merkel"@en))))



Algebra

PREFIX  foaf: <http://xmlns.com/foaf/spec/>

SELECT  *
WHERE
  { ?s ?p ?o
    FILTER ( ( ?p = foaf:name ) && ( ?o = "Angela Merkel"@en ) ) }

compiles into 

(base <http://example/base/>
  (prefix ((foaf: <http://xmlns.com/foaf/spec/>))
    (filter (&& (= ?p foaf:name) (= ?o "Angela Merkel"@en))
      (bgp (triple ?s ?p ?o)))))

Evaluation

  • SPARQL queries are recursively evaluated, starting from the triple patterns (leaf nodes)
  • Intermediate result sets are built

Usage of indexes for:

  • Resources
  • Literals

But not for:

  • regex
  • aggregate functions

Instead consider bif:contains (bif = built-in function, in some engines)

Also consider pushing filters as deep into queries as possible.

Not Bound

Check your final task from the basic SPARQL tutorial: try to find the movies starring Julia Roberts for which there is no information about the director.

Not Bound A

SELECT ?movie_label WHERE { ?movie dbpedia-owl:starring <http://dbpedia.org/resource/Julia_Roberts> . ?movie rdfs:label ?movie_label . OPTIONAL {?movie dbpedia-owl:director ?director} . FILTER (langMatches(lang(?movie_label), "EN") && !bound(?director)) }

Aggregation

Collect statistics about Russia: find the population, the total number of cities, the number of cities with a population of more than 1 million, and the average population of cities.

Aggregation A1


SELECT ?population count(?city) WHERE { <http://dbpedia.org/resource/Russia> dbpprop:populationEstimate ?population . ?city a dbpedia-owl:PopulatedPlace . ?city dbpedia-owl:country <http://dbpedia.org/resource/Russia> . }

Aggregation A2


SELECT count(?city) WHERE { ?city a dbpedia-owl:PopulatedPlace . ?city dbpedia-owl:country <http://dbpedia.org/resource/Russia> . ?city dbpprop:pop2002census ?city_population . FILTER (?city_population > 1000000) }

Aggregation A3


SELECT AVG(?population) WHERE { ?city a dbpedia-owl:PopulatedPlace . ?city dbpedia-owl:country <http://dbpedia.org/resource/Russia> . ?city dbpprop:pop2002census ?population . }

AS

Change the previous queries so that the columns of the result table have meaningful titles.

AS A


SELECT ?population (COUNT(?city) AS ?number_of_cities) WHERE { <http://dbpedia.org/resource/Russia> dbpprop:populationEstimate ?population . ?city a dbpedia-owl:PopulatedPlace . ?city dbpedia-owl:country <http://dbpedia.org/resource/Russia> . } GROUP BY ?population

MINUS

Exclude Novosibirsk when computing the average population of Russian cities.

MINUS A


SELECT AVG(?population) WHERE { ?city a dbpedia-owl:PopulatedPlace . ?city dbpedia-owl:country <http://dbpedia.org/resource/Russia> . ?city dbpprop:pop2002census ?population . MINUS { ?city dbpprop:pop2002census ?population . FILTER (?city = <http://dbpedia.org/resource/Novosibirsk>) } }
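MINUS removes a solution only if it is compatible with some MINUS solution on the variables they share. A minimal Python sketch of that rule with hypothetical cities and numbers shows why it matters which variables the MINUS block binds: if it shares only ?population, every city whose population value equals Novosibirsk's is removed; if it also binds ?city, only Novosibirsk is removed.

```python
def minus(left, right):
    """SPARQL MINUS: drop a left mapping if some right mapping shares >= 1 variable and agrees on all shared ones."""
    def removes(l, r):
        shared = l.keys() & r.keys()
        return bool(shared) and all(l[v] == r[v] for v in shared)
    return [l for l in left if not any(removes(l, r) for r in right)]

# Hypothetical solutions of the main pattern.
cities = [
    {"city": "ex:Novosibirsk", "population": 1000},
    {"city": "ex:CityA", "population": 500},
    {"city": "ex:CityB", "population": 1000},  # same figure as Novosibirsk
]
by_population = [{"population": 1000}]                      # MINUS sharing only ?population
by_city = [{"city": "ex:Novosibirsk", "population": 1000}]  # MINUS that also binds ?city

def avg_population(solutions):
    return sum(m["population"] for m in solutions) / len(solutions)
```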

Retrieving the information

Show the information about Moscow: show all triples in which Moscow is either the subject or the object.

Retrieving the information A


SELECT ?s ?p ?o WHERE { { ?s ?p ?o. filter (?s = <http://dbpedia.org/resource/Moscow>) } UNION { ?s ?p ?o. filter (?o = <http://dbpedia.org/resource/Moscow>) } }

Searching for commonalities

 Find what Mikhail Gorbachev and Ivan the Terrible have in common.

Searching for commonalities A


SELECT ?p ?c ?o WHERE { <http://dbpedia.org/resource/Ivan_the_Terrible> ?p ?o . <http://dbpedia.org/resource/Mikhail_Gorbachev> ?c ?o }

Use of RelFinder

 Do the same task in RelFinder

Use of Hanne

 Find the best way to get the list of Russian football clubs using Hanne

How to connect

 Download the file

Where to write the query

Find this part:

You can enter any query you want.

How to show the results

 Fill the table according to the structure of the JSON result object
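A minimal Python sketch of the same two steps (building the request, then filling a table from the JSON result), assuming the endpoint follows the SPARQL 1.1 Protocol (query passed in the `query` parameter) and answers in the standard SPARQL 1.1 Query Results JSON Format. The helper names are hypothetical, and no network call is made here.

```python
import json
from urllib.parse import urlencode

def build_query_url(endpoint, query):
    """GET URL for a SPARQL endpoint: the query text goes into the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

def results_to_rows(json_text):
    """Header (head.vars) plus one row per entry of results.bindings."""
    data = json.loads(json_text)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Unbound variables are absent from a binding, so show an empty cell.
        rows.append([binding.get(v, {}).get("value", "") for v in variables])
    return variables, rows

SAMPLE = ('{"head": {"vars": ["city", "population"]}, "results": {"bindings": '
          '[{"city": {"type": "uri", "value": "http://dbpedia.org/resource/Moscow"}}]}}')
```

Each row lines up with the header, so filling an HTML or console table is a simple loop over `rows`.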

SPARQL Endpoints

http://www.sparql.org/

http://dbpedia.org/sparql

http://www.w3.org/wiki/SparqlEndpoints

SPARQL enabled triple stores

Virtuoso Open Source   ---  http://virtuoso.openlinksw.com/

Jena & Fuseki --- http://jena.apache.org/

Sesame --- http://www.openrdf.org/

Local queries with ARQ

How to query RDF datasets in local files:

  1. Download Jena
  2. Learn about the SPARQL features that ARQ supports
  3. Use the arq.query command-line tool (see its documentation)
  4. java -cp <jena>/lib/commons-codec-1.6.jar:…:<jena>/lib/xml-apis-1.4.01.jar arq.query --data=file.rdf --query=file.sparql
    • Wrap this in a shell script or alias to save time!
    • The Java class path must contain all *.jar files of Jena
    • On Windows, use ; instead of : as the separator
    • The data syntax is guessed from the filename extension; here the data file is RDF/XML with the *.rdf extension
    • trick in Unix-style shells (e.g. bash): instead of --query=file.sparql use --query=<(echo "SELECT ...") (“process substitution”)

ARQ: Output

Example of running ARQ (see previous slide for full command line):

$ ... arq.query --data=test.rdf --query=<(echo "SELECT DISTINCT ?class WHERE {?s a ?class}")
----------------------------------------
| class                                |
========================================
| <http://xmlns.com/foaf/0.1/Person>   |
| <http://xmlns.com/foaf/0.1/Document> |
----------------------------------------

Additional Tools

YASGUI – a user-friendly web GUI to query a given SPARQL endpoint, with syntax highlighting.

FedX --- http://www.fluidops.com/fedx/

Further Learning Resources

  • SPARQL Trainer (http://aksw.org/projects/sparqltrainer)
  • Learning SPARQL, Bob DuCharme, O'Reilly (2011)
  • Semantic Web for the Working Ontologist, Dean Allemang and James Hendler, Morgan Kaufmann (2011)  
  • SPARQL by example, http://www.cambridgesemantics.com/semantic-university/sparql-by-example

Additional Topics

GeoSPARQL

Task: Display monuments within 30 km of a given location on a map.
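GeoSPARQL expresses such a radius test with its distance function (geof:distance); engine support varies. As a hedged illustration of the underlying computation only, this Python sketch applies the haversine formula on a spherical Earth (mean radius 6371 km, an approximation) to hypothetical points.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within(center, places, radius_km=30.0):
    """Names of places (name, lat, lon) no farther than radius_km from center."""
    lat0, lon0 = center
    return [name for name, lat, lon in places
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]
```

One degree of latitude comes out at about 111.2 km, so a point 0.1 degrees away (about 11 km) passes the 30 km test and one 1 degree away does not.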

SPARQL Update

Task: Create a graph with some personal information about you.



The end!

 

How to install Fuseki and use it


Source: http://jena.apache.org/documentation/serving_data/

Define your own knowledge base, load it into Fuseki and query it

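This is the knowledge base, written in Turtle (ex: is a placeholder namespace; use one of your own):

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .

ex:John rdf:type foaf:Person .
ex:John foaf:name "John" .
ex:Tim rdf:type foaf:Person .
ex:Tim foaf:name "Tim" .
ex:Jane rdf:type foaf:Person .
ex:Jane rdfs:label "Jane" .
ex:Jane foaf:name "Jane" .

and this is the query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name WHERE { ?person a foaf:Person .
           { ?person foaf:name ?name } UNION { ?person rdfs:label ?name } }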


  • You can also try more queries inspired by the lecture.
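Before loading the data into Fuseki it helps to predict the result. This minimal Python sketch evaluates the UNION query over the seven triples (prefixed names as plain strings, an assumption for brevity). It shows that Jane is returned twice, once per UNION branch, because SPARQL results are multisets; SELECT DISTINCT ?name would remove the duplicate.

```python
KB = [
    ("ex:John", "rdf:type", "foaf:Person"), ("ex:John", "foaf:name", "John"),
    ("ex:Tim", "rdf:type", "foaf:Person"), ("ex:Tim", "foaf:name", "Tim"),
    ("ex:Jane", "rdf:type", "foaf:Person"), ("ex:Jane", "rdfs:label", "Jane"),
    ("ex:Jane", "foaf:name", "Jane"),
]

def names(kb):
    """?person a foaf:Person . { ?person foaf:name ?name } UNION { ?person rdfs:label ?name }"""
    persons = [s for s, p, o in kb if p == "rdf:type" and o == "foaf:Person"]
    result = []
    for person in persons:
        # Each UNION branch contributes its own solutions; duplicates are kept.
        result += [o for s, p, o in kb if s == person and p == "foaf:name"]
        result += [o for s, p, o in kb if s == person and p == "rdfs:label"]
    return result
```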

Linked (Open) Data

Linked Data is data on the Web that satisfies certain basic principles (see next slide).

Linked Data is often Open Data (i.e. freely available under an open license), and is then called Linked Open Data (LOD).
However, the Linked Data principles also work for “closed” data, e.g. in enterprise intranets.

In this lecture you will learn

  • the linked data principles
  • basic architecture principles for the Web
  • the 5-star scheme for open data

Linked Data Principles

Tim Berners-Lee outlined four principles of Linked Data in his Design Issues: Linked Data note, paraphrased along the following lines:

  • Use URIs to identify things that you expose to the Web as resources.
  • Use HTTP URIs so that people can locate and look up (dereference) these things.
  • Provide useful information about the resource when its URI is dereferenced.
  • Include links to other, related URIs in the exposed data as a means of improving information discovery on the Web.
    Web Architecture Principles (I): Resources

    • Resources identify the items of interest in our domain - things whose properties and relationships we want to describe in the data
    • W3C Technical Architecture Group (TAG) distinguishes two kinds of resources:
      • Information resources: All the resources we find on the traditional document Web, such as documents, images, and other media files, are information resources.
      • Non-information resources (also called 'other resources'): people, physical products, places, proteins, scientific concepts, etc. As a rule of thumb, all “real-world objects” that exist outside of the Web are non-information resources.

    Web Architecture Principles (II): Resource Identifiers

    • Resources are identified using Uniform Resource Identifiers (URIs) .
    • For Linked Data, we use HTTP URIs only (avoid other URI schemes such as URNs and DOIs)
    • HTTP URIs make good names for two reasons:
      • They provide a simple way to create globally unique names without centralized management;
      • URIs work not just as a name but also as a means of accessing information about a resource over the Web.
    • The preference for HTTP over other URI schemes is discussed at length in the W3C TAG draft finding URNs, Namespaces and Registries.

    Web Architecture Principles (III): Representations

    • Information resources can have representations.
    • A representation is a stream of bytes in a certain format, such as HTML, RDF/XML, or JPEG.
    • For example, an invoice is an information resource. It could be represented as an HTML page, as a printable PDF document, or as an RDF document.
    • A single information resource can have many different representations, e.g. in different formats, resolution qualities, or natural languages.

    Web Architecture Principles (IV): Dereferencing HTTP URIs

    URI Dereferencing = looking up a URI on the Web to get information about the referenced resource. W3C TAG distinguishes two different types of URIs:

    • Information Resources : When a URI identifying an information resource is dereferenced, the server usually generates a new representation, a new snapshot of the information resource's current state, and sends it back to the client using the HTTP response code 200 OK.
    • Non-Information Resources cannot be dereferenced directly.
      1. Instead of sending a representation of the resource, the server redirects the client to the URI of an information resource which describes the non-information resource, using the HTTP response code 303 See Other.
      2. The client dereferences this new URI and gets a representation describing the original non-information resource.

    Data publishers have two ways of providing clients with URIs of information resources describing non-information resources: Hash URIs and 303 redirects.
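The hash URI approach works because the fragment (the part after #) is never sent to the server: the client requests the document URI and then looks up the fragment inside the returned description. A minimal Python sketch using the standard library's `urldefrag`; the URI is hypothetical.

```python
from urllib.parse import urldefrag

def request_target(uri):
    """URI the client actually requests: everything before the '#'."""
    return urldefrag(uri).url

def local_name(uri):
    """Fragment that identifies the non-information resource inside that document."""
    return urldefrag(uri).fragment
```

For `http://example.org/about#alice`, the client fetches `http://example.org/about` (an information resource) and reads what it says about `#alice`; no 303 redirect is needed.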

    Web Architecture Principles (V): HTTP requests

    1. The client performs an HTTP GET request on a URI identifying a non-information resource, in this example a vocabulary URI. If the client is a Linked Data browser and would prefer an RDF/XML representation of the resource, it sends an Accept: application/rdf+xml header along with the request. HTML browsers would send an Accept: text/html header instead.
    2. The server recognizes that the URI identifies a non-information resource. As the server cannot return a representation of this resource, it answers with the HTTP 303 See Other response code and sends the client (in the Location header) the URI of an information resource describing the non-information resource; for an RDF client, this is the location of the RDF content.
    3. The client now asks the server to GET a representation of this information resource, again requesting application/rdf+xml.
    4. The server sends the client an RDF/XML document containing a description of the original resource (the vocabulary URI).
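The server side of steps 1 to 4 can be sketched as one Python function. This assumes a DBpedia-style URI layout (/resource/, /page/, /data/) and a deliberately naive Accept check that ignores q-values; it is an illustration, not a production content-negotiation routine.

```python
def dereference(path, accept):
    """Return (status, location) for a GET on path with the given Accept header."""
    if path.startswith("/resource/"):
        name = path[len("/resource/"):]
        # Non-information resource: redirect to a document describing it.
        if "application/rdf+xml" in accept:
            return 303, "/data/" + name
        return 303, "/page/" + name
    if path.startswith("/data/") or path.startswith("/page/"):
        # Information resources: send a representation directly.
        return 200, None
    return 404, None
```

An RDF client asking for /resource/Berlin is sent to /data/Berlin, an HTML browser to /page/Berlin, and the follow-up GET on either document returns 200 OK.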

    URI Aliases

    • Web = open environment => different information providers talk about the same non-information resource, e.g. a geographic location or a famous person => different URIs identify the same real-world object:
    • DBpedia: http://dbpedia.org/resource/Berlin and Geonames: http://sws.geonames.org/2950159/ both refer to the same non-information resource (the city of Berlin) => URI aliases.
    • URI aliases:
      • are common on the Web of Data, as it cannot realistically be expected that all information providers agree on the same URI to identify a non-information resource.
      • provide an important social function, since they allow different views and opinions to be expressed.
      • it is common practice for information providers to set owl:sameAs links to URI aliases they know about.
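One way a data consumer can use owl:sameAs links is to group aliases and rewrite all of them to a single representative, so that statements about the same real-world object merge. A minimal union-find sketch in Python; choosing the lexicographically smallest URI as representative is an arbitrary, hypothetical convention, and http://example.org/berlin is an invented third alias.

```python
def canonical_map(same_as_links):
    """Map every URI occurring in owl:sameAs links to one representative per alias group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            if rb < ra:  # keep the smaller URI as root, so results are deterministic
                ra, rb = rb, ra
            parent[rb] = ra
    return {x: find(x) for x in parent}

def rewrite(triples, mapping):
    """Replace aliases by their representative in subject and object positions."""
    return {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in triples}

LINKS = [
    ("http://dbpedia.org/resource/Berlin", "http://sws.geonames.org/2950159/"),
    ("http://sws.geonames.org/2950159/", "http://example.org/berlin"),  # hypothetical alias
]
```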

    LOD uses RDF Data Model

    Benefits of using the RDF Data Model in the Linked Data Context

    • Clients can look up every URI in an RDF graph over the Web to retrieve additional information.
    • Information from different sources merges naturally.
    • The data model enables you to set RDF links between data from different sources.
    • The data model allows you to represent information that is expressed using different schemas in a single model.
    • Combined with schema languages such as RDF Schema or OWL, the data model allows you to use as much or as little structure as you need, meaning that you can represent tightly structured data as well as semi-structured data.

    RDF Features Best Avoided in the Linked Data Context

    • Blank nodes: it is impossible to set external RDF links to a blank node, and merging data from different sources becomes more difficult => use URI references.
    • RDF reification: the semantics of reification is unclear, and reified statements are cumbersome to query with SPARQL. Metadata can be attached to the information resource instead.
    • RDF collections and RDF containers do not work well together with SPARQL. Consider whether the information can be expressed using multiple triples with the same predicate instead; this makes SPARQL queries straightforward.

    Choosing URIs

    • Use HTTP URIs for everything. The http:// scheme is the only URI scheme that is widely supported in tools and infrastructure.
    • Define your URIs in an HTTP namespace under your control, where you actually can make them dereferenceable.
    • Keep implementation cruft out of your URIs. Short, mnemonic names are better. Consider:
      • http://dbpedia.org/resource/Berlin
      • http://www4.wiwiss.fu-berlin.de:2020/demos/dbpedia/cgi-bin/resources.php?id=Berlin
    • Keep URIs stable and persistent. Changing URIs will break any already-established links.
    • URIs are constrained by the technical environment: if your server is called demo.serverpool.wiwiss.example.org, your URIs will have to begin with http://demo.serverpool.wiwiss.example.org/. If the server does not run on port 80, URIs begin with e.g. http://demo.serverpool.wiwiss.example.org:2020/. Clean up URIs using URI rewriting rules.
    • Often three URIs related to a single non-information resource:
      • an identifier for the resource,
      • an identifier for a related information resource suitable to HTML browsers (with a web page representation),
      • an identifier for a related information resource suitable to RDF browsers (with an RDF/XML representation).
    • Several ideas for choosing these related URIs:
      • resource: http://dbpedia.org/resource/Berlin, http://id.dbpedia.org/Berlin or http://dbpedia.org/Berlin
      • HTML page: http://dbpedia.org/page/Berlin, http://pages.dbpedia.org/Berlin or http://dbpedia.org/Berlin.html
      • RDF data: http://dbpedia.org/data/Berlin, http://data.dbpedia.org/Berlin or http://dbpedia.org/Berlin.rdf
    • Often some kind of primary key is required inside URIs, to make sure that each one is unique. Use a key that is meaningful inside your domain. E.g., when dealing with books, using the ISBN number is better than using the primary key of an internal database table.
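Putting the last few points together, a minimal Python sketch that mints the three related URIs for a book from its ISBN, following the DBpedia-style /resource/, /page/ and /data/ layout. The base namespace http://books.example.org is hypothetical, and the ISBN is the common example value 978-3-16-148410-0.

```python
from urllib.parse import quote

BASE = "http://books.example.org"  # hypothetical namespace under the publisher's control

def mint_uris(isbn):
    """Resource, HTML page and RDF data URIs for a book, keyed by its ISBN."""
    key = quote(isbn.replace("-", ""), safe="")  # domain key, no internal database IDs
    return {
        "resource": f"{BASE}/resource/{key}",
        "page": f"{BASE}/page/{key}",
        "data": f"{BASE}/data/{key}",
    }
```

The ISBN is stable and meaningful outside the publisher's database, so these URIs stay valid even if the storage layer changes.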

    Vocabularies

    • Reusing existing terms
    • Well-known vocabularies have evolved in the Semantic Web community:
    • Friend-of-a-Friend (FOAF), vocabulary for describing people.
    • Dublin Core (DC) defines general metadata attributes. See also their new domains and ranges draft.
    • Semantically-Interlinked Online Communities (SIOC), vocabulary for representing online communities.
    • Description of a Project (DOAP), vocabulary for describing projects.
    • Simple Knowledge Organization System (SKOS), vocabulary for representing taxonomies and loosely structured knowledge.
    • Music Ontology provides terms for describing artists, albums and tracks.
    • Review Vocabulary, vocabulary for representing reviews.
    • Creative Commons (CC), vocabulary for describing license terms.

    What should be returned as RDF description for a URI?

    • The description: all triples from the dataset that have the resource's URI as the subject (immediate description of the resource)
    • Backlinks: all triples from the dataset that have the resource's URI as the object (allows browsers and crawlers to traverse links in either direction).
    • Related descriptions: additional information about related resources that may be of interest in typical usage scenarios. E.g., send information about the author along with information about a book, because clients interested in the book may also be interested in the author. Moderation is recommended: returning 1 MB of RDF would be considered excessive.
    • Metadata: any metadata, such as a URI identifying the author and licensing information. These should be recorded as RDF descriptions of the information resource that describes a non-information resource; i.e., the subject of the RDF triples should be the URI of the information resource. Attaching meta-information to that information resource, rather than attaching it to the described resource itself or to specific RDF statements about the resource (as with reification) plays nicely together with using Named Graphs and the SPARQL query language in Linked Data client applications. Each RDF document should contain a license under which the content can be used (e.g. Creative Commons).
    • Serialization: The data source should at least provide RDF/XML (the official syntax for RDF), and additionally Turtle (more readable) when asked for the MIME type text/turtle. In situations where people might want to use the data together with XML technologies (XSLT or XQuery), also serve a TriX serialization (works better with these technologies than RDF/XML).
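Selecting the description, backlinks and metadata can be sketched in a few lines of Python. The helper `describe` and the abbreviated identifiers (dbr:, dbrdata:) are hypothetical; the triples loosely mirror the Alec Empire example, and related descriptions are omitted.

```python
def describe(uri, doc_uri, triples):
    """Triples to return when uri is dereferenced (hypothetical helper).

    description: uri as subject; backlinks: uri as object;
    metadata: the describing document doc_uri as subject."""
    description = [t for t in triples if t[0] == uri]
    backlinks = [t for t in triples if t[2] == uri and t[0] != uri]
    metadata = [t for t in triples if t[0] == doc_uri]
    return description + backlinks + metadata

TRIPLES = [
    ("dbr:Alec_Empire", "foaf:name", "Empire, Alec"),
    ("dbr:60_Second_Wipeout", "dbpedia:producer", "dbr:Alec_Empire"),
    ("dbrdata:Alec_Empire", "dc:rights", "wp:GFDL"),
    ("dbr:Techno", "rdfs:label", "Techno"),  # unrelated, not returned
]
```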

    Example: Returning RDF from a URI

    Metadata and Licensing Information

    <http://dbpedia.org/data/Alec_Empire>
        rdfs:label "RDF description of Alec Empire" ;
        rdf:type foaf:Document ;
        dc:publisher <http://dbpedia.org/resource/DBpedia> ;
        dc:date "2007-07-13"^^xsd:date ;
        dc:rights <http://en.wikipedia.org/wiki/WP:GFDL> .

    The description

    <http://dbpedia.org/resource/Alec_Empire>
        foaf:name "Empire, Alec" ;
        rdf:type foaf:Person, <http://dbpedia.org/class/yago/Musician> ;
        rdfs:comment "Alec Empire (born May 2, 1972) is a German musician who is ..."@en ;
        rdfs:comment "Alec Empire (eigentlich Alexander Wilke) ist ein deutscher Musiker. ..."@de ;
        dbpedia:genre <http://dbpedia.org/resource/Techno> ;
        dbpedia:associatedActs <http://dbpedia.org/resource/Atari_Teenage_Riot> ;
        foaf:page <http://en.wikipedia.org/wiki/Alec_Empire>, <http://dbpedia.org/page/Alec_Empire> ;
        rdfs:isDefinedBy <http://dbpedia.org/data/Alec_Empire> ;
        owl:sameAs <http://zitgist.com/music/artist/d71ba53b-23b0-4870-a429-cce6f345763b> .

    Backlinks

    <http://dbpedia.org/resource/60_Second_Wipeout> dbpedia:producer <http://dbpedia.org/resource/Alec_Empire> .
    <http://dbpedia.org/resource/Limited_Editions_1990-1994> dbpedia:artist <http://dbpedia.org/resource/Alec_Empire> .

    5 star Open Data

    Learning Objectives

    • Relation between Relational Databases and RDF.
    • Basic understanding of mapping principles.

    Classic web deployment

    • Shared access to the data.
    • Exposes data as webpages for human consumption.

    Triplification by Materialization

    • Direct access on the data, users can create their own queries.
    • Linked Data allows other applications to consume the data.
    • Drawback: needs an additional server, with its own indexes and memory footprint.

    Triplification by SPARQL-to-SQL-Rewriting

    • All benefits from previous plus:
    • Reduced deployment overhead, small memory footprint
    • Data always up to date

    Mappings for Triplification

    •  Work for both Materialization and SPARQL-to-SQL-Rewriting
    • R2RML is the most prominent RDB to RDF Mapping Language
      • Custom mappings for converting RDB into RDF
      • W3C Recommendation since September 2012
      • Turtle serialization



    R2RML Core Concepts

    Term Map creates RDF terms (IRIs, Literals and Blank Nodes)

    • from a template, or
    • from a column, or
    • from a constant expression.

    Triples Map creates triples

    • from the rows of a table or view,
    • using Term Maps.  

    Referencing Object Map models relations between Triples Maps.
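To make Term Maps and Triples Maps concrete, here is a minimal Python sketch of their core behaviour: a template term map turns a row into a subject IRI (column values are percent-encoded, approximating R2RML's IRI-safe encoding), and a triples map emits an rdf:type triple plus one triple per mapped column, skipping NULLs. The EMP row, the template and the ex: names are hypothetical; this is not an R2RML processor.

```python
import re
from urllib.parse import quote

def expand_template(template, row):
    """Template term map: replace each {COLUMN} by the row's value, percent-encoded."""
    def substitute(match):
        return quote(str(row[match.group(1)]), safe="")
    return re.sub(r"\{([^{}]+)\}", substitute, template)

def triples_map(rows, subject_template, class_iri, predicate_object_columns):
    """Hypothetical triples map: one subject per row, an rdf:type triple, column-valued objects."""
    triples = []
    for row in rows:
        subject = expand_template(subject_template, row)
        triples.append((subject, "rdf:type", class_iri))
        for predicate, column in predicate_object_columns.items():
            if row.get(column) is not None:  # NULL values produce no triple
                triples.append((subject, predicate, row[column]))
    return triples

ROWS = [{"EMPNO": 7369, "ENAME": "SMITH", "JOB": None}]
```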

    Mapping with R2RML

     

    Simple Mapping Executed

     A Sample Database


    Simple Mapping Executed

    For the previous mapping


    Simple Mapping Executed

    Results in


    Outline

    1. Motivation and Definition
    2. Overview of Ontology Learning Approaches
    3. In Detail: Learning Definitions with Refinement Operators
    4. Conclusions

    Definition: Ontology Learning

    • "Ontology Learning is a subtask of information extraction. The goal of ontology learning is to (semi-)automatically extract relevant concepts and relations from a given corpus or other kinds of data sets to form an ontology." (Wikipedia, today)
    • "Ontology Learning is a mechanism for semi-automatically supporting the ontology engineer in engineering ontologies."
      A. D. Mädche. Ontology Learning for the Semantic Web. Dissertation. Universität Karlsruhe, 2001
    • "Ontology Learning aims at the integration of a multitude of disciplines in order to facilitate the construction of ontologies, in particular ontology engineering and machine learning."
      A. D. Mädche, S. Staab. Ontology Learning. Handbook of Ontologies in Information Systems, 2004

    Classification of Ontology Learning Data

    sometimes heterogeneous sources of evidence (e.g., hyponymy [Snow et al. 2006], subsumption [Cimiano et al. 2005], [Manzano-Macho et al. 2008], [Buitelaar et al. 2008], disjointness [Völker et al. 2007])

    Classification of Ontology Learning Data II

    Ontology Learning Layer Cake [Cimiano 2006]

     

    Patterns [Hearst 1992] for Class Subsumption

    • NP such as {NP,}* {or|and} NP
      • games such as baseball and cricket
    • NP {,NP}* {,} {and|or} other NP
      • rabbits and other animals
      • but: "rabbits and other pets"
    • NP {,} including {NP,}* {or|and} NP
      • fruits including apples and pears
    • NP {,} especially {NP,}* {or|and} NP
      • Europeans, especially Italians
      • but: "US presidents, especially democrats"
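A minimal Python sketch of the first Hearst pattern, "NP such as {NP,}* {or|and} NP". It assumes single-word noun phrases, a strong simplification; real systems match chunker output (e.g. GATE's NounPhrase annotations) instead of raw words. As the counterexamples above show, the extracted pairs are only candidates.

```python
import re

# "NP such as NP, NP and NP" with noun phrases approximated by single words.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+, )*\w+(?: (?:and|or) \w+)?)")

def such_as_pairs(sentence):
    """Return (hyponym, hypernym) candidate pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        hyponyms = re.split(r", | and | or ", match.group(2))
        pairs.extend((h, hypernym) for h in hyponyms)
    return pairs
```

For "games such as baseball and cricket" this yields baseball and cricket as candidate subclasses of games.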

    Patterns [Ogata and Collier 2004]

    • NP is a NP
      • "A kangaroo is an animal living in Australia."
    • a NP named|called NP
      • "Japanese people like to play a game called Go."
    • NP, NP
      • "Sencha, the most popular tea in Japan, ..."
    • NP. The NP
      • "John loves his Ferrari. The car ..."
    • Among NP, NP
      • "Among all musical instruments, violins are ..."
    • NP except for|other than NP
      • "Employees except for managers suffer from ..."

    JAPE Rule

    • GATE = General Architecture for Text Engineering
    • written in Java
    • mature, used worldwide
    • JAPE = language for rapid prototyping and efficient implementation of shallow analysis methods
    • can be used e.g. for domain-specific patterns (financial blogs etc.)

    JAPE Rule II

    rule: Hearst_1
    (
      (NounPhrase):superconcept
      {SpaceToken.kind == space} {Token.string=="such"}
      {SpaceToken.kind == space} {Token.string=="as"}
      {SpaceToken.kind == space}
      (NounPhrase):subconcept
    ):hearst1
    -->
    :hearst1.SubclassOfRelation = { rule = "Hearst1" },
    :subconcept.Domain = { rule = "Hearst1" },
    :superconcept.Range = { rule = "Hearst1" }

    Lexical Context Similarity (e.g. [Cimiano and Völker 2005])

    • "Columbus is the capital of the state of Ohio. Columbus has a population of about 700,000 inhabitants."
    • Columbus (capital (1), state (1), Ohio (1), population (1), inhabitant (1) )
    • City (country (2), state (1), inhabitant (2), mayor (1), attraction (1) )
    • Explorer (ship (1), sailor (2), discovery (1) )  

    "most probably": City(Columbus)
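The comparison can be made concrete with cosine similarity between context vectors, one common choice of similarity measure (the cited work may use a different one). This Python sketch uses exactly the word counts listed above.

```python
import math

def cosine(u, v):
    """Cosine similarity of sparse context vectors given as {word: count} dicts."""
    dot = sum(count * v.get(word, 0) for word, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

columbus = {"capital": 1, "state": 1, "Ohio": 1, "population": 1, "inhabitant": 1}
classes = {
    "City": {"country": 2, "state": 1, "inhabitant": 2, "mayor": 1, "attraction": 1},
    "Explorer": {"ship": 1, "sailor": 2, "discovery": 1},
}
best = max(classes, key=lambda c: cosine(columbus, classes[c]))
```

Columbus shares "state" and "inhabitant" with City (similarity 3/√55 ≈ 0.40) and nothing with Explorer (0), so City(Columbus) is the more probable assertion.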

    Subcategorization Frames

    • "Tina drives a Ford."
      • Person(Tina). Vehicle(Ford).
    • "Her father drives a bus."
      • Father subclass-of Person
      • Bus subclass-of Vehicle
    • subcat: drive( subj: person, obj: vehicle )
      • \[Person \sqsubseteq \forall drive.Vehicle \]

    Text2Onto

    Suchanek et al. 2009

    Learning from text and background knowledge via reasoning:

    "Washington is the capital of the US. (...) New York is the US capital of fashion."

    • extracted: hasCapital(US, New York); hasCapital(US, Washington)
    • background knowledge: hasCapital is a functional property
    • possible inferences:
      • New York = Washington
      • inconsistency (unique names assumption)
    • logical contradictions can help to detect errors in automatically extracted information
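A minimal Python sketch of the check in this example: for each property declared functional in the background knowledge, report subjects with more than one distinct value. Under the unique names assumption (distinct names denote distinct entities), each reported group is a contradiction, so at least one of the extracted facts must be wrong. The triple encoding is a hypothetical simplification.

```python
from collections import defaultdict

def functional_violations(triples, functional_properties):
    """Subject/property pairs with more than one distinct value for a functional property."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in functional_properties:
            values[(s, p)].add(o)
    return {key: vals for key, vals in values.items() if len(vals) > 1}

extracted = [("US", "hasCapital", "New_York"), ("US", "hasCapital", "Washington")]
```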

    LeDA

    Other Approaches

    • Association rules and co-occurrence statistics
    • WordNet: hyponymy \(\approx\) subsumption
      • hyponym(bank#1, institution#1)
      • Bank subclass-of Institution
    • Noun phrase heuristics
      • „image processing software“
    • Instance clustering (e.g. Columbus and Washington)
      • Hierarchical clustering of context vectors
    • Knowledge Base Completion / Formal Concept Analysis (FCA)
      • asks knowledge engineer questions to complete a knowledge base
      • tool: OntoComp [Sertkaya et al.]

    Tools and Frameworks

    Table: Lexical ontology learning: informal or semi-formal data (e.g. texts)

    Tools and Frameworks II

    Problems and Challenges

    • Homonymy and polysemy e.g. [Ovchinnikova et al. 2006]
      • "Peter is sitting on the bank in front of the bank."
      • "An interesting book is lying on the table."
    • Semantics of adjectives
      • "red flower", "false friend"
    • Empty heads e.g. [Völker et al. 2005], [Cimiano and Wenderoth 2005]
      • "Tuna is a kind of fish. The Southern Bluefin is one of the most endangered types of Tuna."
    • Ellipsis and underspecification
      • "Mary started the book."
    • Anaphora (e.g. pronouns) e.g. [Cimiano and Völker 2005]
      • "There is an apple on the table. It is red."

    Problems and Challenges (Ctd.)

    • Metaphors and analogies e.g. [Gust et al. 2007]
      • "Life is a journey."
    • Opinions, quotations and reported speech
      • "Tom thinks that dolphins are mammals."
    • What should be represented as an individual? e.g. [Zirn et al. 2008]
      • "The kangaroo is an animal living in Australia."
    • Class, relation (object property) or attribute (datatype property)?
      • "All elephants are grey."
      • "Easter Monday is a national holiday."
    • Knowledge is changing e.g. [Stojanovic 2004], [Zablith et al. 2009]
      • "Pluto is a planet."

    Learning OWL Class Expressions

    • given:
      • background knowledge (particularly OWL/DL knowledge base)
      • positive and negative examples (particularly individuals in the knowledge base)
    • goal:
      • logical formula (particularly OWL Class Expression) covering positive examples and not covering negative examples
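The learning goal can be stated as a simple test. This Python sketch restricts candidate expressions to conjunctions of atomic classes (far less expressive than OWL) and uses a hypothetical, closed-world ABox; the empty conjunction plays the role of the top concept.

```python
# Hypothetical ABox: individual -> atomic classes it is asserted to belong to.
ABOX = {
    "ann": {"Person", "Female", "Parent"},
    "bob": {"Person", "Male", "Parent"},
    "carl": {"Person", "Male"},
}

def covers(conjunction, individual, abox=ABOX):
    """A conjunction of atomic classes covers an individual that belongs to all of them."""
    return conjunction <= abox[individual]

def solves(conjunction, positives, negatives, abox=ABOX):
    """Goal: cover every positive example and no negative example."""
    return (all(covers(conjunction, p, abox) for p in positives)
            and not any(covers(conjunction, n, abox) for n in negatives))
```

With ann and bob as positive and carl as negative examples, Person alone is too general, while Person ⊓ Parent is a solution.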

    ILP and Semantic Web

    • Inductive Logic Programming (ILP) has been researched since the early 1990s
    • only few approaches based on description logics
    • Web Ontology Language (OWL) became a W3C standard in 2004
    • increasing number of RDF/OWL knowledge bases, but ILP still mainly focuses on logic programs → research gap

    Why ILP in the Semantic Web?

    • Ontology Learning:
      • given class A in K
      • instances of A as positive examples
      • non-instances as negative examples
      • definitions can be learned if ABox data is available
    • improvement of existing ML problem solutions
    • direct usage of knowledge in the Semantic Web instead of converting it into, e.g., Horn clauses to apply ML methods

    Refinement Operators: Definitions

    • given a DL \(\mathcal{L}\), consider the quasi-ordered space \(\langle\mathcal{C}(\mathcal{L}),\sqsubseteq_ T\rangle\) over concepts of \(\mathcal{L}\)
    • \(\rho: \mathcal{C}(\mathcal{L})\to 2^{\mathcal{C}(\mathcal{L})}\) is a downward \(\mathcal{L}\) refinement operator if for any \(C \in \mathcal{C}(\mathcal{L})\):\[D \in \rho(C) \text{ implies } D \sqsubseteq_ T C\]
    • notation: write \(C \to_\rho D\) instead of \(D \in \rho(C)\)
    • example refinement chain in \(\langle\mathcal{C}(\mathcal{EL}), \sqsubseteq_T\rangle\): \[ \top \to_{\rho} male \to_{\rho} male \sqcap \exists hasChild.\top \]
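A toy downward refinement operator makes the definition tangible. In this Python sketch (much weaker than the \(\rho\) operator of the lecture), a concept is a conjunction of atoms stored as a frozenset, the empty set is \(\top\), and a refinement adds one conjunct, so \(D \sqsubseteq C\) always holds. The atom names are hypothetical; "EXISTS_hasChild" stands for \(\exists hasChild.\top\).

```python
ATOMS = ("Male", "Female", "EXISTS_hasChild")  # hypothetical atoms; the last means ∃hasChild.⊤

def rho(concept):
    """Toy downward refinement: add one more conjunct (frozenset() is the top concept)."""
    return {concept | {a} for a in ATOMS if a not in concept}

def chains(start, target, max_len=3):
    """Number of refinement chains from start to target with at most max_len steps."""
    if start == target:
        return 1
    if max_len == 0 or not start <= target:
        return 0  # conjuncts are only ever added, so start must be a subset of target
    return sum(chains(d, target, max_len - 1) for d in rho(start))
```

The chain \(\top \to_\rho male \to_\rho male \sqcap \exists hasChild.\top\) exists, but so does one through \(\exists hasChild.\top\) first, so this operator is redundant. It is also finite and clearly incomplete, since it never produces disjunctions or universal restrictions.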

    Learning with Refinement Operators

    Properties of Refinement Operators

    An \(\mathcal{L}\) downward refinement operator \(\rho\) is called
    • finite iff \(\rho(C)\) is finite for any concept \(C \in \mathcal{C}(\mathcal{L})\)
    • redundant iff there exist two different \(\rho\) refinement chains from a concept \(C\) to a concept \(D\)
    • proper iff for all \(C, D \in \mathcal{C}(\mathcal{L})\), \(C \to_\rho D\) implies \(C \not\equiv_T D\)
    • complete iff for all \(C, D \in \mathcal{C}(\mathcal{L})\) with \(D \sqsubseteq_T C\) there is a concept \(E\) with \(E \equiv_T D\) and a refinement chain \(C \to_\rho \cdots \to_\rho E\)
    • weakly complete iff for any concept \(C\) with \(C \sqsubseteq_T \top\) we can reach a concept \(E\) with \(E \equiv_T C\) from \(\top\) by \(\rho\)
    • ideal iff it is finite, complete, and proper

    Properties of Refinement Operators II

    • Properties indicate how suitable a refinement operator is for solving the learning problem:
      • Incomplete operators may miss solutions
      • Redundant operators may lead to duplicate concepts in the search tree
      • Improper operators may produce equivalent concepts (which cover the same examples)
      • For infinite operators it may not be possible to compute all refinements of a given concept
    • We researched properties of refinement operators in Description Logics
    • Key question: Which properties can be combined?

    Refinement Operator Property Theorem

    Theorem

    Maximal sets of properties of \(\mathcal{L}\) refinement operators which can be combined for \(\mathcal{L} \in \{\mathcal{ALC}, \mathcal{ALCN}, \mathcal{SHOIN}, \mathcal{SROIQ} \}\):

    1. {weakly complete, complete, finite}
    2. {weakly complete, complete, proper}
    3. {weakly complete, non-redundant, finite}
    4. {weakly complete, non-redundant, proper}
    5. {non-redundant, finite, proper}
    "Foundations of Refinement Operators for Description Logics",
    J. Lehmann, P. Hitzler, ILP conference, 2008

    "Concept Learning in Description Logics Using Refinement Operators",
    J. Lehmann, P. Hitzler, Machine Learning journal, 2010

    Refinement Operator Property Theorem II

    • no ideal refinement operator exists for OWL and many other description logics
    • indicates that learning in DLs is hard
    • algorithms need to counteract disadvantages
    • goal: develop operators close to theoretical limits

    Definition of \(\rho\)

    Properties of \(\rho\)

    • \(\rho\) is complete
    • \(\rho\) is infinite, e.g. there are infinitely many refinement steps of the form \( \top \to_\rho C_1 \sqcup C_2 \sqcup C_3 \sqcup \dots \)
    • \(\rho\) is not proper, but can be extended to a proper operator \(\rho^{cl}\) (refinements are more expensive to compute)
    • \(\rho\) is redundant (the same concept can be reached via different refinement chains)

    "A Refinement Operator Based Learning Algorithm for the \(\mathcal{ALC}\) Description Logic",
    J. Lehmann, P. Hitzler, ILP conference, 2008

    "Concept Learning in Description Logics Using Refinement Operators",
    J. Lehmann, P. Hitzler, Machine Learning journal, 2010

    OCEL

    • uses \(\rho\) for top-down search
    • OCEL is complete: it always finds a solution if one exists
    • highly configurable, e.g. flexible target language, termination criteria and heuristics
    • implements a redundancy elimination technique with polynomial complexity w.r.t. search tree size, based on ordered negation normal form
    • can handle infinite refinement operators by stepwise length-limited horizontal expansion

    Stepwise Node Expansion

    Scalability: Reasoning

    \[\mathcal{K} = \{ \mathit{Male} \sqsubseteq \mathit{Person},\ \mathit{OnlyMaleChildren}(a),\ \mathit{Person}(a),\ \mathit{Male}(a_1),\ \mathit{Male}(a_2),\ \mathit{hasChild}(a,a_1),\ \mathit{hasChild}(a,a_2) \}\]

    • given \(\mathcal{K}\), we want to learn a description of \(\mathit{OnlyMaleChildren}\)
    • \(C = \mathit{Person} \sqcap \forall \mathit{hasChild}.\mathit{Male}\) appears to be a good solution, but \(a\) is not an instance of \(C\) under the OWA, since \(a\) may have further children that are not stated in \(\mathcal{K}\)
    • idea: materialise \(\mathcal{K}\) using a standard (OWA) DL reasoner, but perform instance checks under the CWA
    • closer to intuition and provides order-of-magnitude performance improvements
    • optimised for thousands of instance checks on a static knowledge base
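
    The CWA idea can be sketched on the example knowledge base. This is a hypothetical Python representation (plain dicts, with the materialisation step already done by hand), not the instance checker used in DL-Learner.

```python
from typing import Dict, Set, Tuple, Union

# Materialised version of K: the OWA reasoner has already added the
# inferred assertions Person(a1) and Person(a2) via Male ⊑ Person.
classes: Dict[str, Set[str]] = {
    "Person": {"a", "a1", "a2"},
    "Male": {"a1", "a2"},
    "OnlyMaleChildren": {"a"},
}
roles: Dict[str, Set[Tuple[str, str]]] = {
    "hasChild": {("a", "a1"), ("a", "a2")},
}

# Concepts: an atomic name, ("and", C1, C2, ...) or ("all", role, C).
Concept = Union[str, tuple]


def instance_of(ind: str, concept: Concept) -> bool:
    """Instance check under the closed world assumption."""
    if isinstance(concept, str):
        return ind in classes.get(concept, set())
    if concept[0] == "and":
        return all(instance_of(ind, c) for c in concept[1:])
    if concept[0] == "all":
        _, role, filler = concept
        # CWA: the role successors stored in the KB are taken to be the
        # only ones, so the ∀ restriction only has to hold for them.
        return all(instance_of(t, filler)
                   for (s, t) in roles.get(role, set()) if s == ind)
    raise ValueError(f"unsupported constructor: {concept[0]!r}")


C = ("and", "Person", ("all", "hasChild", "Male"))
print(instance_of("a", C))  # True (an OWA reasoner would not conclude this)
```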

    Scalability: Stochastic Coverage Computation

    Heuristics often require expensive instance checks or retrieval, e.g.:

    \[\mathrm{acc}(C) = \frac{1}{2} \cdot \left( \frac{|R(A) \cap R(C)|}{|R(A)|} + \sqrt{\frac{|R(A) \cap R(C)|}{|R(C)|}} \right)\]
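
    The heuristic can be written down directly, with \(R(\cdot)\) taken as the retrieval (set of instances) of a class or class expression; every element of \(R(C)\) costs an instance check. A minimal Python sketch with hypothetical names:

```python
import math
from typing import Set


def acc(r_a: Set[str], r_c: Set[str]) -> float:
    """acc(C) = 1/2 * (|R(A) ∩ R(C)| / |R(A)| + sqrt(|R(A) ∩ R(C)| / |R(C)|)).

    r_a: retrieval of the target class A (assumed non-empty)
    r_c: retrieval of the candidate class expression C
    """
    if not r_c:
        return 0.0  # assumption: a concept without instances scores 0
    covered = len(r_a & r_c)
    return 0.5 * (covered / len(r_a) + math.sqrt(covered / len(r_c)))


# C covers one of A's four instances and nothing else.
print(acc({"a1", "a2", "a3", "a4"}, {"a1"}))  # 0.625
```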

    Scalability: Stochastic Coverage Computation II

    Heuristics often require expensive instance checks or retrieval, e.g.:

    \[\mathrm{acc}(C) = \frac{1}{2} \cdot \left( \frac{a}{|R(A)|} + \sqrt{\frac{a}{b}} \right)\]
    • replace \(|R(A) \cap R(C)|\) and \(|R(C)|\) by variables \(a\) and \(b\), which we want to estimate
    • Wald method for computing the 95% confidence interval
    • first estimate \(a\), then the whole expression
    • the method can be applied to various heuristics
    • in tests on real ontologies: up to 99% fewer instance checks, and the algorithm is up to 30 times faster
    • low influence on learning results, empirically shown on 380 learning problems on 7 real ontologies (results differ by approx. \(0.2\% \pm 0.4\%\))
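
    A sketch of the stochastic estimate of \(a\) in Python. The sampling order, the stopping rule (stop once the Wald interval is narrower than `max_width`) and its parameters are assumptions for illustration; the slides only state that the Wald method is used. The estimate of \(a\), together with a similar one for \(b\), could then be plugged into the accuracy formula.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def wald_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wald interval p ± z·sqrt(p(1-p)/n); z = 1.96 gives roughly 95%."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def estimate_a(r_a: Sequence[str],
               in_c: Callable[[str], bool],
               max_width: float = 0.05,
               min_checks: int = 20,
               seed: int = 0) -> Tuple[float, int]:
    """Estimate a = |R(A) ∩ R(C)| by instance-checking a random sample of R(A).

    Stops once the Wald interval for the covered fraction is narrower than
    max_width. Returns (estimate of a, number of instance checks performed).
    """
    order = list(r_a)
    if not order:
        return 0.0, 0
    random.Random(seed).shuffle(order)
    hits = 0
    checks = 0
    for ind in order:
        checks += 1
        if in_c(ind):  # the expensive instance check
            hits += 1
        if checks >= min_checks:
            low, high = wald_interval(hits, checks)
            if high - low <= max_width:
                break
    return hits / checks * len(order), checks


# Hypothetical target class with 10,000 instances, half of them covered by C.
r_a = [f"i{k}" for k in range(10_000)]
estimate, checks = estimate_a(r_a, lambda ind: int(ind[1:]) % 2 == 0)
print(round(estimate), checks)
```

    The Wald interval collapses to zero width when a sample contains only hits or only misses, which is why the sketch enforces a minimum number of checks before it may stop.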

    Scalability: Fragment Extraction

    Extraction of Fragments from SPARQL Endpoints / Linked Data:

    "Learning of OWL Class Descriptions on Very Large Knowledge Bases",
    S. Hellmann, J. Lehmann, S. Auer, Int. Journal on Semantic Web and Information Systems, 2009

    Evaluation Setup

    • lack of evaluation standards in OWL/DL learning
    • procedure: convert existing benchmarks to OWL (time-consuming, requires domain knowledge)
    • measure predictive accuracy in ten-fold cross-validation
    • part 1: evaluation against other OWL/DL learning systems
    • part 2: evaluation against other ML systems (carcinogenesis problem)
    • part 3: evaluation of ontology engineering
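
    A minimal sketch of the ten-fold cross-validation protocol in Python; `learn` is a hypothetical stand-in for any learner that turns training examples into a classifier, and the folds are not stratified.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, bool]          # (individual, is it a positive example?)
Classifier = Callable[[str], bool]


def cross_validate(examples: Sequence[Example],
                   learn: Callable[[List[Example]], Classifier],
                   folds: int = 10, seed: int = 0) -> float:
    """Mean predictive accuracy over `folds` train/test splits."""
    data = list(examples)
    if len(data) < folds:
        raise ValueError("need at least one example per fold")
    random.Random(seed).shuffle(data)
    accuracies = []
    for k in range(folds):
        test = data[k::folds]                                    # every folds-th example
        train = [ex for i, ex in enumerate(data) if i % folds != k]
        classify = learn(train)                                  # learn on the other folds
        correct = sum(classify(ind) == label for ind, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / folds


# Hypothetical learner stub that ignores its training data.
examples = [(f"pos{i}", True) for i in range(15)] + [(f"neg{i}", False) for i in range(15)]
print(cross_validate(examples, lambda train: lambda ind: ind.startswith("pos")))  # 1.0
```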

    Evaluation: Accuracy

    • collection of 6 benchmarks
    • OCEL is statistically significantly better than the other algorithms on most benchmarks

    Evaluation: Readability

    • YinYang generates significantly longer solutions

    Evaluation: Runtime

    Carcinogenesis

    • goal: predict whether chemical compounds cause cancer
    • Why?
      • more than 1000 new substances each year
      • substances can often only be tested via long and expensive experiments on rats and mice
    • background knowledge:
      • database of US National Toxicology Program (NTP)
      • converted from Prolog to OWL

    "Obtaining accurate structural alerts for the causes of chemical cancers is a problem of great scientific and humanitarian value." (A. Srinivasan, R.D. King, S.H. Muggleton, M.J.E. Sternberg 1997)

    Carcinogenesis II

    • very challenging problem: low accuracy, high standard deviation
    • OCEL is statistically significantly better than most other approaches

    Ontology Learning Evaluation

    • 5 PhD students
    • 5 real ontologies in different domains
    • 998 decisions by each test person for 92 classes
    • suggestions for ontology enhancements were accepted in 35% of the cases
    • problem: ontology quality, modelling errors (unsatisfiable classes, disjunction and conjunction confused, etc.)

    DL-Learner Project

    • DL-Learner open-source project: http://dl-learner.org, http://sf.net/projects/dl-learner
    • extensible platform for different learning problems and algorithms
    • Interfaces: command line, GUI, Web-Service
    • supports common OWL formats
    • allows different reasoners (via OWL API, DIG, OWLLink)
    • mloss.org (ML & Open Source Software): 1600 downloads

    Applications

    • "classical" ML problems
      • carcinogenesis
      • other biomedical tasks
    • Ontology Learning
      • Protégé Plugin
      • OntoWiki Plugin
      • ORE
    • Recommendation/Navigation
      • moosique.net
      • DBpedia Navigator
    • other/external:
      • ISS (Gerken et al.)
      • Learning in Probabilistic DLs (Ochoa Luna et al.)
      • TIGER Corpus Navigator (Hellmann et al.)

    Conclusions

    • Ontology Learning is a diverse research area involving several research disciplines (NLP, Machine Learning, Ontology Engineering)
    • approaches vary in the data sources used and in the expressiveness of the created ontologies
    • refinement operator based learning as one method for learning definitions (with applications outside of learning ontologies)
    • new Wiki (under construction): http://ontology-learning.net
    • new ontology learning book in 2011

    How to set links?

    • Manually
      • Uriqr or Sindice to search for existing URIs
    • Automatic generation
      • Link Discovery
        • LIMES (Link Discovery Framework for Metric Spaces): provides time-efficient approaches for discovering links and computing the results of link specifications.
        • Silk (A Link Discovery Framework for the Web of Data): a tool for discovering relationships between data items within different Linked Data sources. Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web.
        • TopBraid Composer (ontology editor made by TopQuadrant): has a wizard for linking ontology instances to corresponding DBpedia concepts.
        • SemMF: a framework for calculating semantic similarity between objects that are represented as arbitrary RDF graphs. The framework allows taxonomic and non-taxonomic concept matching techniques to be applied to selected object properties.
        • Yves' Equivalence Miner, together with an experience report about the problems he ran into while interlinking Jamendo and MusicBrainz.

    Link Discovery

    • Cannot be carried out manually at Web scale
      • 31 billion triples
      • Freebase contains over 20 million entities
      • Over 250 knowledge bases
    • Automatic approaches
      • Ontology Matching
      • Instance Matching

    Ontology Matching

    • Goal: find OWL class expressions that express the relation between the ontologies

    OM: Approaches

    • Sense-Based: WordNet hierarchy distance

    OM: Approaches

    • Extensional techniques: Compare the instances

    OM: Approaches

    • Often smaller datasets than in instance matching
    • For most ontologies, simple matching suffices
    • Problem: Accuracy in the long tail
    • Need for formally correct statements (DL statements)

    Link Discovery

    • Goal: Discover related entities across knowledge bases

    Formal Definition

    • Goal: for all \(s \in S\) and \(t \in T\), find all pairs \((s, t)\) such that \(\sigma(s, t) \geq \theta\)
    • Equivalent formulation: find a classifier \(C: S \times T \to \{-1, +1\}\) such that \(C(s, t) = -1\) iff \(\sigma(s, t) < \theta\), else \(C(s, t) = +1\)
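
    The formal definition translates into a brute-force linker that evaluates \(\sigma\) on all \(|S| \cdot |T|\) pairs. A Python sketch, using token-set Jaccard as a hypothetical atomic similarity measure and made-up `ex:` resources:

```python
from typing import Callable, Dict, List, Tuple

Similarity = Callable[[str, str], float]


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity, a stand-in atomic measure σ."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 0.0


def classify(sigma: Similarity, theta: float, s: str, t: str) -> int:
    """C(s, t) = -1 iff σ(s, t) < θ, else +1."""
    return -1 if sigma(s, t) < theta else 1


def discover_links(source: Dict[str, str], target: Dict[str, str],
                   sigma: Similarity, theta: float) -> List[Tuple[str, str]]:
    # Brute force: |S| * |T| evaluations of σ, the runtime problem of link discovery.
    return [(s, t)
            for s, s_label in source.items()
            for t, t_label in target.items()
            if classify(sigma, theta, s_label, t_label) == 1]


source = {"ex:s1": "Aspirin tablet", "ex:s2": "Ibuprofen"}
target = {"ex:t1": "aspirin tablet", "ex:t2": "Paracetamol"}
print(discover_links(source, target, jaccard, theta=0.5))  # [('ex:s1', 'ex:t1')]
```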

    Link Discovery

    • Two main problems
      • Runtime
      • Complexity of specifications
    • Runtime
      • Large number of instances
      • Brute-force approach in O(|S||T|)
      • Comparing strings comprising m tokens is in O(m²)

    Link Discovery

    • Two main problems
      • Runtime
      • Complexity of specifications
    • Complexity of specifications
      • Which properties should be used?
      • Which similarity measures work best?
      • Which threshold settings should be used?

    Link Discovery

    LD: Runtime

    • Aggregation and Blocking (SILK)

    LD: Runtime

    • Hybrid (LIMES)

    LD: Runtime

    • PassJoin

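    HYPPO

    • \(\Delta = \tau/\alpha\): edge length of the hypercubes that tile the space, given the distance threshold \(\tau\) and the granularity \(\alpha\)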

    HYPPO

    • \(\Delta\)

    HYPPO

    • Approximation rate:
    • Number of cubes:
    • Trade-off: a higher granularity \(\alpha\) leads to a better approximation, but also to more cubes
    • \(\alpha = 1\)
    • \(\alpha = 2\)
    • \(\alpha = 4\)
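
    A simplified Python sketch of the space-tiling idea behind HYPPO, assuming Euclidean distance, an integer granularity \(\alpha\), and that each source point is compared only with target points in cubes at most \(\alpha\) steps away in every dimension. It is not the LIMES implementation; all names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def cube_of(p: Point, delta: float) -> Tuple[int, ...]:
    """Index of the hypercube (edge length delta) that contains p."""
    return tuple(math.floor(x / delta) for x in p)


def tiled_links(source: Dict[str, Point], target: Dict[str, Point],
                tau: float, alpha: int) -> List[Tuple[str, str]]:
    """All (s, t) with Euclidean distance <= tau, comparing s only with
    targets in hypercubes at most alpha cubes away in every dimension."""
    delta = tau / alpha                        # Δ = τ/α
    cubes: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for t, q in target.items():
        cubes[cube_of(q, delta)].append(t)
    links = []
    for s, p in source.items():
        c = cube_of(p, delta)
        # (2α + 1)^d neighbouring cubes; a larger α fits the ball of radius τ
        # more tightly but means more cubes to visit.
        for offset in product(range(-alpha, alpha + 1), repeat=len(c)):
            neighbour = tuple(ci + oi for ci, oi in zip(c, offset))
            for t in cubes.get(neighbour, []):
                if math.dist(p, target[t]) <= tau:  # exact check drops false candidates
                    links.append((s, t))
    return links


source = {"s1": (0.0, 0.0), "s2": (5.0, 5.0)}
target = {"t1": (0.3, 0.4), "t2": (5.0, 6.5), "t3": (9.0, 9.0)}
print(tiled_links(source, target, tau=1.0, alpha=2))  # [('s1', 't1')]
```

    The final exact distance check keeps the result identical to the brute-force answer; \(\alpha\) only changes how many candidate pairs are compared.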

    Learning Link Specifications

    • Supervised Learning
      • Batch learning
      • Active Learning
    • Unsupervised Learning
      • Self-Configuration
      • Optimization of objective function

    RAVEN

    • Hospital/Residents + Classifier model
    • Active Perceptron Learning
    • Learning classifier C involves learning
      • two sets of restrictions that specify the sets S and T, respectively,
      • a specification of a complex similarity measure σ as the combination of several atomic similarity measures σ1, ..., σn, and
      • a set of weights/thresholds θ1, ..., θn such that θi is the threshold for σi.

    RAVEN

    Hospital/Residents

    Formal Definition

    • Learning classifier C involves learning
      1. two sets of restrictions that specify the sets S and T, respectively,
      2. a specification of a complex similarity measure σ as the combination of several atomic similarity measures σ1, ..., σn, and
      3. a set of thresholds θ1, ..., θn such that θi is the threshold for σi.
    • NB: we assume that the restrictions are class restrictions

    Restriction Discovery

    1. Start with source and target knowledge bases KS and KT

    Restriction Discovery

    2. Sample instances randomly across KS and KT

    Restriction Discovery

    3. Count the number of owl:sameAs links between Si and Tj

    Restriction Discovery

    4. Solve the equivalent Hospital/Residents problem

    Problem: Not enough owl:sameAs links

    Restriction Discovery

    3. Count the number of instances of Si and Tj that share common property values

    Restriction Discovery

    4. Solve the equivalent Hospital/Residents problem
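
    Restriction discovery ends in a Hospital/Residents instance. A minimal Python sketch of residents-proposing deferred acceptance (Gale-Shapley style), assuming source classes act as residents, target classes as hospitals, and that both sides prefer partners with higher counts; the class names and counts below are invented for illustration.

```python
from typing import Dict, List


def match_classes(counts: Dict[str, Dict[str, int]],
                  capacity: Dict[str, int]) -> Dict[str, str]:
    """Residents-proposing deferred acceptance (Gale-Shapley style).

    counts[s][t]: evidence for pairing source class s with target class t
    (e.g. owl:sameAs links or shared property values between instances).
    capacity[t]: how many source classes target class t may accept.
    Both sides prefer partners with higher counts; zero counts are ignored.
    """
    prefs = {s: [t for t in sorted(row, key=row.get, reverse=True) if row[t] > 0]
             for s, row in counts.items()}
    next_choice = {s: 0 for s in counts}
    accepted: Dict[str, List[str]] = {t: [] for t in capacity}
    free = list(counts)
    while free:
        s = free.pop()
        if next_choice[s] >= len(prefs[s]):
            continue                      # s has been rejected everywhere
        t = prefs[s][next_choice[s]]
        next_choice[s] += 1
        accepted[t].append(s)
        if len(accepted[t]) > capacity[t]:
            # t keeps the source classes it prefers and rejects the weakest.
            worst = min(accepted[t], key=lambda r: counts[r][t])
            accepted[t].remove(worst)
            free.append(worst)
    return {s: t for t, sources in accepted.items() for s in sources}


# Hypothetical counts between Sider classes (source) and Diseasome classes (target).
counts = {
    "sider:Drugs": {"diseasome:Diseases": 8, "diseasome:Genes": 0},
    "sider:Side-Effect": {"diseasome:Diseases": 30, "diseasome:Genes": 2},
}
capacity = {"diseasome:Diseases": 1, "diseasome:Genes": 1}
print(match_classes(counts, capacity))  # {'sider:Side-Effect': 'diseasome:Diseases'}
```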

    Restriction Discovery

    Source KB   Target KB   S (class)      T (class)
    Drugbank    Diseasome   Targets        Genes
    Sider       Diseasome   Side-Effect    Diseases
    DBpedia     Dailymed    Organization   Organization
    Sider       Dailymed    Drugs          Offer
    Drugbank    DBpedia     Targets        Protein

    RAVEN

    • Begin with unclassified links

    RAVEN

    Initialize classifier:

    RAVEN

    • Get the most informative positive (L+) and negative (L−) candidates

    RAVEN

    • Ask the oracle for classification

    RAVEN

    • Update L:

    RAVEN

    • Fetch the most informative positive and negative candidates

    RAVEN

    • Ask the oracle

    RAVEN

    • Terminate when the classifier agrees with the oracle on all candidates and return the classification
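
    The loop above can be sketched as active perceptron learning in Python. Assumptions: each candidate link is a vector of atomic similarities, the classifier is \(\sum_i w_i \sigma_i(s,t) \geq \theta\), the initial weights are all 1 with \(\theta = n/2\) (a hypothetical initialisation), and "agrees with the oracle" is checked on the candidates queried in the current round. Names and data are illustrative.

```python
from typing import Callable, Dict, List, Tuple

Vector = Tuple[float, ...]   # atomic similarities (σ1(s,t), ..., σn(s,t))


def dot(w: List[float], x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def predict(w: List[float], theta: float, x: Vector) -> int:
    return 1 if dot(w, x) >= theta else -1


def active_perceptron(candidates: Dict[str, Vector],
                      oracle: Callable[[str], int],
                      k: int = 2, lr: float = 0.1,
                      max_rounds: int = 50) -> Tuple[List[float], float]:
    dim = len(next(iter(candidates.values())))
    w, theta = [1.0] * dim, dim / 2           # assumed initial classifier
    labels: Dict[str, int] = {}
    for _ in range(max_rounds):
        margins = {c: dot(w, x) - theta for c, x in candidates.items() if c not in labels}
        # Most informative = closest to the decision boundary on each side.
        pos = sorted((c for c, m in margins.items() if m >= 0), key=lambda c: margins[c])[:k]
        neg = sorted((c for c, m in margins.items() if m < 0), key=lambda c: -margins[c])[:k]
        answers = {c: oracle(c) for c in pos + neg}   # ask the oracle
        if not answers:
            break                                     # nothing left to query
        labels.update(answers)                        # update L
        if all(predict(w, theta, candidates[c]) == y for c, y in answers.items()):
            break                                     # agrees with the oracle: stop
        for _epoch in range(100):                     # perceptron passes over L
            errors = 0
            for c, y in labels.items():
                if predict(w, theta, candidates[c]) != y:
                    w = [wi + lr * y * xi for wi, xi in zip(w, candidates[c])]
                    theta -= lr * y
                    errors += 1
            if errors == 0:
                break
    return w, theta


# Hypothetical candidates with two similarity scores each; the hypothetical
# oracle accepts a link iff its first score is at least 0.8.
cands = {"l1": (0.95, 0.9), "l2": (0.85, 0.3), "l3": (0.7, 0.95),
         "l4": (0.2, 0.1), "l5": (0.9, 0.6), "l6": (0.5, 0.5)}
w, theta = active_perceptron(cands, lambda c: 1 if cands[c][0] >= 0.8 else -1)
print(w, theta)
```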

    Goal

    Figure: example link specification from Drugbank (db:Drugs) to Dailymed (dm:Offer), comparing rdfs:label with dm:name and db:brandName with dm:name, each via trigram similarity, with threshold > 0.9
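
    A Python sketch of this specification. The figure does not survive, so two details are assumptions: trigram similarity is computed as a Jaccard coefficient over character trigrams, and the two comparisons are combined by taking the maximum before applying the 0.9 threshold. The drug data are made up.

```python
from typing import Dict, Set


def trigrams(text: str) -> Set[str]:
    """Set of character trigrams of a lower-cased string."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


def trigram_sim(a: str, b: str) -> float:
    """Jaccard coefficient over character trigrams (assumed normalisation)."""
    x, y = trigrams(a), trigrams(b)
    return len(x & y) / len(x | y) if x | y else 0.0


def same_drug(drug: Dict[str, str], offer: Dict[str, str], threshold: float = 0.9) -> bool:
    """Hypothetical reading of the figure: compare rdfs:label and db:brandName
    of a db:Drugs instance with dm:name of a dm:Offer, keep the better score."""
    score = max(trigram_sim(drug["rdfs:label"], offer["dm:name"]),
                trigram_sim(drug["db:brandName"], offer["dm:name"]))
    return score > threshold


drug = {"rdfs:label": "Acetylsalicylic acid", "db:brandName": "Aspirin"}
print(same_drug(drug, {"dm:name": "ASPIRIN"}))    # True
print(same_drug(drug, {"dm:name": "Ibuprofen"}))  # False
```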

    EAGLE

    • Idea: specifications are trees
    • Goal: learn the elements of the trees through genetic operations until the best specification is found

    EAGLE

    • Step 1: Generate initial population
      • Random process (property pairs, thresholds)
      • Compute fitness
      • Fitness = F1-measure w.r.t. known data
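
    Step 1 can be sketched in Python with a deliberately flat specification (one property pair plus one threshold) instead of a full tree; `difflib.SequenceMatcher` stands in for the similarity measure, and the data, threshold range and population size are invented for illustration.

```python
import random
from difflib import SequenceMatcher
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]
Spec = Tuple[Tuple[str, str], float]   # ((source property, target property), threshold)


def f1(predicted: Set[Link], reference: Set[Link]) -> float:
    """F1-measure of predicted links against known links."""
    true_pos = len(predicted & reference)
    if true_pos == 0:
        return 0.0
    precision, recall = true_pos / len(predicted), true_pos / len(reference)
    return 2 * precision * recall / (precision + recall)


def run(spec: Spec, source: Dict[str, Dict[str, str]],
        target: Dict[str, Dict[str, str]]) -> Set[Link]:
    """Links produced by a flat specification."""
    (s_prop, t_prop), theta = spec
    return {(s, t) for s, sv in source.items() for t, tv in target.items()
            if SequenceMatcher(None, sv[s_prop].lower(), tv[t_prop].lower()).ratio() >= theta}


def initial_population(size: int, pairs: List[Tuple[str, str]], rng: random.Random) -> List[Spec]:
    # Random property pair and random threshold per individual.
    return [(rng.choice(pairs), round(rng.uniform(0.5, 1.0), 2)) for _ in range(size)]


source = {"db:d1": {"rdfs:label": "Aspirin", "db:brandName": "Bayer"},
          "db:d2": {"rdfs:label": "Ibuprofen", "db:brandName": "Advil"}}
target = {"dm:o1": {"dm:name": "aspirin"}, "dm:o2": {"dm:name": "ibuprofen"}}
known = {("db:d1", "dm:o1"), ("db:d2", "dm:o2")}
pairs = [("rdfs:label", "dm:name"), ("db:brandName", "dm:name")]

population = initial_population(6, pairs, random.Random(42))
for spec in population:
    print(spec, f1(run(spec, source, target), known))
```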

    EAGLE

    • Step 2: Evolve population
      • Tournament between two individuals
      • Two operators: Mutation and crossover
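
    Step 2 can be sketched in Python on tree-shaped specifications. The encoding (leaves carry a measure, two properties and a threshold; inner nodes are AND/OR), the concrete mutation and crossover operators, and the mutation rate are assumptions for illustration, not EAGLE's exact definitions.

```python
import random
from typing import Callable, Iterator, List, Tuple

# A specification is a tree: a leaf (measure, source prop, target prop, threshold)
# or an inner node ("AND" | "OR", left subtree, right subtree).
Spec = tuple
Path = Tuple[int, ...]


def subtrees(spec: Spec, path: Path = ()) -> Iterator[Tuple[Path, Spec]]:
    yield path, spec
    if spec[0] in ("AND", "OR"):
        yield from subtrees(spec[1], path + (1,))
        yield from subtrees(spec[2], path + (2,))


def replace_at(spec: Spec, path: Path, new: Spec) -> Spec:
    if not path:
        return new
    parts = list(spec)
    parts[path[0]] = replace_at(spec[path[0]], path[1:], new)
    return tuple(parts)


def mutate(spec: Spec, rng: random.Random) -> Spec:
    """Flip an operator or nudge a threshold at a random node."""
    path, node = rng.choice(list(subtrees(spec)))
    if node[0] in ("AND", "OR"):
        new = ("OR" if node[0] == "AND" else "AND", node[1], node[2])
    else:
        measure, s_prop, t_prop, theta = node
        theta = min(1.0, max(0.0, round(theta + rng.uniform(-0.1, 0.1), 2)))
        new = (measure, s_prop, t_prop, theta)
    return replace_at(spec, path, new)


def crossover(a: Spec, b: Spec, rng: random.Random) -> Spec:
    """Replace a random subtree of a by a random subtree of b."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace_at(a, path, donor)


def tournament(population: List[Spec], fitness: Callable[[Spec], float],
               rng: random.Random) -> Spec:
    """Pick two individuals at random; the fitter one wins."""
    x, y = rng.sample(population, 2)
    return x if fitness(x) >= fitness(y) else y


def evolve(population: List[Spec], fitness: Callable[[Spec], float],
           rng: random.Random, mutation_rate: float = 0.3) -> List[Spec]:
    children = []
    for _ in range(len(population)):
        parent = tournament(population, fitness, rng)
        if rng.random() < mutation_rate:
            children.append(mutate(parent, rng))
        else:
            children.append(crossover(parent, tournament(population, fitness, rng), rng))
    return children


leaf_a = ("trigrams", "rdfs:label", "dm:name", 0.9)
leaf_b = ("trigrams", "db:brandName", "dm:name", 0.8)
population = [leaf_a, leaf_b, ("OR", leaf_a, leaf_b), ("AND", leaf_a, leaf_b)]
# Hypothetical stand-in fitness (prefers small trees); EAGLE uses the F1-measure.
toy_fitness = lambda spec: -len(list(subtrees(spec)))
next_gen = evolve(population, toy_fitness, random.Random(0))
print(len(next_gen))  # 4
```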

    EAGLE

    • Step 3: Computation of the most informative links
      • Previous approaches define the amount of information of a link as its closeness to the decision boundary
      • Here, use the disagreement among the members of a population of size n
      • The function is maximal when n/2 members classify (s, t) as positive and n/2 as negative
      • Can be modelled with other functions, such as entropy
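
    A Python sketch of disagreement-based selection. The product \((\mathit{pos}/n) \cdot (\mathit{neg}/n)\) is one hypothetical function with the stated property (maximal when \(n/2\) members vote positive); vote entropy is included as the alternative mentioned above. The population of threshold classifiers is invented for illustration.

```python
import math
from typing import Callable, List, Sequence


def disagreement(votes: Sequence[int]) -> float:
    """(pos/n) * (neg/n): 0 when the population is unanimous, maximal (0.25)
    when n/2 members vote +1 and n/2 vote -1."""
    n = len(votes)
    pos = sum(1 for v in votes if v == 1)
    return (pos / n) * ((n - pos) / n)


def vote_entropy(votes: Sequence[int]) -> float:
    """Alternative: binary entropy of the vote distribution (maximal: 1 bit)."""
    n = len(votes)
    pos = sum(1 for v in votes if v == 1)
    h = 0.0
    for count in (pos, n - pos):
        if count:
            h -= (count / n) * math.log2(count / n)
    return h


def most_informative(links: List[str], population: List[Callable[[str], int]], k: int,
                     score: Callable[[Sequence[int]], float] = disagreement) -> List[str]:
    """The k candidate links on which the population disagrees most."""
    return sorted(links, key=lambda link: score([clf(link) for clf in population]),
                  reverse=True)[:k]


# Hypothetical population: four threshold classifiers over one similarity score.
scores = {"l1": 0.95, "l2": 0.5, "l3": 0.1}
population = [lambda link, th=th: 1 if scores[link] >= th else -1 for th in (0.2, 0.4, 0.6, 0.8)]
print(most_informative(list(scores), population, k=1))  # ['l2']
```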