IBM 1620 data processing machine, 1962

Who is this?

The Web

The Web accompanies the transition from an industrial to an information society and provides the infrastructure for a new quality of information handling regarding acquisition as well as provisioning

  • high availability
  • high relevance
  • low cost

 

The Web penetrates society

  • Social contacts (social networking platforms, blogging, ...)
  • Economics (buying, selling, advertising, ...)
  • Administration (eGovernment)
  • Work life (information gathering and sharing)
  • Recreation (games, role play, creativity, ...)
  • Education (eLearning, Web as information system, ...)

The current Web

 Immensely successful.

  • Huge amounts of information and data.
  • Syntax standards for transfer of structured data.
  • Machine-processable, human-readable documents.
BUT:
  • Content/knowledge cannot be accessed by machines.
  • Meaning (semantics) of transferred data is not accessible.

Limitations of the Web

Too much information with too little structure and made for human consumption
  • Content search is very simplistic
  • →  future requires better methods
Web content is heterogeneous
  • in terms of content
  • in terms of structure
  • in terms of character encoding
  • → future requires intelligent information integration
Humans can derive new (implicit) information from given pieces of information but on the current Web we can only deal with syntax
  • → requires automated reasoning techniques

What Google does not find

There are many information needs that current search engines cannot satisfy:

  • Apartments for rent close to well rated Thai restaurants
  • Bi-lingual English-German child care in Berlin reachable in 15 minutes from my place of work
  • Kid-friendly holiday destinations with culture and sports activities
  • Researchers working in south-east Asia on information retrieval topics
  • ERP service providers with offices in Vienna and Berlin
  • ...

We have subconsciously learned not to ask search engines such questions.

In principle, all the required knowledge is on the Web – most of it even in machine-readable form. However, without automated data integration, processing (and reasoning) we cannot obtain a useful answer.

What's the problem with the Web

  • inability to integrate and fuse information from different sources
  • lack of comprehensive background knowledge to interpret information found on the Web
  • current Web search is restricted to text in a given language, and many “smaller” languages have much less information available than English

Basic ingredients for the Semantic Web

  • Open Standards for describing information on the Web
  • Methods for obtaining further information from such descriptions

 
We’ll talk about these matters in this course.

Data Models, Access & Integration

Data Integration
Enterprise Information Integration
  • sets of heterogeneous data sources appear as a single, homogeneous data source
Data Warehousing
  • Based on extract, transform, load (ETL)
  • Global-As-View (GAV)
Research
  • Mediators
  • Ontology-based
  • P2P
  • Web service-based
Data Web
  • URIs as entity identifiers
  • HTTP as data access protocol
  • Local-As-View (LAV)
Data Access
Object-relational mappings (ORM)
  • NeXT’s EOF / WebObjects
  • ADO.NET Entity Framework
  • Hibernate
Procedural APIs
  • ODBC
  • JDBC
Query Languages
  • Datalog, SQL
  • XPath/XQuery
  • SPARQL
Linked Data
  • de-referenceable URIs
  • RDF serialization
Data Models
RDBMS
  • organize data in relations, rows, cells
  • Oracle, DB2, MS-SQL
     

LOD Cloud May 2007

LOD = Linked Open Data

LOD Cloud October 2007

LOD Cloud February 2008

LOD Cloud September 2008

LOD Cloud March 2009

Colours indicate different domains; e.g.: orange = social networks

LOD Cloud September 2010

LOD Cloud September 2011

LOD Cloud August 2014

First update after 3 years – what do you think were the reasons?

LOD Cloud February 2017


The Web of Data

 
  • >70 billion facts
  • covering many different domains (life sciences, geo, user-generated content, government, bibliographic, ...)

Map to the Semantic Web

The Semantic Data Web Stack

Layers, from top to bottom:
  • User Interface & Applications
  • Trust
  • Crypto
  • Proof
  • Unifying Logic
  • Rules: RIF
  • Ontology: OWL
  • Query: SPARQL
  • RDF-Schema
  • Data Interchange: RDF
  • XML
  • URI
  • Unicode

… also known as “layer cake”

How Mark A. Greenwood realised the Semantic Web:

URIs and Unicode

  • URI = Uniform Resource Identifier
  • Used to create globally unique names for resources
  • Every object with clear identity can be a resource
    • Books, places, organizations ...
  • In the books domain the ISBN serves the same purpose
  • IRIs: Unicode-aware extension of URIs (I = Internationalized)

Resource Description Framework – RDF

 Information is represented in RDF in triples (also called statements, facts):

  • Modeled on linguistic categories, but not always consistently
  • Allowed assignments:
    • Subject: URI or blank node
    • Predicate: URI (a.k.a. property)
    • Object: URI, blank node or literal
  • Node and edge labels should be unambiguous, so that the original graph is reconstructable from a list of triples

 

RDF Schema

Not all triples make sense:

Cinema  AlbertEinstein  2012

How can we constrain the use of RDF? 

RDFS (S = “Schema”) allows us to define classes and properties and to restrict their use.

SPARQL – Query Language for RDF

SELECT * WHERE { jwebsp:John  foaf:knows  ?friend }
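
To give an intuition for what such a query asks, here is a minimal Python sketch that matches one triple pattern against an in-memory list of triples. It is not how a SPARQL engine works internally. The function name `match_pattern`, the sample data and the expansion chosen for `jwebsp:John` are assumptions made for illustration only.

```python
# Illustrative only: a triple pattern is a triple in which some positions
# are variables (strings starting with "?"). All data below is made up.
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
JOHN = "http://example.org/John"  # hypothetical expansion of jwebsp:John

triples = [
    (JOHN, FOAF_KNOWS, "http://example.org/Mary"),
    (JOHN, FOAF_KNOWS, "http://example.org/Paul"),
    ("http://example.org/Mary", FOAF_KNOWS, JOHN),
]

def match_pattern(pattern, data):
    """Return one variable binding (a dict) per triple that matches."""
    results = []
    for triple in data:
        binding = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            results.append(binding)
    return results

friends = match_pattern((JOHN, FOAF_KNOWS, "?friend"), triples)
print(friends)
```

Each result plays the role of one row in the SPARQL result table, with `?friend` as the only column.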

Web Ontology Language – OWL

  • OWL: acronym for Web Ontology Language, more easily pronounced than WOL
  • family of languages for authoring ontologies
  • W3C Recommendation since 2004, OWL 2 since 2009
  • Semantically a fragment of first-order logic (FOL)

Features

  • Instantiation of classes by individuals
  • Concept hierarchies (taxonomies, inheritance): classes, terms
  • Binary relations between individuals: Properties, Roles
  • Properties of relations (e.g., range, transitive)
  • Data types (e.g. Numbers): concrete domains
  • Logical means of expression
  • Clear semantics!

RDFa Content Editor – RDFaCE

supports the automatic semantic annotation of texts

http://rdface.aksw.org/

Literature

  • Pascal Hitzler, Markus Krötzsch, Sebastian Rudolph: Foundations of Semantic Web Technologies, Chapman & Hall/CRC, 2009, 455 pages, hardcover, ISBN: 9781420090505, http://www.semantic-web-book.org
  • Amit Sheth, Krishnaprasad Thirunarayan:  Semantics Empowered Web 3.0: Managing Enterprise, Social, Sensor, and Cloud-based Data and Services for Advanced Applications (Synthesis Lectures on Data Management),  Morgan & Claypool Publishers (December 19, 2012), ISBN: 1608457168
  • Tom Heath, Christian Bizer:  Linked Data (Synthesis Lectures on the Semantic Web: Theory and Technology), Morgan & Claypool Publishers; 1 edition (February 20, 2011), ISBN: 1608454304. http://linkeddatabook.com

Questions

All the corresponding questions for the Introductions are covered in the Questions part of the Deck.

 

Motivation

How do you encode the piece of knowledge:

"The theory of relativity was discovered by Albert Einstein." 

[Figure: alternative XML encodings of this sentence]
There is no unique way (in XML) to represent knowledge.
Information represented in such ways is not easy to integrate. (Why?)
RDF helps to solve this problem.

Goals

  • Understand the RDF data model, including
    • URI and IRI concepts
    • Triples
    • Resources
    • Literals
    • Blank nodes
    • Lists

Prerequisites

  • Basic understanding of Web technologies, data types

RDF Overview

  • RDF = Resource Description Framework
  • W3C Recommendation since 1999
  • RDF is a data model
    • Originally used for metadata for web resources, then generalized
    • Encodes structured information
    • Universal, machine readable exchange format
  • Data structured in graphs
    • Vertices, edges

Parts of the RDF graph

  •  URIs
    • Used to reference resources unambiguously
  • Literals
    • Describe data values with no clear identity like "100 km/h"
  • Blank nodes
    • Allow existential statements: an individual with certain properties exists, without giving it a name

Example of an RDF graph

 

RDF Triple

 Components of an RDF triple:

  • Modeled using linguistic categories (but not always consistent)
  • Allowed assignments:
    • Subject: URI or blank node
    • Predicate: URI (a.k.a. property)
    • Object: URI, blank node or literal
  • Node and edge labels should be unambiguous, so that the original graph is reconstructable from the list of triples
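
The positional rules above can be sketched as a small type check in Python. The class and function names (`IRI`, `BlankNode`, `Literal`, `is_valid_triple`) are hypothetical and chosen for this illustration; they are not taken from any RDF library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class BlankNode:
    label: str

@dataclass(frozen=True)
class Literal:
    lexical: str

def is_valid_triple(subject, predicate, obj):
    """Check which kind of term is allowed in which position."""
    return (
        isinstance(subject, (IRI, BlankNode))
        and isinstance(predicate, IRI)
        and isinstance(obj, (IRI, BlankNode, Literal))
    )

leipzig = IRI("http://dbpedia.org/resource/Leipzig")
label = IRI("http://www.w3.org/2000/01/rdf-schema#label")

ok = is_valid_triple(leipzig, label, Literal("Leipzig"))
bad = is_valid_triple(Literal("Leipzig"), label, leipzig)  # literal as subject
print(ok, bad)
```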

URI

  • URI = Uniform Resource Identifier
  • Used to create globally unique names for resources
  • Every object with a clear identity can be a resource
    • Books, places, organizations ...
  • In the books domain the ISBN serves the same purpose

URI Syntax

  • Extension of the URL concept
  • Not every URI denotes a web document, but the URL is often used as URI for web documents
  • Starts with the URI scheme, which is separated from the rest by ":"
    • examples: http, ftp, mailto, file
  • Typically hierarchical structure
    • [scheme:][//authority][path][?query][#fragment]
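
As a small illustration of this structure, Python's standard `urllib.parse.urlsplit` breaks a URI into the five components. The example URIs are hypothetical and were chosen only so that every component is present.

```python
from urllib.parse import urlsplit

# Hypothetical URI that exercises every component of the generic syntax.
parts = urlsplit("http://example.org:8080/people/alice?format=rdf#me")
print(parts.scheme)    # scheme
print(parts.netloc)    # authority
print(parts.path)      # path
print(parts.query)     # query
print(parts.fragment)  # fragment

# Not every URI has an authority: mailto URIs consist of scheme and path only.
mail = urlsplit("mailto:alice@example.org")
print(mail.scheme, repr(mail.netloc), mail.path)
```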

Self-defined URIs

  • Necessary if a resource has no URI yet or its URI is not known
  • Use HTTP URIs under your own website to avoid naming collisions
  • This also makes it possible to publish documentation about the resource at that location
  • Example: http://jens-lehmann.org/foaf.rdf#i


  • Separation of URI for …
    • a resource (a real-world thing)
    • and its documentation (e.g. an HTML page)
    … with the help of URI references (with “#”-attached fragments) or content negotiation
  • Example: URI for Shakespeare's "Othello":
    • bad (why?): http://de.wikipedia.org/wiki/Othello
    • good: http://de.wikipedia.org/wiki/Othello#URI

IRIs

  • IRI = Internationalized Resource Identifier
  • Generalization of URI concept
  • IRI can contain Unicode
  • Example:
    • http://www.example.org/Wüste
    • http://www.example.org/사막
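
Where a protocol or tool only accepts plain URIs, an IRI can be mapped to a URI by percent-encoding the UTF-8 bytes of its non-ASCII characters. The following sketch does this with the standard library. It is a simplification that only suits non-ASCII characters in the path, query or fragment; internationalized host names need separate treatment and are not handled here.

```python
from urllib.parse import quote, unquote

iri = "http://www.example.org/Wüste"

# Keep URI delimiters and existing percent signs, encode everything non-ASCII.
uri = quote(iri, safe=":/?#[]@!$&'()*+,;=%")
print(uri)

# Decoding the percent-escapes gives back the original IRI.
print(unquote(uri) == iri)
```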


Literals

  • Used to model data values
  • Representation as strings
  • Interpretation through datatype
  • Literals without datatype are treated as strings
  • Literals may never be the origin of an edge in an RDF graph (i.e. never a subject)
  • Edges may never be labeled with literals

Turtle Syntax

  • Language to serialize RDF Triples to strings
  • Turtle – Terse RDF Triple Language  
  • URIs in angle brackets: <http://dbpedia.org/resource/Leipzig>
  • Literals in quotes
    • "Leipzig"@de 
    • "51.333332"^^xsd:float
  • Triples are subject-predicate-object sentences terminated with a dot.
    <http://dbpedia.org/resource/Leipzig> <http://www.w3.org/2000/01/rdf-schema#label> "Leipzig"@de.
    
  • Whitespace and line breaks are ignored outside of identifiers
  • Status:  W3C Recommendation, http://www.w3.org/TR/turtle/

Turtle Abbreviations (1/2)

In Turtle one can use abbreviations

  • Syntax: @prefix abbr ':'  <URI> .
  • E.g. @prefix dbr:  <http://dbpedia.org/resource/> .

One can transform

<http://dbpedia.org/resource/Leipzig> <http://www.w3.org/2000/01/rdf-schema#label> "Leipzig"@de.

into

@prefix dbr: <http://dbpedia.org/resource/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
dbr:Leipzig rdfs:label "Leipzig"@de .
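
The transformation above can be sketched in a few lines of Python. The helper name `abbreviate` is hypothetical, and the sketch ignores Turtle's restrictions on which characters may appear in the local part of a prefixed name.

```python
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def abbreviate(iri, prefixes=PREFIXES):
    """Return prefix:local if a declared namespace matches, else <iri>."""
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return f"<{iri}>"

subject = abbreviate("http://dbpedia.org/resource/Leipzig")
predicate = abbreviate("http://www.w3.org/2000/01/rdf-schema#label")
line = f'{subject} {predicate} "Leipzig"@de .'
print(line)
```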

Turtle Abbreviations (2/2)

  • Triples with the same subject can be grouped together
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    ...
    @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
    
    dbr:Leipzig dbp:hasMayor dbr:Burkhard_Jung ;
                rdfs:label   "Leipzig"@de ;
                geo:lat      "51.333332"^^xsd:float ;
                geo:long     "12.383333"^^xsd:float .   
    
  • Even triples with the same subject and predicate can be grouped together
    @prefix dbr: <http://dbpedia.org/resource/> .
    @prefix dbp: <http://dbpedia.org/property/> .
    dbr:Leipzig dbp:locatedIn dbr:Saxony, dbr:Germany;
                dbp:hasMayor  dbr:Burkhard_Jung .
    

Literals II – Datatypes

  •  Example: xsd:decimal

Datatypes in RDF

  • So far: literals are untyped, treated as strings: "02" < "100" < "11" < "2"
  • Typing allows better, in other words, semantic interpretation of values
  • Datatypes are identified by URIs, which can be chosen freely
  • Typically, XML Schema datatypes (XSD) are used
  • Syntax: "data value"^^<datatype-URI>
  • rdf:HTML and rdf:XMLLiteral are the only predefined datatypes in RDF
    • Used for HTML and XML fragments

Example

 Graph:



Turtle:


@prefix dbr: <http://dbpedia.org/resource/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
dbr:Leipzig    geo:lat "51.333332"^^xsd:float ; geo:long "12.383333"^^xsd:float .

Language declaration

  • Influences only untyped literals
  • Example:  
  • In RDF 1.0 the following literals were all different, but implementations typically treated them the same.
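
The difference between the two RDF versions can be sketched in Python. The sketch assumes that the literals meant are the plain, the language-tagged and the `xsd:string`-typed form of the same string. The function names are hypothetical, and each literal is modelled as a simple tuple of lexical form, datatype IRI and language tag.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

def literal_rdf10(lexical, datatype=None, lang=None):
    """RDF 1.0 view: plain, language-tagged and typed literals are distinct."""
    return (lexical, datatype, lang)

def literal_rdf11(lexical, datatype=None, lang=None):
    """RDF 1.1 view: every literal has a datatype."""
    if lang is not None:
        return (lexical, RDF_LANG_STRING, lang.lower())
    return (lexical, datatype or XSD_STRING, None)

forms = [
    {"lexical": "Leipzig"},                          # "Leipzig"
    {"lexical": "Leipzig", "lang": "de"},            # "Leipzig"@de
    {"lexical": "Leipzig", "datatype": XSD_STRING},  # "Leipzig"^^xsd:string
]

distinct_rdf10 = len({literal_rdf10(**f) for f in forms})
distinct_rdf11 = len({literal_rdf11(**f) for f in forms})
print(distinct_rdf10, distinct_rdf11)
```

Under the RDF 1.1 view the plain and the `xsd:string`-typed literal collapse into one, while the language-tagged literal stays distinct.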


Advantages and Disadvantages of Turtle

  •  Advantages:
    • Concise, thus efficient to store
    • Easy to read for humans
  • Disadvantages:
    • Limited tool support so far (compared to RDF/XML)

N-Triples

  • N-Triples is a line-based, plain text format (http://www.w3.org/TR/n-triples/)
  • N-Triples is a subset of Turtle and Notation 3
    • Abbreviations and grouping not allowed
    • Originally limited to the ASCII character set (N-Triples 1.1 is UTF-8 encoded)
  • Every tool that accepts Turtle or Notation 3 as input therefore also accepts N-Triples
  • Don't confuse it with Notation 3: Notation 3 is a superset of Turtle and N-Triples
<http://www.w3.org/2001/sw/RDFCore/ntriples/> <http://purl.org/dc/elements/1.1/creator> "Dave Beckett".

<http://www.w3.org/2001/sw/RDFCore/ntriples/> <http://purl.org/dc/elements/1.1/creator> "Art Barstow".

<http://www.w3.org/2001/sw/RDFCore/ntriples/> <http://purl.org/dc/elements/1.1/publisher> <http://www.w3.org/>.
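
Because the format is line-based and uses no abbreviations, writing it is straightforward. The following Python sketch produces one N-Triples line per triple. The function names are hypothetical, and the sketch only covers IRI subjects and predicates, IRI objects and plain string literals (no blank nodes, datatypes or language tags).

```python
def escape(text):
    """Escape characters that must not appear raw inside a quoted literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )

def to_ntriples_line(subject, predicate, obj, obj_is_literal=False):
    """Serialise one triple as a single N-Triples line."""
    o = f'"{escape(obj)}"' if obj_is_literal else f"<{obj}>"
    return f"<{subject}> <{predicate}> {o} ."

line = to_ntriples_line(
    "http://www.w3.org/2001/sw/RDFCore/ntriples/",
    "http://purl.org/dc/elements/1.1/creator",
    "Dave Beckett",
    obj_is_literal=True,
)
print(line)
```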

N-Quads

  •  Extends N-Triples with context
<subject> <predicate> <object> <context> .
  • <context> may denote (in state-of-the-art RDF Stores) the provenance of data
    • useful when linking datasets
<http://example.org/bob/foaf.rdf#me> <http://xmlns.com/foaf/0.1/homepage> <http://example.org/bob/> <http://example.org/bob/foaf.rdf> .
  • <context> can be a URI or a blank node (the original N-Quads proposal also allowed a literal; W3C N-Quads 1.1 does not)

Why one should (not) use XML for RDF?

Why?

  • Better tool support in many programming languages and environments
  • XML is widespread in both business and academia
  • RDF standard states that if  RDF data is published it should be available in RDF/XML

Why not?

  • RDF/XML is complicated to understand due to the encoding of a graph in triples and finally in an XML tree
  • RDF/XML blows up files (might be mitigated by compression)
  • generates much overhead, since XML documents have to be parsed and the results additionally processed to obtain the RDF data

XML Syntax of RDF

  • Usage of XML namespaces to disambiguate tag names 
  • RDF tags have their own fixed namespace, usually abbreviated using prefix rdf
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns:dbp="http://dbpedia.org/property/"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig">
    <dbp:hasMayor rdf:resource="http://dbpedia.org/resource/Burkhard_Jung"/>
    <rdfs:label xml:lang="de">Leipzig</rdfs:label>
    <geo:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#float">51.3333</geo:lat>
    <geo:long rdf:datatype="http://www.w3.org/2001/XMLSchema#float">12.3833</geo:long>
  </rdf:Description>
</rdf:RDF>

XML Syntax: rdf:Description

  • Each rdf:Description element stands for a subject
    • URI is the value of rdf:about attribute
  • Each child element of rdf:Description stands for a predicate-object pair
    • Name of the child element is the predicate name
    • Value of rdf:resource is the URI of the object
<rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig">
  <rdfs:label xml:lang="de">Leipzig</rdfs:label>
</rdf:Description>

XML Syntax: Abbreviations

  • Literals can be written as free text inside the predicate element
  • One subject can contain several property elements
  • An object can be described in place by nesting a further rdf:Description inside the property element, e.g.
<rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig">
  <dbp:name>Leipzig</dbp:name>
  <dbp:hasMayor>
    <rdf:Description rdf:about="http://dbpedia.org/resource/Burkhard_Jung">
      <dbp:name>Burkhard Jung</dbp:name>
    </rdf:Description>
  </dbp:hasMayor>
  <geo:lat rdf:datatype=".../XMLSchema#float">51.3333</geo:lat>
  <geo:long rdf:datatype=".../XMLSchema#float">12.3833</geo:long>
</rdf:Description>

XML Syntax: Attributes for Literals

  • Literals can be expressed using XML attributes
  • Attribute names will be property URIs
  • Subject URI given by rdf:about
<rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig"
          dbp:name="Leipzig">
    <geo:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#float">51.3333</geo:lat>
    <geo:long rdf:datatype="http://www.w3.org/2001/XMLSchema#float">12.3833</geo:long>
</rdf:Description>

XML Syntax base URIs

  • Definition of a base URI, against which relative URIs resolve
  • Relative URIs have no scheme part.
  • Resolution is often string concatenation (base + relative), but also has more complicated cases (see RFC 3986)
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dbp="http://dbpedia.org/property/"
    xml:base="http://dbpedia.org/resource/" >
  <rdf:Description rdf:about="Berlin">
    <dbp:country rdf:resource="Germany"/>
  </rdf:Description>
</rdf:RDF>
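
To see where resolution differs from plain concatenation, the following sketch uses Python's standard `urllib.parse.urljoin`, which resolves references against a base in the manner of RFC 3986. The relative references other than `Berlin` are hypothetical examples.

```python
from urllib.parse import urljoin

base = "http://dbpedia.org/resource/"

simple = urljoin(base, "Berlin")                # same result as concatenation
dotted = urljoin(base, "../property/country")   # ".." removes a path segment
rooted = urljoin(base, "/ontology/City")        # leading "/" replaces the path

print(simple)
print(dotted)
print(rooted)
print(base + "../property/country")  # naive concatenation, for comparison
```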

Advantages and Disadvantages of RDF/XML

  • Advantages:
    • Good tool support
    • Reuse of XML transformation tools via XSLT
    • Parsing and in-memory representation via DOM/SAX
  • Disadvantages:
    • Very long and hard to read

RDFa Syntax

  • RDFa = RDF in attributes
  • Developed to embed RDF into HTML and XML
  • Embedded triples can be accessed or extracted
  • IRIs can be used
    (XML and HTML nowadays typically encoded as UTF-8 Unicode)
  • RDFa = Microformats done right ;-)

Motivation

 presentation vs. semantics

On the left, what browsers see. On the right, what humans see. Can we bridge the gap so that browsers see more of what we see?

The schema.org Vocabulary

schema.org :

Example: Movie description.

What the user sees:
Avatar
Director: James Cameron (born August 16, 1954)
Science fiction
Trailer

Movie homepage in HTML

What the browser sees:

<div class="movie">
  <h1>Avatar</h1>
  <div class="director">
    Director: James Cameron (born August 16, 1954)
  </div>
  <span class="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html">Trailer</a>
</div>

Movie homepage with RDFa

RDFa annotations using the schema.org vocabulary:

<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Avatar</h1>
  <div rel="director" typeof="Person">
    Director: <span property="name">James Cameron</span>
    (born <span property="birthDate" content="1954-08-16">August 16, 1954</span>)
  </div>
  <span property="genre" xml:lang="en">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.mp4" rel="trailer">Trailer</a>
</div>
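
To illustrate that such annotations can be read back by a program, here is a deliberately simplified Python sketch that collects the `property` values from flat markup like the above. It is not an RDFa processor: it ignores `vocab`, `typeof`, `rel` and nesting, and the class name `PropertyCollector` is hypothetical. It does show one detail of the annotation, namely that a `content` attribute takes precedence over the visible text.

```python
from html.parser import HTMLParser

class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from elements without nested markup."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = None  # property whose text is currently being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" in attrs:
            if "content" in attrs:
                # Machine-readable value overrides the element text.
                self.pairs.append((attrs["property"], attrs["content"]))
            else:
                self._open, self._text = attrs["property"], []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None:
            self.pairs.append((self._open, "".join(self._text).strip()))
            self._open = None

collector = PropertyCollector()
collector.feed(
    '<div vocab="http://schema.org/" typeof="Movie">'
    '<h1 property="name">Avatar</h1>'
    '<span property="birthDate" content="1954-08-16">August 16, 1954</span>'
    '<span property="genre">Science fiction</span>'
    "</div>"
)
pairs = collector.pairs
print(pairs)
```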

RDFa Visualised as a Graph

You can do this interactively with RDFa Play.

<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Avatar</h1>
  <div rel="director" typeof="Person">
    Director: <span property="name">James Cameron</span>
    (born <span property="birthDate">August 16, 1954</span>)
  </div>
  <span property="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" rel="trailer">Trailer</a>
</div>

Social Data with schema.org

Example: reviews of a movie

schema.org in a Search Engine

Google calls these “Rich Snippets”.

  • Observe the rich appearance of the 3rd result.
  • Note: Annotation does not influence ranking.

CURIEs

Short notation for URIs (Compact URIs)


prefix    ::= NCName

reference ::= ( ipath-absolute / ipath-rootless / ipath-empty ) 
[ "?" iquery ] [ "#" ifragment ] (as defined in [RFC3987])

curie     ::=  [ [ prefix ] ':' ] reference

RDFa requires the following context information:

  • the set of mappings from prefixes to URIs is provided by the current in-scope prefix declarations of the [current element] during parsing;
  • the mapping to use with the default prefix (e.g. :name) is the current default prefix mapping;
  • the mapping to use when there is no prefix is not defined for RDFa in HTML, which effectively prohibits the use of CURIEs that do not contain a colon
    • (however with default terms and a default vocabulary there are further convenience mechanisms)
  • the mapping to use with the '_' prefix is not explicitly stated, but since it is used to generate [bnode]s, its implementation needs to be compatible with the RDF definition.

RDFa Attributes

Some of the following attributes are reused from HTML; in particular, href and src are redundant with resource but convenient in HTML.

  • about a CURIE or IRI, used for stating what the data is about (a subject in RDF terminology);
  • content a CDATA string, for supplying machine-readable content for a literal (a literal object, in RDF terminology);
  • datatype a term or CURIE or absolute IRI representing a datatype, to express the datatype of a literal;
  • href a traditionally navigable IRI for expressing the partner resource of a relationship (a resource object, in RDF terminology);
  • inlist An attribute used to indicate that the object associated with a rel or property attribute on the same element is to be added to the list for that predicate. The value of this attribute must be ignored. Presence of this attribute causes a list to be created if it does not already exist.
  • prefix a white space separated list of prefix-name IRI pairs of the form NCName ':' ' '+ xsd:anyURI
  • property a white space separated list of terms or CURIEs or absolute IRIs, used for expressing relationships between a subject and either a resource object if given or some literal text (also a predicate);
  • rel a white space separated list of terms or CURIEs or absolute IRIs, used for expressing relationships between two resources (predicates in RDF terminology);
  • resource a CURIE or IRI for expressing the partner resource of a relationship that is not intended to be navigable (e.g., a “clickable” link) (also an object);
  • rev a white space separated list of terms or CURIEs or absolute IRIs, used for expressing reverse relationships between two resources (also predicates);
  • src an IRI for expressing the partner resource of a relationship when the resource is embedded (also a resource object);
  • typeof a white space separated list of terms or CURIEs or absolute IRIs that indicate the RDF type(s) to associate with a subject;
  • vocab an IRI that defines the mapping to use when a term is referenced in an attribute value.
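
How the `prefix` attribute and CURIEs work together can be sketched as follows. The two function names are hypothetical. The sketch assumes a well-formed attribute value and does not handle the default prefix, terms or the `_` prefix for blank nodes.

```python
def parse_prefix_attribute(value):
    """Turn 'p1: iri1 p2: iri2' into {'p1': 'iri1', 'p2': 'iri2'}."""
    tokens = value.split()
    mappings = {}
    for name, iri in zip(tokens[0::2], tokens[1::2]):
        mappings[name.rstrip(":")] = iri
    return mappings

def expand_curie(curie, mappings):
    """Replace the prefix of a CURIE by the IRI it is mapped to."""
    prefix, _, reference = curie.partition(":")
    return mappings[prefix] + reference  # KeyError if the prefix is undeclared

mappings = parse_prefix_attribute(
    "geo: http://www.w3.org/2003/01/geo/wgs84_pos# "
    "xsd: http://www.w3.org/2001/XMLSchema#"
)
lat = expand_curie("geo:lat", mappings)
print(lat)
```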

RDFa Example

<html prefix="rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
              rdfs: http://www.w3.org/2000/01/rdf-schema#
              xsd: http://www.w3.org/2001/XMLSchema#
              dbr: http://dbpedia.org/resource/
              dbp: http://dbpedia.org/property/
              geo: http://www.w3.org/2003/01/geo/wgs84_pos#">
  <head>
    <title>Leipzig</title>
  </head>
  <body about="dbr:Leipzig">
    <h1 property="rdfs:label" xml:lang="de">Leipzig</h1>
    <p>Leipzig is a city in Germany. It is located at latitude
      <span property="geo:lat" datatype="xsd:float">51.3333</span> and longitude
      <span property="geo:long" datatype="xsd:float">12.3833</span>.
    </p>
  </body>
</html>

RDFa Lite

<p vocab="http://schema.org/" typeof="Person">
  My name is <span property="name">Sören Auer</span>
  and you can give me a ring via <span property="telephone">49-341-97-32367</span>
  or visit <a property="url" href="http://aksw.org/SoerenAuer.html">my homepage</a>.
</p>

Advantages and Disadvantages of RDFa

 
  • Advantages
    • Integrated handling of human (HTML) and machine (RDF) representation
    • Re-uses a number of HTML features
    • "principles of interoperable metadata" are enforced by RDFa:
      • Publisher Independence – every web site can use its own representation
      • Data Reuse - data is not duplicated. Separate RDF and HTML sections are not required anymore for the same content.
      • Self Containment - RDF data is nevertheless separated from the HTML (sitting in special attributes, can be extracted)
      • Schema Modularity - attributes are reusable
      • Evolvability - additional fields can be added and XML transforms can extract the semantics of the data from an HTML file
  • Disadvantages
    • Readability is lower than with Turtle
    • Backwards compatibility with RDFa 1.0 adds some overhead.

JSON-LD

  • JSON = JavaScript Object Notation
  • JSON data are JavaScript data structures and can be interpreted via eval()
  • Parsers exist for most other programming languages as well
  • JSON-LD = JSON for Linking Data
  • W3C Recommendation of 2014     
  • Plain JSON example:
{
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
}
  • How can we enrich this with semantics?
  • JSON-LD answer: add an RDF context!  

JSON-LD Contexts

  • @context is a special keyword to make explicit the semantic context in which some JSON data is communicated.
  • The context includes, e.g., name-to-IRI mappings.
  • The @id keyword assigns IRIs to things.
{ "@context":
  { "name": "http://schema.org/name",
    "born": { "@id": "http://schema.org/birthDate",
              "@type": "http://www.w3.org/2001/XMLSchema#date" },
    "spouse": { "@id": "http://schema.org/spouse",
                "@type": "@id" } },
  "@id": "http://dbpedia.org/resource/John_Lennon",
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon" }
  • Note that the rest of the JSON is unchanged.
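
What the context contributes can be sketched in Python: each plain key is looked up in `@context` and replaced by its IRI, which turns the JSON object into triples. This is a toy for this one flat document, not the JSON-LD processing algorithm (no nesting, arrays, remote contexts or compact IRIs). The function name `to_triples` and the 4-tuple output shape are assumptions made for illustration.

```python
doc = {
    "@context": {
        "name": "http://schema.org/name",
        "born": {
            "@id": "http://schema.org/birthDate",
            "@type": "http://www.w3.org/2001/XMLSchema#date",
        },
        "spouse": {"@id": "http://schema.org/spouse", "@type": "@id"},
    },
    "@id": "http://dbpedia.org/resource/John_Lennon",
    "name": "John Lennon",
    "born": "1940-10-09",
    "spouse": "http://dbpedia.org/resource/Cynthia_Lennon",
}

def to_triples(document):
    """Return (subject, predicate IRI, value, type hint) for each plain key."""
    context = document["@context"]
    subject = document["@id"]
    triples = []
    for key, value in document.items():
        if key.startswith("@"):
            continue  # keywords are not properties
        definition = context[key]
        if isinstance(definition, str):
            triples.append((subject, definition, value, None))
        else:
            triples.append((subject, definition["@id"], value, definition.get("@type")))
    return triples

triples = to_triples(doc)
for triple in triples:
    print(triple)
```

A type hint of `"@id"` marks the value as an IRI rather than a string literal.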
  • It is possible to centralise context definitions in external locations and point to them:
{ "@context": "http://json-ld.org/contexts/person.jsonld",
  "@id": "http://dbpedia.org/resource/John_Lennon",
  "name": "John Lennon", ... }

RDF/JSON

{ "http://dbpedia.org/resource/Leipzig" : {
    "http://dbpedia.org/property/hasMayor": [
      { "type":"uri", "value":"http://dbpedia.org/resource/Burkhard_Jung" } ],
    "http://www.w3.org/2000/01/rdf-schema#label": [
      { "type":"literal", "value":"Leipzig", "lang":"en" } ],
    "http://www.w3.org/2003/01/geo/wgs84_pos#lat": [
      { "type":"literal", "value":"51.3333",
        "datatype":"http://www.w3.org/2001/XMLSchema#float" } ],
    "http://www.w3.org/2003/01/geo/wgs84_pos#long": [
      { "type":"literal", "value":"12.3833",
        "datatype":"http://www.w3.org/2001/XMLSchema#float" } ]
  }
}

RDF/JSON Syntax

  • An RDF/JSON document nests subject S, predicate P and object O as follows
     
    { "S" : { "P" : [ O ] } }      
    
  • type: one of "uri", "literal" or "bnode"; required and written in lower case
  • value: the data of the object (its URI, lexical value or blank node label)
    • Best practice: write out the whole URI
  • lang: language of a literal; optional, but if present it must not be empty
  • datatype: datatype of a literal object; optional
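
Because of this regular nesting, an RDF/JSON document can be flattened into triples with three loops. The following Python sketch assumes the structure described above; the function name `flatten` and the shortened sample document are chosen for illustration.

```python
rdf_json = {
    "http://dbpedia.org/resource/Leipzig": {
        "http://dbpedia.org/property/hasMayor": [
            {"type": "uri", "value": "http://dbpedia.org/resource/Burkhard_Jung"}
        ],
        "http://www.w3.org/2000/01/rdf-schema#label": [
            {"type": "literal", "value": "Leipzig", "lang": "en"}
        ],
    }
}

def flatten(document):
    """Yield (subject, predicate, object description) for every object entry."""
    for subject, predicates in document.items():
        for predicate, objects in predicates.items():
            for obj in objects:
                yield (subject, predicate, obj)

triples = list(flatten(rdf_json))
for s, p, o in triples:
    print(s, p, o["type"], o["value"])
```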

Advantages and Disadvantages of JSON-LD

  • Advantages:
    • Compact data format to exchange data between applications
    • Very good tool support (almost every programming language supports JSON)
    • Less overhead while parsing and serialization than XML
  • Disadvantages
    • RDF structures that go beyond key/value pairs (i.e. property/object pairs attached to a given subject) are not as easy to read for humans as in Turtle

IRI Serialization (1/2)

  • IRIs are a URI generalization that allows Unicode characters.
    • Defined in RFC 3987
  • Only the following formats are fully compatible with the IRI RFC
    • RDFa
    • Notation 3
    • JSON-LD (the obsolete RDF/JSON as well)
  • NTriples & NQuads in their original form do not support IRIs
    • Both use 7-bit US-ASCII character encoding (the RDF 1.1 versions are UTF-8 encoded and accept IRIs)
    • http://www.w3.org/TR/rdf-testcases/#character
       

IRI Serialization (2/2)

  •  RDF/XML & Turtle provide partial IRI support
    • Their grammar is not fully mapped to the IRI grammar
  • In RDF/XML predicates must be declared as XML Elements
  • Turtle is fully compatible only when using absolute IRIs

Syntax and Usage of data types

  • Difference between lexical and value domain
    • Lexical: "3.14", "+04.1300", "-2.5"
    • Value: 3.14, 4.13, -2.5
  • Untyped literals are treated like character sequences
    • "02" < "100" < "11" < "2" (lexicographic order)
  • Typing allows values to be handled 'semantically'
    • Datatypes are identified by URIs
    • Syntax: "VALUE"^^<datatype-URI>
    • In fact, datatype URIs can be chosen freely
  • Most commonly, XML Schema datatypes (XSD) are used
    • Further complexity beyond this should be modelled using additional RDF properties.
    • Example: "2,718 km"
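
Both points, the mapping from lexical forms to values and the effect of typing on ordering, can be tried out in Python. The sketch uses `decimal.Decimal` as a stand-in for `xsd:decimal`, which is an approximation made for illustration rather than a full implementation of the XSD datatype.

```python
from decimal import Decimal

# Different lexical forms can denote the same value.
same_value = Decimal("+04.1300") == Decimal("4.13")
print(same_value)

# Untyped literals compare as character sequences, typed ones by value.
untyped = ["2", "11", "100", "02"]
as_strings = sorted(untyped)
as_numbers = sorted(untyped, key=Decimal)
print(as_strings)
print(as_numbers)
```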

Example for data type usage

Turtle:
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
dbr:Leipzig    geo:lat     "51.333332"^^xsd:float ;
               geo:long    "12.383333"^^xsd:float .
XML:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
<rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig">
   <geo:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#float">51.33</geo:lat>
   <geo:long rdf:datatype="http://www.w3.org/2001/XMLSchema#float">12.38</geo:long>
</rdf:Description>
</rdf:RDF>

Predefined Data Type

  • rdf:HTML and rdf:XMLLiteral are the only datatypes predefined by RDF
  • Used for arbitrary but balanced XML/HTML fragments

RDF/XML has the following special syntax for the rdf:XMLLiteral datatype:

<rdf:Description rdf:about="http://example.org/SemanticWeb">
<ex:Titel rdf:parseType="Literal">
  <b>Semantic Web</b>
  <br />
  Grundlagen
</ex:Titel>
</rdf:Description>

Language definitions

Language information influences only untyped literals

XML:
<rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig">
   <rdfs:label xml:lang="de">Leipzig</rdfs:label>
   <rdfs:label xml:lang="ru">Лейпциг</rdfs:label>
</rdf:Description>
Turtle:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://dbpedia.org/resource/Leipzig>   rdfs:label     "Leipzig"@de ,
                                                       "Лейпциг"@ru .
According to the RDF 1.1 specification, in the following example the 2nd literal has a different type from the 1st and 3rd (although implementations often handle them similarly). The 1st and 3rd are the same literal, which is a change from RDF 1.0.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbr: <http://dbpedia.org/resource/> .
dbr:Leipzig    rdfs:label     "Leipzig" ,
                              "Leipzig"@de ,
                              "Leipzig"^^xsd:string .

References

Exercise 1 - dataset modeling

  1. Think of a little dataset that models a person with his/her personal information such as contact info and links to homepage.

  2. Write it down in:
    1. RDF/XML
    2. JSON-LD
    3. Turtle
    4. N-Triples (i.e. no abbreviations)

Exercise 2

  1. Think of a little dataset that models something from your daily life
    1. It has to have at least 5 resources (types being from two different vocabularies) and 3 literals
  2. Write it down in:
    1. RDF/XML
    2. JSON-LD
    3. Turtle
    4. N-Triples (i.e. no abbreviations)
  3. Measure the number of characters you need without whitespaces
  4. Try to compress your dataset as much as you can and tell us about your compression factor (baseline: N-Triples)
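
One possible way to do the measuring in steps 3 and 4 is sketched below in Python. It interprets the compression factor as the non-whitespace length of the N-Triples baseline divided by that of the other serialization; this reading of the exercise, the function names and the sample data are assumptions.

```python
def non_whitespace_length(text):
    """Count all characters that are not whitespace."""
    return sum(1 for ch in text if not ch.isspace())

def compression_factor(baseline, candidate):
    """Values above 1 mean the candidate is shorter than the baseline."""
    return non_whitespace_length(baseline) / non_whitespace_length(candidate)

ntriples = (
    "<http://dbpedia.org/resource/Leipzig> "
    '<http://www.w3.org/2000/01/rdf-schema#label> "Leipzig"@de .'
)
turtle = (
    "@prefix dbr: <http://dbpedia.org/resource/> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    'dbr:Leipzig rdfs:label "Leipzig"@de .'
)

factor = compression_factor(ntriples, turtle)
print(non_whitespace_length(ntriples), non_whitespace_length(turtle), factor)
```

For a single triple the prefix declarations cost more than they save, so the factor is below 1 here; abbreviations pay off once a namespace is used several times.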

Create a small dataset modelling some domain you know well (e.g. family relationships).

Here is a possible vocabulary:
  • In the lecture on RDFS we'll see how this can be modelled more formally.
  • Note: Top 100 namespaces can be found in: http://goo.gl/fU8JNS

Populate the dataset with at least 3 instances

Write it down in: RDF/XML

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/family#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/family#DoeFamily">
    <ex:familyName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Doe</ex:familyName>
    <ex:streetAddress rdf:datatype="http://www.w3.org/2001/XMLSchema#string">High Street</ex:streetAddress>
    <ex:hasNeighbour rdf:resource="http://example.org/family#SmithFamily" />
    <ex:livesIn rdf:resource="http://dbpedia.org/resource/Bonn" />
    <ex:hasMember>
      <foaf:Person rdf:about="http://example.org/family#John" />
    </ex:hasMember>
    <ex:hasMember>
      <foaf:Person rdf:about="http://example.org/family#Jenny" />
    </ex:hasMember>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/family#Bonn">
  </rdf:Description>
</rdf:RDF>

Write it down in: JSON-LD

Write it down in: Turtle

 

Write it down in: N-Triples (no abbreviations)

<http://example.org/family#DoeFamily> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/family#Family> .
<http://example.org/family#DoeFamily> <http://example.org/family#familyName> "Doe"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://example.org/family#DoeFamily> <http://example.org/family#streetAddress> "High Street"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://example.org/family#DoeFamily> <http://example.org/family#hasNeighbour> <http://example.org/family#SmithFamily> .
<http://example.org/family#DoeFamily> <http://example.org/family#livesIn> <http://dbpedia.org/resource/Bonn> .
<http://example.org/family#DoeFamily> <http://example.org/family#hasMember> <http://example.org/family#John> .
<http://example.org/family#DoeFamily> <http://example.org/family#hasMember> <http://example.org/family#Jenny> .
<http://example.org/family#Bonn> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Place> .
<http://example.org/family#John> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/family#Jenny> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .

Measure the number of characters you need, excluding whitespace

  • RDF/XML – 776 characters
  • JSON-LD – 515 characters
  • Turtle – 380 characters
  • N-Triples – 878 characters
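The compression factor asked for in the exercise can be derived from these counts. The following is a small sketch, assuming the factor is defined as the N-Triples size divided by the size of the other serialisation; the function names are illustrative only.

```python
# Character counts (whitespace excluded) as reported above.
sizes = {"RDF/XML": 776, "JSON-LD": 515, "Turtle": 380, "N-Triples": 878}

def count_non_whitespace(text):
    """Count the characters of a serialisation, ignoring all whitespace."""
    return sum(1 for ch in text if not ch.isspace())

def compression_factor(size, baseline):
    """How many times smaller a serialisation is than the baseline."""
    return baseline / size

baseline = sizes["N-Triples"]
factors = {name: round(compression_factor(n, baseline), 2)
           for name, n in sizes.items()}

for name, factor in factors.items():
    print(f"{name}: {factor}")
```

With these numbers Turtle is a bit more than twice as compact as the N-Triples baseline.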

Compressing Ontology

@prefix dbo: .
@prefix foaf: .
@prefix rdf: .
@prefix rdfs: .
@prefix rel: .
@prefix xml: .
@prefix xsd: .
rel:Family a rdfs:Class .
rel:familyName a rdf:Property ;
    rdfs:domain rel:Family ;
    rdfs:range xsd:string .
rel:hasMember a rdf:Property ;
    rdfs:domain rel:Family ;
    rdfs:range foaf:Person .
rel:hasNeighbour a rdf:Property ;
    rdfs:domain rel:Family ;
    rdfs:range rel:Family .
rel:livesIn a rdf:Property ;
    rdfs:domain rel:Family ;
    rdfs:range dbo:Place .
rel:streetAddress a rdf:Property ;
    rdfs:domain rel:Family ;
    rdfs:range xsd:string .

 

Linked Data Stack – RDFS

[Figure: Linked Data stack. Layers: User Interface & Applications; Trust; Proof; Unifying Logic; Ontology: OWL; Rules: RIF; Query: SPARQL; RDF Schema; Data Interchange: RDF; XML; URI; Unicode; Crypto alongside the stack.]

Goals

  • Understand for which use cases RDF Schema is suited
  • Understand the semantics of RDF Schema
    • Be able to read RDF Schema
    • Be able to create your own RDF Schema
  • Know its limitations

Prerequisites

  • Basic knowledge of RDF stack
  • Know the RDF data model

What is RDF Schema?

We can use RDF triples to express facts like:

 ex:AlbertEinstein   ex:discovered   ex:TheoryOfRelativity .

But how can we refine such a fact?

  • How can we define that the predicate ex:discovered has a person as subject and a theory as object?
  • How can we express that Albert Einstein was a researcher and that every researcher is a human?

Such knowledge is called schema knowledge or terminological knowledge

RDF Schema gives us the possibility to model such knowledge

RDF Schema (short: RDFS)

  • is a part of the W3C RDF recommendation family
  • used for schema/terminological knowledge
  • itself is an RDF vocabulary (thus every RDF Schema graph is an RDF graph)
  • its vocabulary is generic (not bound to a specific application area)
  • allows us to specify the semantics of user-defined RDF vocabularies

The Namespace of RDF Schema is http://www.w3.org/2000/01/rdf-schema#

(common prefix: rdfs)

Classes

A Class is a set of things (or entities). In RDF these things are identified by URIs.

Membership of an entity in a class is stated using the rdf:type property.

The fact that ex:MyBlueVWGolf is a member/instance of the class ex:Car can be expressed:

ex:MyBlueVWGolf   rdf:type   ex:Car .

A resource can belong to several classes:

ex:MyBlueVWGolf   rdf:type   ex:Car .
ex:MyBlueVWGolf rdf:type ex:GermanProduct .



Hierarchies of Classes

Classes can be arranged in hierarchies using the rdfs:subClassOf property.

Every ex:Car is an ex:MotorVehicle:

ex:Car   rdfs:subClassOf   ex:MotorVehicle .


Implicit Knowledge

Using the schema definition we are able to identify implicit knowledge.

ex:MyBlueVWGolf  rdf:type         ex:Car .
ex:Car rdfs:subClassOf ex:MotorVehicle .

implicitly contains the following statement as a logical consequence

ex:MyBlueVWGolf   rdf:type   ex:MotorVehicle .



Implicit Knowledge

The statements

ex:Car           rdfs:subClassOf   ex:MotorVehicle .
ex:MotorVehicle rdfs:subClassOf ex:Vehicle .

implicitly contain the following statement as a logical consequence

ex:Car   rdfs:subClassOf   ex:Vehicle .

We can see that rdfs:subClassOf is transitive.
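The two inferences shown so far (type inheritance along rdfs:subClassOf and transitivity of rdfs:subClassOf) can be sketched as a small fixpoint computation. This is a minimal illustration, assuming triples are plain Python tuples of prefixed names; it covers only these two rules and is not a complete RDFS reasoner.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def rdfs_closure(triples):
    """Apply the two rules until no new triple can be derived."""
    graph = set(triples)
    while True:
        new = set()
        for (s, p, o) in graph:
            if p != SUBCLASS:
                continue
            for (s2, p2, o2) in graph:
                # c1 subClassOf c2, c2 subClassOf c3  =>  c1 subClassOf c3
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # x type c1, c1 subClassOf c2  =>  x type c2
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        if new <= graph:
            return graph
        graph |= new

graph = rdfs_closure({
    ("ex:MyBlueVWGolf", TYPE, "ex:Car"),
    ("ex:Car", SUBCLASS, "ex:MotorVehicle"),
    ("ex:MotorVehicle", SUBCLASS, "ex:Vehicle"),
})
print(("ex:Car", SUBCLASS, "ex:Vehicle") in graph)
print(("ex:MyBlueVWGolf", TYPE, "ex:Vehicle") in graph)
```
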

Defining Your Own Class

Every URI denoting a class is an instance of rdfs:Class. To define a class of our own we write:

ex:Car   rdf:type   rdfs:Class .


Which makes rdfs:Class itself an instance of rdfs:Class.

rdfs:Class   rdf:type   rdfs:Class .


Equivalence of Classes

To express the equivalence of two classes we can use

ex:Car         rdfs:subClassOf   ex:Automobile .
ex:Automobile rdfs:subClassOf ex:Car .

By transitivity, these two statements entail

ex:Car   rdfs:subClassOf   ex:Car .

In fact, rdfs:subClassOf is reflexive: every class is a subclass of itself, even without the two statements above.


Predefined RDFS Classes

Besides rdfs:Class there are several other predefined classes:

  • rdfs:Resource is the class of all things. It is the superclass of all classes.
  • rdf:Property is the class of all properties.
  • rdfs:Datatype is the class of all datatypes.
    (Every instance of this class is a subclass of rdfs:Literal.)
  • rdfs:Literal is the class of literal values such as strings or integers.
    (A typed literal is an instance of its datatype, which in turn is an instance of rdfs:Datatype.)
  • rdf:langString is the class of language-tagged string literals. It is an instance of rdfs:Datatype and a subclass of rdfs:Literal.
  • rdf:XMLLiteral is the class of XML literal values. It is a subclass of rdfs:Literal and an instance of rdfs:Datatype.
  • rdf:Statement is the class of RDF statements. Every RDF triple can thus be described by an instance of this class with an rdf:subject, an rdf:predicate and an rdf:object property.
  • rdfs:Container is a super-class of the RDF Container classes.
    (i.e. rdf:Bag, rdf:Seq, rdf:Alt)

Yes, rdf:Property, rdf:XMLLiteral, rdf:Statement, etc., are in the RDF vocabulary already, but only RDFS declares them to be classes.

Defining Your Own Property

Just as we can define classes, we can define new properties.

To state that there is a new property, we declare it to be an instance of the class rdf:Property.

ex:drives   rdf:type   rdf:Property .

With this new Property we can express that Max drives a VW Golf (not just any one, but a specific one).

ex:Max   ex:drives   ex:MyBlueVWGolf .


Hierarchies of Properties

Using the rdfs:subPropertyOf property we can define a hierarchy of properties.

ex:drives   rdfs:subPropertyOf   ex:controls .

(You see that a vocabulary is often an idealized model of the real world!)

With the former statement

ex:Max   ex:drives   ex:MyBlueVWGolf .

We can infer that

ex:Max   ex:controls   ex:MyBlueVWGolf .
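The inference along rdfs:subPropertyOf can be sketched in the same style. This is a minimal illustration with triples as plain tuples, implementing only the rule "p subPropertyOf q, s p o, therefore s q o"; the function name is made up for this example.

```python
SUBPROP = "rdfs:subPropertyOf"

def apply_subproperty(triples):
    """Copy every statement upwards along rdfs:subPropertyOf until stable."""
    graph = set(triples)
    changed = True
    while changed:
        changed = False
        for (p, rel, q) in list(graph):
            if rel != SUBPROP:
                continue
            for (s, p2, o) in list(graph):
                # p subPropertyOf q, s p o  =>  s q o
                if p2 == p and (s, q, o) not in graph:
                    graph.add((s, q, o))
                    changed = True
    return graph

result = apply_subproperty({
    ("ex:drives", SUBPROP, "ex:controls"),
    ("ex:Max", "ex:drives", "ex:MyBlueVWGolf"),
})
print(("ex:Max", "ex:controls", "ex:MyBlueVWGolf") in result)
```
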

Range and Domain of Properties

A property can have a domain and a range, which specify which class the subject and the object, respectively, must have.

ex:Max   ex:drives   ex:MyBlueVWGolf .
^^^^^^               ^^^^^^^^^^^^^^^
Domain                    Range

Using the Properties rdfs:domain and rdfs:range we can define the Domain and Range of a Property.

ex:drives   rdfs:domain   ex:Person .
ex:drives   rdfs:range    ex:Vehicle .

The same can be done for datatypes

ex:hasAge   rdfs:range   xsd:nonNegativeInteger .

Important to understand:

  1. “must have” above is not a constraint in the sense of “if ex:MyBlueVWGolf is not an ex:Vehicle, then the RDF statement above is illegal”.
  2. It means “given that ex:MyBlueVWGolf is used with ex:drives, we know that it is an ex:Vehicle (in addition to whatever else it may be)”.
  3. Possibility (1) wouldn't make sense, as in RDF Schema there is no way of expressing that something is not an instance of some class.
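The point that domain and range produce new type statements instead of rejecting data can be sketched as follows. This is an illustrative sketch with triples as plain tuples; `infer_types` is a hypothetical helper, not a validator and not part of any library.

```python
def infer_types(triples):
    """Derive rdf:type triples from rdfs:domain and rdfs:range statements."""
    domains = {(s, o) for (s, p, o) in triples if p == "rdfs:domain"}
    ranges = {(s, o) for (s, p, o) in triples if p == "rdfs:range"}
    inferred = set()
    for (s, p, o) in triples:
        for (prop, cls) in domains:
            if p == prop:
                # The subject of the property is an instance of the domain.
                inferred.add((s, "rdf:type", cls))
        for (prop, cls) in ranges:
            if p == prop:
                # The object of the property is an instance of the range.
                inferred.add((o, "rdf:type", cls))
    return inferred

data = [
    ("ex:drives", "rdfs:domain", "ex:Person"),
    ("ex:drives", "rdfs:range", "ex:Vehicle"),
    ("ex:Max", "ex:drives", "ex:MyBlueVWGolf"),
]
inferred = infer_types(data)
print(sorted(inferred))
```

Nothing is ever rejected here: whatever appears as the object of ex:drives simply becomes an ex:Vehicle in addition to its other types.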

Multiple Range Statements

The statements

ex:drives   rdfs:range   ex:Car .
ex:drives   rdfs:range   ex:Ship .

mean that every object of ex:drives is both an ex:Car and an ex:Ship!

If you want to express that the object of the property has to be a car or a ship, it is better to introduce a common superclass:

ex:Car     rdfs:subClassOf   ex:Vehicle .
ex:Ship rdfs:subClassOf ex:Vehicle .
ex:drives  rdfs:range   ex:Vehicle .

Implicit Knowledge

Once we have defined the domain and range of properties, we have to be aware that using these properties can lead to unintended consequences.

The schema

ex:isMarriedTo    rdfs:domain  ex:Person .
ex:isMarriedTo rdfs:range ex:Person .
ex:instituteAKSW rdf:type ex:Institution .

and the additional statement

ex:Max   ex:isMarriedTo   ex:instituteAKSW .

leads to the logical consequence

ex:instituteAKSW   rdf:type   ex:Person .

False Conclusion

Some people might be confused about the combination of specifying domain and range of a property and the hierarchy of the classes. So we want to look at an example:

ex:drives  rdfs:range       ex:Car .     # (1)
ex:Car     rdfs:subClassOf  ex:Vehicle . # (2)

These two triples do not entail the following relation

ex:drives   rdfs:range   ex:Vehicle .    # (3)

I.e. we do not gain new terminological knowledge. Still, of any concrete triple having the predicate ex:drives, we know that its object is of type ex:Vehicle – which is the same effect as if we had statement (3) in our schema.
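The difference between the two effects can be made visible with a small closure computation. This sketch implements only the range rule and the subclass rule on triples represented as tuples; the concrete triple about ex:Max is added here as an assumed example.

```python
def closure(triples):
    """Apply the range rule and the subclass rule until nothing changes."""
    graph = set(triples)
    while True:
        new = set()
        for (s, p, o) in graph:
            for (s2, p2, o2) in graph:
                # p range c, x p y  =>  y type c
                if p == "rdfs:range" and p2 == s:
                    new.add((o2, "rdf:type", o))
                # c subClassOf d, x type c  =>  x type d
                if p == "rdfs:subClassOf" and p2 == "rdf:type" and o2 == s:
                    new.add((s2, "rdf:type", o))
        if new <= graph:
            return graph
        graph |= new

result = closure({
    ("ex:drives", "rdfs:range", "ex:Car"),        # (1)
    ("ex:Car", "rdfs:subClassOf", "ex:Vehicle"),  # (2)
    ("ex:Max", "ex:drives", "ex:MyBlueVWGolf"),   # assumed instance data
})
print(("ex:drives", "rdfs:range", "ex:Vehicle") in result)       # (3) is not derived
print(("ex:MyBlueVWGolf", "rdf:type", "ex:Vehicle") in result)   # but the object is typed
```
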

Container Class

RDF Schema defines the rdfs:Container class which is a superclass for the Containers defined by RDF:

  • rdf:Seq
  • rdf:Bag
  • rdf:Alt

Container Membership

RDF Schema defines a new class and a new property for working with containers:

  • The rdfs:ContainerMembershipProperty class, which contains all properties that are used to state that a resource is a member of a container (e.g. rdf:_1, rdf:_2, …).
  • The rdfs:member property, which is a superproperty of all container membership properties.
    (So every instance of rdfs:ContainerMembershipProperty is an rdfs:subPropertyOf rdfs:member.)

Reification

How can we state in RDF
"The detective supposes that the butler killed the gardener." ?

ex:Detective   ex:supposes   "The butler killed the gardener" .
ex:Detective   ex:supposes   ex:theButlerKilledTheGardener .

Both ways are unsatisfactory. What we would like to talk about is the triple

ex:Butler   ex:killed   ex:Gardener .

itself.

Reification

With the class rdf:Statement, RDF Schema makes reification possible. An rdf:Statement has the following properties:

  • rdf:subject, referring to an rdfs:Resource which is the subject of the statement
  • rdf:predicate, referring to an rdf:Property which is the predicate of the statement
  • rdf:object, referring to an rdfs:Resource which is the object of the statement

Using Reification we can see an RDF triple as a Resource and formulate facts about it. (e.g. that the theory hasn't been proved)

ex:Detective  ex:supposes    _:theory .
_:theory      rdf:type       rdf:Statement .
_:theory      rdf:subject    ex:Butler .
_:theory      rdf:predicate  ex:killed .
_:theory      rdf:object     ex:Gardener .
_:theory      ex:hasState    "unproved" .

Note that the following statement is not a logical consequence of this:

ex:Butler   ex:killed   ex:Gardener .

If the RDF(S) specification said that it is a logical consequence, this wouldn't make sense – it would prevent us from being able to talk about unproved or wrong statements.
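A short sketch shows that describing a triple is different from asserting it. It assumes triples are plain tuples and uses a hypothetical helper `reify` that produces the four describing triples.

```python
def reify(node, triple):
    """Return the triples that describe `triple` as an rdf:Statement."""
    s, p, o = triple
    return {
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    }

claim = ("ex:Butler", "ex:killed", "ex:Gardener")

graph = reify("_:theory", claim)
graph.add(("ex:Detective", "ex:supposes", "_:theory"))
graph.add(("_:theory", "ex:hasState", '"unproved"'))

# The graph talks about the claim without containing it.
print(claim in graph)
```
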

Reification

[Figure: graph of the reified statement. ex:Detective ex:supposes a node of rdf:type rdf:Statement with rdf:subject ex:Butler, rdf:predicate ex:killed, rdf:object ex:Gardener and ex:hasState "unproven".]

Supplementary Information

RDF Schema gives the possibility to add additional information to resources using the following Properties:

  • rdfs:label can be used to give a human readable name for a resource.
  • rdfs:comment for adding a longer comment or explanation.
  • rdfs:seeAlso points to a URI where additional information about the resource can be found.
  • rdfs:isDefinedBy points to a URI where the resource is defined.
    (rdfs:isDefinedBy is a subproperty of rdfs:seeAlso)
ex:VWGolf   rdfs:label        "VW Golf" .
ex:VWGolf   rdfs:comment   "The VW Golf is a popular German car..." .
ex:VWGolf   rdfs:seeAlso   wikipedia:VW_Golf .
ex:VWGolf   rdfs:isDefinedBy  ex2:VolkswagenDataset .

The advantage of using these properties is that the additional information is represented as structured RDF, too.

Define acid and base

We want to implement a system that calculates the amount of an acid or a base needed to neutralize a given solution.

Our system therefore has to handle information about acids and bases. We define our own schema for this data.

We start by defining acid and base as classes of their own:

ex:Acid   rdf:type   rdfs:Class .
ex:Base rdf:type rdfs:Class .

Both can be described as chemical compounds. So we define ex:ChemicalCompound as a class and make the other two classes subclasses of it.

ex:ChemicalCompound   rdf:type   rdfs:Class .
ex:Acid rdfs:subClassOf ex:ChemicalCompound .
ex:Base rdfs:subClassOf ex:ChemicalCompound .

Define some instances

After that, we are able to add some acids and bases. For example:

  • hydrogen chloride (ex:HCl)
  • phosphoric acid (ex:H3PO4)
  • Sodium hydroxide (ex:NaOH) and
  • Calcium hydroxide (ex:Ca_OH2).
ex:HCl     rdf:type   ex:Acid .
ex:H3PO4 rdf:type ex:Acid .
ex:NaOH rdf:type ex:Base .
ex:Ca_OH2 rdf:type ex:Base .

Intermediate result

The Picture shows what we have done so far.

[Figure: ex:Acid and ex:Base are each rdfs:subClassOf ex:ChemicalCompound; ex:HCl and ex:H3PO4 are rdf:type ex:Acid; ex:NaOH and ex:Ca_OH2 are rdf:type ex:Base.]

Add the molar mass

After creating the classes we want to add a typical chemical property, the molar mass. First we define it as a property.

ex:hasMolarMass   rdf:type   rdf:Property .

Every chemical compound can carry information about its molar mass.

ex:hasMolarMass   rdfs:domain   ex:ChemicalCompound .

The molar mass itself could be defined as an own datatype with some additional information. But for our small example we want to use a simple existing floating point datatype. So the Range of the ex:hasMolarMass property is

ex:hasMolarMass   rdfs:range   xsd:float .

After defining the property we can use it:

ex:NaOH   ex:hasMolarMass   "39.9971"^^xsd:float .

Add additional Properties

As a general difference between an acid and a base we want to define that an acid can donate protons and a base can accept protons. Therefore we want to define two properties with which we can store how many protons can be accepted or donated.

ex:canDonateProtons   rdf:type     rdf:Property .
ex:canDonateProtons   rdfs:domain  ex:Acid .
ex:canDonateProtons   rdfs:range   xsd:integer .

ex:canAcceptProtons   rdf:type     rdf:Property .
ex:canAcceptProtons   rdfs:domain  ex:Base .
ex:canAcceptProtons   rdfs:range   xsd:integer .

Using these new properties we can distinguish the acids and bases by their strength (that is, by the number of protons they donate or accept).

ex:HCl     ex:canDonateProtons  "1"^^xsd:integer .
ex:H3PO4   ex:canDonateProtons  "3"^^xsd:integer .
ex:NaOH    ex:canAcceptProtons  "1"^^xsd:integer .
ex:Ca_OH2  ex:canAcceptProtons  "2"^^xsd:integer .

Define solution

Our program has to work with solutions which contain different amounts of chemical compounds. So we have to define the class ex:Solution and a property describing its mass.

ex:Solution  rdf:type     rdfs:Class .

ex:hasMass   rdf:type     rdf:Property .
ex:hasMass   rdfs:domain  ex:Solution .
ex:hasMass   rdfs:range   xsd:float .

Define ingredient

Our program has to work with solutions which contain different amounts of chemical compounds. However, a single property stating that a chemical compound is contained in a solution is not expressive enough, because we also need to be able to state the amount of this compound.

So we have to define the class ex:Ingredient and a property for describing the amount as a percentage.

ex:Ingredient  rdf:type     rdfs:Class .

ex:hasAmount   rdf:type     rdf:Property .
ex:hasAmount   rdfs:domain  ex:Ingredient .
ex:hasAmount   rdfs:range   xsd:float .

Additionally this class needs two Properties to be connected to an ex:Solution and an ex:ChemicalCompound.

ex:isPartOf   rdf:type     rdf:Property .
ex:isPartOf   rdfs:domain  ex:Ingredient .
ex:isPartOf   rdfs:range   ex:Solution .

ex:contains   rdf:type     rdf:Property .
ex:contains   rdfs:domain  ex:Ingredient .
ex:contains   rdfs:range   ex:ChemicalCompound .

Using the schema

Now our small schema is able to express all information which a program needs to solve the following typical task in chemistry:

You have 100g of a solution which consists of 20% phosphoric acid. How much of a 10% sodium hydroxide solution do you need to neutralize this solution?

We could start by inserting the information contained inside the task description into our knowledge base (which relies on our schema):

ex:GivenSolution      rdf:type      ex:Solution .
ex:GivenSolution      ex:hasMass    "100"^^xsd:float .

ex:PhosAcidIng20Perc  rdf:type      ex:Ingredient .
ex:PhosAcidIng20Perc  ex:hasAmount  "20"^^xsd:float .
ex:PhosAcidIng20Perc  ex:contains   ex:H3PO4 .
ex:PhosAcidIng20Perc  ex:isPartOf   ex:GivenSolution .

ex:SearchedSolution   rdf:type      ex:Solution .

ex:SodiumHyIng10Perc  rdf:type      ex:Ingredient .
ex:SodiumHyIng10Perc  ex:hasAmount  "10"^^xsd:float .
ex:SodiumHyIng10Perc  ex:contains   ex:NaOH .
ex:SodiumHyIng10Perc  ex:isPartOf   ex:SearchedSolution .

Using the schema

The program could query the following needed information from our knowledge base:

ex:H3PO4  ex:hasMolarMass      "97.995"^^xsd:float .
ex:H3PO4  ex:canDonateProtons  "3"^^xsd:integer .

ex:NaOH   ex:hasMolarMass      "39.9971"^^xsd:float .
ex:NaOH   ex:canAcceptProtons  "1"^^xsd:integer .


Relying on all this information, which was modeled using our own schema, the program could easily compute the result for the given task:

We need 244.9g of the sodium hydroxide solution for neutralizing the given phosphoric acid solution.
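The arithmetic behind this result can be sketched as follows. The function and parameter names are illustrative and not part of the schema; the calculation assumes complete neutralization, i.e. every donated proton is accepted by the base.

```python
def neutralizing_solution_mass(acid_solution_mass, acid_percent,
                               acid_molar_mass, protons_donated,
                               base_percent, base_molar_mass,
                               protons_accepted):
    """Mass (g) of base solution needed to neutralize the acid solution."""
    acid_mass = acid_solution_mass * acid_percent / 100   # g of pure acid
    acid_moles = acid_mass / acid_molar_mass              # mol of acid
    proton_moles = acid_moles * protons_donated           # mol of protons to neutralize
    base_moles = proton_moles / protons_accepted          # mol of base required
    base_mass = base_moles * base_molar_mass              # g of pure base
    return base_mass * 100 / base_percent                 # g of base solution

mass = neutralizing_solution_mass(
    acid_solution_mass=100, acid_percent=20,
    acid_molar_mass=97.995, protons_donated=3,    # ex:H3PO4
    base_percent=10,
    base_molar_mass=39.9971, protons_accepted=1,  # ex:NaOH
)
print(round(mass, 1))
```

Step by step: 20 g of phosphoric acid are about 0.204 mol, which donate about 0.612 mol of protons; neutralizing them takes about 24.49 g of sodium hydroxide, i.e. about 244.9 g of the 10% solution.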

RDF and RDF Schema

RDF Schema extends RDF with a special Vocabulary for terminological knowledge.

The picture shows the different kinds of knowledge and the different usages of RDF-Schema and the “normal” RDF

[Figure: terminological knowledge (RDF Schema): ex:Car rdfs:subClassOf ex:Vehicle; ex:drives has rdfs:domain ex:Person and rdfs:range ex:Car. Assertional knowledge (RDF): ex:Max ex:drives ex:MyVWGolf. rdf:type links connect the assertional level to the terminological level.]

Limitations of RDFS

RDF Schema can be used as a lightweight language for defining a vocabulary (also called ontology) used in RDF graphs.

However, RDF Schema has some limitations regarding the possibilities of formulating ontologies.

Missing Expressivity

RDF Schema offers no means of expressing the following:

  • It is not possible to negate an expression.
    For example, it is not possible to state that the domain of a property does not contain a certain class.
  • It is not possible to define cardinalities.
    For example, it is not possible to state that a person has either 0 or 1 ex:isMarriedTo relations.
  • It is not possible to define a set of classes.
    For example, we cannot express that the domain of a property should be one class or another. We have to create a new superclass of both, which we can then use as the domain of the property.
  • It is not possible to define metadata of the schema.
    RDFS offers no dedicated vocabulary for important metadata such as the version of the schema.

These examples are just the most important limitations of RDFS. We will therefore look at the more expressive OWL in another lecture.

Summary

  • RDF Schema (short: RDFS) can be used to express terminological knowledge by defining Classes and Properties.
  • The Classes and Properties can be arranged in hierarchies.
  • The Domain and Range of Properties can be defined.
  • The Schema allows the inference of knowledge which has been defined implicitly.
  • RDF Schema can be used to define a "lightweight" ontology but it is not as expressive as OWL.
  • The current standard, RDF Schema 1.1, only has minor changes over RDFS 1.0. 

Tasks & mini projects

This slide contains some suggestions for tasks and mini projects you can complete in addition to the multiple-choice self-assessment test in order to practice and prepare for an exam:

  1. Create a schema with which you can describe your family situation
    • Start with the classes you need and the first-degree relations (spouse, parents and children)
    • Go on with the second-degree relations (sister, brother, grandfather, ...)
    • How would you model gender? As a property of a "Person" superclass, or would you create two separate classes "Man" and "Woman"? Which would be better for defining the domain or range of properties like "isTheGrandfatherOf"?

Exercises 1 - Classes

  1. Create a graph representation of different courses presented by different people in a university. (For example, lectures or seminars as subclasses of courses are presented by professors or researchers as subclasses of person.)
  2. Create the Turtle serialization and the RDF graph of this example
     

Exercises 2 - Properties

  1. Create a graph representation of different properties and hierarchies of properties
  2. Create the Turtle serialization and the RDF graph of this example

Exercises 3 - Lists



controlled 3D RDF Visualization

source : http://www.ebremer.com/system/files/story/images/nexus-dna-force-directedlayout-2011-05-15.jpg

Overview

  1. What is Semantics?
  2. What is Model-theoretic Semantics?
  3. Model-theoretic Semantics for RDF(S)
  4. What is Proof-theoretic Semantics?
  5. Proof-theoretic Semantics for RDF(S)
  6. References

Syntax and Semantics

Syntax: character strings without meaning
Semantics: meaning of the character strings


Semantics of Programming Languages

Semantics of Logic

Recall: Implicit Knowledge

 If an RDFS document contains the triples

u rdf:type ex:Textbook .

and

ex:Textbook rdfs:subClassOf ex:Book .

then

u rdf:type ex:Book .

is implicitly also the case: it is a logical consequence. We can also say it is deduced (deduction) or inferred (inference). We do not have to state this explicitly. Which statements are logical consequences is governed by the formal semantics.

Recall: Implicit Knowledge

From

ex:Textbook  rdfs:subClassOf  ex:Book .
ex:Book      rdfs:subClassOf  ex:PrintMedia .

The following is a logical consequence:

ex:Textbook rdfs:subClassOf ex:PrintMedia .

That is, rdfs:subClassOf is transitive.

What Semantics Is Good For

Opinions differ. Here is one:

The Semantic Web requires shareable, declarative and computable semantics.

That is, the semantics must be a formal entity, which is clearly defined and automatically computable.

Ontology languages provide this by means of their formal semantics.

Semantic Web semantics is given by a relation—the logical consequence relation.

In other words …

 We capture the meaning of information

  • not by specifying its meaning directly (which is impossible)
  • but by specifying how information interacts with other information

We describe the meaning indirectly through its effects.

Model-theoretic Semantics

You need:

  • a language/syntax
  • a notion of a model for sentences of the language

Models

  • are made such that each sentence is either true or false w.r.t. a given model
  • if a sentence \(\alpha\) is true in a model \(M\) then we write \(M \models \alpha\).

Logical consequence

  • \(\beta\) is a logical consequence of \(\alpha\) (written \(\alpha\models\beta\)), if \(\forall M : M \models\alpha \rightarrow M \models \beta\).
  • if \(K\) is a set of sentences, we write \(M \models K\), if \(\forall \kappa \in K : M \models \kappa\), and we write \(K \models \beta\), if \(\forall M : M \models K \rightarrow M \models \beta\).
  • if \(J\) is another set of sentences, we write \(K \models J\), if \(\forall \beta \in J : K \models \beta\).

Note: the notation \(\models\) is overloaded.

Logical Consequence

Model Theory—(contrived) example

Language

  • variables  \(\ldots,w,x,y,z,\ldots\)
  • symbol  \(\eta\)
  • allowed sentences:  \(a \mathop{\eta} b\) for any variables \(a\) and \(b\)

We want to know

  • What are the logical consequences of the set  \(\{x \mathop{\eta} y, y \mathop{\eta} z\}\)

To answer this we must say what the models in our semantics are.

Model Theory—(contrived) example

Say, a model \(I\) of a set \(K\) of sentences consists of

  • a set \(C\) of cars and
  • a function \(I(\cdot)\), which maps each variable to a car in \(C\) such that, for each sentence \(a \mathop{\eta} b\) in \(K\) we have that \(I(a)\) has more horsepower than \(I(b)\).

We now claim that \(\{x \mathop{\eta} y, y \mathop{\eta} z\} \models x \mathop{\eta} z\).

Proof: Consider any model \(M\) of \(\{x \mathop{\eta} y, y \mathop{\eta} z\}\). Since \(M \models \{x \mathop{\eta} y, y \mathop{\eta} z\}\), we know that

  • \(M(x)\) has more horsepower than \(M(y)\) and
  • \(M(y)\) has more horsepower than \(M(z)\).

Hence, \(M(x)\) has more horsepower than \(M(z)\), i.e. \(M \models x \mathop{\eta} z\).

This argument holds for all models of \(\{x \mathop{\eta} y, y \mathop{\eta} z\}\), therefore \(\{x \mathop{\eta} y, y \mathop{\eta} z\} \models x \mathop{\eta} z\).
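The claim can also be checked by brute force over a small, finite set of cars. The sketch below assumes each car is represented just by its horsepower (the values are made up) and enumerates all assignments of variables to cars; this only illustrates the definition of logical consequence and does not replace the proof, which covers all models.

```python
from itertools import product

def satisfies(assignment, sentence):
    """A sentence (a, b) is true if I(a) has more horsepower than I(b)."""
    a, b = sentence
    return assignment[a] > assignment[b]

def entails_on(horsepowers, premises, conclusion):
    """Check that every assignment satisfying the premises satisfies the conclusion."""
    variables = sorted({v for s in premises + [conclusion] for v in s})
    for values in product(horsepowers, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if (all(satisfies(assignment, s) for s in premises)
                and not satisfies(assignment, conclusion)):
            return False  # counter-model found
    return True

cars = [75, 110, 150, 300]  # hypothetical horsepower values
premises = [("x", "y"), ("y", "z")]
print(entails_on(cars, premises, ("x", "z")))
print(entails_on(cars, premises, ("z", "x")))
```
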

Model Theory—(contrived) example

An interpretation \(I\) for our language consists of

  • a set \(C\) of cars and
  • a function \(I(\cdot)\), which maps each variable to a car in \(C\).

And that's it. No information whether a sentence is true or not w.r.t. \(I\).

Model-theoretic Semantics applied to RDF(S)

Language: valid RDF(S)

Sentences are triples; graphs are sets thereof.


Interpretations are given via sets and functions  from language vocabularies to these sets.

Models are defined such that they capture the intended meaning of the RDF(S) vocabulary.

Three different notions of interpretation: simple, RDF and RDFS interpretations.


Simple Interpretations

A simple interpretation \(\mathcal{I}\) of a given vocabulary \(V\) consists of:
  • \(\mathit{IR}\), a non-empty set of resources, alternatively called domain or universe of discourse of \(\mathcal{I}\),
  • \(\mathit{IP}\), the set of properties of \(\mathcal{I}\) (which may overlap with \(\mathit{IR}\)), 
  • \(\mathrm{I_{EXT}}\), a function assigning to each property a set of pairs from \(\mathit{IR}\), i.e. \(\mathrm{I_{EXT}} : \mathit{IP} \longrightarrow 2^{\mathit{IR} \times \mathit{IR}} \), where \(\mathrm{I_{EXT}}(p)\) is called the extension of property \(p\),
  • \(\mathrm{I_S}\), a function mapping URIs from \(V\) into the union of the sets \(\mathit{IR}\) and \(\mathit{IP}\), i.e. \(\mathrm{I_S} : V \longrightarrow \mathit{IR} \cup \mathit{IP}\), 
  • \(\mathrm{I_L}\), a function from the typed literals in \(V\) into the set \(\mathit{IR}\) of resources and
  • \(\mathit{LV}\), a particular subset of \(\mathit{IR}\), called the set of literal values, containing (at least) all untyped literals from \(V\).

Simple Interpretation Function

 

The simple interpretation function \(\cdot^\mathcal{I}\) (written as exponent) is defined as follows:
  • every untyped literal \("\!\!a\!\!"\) is mapped to \(a\), formally \(("\!\!a\!\!")^{\mathcal{I}}=a\),
  • every untyped literal carrying language information \("\!\!a\!\!"\!\!@t\) is mapped to the pair \(\langle a, t \rangle\),
  • every typed literal \(l\) is mapped to \(\mathrm{I_L}(l)\), formally \(l^\mathcal{I}=\mathrm{I_L}(l)\), and
  • every URI \(u\) is mapped to \(\mathrm{I_S}(u)\), i.e. \(u^\mathcal{I}=\mathrm{I_S}(u)\).

Simple Interpretations

Simple Models

The truth value \((s \, p \, o.)^\mathcal{I}\) of a (grounded) triple \(s \, p \, o .\) is true iff \(s\), \(p\), and \(o\) are contained in \(V\) and \(\langle s^\mathcal{I}, o^\mathcal{I}\rangle \in \mathrm{I_{EXT}}(p^\mathcal{I})\).
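This truth condition can be sketched with a toy interpretation. All names and values below are made up for illustration; literals are left out, so only \(\mathrm{I_S}\) and \(\mathrm{I_{EXT}}\) are needed.

```python
# A hypothetical simple interpretation of the vocabulary V = keys of I_S.
IR = {"leipzig", "germany", "located_in"}   # resources
IP = {"located_in"}                          # properties
I_S = {                                      # URIs -> resources or properties
    "ex:Leipzig": "leipzig",
    "ex:Germany": "germany",
    "ex:locatedIn": "located_in",
}
I_EXT = {"located_in": {("leipzig", "germany")}}  # property extensions

def is_true(triple):
    """A grounded triple is true iff all terms are in V and <s^I, o^I> is in I_EXT(p^I)."""
    s, p, o = triple
    if not all(term in I_S for term in (s, p, o)):
        return False
    return (I_S[s], I_S[o]) in I_EXT.get(I_S[p], set())

print(is_true(("ex:Leipzig", "ex:locatedIn", "ex:Germany")))
print(is_true(("ex:Germany", "ex:locatedIn", "ex:Leipzig")))
```
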


What about blank nodes?

Say, \(A(\cdot)\) is a function from blank nodes to URIs. (These URIs need not be contained in the graph we are looking at.)

If, in a graph \(G\), we replace each blank node \(x\) by \(A(x)\), then we obtain the graph \(G^\prime\), which is called a grounding of \(G\).

We know how to do semantics for the grounded graph.

So define \(I \models G\) iff \(I \models G^\prime\) for at least one grounding \(G^\prime\) of \(G\).

Simple Entailment

A graph \(G\) simply entails a graph \(G^\prime\) if every simple interpretation that is a model of \(G\) is also a model of \(G^\prime\).

(Recall that a simple interpretation is a model of a graph \(G\) if it is a model of each triple in \(G\).)

It's really simple

Basically, \(G \models G^\prime\) iff \(G^\prime\) can be obtained from \(G\) by replacing some nodes in \(G\) with blank nodes (and possibly dropping some triples).

It is really simple entailment.
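This characterisation leads directly to a brute-force test: look for a mapping from the blank nodes of \(G^\prime\) to terms of \(G\) that turns \(G^\prime\) into a subgraph of \(G\). The sketch below assumes triples are tuples of strings and that blank nodes are exactly the terms starting with `_:`; it is exponential in the number of blank nodes and only meant for tiny graphs.

```python
from itertools import product

def is_blank(term):
    return term.startswith("_:")

def simply_entails(g, g_prime):
    """True if some instance of g_prime is a subgraph of g."""
    terms = sorted({t for (s, _, o) in g for t in (s, o)})
    blanks = sorted({t for (s, _, o) in g_prime for t in (s, o) if is_blank(t)})
    for choice in product(terms, repeat=len(blanks)):
        mapping = dict(zip(blanks, choice))
        # Replace each blank node of g_prime by the chosen term of g.
        mapped = {(mapping.get(s, s), p, mapping.get(o, o))
                  for (s, p, o) in g_prime}
        if mapped <= g:
            return True
    return False

g = {
    ("ex:Max", "ex:drives", "ex:MyBlueVWGolf"),
    ("ex:MyBlueVWGolf", "rdf:type", "ex:Car"),
}
somebody_drives_a_car = {
    ("_:who", "ex:drives", "_:what"),
    ("_:what", "rdf:type", "ex:Car"),
}
print(simply_entails(g, somebody_drives_a_car))
print(simply_entails(g, {("_:x", "ex:drives", "ex:Max")}))
```
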

RDF Semantic Conditions

An RDF Interpretation of a vocabulary \(V\) is a simple interpretation of the vocabulary \(V \cup V_\mathrm{RDF}\) that additionally satisfies the following conditions:
  • \(x \in \mathit{IP}\) iff \(\langle x, \verb|rdf:Property|^\mathcal{I} \rangle \in \mathrm{I_{EXT}}(\verb|rdf:type|^\mathcal{I})\).
  • if \("\!\!s\!\!"\!\!\verb|^^rdf:XMLLiteral|\) is contained in \(V\) and \(s\) is a well-formed XML literal, then
    • \(\mathrm{I_L}("\!\!s\!\!"\!\!\verb|^^rdf:XMLLiteral|)\) is the XML value of \(s\);
    • \(\mathrm{I_L}("\!\!s\!\!"\!\!\verb|^^rdf:XMLLiteral|) \in \mathit{LV}\);
    • \(\langle \mathrm{I_L}("\!\!s\!\!"\!\!\verb|^^rdf:XMLLiteral|), \verb|rdf:XMLLiteral|^\mathcal{I} \rangle \in \mathrm{I_{EXT}}(\verb|rdf:type|^\mathcal{I})\)
  • if \("\!\!s\!\!"\verb|^^rdf:XMLLiteral|\) is contained in \(V\) and \(s\) is an ill-formed XML literal, then
    • \(\mathrm{I_L}("\!\!s\!\!"\!\!\verb|^^rdf:XMLLiteral|) \not \in \mathit{LV}\);
    • \(\langle \mathrm{I_L}("\!\!s\!\!"\!\!\verb|^^rdf:XMLLiteral|), \verb|rdf:XMLLiteral|^\mathcal{I} \rangle \not \in \mathrm{I_{EXT}}(\verb|rdf:type|^\mathcal{I})\)

RDF Axiomatic Triples

In addition, each RDF Interpretation has to evaluate the following triples to true:

     
 
rdf:type rdf:type rdf:Property .
rdf:subject rdf:type rdf:Property .
rdf:predicate rdf:type rdf:Property .
rdf:object rdf:type rdf:Property .
rdf:first rdf:type rdf:Property .
rdf:rest rdf:type rdf:Property .
rdf:value rdf:type rdf:Property .
rdf:_i rdf:type rdf:Property .
rdf:nil rdf:type rdf:List .

RDFS Semantic Conditions (1)

Define (for a given RDF Interpretation \(\mathcal{I}\)):

  • \(\mathrm{I_{CEXT}} : \mathit{IR} \longrightarrow 2^\mathit{IR}\): we define \(\mathrm{I_{CEXT}}(y)\) to contain exactly those elements \(x\) for which \(\langle x, y \rangle\) is contained in \(\mathrm{I_{EXT}}(\verb|rdf:type|^\mathcal{I})\). The set \(\mathrm{I_{CEXT}}(y)\) is then also called the (class) extension of \(y\).
  • \(\mathit{IC}=\mathrm{I_{CEXT}}(\verb|rdfs:Class|^\mathcal{I})\)
  • \(\mathit{IR}=\mathrm{I_{CEXT}}(\verb|rdfs:Resource|^\mathcal{I})\)
  • \(\mathit{LV}=\mathrm{I_{CEXT}(\verb|rdfs:Literal|^\mathcal{I})}\)
  • If \(\langle x, y\rangle\ \in \mathrm{I_{EXT}}(\verb|rdfs:domain|^\mathcal{I})\) and \(\langle u, v \rangle \in \mathrm{I_{EXT}}(x)\), then \(u \in \mathrm{I_{CEXT}}(y)\)
  • If \(\langle x, y\rangle\ \in \mathrm{I_{EXT}}(\verb|rdfs:range|^\mathcal{I})\) and \(\langle u, v \rangle \in \mathrm{I_{EXT}}(x)\), then \(v \in \mathrm{I_{CEXT}}(y)\)
  • \(\mathrm{I_{EXT}}(\verb|rdfs:subPropertyOf|^\mathcal{I})\) is reflexive and transitive on \(\mathit{IP}\).

RDFS Semantic Conditions (2)

  • If \(\langle x, y \rangle \in \mathrm{I_{EXT}}(\verb|rdfs:subPropertyOf|^\mathcal{I})\),
    then \(x, y \in \mathit{IP}\) and \(\mathrm{I_{EXT}}(x) \subseteq \mathrm{I_{EXT}}(y)\).
  • If \(x \in \mathit{IC}\),
    then \(\langle x, \verb|rdfs:Resource|^\mathcal{I} \rangle \in \mathrm{I_{EXT}}(\verb|rdfs:subClassOf|^\mathcal{I})\).
  • If \(\langle x, y \rangle \in \mathrm{I_{EXT}}(\verb|rdfs:subClassOf|^\mathcal{I})\),
    then \(x, y \in \mathit{IC}\) and \(\mathrm{I_{CEXT}}(x) \subseteq \mathrm{I_{CEXT}}(y)\).
  • \(\mathrm{I_{EXT}}(\verb|rdfs:subClassOf|^\mathcal{I})\) is reflexive and transitive on \(\mathit{IC}\).
  • If \(x \in \mathrm{I_{CEXT}}(\verb|rdfs:ContainerMembershipProperty|^\mathcal{I})\),
    then \(\langle x, \verb|rdfs:member|^\mathcal{I} \rangle \in \mathrm{I_{EXT}}(\verb|rdfs:subPropertyOf|^\mathcal{I})\).
  • If \(x \in \mathrm{I_{CEXT}}(\verb|rdfs:Datatype|^\mathcal{I})\),
    then \(\langle x, \verb|rdfs:Literal|^\mathcal{I} \rangle \in \mathrm{I_{EXT}}(\verb|rdfs:subClassOf|^\mathcal{I})\).

RDFS Axiomatic Triples (1)

Furthermore all of the following axiomatic triples must be satisfied:

rdf:type rdfs:domain rdfs:Resource .
rdfs:domain rdfs:domain rdf:Property .
rdfs:range rdfs:domain rdf:Property .
rdfs:subPropertyOf rdfs:domain rdf:Property .
rdfs:subClassOf rdfs:domain rdfs:Class .
rdf:subject rdfs:domain rdf:Statement .
rdf:predicate rdfs:domain rdf:Statement .
rdf:object rdfs:domain rdf:Statement .
rdfs:member rdfs:domain rdfs:Resource .
rdf:first rdfs:domain rdf:List .
rdf:rest rdfs:domain rdf:List .
rdfs:seeAlso rdfs:domain rdfs:Resource .
rdfs:isDefinedBy rdfs:domain rdfs:Resource .
rdfs:comment rdfs:domain rdfs:Resource .
rdfs:label rdfs:domain rdfs:Resource .
rdf:value rdfs:domain rdfs:Resource .

RDFS Axiomatic Triples (2)

Furthermore all of the following axiomatic triples must be satisfied:

rdf:type rdfs:range rdfs:Class .
rdfs:domain rdfs:range rdfs:Class .
rdfs:range rdfs:range rdfs:Class .
rdfs:subPropertyOf rdfs:range rdf:Property .
rdfs:subClassOf rdfs:range rdfs:Class .
rdf:subject rdfs:range rdfs:Resource .
rdf:predicate rdfs:range rdfs:Resource .
rdf:object rdfs:range rdfs:Resource .
rdfs:member rdfs:range rdfs:Resource .
rdf:first rdfs:range rdfs:Resource .
rdf:rest rdfs:range rdf:List .
rdfs:seeAlso rdfs:range rdfs:Resource .
rdfs:isDefinedBy rdfs:range rdfs:Resource .
rdfs:comment rdfs:range rdfs:Literal .
rdfs:label rdfs:range rdfs:Literal .
rdf:value rdfs:range rdfs:Resource .

RDFS Axiomatic Triples (3)

Furthermore all of the following axiomatic triples must be satisfied:

rdfs:ContainerMembershipProperty rdfs:subClassOf rdf:Property .
rdf:Alt rdfs:subClassOf rdfs:Container .
rdf:Bag rdfs:subClassOf rdfs:Container .
rdf:Seq rdfs:subClassOf rdfs:Container .
rdfs:isDefinedBy rdfs:subPropertyOf rdfs:seeAlso .
rdf:XMLLiteral rdf:type rdfs:Datatype .
rdf:XMLLiteral rdfs:subClassOf rdfs:Literal .
rdfs:Datatype rdfs:subClassOf rdfs:Class .
rdf:_i rdf:type rdfs:ContainerMembershipProperty .
rdf:_i rdfs:domain rdfs:Resource .
rdf:_i rdfs:range rdfs:Resource .

Overview

  1. What is Semantics?
  2. What is Model-theoretic Semantics?
  3. Model-theoretic Semantics for RDF(S)
  4. What is Proof-theoretic Semantics?
  5. Proof-theoretic Semantics for RDF(S)

Back to our contrived example

Say, a model \(I\) of a set \(K\) of sentences consists of

  • a set \(C\) of cars and
  • a function \(I(\cdot)\), which maps each variable to a car in \(C\) such that, for each sentence \(a \mathop{\eta} b\) in \(K\), we have that \(I(a)\) has more horsepower than \(I(b)\).

Can we find an algorithm that computes all logical consequences of a set of sentences?

Algorithm input: set \(K\) of sentences.

  1. The algorithm non-deterministically selects two sentences from \(K\). If the first sentence is \(a\mathop{\eta}b\) and the second sentence is \(b\mathop{\eta}c\), then add \(a\mathop{\eta}c\) to \(K\). That is:
    \(\verb|if| \quad a \; \eta \; b \in K\) and \(b \; \eta \; c \in K \quad \verb|then| \quad K \leftarrow K \cup \{a \; \eta \; c\}\)
  2. Repeat step 1. until no selection results in a change of \(K\).
  3. Output \(K\).
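
The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a sentence \(a \mathop{\eta} b\) is encoded as the pair `(a, b)`, the function name `closure` is made up here, and the non-deterministic selection is replaced by a systematic loop over all pairs of sentences, which reaches the same fixed point.

```python
def closure(sentences):
    """Saturate a set of (a, b) pairs, each read as 'a eta b', under the rule above."""
    k = set(sentences)
    changed = True
    while changed:                      # step 2: repeat until nothing changes
        changed = False
        for (a, b) in list(k):          # step 1: consider every pair of sentences
            for (c, d) in list(k):
                if b == c and (a, d) not in k:
                    k.add((a, d))
                    changed = True
    return k                            # step 3: output K

k = closure({("x", "y"), ("y", "z"), ("z", "w")})
```

For the three input sentences, `k` additionally contains `("x", "z")`, `("y", "w")` and `("x", "w")`.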

Back to our example

The algorithm produces only logical consequences: it is sound w.r.t. the model-theoretic semantics.

The algorithm produces all logical consequences: it is complete w.r.t. the model-theoretic semantics.

The algorithm always terminates.

The algorithm is non-deterministic.

What is the computational complexity of this algorithm?

What do we do again?

Recall: \(\beta\) is a logical consequence of \(\alpha\) (\(\alpha \models \beta\)), if for all \(M\) with \(M \models \alpha\), we also have \(M \models \beta\).

Implementing model-theoretic semantics directly is not feasible: We would have to deal with all models of a knowledge base. Since there are a lot of cars in this world, we would have to check a lot of possibilities.

Proof theory reduces model-theoretic semantics to symbol manipulation. It removes the models from the process.

Deduction rules

\(\verb|if| \quad a \; \eta \; b \in K\) and \(b \; \eta \; c \in K \quad \verb|then| \quad K \leftarrow K \cup \{a \; \eta \; c\}\)

is a so-called deduction rule. Such rules are usually written schematically as

\[\frac{a \; \eta \; b \quad b \; \eta \; c}{a \; \eta \; c}\]

Overview

  1. What is Semantics?
  2. What is Model-theoretic Semantics?
  3. Model-theoretic Semantics for RDF(S)
  4. What is Proof-theoretic Semantics?
  5. Proof-theoretic Semantics for RDF(S)

Notation

\(a\) and \(b\) can refer to arbitrary URIs (i.e. anything admissible for the predicate position in a triple),

\(\mathit{\_\!\!:\!\!n}\) will be used for the ID of a blank node,

\(u\) and \(v\) refer to arbitrary URIs or blank node IDs (i.e. any possible subject of a triple),

\(x\) and \(y\) can be used for arbitrary URIs, blank node IDs or literals (anything admissible for the object position in a triple), and

\(l\) may refer to any literal.

Simple Entailment Rules


\[\frac{u \quad a \quad x \quad .}{u \quad a \quad \mathit{\_\!\!:\!\!n} \quad .} \qquad \mathrm{se1}\]


\[\frac{u \quad a \quad x \quad .}{\mathit{\_\!\!:\!\!n} \quad a \quad x \quad .}\qquad \mathrm{se2}\]


\(\mathit{\_\!\!:\!\!n}\) must not be contained in the graph the rule is applied to.

RDF Entailment Rules

\[\frac{}{u \quad a \quad x \quad .} \qquad \mathrm{rdfax}\]
for all RDF axiomatic triples \(u \; a \; x \; .\)

 

\[\frac{u \quad a \quad l \quad .}{u \quad a \quad \mathit{\_\!\!:\!\!n} \quad .} \qquad \mathrm{lg}\]
where \(\mathit{\_\!\!:\!\!n}\) does not yet occur in the graph

 

\[\frac{u \quad a \quad y \quad .}{a \quad \verb|rdf:type| \quad \verb|rdf:Property| \quad .} \qquad \mathrm{rdf1}\]
\[\frac{u \quad a \quad l \quad .}{\mathit{\_\!\!:\!\!n} \quad \verb|rdf:type| \quad \verb|rdf:XMLLiteral| \quad .} \qquad \mathrm{rdf2}\]
where \(\mathit{\_\!\!:\!\!n}\) does not yet occur in the graph, unless it has been introduced by a preceding application of the \(\mathrm{lg}\) rule

RDFS Entailment Rules (1)

\[\frac{}{u \quad a \quad x \quad .} \qquad \mathrm{rdfax}\]
for all RDFS axiomatic triples \(u \; a \; x \; .\)

\[\frac{u \quad a \quad l \quad .}{\mathit{\_\!\!:\!\!n} \quad \verb|rdf:type| \quad \verb|rdfs:Literal| \quad .} \qquad \mathrm{rdfs1}\] with \(\mathit{\_\!\!:\!\!n}\) as usual

\[\frac{a \quad \verb|rdfs:domain| \quad x \quad . \qquad u \quad a \quad y \quad .}{u \quad \verb|rdf:type| \quad x \quad .} \qquad \mathrm{rdfs2}\]
\[\frac{a \quad \verb|rdfs:range| \quad x \quad . \qquad u \quad a \quad v \quad .}{v \quad \verb|rdf:type| \quad x \quad .} \qquad \mathrm{rdfs3}\]
\[\frac{u \quad a \quad x \quad .}{u \quad \verb|rdf:type| \quad \verb|rdfs:Resource| \quad .} \qquad \mathrm{rdfs4a}\]
\[\frac{u \quad a \quad v \quad .}{v \quad \verb|rdf:type| \quad \verb|rdfs:Resource| \quad .} \qquad \mathrm{rdfs4b}\]

RDFS Entailment Rules (2)

 

\[\frac{u \quad \verb|rdfs:subPropertyOf| \quad v \quad . \qquad v \quad \verb|rdfs:subPropertyOf| \quad x \quad .}{u \quad \verb|rdfs:subPropertyOf| \quad x \quad .}\quad \mathrm{rdfs5}\]

\[\frac{u \quad \verb|rdf:type| \quad \verb|rdf:Property| \quad .}{u \quad \verb|rdfs:subPropertyOf| \quad u \quad .} \quad \mathrm{rdfs6}\]

\[\frac{a \quad \verb|rdfs:subPropertyOf| \quad b \quad . \qquad u \quad a \quad y \quad .}{u \quad b \quad y \quad .}\quad \mathrm{rdfs7}\]

\[\frac{u \quad \verb|rdf:type| \quad \verb|rdfs:Class| \quad .}{u \quad \verb|rdfs:subClassOf| \quad \verb|rdfs:Resource| \quad .} \quad \mathrm{rdfs8}\]

\[\frac{u \quad \verb|rdfs:subClassOf| \quad x \quad . \qquad v \quad \verb|rdf:type| \quad u \quad .}{v \quad \verb|rdf:type| \quad x \quad .}\quad \mathrm{rdfs9}\]

\[\frac{u \quad \verb|rdf:type| \quad \verb|rdfs:Class| \quad .}{u \quad \verb|rdfs:subClassOf| \quad u \quad .} \quad \mathrm{rdfs10}\]

RDFS Entailment Rules (3)

 

\[\frac{u \quad \verb|rdfs:subClassOf| \quad v \quad . \qquad v \quad \verb|rdfs:subClassOf| \quad x \quad .}{u \quad \verb|rdfs:subClassOf| \quad x \quad .}\quad \mathrm{rdfs11}\]

\[\frac{u \quad \verb|rdf:type| \quad \verb|rdfs:ContainerMembershipProperty| \quad .}{u \quad \verb|rdfs:subPropertyOf| \quad \verb|rdfs:member| \quad .}\quad \mathrm{rdfs12}\]

\[\frac{u \quad \verb|rdf:type| \quad \verb|rdfs:Datatype| \quad .}{u \quad \verb|rdfs:subClassOf| \quad \verb|rdfs:Literal| \quad .}\quad \mathrm{rdfs13}\]

\[\frac{u \quad a \quad \mathit{\_\!\!:\!\!n} \quad .}{u \quad a \quad l \quad .}\quad \mathrm{gl}\]
where \(\mathit{\_\!\!:\!\!n}\) identifies a blank node introduced by an earlier weakening of the literal \(l\) via the rule \(\mathrm{lg}\)
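
To see how such rules are applied mechanically, the following sketch forward-chains a subset of them (rdfs2, rdfs3, rdfs5, rdfs7, rdfs9, rdfs11) until no new triple appears. It is a simplification, not a complete RDFS reasoner: axiomatic triples, blank-node rules and literals are left out, triples are plain string tuples, and the function name `rdfs_closure` as well as the `ex:` example data are hypothetical.

```python
RDF_TYPE = "rdf:type"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"
SUBPROP, SUBCLASS = "rdfs:subPropertyOf", "rdfs:subClassOf"

def rdfs_closure(graph):
    """Apply a subset of the RDFS rules to a literal-free graph until saturation."""
    g = set(graph)
    while True:
        new = set()
        for (s, p, o) in g:
            for (s2, p2, o2) in g:
                if p == DOMAIN and p2 == s:                        # rdfs2
                    new.add((s2, RDF_TYPE, o))
                if p == RANGE and p2 == s:                         # rdfs3
                    new.add((o2, RDF_TYPE, o))
                if p == SUBPROP and p2 == SUBPROP and o == s2:     # rdfs5
                    new.add((s, SUBPROP, o2))
                if p == SUBPROP and p2 == s:                       # rdfs7
                    new.add((s2, o, o2))
                if p == SUBCLASS and p2 == RDF_TYPE and o2 == s:   # rdfs9
                    new.add((s2, RDF_TYPE, o))
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:   # rdfs11
                    new.add((s, SUBCLASS, o2))
        if new <= g:        # fixed point reached
            return g
        g |= new

example = {
    ("ex:hasSupervisor", DOMAIN, "ex:Student"),
    ("ex:hasSupervisor", RANGE, "ex:Professor"),
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Eve", "ex:hasSupervisor", "ex:Dan"),
}
derived = rdfs_closure(example) - example
```

Here `derived` holds three triples: `ex:Eve rdf:type ex:Student` (rdfs2), `ex:Dan rdf:type ex:Professor` (rdfs3) and `ex:Eve rdf:type ex:Person` (rdfs9 applied to the first result).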

Completeness

The deduction rules for simple and RDF entailment are sound and complete.

The deduction rules for RDFS entailment are sound.

According to the spec [2] they are also complete, but they are not:

ex:isHappilyMarriedTo   rdfs:subPropertyOf      _:bnode .
_:bnode                 rdfs:domain             ex:Person .
ex:Bob                  ex:isHappilyMarriedTo   ex:Alice .

has the logical consequence

ex:Bob  rdf:type  ex:Person .

but is not derivable using the RDFS deduction rules.


Which rule(s) need to be changed and how in order to fix this?

Complexity

Simple, RDF, and RDFS entailment are NP-complete problems.

If we disallow blank nodes, all three entailment problems are polynomial [3].

Does RDFS semantics do what it should?

Does

ex:speaksWith   rdfs:domain       ex:Homo .
ex:Homo         rdfs:subClassOf   ex:Primates .

entail

ex:speaksWith   rdfs:domain   ex:Primates .

?

Efficient RDFS Reasoning

  


References

  1. Pascal Hitzler et al.: Foundations of Semantic Web Technologies, Chapman & Hall, 2010
  2. Patrick Hayes, RDF Semantics, W3C Recommendation, http://www.w3.org/TR/rdf-mt/, W3C, 2004
  3. Herman J. ter Horst: Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary, J. Web Sem. 3(2-3): 79-115 (2005)

Model-theoretic Semantics

Recap the notions “theory”, “logical consequence”, and “equivalence” and decide whether the following statements are true. Provide an (informal) explanation for your claim.

For arbitrary theories T and S, the following holds:


1- If a formula F is a tautology then T |= F holds, i.e. every theory entails at least all tautologies.

2- The bigger a logical theory is, the more models it has. This means, if T ⊆ S, then every model of T is also a model of S.

3- If ¬F ∈ T, then T |= F cannot hold.



Shared blank nodes

What about blank nodes?


The universe {Alice, Bob, Monica, Ruth} with:
I( ex:Alice )=Alice, I( ex:Bob )=Bob, IEXT(I( ex:hasChild ))={<Alice,Monica>,<Bob,Ruth> }

Source : https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-mt/index.html

Tasks

  • Derive the triples entailed by the following graph
    • ex:Garfield rdf:type ex:Cat
    • ex:Cat rdfs:subClassOf ex:Animal
    • ex:hasPet rdfs:range ex:Animal
    • ex:hasPet rdfs:domain ex:Person
    • ex:Person rdfs:subClassOf ex:Animal
    • ex:hasPet rdfs:subPropertyOf ex:livesWith
    • ex:Judie ex:hasPet ex:Casimir

Linked Data Stack - OWL

[Figure: Linked Data stack. Layers shown: User Interface & Applications; Trust; Crypto; Proof; Unifying Logic; Ontology: OWL; Rules: RIF; Query: SPARQL; RDF-Schema; Data Interchange: RDF; XML; URI; Unicode]

Ontology—Philosophy

  •  Singular only (there are no "ontologies")
  • The "science of being"
  • Found in Aristotle (Socrates), Thomas Aquinas, Descartes, Kant, Hegel, Wittgenstein, Heidegger, Quine, ....

Ontology—Computer Science

Gruber (1993): An Ontology is a formal specification of a shared conceptualization of a domain of interest.
  • Machine interpretable
  • Based on consensus
  • Describes concepts
  • Related to a topic (subject matter)

Ontology—Practical, some Requirements

  • Instantiation of classes by individuals
  • Concept hierarchies (taxonomies, inheritance): classes, terms
  • Binary relations between individuals: Properties, Roles
  • Properties of relations (e.g., range, transitive)
  • Data types (e.g. Numbers): concrete domains
  • Logical means of expression
  • Clear semantics!

OWL—In General

  • OWL is the acronym for Web Ontology Language (more easily pronounced than WOL)
  • family of languages for authoring ontologies
  • W3C Recommendation since 2004, OWL 2 since 2009
  • Semantic fragment of FOL
  • The W3C documents contain details that cannot all be addressed here

RDF Schema as Ontology Language?

  • suitable for simple ontologies
  • Advantage: automatic inference is relatively efficient
  • But: unsuitable for modeling more complex knowledge
  • Use of more powerful languages such as
    • OWL
    • F-Logic
    • ... 

Individuals

Manchester Syntax

Individual: Einstein  Types: Professor

Turtle Syntax 

:Einstein rdf:type :Professor.

The head of an OWL document

Defining namespaces in the root
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
Turtle Syntax

The head of an OWL document

General Information

:bookstoreOntology rdf:type owl:Ontology ;
    rdfs:comment "Bookstore Ontology" ;
    owl:versionInfo "1.2" ;
    owl:imports <http://library.org/books> .

Turtle Syntax

The head of an OWL document

inherited from RDFS:
  • rdfs:comment
  • rdfs:label
  • rdfs:seeAlso
  • rdfs:isDefinedBy
for versioning:
  • owl:versionInfo
  • owl:priorVersion
  • owl:backwardCompatibleWith
  • owl:incompatibleWith
  • owl:DeprecatedClass
  • owl:DeprecatedProperty

also owl:imports

OWL Documents

  • Consist of a set of Axioms
  • An axiom can be expressed as a set of RDF triples
  • RDF/XML serves as the standard syntax
  • There are other formats that are often even easier to read and process
  • Simple example: the ontology http://my-ontology.org with two classes, Student and Person, connected by a subclass-of relationship

OWL Documents

  • Several syntaxes for OWL according to the use case:
    • RDF/XML: Standard syntax / data exchange
    • OWL/XML: easier for XML tools
    • Turtle: easier to read and write RDF triples
    • MOS: easier to read and write DL ontologies
    • Functional: easier to see formal structure
  • Providing RDF/XML is mandatory when publishing ontologies; other syntaxes are optional
  • An OWL file consists of:
    • Header with general information
    • Rest with actual ontology

OWL Syntax—Turtle

@prefix : <http://my-ontology.org/>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@base <http://my-ontology.org/>.
<http://my-ontology.org/> rdf:type owl:Ontology.
:Person rdf:type owl:Class.
:Student rdf:type owl:Class ; rdfs:subClassOf :Person .
  • best all-round RDF syntax
  • concise and easy to read
  • not specifically designed for OWL, complex expressions impractical
  • very good tool support

OWL Syntax—Manchester


Prefix: rdfs: <http://www.w3.org/2000/01/rdf-schema#>
Prefix: owl: <http://www.w3.org/2002/07/owl#>
Prefix: rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
Ontology: <http://my-ontology.org/>
Class: Person
  SubClassOf: owl:Thing
Class: Student
  SubClassOf: Person

  • very easy to read and write
  • cumbersome to write some types of axioms
  • Description Logic: Student ⊑ Person
  • functional syntax  
  • used for OBO (Open Biomedical Ontologies)

OWL Syntax—RDF/XML

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY owl "http://www.w3.org/2002/07/owl#" >
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#" >
  <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
]>
<rdf:RDF xmlns="http://my-ontology.org/"
         xml:base="http://my-ontology.org/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:about="Person">
    <rdfs:subClassOf rdf:resource="&owl;Thing"/>
  </owl:Class>
  <owl:Class rdf:about="Student">
    <rdfs:subClassOf rdf:resource="Person"/>
  </owl:Class>
</rdf:RDF>
  • best tool support
  • very verbose and difficult to read

OWL Syntax—OWL/XML

For the complete syntax see the W3C Recommendation.

  • specifically designed for OWL, thus simpler than RDF/XML
  • still very verbose
<?xml version="1.0"?>
<!DOCTYPE Ontology [
<!ENTITY owl "http://www.w3.org/2002/07/owl#" > [...] ]>
<Ontology [...] URI="http://my-ontology.org/">
    <SubClassOf>
        <Class URI="&my-ontology;Person"/>
        <Class URI="&owl;Thing"/>
    </SubClassOf>
    <SubClassOf>
        <Class URI="&my-ontology;Student"/>
        <Class URI="&my-ontology;Person"/>
    </SubClassOf>
</Ontology>

Classes, Roles and Individuals

The three components of Ontology axioms (analogous to RDFS):

  • Individuals / objects
    • Concrete elements in modeled world
  • Classes / concepts
    • Sets of objects
  • Roles / Properties
    • Associates two individuals

Classes

Definition

:Professor rdf:type owl:Class .
Turtle Syntax
Class: Professor
Manchester Syntax

predefined: owl:Thing , owl:Nothing

Abstract Roles (Turtle)

 Abstract roles are defined like classes

 :belongsTo rdf:type owl:ObjectProperty .
Turtle Syntax

Domain and Range of abstract roles

 :belongsTo rdf:type owl:ObjectProperty ;
     rdfs:domain :Person ;
     rdfs:range :Organisation .
Turtle Syntax

Abstract Roles (Manchester)

Abstract roles are defined like classes

 ObjectProperty: belongsTo
Manchester Syntax

Domain and Range of abstract roles

 ObjectProperty: belongsTo  Domain: Person  Range:  Organisation
Manchester Syntax

Concrete roles

Concrete roles have a data type in the range

DatatypeProperty: firstName 
Manchester Syntax

Domain and Range of concrete roles

DatatypeProperty: firstName
 Domain: Person
 Range: xsd:string
Manchester Syntax  

Concrete roles

Concrete roles have a data type in the range

 :firstName rdf:type owl:DatatypeProperty.  
Turtle Syntax

Domain and Range of concrete roles

 :firstName rdf:type owl:DatatypeProperty;
         rdfs:domain :Person;
         rdfs:range  xsd:string.
 Turtle Syntax  

Individuals and roles

:Einstein           rdf:type             :Professor ;
                    :belongsTo           :Princeton,
                                         :Bern;
                    :firstName           "Albert"^^xsd:string .
Turtle Syntax

Roles are generally not functional.

Individuals and roles

Individual: Einstein
 Types: Professor
 Facts: belongsTo Princeton, belongsTo Bern, firstName "Albert"^^xsd:string
Manchester Syntax

Roles are generally not functional.

Class Relations

 

owl:equivalentClass

It follows by inference that Book is a subclass of Publications.

    :Book           rdf:type             owl:Class ;
                    rdfs:subClassOf      :Publication .
    :Publication    rdf:type             owl:Class ;
                    owl:equivalentClass  :Publications .
    :Publications   rdf:type             owl:Class .
  
Turtle Syntax
Class: Book
 SubClassOf: Publication

Class: Publication
 EquivalentTo: Publications

Class: Publications
Manchester Syntax

rdfs:subClassOf

:Professor           rdf:type          owl:Class;
                     rdfs:subClassOf   :FacultyStaff.

:FacultyStaff        rdf:type          owl:Class ;
                     rdfs:subClassOf   :Person .  
Turtle Syntax
Class: Professor
 SubClassOf: FacultyStaff


Class: FacultyStaff
 SubClassOf: Person  
Manchester Syntax 

It follows by inference, that Professor is a subclass of Person.

owl:disjointWith


:Book rdf:type owl:Class; rdfs:subClassOf :Publication.
:FacultyStaff rdf:type owl:Class; owl:disjointWith :Publication.
:Professor rdf:type owl:Class; rdfs:subClassOf :FacultyStaff.
:Publication rdf:type owl:Class.

Turtle Syntax

Class: Professor
 SubClassOf: FacultyStaff
Class: Book
 SubClassOf: Publication
Class: FacultyStaff
 DisjointWith: Publication 

Manchester Syntax

It follows by inference, that Professor and Book are also disjoint classes.
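
The inference can be traced with a small sketch: two classes are disjoint as soon as some superclass of one is declared disjoint with some superclass of the other. The helper names `superclasses` and `are_disjoint` and the dictionary encoding are assumptions for illustration; a real OWL reasoner derives disjointness in more ways than this.

```python
def superclasses(cls, sub_class_of):
    """All classes reachable from cls via subClassOf, including cls itself."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(sub_class_of.get(c, ()))
    return seen

def are_disjoint(c1, c2, sub_class_of, disjoint_pairs):
    """True if a declared disjointness between superclasses separates c1 and c2."""
    ups1 = superclasses(c1, sub_class_of)
    ups2 = superclasses(c2, sub_class_of)
    return any((a, b) in disjoint_pairs or (b, a) in disjoint_pairs
               for a in ups1 for b in ups2)

# The axioms from the slide above
sub_class_of = {"Book": ["Publication"], "Professor": ["FacultyStaff"]}
disjoint_pairs = {("FacultyStaff", "Publication")}

professor_vs_book = are_disjoint("Professor", "Book", sub_class_of, disjoint_pairs)
book_vs_publication = are_disjoint("Book", "Publication", sub_class_of, disjoint_pairs)
```

`professor_vs_book` is `True` because Professor inherits from FacultyStaff and Book from Publication; `book_vs_publication` is `False` since no declared disjointness separates a class from its own superclass.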

Individuals and class relations


:Book                    rdf:type          owl:Class ; 
                         rdfs:subClassOf   :Publication . 
:SemanticWebGrundlagen   rdf:type          :Book ;     
                         :autor            :MarkusKroetzsch,
                                           :PascalHitzler, 
                                           :SebastianRudolph,  
                                           :YorkSure .
Turtle Syntax

It follows by inference, that SemanticWebGrundlagen is a Publication.

Individuals and class relations

Class: Book
 SubClassOf: Publication
Individual: SemanticWebGrundlagen
 Types: Book
 Facts: autor PascalHitzler, autor MarkusKroetzsch,
        autor SebastianRudolph,
        autor YorkSure

Manchester Syntax

It follows by inference that SemanticWebGrundlagen is a Publication.

Role Relationships

Analogous to classes, roles can be related via rdfs:subPropertyOf and owl:equivalentProperty. In addition, roles can be inverses of each other (owl:inverseOf):

:examinedBy owl:inverseOf :examinerOf.

Turtle Syntax

ObjectProperty: examinedBy   InverseOf: examinerOf

Manchester Syntax 

Role Properties

  • Domain
  • Range
  • Transitivity, i.e. r(a, b) and r(b, c) implies r(a, c)
  • Symmetry, i.e. r(a, b) implies r(b, a)
  • Functionality, i.e. r(a, b) and r(a, c) implies b = c
  • Inverse functionality, i.e. r(a, b) and r(c, b) implies a = c

Relationships between individuals

 Individual: Faehnrich   Types: Professor   SameAs: ProfessorFaehnrich
Manchester Syntax
 :Faehnrich rdf:type :Professor ;   owl:sameAs :ProfessorFaehnrich .
Turtle Syntax

It follows by inference, that ProfessorFaehnrich is a Professor.

Inequality of individuals is specified by owl:differentFrom.
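
The effect of owl:sameAs on class membership can be illustrated with a short sketch that copies types between individuals declared to be the same. The function name `propagate_types` and the dictionary encoding are hypothetical, and the sketch covers only this one consequence of owl:sameAs, not the full OWL semantics.

```python
def propagate_types(types, same_as):
    """Share class memberships between individuals linked by sameAs.

    types   : dict mapping an individual to its set of asserted classes
    same_as : list of (a, b) pairs, each read as 'a owl:sameAs b'
    """
    result = {ind: set(classes) for ind, classes in types.items()}
    changed = True
    while changed:
        changed = False
        for a, b in same_as:
            merged = result.get(a, set()) | result.get(b, set())
            for ind in (a, b):
                if result.get(ind, set()) != merged:
                    result[ind] = set(merged)
                    changed = True
    return result

inferred = propagate_types(
    {"Faehnrich": {"Professor"}},
    [("Faehnrich", "ProfessorFaehnrich")],
)
```

After the call, `inferred["ProfessorFaehnrich"]` contains `"Professor"`, matching the conclusion stated above.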

Relationships between individuals

DifferentIndividuals: YorkSure, PascalHitzler, RudiStuder
Manchester Syntax
[ rdf:type owl:AllDifferent ;    
  owl:distinctMembers ( :PascalHitzler  :RudiStuder  :YorkSure) 
] .
Turtle Syntax

Task—Contradiction?

Class: Organisation
Class: Office  SubClassOf: wgs84:SpatialThing
ObjectProperty: hasOffice
 Domain: Organisation
 Range: Office
#imported axioms to longitude and latitude
ObjectProperty: wgs84:lat
 Domain: wgs84:SpatialThing
ObjectProperty: wgs84:long
 Domain: wgs84:SpatialThing
Individual: CompanyXY
 Types: Organisation
 Facts: hasOffice OfficeXY,
        wgs84:lat "41.8292928"^^xsd:double,
        wgs84:long "17.8282"^^xsd:double      
Manchester Syntax 

Enumerations/Nominals


Class: MoonOfMars  EquivalentTo: {Phobos, Deimos} 

Manchester Syntax


:MoonOfMars rdf:type owl:Class ;
    owl:equivalentClass
    [
      rdf:type owl:Class ;
      owl:oneOf ( :Phobos :Deimos )
    ] .

Turtle Syntax


This states that Mars has exactly these two moons.

Logical class constructors

  • Logical AND (conjunction): owl:intersectionOf
  • Logical OR (disjunction): owl:unionOf
  • Logical NOT (negation): owl:complementOf
  • Used to construct complex classes from simple classes
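
In a single fixed interpretation, the three constructors correspond to set operations on class extensions. The sketch below shows this with Python sets; the domain and the extensions are made-up illustration data, and an OWL reasoner would of course consider all models rather than one.

```python
# Hypothetical domain of discourse and class extensions in one interpretation
domain = {"Phobos", "Deimos", "Luna", "ISS"}
moon = {"Phobos", "Deimos", "Luna"}
object_near_mars = {"Phobos", "Deimos"}

# owl:intersectionOf (Moon and ObjectNearMars)
moon_and_near_mars = moon & object_near_mars
# owl:unionOf (Moon or ObjectNearMars)
moon_or_near_mars = moon | object_near_mars
# owl:complementOf (not Moon), taken relative to the whole domain
not_moon = domain - moon
```

In this interpretation the intersection is `{"Phobos", "Deimos"}`, the union equals the extension of Moon, and the complement of Moon is `{"ISS"}`.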

Intersection

Class: MoonOfMars   EquivalentTo: Moon and ObjectNearMars 
Manchester Syntax
:MoonOfMars
  owl:equivalentClass
[
 rdf:type            owl:Class ;
 owl:intersectionOf  (:Moon :ObjectNearMars)
 ].
Turtle Syntax
It follows e.g. by inference that all moons of Mars are objects near Mars.

Union

Class: Boat   SubClassOf: SailBoat or MotorBoat
Manchester Syntax
:Boat  rdfs:subClassOf
 [
  rdf:type            owl:Class ;
  owl:unionOf  ( :SailBoat :MotorBoat )
 ] . 
Turtle Syntax

Disjoint Union

A professor can either be active or retired but not both at the same time.

<owl:Class rdf:about="#Professor">
 <owl:disjointUnionOf rdf:parseType="Collection">
  <owl:Class rdf:about="#Active"/>
  <owl:Class rdf:about="#Retired"/>
 </owl:disjointUnionOf>
</owl:Class>

RDF/XML Syntax

Class: Professor   DisjointUnionOf: Active, Retired 

Manchester Syntax
:Professor  rdf:type             owl:Class ;
            owl:disjointUnionOf  ( :Active :Retired ) .
Turtle Syntax

Complement

<owl:Class rdf:ID="FacultyStaff">
 <rdfs:subClassOf>
  <owl:Class>
   <owl:complementOf rdf:resource="#Publication"/>
  </owl:Class>
 </rdfs:subClassOf>
</owl:Class>
<!-- semantically equivalent statement: -->
<owl:Class rdf:ID="FacultyStaff">
 <owl:disjointWith rdf:resource="#Publication"/>
</owl:Class>
RDF/XML Syntax
Class: FacultyStaff   SubClassOf: not Publication 
Manchester Syntax
:FacultyStaff rdf:type owl:Class ;
 rdfs:subClassOf
 [
  rdf:type owl:Class ;
  owl:complementOf :Publication
 ].
Turtle Syntax

Modeling Example

Modeling task: all persons are either male or female.

:Male       rdfs:subClassOf  :Person .
:Female     rdfs:subClassOf  :Person ;
            owl:disjointWith  :Male .

:Person     owl:equivalentClass
[
 rdf:type owl:Class ;
 owl:unionOf ( :Male :Female )
]. 
Solution (Turtle Syntax)
Class: Male    SubClassOf: Person 
Class: Female   SubClassOf: Person   DisjointWith: Male 
Class: Person   EquivalentTo: Male or Female  
Solution (Manchester Syntax)

Role limitations (allValuesFrom)

Role restrictions are used to define complex classes by means of roles

<owl:Class rdf:ID="Exam">   
 <rdfs:subClassOf>         
  <owl:Restriction>       
   <owl:onProperty rdf:resource="#examiner"/>             
    <owl:allValuesFrom rdf:resource="#Professor"/>    
  </owl:Restriction>     
 </rdfs:subClassOf> 
</owl:Class> 
RDF/XML Syntax
:Exam   rdfs:subClassOf
[ rdf:type  owl:Restriction ;   owl:onProperty  :examiner ;
owl:allValuesFrom  :Professor ] .
Turtle Syntax
I.e. all examiners of an exam must be professors.

Role limitations (someValuesFrom)

<owl:Class rdf:ID="Exam">     
 <rdfs:subClassOf>         
  <owl:Restriction>             
   <owl:onProperty rdf:resource="#examiner"/>             
    <owl:someValuesFrom rdf:resource="#Person"/>         
  </owl:Restriction>     
 </rdfs:subClassOf> 
</owl:Class> 
RDF/XML Syntax
 Class: Exam   SubClassOf: examiner some Person
Manchester Syntax
:Exam   rdfs:subClassOf
[ rdf:type    owl:Restriction ;   owl:onProperty   :examiner ;
 owl:someValuesFrom   :Person ] .

Turtle Syntax
I.e. each exam must have at least one examiner.

Role limitations (cardinalities)

Class: Exam   SubClassOf: examiner max 2
Manchester Syntax
:Exam      rdf:type     owl:Class ; 
           rdfs:subClassOf 
[rdf:type owl:Restriction ;                
 owl:onProperty :examiner ;        
 owl:maxCardinality "2"^^xsd:nonNegativeInteger] . 
Turtle Syntax

I.e. each exam can have at most two examiners.

Analogous to max: min, exactly

Modeling Example

Modeling task: A performance requirement (of a software product) is a requirement, which is created by customers. It leads to a system requirement.
Class: PerformanceRequirement  SubClassOf: Requirement
 and (createdBy only Customer)
 and (leadsTo some SystemRequirement)
Solution (Manchester Syntax)

Role limitations (hasValue)

<owl:Class rdf:ID="ExamAtFaehnrich">
 <owl:equivalentClass>
  <owl:Restriction>
   <owl:onProperty rdf:resource="#examiner"/>
   <owl:hasValue rdf:resource="#Faehnrich"/>
  </owl:Restriction>
 </owl:equivalentClass>
</owl:Class>
RDF/XML Syntax
Class: ExamAtFaehnrich   EquivalentTo: examiner value Faehnrich
Manchester Syntax
:ExamAtFaehnrich   owl:equivalentClass
 [ rdf:type  owl:Restriction ;   owl:onProperty  :examiner ;
             owl:hasValue :Faehnrich ] .

Turtle Syntax

Domain and Range

ObjectProperty: belongsTo   Range: Organisation
Manchester Syntax
is equivalent to the following:
Class: owl:Thing   SubClassOf: belongsTo only Organisation
Manchester Syntax

Domain and Range

:belongsTo   rdf:type  owl:ObjectProperty ;  rdfs:range  :Organisation . 
Turtle Syntax

is equivalent to the following:
owl:Thing   rdfs:subClassOf     
 [ rdf:type   owl:Restriction ; owl:onProperty :belongsTo ; 
  owl:allValuesFrom   :Organisation ] . 
Turtle Syntax

Domain and Range

How do you model the domain D of a property p without using the Domain keyword?

Hint: Whenever p occurs at least once on something, this thing is of type D. For “at least once”, think of cardinality constraints or of existential quantification.

Domain and Range: Caution!

<owl:ObjectProperty rdf:ID="belongsTo">
 <rdfs:range rdf:resource="#Organisation"/>
</owl:ObjectProperty>
<Number rdf:ID="Five">
 <belongsTo rdf:resource="#Primes"/>
</Number>
RDF/XML Syntax
ObjectProperty:
 belongsTo   Range: Organisation
Individual: Five
 Types: Number
 Facts: belongsTo Primes
Manchester Syntax
:belongsTo    rdf:type         owl:ObjectProperty ;
              rdfs:range       :Organisation.
:Five         rdf:type         :Number;
              :belongsTo       :Primes. 
Turtle Syntax

It now follows that Primes is an organization!

Role Properties

<owl:ObjectProperty rdf:ID="hasColleague">
 <rdf:type rdf:resource="&owl;TransitiveProperty"/>
 <rdf:type rdf:resource="&owl;SymmetricProperty"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="hasProjectManager">
 <rdf:type rdf:resource="&owl;FunctionalProperty"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="isProjectManagerFor">
 <rdf:type rdf:resource="&owl;InverseFunctionalProperty"/>
</owl:ObjectProperty>
<Person rdf:ID="SoerenAuer">
<hasColleague rdf:resource="#SebastianTramp"/>
<hasColleague rdf:resource="#JensLehmann"/>
<isProjectManagerFor rdf:resource="#Triplify"/>
</Person>
<Project rdf:ID="OntoWiki">
 <hasProjectManager rdf:resource="#SoerenAuer"/>
 <hasProjectManager rdf:resource="#AuerSoeren"/>
</Project> 
RDF/XML Syntax

Conclusions from the example

  • SebastianTramp hasColleague SoerenAuer
  • SebastianTramp hasColleague JensLehmann
  • SoerenAuer owl:sameAs AuerSoeren
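These conclusions can be checked mechanically. The following is a rough sketch, assuming the property assertions are kept as sets of pairs; the function names are made up for illustration and do not belong to an OWL reasoner.

```python
from itertools import product

# Asserted facts from the RDF/XML example above.
has_colleague = {("SoerenAuer", "SebastianTramp"), ("SoerenAuer", "JensLehmann")}
has_project_manager = {("OntoWiki", "SoerenAuer"), ("OntoWiki", "AuerSoeren")}

def symmetric_transitive_closure(pairs):
    """Close a relation under symmetry and transitivity."""
    closure = set(pairs) | {(b, a) for (a, b) in pairs}
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def same_as_from_functional(pairs):
    """A functional property has at most one value per subject,
    so two different names for the value must denote the same individual."""
    return {(y, z) for (x, y) in pairs for (x2, z) in pairs if x == x2 and y != z}

colleagues = symmetric_transitive_closure(has_colleague)
same_as = same_as_from_functional(has_project_manager)
print(("SebastianTramp", "JensLehmann") in colleagues)
print(same_as)
```

Note that symmetry together with transitivity also yields pairs such as (SoerenAuer, SoerenAuer), a consequence that is easy to overlook when modeling.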

Negative Facts

Only possible since OWL 2.
 Individual: Bill
   Facts: not hasWife Mary
Manchester Syntax

 []  rdf:type               owl:NegativePropertyAssertion ;
     owl:sourceIndividual   :Bill ;
     owl:assertionProperty  :hasWife ;
     owl:targetIndividual   :Mary .
Turtle Syntax

Modeling Example

Modeling task
There are male and female persons. People have names.

:MalePerson    rdfs:subClassOf  :Person.
:FemalePerson  rdfs:subClassOf  :Person.
:hasName       rdf:type         owl:DatatypeProperty;
               rdfs:domain      :Person;
               rdfs:range       xsd:string.
Solution (Turtle Syntax)

Modeling Example

Modeling task
There are male and female persons. Persons have names.
:MalePerson    rdfs:subClassOf  :Person.
:FemalePerson  rdfs:subClassOf  :Person.
:hasName       rdf:type         owl:DatatypeProperty;
               rdfs:domain      :Person;
               rdfs:range       xsd:string.
:John          rdf:type         :MalePerson;
               :hasName         "John"^^xsd:string.
:Jill          rdf:type         :FemalePerson;
               :hasName         "Jill"^^xsd:string.
Solution (Turtle Syntax)

OWL 1 vs OWL 2 and History

  • OWL became a W3C recommendation in 2004, OWL 2 in 2009
  • OWL 2 is a fully backward-compatible extension of OWL 1
  • different language subsets: OWL 1 variants (Lite, DL, Full) vs. OWL 2 profiles
  • syntactic sugar (already possible but now easier to express)
    • disjoint union of classes
  • new expressivity
    • keys
    • property chains
    • richer datatypes, data ranges
    • qualified cardinality restrictions
    • asymmetric, reflexive, and disjoint properties
    • enhanced annotation capabilities

OWL 2 Full

  • Unlimited use of all OWL and RDF language elements (must be valid RDFS)
  •  Difficult: non-existent type separation (classes, roles, individuals), thus:
    • owl:Thing the same as rdfs:Resource
    • owl:Class the same as rdfs:Class
    • owl:DatatypeProperty subclass of owl:ObjectProperty
    • owl:ObjectProperty the same as rdf:Property

OWL 2 Profiles

  • OWL 2 introduces three profiles: OWL 2 QL, OWL 2 EL and OWL 2 RL.
  • Design principle for profiles:
    • Identify maximal OWL 2 sublanguages that are still implementable in polynomial time.
    • In general, disallow negation and disjunction, because they complicate reasoning and are rarely needed.

OWL QL

  • QL = Query Language  
  • based on the DL-Lite family of description logics
  • designed for query answering using SQL rewriting on top of relational databases
  • subclasses can only be class names or existentials with unrestricted filler  
  • superclasses can be class names, existentials or conjunctions with superclass filler (recursive) or negations with subclass filler

Allowed

\[\text{Fish} \sqsubseteq \text{Animal}\]

\[\exists \text{hasHouse}.\top \sqsubseteq \text{Landlord}\]

\[\text{Student} \sqsubseteq \text{Poor} \sqcap \exists \text{hasBike}.\top \sqcap \neg \text{Pupil}\]

Forbidden

\[\exists \text{hasHouse}.\text{Villa} \sqsubseteq \text{RichLandlord}\]


Details About OWL QL (1)

Forbidden


equality \[ \text{RichPerson} = \text{FamousPerson} \]

disjunctions \[ \text{Book} = \text{Ebook} \sqcup \text{PrintedBook} \]

universals \[ \forall \text{killed}.\text{Human} = \text{Murderer} \]

self \[ \exists \text{loves}.\text{Self} = \text{NarcissisticPerson} \]

cardinalities \[ \geq 5 \text{wonFight}.\top = \text{SuccessfulBoxer} \]

Details About OWL QL (2)

Forbidden

keys

Class: Person
HasKey: hasSSN

property chains

ObjectProperty: hasGrandparent
SubPropertyChain: hasParent o hasParent

transitive properties

ObjectProperty: hasAncestor
Characteristics: Transitive

nominals

EquivalentClasses(
:MyBirthdayGuests
ObjectOneOf(:Bill :John :Mary) )

functional properties

ObjectProperty: hasHusband
Characteristics: Functional

OWL EL

  • based on description logic \(\mathcal{EL}\)++, \(\mathcal{E}\) stands for full existential qualification
  • focus on terminological expressivity, as used for light-weight ontologies
  • existential restrictions are allowed, universal restrictions are not; rdfs:range (a special kind of universal restriction) is allowed only with restrictions
  • property domains, class/property hierarchies, class intersections, disjoint classes/properties, property chains, Self, nominals (classes with enumerated individual members) and keys fully supported
  • no inverse or symmetric properties, no disjunctions or negations

Examples

\[\exists \text{has}.\text{Sorrow} \sqsubseteq \exists \text{has}.\text{Liqueur}\]

\[\top \sqsubseteq \exists \text{hasParent}.\text{Person}\]

\[\text{German} \sqsubseteq \exists \text{knows}.\{\text{angela}\}\]

\[\text{hasParent} \circ \text{hasParent} \sqsubseteq \text{hasGrandparent}\]

OWL RL

  • RL = Rule Language
  • subclass axioms as rule-like implications with head (superclass) and body (subclass)
  • \(\text{DaysWithRain} \sqsubseteq \text{DaysWithWetStreet}\)

Protégé OWL editor

 http://protege.stanford.edu

Further Reading

Manchester Format

  • Convert the following triples to Manchester Syntax
    • ex:Garfield rdf:type ex:Cat
    • ex:Cat rdfs:subClassOf ex:Animal
    • ex:hasPet rdfs:range ex:Animal
    • ex:hasPet rdfs:domain ex:Person
    • ex:Person rdfs:subClassOf ex:Animal
    • ex:hasPet rdfs:subPropertyOf ex:livesWith
    • ex:Judie ex:hasPet ex:Casimir

Entailment

  • Derive the entailed axioms and facts from the following ontology
    • Individual: ex:Garfield
      • Types: ex:Cat
      • Facts: ex:knows ex:Odie, ex:livesWith ex:Casimir
    • Class: ex:Cat SubClassOf: ex:Animal
    • ObjectProperty: ex:hasPet
      • Range: ex:Animal
      • Domain: ex:Person
      • SubPropertyOf: ex:livesWith
    • Class: ex:Person SubClassOf: ex:Animal
      • DisjointWith: ex:Cat
    • Individual: ex:Judie
      • Facts: ex:hasPet ex:Casimir
    • ObjectProperty: ex:livesWith
      • InverseOf: ex:livesWith
      • SubPropertyOf: ex:knows
      • Characteristics: Transitive
  • Convert to Turtle Syntax
    • ObjectProperty: ex:livesWith
      • InverseOf: ex:livesWith
      • SubPropertyOf: ex:knows
      • Characteristics: Transitive
  • Give 3 Models of the ex:livesWith Property for: 1 person, 2 persons, infinite number of persons

Try Protégé OWL editor out!

There are two ways to use Protégé:

1. Download it on your desktop

2. Use it online

Source: http://protege.stanford.edu/

Protégé on your desktop

You can easily download and run Protégé OWL editor on your desktop.

Source: http://protege.stanford.edu/products.php#desktop-protege

Using Protégé online!

Sign up and make a new project

To do: create a class.

Introduction

 

Goals

Learn about

  • Semantics of OWL
  • Description Logics
  • Tableau reasoning

Prerequisites

  • Basic knowledge of Propositional Logic
  • Basic knowledge of First-Order Logic


Semantics of OWL

The semantics of OWL is based on Description Logics, which have a model theoretic formal semantics.

   OWL DL corresponds to \(\mathcal{SHOIN}(D)\).

   OWL 2 corresponds to \(\mathcal{SROIQ}(D)\).

In this lecture we will explain the semantics of OWL DL by describing the description logic \(\mathcal{SHOIN}(D)\)

Description Logics

  • family of knowledge representation languages
  • usually fragments of First Order Logic (FOL)
  • in most cases decidable
  • comparatively expressive
  • originated from Semantic Networks
  • intuitive syntax
  • variable free (e.g. P ⊑ Q instead of ∀ x . p(x) → q(x))


Description Logics - Basic Components

Basic components:

  • concept names (atomic concepts), e.g. Student, Book, ...
  • role names, e.g. bornIn, worksFor, ...
  • individual names (individuals, objects), e.g. Steven, Mary, ...

The set of concept, role and individual names is often denoted as signature or vocabulary.

Description Logics - Knowledge Base

Usually a DL knowledge base consists of:

  • TBox \(\mathcal{T}\) : Information about concepts.
  • ABox \(\mathcal{A}\) : Information about individuals.

Additionally, in more expressive DLs:

  • RBox \(\mathcal{R}\) : Information about roles.

\(\mathcal{ALC}\) - Concepts

\(\mathcal{ALC}\), Attributive Language with Complement, is the simplest propositionally closed (i.e. subsuming propositional logic) DL.

(Complex) \(\mathcal{ALC}\) concepts are inductively defined as follows:

  • every concept name is a concept,
  • \(\mathcal{\top}\) and \(\mathcal{\bot}\) are concepts,
  • if \(C\) and \(D\) are concepts and \(r\) is a role then the following are concepts:
    • \(\neg C\) (often called negation or complement)
    • \(C \sqcap D\) (often called conjunction, intersection or "and")
    • \(C \sqcup D\) (often called disjunction, union or "or")
    • \(\exists r.C\) (often called existential restriction)
    • \(\forall r.C\) (often called value restriction)

\(\mathcal{ALC}\) Concepts - Examples

\(\text{Person} \sqcap \exists \text{hasChild}.\top\)

  • Persons with children

\(\text{Animal} \sqcap \forall \text{eat}.\text{Vegetable}\)

  • Animals which only eat vegetables

\(\text{Professor} \sqcup \text{Student}\)

  • Professors or students

\(\text{Person} \sqcap \forall\text{bornIn}.\neg \text{EuropeanCountry}\)

  • Persons which are not born in a country in Europe

\(\mathcal{ALC}\) - TBox

A TBox (terminological box) consists of a finite set of terminological axioms.

Terminological axioms are basically general concept inclusion axioms (GCIs): given concepts \(C\) and \(D\), a GCI is written as

\[C \sqsubseteq D\]

Concept equivalence can be expressed with

\[C \equiv D\]

which is an abbreviation for \(C \sqsubseteq D\) and \(D \sqsubseteq C\).

\(\mathcal{ALC}\) - ABox

An ABox consists of a finite set of assertional axioms of the following forms:

  • \(C(a)\), so-called concept assertions
  • \(r(a,b)\), so-called role assertions

\(\mathcal{ALC}\) - Semantics (1)

A formal definition of the model-theoretic semantics of \(\mathcal{ALC}\) is given by means of an interpretation \(\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{ \mathcal{I}})\) consisting of

  • a non-empty domain \(\Delta^{ \mathcal{I}}\)
  • a mapping \( \cdot^{ \mathcal{I} }\), which maps
    • every individual \(a\) to a domain element \(a^{\mathcal{I}}\in\Delta^{\mathcal{I}}\)
    • every concept name \(A\) to a subset of domain elements \(A^{\mathcal{I}}\subseteq\Delta^{\mathcal{I}}\)
    • every role \(r\) to a set of pairs of domain elements \(r^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}\times\Delta^{\mathcal{I}}\)

\(\mathcal{ALC}\) - Semantics (2)

  • Interpretation is extended to complex concepts by
    • \(\top^\mathcal{I}=\Delta^\mathcal{I}\)
    • \(\bot^\mathcal{I}=\emptyset\)
    • \((\neg C)^\mathcal{I}=\Delta^\mathcal{I} \setminus C^\mathcal{I} \)
    • \((C \sqcap D)^\mathcal{I} = C^\mathcal{I} \cap D^\mathcal{I} \)
    • \( (C \sqcup D)^\mathcal{I}=C^\mathcal{I} \cup D^\mathcal{I} \)
    • \((\exists r.C)^\mathcal{I}= \{x \in \Delta^\mathcal{I} | \exists y \in C^\mathcal{I} \text{ such that } (x,y) \in r^\mathcal{I} \}\)
    • \((\forall r.C)^\mathcal{I}= \{x \in \Delta^\mathcal{I} | \forall y \in \Delta^\mathcal{I}: (x,y) \in r^\mathcal{I} \rightarrow y \in C^\mathcal{I} \} \)

 

\(\mathcal{ALC}\) - Semantics (3)

  •  Interpretation is extended to axioms by
    • \(\mathcal{I} \models C \sqsubseteq D \text{ iff } C^\mathcal{I} \subseteq D^\mathcal{I} \)
    • \(\mathcal{I} \models C \equiv D \text{ iff } C^\mathcal{I} = D^\mathcal{I} \)
    • \(\mathcal{I} \models C(a) \text{ iff } a^\mathcal{I} \in C^\mathcal{I} \)
    • \(\mathcal{I} \models r(a,b) \text{ iff } (a^\mathcal{I},b^\mathcal{I}) \in r^\mathcal{I} \)

\(\mathcal{ALC}\) Knowledge Base - Example

TBox \(\mathcal{T}\) :

\( \begin{align} \text{Man} &\equiv \neg \text{Woman} \sqcap \text{Person}\\ \text{Woman} &\sqsubseteq \text{Person}\\ \text{Mother} &\equiv \text{Woman} \sqcap \exists \text{hasChild}.\top \\ \end{align} \)

ABox \(\mathcal{A}\) :

\( \begin{align} \text{Man(STEPHEN)}\\ \neg\text{Man(MONICA)}\\ \text{Woman(JESSICA)}\\ \text{hasChild(STEPHEN, JESSICA)}\\ \end{align} \)

\(\mathcal{ALC}\) in OWL

The \(\mathcal{ALC}\) operators correspond to the following OWL expressions:

\( \begin{align} \top &: \text{owl:Thing}\\ \bot &: \text{owl:Nothing}\\ \neg &: \text{owl:complementOf}\\ \sqcup &: \text{owl:unionOf}\\ \sqcap &: \text{owl:intersectionOf}\\ \exists &: \text{owl:someValuesFrom}\\ \forall &: \text{owl:allValuesFrom}\\ \end{align} \)


\(\mathcal{ALC}\) + Inverse Roles

Naming: \(\mathcal{ALCI}\)

A role can be

  • a role name \(r\), or
  • an inverse role \(r^-\)

The semantics of inverse roles is defined by

\[ (r^-)^\mathcal{I} = \{(y,x) | (x,y) \in r^\mathcal{I} \}\]

OWL construct: owl:inverseOf

\(\mathcal{ALC}\) + Role Hierarchy

Naming: \(\mathcal{ALCH}\)

For roles \(r,s\) 

  • a role inclusion axiom (RIA) is denoted by \(r \sqsubseteq s\)
  • \(r \equiv s\) is an abbreviation for \(r \sqsubseteq s\) and \(s \sqsubseteq r\)

An interpretation \(\mathcal{I}\) satisfies \(r \sqsubseteq s\) iff \(r^\mathcal{I} \subseteq s^\mathcal{I}\).

OWL construct: rdfs:subPropertyOf

\(\mathcal{ALC}\) + Role Transitivity

Naming: \(\mathcal{ALC}\) + Transitivity = \(\mathcal{S}\)

For a role \(r\) 

  • a transitivity axiom is denoted by \(\text{Trans}(r)\)

An interpretation \(\mathcal{I}\) satisfies \(\text{Trans}(r)\) iff

\[(x,y) \in r^\mathcal{I} \wedge (y,z) \in r^\mathcal{I} \rightarrow (x,z) \in r^\mathcal{I}.\] 

OWL construct: owl:TransitiveProperty

\(\mathcal{ALC}\) + Role Functionality

Naming: \(\mathcal{ALCF}\)

For a role \(r\) 

  • a functionality axiom is denoted by \(\text{Func}(r)\)

An interpretation \(\mathcal{I}\) satisfies \(\text{Func}(r)\) iff

\[(x,y) \in r^\mathcal{I} \wedge (x,z) \in r^\mathcal{I} \rightarrow y=z.\] 

OWL construct: owl:FunctionalProperty

Simple vs. Complex Roles

Let \(\mathcal{R}\) be a role hierarchy and let \(\sqsubseteq^{*}_{\mathcal{R}}\) be its reflexive and transitive closure.

  • A role \(r\) is complex w.r.t. \(\mathcal{R}\), if there exists a role \(s\) such that \(\text{Trans}(s) \in \mathcal{R}\) and \(s \sqsubseteq^{*}_{\mathcal{R}} r\).
  • Otherwise, the role \(r\) is simple .

Example:

\[\mathcal{R}=\{ u \sqsubseteq r , r \sqsubseteq s , s \sqsubseteq t , q \sqsubseteq t , \text{Trans}(r) \}\]

Complex: \(r,s,t\)

Simple: \(u,q\)
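The classification in this example can be computed by propagating "complex" upwards along the role inclusions. The sketch below assumes a role hierarchy given as (sub, super) pairs; `complex_roles` is a hypothetical helper name.

```python
def complex_roles(inclusions, transitive):
    """Return every role r for which some transitive role s satisfies s ⊑* r."""
    result = set(transitive)  # s ⊑* s holds by reflexivity
    changed = True
    while changed:
        changed = False
        for sub, sup in inclusions:
            if sub in result and sup not in result:
                result.add(sup)  # every super-role of a complex role is complex
                changed = True
    return result

# Role hierarchy from the example: u ⊑ r, r ⊑ s, s ⊑ t, q ⊑ t, Trans(r)
inclusions = {("u", "r"), ("r", "s"), ("s", "t"), ("q", "t")}
roles = {"u", "r", "s", "t", "q"}

complex_set = complex_roles(inclusions, {"r"})
simple_set = roles - complex_set
print(sorted(complex_set), sorted(simple_set))
```

The role u stays simple even though it is a sub-role of the transitive role r, because complexity is only inherited by super-roles.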

\(\mathcal{ALC}\) + Unqualified Number Restrictions

Naming: \(\mathcal{ALCN}\)

For a simple role \(r\) and a natural number \(n\), the number restrictions \(\geq n r, \leq n r, = n r\) are concepts whose semantics is defined as

\(\begin{align} (\geq n r)^\mathcal{I} &= \{x \in \Delta^\mathcal{I} | \#\{y \in \Delta^\mathcal{I} | (x,y) \in r^\mathcal{I} \} \geq n \}\\ (\leq n r)^\mathcal{I} &= \{x \in \Delta^\mathcal{I} | \#\{y \in \Delta^\mathcal{I} | (x,y) \in r^\mathcal{I} \} \leq n \}\\ (= n r)^\mathcal{I} &= \{x \in \Delta^\mathcal{I} | \#\{y \in \Delta^\mathcal{I} | (x,y) \in r^\mathcal{I} \} = n \}\\ \end{align}\)

OWL constructs:

\( \begin{align} \geq n r &= \text{owl:minCardinality}\\ \leq n r &= \text{owl:maxCardinality}\\ = n r &= \text{owl:cardinality} \end{align} \)

\(\mathcal{ALC}\) + Nominals

Naming: \(\mathcal{ALCO}\)

Let \(a_1, \ldots, a_n\) be individuals. A nominal \(\{a_1, \ldots, a_n\}\) is a concept whose semantics is defined as

\( (\{a_1, \ldots, a_n\})^\mathcal{I}=\{a_1^\mathcal{I}, \ldots, a_n^\mathcal{I}\} \)

OWL construct: owl:oneOf

Logical Reasoning

Deductive Reasoning

  • starts with the assertion of a general rule and proceeds from there to a guaranteed specific conclusion
  • "from the general rule to the specific application"   

Inductive Reasoning

  • begins with observations that are specific and limited in scope, and proceeds to a generalized conclusion that is likely, but not certain, in light of accumulated evidence
  • "from the specific to the general"

 Abductive Reasoning

  • begins with an incomplete set of observations and proceeds to the likeliest possible explanation for the set   

Logical Reasoning in Description Logics (1)

 

Logical Reasoning in Description Logics (2)

Let \(\mathcal{I}\) be an interpretation, \(\mathcal{T}\) be a TBox, \(\mathcal{A}\) be an ABox and \(\mathcal{K}=(\mathcal{T},\mathcal{A})\) be a knowledge base. We say:

  • \(\mathcal{I}\) is a model for \(\mathcal{T}\), iff \(\mathcal{I} \models \alpha\) for every axiom \(\alpha \in \mathcal{T}\), written \(\mathcal{I} \models \mathcal{T}\). 
  • \(\mathcal{I}\) is a model for \(\mathcal{A}\), iff \(\mathcal{I} \models \alpha\) for every axiom \(\alpha \in \mathcal{A}\), written \(\mathcal{I} \models \mathcal{A}\). 
  • \(\mathcal{I}\) is a model for \(\mathcal{K}\), iff \(\mathcal{I} \models \mathcal{T}\) and \(\mathcal{I} \models \mathcal{A}\).
  • An axiom \(\alpha\) is entailed by \(\mathcal{K}\), written \(\mathcal{K} \models \alpha\), iff every model \(\mathcal{I}\) of \(\mathcal{K}\) is a model for \(\alpha\).

Reasoning Services (1)

 

Concept Satisfiability

\[ \mathcal{K} \not \models C \equiv \bot \]

The problem of checking whether \(C\) is satisfiable w.r.t. \( \mathcal{K} \), i.e. whether there exists a model \(\mathcal{I}\) of \( \mathcal{K} \)  such that \(C^{\mathcal{I}} \neq \emptyset \).

Subsumption

\[ \mathcal{K} \models C \sqsubseteq D\]

The problem of checking whether \(C\) is subsumed by \(D\) w.r.t. \(\mathcal{K}\), i.e. whether \(C^{\mathcal{I}} \subseteq D ^{\mathcal{I}}  \) in every model \(\mathcal{I}\) of  \(\mathcal{K}\) .

Satisfiability (Consistency)

\[ \mathcal{K} \not \models \top \sqsubseteq \bot\]

The problem of checking whether \(\mathcal{K}\) is consistent, i.e. whether it has a model.

Reasoning Services (2)

Instance Checking

\[ \mathcal{K} \models C(a)\]

The problem of checking whether the assertion \(C(a)\) is entailed by \(\mathcal{K}\), i.e. whether \(a^{\mathcal{I}} \in C^{\mathcal{I}}\) in every model \(\mathcal{I}\) of \(\mathcal{K}\).

Retrieval

\[\{a | \mathcal{K} \models C(a)\}\]

The problem of finding all individuals \(a\) which belong to concept \(C\) w.r.t. \(\mathcal{K}\), i.e. find all \(a\) for a given \(C\) such that \(a^{\mathcal{I}}\in C^{\mathcal{I}}\) in every model \(\mathcal{I}\) of \(\mathcal{K}\).

Realization

\[\{C | \mathcal{K} \models C(a)\}\]

The problem of finding all named classes \(C\) which an individual \(a\) belongs to w.r.t. \(\mathcal{K}\), i.e. find all \(C\) for a given \(a\) such that \(a^{\mathcal{I}}\in C^{\mathcal{I}}\) in every model \(\mathcal{I}\) of \(\mathcal{K}\).

Reasoning Services (3)

We can reduce all services to a satisfiability check:

Concept Satisfiability

\(\mathcal{K} \not \models C \equiv \bot \longleftrightarrow\) there exists an \(x\) such that \(\mathcal{K} \cup \{C(x)\}\) is satisfiable.

Subsumption

\(\mathcal{K} \models C \sqsubseteq D \longleftrightarrow \mathcal{K} \cup \{(C \sqcap \neg D)(x)\}\) is unsatisfiable for a fresh individual \(x\).

Instance Check

\(\mathcal{K} \models C(a) \longleftrightarrow \mathcal{K} \cup \{\neg C(a)\}\) is unsatisfiable.

Tableau Algorithm

How can we prove the satisfiability of a concept?

(Remember: A concept is satisfiable if there exists a model \(\mathcal{I}\) satisfying it.)

We need a constructive decision procedure for constructing models.

\(\longrightarrow\)  Tableau Algorithm  

Proof procedure:

  • transform a given concept into Negation Normal Form (NNF)
  • apply completion rules in arbitrary order as long as possible
  • the concept is satisfiable if, and only if, a clash-free tableau can be derived to which no completion rule is applicable

Sample Ontology for Tableau Algorithm

TBox \(\mathcal{T}\) :

\( \begin{align} \text{Man} &\equiv \neg \text{Woman} \sqcap \text{Person}\\ \text{Woman} &\sqsubseteq \text{Person}\\ \text{Mother} &\equiv \text{Woman} \sqcap \exists \text{hasChild}.\top \\ \end{align} \)

ABox \(\mathcal{A}\) :

\( \begin{align} \text{Man(STEPHEN)}\\ \neg\text{Man(MONICA)}\\ \text{Woman(JESSICA)}\\ \text{hasChild(STEPHEN, JESSICA)}\\ \end{align} \)

Let \(\mathcal{I}\) be an interpretation with:

\( \begin{align} \text{Man}^\mathcal{I}&=\{STEPHEN\}\\ \text{Woman}^\mathcal{I}&=\{JESSICA, MONICA\}\\ \text{Mother}^\mathcal{I}&=\{MONICA\}\\ \text{Person}^\mathcal{I}&=\{JESSICA, MONICA, STEPHEN\}\\ \text{hasChild}^\mathcal{I}&=\{\langle MONICA, STEPHEN \rangle, \langle STEPHEN, JESSICA \rangle\}\\ \end{align} \)

then it holds that

\( \mathcal{I} \models \mathcal{T} \text{ and } \mathcal{I}\models \mathcal{A} \)
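The claim can be verified by evaluating the semantics directly. The following sketch assumes that the domain consists of exactly the three named individuals and that each individual name is interpreted as itself (neither is stated on the slide); the tuple encoding of concepts and the function `ext` are illustrative choices.

```python
# Assumed domain: the three individuals, each interpreted as itself.
DOMAIN = {"STEPHEN", "JESSICA", "MONICA"}
CONCEPTS = {
    "Man": {"STEPHEN"},
    "Woman": {"JESSICA", "MONICA"},
    "Mother": {"MONICA"},
    "Person": {"JESSICA", "MONICA", "STEPHEN"},
}
ROLES = {"hasChild": {("MONICA", "STEPHEN"), ("STEPHEN", "JESSICA")}}

def ext(c):
    """Extension of a concept under the interpretation (ALC semantics)."""
    if c == "TOP":
        return set(DOMAIN)
    if c == "BOT":
        return set()
    if isinstance(c, str):
        return set(CONCEPTS[c])
    op = c[0]
    if op == "not":
        return DOMAIN - ext(c[1])
    if op == "and":
        return ext(c[1]) & ext(c[2])
    if op == "or":
        return ext(c[1]) | ext(c[2])
    if op == "some":  # existential restriction
        filler = ext(c[2])
        return {x for x in DOMAIN if any((x, y) in ROLES[c[1]] for y in filler)}
    if op == "all":   # value restriction
        filler = ext(c[2])
        return {x for x in DOMAIN
                if all(y in filler for (x2, y) in ROLES[c[1]] if x2 == x)}
    raise ValueError(f"unknown constructor: {op}")

# TBox axioms: equivalence is set equality, inclusion is the subset relation.
tbox_ok = (
    ext("Man") == ext(("and", ("not", "Woman"), "Person"))
    and ext("Woman") <= ext("Person")
    and ext("Mother") == ext(("and", "Woman", ("some", "hasChild", "TOP")))
)
# ABox assertions: membership of individuals and pairs.
abox_ok = (
    "STEPHEN" in ext("Man")
    and "MONICA" in ext(("not", "Man"))
    and "JESSICA" in ext("Woman")
    and ("STEPHEN", "JESSICA") in ROLES["hasChild"]
)
print(tbox_ok, abox_ok)
```

Changing a single set, for example removing MONICA from Woman, makes one of the checks fail, which shows how tightly the axioms constrain admissible interpretations.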

Negation Normal Form

A concept is in Negation Normal Form (NNF) if all occurrences of negation in it are in front of atomic concepts.

Every \(\mathcal{ALC}\) concept can be transformed into an equivalent one in NNF using the following rules:

\[ \begin{aligned} NNF(C) &= C, \text{ if } C \text{ is atomic }\\ NNF(\neg C) &= \neg C, \text{ if } C \text{ is atomic}\\ NNF(\neg \neg C) &= NNF(C) \\ NNF(C \sqcup D) &= NNF(C) \sqcup NNF(D) \\ NNF(C \sqcap D) &= NNF(C) \sqcap NNF(D) \\ NNF(\neg(C \sqcup D)) &= NNF(\neg C) \sqcap NNF(\neg D) \\ NNF(\neg(C \sqcap D)) &= NNF(\neg C) \sqcup NNF(\neg D) \\ NNF(\forall R.C) &= \forall R.NNF(C) \\ NNF(\exists R.C) &= \exists R.NNF(C) \\ NNF(\neg \forall R.C) &= \exists R.NNF(\neg C) \\ NNF(\neg \exists R.C) &= \forall R.NNF(\neg C) \\ \end{aligned} \]

Negation Normal Form - Example

Transform the concept

\[\neg (\neg (A \sqcup \neg B) \sqcap \neg C)\]

to an equivalent concept in negation normal form:

\[ \begin{aligned} &NNF(\neg (\neg (A \sqcup \neg B) \sqcap \neg C))\\ &= NNF(\neg \neg (A \sqcup \neg B)) \sqcup NNF(\neg \neg C)\\ &= NNF(A \sqcup \neg B) \sqcup NNF(C)\\ &= NNF(A \sqcup \neg B) \sqcup C\\ &= NNF(A) \sqcup NNF(\neg B) \sqcup C\\ &= A \sqcup \neg B \sqcup C\\ \end{aligned} \]
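The rewrite rules translate almost literally into a recursive function. This is a small sketch, assuming concepts are encoded as nested tuples such as `("not", C)`, `("and", C, D)`, `("or", C, D)`, `("some", r, C)` and `("all", r, C)`, with atomic concepts as strings; the encoding is an illustration, not a standard format.

```python
def nnf(c):
    """Push negations inwards until they only occur in front of atomic concepts."""
    if isinstance(c, str):                      # atomic concept
        return c
    op = c[0]
    if op in ("and", "or"):
        return (op, nnf(c[1]), nnf(c[2]))
    if op in ("some", "all"):
        return (op, c[1], nnf(c[2]))
    inner = c[1]                                # here op == "not"
    if isinstance(inner, str):                  # negated atomic concept
        return c
    iop = inner[0]
    if iop == "not":                            # double negation
        return nnf(inner[1])
    if iop == "and":                            # De Morgan
        return ("or", nnf(("not", inner[1])), nnf(("not", inner[2])))
    if iop == "or":                             # De Morgan
        return ("and", nnf(("not", inner[1])), nnf(("not", inner[2])))
    if iop == "all":
        return ("some", inner[1], nnf(("not", inner[2])))
    if iop == "some":
        return ("all", inner[1], nnf(("not", inner[2])))
    raise ValueError(f"unknown constructor: {iop}")

# The example concept: ¬(¬(A ⊔ ¬B) ⊓ ¬C)
concept = ("not", ("and", ("not", ("or", "A", ("not", "B"))), ("not", "C")))
result = nnf(concept)
print(result)
```

The result corresponds to \(A \sqcup \neg B \sqcup C\), with the binary encoding grouping the first two disjuncts.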

Tableau algorithm for \(\mathcal{ALC}\) concept satisfiability

A tableau (completion graph) for an \(\mathcal{ALC}\) concept is a labeled directed graph \(G=\langle V,E,L\rangle\), where

  • each node \(x \in V\) is labeled with a set \(L(x)\) of concepts, and
  • each edge \(\langle x,y \rangle \in E\) is labeled with a set \(L(\langle x,y \rangle)\) of roles.

A completion graph \(G\)

  • contains a clash, if \(\{A,\neg A\} \subseteq L(x)\) for some atomic concept \(A\), or \(\bot \in L(x)\), or \(\neg \top \in L(x)\)
  • is complete, if no completion rule can be applied on it.

Completion Rules for \(\mathcal{ALC}\) concept satisfiability

\(\sqcap\)-rule: if \( C \sqcap D \in L(v) \text{ for some } v \in V \text{ and } \{C, D\} \not \subseteq L(v) \),
then \(L(v):=L(v) \cup \{C, D\} \)

\(\sqcup\)-rule: if \( C \sqcup D \in L(v) \text{ for some } v \in V \text{ and } \{C, D\} \cap L(v) = \emptyset \),
then \( \text{choose } X \in \{C, D\} \text{ and let } L(v) := L(v) \cup \{X\} \)

\(\exists\)-rule: if \( \exists r.C \in L(v) \text{ for some } v \in V \text{ and there is no } r\text{-successor } v' \text{ of } v \text{ such that } C \in L(v') \),
then \(V:= V \cup \{v'\} , E:=E \cup \{ \langle v,v' \rangle \}, L(v'):=\{C\} \text{ and } L(\langle v,v' \rangle):=\{r\} \text{ for a new vertex } v'\)

\(\forall\)-rule: if \( v,v' \in V, v' \text{ is an } r\text{-successor of } v, \forall r.C \in L(v) \text{ and } C \not \in L(v') \),
then \( L(v'):=L(v') \cup \{C\} \)

Tableau algorithm for \(\mathcal{ALC}\) concept satisfiability - Example 1

We check whether \(C=(A \sqcap \neg A) \sqcup B\) is satisfiable. It is in NNF, so we can directly apply the tableau algorithm to

\[(A \sqcap \neg A) \sqcup B.\]

The only rule applicable is \(\sqcup-\text{rule}\). We have two possibilities. Firstly, we can try

\[L(x)=\{C, A \sqcap \neg A\}.\]

Then we can apply \(\sqcap-\text{rule}\) and obtain

\[L(x)=\{C, A \sqcap \neg A, A, \neg A\}.\]

We have obtained a clash, thus this choice was unsuccessful. Secondly, we can try

\[L(x)=\{C, B\}.\]

No further rule is applicable and we obtained no clash. Thus, \((A \sqcap \neg A) \sqcup B\) is satisfiable.

A model \(\mathcal{I}\) satisfying it is given by

\[\Delta^\mathcal{I}=\{x\}, A^\mathcal{I}=\emptyset, B^\mathcal{I}=\{x\}.\]

Tableau algorithm for \(\mathcal{ALC}\) concept satisfiability - Example 2

We check whether \(C=A \sqcap \exists r.B \sqcap \forall r.\neg B\) is satisfiable. It is in NNF, so we can directly apply the tableau algorithm to

\[C=A \sqcap \exists r.B \sqcap \forall r.\neg B.\]

An application of \(\sqcap-\text{rule}\) gives

\[L(x)=\{C, A, \exists r.B, \forall r.\neg B\}.\]

An application of \(\exists-\text{rule}\) gives

\(\begin{align*} L(x)&=\{C, A, \exists r.B, \forall r.\neg B\}\\ L(y)&=\{B\}\\ L(\langle x,y \rangle)&=\{r\} \end{align*} \)

An application of \(\forall-\text{rule}\) gives

\(\begin{aligned} L(x)&=\{C, A, \exists r.B, \forall r.\neg B\}\\ L(y)&=\{B, \neg B\}\\ L(\langle x,y \rangle)&=\{r\} \end{aligned} \)

We obtained a clash and no other choices are possible. Thus, \(A \sqcap \exists r.B \sqcap \forall r.\neg B\) is unsatisfiable and there exists no model.
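Both examples can be replayed with a compact satisfiability check. The sketch below is not the graph-based formulation from the slides: it assumes the input is already in NNF, handles no TBox, and explores each successor node recursively instead of storing the completion graph explicitly. Concepts are encoded as nested tuples (an illustrative choice), and conjunctions may have more than two arguments.

```python
def satisfiable(label):
    """Decide satisfiability of a set of ALC concepts in NNF (no TBox)."""
    label = set(label)
    # ⊓-rule: add all conjuncts until nothing changes.
    changed = True
    while changed:
        changed = False
        for c in list(label):
            if isinstance(c, tuple) and c[0] == "and":
                for part in c[1:]:
                    if part not in label:
                        label.add(part)
                        changed = True
    # Clash check: ⊥, ¬⊤, or an atomic concept together with its negation.
    if "BOT" in label or ("not", "TOP") in label:
        return False
    if any(("not", c) in label for c in label if isinstance(c, str)):
        return False
    # ⊔-rule: try each disjunct; one successful choice is enough.
    for c in label:
        if isinstance(c, tuple) and c[0] == "or" and not (set(c[1:]) & label):
            return any(satisfiable(label | {choice}) for choice in c[1:])
    # ∃-rule with ∀-rule: each existential gets its own successor, whose label
    # is the filler plus the fillers of all value restrictions on the same role.
    for c in label:
        if isinstance(c, tuple) and c[0] == "some":
            role = c[1]
            successor = {c[2]} | {d[2] for d in label
                                  if isinstance(d, tuple)
                                  and d[0] == "all" and d[1] == role}
            if not satisfiable(successor):
                return False
    return True

# Example 1: (A ⊓ ¬A) ⊔ B
example1 = ("or", ("and", "A", ("not", "A")), "B")
# Example 2: A ⊓ ∃r.B ⊓ ∀r.¬B
example2 = ("and", "A", ("some", "r", "B"), ("all", "r", ("not", "B")))

print(satisfiable({example1}), satisfiable({example2}))
```

Termination is guaranteed here only because there is no TBox: every recursive call on a successor works on concepts of strictly smaller nesting depth.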

Tableau algorithm for \(\mathcal{ALC}\) TBoxes

We extend the tableau algorithm to check satisfiability of \(\mathcal{ALC}\) TBoxes:
  • An \(\mathcal{ALC}\) TBox contains only axioms (GCIs) of the form \(C \sqsubseteq D\) (note that axioms of the form \(C \equiv D\) can be rewritten as \(C \sqsubseteq D\) and \(D \sqsubseteq C\)).
  • Every GCI \(C \sqsubseteq D\) is equivalent to \(\top \sqsubseteq \neg C \sqcup D\).

We can internalize the whole TBox into a single axiom:
\[\mathcal{T}=\{C_i \sqsubseteq D_i | 1 \leq i \leq n\}\]

is equivalent to
\[ \top \sqsubseteq \underset{1 \leq i \leq n}{{\LARGE\sqcap}} (\neg C_i \sqcup D_i) \]

Let \(C_\mathcal{T}\) be the concept on the right side of the GCI, then an additional rule is

\(\mathcal{T}\)-rule if \( C_\mathcal{T} \,\,\not \in L(v), \text{ for some } v \in V\)
then \( L(v):=L(v) \cup \{C_\mathcal{T}\} \)

Tableau algorithm for \(\mathcal{ALC}\) TBoxes - Example

Suppose we have a TBox \(\mathcal{T}=\{A \sqsubseteq \exists r.A\}\) and want to check whether the concept \(A\) is satisfiable.

We start with

\[L(x)=\{A\}\]

The only rule applicable is \(\mathcal{T}-\text{rule}\), thus we obtain

\[L(x)=\{A, \neg A \sqcup \exists r.A\}\]

After applying the \(\sqcup\)-rule, the first choice (\(\neg A\)) leads to a clash with \(A\), thus we use the second part of the disjunction and get

\[L(x)=\{A, \neg A \sqcup \exists r.A, \exists r.A\}\]

We can apply the \(\exists-\text{rule}\) and get

\( \begin{align*} L(x)&=\{A, \neg A \sqcup \exists r.A, \exists r.A\}\\ L(y)&=\{A\} \end{align*} \)

At this point, we could run the same procedure on \(y\), thus the algorithm would never terminate.

Solution: We need to discover cycles \(\Longrightarrow\) Blocking

Blocking for \(\mathcal{ALC}\)

Goal: Ensure termination of the tableau algorithm.
Solution: Detect cycles that might occur due to application of the \(\mathcal{T}-\text{rule}\).
Result: Completion graph is always finite.
 
 
 
 

Blocking:

A node \(v' \in V\) is directly blocked by a node \(v \in V\), if

  1. \(v\) is an ancestor of \(v'\)
  2. \(L(v') \subseteq L(v)\)
  3. there is no directly blocked node \(v''\) such that \(v''\) is an ancestor of \(v\)
     

A node \(v'\) is blocked, if either

  1. \(v'\) is directly blocked, or
  2. there is a directly blocked node \(v\) which is an ancestor of \(v'\)

Tableau algorithm for \(\mathcal{ALC}\) TBoxes With Blocking - Example

Suppose we have a TBox \(\mathcal{T}=\{A \sqsubseteq \exists r.A\}\) and want to check whether the concept \(A\) is satisfiable.

We obtain the clash-free tableau

\( \begin{align*} L(x)&=\{A, \neg A \sqcup \exists r.A, \exists r.A\}\\ L(y)&=\{A, \neg A \sqcup \exists r.A, \exists r.A\} \end{align*} \)

wherein \(y\) is directly blocked by \(x\).

We can get a finite model by taking into account that

  • blocked nodes do not represent elements in the model
  • an edge from a node \(v\) to a directly blocked node \(v'\) is represented in the model as "edge" from \(v\) to the node which directly blocks \(v'\) 

For our example we get

\[ \Delta^\mathcal{I}=\{x\}, A^\mathcal{I}=\{x\}, r^\mathcal{I}=\{\langle x,x \rangle\} \]

Tableau algorithm for \(\mathcal{ALC}\) TBoxes With Blocking - (In)finite Models

The TBox \(\{\mathit{Guard}\sqsubseteq \exists \mathit{shields}.\mathit{Guard}\}\) has a finite model (see previous slides).

What about the following TBox?

\[ \begin{align*}\mathit{Guard}&\sqsubseteq \exists \mathit{shields}.\mathit{Guard} \sqcap \leq 1 \mathit{shields}^-\\ \mathit{FirstGuard}&\sqsubseteq \mathit{Guard} \sqcap \leq 0 \mathit{shields}^- \end{align*} \]

The existence of a FirstGuard forces the existence of an infinite sequence of guards, each one shielding the next.

I.e. there exist TBoxes that do not have finite models.

Complexity of DL Reasoning

http://www.cs.man.ac.uk/~ezolin/dl/: configure your DL and learn how complex standard reasoning tasks are

Summary

  • Semantics of OWL is based on Description Logics
  • Description Logics are (in most cases decidable) fragments of First Order Logic
  • Deductive reasoning in OWL is possible
  • The tableau algorithm is one of the most common reasoning procedures for OWL

TA-Description Logic

Translate the following statement to Description Logic:

a) SubClassOf(Prof Union(intersectionOf(Person University member) intersectionOf(Person Not(Doktorand))))

b) Prove the following query using the tableau algorithm:

SubClassOf(Professor Person)

TA-Propositional Logic

Tableau Algorithm (Propositional Logic)

Example 1: prove the following statement: ((q ∧ r) → (¬q ∨ r))

(1) negation: ¬((q ∧ r) → (¬q ∨ r))

(2) (q ∧ r)

(3) ¬(¬q ∨ r) = q ∧ ¬r

(4) q

(5) r

(6) q

(7) ¬r

Lines (5) and (7) clash, so the tableau is closed and the statement is valid.

TA-Description Logic

Given the following knowledge base, prove the example query:

Knowledge base W: { ¬P ⊔ (E ⊓ U) ⊔ (E ⊓ ¬D), (P ⊓ ¬E)(a) }

Tableau:

(P ⊓ ¬E)(a)   (from knowledge base)

P(a)

¬E(a)

(¬P ⊔ (E ⊓ U) ⊔ (E ⊓ ¬D))(a)   (from knowledge base)

¬P(a)   |   ((E ⊓ U) ⊔ (E ⊓ ¬D))(a)

            (E ⊓ U)(a)   |   (E ⊓ ¬D)(a)

            E(a)         |   E(a)

            U(a)         |   ¬D(a)

Every branch contains a clash (¬P(a) with P(a), or E(a) with ¬E(a)), so the tableau is closed.

Required material

Learning Objectives

  • Rule languages in the Semantic Web
  • Relationship between OWL and rules

The limits of OWL

Description logic concepts are insufficient as a query language:
  •  “Which pairs of individuals have a common parent?” 
  •  “Which people live with one of their parents?” 
  •  “Which pairs of (direct or indirect) descendants are there?” 
Some relevant information cannot be represented in an OWL ontology:
  • \(\forall x . \forall y . \forall z . \, \mathsf{brother} (y, z) \wedge \mathsf{father} (x, y) \to \mathsf{uncle} ( x, z) \) (works in OWL 2, with tricks)
  • \(\forall x. \, \mathsf{love} (x, x) \to \mathsf{narcissist} (x) \) (works in OWL 2)
OWL unsuitable for programming:
  • OWL is decidable, so it cannot express everything that is programmable (cf. the halting problem).
  • OWL is not “processed”, it is not procedural: Certain (built-in) extensions are difficult to implement.
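To make the rule character of the uncle example concrete, here is a minimal sketch of evaluating that implication over a set of ground facts. The individuals `alice`, `bob` and `carl` and the function name are invented for illustration; the reading "father(x, y)" as "the father of x is y" follows the formula above.

```python
# Ground facts as (predicate, subject, object) tuples; individuals are made up.
facts = {
    ("father", "alice", "bob"),   # the father of alice is bob
    ("brother", "bob", "carl"),   # a brother of bob is carl
}

def apply_uncle_rule(facts):
    """brother(y, z) ∧ father(x, y) → uncle(x, z)"""
    return {
        ("uncle", x, z)
        for (p1, x, y) in facts if p1 == "father"
        for (p2, y2, z) in facts if p2 == "brother" and y2 == y
    }

derived = apply_uncle_rule(facts)
print(derived)
```

The rule joins two facts on the shared variable y, which is exactly the kind of pattern a single class expression cannot express directly.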

1/4: Logical rules

  • Implications in predicate logic
  • For example: \[F\to G \;\;\; (\equiv\;\neg F \vee G)\]
  • Logical extension of the knowledge base → static  
  • Open World
  • Declarative (descriptive)

2/4: Procedural Rules

  • e.g. Production Rules
  • "If X then Y else Z"
  • Machine-executable instructions  → dynamic
  • Operational (meaning = effect in execution)

3/4: Logic programming

  • e.g. Prolog, F-logic
  • man(X) <- person(X) AND NOT woman(X)
    
  • Approximation of logical semantics with operational aspects, built-ins possible
  • often closed world
  • “Semi-declarative” 

4/4: Inference rules of a calculus

  • e.g. rules for RDF semantics
  • rules not part of the knowledge base, “meta-rules” 
  • not a subject of this lecture

Which rule language to choose?

Rule languages are hardly compatible with each other!
→ choice of appropriate rule language is important

Possible criteria:
  • Clear specification of syntax and semantics?
  • Software tool support?
  • What expressivity do I need?
  • Complexity of the implementation? Performance?
  • Compatibility with existing languages such as OWL?
  • Declarative (description) or operational (programming) semantics?
  • ...

Summary of different rule language approaches

Logical rules (implications in predicate logic):
  • clearly defined, comprehensively researched, well-understood
  • highly compatible with OWL DL and RDF
  • undecidable unless restricted
Procedural rules (e.g. production rules):
  • many independent approaches, often only vaguely defined
  • Often used like programming languages, unclear relationship to OWL and RDF
  • efficient processing possible
Logic programming (e.g. Prolog, F-logic):
  • clearly defined, but many different approaches
  • partially compatible with OWL and RDF
  • Decidability and computational complexity depend strongly on the selected approach
Main topic of this lecture: predicate logic rules
(which are also the basis of logic programming)

Predicate logic as a rule language

  • Rules as implication formulas of predicate logic: \[\underbrace{A_1 \wedge A_2\wedge \ldots\wedge A_n}_{\textrm{Body}} \to \underbrace{H_{}}_{\mathrm{Head}}\] → Semantically equivalent to disjunction: \[ H\vee \neg A_1 \vee\neg A_2\vee \ldots\vee\neg A_n\]
  • Constants, variables, and function symbols allowed
  • Quantifiers for variables are often omitted:
    Understood as universally quantified variables (i.e. rule applies to all assignments)
  • Disjunction with several non-negated atoms
    → disjunctive rule: \[ \underbrace{A_1 \wedge A_2\wedge \ldots\wedge A_n}_{\textrm{Body}} \to \underbrace{H_1 \vee H_2 \vee \ldots\vee H_m}_{\mathrm{Head}}\]
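
The equivalence between the implication form and the disjunction form of a rule can be checked mechanically. The following is a minimal Python sketch, not part of the lecture material: the function names are chosen here for illustration, and the check simply enumerates all truth assignments for n body atoms and one head atom.

```python
from itertools import product


def implication(body, head):
    """Truth value of (A_1 and ... and A_n) -> H."""
    return (not all(body)) or head


def clause(body, head):
    """Truth value of H or not A_1 or ... or not A_n."""
    return head or any(not a for a in body)


def equivalent(n):
    """Compare both forms on every truth assignment for n body atoms."""
    return all(
        implication(values[:n], values[n]) == clause(values[:n], values[n])
        for values in product([False, True], repeat=n + 1)
    )


print(equivalent(3))  # True
```

The only assignment that falsifies either form is the one where all body atoms are true and the head is false.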

Types of rules

Types of “rules” of predicate logic:
  • Clause: disjunction of atomic propositions or negated atomic propositions
  • Horn clause: clause with at most one non-negated atom
  • Definite clause: clause with exactly one non-negated atom
  • Fact: clause of a single non-negated atom
Examples:
  • Clause: \[\mathsf{Person}(x) \;\to\;\mathsf{Woman}(x) \vee \mathsf{Man}(x)\]
  • Definite clause: \[\mathsf{Man}(x) \wedge \mathsf{hasChild}(x,y) \;\to\;\mathsf{Father}(x)\]
  • Function symbol: \[\mathsf{hasBrother}(\mathsf{mother}(x),y) \;\to\;\mathsf{hasUncle}(x,y)\]
  • Horn clause (integrity constraint): \[\mathsf{Man}(x) \wedge \mathsf{Woman}(x) \;\to\;\]
    Think “\( \neg (\mathsf{Man}(x) \wedge \mathsf{Woman}(x)) \)”
  • Fact: \[\mathsf{Woman}(\mathsf{gisela})\]
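
The distinctions above depend only on how many non-negated atoms a clause contains. As an illustrative Python sketch (the representation of a clause as a list of `(atom, is_positive)` pairs and the function name `classify` are assumptions made for this example):

```python
def classify(clause):
    """Return all rule types that apply to a clause.

    clause: list of (atom, is_positive) literals; body atoms of a rule
    appear negated, head atoms appear non-negated.
    """
    positives = sum(1 for _, is_positive in clause if is_positive)
    kinds = ["clause"]
    if positives <= 1:
        kinds.append("Horn clause")
    if positives == 1:
        kinds.append("definite clause")
        if len(clause) == 1:
            kinds.append("fact")
    if positives == 0:
        kinds.append("integrity constraint")
    return kinds


# Man(x) ∧ hasChild(x,y) → Father(x)
father = [("Man(x)", False), ("hasChild(x,y)", False), ("Father(x)", True)]
# Person(x) → Woman(x) ∨ Man(x)
person = [("Person(x)", False), ("Woman(x)", True), ("Man(x)", True)]

print(classify(father))  # ['clause', 'Horn clause', 'definite clause']
print(classify(person))  # ['clause']
```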

Datalog

Restriction to Horn rules without function symbols

→ Datalog rules

Datalog

  • logical rule language, originally basis for deductive databases
  • Knowledge bases (“Datalog Programs”) of Horn clauses without function symbols
  • decidable
  • efficient for large amounts of data, overall complexity same as OWL Lite profile of OWL 1 (EXPTIME)

Semantics of rules

Standard semantics of predicate logic!
  • well known and well understood semantics
  • compatible with other predicate logic approaches (e.g. description logic)

Semantics of Datalog

Semantics defined by using logical models:
  • Interpretation \(\mathcal{I} \) with domain \(\Delta_ {\mathcal{I}} \)
  • Evaluation of variables: variable assignment \(\mathcal{Z} \) (mapping variables to \(\Delta_ {\mathcal{I}} \))
  • Interpretation of terms and formulas under \(\mathcal {I} \) (and \(\mathcal {Z} \)):
    • Interpretation of a constant: \(a^{\mathcal{I},\mathcal{Z}} = a^{\mathcal{I}} \in \Delta_{\mathcal{I}} \)
    • Interpretation of a variable: \(x^{\mathcal{I},\mathcal{Z}} = \mathcal{Z}(x) \in \Delta_{\mathcal{I}} \)
    • Interpretation of an n-ary predicate: \(p^{\mathcal{I}} \subseteq \Delta_{\mathcal{I}}^n \)
    • \(\mathcal{I}, \mathcal{Z} \models p(t_1, \ldots, t_n) \) if and only if \((t_1^{\mathcal{I},\mathcal{Z}}, \ldots, t_n^{\mathcal{I},\mathcal{Z}}) \in p^{\mathcal{I}} \)
    • \(\mathcal{I} \models B \to H \) if and only if, for every variable assignment \(\mathcal{Z} \), either \(\mathcal{I}, \mathcal{Z} \models H \) or \(\mathcal{I}, \mathcal{Z} \not\models B \)
  • \(\mathcal{I} \) is a model of a rule set if and only if \(\mathcal{I} \models B \to H \) for all rules \(B \to H \) in this set
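
For a finite interpretation, the model condition can be tested by enumerating all variable assignments. The sketch below is an illustration under stated assumptions: variables are strings starting with `?`, constants denote themselves, and the names `holds` and `satisfies` are chosen for this example only.

```python
from itertools import product


def holds(atom, interp, assignment):
    """I, Z |= p(t_1, ..., t_n): the tuple of term values lies in p^I."""
    pred, terms = atom
    values = tuple(assignment.get(t, t) for t in terms)
    return values in interp[pred]


def satisfies(domain, interp, body, head):
    """I |= B -> H: every assignment Z makes H true or B false."""
    variables = sorted(
        {t for _, terms in body + [head] for t in terms if t.startswith("?")}
    )
    for values in product(domain, repeat=len(variables)):
        z = dict(zip(variables, values))
        if all(holds(a, interp, z) for a in body) and not holds(head, interp, z):
            return False  # counterexample assignment found
    return True


domain = ["anna", "bob", "carl"]
interp = {
    "Man": {("bob",)},
    "hasChild": {("bob", "carl")},
    "Father": {("bob",)},
}
# Man(x) ∧ hasChild(x,y) → Father(x)
body = [("Man", ("?x",)), ("hasChild", ("?x", "?y"))]
head = ("Father", ("?x",))
print(satisfies(domain, interp, body, head))  # True
```

Removing `("bob",)` from the extension of `Father` yields an interpretation that is no longer a model of the rule.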

Datalog in practice

Datalog in practice:
  • Various implementations available
  • Adjustments for the Semantic Web: data types from XML Schema, URIs/IRIs
Extensions of Datalog:
  • disjunctive Datalog allows disjunctions in heads
  • non-monotonic negation (no predicate logic semantics)
  • Integration of information from OWL ontologies (e.g. dl-programs, dlvhex)
    → loose coupling of OWL and Datalog (no common predicate logic semantics)

How can we combine OWL DL and Datalog?

SWRL – “Semantic Web Rule Language” 

  • Proposed extension for OWL with rules
  • Idea: Datalog rules with connection to OWL ontology
  • Symbols in rules can be OWL identifiers or new Datalog identifiers
  • Additional built-ins to process data types
  • several syntactic representations

Semantics of SWRL

OWL DL (Description Logic) and Datalog use the same interpretations:

  • OWL individuals are Datalog constants
  • OWL classes are unary Datalog predicates
  • OWL roles are binary Datalog predicates

→ An interpretation \( \mathcal{I} \) can simultaneously be a model for an OWL ontology and a set of Datalog rules

→ Inferences on OWL-Datalog combination possible

Example

Combined SWRL knowledge base (Datalog + description logic):

  1. Vegetarian(x) ∧ FishProduct(y) → doesNotLike(x,y)
  2. ordered(x,y) ∧ doesNotLike(x,y) → Unhappy(x)
  3. ordered(x,y) → Food(y)
  4. doesNotLike(x,z) ∧ Food(y) ∧ includes(y,z) → doesNotLike(x,y)
  5. → Vegetarian(Markus)
  6. Happy(x) ∧ Unhappy(x) →
  7. ∃ordered.ThaiCurry(Markus)
  8. ThaiCurry ⊑ ∃ includes.FishProduct 
We can conclude: Unhappy(Markus)
(Note: many of the above rules can actually be expressed as description logic axioms – not always intuitively, though.)
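
The conclusion can be reproduced with a naive forward-chaining sketch over rules 1 to 4. This is an illustration, not a SWRL reasoner: the existential axioms 7 and 8 are instantiated by hand with the hypothetical witnesses `curry1` and `fish1`, variables are strings starting with `?`, and all function names are chosen for this example.

```python
def match(atom, fact, binding):
    """Extend binding so that atom equals fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def bindings(body, facts, binding=None):
    """Yield every variable binding that turns all body atoms into facts."""
    binding = binding or {}
    if not body:
        yield binding
        return
    for fact in facts:
        extended = match(body[0], fact, binding)
        if extended is not None:
            yield from bindings(body[1:], facts, extended)


def forward_chain(rules, facts):
    """Apply all rules until no new fact is derived."""
    facts = set(facts)
    while True:
        new = {
            (head[0],) + tuple(b.get(t, t) for t in head[1:])
            for body, head in rules
            for b in bindings(body, facts)
        } - facts
        if not new:
            return facts
        facts |= new


rules = [
    ([("Vegetarian", "?x"), ("FishProduct", "?y")], ("doesNotLike", "?x", "?y")),
    ([("ordered", "?x", "?y"), ("doesNotLike", "?x", "?y")], ("Unhappy", "?x")),
    ([("ordered", "?x", "?y")], ("Food", "?y")),
    ([("doesNotLike", "?x", "?z"), ("Food", "?y"), ("includes", "?y", "?z")],
     ("doesNotLike", "?x", "?y")),
]
facts = {
    ("Vegetarian", "Markus"),
    # Hand-made witnesses for axioms 7 and 8 (assumption of this sketch):
    ("ordered", "Markus", "curry1"), ("ThaiCurry", "curry1"),
    ("includes", "curry1", "fish1"), ("FishProduct", "fish1"),
}
result = forward_chain(rules, facts)
print(("Unhappy", "Markus") in result)  # True
```

The derivation runs through doesNotLike(Markus, fish1), Food(curry1) and doesNotLike(Markus, curry1) before rule 2 fires.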

How hard is SWRL?

  • Reasoning in OWL 1 DL is NEXPTIME-complete.
  • Reasoning in OWL 2 DL is N2EXPTIME-complete.
  • Reasoning in Datalog is EXPTIME-complete.

→ How hard is logical reasoning in SWRL?

Logical reasoning in SWRL is undecidable
(For OWL 1 and thus also for OWL 2).

Undecidability of SWRL

SWRL is undecidable

There is no algorithm that can draw all logical conclusions from all possible SWRL knowledge bases, even if any (finite) amount of computing time and memory is available.

 In practice, however, the following are possible:

  1. Algorithms that draw all conclusions from some of the SWRL knowledge bases
  2. Algorithms that draw some of the conclusions from all SWRL knowledge bases

Both are trivially possible if the appropriate "part" is sufficiently small.

Description Logic Rules

Observation
Some SWRL rules can be expressed already in OWL 2 (i.e. the description logic \( \mathcal{SROIQ} \)).

  • Identification of these Description Logic Rules provides a decidable fragment of SWRL
  • Goal: Use “hidden” expressiveness of OWL 2
  • Implementation directly by OWL 2 tools

SROIQ (an extension of SHOIN)

Class expressions
  • Class names: A, B
  • Conjunction: C ⊓ D
  • Disjunction: C ⊔ D
  • Negation: ¬C
  • Existential role restriction: ∃R.C
  • Universal role restriction: ∀R.C
  • Self: ∃S.Self
  • At-least restriction: ≥ n S.C
  • At-most restriction: ≤ n S.C
  • Nominal: {a}
Roles
  • Role names: R, S, T
  • Simple roles: S, T
  • Inverse roles: R⁻
  • Universal role: U
TBox (class axioms)
  • Inclusion: C ⊑ D
  • Equivalence: C ≡ D
RBox (role axioms)
  • Inclusion: R1 ⊑ R2
  • General inclusion: R1(⁻) ◦ ... ◦ Rn(⁻) ⊑ R
  • Transitivity: Tra(R)
  • Symmetry: Sym(R)
  • Reflexivity: Ref(R)
  • Irreflexivity: Irr(S)
  • Disjointness: Dis(S, T)
ABox (facts)
  • Class membership: C(a)
  • Role relationship: R(a, b)
  • Negated role relationship: ¬S(a, b)
  • Equality: a ≈ b
  • Inequality: a ≉ b

Simple rules with SROIQ

All SROIQ axioms can be written as SWRL rules:
  • \( C \sqsubseteq D \) is C(x) → D(x) 
  • \( R \sqsubseteq S \) is R(x, y) → S(x, y) 
Some classes can be “dismantled” within rules:
  • Happy \(\sqcap\) Unhappy \(\sqsubseteq\) ⊥     corresponds to
    Happy(x) ∧ Unhappy(x) →
  • ∃placeOfResidence.∃liesIn.EUCountry \(\sqsubseteq\) EUCitizen      corresponds to
    placeOfResidence(x,y) ∧ liesIn(y,z) ∧ EUCountry(z) → EUCitizen(x)
SROIQ-role axioms provide additional rules:
  • hasMother ◦ hasBrother \(\sqsubseteq\) hasUncle 
    corresponds to
    hasMother(x,y) ∧ hasBrother(y,z) → hasUncle(x,z)

More rules

What about
doesNotLike(x,z) ∧ Food(y) ∧ includes(y,z) → doesNotLike(x,y)?

  • Rule head with two variables → not representable by subclass axiom
  • Rule body contains class expressions → not representable by subproperty axiom

Nevertheless, this rule can be represented in OWL 2!

More rules (II)

Simpler example: Man(x) ∧ hasChild(x,y) → fatherOf(x,y)
Idea
Replace Man(x) by role atom, so that the rule is representable as a general role inclusion with ◦.
Trick: with ∃R.Self we can convert classes into roles:
  • Auxiliary role RMan
  • Auxiliary axiom Man ≡ ∃RMan.Self
  • Intuition: “Men are the very things that have an RMan relationship with themselves.” 
With this auxiliary axiom the rule can be written as:
RMan ◦ hasChild \(\sqsubseteq \) fatherOf
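
To see why the role chain captures the rule, roles can be modelled as sets of pairs and ◦ as relational composition. The following sketch uses made-up individuals and interprets the auxiliary role as the smallest relation satisfying the auxiliary axiom (an assumption of this example; the axiom itself only fixes the loops).

```python
def compose(r, s):
    """Relational composition r ◦ s: pairs (x, z) with r(x, y) and s(y, z)."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


man = {"bob", "carl"}
has_child = {("bob", "dana"), ("anna", "dana"), ("carl", "eve")}

# Smallest R_Man with Man ≡ ∃R_Man.Self: one loop per man.
r_man = {(m, m) for m in man}

# Pairs forced into fatherOf by R_Man ◦ hasChild ⊑ fatherOf ...
via_role_chain = compose(r_man, has_child)
# ... and by the rule Man(x) ∧ hasChild(x,y) → fatherOf(x,y).
via_rule = {(x, y) for (x, y) in has_child if x in man}

print(via_role_chain == via_rule)  # True
```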

More rules (III)

Example:
doesNotLike(x,z) ∧ Food(y) ∧ includes(y,z)→doesNotLike(x,y)
becomes 

\[Food \equiv \exists R_{Food}.\mathsf{Self}\]

\[doesNotLike \circ includes^{-} \circ R_{Food} \sqsubseteq doesNotLike\]

More rules (IV)

Not so simple:
Vegetarian(x) ∧ FishProduct(y) → doesNotLike(x,y)
Idea
Connect disconnected parts of the rule body by the universal role U.
  • Auxiliary roles: RVegetarian and RFishProduct
  • Auxiliary axioms: Vegetarian ≡ ∃RVegetarian.Self and FishProduct ≡ ∃RFishProduct.Self
With these auxiliary axioms the rule can be written as:
RVegetarian ◦ U ◦ RFishProduct \(\sqsubseteq \) doesNotLike

The boundaries of Description Logic Rules

Not all SWRL rules can be represented as description logic axioms!

Example:
ordered(x,y)∧doesNotLike(x,y) → Unhappy(x)
cannot be represented in SROIQ.

Possible transformations in the rule body at a glance
  • Invert roles, e.g. includes(y,z) → includes⁻(z,y)
  • “Rolling up” side arms, e.g. 
    liesIn(y,z) ∧ EUCountry(z) → ∃liesIn.EUCountry(y)
  • Replace concepts through roles, e.g. Man(x) → RMan(x,x)
  • Convert chains into role inclusions (replace ∧ by ◦)

Description Logic Rules: Definition

Preparation: normalizing rules
  • For each occurrence of a constant a in the rule:
    Add in the body {a}(x) with a new variable x and replace the occurrence of a by x.
  • Replace each atom R(x, x) by ∃R.Self(x).
Dependency graph of a rule: undirected graph with
  • Nodes = variables of the rule
  • Edges = role atoms of the rule body (without direction!)
A SWRL rule is a Description Logic Rule if:
  1. all atoms use SROIQ concepts and roles,
  2. the dependency graph of the normalized rule has no cycles
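
Condition 2 can be checked with a union-find pass over the role atoms of the body. This is a sketch under assumptions: the rule is already normalized, role atoms are given as `(role, x, y)` triples, and two role atoms between the same pair of variables count as a cycle, which matches the example rule that is later stated not to be a DL rule.

```python
def has_no_cycles(role_atoms):
    """True if the undirected dependency graph of the rule body is acyclic."""
    parent = {}

    def find(v):
        # Representative of the connected component containing v.
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for _, x, y in role_atoms:
        root_x, root_y = find(x), find(y)
        if root_x == root_y:
            return False  # this edge closes a cycle
        parent[root_x] = root_y
    return True


# doesNotLike(x,z) ∧ Food(y) ∧ includes(y,z) → doesNotLike(x,y)
print(has_no_cycles([("doesNotLike", "x", "z"), ("includes", "y", "z")]))  # True
# ordered(x,y) ∧ doesNotLike(x,y) → Unhappy(x)
print(has_no_cycles([("ordered", "x", "y"), ("doesNotLike", "x", "y")]))  # False
```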

Example

DL Rules in the SWRL knowledge base of the earlier example: 
  1. Vegetarian(x) ∧ FishProduct(y) → doesNotLike(x,y)
  2. ordered(x,y) → Food(y)
  3. doesNotLike(x,z) ∧ Food(y) ∧ includes(y,z) → doesNotLike(x,y)
  4. → Vegetarian(Markus)
  5. Happy(x) ∧ Unhappy(x) →
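Rule (2) of the earlier example, ordered(x,y) ∧ doesNotLike(x,y) → Unhappy(x), is not a DL rule: its two role atoms connect x and y twice, so the dependency graph contains a cycle.

Note: After conversion to \( \mathcal{SROIQ} \), Description Logic Rules must still satisfy the restrictions on simple roles and regular RBoxes!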
 

Conversion of DL rules to SROIQ (I)

Input: A Description Logic Rule
  1. Normalizing the rule.
  2. For each pair of variables x and y:
    If x and y are not connected in the dependency graph, i.e. there is no path between x and y, add U(x,y) to the body.
  3. The rule head is now of the form D(z) or S(z,z').
    For each atom R(x,y) in the body:
    If the path in the dependency graph from z to y is shorter than that from z to x, replace R(x,y) with R⁻(y,x).
  4. If the body contains an atom R(x,y) such that y does not occur in any other binary atom of the rule:
    • If the body contains n unary atoms C1(y),...,Cn(y), define \(E := C_1 \sqcap \ldots \sqcap C_n \) and remove C1(y),...,Cn(y) from the body. Otherwise define \(E := \top \).
    • Replace R(x,y) by ∃R.E(x).
    Repeat step 4 as long as there are such atoms R(x,y).

Conversion of DL rules to SROIQ (II)

The rule can now be expressed in SROIQ:
  • If the rule head is unary, the rule has the form: 
    C1(x) ∧ ... ∧ Cn(x) → D(x) .
    Replace with \(C_1 \sqcap \ldots \sqcap C_n \sqsubseteq D\).
  • If the head is binary, then
    • For each unary atom C(z) in the body:
      Create a new axiom C ≡ ∃RC.Self (the role RC is new)
      and replace C(z) by RC(z,z) .
    • The rule now has the form
      R1(x,x2) ∧ ... ∧ Rn(xn,y) → S(x,y) .
      Replace with \(R_1 \circ \ldots \circ R_n \sqsubseteq S \).
This transformation of SWRL rules in a knowledge base does not change its satisfiability.

Rule Interchange Format (RIF)

  • adopted as a W3C standard on 22 June 2010 
  • Focus is on the exchange of rules, not on one format for all rule languages
  • A single language cannot meet the different requirements of the different paradigms of rule usage
  • RIF is therefore a family of languages, called dialects
  • RIF is uniform and extensible

RIF dialects

Focus is on two kinds of dialects:

  1. logic-based (e.g. predicate logic, logic programming)
  2. "Rules with actions" (e.g. production rules)

RIF provides framework for defining your own dialects

RIF is compatible with RDF and OWL:

  • can be combined semantically with OWL / RDF
  • RDF syntax for RIF is available

RIF documents

Document (title): description
  • RIF-BLD (The Basic Logic Dialect): definite Horn clauses, standard predicate logic semantics
  • RIF-PRD (The Production Rule Dialect): covers the wide range of production rule systems
  • RIF-Core (The Core Dialect): enables exchange between rule systems with logic rules and with production rules
  • RIF-FLD (The Framework for Logic Dialects): logical extensibility framework that minimizes the effort of defining new logic dialects
  • RIF-RDF+OWL (RDF and OWL Compatibility): combination of RIF with RDF or OWL
  • RIF-DTB (Datatypes and Built-Ins): data types, functions and predicates for RIF dialects
  • RIF + XML data: specifies how RIF can be combined with XML data sources (import, semantics)
  • RIF-OWLRL (OWL 2 RL in RIF): axiomatization of OWL 2 RL in RIF
  • RIF in RDF: reversible mapping between RIF and RDF
  • RIF-UCR (Use Cases and Requirements): collection of use cases
  • RIF-Test (Test Cases): conformance testing for RIF implementations

RIF Core

is the simplest RIF dialect

A core document consists of:

  • Directives such as imports and prefix declarations for URIs
  • a sequence of logical implications (rules)

RIF Core Example

Document(
  Prefix(cpt <http://example.com/concepts#>)
  Prefix(person <http://example.com/people#>)
  Prefix(isbn <http://.../isbn/>)
  Group(
    Forall ?Buyer ?Book ?Seller (
      cpt:buy(?Buyer ?Book ?Seller) :- cpt:sell(?Seller ?Book ?Buyer)
    )
    cpt:sell(person:John isbn:000651409X person:Mary)
  )
)
From this, the following relationship can be derived:
cpt:buy(person:Mary isbn:000651409X person:John)

Expressiveness of RIF Core

  • Datalog as a basis
  • contains intersection of RIF-BLD (Basic Logic Dialect) and RIF-PRD (Production Rule Dialect)
  • some extensions: Data Types (RIF-DTB), IRI
  • Forward-chaining is possible

Combination of RDF + RIF

Typical scenario:

  • the application data are available in RDF
  • the rules for the data are described by RIF
  • A RIF processor creates new relationships

RIF is compatible with RDF:

  • RDF triples are representable in RIF

Example in Turtle-based syntax

{
  ?x  rdf:type       p:Novel ;
      p:page_number  ?n ;
      p:price        [
          p:currency  :Euro ;
          rdf:value   ?z
      ] .
  ?n > "500"^^xsd:integer .
  ?z < "20.0"^^xsd:double .
}
=>
{ <me>  p:buys  ?x }

The same with RIF Presentation Syntax

Document(
  Prefix ...
  Group(
    Forall ?x ?n ?z (
      <me>[p:buys->?x] :- And(
        ?x[rdf:type->p:Novel]
        ?x[p:page_number->?n p:price->_abc]
        _abc[p:currency->:Euro rdf:value->?z]
        External(pred:numeric-greater-than(?n "500"^^xsd:integer))
        External(pred:numeric-less-than(?z "20.0"^^xsd:double))
      )
    )
  )
)

Discover new relationships ...

Forall ?x ?n ?z (
  <me>[p:buys->?x] :- And(
    ?x[rdf:type->p:Novel]
    ?x[p:page_number->?n p:price->_abc]
    _abc[p:currency->:Euro rdf:value->?z]
    External(pred:numeric-greater-than(?n "500"^^xsd:integer))
    External(pred:numeric-less-than(?z "20.0"^^xsd:double))
  )
)
in combination with:
<http://.../isbn/...>  a              p:Novel ;
                       p:page_number  "600"^^xsd:integer ;
                       p:price        [
                           rdf:value   "15.0"^^xsd:double ;
                           p:currency  :Euro
                       ] .
results in:
<me>  p:buys  <http://...isbn/...> .

What's with OWL 2 RL?

OWL 2 RL stands for OWL Rule Language

OWL 2 RL is the intersection of RIF Core and OWL

  • Inferences in OWL RL can be expressed with RIF rules
  • RIF Core engines can behave like OWL RL engines
    • the document "OWL 2 RL in RIF" (RIF-OWLRL) describes how OWL 2 RL can be processed directly in RIF

Outlook: combination of RIF and SPARQL 1.1

Exercise

Convert the following rule into SROIQ axioms:
worksIn(w,x) ∧ employment(w,PERMANENT) ∧ Uni(x) ∧ PhDStudent(y) ∧ supervisedBy(y,w) → professorOf(w,y)
Next step:
Normalizing the rule.

Exercise

Convert the following rule into SROIQ axioms:
worksIn(w,x) ∧ employment(w,z) ∧ {PERMANENT}(z) ∧ Uni(x) ∧ PhDStudent(y) ∧ supervisedBy(y,w) → professorOf(w,y)
Next step:
For each pair of variables x and y: If x and y are not connected in the dependency graph, i.e. there is no path between x and y, then add in the body U(x,y).

Exercise

Convert the following rule into SROIQ axioms:
worksIn(w,x) ∧ employment(w,z) ∧ {PERMANENT}(z) ∧ Uni(x) ∧ PhDStudent(y) ∧ supervisedBy(y,w) → professorOf(w,y)
Next step:
The rule head is now of the form D(z) or S(z,z'). For each atom R(x,y) in the body: if the path in the dependency graph from z to y is shorter than that from z to x, replace R(x,y) with R⁻(y,x).

Exercise

Convert the following rule into SROIQ axioms:
worksIn(w,x) ∧ employment(w,z) ∧ {PERMANENT}(z) ∧ Uni(x) ∧ PhDStudent(y) ∧ supervisedBy⁻(w,y) → professorOf(w,y)
Next step:
If the body contains an atom R(x,y) such that y occurs in no other binary atom of the rule:
  • If the body contains n unary atoms C1(y),...,Cn(y), define \( E := C_1 \sqcap \ldots \sqcap C_n \) and remove C1(y),...,Cn(y) from the body. Otherwise define \( E := \top \).
  • Replace R(x,y) by ∃R.E(x).
Repeat step 4 as long as there are such atoms R(x,y).

Exercise

Convert the following rule into SROIQ axioms:
∃worksIn.Uni(w) ∧ ∃employment.{PERMANENT}(w) ∧ PhDStudent(y) ∧ supervisedBy⁻(w,y) → professorOf(w,y)
Next step:
For each unary atom C(z) in the body:
Create a new axiom C ≡ ∃RC.Self (the role RC is new) and replace C(z) by RC(z,z) .

Exercise

Convert the following rule into SROIQ axioms:
∃R1.Self ≡ ∃worksIn.Uni
∃R2.Self ≡ ∃employment.{PERMANENT}
∃R3.Self ≡ PhDStudent

R1(w,w) ∧ R2(w,w) ∧ R3(y,y) ∧ supervisedBy⁻(w,y) → professorOf(w,y)
Next step:
The rule now has the form R1(x,x2) ∧ ... ∧ Rn(xn,y) → S(x,y) .
Replace the rule with \(R_1 \circ \ldots \circ R_n \sqsubseteq S \).

Exercise: Solution

\[ \exists R_1.Self \equiv \exists worksIn.Uni \]\[ \exists R_2.Self \equiv \exists employment . \{ PERMANENT \} \]\[ \exists R_3.Self \equiv PhDStudent \]
\[ R_1 \circ R_2 \circ supervisedBy^{-} \circ R_3 \sqsubseteq professorOf \]

Summary

Predicate logic rule extensions for OWL DL

  • Datalog as a well-known formalism 
  • Combination with OWL possible: SWRL
  • Semantic description by logical extension of OWL interpretation
  • SWRL is undecidable

Description Logic Rules

  • SWRL fragment that is expressible in OWL 2
  • indirect support through all OWL 2 tools
  • definition and algorithm based on dependency graph

RIF (Rule Interchange Format)

  • W3C standard for exchanging rules
  • extensible family of languages

Also relevant:

  • SPARQL 1.1 entailment regime
  • conjunctive queries for OWL DL
  • DL-safe rules (variables can take only constants as values)

Mini Project

Describe with a Horn clause that authors of a joint article are co-authors. Use only the binary predicates coauthor and author. Is it a Description Logic Rule (please explain)? If yes, specify the Description Logic Rule.

Literature

Linked Data Stack - SPARQL

[Figure: Semantic Web layer stack with the layers User Interface & Applications; Trust; Crypto; Proof; Unifying Logic; Query: SPARQL; Ontology: OWL; Rules: RIF; RDF-Schema; Data Interchange: RDF; XML; URI; Unicode]

Outline

  • About SPARQL
  • Basic SPARQL
    • Presentation
    • Hands-on
  • SPARQL in real-life
    • Presentation
    • Hands-on
  • Advanced SPARQL
    • Presentation
    • Hands-on 

What is SPARQL?

SPARQL stands for “SPARQL Protocol and RDF Query Language”.

In addition to the language, W3C has also defined:

  • The SPARQL Protocol for RDF specification: it defines the remote protocol for issuing SPARQL queries and receiving the results.
  • The SPARQL Query Results XML Format specification: it defines an XML document format for representing the results of SPARQL queries.

Query Languages for RDF and RDFS

There have been many proposals for RDF and RDFS query languages:

  • RDQL (http://www.w3.org/Submission/2004/SUBM-RDQL-20040109/)
  • ICS-FORTH RQL (http://139.91.183.30:9090/RDF/RQL/) and SeRQL (http://www.openrdf.org/doc/sesame/users/ch06.html)
  • SPARQL (http://www.w3.org/TR/rdf-sparql-query/)

In this course we will only cover SPARQL, which is the current W3C recommendation for querying RDF data.

SPARQL 1.1

Basic SPARQL Structures - Outline

  • A bit of RDF and Semantic Web
  • First glance at triple patterns
  • Components of a SPARQL query
    • Graph patterns
    • Types of queries
    • Modifiers

Triples

Triples are the statements about things (resources), using URIs and Literal values


Triples

Graph

Graphs with URIs

Prefixes

Vocabularies

  • Share concept of a domain
  • Utilize URIs as unique identifier
  • Define Properties and Classes, and more ....

Well known vocabularies

rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

rdfs: <http://www.w3.org/2000/01/rdf-schema#>

foaf: <http://xmlns.com/foaf/0.1/>

dbpedia: <http://dbpedia.org/resource/>

Triple Stores and SPARQL endpoints

  • A SPARQL endpoint exposes one or more graphs
  • It is accessed over HTTP
  • It expects a parameter "query" containing the encoded query, sent with either GET or POST
  • There is no required relation between graph name and endpoint name, but keeping them related is good practice
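
As a small illustration of the "query" parameter, the following Python sketch builds the GET URL for a query using only the standard library. The endpoint URL is hypothetical, the function name is chosen for this example, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint URL


def build_get_url(endpoint, query):
    """Return the GET URL that carries the encoded query in 'query'."""
    return endpoint + "?" + urlencode({"query": query})


query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
url = build_get_url(ENDPOINT, query)
print(url)

# Decoding the parameter again gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == query)  # True
```

For long queries, POST is usually the safer choice because URLs have practical length limits.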
     

A simple Query

A slightly more complex Query


SELECT ?friend ?friendname WHERE { jwebsp:John foaf:knows ?friend. ?friend foaf:firstname ?friendname }


Structure of a SPARQL query

# prefix declarations
PREFIX ex: <http://example.com/resources/>
....
# query type            # projection                 # dataset definition
SELECT                        ?x ?y                             FROM ...

# graph pattern
WHERE {
    ?x a ?y
}

# query modifiers
ORDER BY ?y

Prefixes

Syntactical sugar to keep queries readable

Examples:

PREFIX  :          <http://example.com/base/>

PREFIX  foaf:     <http://xmlns.com/foaf/0.1/>

<http://xmlns.com/foaf/0.1/knows> == foaf:knows

<http://example.com/base/Tim> == :Tim

Query Types

SELECT 

  • returns a result table

ASK 

  • returns (boolean) true, if the pattern can be matched

CONSTRUCT

  • creates triples using templates

DESCRIBE

  • returns descriptions of resources

From clause

 Specifies which graphs should be considered by the endpoint.

  • if omitted, the so called default graph is used.
  • if specified, the query is evaluated using all specified graphs.
  • if specified as named graph, the named graphs can be used in parts of the query.

Graphs can be dereferenced by the SPARQL endpoint.

Solution modifiers

Change the result of a query 

LIMIT and OFFSET slice the result set, which is useful for pagination

  example: SELECT * WHERE {.....} LIMIT 10 

  --> display only 10 results

ORDER BY sorts the result set

 example: SELECT * WHERE {.....} ORDER BY ASC(...) LIMIT 10 

  --> display the first 10 of the sorted result set
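
The effect of the modifiers can be mimicked on a list of result rows: sort first, then skip OFFSET rows, then keep LIMIT rows. This Python sketch only illustrates the order of operations; the function name and the sample rows are made up.

```python
def apply_modifiers(rows, order_by=None, limit=None, offset=0):
    """Mimic ORDER BY (ascending), then OFFSET, then LIMIT."""
    if order_by is not None:
        rows = sorted(rows, key=lambda row: row[order_by])
    end = None if limit is None else offset + limit
    return rows[offset:end]


rows = [{"name": n} for n in ["Tim", "John", "Jane", "Ann", "Bob"]]

page_1 = apply_modifiers(rows, order_by="name", limit=2)
page_2 = apply_modifiers(rows, order_by="name", limit=2, offset=2)
print([r["name"] for r in page_1])  # ['Ann', 'Bob']
print([r["name"] for r in page_2])  # ['Jane', 'John']
```

Without ORDER BY the order of results is not guaranteed, so pagination with LIMIT and OFFSET alone may return overlapping or missing rows.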

Where clause

  • contains the graph patterns
  • conjunctive
  • variables are bound to the same values

Triple patterns

  • General form: a triple (s p o)
  • Variables may occur in any position
  • Variables are bound by the SPARQL endpoint

Triple patterns - Example

:John foaf:knows :Tim .
:John foaf:name "John" .
:Tim foaf:knows :John .

:Tim foaf:name "Tim" .

SELECT ?name WHERE {:John foaf:name ?name}
-->    "John"

SELECT ?friend WHERE {:John foaf:knows ?friend}
-->     :Tim

SELECT ?friend ?name WHERE {:John foaf:knows ?friend. :John foaf:name ?name}
-->     :Tim "John"

SELECT ?friendsname WHERE {:John foaf:knows ?friend. ?friend foaf:name ?friendsname}
-->    "Tim"
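
How an endpoint binds variables can be sketched with a few lines of Python over the same four triples. This is a toy matcher for conjunctions of triple patterns, not how a real triple store works; terms are plain strings, and the names `match_pattern` and `select` are chosen for this example.

```python
DATA = {
    (":John", "foaf:knows", ":Tim"),
    (":John", "foaf:name", "John"),
    (":Tim", "foaf:knows", ":John"),
    (":Tim", "foaf:name", "Tim"),
}


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def select(variables, patterns, data=DATA):
    """Evaluate a conjunction of triple patterns and project variables."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in data:
                result = match_pattern(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        solutions = extended
    return sorted(tuple(s[v] for v in variables) for s in solutions)


print(select(["?friend"], [(":John", "foaf:knows", "?friend")]))
# [(':Tim',)]
print(select(["?friendsname"], [(":John", "foaf:knows", "?friend"),
                                ("?friend", "foaf:name", "?friendsname")]))
# [('Tim',)]
```

A variable shared between two patterns, such as ?friend above, must take the same value in both, which is what joins the patterns.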


Triple patterns - Cartesian product

:John foaf:knows :Tim .
:John foaf:name "John" .
:Tim foaf:knows :John .
:Tim foaf:name "Tim" .

SELECT ?person ?friendsname WHERE {?person foaf:knows ?friend. ?somebody foaf:name ?friendsname}


:John "John"
:John "Tim"
:Tim "John"
:Tim "Tim"

Matching Resources

Match character by character

either with prefix or full <URI>

  • foaf:name   == <http://xmlns.com/foaf/0.1/name>  

percent encoding of reserved characters (like space)

  • myns:John%20Doe != myns:John Doe              | Error!

case sensitive

  • foaf:name   != <http://xmlns.com/foaf/0.1/Name>  

Matching Literals

Literals need to match for equality character-by-character

  • can have datatype: xsd:int, xsd:date
    • SPARQL engine may know interpretation of datatype 
    • for equality, it needs to match exactly
  • can have language tag


Filter

  • operate on graph patterns
  • testing values
  • most prominently: restrict Literal values 
    • string comparison
    • regular expressions
    • numeric comparators
  • type/language checks
  • evaluate in the end either to true, false or type error

Filter Overview

  • Logical: !, &&, || 
  • Math: +, -, *, /  
  • Comparison: =, !=, >, <, ...  
  • SPARQL tests: isURI, isBlank, isLiteral, bound 
  • SPARQL accessors: str, lang, datatype
  • Other: sameTerm, langMatches, regex
  • Vendor specific: prefixed like bif:contains

String Filtering

str() just the literal value, without datatype

regex() full regular expression

bif:contains string search using special index

String Filtering Example

:John :age 32 .
:John foaf:name "John"@en .
:Tim :age 20.

:Tim foaf:name "Tim"^^xsd:string .


SELECT ?friend {?friend foaf:name "Tim".}

-->    empty

SELECT ?friend {?friend foaf:name ?name.
FILTER (str(?name) = "Tim")}

-->    :Tim

SELECT ?friend {?friend foaf:name ?name. ?name bif:contains "im"}

-->    :Tim
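
The difference between matching a literal exactly and comparing its str() value can be modelled by treating a literal as a triple of lexical form, datatype and language tag. The Python sketch below follows the behaviour shown on this slide; the `Literal` type and the function names are assumptions of this example. Engines that follow RDF 1.1 treat a plain "Tim" as an xsd:string literal and may therefore return :Tim for the first query as well.

```python
from collections import namedtuple

# A literal is its lexical form plus an optional datatype or language tag.
Literal = namedtuple("Literal", "lexical datatype lang", defaults=(None, None))

names = {
    ":John": Literal("John", lang="en"),
    ":Tim": Literal("Tim", datatype="xsd:string"),
}


def match_exact(wanted):
    """Subjects whose name literal equals the wanted literal term by term."""
    return sorted(s for s, lit in names.items() if lit == wanted)


def match_str(text):
    """Subjects whose name has the given lexical form, like str(?name)."""
    return sorted(s for s, lit in names.items() if lit.lexical == text)


print(match_exact(Literal("Tim")))  # []
print(match_str("Tim"))             # [':Tim']
```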

Language and Datatype Filtering

lang(?x)  accessor to the language of a literal

langMatches(lang(?x),"en") evaluates whether a language tag matches another language tag

datatype(?x) accesses the datatype of the literal ?x

Numeric Filtering

:John :age 32 .
:John foaf:name "John"@en .
:Tim :age 20.

:Tim foaf:name "Tim"^^xsd:string .

SELECT ?friend WHERE {?friend :age ?age
FILTER (?age>25)}

-->     :John

Logical Operators

:John :age 32 .
:John foaf:name "John"@en .
:Tim :age 20 .
:Tim foaf:name "Tim"^^xsd:string .
SELECT ?friend {?friend foaf:name ?name. ?friend :age ?age.
FILTER (str(?name) = "Tim" && ?age>25)}
-->    empty
SELECT ?friend {?friend foaf:name ?name. ?friend :age ?age.
FILTER (str(?name) = "Tim" || ?age>25)}
-->    :Tim
-->    :John

Optional values

  • Similar to left join in SQL
  • Allows querying for incomplete data
  • “Optional” takes a full graph pattern
  • Syntax {pattern1} OPTIONAL {optpattern}

Optional Example

:John foaf:knows :Tim .
:John foaf:name "John" .
:John foaf:phone "+123456" .
:Tim foaf:knows :John .

:Tim foaf:name "Tim" .

SELECT ?name ?phone
                        {?person foaf:name ?name. 
                         ?person foaf:phone ?phone}

--> "John" "+123456"

This is a bit unsatisfying: Tim has no phone number, so he is missing from the result.

Optional Example

:John foaf:knows :Tim .
:John foaf:name "John" .
:John foaf:phone "+123456" .
:Tim foaf:knows :John .

:Tim foaf:name "Tim" .

SELECT ?name ?phone {?person foaf:name ?name.
                   OPTIONAL {?person foaf:phone ?phone}}
--> "John" "+123456"
--> "Tim"
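
The left-join behaviour of OPTIONAL can be illustrated with two Python dictionaries holding the name and phone data from the example. This is only an analogy, with `None` standing for an unbound variable; the function names are chosen for this sketch.

```python
names = {":John": "John", ":Tim": "Tim"}
phones = {":John": "+123456"}


def inner_join():
    """Both patterns required: persons without a phone are dropped."""
    return [(names[p], phones[p]) for p in sorted(names) if p in phones]


def left_join():
    """Phone pattern optional: ?phone stays unbound (None) if missing."""
    return [(names[p], phones.get(p)) for p in sorted(names)]


print(inner_join())  # [('John', '+123456')]
print(left_join())   # [('John', '+123456'), ('Tim', None)]
```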


Union

Syntax: {graph pattern} UNION {graph pattern}

Allows querying (partly) differing data structures

Union Example

:John rdf:type foaf:Person .
:John foaf:name "John" .
:Tim rdf:type foaf:Person .
:Tim foaf:name "Tim" .
:Jane rdf:type foaf:Person .

:Jane rdfs:label "Jane" .

SELECT ?name WHERE {?person a foaf:Person. ?person foaf:name ?name}

--> "John"
--> "Tim"

SELECT ?name WHERE {?person a foaf:Person.
           {?person foaf:name ?name} UNION{?person rdfs:label ?name}}

--> "John"
--> "Tim"
--> "Jane"

SELECT ?name WHERE {
           {?person foaf:name ?name. ?person a foaf:Person} UNION{?person rdfs:label ?name. ?person a foaf:Person. }}


Projection

SELECT * WHERE {.....}

  --> all variables mentioned in the graph patterns

SELECT ?s ?o WHERE {?s ?p ?o} 

  --> only the variables specified, in this case ?s and ?o

SELECT DISTINCT ........

  --> eliminates duplicates in the result

Count

a simple aggregate function

counts how often a variable is bound.

Example:

:John foaf:knows :Tim .
:John foaf:name "John" .
:Tim foaf:knows :John .

:Tim foaf:name "Tim" .

SELECT (COUNT(?person) AS ?count) WHERE {?person foaf:name ?name}

--> 2

SPARQL in Real-Life - Outline

  • We use the previously acquired knowledge for
    • Exploring unknown data structures and vocabularies
    • Querying inconsistent data structures


Some public SPARQL endpoints

SPARQLer: general-purpose query endpoint for Web-accessible data

DBpedia: extensive RDF data from Wikipedia

DBLP: bibliographic data from computer science journals and conferences

LMDB: data from MDB - Movies data base (without html form)

World Factbook: country statistics from the CIA World factbook

About DBpedia

  • Crystallization point of the Semantic Web
  • Single most important data source
  • Community effort
  • Extract from the semi-structured information in Wikipedia
  • Non-curated content



Know your limits!

The DBpedia endpoint is popular and heavily used

Always add a LIMIT statement, when constructing queries

Vocabulary Exploration

Exploration by examining instance data

  • Find descriptive information about the dataset
  • Use tools
  • Analyze the query dump
  • Dereference URI
  • Queries

Descriptive Information

Most datasets have publications describing them

Find papers about them using scholar.google.com


Tools: Relationship finder

http://www.visualdataweb.org/relfinder.php

Dereference URIs

 The Linked Data principles allow dereferencing URIs to get descriptions

Instance data on Leipzig

--> http://dbpedia.org/resource/Leipzig

Vocabulary information about foaf:name

--> http://xmlns.com/foaf/0.1/name


Querying

DESCRIBE <http://dbpedia.org/resource/Leipzig>

SELECT ?p ?o WHERE {<http://dbpedia.org/resource/Leipzig> ?p ?o}


?p queries

Query resources with a variable in the predicate position of a triple pattern

?p queries - Example

:John foaf:name "John".
:John rdfs:label "This is John".
:John foaf:phone "+12312".


SELECT ?p           ?o            WHERE {:John ?p ?o}

-->     foaf:name     "John"
-->     rdfs:label      "This is John"
-->     foaf:phone    "+12312"


Querying for Classes

Vocabularies define classes

  • foaf:Person
  • foaf:Document

rdf:type/a associates an instance with a class

  • :John a foaf:Person == :John rdf:type foaf:Person


Querying for Classes - Example

:John a foaf:Person .
:Pluto a animals:Dog .

SELECT ?person WHERE {?person a foaf:Person}

-->            :John

SELECT ?class WHERE {?instance a ?class}

-->          foaf:Person
-->          animals:Dog


Types

Get all the possible types of concepts in DBpedia

Types A

SELECT distinct ?type

WHERE {
   ?e a ?type
}

Properties list

Get all the properties used with instances of the Actor class. Also show their labels.

Properties list A

SELECT distinct ?p ?title 

WHERE {
   ?p rdfs:label ?title.
   ?e a <http://dbpedia.org/ontology/Actor>.
   ?e ?p ?v
}

Working with DBpedia page

 

Look through the DBpedia page for Ivan the Terrible. Which properties might you use to get the full list of Russian leaders?

Check your suggestions using the DBpedia endpoint.

Compare the number of results for different queries using the COUNT aggregate function.

Working with DBpedia page A

SELECT ?e
WHERE {
   ?e dcterms:subject category:Russian_leaders
}

SELECT ?e
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers
}

...

SELECT count(?e)

WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers .
}

...

Multiple patterns

 Change the previous query to also show the real name of the leader.

Multiple patterns A

SELECT ?e ?name
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers  .
   ?e dbpprop:name ?name

}   

Better version:

SELECT ?e ?name
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers  .
   ?e rdfs:label ?name
}

LIMIT

Show only the first 20 results. Then show the next 20.

Show 20 results, skipping the first 10.

LIMIT A

SELECT ?e ?name

WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers  .
   ?e rdfs:label ?name
}

LIMIT 20
OFFSET 10
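Paging through results is just a matter of varying OFFSET in steps of LIMIT. The following Python sketch builds the query text for a given page; the helper name `page_query` and the 0-based page numbering are assumptions made for this illustration. Note that without an ORDER BY clause the result order is not guaranteed, so pages may overlap between requests.

```python
BASE_QUERY = """SELECT ?e ?name
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers .
   ?e rdfs:label ?name
}"""


def page_query(query, page, page_size=20):
    """Append LIMIT/OFFSET so that the given 0-based page is returned."""
    return f"{query}\nLIMIT {page_size}\nOFFSET {page * page_size}"


first_page = page_query(BASE_QUERY, 0)   # results 1-20
second_page = page_query(BASE_QUERY, 1)  # results 21-40
print(second_page)
```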

FILTER

Filter the list and show only the results for Ivan_the_Terrible

FILTER A

SELECT ?e ?name
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers  .
   ?e rdfs:label ?name .

FILTER (?e = <http://dbpedia.org/resource/Ivan_the_Terrible>)
}

String Matching

 Show the list of all Russian leaders with the name "Ivan"

String matching A

SELECT ?e ?name
WHERE{
   ?e dbpprop:title dbpedia:List_of_Russian_rulers  .
   ?e rdfs:label ?name .

FILTER regex(?name, "ivan", "i")
}

Langmatching

Get a list of Russian leaders showing only Russian labels for the name.

Langmatching A

SELECT ?e ?name
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers  .
   ?e rdfs:label ?name .

FILTER (langMatches(lang(?name), "RU"))
}

Choosing properties to show

Rewrite the previous query to show the entry, the name, the name of predecessor and the name of successor. 

Choosing properties to show A

SELECT ?e ?name ?predecessor_name ?successor_name
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers .
   ?e rdfs:label ?name .
   ?e dbpedia-owl:successor ?successor .
   ?successor rdfs:label ?successor_name .
   ?e dbpprop:predecessor ?predecessor .
   ?predecessor rdfs:label ?predecessor_name .
FILTER (langMatches(lang(?name), "EN") && langMatches(lang(?successor_name), "EN") && langMatches(lang(?predecessor_name), "EN"))
}

More practice

 Find the real name of the Russian leader who was on the throne right before Catherine I ("Catherine I of Russia"@en).

Can you find other ways to do the same task?

More practice A

SELECT ?name
WHERE{
   ?e dbpprop:title dbpedia:List_of_Russian_rulers .
   ?e rdfs:label ?name .
   ?e dbpedia-owl:successor ?successor  .
   ?successor rdfs:label "Catherine I of Russia"@en

}

OR

SELECT (?name AS ?leader)
WHERE {
   ?e dbpprop:title dbpedia:List_of_Russian_rulers .
   ?e rdfs:label ?name .
   ?e dbpedia-owl:successor ?successor .
   ?successor  rdfs:label ?successor_name .
   FILTER (?successor_name = "Catherine I of Russia"@en)
}

OPTIONALs

Look at the page: http://dbpedia.org/page/Dmitry_of_Suzdal

Why is this leader not in the results of the previous queries?

Fix the problem.

OPTIONALs A

SELECT ?e ?name ?predecessor_name ?successor_name
WHERE {?e dbpprop:title dbpedia:List_of_Russian_rulers .
   ?e rdfs:label ?name .
   FILTER (langMatches(lang(?name), "EN")) .
   OPTIONAL {?e dbpedia-owl:successor ?successor .
     ?successor rdfs:label ?successor_name .
     FILTER (langMatches(lang(?successor_name), "EN"))  .
   }  .
   OPTIONAL {?e dbpprop:predecessor ?predecessor .
    ?predecessor rdfs:label ?predecessor_name .
    FILTER (langMatches(lang(?predecessor_name), "EN")) .
   }
}

UNIONs

Look at the http://dbpedia.org/page/Dmitry_of_Suzdal page more carefully. What can you say about successor and predecessor of the leader?

Fix the problem.

UNIONs A

SELECT ?e ?name ?predecessor_name ?successor_name
WHERE {?e dbpprop:title dbpedia:List_of_Russian_rulers .
   ?e rdfs:label ?name .
   FILTER (langMatches(lang(?name), "EN")) .
   OPTIONAL {{?e dbpedia-owl:successor ?successor} UNION  { ?e dbpprop:after ?successor } .
     ?successor rdfs:label ?successor_name .
     FILTER (langMatches(lang(?successor_name), "EN"))  .
   }  .
   OPTIONAL {{?e dbpprop:predecessor ?predecessor} UNION  { ?e dbpprop:before ?predecessor } .
    ?predecessor rdfs:label ?predecessor_name .
    FILTER (langMatches(lang(?predecessor_name), "EN")) .
   }
}

Final task

Show the list of actors who played together with Julia Roberts. For each result, also show the name of the movie and the director. Order the results by director and then by movie.

Final task A


SELECT ?director_name ?movie_name ?actor_name
WHERE {
   ?movie dbpedia-owl:starring <http://dbpedia.org/resource/Julia_Roberts> .
   ?movie dbpedia-owl:starring ?actor .
   ?movie rdfs:label ?movie_name .
   ?actor rdfs:label ?actor_name .
   ?movie dbpedia-owl:director ?director .
   ?director rdfs:label ?director_name .
   FILTER (langMatches(lang(?movie_name), "EN") && langMatches(lang(?actor_name), "EN") && langMatches(lang(?director_name), "EN")) .
}
ORDER BY ?director ?movie

Aggregate Functions

Aggregate functions similar to those in SQL were introduced with SPARQL 1.1.

The most important ones are min, max, avg, sum and count.

GROUP BY groups the results accordingly; it is necessary when non-aggregated variables are projected alongside an aggregate.


Aggregate Functions - Example

:John :age 32 .
:John :gender :male .
:Tim :age 20.
:Tim :gender :male.
:Jane :gender :female.

:Jane :age 23.

SELECT avg(?age) WHERE {?person :age ?age}

-->    25

SELECT ?gender min(?age)  WHERE {?person :age ?age. ?person :gender ?gender} GROUP BY ?gender

-->    :male  20
-->    :female 23
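The two results above can be checked by hand. The following Python sketch recomputes them over the same six triples, stored here as two plain dictionaries (a representation chosen only for this illustration); it mimics what avg and min with GROUP BY do, not how an engine implements them.

```python
from collections import defaultdict

# The example data, split by property.
AGE = {":John": 32, ":Tim": 20, ":Jane": 23}
GENDER = {":John": ":male", ":Tim": ":male", ":Jane": ":female"}

# SELECT avg(?age) WHERE {?person :age ?age}
average_age = sum(AGE.values()) / len(AGE)

# SELECT ?gender min(?age) WHERE {...} GROUP BY ?gender
groups = defaultdict(list)
for person, age in AGE.items():
    groups[GENDER[person]].append(age)  # one bucket per ?gender value
min_age_by_gender = {gender: min(ages) for gender, ages in groups.items()}

print(average_age)
print(min_age_by_gender)
```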

Other query types

CONSTRUCT

--> creates a graph by binding variables in a template

ASK

--> returns a boolean value indicating whether the pattern could be matched

DESCRIBE

--> gives a short description about some resources


Other Query Types Examples

:John foaf:knows :Tim .
:John foaf:name "John" .
:John foaf:phone "+123456" .
:Tim foaf:knows :John .
:Tim foaf:name "Tim" .

CONSTRUCT {?person foaf:name ?name. ?person foaf:phone ?phone}
WHERE     {?person foaf:name ?name. ?person foaf:phone ?phone}


--> :John foaf:name "John" .
--> :John foaf:phone "+123456" .


Other Query Types Examples

:John foaf:knows :Tim .
:John foaf:name "John" .
:John foaf:phone "+123456" .
:Tim foaf:knows :John .
:Tim foaf:name "Tim" .

DESCRIBE ?person WHERE {?person foaf:name ?name. ?person foaf:phone ?phone}
     
--> :John foaf:name "John" . 
--> :John foaf:phone "+123456" .


Other Query Types Examples

:John foaf:knows :Tim .
:John foaf:name "John" .
:John foaf:phone "+123456" .
:Tim foaf:knows :John .
:Tim foaf:name "Tim" .

ASK         {?person foaf:name ?name. ?person foaf:phone ?phone}
-->true


Named Graphs

Allow more control over which graph a triple comes from

SELECT *
FROM NAMED <http://mygraph.example/>
WHERE{ GRAPH ?g {?s ?p ?o}}

--> <http://mygraph.example/> <s> <p> <o>

Named Graphs - Example

SELECT ?g ?o ?p2 ?o2
FROM <http://www.w3.org/People/Berners-Lee/card.rdf>
FROM NAMED <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf>
{ ?s ?p ?o.
GRAPH ?g {?o ?p2 ?o2}  }

--> A huge list of triples

Subquery with from clause

Subqueries on the cheap:

  1. Write the query that you want to use as the basis as a CONSTRUCT query
  2. URL-encode it
  3. Create the other query
  4. Put the endpoint URL for the first query plus the encoded query into the FROM clause of the other query.


--> SELECT * FROM <http://dbpedia.org/sparql?query=CONSTRUCT%20....>
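The four steps can be sketched in Python with the standard library's URL encoding. The inner CONSTRUCT query and the helper name `from_clause_query` are hypothetical, chosen only for this illustration; whether an engine actually fetches a graph from such a URL depends on the engine and its configuration.

```python
from urllib.parse import quote

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint that evaluates the inner query

# Step 1: the basis query, written as CONSTRUCT (hypothetical example).
inner = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 10"


def from_clause_query(endpoint, construct_query, outer_pattern="?s ?p ?o"):
    """Steps 2-4: URL-encode the inner query and use it as the FROM graph."""
    graph_url = f"{endpoint}?query={quote(construct_query, safe='')}"
    return f"SELECT * FROM <{graph_url}> WHERE {{ {outer_pattern} }}"


outer = from_clause_query(ENDPOINT, inner)
print(outer)
```

The inner query has to be a CONSTRUCT (or DESCRIBE) query because the FROM clause expects an RDF graph, not a table of bindings.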



Negation

Question: How to find all contacts that do NOT have a phone number


Use a combination of OPTIONAL, bound() and negation (!).

Negation Example

:John foaf:knows :Tim .
:John foaf:name "John" .
:John foaf:phone "+123456" .
:Tim foaf:knows :John .

:Tim foaf:name "Tim" .

SELECT ?name ?phone {
              ?person foaf:name ?name.
               OPTIONAL {?person foaf:phone ?phone}
               FILTER (!bound(?phone))}

--> Tim
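Why this works: OPTIONAL keeps every person and leaves ?phone unbound where no phone exists, and the filter then keeps only those rows. A small Python sketch over the same data, with `None` standing in for "unbound" (an assumption of this illustration):

```python
# The example data, split by property.
NAMES = {":John": "John", ":Tim": "Tim"}
PHONES = {":John": "+123456"}

rows = []
for person, name in NAMES.items():      # ?person foaf:name ?name
    phone = PHONES.get(person)          # OPTIONAL {?person foaf:phone ?phone}
    if phone is None:                   # FILTER (!bound(?phone))
        rows.append({"name": name, "phone": phone})

print(rows)
```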

Query Federation

The SERVICE keyword allows federation between multiple SPARQL endpoints

The endpoint that receives the query forwards each SERVICE pattern to the named remote endpoint and combines the results

Reasoning and SPARQL

Reasoning is not a SPARQL feature

Some reasoning can be simulated with SPARQL

Missing direct associations with parent classes can be queried with patterns like

{?sub rdfs:subClassOf ?parent .
?subsub rdfs:subClassOf ?sub...}

Property Paths

Either syntactic sugar:

   ?person foaf:knows/foaf:name ?name ==

   ?person  foaf:knows ?friend. ?friend foaf:name ?name 

Or explorative:

  ?x foaf:knows+/foaf:name ?name .

The SPARQL 1.1 Recommendation has further helpful examples.
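The explorative path `foaf:knows+/foaf:name` follows one or more foaf:knows edges and then reads the name of each person reached. The Python sketch below imitates that on a small, hypothetical social graph (the data and the helper names are made up for this illustration).

```python
# Hypothetical data: John knows Tim, Tim knows Ann.
KNOWS = {":John": [":Tim"], ":Tim": [":Ann"], ":Ann": []}
NAME = {":John": "John", ":Tim": "Tim", ":Ann": "Ann"}


def knows_plus(start):
    """All nodes reachable from start via one or more foaf:knows edges."""
    reached = set()
    frontier = list(KNOWS.get(start, []))
    while frontier:
        node = frontier.pop()
        if node not in reached:
            reached.add(node)
            frontier.extend(KNOWS.get(node, []))
    return reached


def names_via_path(start):
    """Mimics: <start> foaf:knows+/foaf:name ?name"""
    return sorted(NAME[node] for node in knows_plus(start) if node in NAME)


print(names_via_path(":John"))
```

With the plain sequence path `foaf:knows/foaf:name` only Tim would be found for :John; the `+` is what also reaches Ann.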






Queries and Algebra

  • SPARQL queries are compiled into algebraic expressions for evaluation
  • SPARQL queries with identical result sets can perform differently, depending on how well the query can be optimized.

Examples:

select * {?s ?p ?o. FILTER (?p = foaf:name && ?o = "Angela Merkel"@en) }

select * {?s foaf:name "Angela Merkel"@en }


Algebra

PREFIX  foaf: <http://xmlns.com/foaf/0.1/>

SELECT  *
WHERE
  { ?s foaf:name "Angela Merkel"@en }

compiles into

(base <http://example/base/>
  (prefix ((foaf: <http://xmlns.com/foaf/0.1/>))
    (bgp (triple ?s foaf:name "Angela Merkel"@en))))



Algebra

PREFIX  foaf: <http://xmlns.com/foaf/0.1/>

SELECT  *
WHERE
  { ?s ?p ?o
    FILTER ( ( ?p = foaf:name ) && ( ?o = "Angela Merkel"@en ) ) }

compiles into

(base <http://example/base/>
  (prefix ((foaf: <http://xmlns.com/foaf/0.1/>))
    (filter (&& (= ?p foaf:name) (= ?o "Angela Merkel"@en))
      (bgp (triple ?s ?p ?o)))))

Evaluation

  • SPARQL queries are recursively evaluated, starting from the triple patterns (leaf nodes)
  • Intermediate result sets are built

Usage of indexes for:

  • Resources
  • Literals

But not for:

  • regex
  • aggregate functions

Instead consider bif:contains (bif = built-in function, in some engines)

Also consider pushing filters as deep into queries as possible.

Not Bound

Revisit your final task from the basic SPARQL tutorial: find the movies starring Julia Roberts for which there is no information about the director.

Not Bound A

SELECT ?movie_label
WHERE {
   ?movie dbpedia-owl:starring <http://dbpedia.org/resource/Julia_Roberts> .
   ?movie rdfs:label ?movie_label .
   OPTIONAL {?movie dbpedia-owl:director ?director} .
   FILTER (langMatches(lang(?movie_label), "EN") && !bound(?director))
}

Aggregation

Collect statistics about Russia: find the population, the total number of cities, the number of cities with a population of more than 1 million, and the average population of cities.

Aggregation A1


SELECT ?population count(?city)
WHERE {
   <http://dbpedia.org/resource/Russia> dbpprop:populationEstimate ?population .
   ?city a dbpedia-owl:PopulatedPlace .
   ?city dbpedia-owl:country <http://dbpedia.org/resource/Russia> .
}

Aggregation A2


SELECT count(?billioners)
WHERE {
   ?billioners a dbpedia-owl:PopulatedPlace .
   ?billioners dbpedia-owl:country <http://dbpedia.org/resource/Russia> .
   ?billioners dbpprop:pop2002census ?city_population .
   FILTER (?city_population > 1000000)
}

Aggregation A3


SELECT AVG(?population)
WHERE {
   ?city a dbpedia-owl:PopulatedPlace .
   ?city dbpedia-owl:country <http://dbpedia.org/resource/Russia> .
   ?city dbpprop:pop2002census ?population .
}

AS

 Change the previous queries to show the correct titles of the table columns.

AS A


SELECT ?population (count(?city) AS ?number_of_cities)
WHERE {
   <http://dbpedia.org/resource/Russia> dbpprop:populationEstimate ?population .
   ?city a dbpedia-owl:PopulatedPlace .
   ?city dbpedia-owl:country <http://dbpedia.org/resource/Russia> .
}

MINUS

 Exclude Novosibirsk when computing the average population of Russian cities

MINUS A


SELECT AVG(?population)
WHERE {
   ?city a dbpedia-owl:PopulatedPlace .
   ?city dbpedia-owl:country <http://dbpedia.org/resource/Russia> .
   ?city dbpprop:pop2002census ?population .
   MINUS {<http://dbpedia.org/resource/Novosibirsk> dbpprop:pop2002census ?population}
}

Retrieving the information

Show the information about Moscow. Show all triples, where Moscow is either a subject or an object.

Retrieving the information A


SELECT ?s ?p ?o
WHERE {
   { ?s ?p ?o. filter (?s = <http://dbpedia.org/resource/Moscow>) }
   UNION
   { ?s ?p ?o. filter (?o = <http://dbpedia.org/resource/Moscow>) }
}

Searching for commons

 Find what Mikhail Gorbachev and Ivan the Terrible have in common.

Searching for commons A


SELECT ?p ?c ?o
WHERE {
   <http://dbpedia.org/resource/Ivan_the_Terrible> ?p ?o .
   <http://dbpedia.org/resource/Mikhail_Gorbachev> ?c ?o
}

Use of relational finder

 Do the same task in RelFinder

Use of Hanne

 Find the best way to get the list of Russian football clubs using Hanne

How to connect

 Download the file

Where to write query

Find this part:

You can enter any query you want.

How to show the results

 Fill the table according to the structure of the JSON result object
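SPARQL endpoints that return JSON use a fixed layout: `head.vars` lists the column names and `results.bindings` holds one object per row, in which each bound variable maps to an object with a `value`. The Python sketch below turns such a response into table rows. The sample response is made up for this illustration, and the helper name `to_table` is an assumption.

```python
import json

# Hypothetical response in the SPARQL JSON results layout.
RESPONSE = """
{
  "head": {"vars": ["e", "name"]},
  "results": {"bindings": [
    {"e": {"type": "uri", "value": "http://dbpedia.org/resource/Ivan_the_Terrible"},
     "name": {"type": "literal", "xml:lang": "en", "value": "Ivan the Terrible"}},
    {"e": {"type": "uri", "value": "http://dbpedia.org/resource/Dmitry_of_Suzdal"}}
  ]}
}
"""


def to_table(response_text):
    """Return (header, rows); unbound variables become empty cells."""
    data = json.loads(response_text)
    header = data["head"]["vars"]
    rows = [
        [binding.get(var, {}).get("value", "") for var in header]
        for binding in data["results"]["bindings"]
    ]
    return header, rows


header, rows = to_table(RESPONSE)
print(header)
for row in rows:
    print(row)
```

Variables that are unbound in a row (for example after an OPTIONAL) are simply missing from that binding object, which is why the lookup uses a default.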

SPARQL Endpoints

http://www.sparql.org/

http://dbpedia.org/sparql

http://www.w3.org/wiki/SparqlEndpoints

SPARQL enabled triple stores

Virtuoso Open Source   ---  http://virtuoso.openlinksw.com/

Jena & Fuseki --- http://jena.apache.org/

Sesame --- http://www.openrdf.org/

Local queries with ARQ

How to query RDF datasets in local files:

  1. Download Jena
  2. Learn about the SPARQL features that ARQ supports
  3. Use the arq.query command-line tool ( documentation )
  4. java -cp <jena>/lib/commons-codec-1.6.jar:…:<jena>/lib/xml-apis-1.4.01.jar arq.query --data=file.rdf --query=file.sparql
    • Wrap this into a shell script or alias to save time!
    • Java class path must contain all *.jar files of Jena
    • on Windows use ; instead of : as separator
    • data file must be RDF/XML and have *.rdf filename extension
    • trick in Unix-style shells (e.g. bash): instead of --query=file.sparql use --query=<(echo "SELECT ...") (“process substitution”)

ARQ: Output

Example of running ARQ (see previous slide for full command line):

$ ... arq.query --data=test.rdf --query=<(echo "SELECT DISTINCT ?class WHERE {?s a ?class}")
----------------------------------------
| class                                |
========================================
| <http://xmlns.com/foaf/0.1/Person>   |
| <http://xmlns.com/foaf/0.1/Document> |
----------------------------------------

Additional Tools

YASGUI – a user-friendly web GUI to query a given SPARQL endpoint, with syntax highlighting.

FedX --- http://www.fluidops.com/fedx/

Further Learning Resources

  • SPARQL Trainer (http://aksw.org/projects/sparqltrainer)
  • Learning SPARQL, Bob DuCharme, O'Reilly (2011)
  • Semantic Web for the Working Ontologist, Dean Allemang and James Hendler, Morgan Kaufmann (2011)  
  • SPARQL by example, http://www.cambridgesemantics.com/semantic-university/sparql-by-example

Additional Topics

GeoSPARQL

Task: Display monuments 30 km away on a map.

SPARQL Update

Task: Create a Graph with some personal information about you.



The end!

 

How to install Fuseki and use it


Source: http://jena.apache.org/documentation/serving_data/

Define your own knowledge base, load it into Fuseki, and query it

This is the knowledge base:

:John rdf:type foaf:Person .
:John foaf:name "John" .
:Tim rdf:type foaf:Person .
:Tim foaf:name "Tim" .
:Jane rdf:type foaf:Person .
:Jane rdfs:label "Jane" .
:Jane foaf:name "Jane" .

and this is the query:

SELECT ?name WHERE {?person a foaf:Person. 

           {?person foaf:name ?name} UNION{?person rdfs:label ?name}}
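Before running the query in Fuseki, it can help to predict its result. The Python sketch below evaluates the same logic over the knowledge base by hand; the tuple representation and the helper name `objects` are assumptions of this illustration.

```python
# The knowledge base as (subject, predicate, object) tuples.
TRIPLES = [
    (":John", "rdf:type", "foaf:Person"), (":John", "foaf:name", "John"),
    (":Tim", "rdf:type", "foaf:Person"), (":Tim", "foaf:name", "Tim"),
    (":Jane", "rdf:type", "foaf:Person"), (":Jane", "rdfs:label", "Jane"),
    (":Jane", "foaf:name", "Jane"),
]


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


# ?person a foaf:Person
persons = [s for s, p, o in TRIPLES if p == "rdf:type" and o == "foaf:Person"]

names = []
for person in persons:
    names += objects(person, "foaf:name")   # left branch of the UNION
    names += objects(person, "rdfs:label")  # right branch of the UNION

print(sorted(names))
```

Jane appears twice because she matches both branches and UNION keeps duplicates; adding DISTINCT to the SELECT would remove the repeat.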


  • You can also try more queries inspired by the lecture.

Linked (Open) Data

Linked Data is data on the Web that satisfies certain basic principles (see next slide).

Linked Data is often Open Data (i.e. freely available under an open license) and is then called Linked Open Data (LOD).
However, the Linked Data principles also work for “closed” data, e.g. in enterprise intranets.

In this lecture you will learn

  • the linked data principles
  • basic architecture principles for the Web
  • the 5-star scheme for open data

Linked Data Principles

Tim Berners-Lee outlined four principles of Linked Data in his Design Issues: Linked Data note, paraphrased along the following lines:

  • Use URIs to identify things that you expose to the Web as resources.
  • Use HTTP URIs so that people can locate and look up (dereference) these things.
  • Provide useful information about the resource when its URI is dereferenced.
  • Include links to other, related URIs in the exposed data as a means of improving information discovery on the Web.

    Web Architecture Principles (I): Resources

    • Resources identify the items of interest in our domain - things whose properties and relationships we want to describe in the data
    • W3C Technical Architecture Group (TAG) distinguishes two kinds of resources:
      • Information resources: All the resources we find on the traditional document Web, such as documents, images, and other media files, are information resources.
      • Non-information resources (also 'other resources'): people, physical products, places, proteins, scientific concepts, etc. As a rule of thumb, all “real-world objects” that exist outside of the Web are non-information resources.

    Web Architecture Principles (II): Resource Identifiers

    • Resources are identified using Uniform Resource Identifiers (URIs) .
    • For Linked Data, we use HTTP URIs only (avoid other URI schemes such as URNs and DOIs)
    • HTTP URIs make good names for two reasons:
      • They provide a simple way to create globally unique names without centralized management;
      • URIs work not just as a name but also as a means of accessing information about a resource over the Web.
    • The preference for HTTP over other URI schemes is discussed at length in the W3C TAG draft finding URNs, Namespaces and Registries.

    Web Architecture Principles (III): Representations

    • Information resources can have representations.
    • A representation is a stream of bytes in a certain format, such as HTML, RDF/XML, or JPEG.
    • For example, an invoice is an information resource. It could be represented as an HTML page, as a printable PDF document, or as an RDF document.
    • A single information resource can have many different representations, e.g. in different formats, resolution qualities, or natural languages.

    Web Architecture Principles (IV): Dereferencing HTTP URIs

    URI Dereferencing = looking up a URI on the Web to get information about the referenced resource. W3C TAG distinguishes two different types of URIs:

    • Information Resources : When a URI identifying an information resource is dereferenced, the server usually generates a new representation, a new snapshot of the information resource's current state, and sends it back to the client using the HTTP response code 200 OK.
    • Non-Information Resources cannot be dereferenced directly.
      1. Instead of sending a representation of the resource, the server redirects the client to the URI of an information resource that describes the non-information resource, using the HTTP response code 303 See Other.
      2. The client dereferences this new URI and gets a representation describing the original non-information resource.

    Data publishers have two ways of providing clients with URIs of information resources describing non-information resources: Hash URIs and 303 redirects.

    Web Architecture Principles (V): HTTP requests

    1. The client performs an HTTP GET request on a URI identifying a non-information resource, in our case a vocabulary URI. If the client is a Linked Data browser and would prefer an RDF/XML representation of the resource, it sends an Accept: application/rdf+xml header along with the request. HTML browsers would send an Accept: text/html header instead.
    2. The server recognizes that the URI identifies a non-information resource. As the server cannot return a representation of this resource, it answers with the HTTP 303 See Other response code and sends the client the URI of an information resource describing the non-information resource, in the RDF case the location of the RDF content.
    3. The client now asks the server to GET a representation of this information resource, again requesting application/rdf+xml.
    4. The server sends the client an RDF/XML document containing a description of the original resource, i.e. the vocabulary URI.
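The four steps can be traced with a small simulation. The Python sketch below is a toy server function, not real HTTP: it assumes the DBpedia-style layout in which `/resource/` URIs redirect to `/data/` (RDF) or `/page/` (HTML), and the function names and the placeholder response body are invented for this illustration.

```python
PREFIX = "http://dbpedia.org/"


def dereference(uri, accept):
    """Toy server: return (status code, Location header or body)."""
    if uri.startswith(PREFIX + "resource/"):
        # Non-information resource: redirect according to the Accept header.
        name = uri[len(PREFIX + "resource/"):]
        kind = "data" if "application/rdf+xml" in accept else "page"
        return 303, f"{PREFIX}{kind}/{name}"
    if uri.startswith(PREFIX + "data/") or uri.startswith(PREFIX + "page/"):
        # Information resource: answer directly with a representation.
        return 200, f"<representation of {uri}>"
    return 404, ""


def fetch(uri, accept):
    """Toy client: follow at most one 303 redirect and record each hop."""
    status, payload = dereference(uri, accept)
    hops = [(status, payload)]
    if status == 303:
        hops.append(dereference(payload, accept))
    return hops


for hop in fetch("http://dbpedia.org/resource/Berlin", "application/rdf+xml"):
    print(hop)
```

A Linked Data browser ends up at the `/data/` document, while an HTML browser sending `Accept: text/html` ends up at the `/page/` document for the same resource URI.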

    URI Aliases

    • Web = open environment => different information providers talk about the same non-information resource, e.g. a geographic location or a famous person => different URIs for identifying the same real-world object:
      • DBpedia: http://dbpedia.org/resource/Berlin
      • Geonames: http://sws.geonames.org/2950159/
      • Both URIs refer to the same non-information resource => URI aliases.
    • URI aliases:
      • are common on the Web of Data, since it cannot realistically be expected that all information providers agree on the same URIs to identify a non-information resource.
      • provide an important social function since they allow different views and opinions to be expressed.
      • common practice that information providers set owl:sameAs links to URI aliases they know about.

    LOD uses RDF Data Model

    Benefits of using the RDF Data Model in the Linked Data Context

    • Clients can look up every URI in an RDF graph over the Web to retrieve additional information.
    • Information from different sources merges naturally.
    • The data model enables you to set RDF links between data from different sources.
    • The data model allows you to represent information that is expressed using different schemas in a single model.
    • Combined with schema languages such as RDF Schema or OWL, the data model allows you to use as much or as little structure as you need, meaning that you can represent tightly structured data as well as semi-structured data.

    RDF Features Best Avoided in the Linked Data Context

    • Blank nodes: impossible to set external RDF links to a blank node, merging data from different sources becomes more difficult => use URI references.
    • RDF reification: the semantics of reification are unclear, and reified statements are cumbersome to query with SPARQL. Metadata can be attached to the information resource instead.
    • RDF collections and RDF containers do not work well together with SPARQL. If the information can also be expressed using multiple triples with the same predicate, prefer that form; it makes SPARQL queries straightforward.

    Choosing URIs

    • Use HTTP URIs for everything. The http:// scheme is the only URI scheme that is widely supported in tools and infrastructure.
    • Define your URIs in an HTTP namespace under your control, where you actually can make them dereferenceable.
    • Keep implementation cruft out of your URIs. Short, mnemonic names are better. Consider:
      • http://dbpedia.org/resource/Berlin
      • http://www4.wiwiss.fu-berlin.de:2020/demos/dbpedia/cgi-bin/resources.php?id=Berlin
    • Keep URIs stable and persistent. Changing URIs will break any already-established links.
    • URIs are constrained by the technical environment. If your server is called demo.serverpool.wiwiss.example.org, then your URIs will have to begin with http://demo.serverpool.wiwiss.example.org/. If the server does not run on port 80, then URIs begin with http://demo.serverpool.example.org:2020/. Clean up URIs using URI rewriting rules.
    • Often three URIs related to a single non-information resource:
      • an identifier for the resource,
      • an identifier for a related information resource suitable to HTML browsers (with a web page representation),
      • an identifier for a related information resource suitable to RDF browsers (with an RDF/XML representation).
    • Several ideas for choosing these related URIs:
      • Idea 1: http://dbpedia.org/resource/Berlin (resource), http://dbpedia.org/page/Berlin (HTML), http://dbpedia.org/data/Berlin (RDF)
      • Idea 2: http://id.dbpedia.org/Berlin (resource), http://pages.dbpedia.org/Berlin (HTML), http://data.dbpedia.org/Berlin (RDF)
      • Idea 3: http://dbpedia.org/Berlin (resource), http://dbpedia.org/Berlin.html (HTML), http://dbpedia.org/Berlin.rdf (RDF)
    • Often some kind of primary key is required inside URIs, to make sure that each one is unique. Use a key that is meaningful inside your domain. E.g., when dealing with books, using the ISBN number is better than using the primary key of an internal database table.

    Vocabularies

    • Reusing existing terms
    • Well-known vocabularies have evolved in the Semantic Web community:
    • Friend-of-a-Friend (FOAF), vocabulary for describing people.
    • Dublin Core (DC) defines general metadata attributes. See also their new domains and ranges draft.
    • Semantically-Interlinked Online Communities (SIOC), vocabulary for representing online communities.
    • Description of a Project (DOAP), vocabulary for describing projects.
    • Simple Knowledge Organization System (SKOS), vocabulary for representing taxonomies and loosely structured knowledge.
    • Music Ontology provides terms for describing artists, albums and tracks.
    • Review Vocabulary, vocabulary for representing reviews.
    • Creative Commons (CC), vocabulary for describing license terms.

    What should be returned as RDF description for a URI?

    • The description: all triples from the dataset that have the resource's URI as the subject (immediate description of the resource)
    • Backlinks: all triples from the dataset that have the resource's URI as the object (allows browsers and crawlers to traverse links in either direction).
    • Related descriptions: additional information about related resources that may be of interest in typical usage scenarios. E.g., send information about the author along with information about a book, because clients interested in the book may also be interested in the author. Moderation is recommended: returning 1 MB of RDF will be considered excessive.
    • Metadata: any metadata, such as a URI identifying the author and licensing information. These should be recorded as RDF descriptions of the information resource that describes a non-information resource; i.e., the subject of the RDF triples should be the URI of the information resource. Attaching meta-information to that information resource, rather than attaching it to the described resource itself or to specific RDF statements about the resource (as with reification) plays nicely together with using Named Graphs and the SPARQL query language in Linked Data client applications. Each RDF document should contain a license under which the content can be used (e.g. Creative Commons).
    • Serialization: The data source should at least provide RDF/XML (official syntax for RDF), additionally Turtle (better readable), when asked for MIME-type text/turtle. In situations where people might want to use data together with XML technologies (XSLT or XQuery), also serve a TriX serialization (works better with these technologies than RDF/XML).

    Example: Returning RDF from a URI

    Metadata and Licensing Information

    <http://dbpedia.org/data/alec_empire/>
        rdfs:label "RDF description of Alec Empire" ;
        rdf:type foaf:Document ;
        dc:publisher <http://dbpedia.org/resource/dbpedia/> ;
        dc:date "2007-07-13"^^xsd:date ;
        dc:rights <http://en.wikipedia.org/wiki/wp:gfdl/> .

    The description

    <http://dbpedia.org/resource/alec_empire/>
        foaf:name "Empire, Alec" ;
        rdf:type foaf:Person, <http://dbpedia.org/class/yago/musician/> ;
        rdfs:comment "Alec Empire (born May 2, 1972) is a German musician who is ..."@en ;
        rdfs:comment "Alec Empire (eigentlich Alexander Wilke) ist ein deutscher Musiker. ..."@de ;
        dbpedia:genre <http://dbpedia.org/resource/techno/> ;
        dbpedia:associatedActs <http://dbpedia.org/resource/atari_teenage_riot/> ;
        foaf:page <http://en.wikipedia.org/wiki/alec_empire/>, <http://dbpedia.org/page/alec_empire/> ;
        rdfs:isDefinedBy <http://dbpedia.org/data/alec_empire/> ;
        owl:sameAs <http://zitgist.com/music/artist/d71ba53b-23b0-4870-a429-cce6f345763b/> .

    Backlinks

    • <http://dbpedia.org/resource/60_second_wipeout/> dbpedia:producer <http://dbpedia.org/resource/alec_empire/> .
    • <http://dbpedia.org/resource/limited_editions_1990-1994/> dbpedia:artist <http://dbpedia.org/resource/alec_empire/> .

    5 star Open Data

    Learning Objectives

    • Relation between Relational Databases and RDF.
    • Basic understanding of mapping principles.

    Classic web deployment

    • Shared access to the data.
    • Exposes data as webpages for human consumption.

    Triplification by Materialization

    • Direct access on the data, users can create their own queries.
    • Linked Data allows other applications to consume the data.
    • Negative: needs another server, with its own indexes and memory footprint.

    Triplification by SPARQL-to-SQL-Rewriting

    • All benefits from previous plus:
    • Reduced deployment overhead, small memory footprint
    • Data always up to date

    Mappings for Triplification

    •  Work for both Materialization and SPARQL-to-SQL-Rewriting
    • R2RML is the most prominent RDB to RDF Mapping Language
      • Custom mappings for converting RDB into RDF
      • W3C Recommendation since September 2012
      • Turtle serialization



    R2RML Core Concepts

    Term Map creates RDF terms (IRIs, Literals and Blank Nodes)

    • from a template, or
    • from a column, or
    • from a constant expression.

    Triples Map creates triples

    • from the rows of a table or view,
    • using Term Maps.  

    Referencing Object Map models a relation between Triples Maps.
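The interplay of Term Maps and Triples Maps can be imitated in a few lines of Python. This is only a conceptual sketch, not an R2RML processor: the table rows, the IRI template and all function names are hypothetical, and real mappings are written in Turtle rather than code.

```python
# Hypothetical table rows, one dict per row.
ROWS = [{"ID": 1, "NAME": "John"}, {"ID": 2, "NAME": "Tim"}]


def template_term(template, row):
    """Term Map from a template: fill {COLUMN} placeholders, yield an IRI."""
    return "<" + template.format(**row) + ">"


def column_term(column, row):
    """Term Map from a column: the cell value becomes a literal."""
    return '"' + str(row[column]) + '"'


def constant_term(value):
    """Term Map from a constant: the same term for every row."""
    return value


def triples_map(rows):
    """Triples Map: produce triples for each row using the Term Maps."""
    for row in rows:
        subject = template_term("http://example.org/person/{ID}", row)
        yield (subject, constant_term("rdf:type"), constant_term("foaf:Person"))
        yield (subject, constant_term("foaf:name"), column_term("NAME", row))


triples = list(triples_map(ROWS))
for triple in triples:
    print(*triple, ".")
```

Every row yields the same shape of triples; only the values taken from templates and columns differ, which is what makes the mapping declarative.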

    Mapping with R2RML

     

    Simple Mapping Executed

     A Sample Database


    Simple Mapping Executed

    For the previous mapping


    Simple Mapping Executed

    Results in


    Outline

    1. Motivation and Definition
    2. Overview of Ontology Learning Approaches
    3. In Detail: Learning Definitions with Refinement Operators
    4. Conclusions

    Definition: Ontology Learning

    • "Ontology Learning is a subtask of information extraction. The goal of ontology learning is to (semi-)automatically extract relevant concepts and relations from a given corpus or other kinds of data sets to form an ontology." (Wikipedia, today)
    • "Ontology Learning is a mechanism for semi-automatically supporting the ontology engineer in engineering ontologies."
      A. D. Mädche. Ontology Learning for the Semantic Web. Dissertation. Universität Karlsruhe, 2001
    • "Ontology Learning aims at the integration of a multitude of disciplines in order to facilitate the construction of ontologies, in particular ontology engineering and machine learning."
      A. D. Mädche, S. Staab. Ontology Learning. Handbook of Ontologies in Information Systems, 2004

    Classification of Ontology Learning Data

    sometimes heterogeneous sources of evidence (e.g., hyponymy [Snow et al. 2006], subsumption [Cimiano et al. 2005], [Manzano-Macho et al. 2008], [Buitelaar et al. 2008], disjointness [Völker et al. 2007])

    Classification of Ontology Learning Data II

    Ontology Learning Layer Cake [Cimiano 2006]

     

    Patterns [Hearst 1992] for Class Subsumption

    • NP such as {NP,}* {or|and} NP
      • games such as baseball and cricket
    • NP {,NP}* {,} {and|or} other NP
      • rabbits and other animals
      • but: "rabbits and other pets"
    • NP {,} including {NP,}* {or|and} NP
      • fruits including apples and pears
    • NP {,} especially {NP,}* {or|and} NP
      • Europeans, especially Italians
      • but: "US presidents, especially Democrats"
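    A minimal sketch of the first pattern ("NP such as ..."), assuming each noun phrase is a single word; a real system would use a chunker or a JAPE grammar instead of a regular expression. The function name `hearst_such_as` is hypothetical.

```python
import re

# "NP such as NP, NP and NP", with single words standing in for noun phrases
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+, )*\w+(?: (?:and|or) \w+)?)")

def hearst_such_as(sentence):
    """Return (subclass, superclass) candidates found by the pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        superclass = match.group(1)
        for subclass in re.split(r", | and | or ", match.group(2)):
            pairs.append((subclass, superclass))
    return pairs

print(hearst_such_as("He likes games such as baseball and cricket."))
```

    As the "but" examples above show, such matches are only candidates: the pattern also fires on sentences where the subsumption does not hold in general.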

    Patterns [Ogata and Collier 2004]

    • NP is a NP
      • "A kangaroo is an animal living in Australia."
    • a NP named|called NP
      • "Japanese people like to play a game called Go."
    • NP, NP
      • "Sencha, the most popular tea in Japan, ..."
    • NP. The NP
      • "John loves his Ferrari. The car ..."
    • Among NP, NP
      • "Among all musical instruments, violins are ..."
    • NP except for|other than NP
      • "Employees except for managers suffer from ..."

    JAPE Rule

    • GATE = General Architecture for Text Engineering
    • written in Java
    • mature, used worldwide
    • JAPE = language for rapid prototyping and efficient implementation of shallow analysis methods
    • can be used e.g. for domain specific patterns (financial blogs etc.)

    JAPE Rule II

    Rule: Hearst_1
    (
      (NounPhrase):superconcept
      {SpaceToken.kind == space}
      {Token.string == "such"}
      {SpaceToken.kind == space}
      {Token.string == "as"}
      {SpaceToken.kind == space}
      (NounPhrase):subconcept
    ):hearst1
    -->
    :hearst1.SubclassOfRelation = { rule = "Hearst1" },
    :subconcept.Domain = { rule = "Hearst1" },
    :superconcept.Range = { rule = "Hearst1" }

    Lexical Context Similarity (e.g. [Cimiano and Völker 2005])

    • "Columbus is the capital of the state of Ohio. Columbus has a population of about 700,000 inhabitants."
    • Columbus (capital (1), state (1), Ohio (1), population (1), inhabitant (1) )
    • City (country (2), state (1), inhabitant (2), mayor (1), attraction (1) )
    • Explorer (ship (1), sailor (2), discovery (1) )  

    "most probably": City(Columbus)
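    One way to make "most probably" concrete is to compare the context vectors, here with cosine similarity. The choice of cosine is an assumption for illustration; the cited work may use a different measure.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two sparse count vectors given as dicts."""
    dot = sum(count * v.get(term, 0) for term, count in u.items())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Context vectors from the example above
columbus = {"capital": 1, "state": 1, "Ohio": 1, "population": 1, "inhabitant": 1}
classes = {
    "City": {"country": 2, "state": 1, "inhabitant": 2, "mayor": 1, "attraction": 1},
    "Explorer": {"ship": 1, "sailor": 2, "discovery": 1},
}

scores = {name: cosine(columbus, context) for name, context in classes.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

    Columbus shares "state" and "inhabitant" with City and nothing with Explorer, so City receives the higher score.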

    Subcategorization Frames

    • "Tina drives a Ford."
      • Person(Tina). Vehicle(Ford).
    • "Her father drives a bus."
      • Father subclass-of Person
      • Bus subclass-of Vehicle
    • subcat: drive( subj: person, obj: vehicle )
      • \[Person \sqsubseteq \forall drive.Vehicle \]

    Text2Onto

    Suchanek et al. 2009

    Learning from text and background knowledge via reasoning:

    "Washington is the capital of the US. (...) New York is the US capital of fashion."

    • extracted: hasCapital(US, New York); hasCapital(US, Washington)
    • background knowledge: hasCapital is a functional property
    • possible inferences:
      • New York = Washington
      • inconsistency (unique names assumption)
    • logical contradictions can help to detect errors in automatically extracted information
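    The check described above can be sketched without a reasoner: for every functional property, collect the objects per subject and report subjects with more than one. The function name is hypothetical, and the sketch relies on the unique names assumption mentioned above.

```python
from collections import defaultdict

def functional_violations(triples, functional_properties):
    """Map (subject, property) to its objects wherever a functional property has several."""
    values = defaultdict(set)
    for subject, prop, obj in triples:
        if prop in functional_properties:
            values[(subject, prop)].add(obj)
    return {key: objs for key, objs in values.items() if len(objs) > 1}

extracted = [
    ("US", "hasCapital", "New York"),
    ("US", "hasCapital", "Washington"),
]
conflicts = functional_violations(extracted, {"hasCapital"})
print(conflicts)
```

    Each reported conflict marks a set of extracted facts of which at least one is likely to be an extraction error.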

    LeDA

    Other Approaches

    • Association rules and co-occurrence statistics
    • WordNet : \[hyponymy \approx subsumption \]
      • hyponym( bank\(\sharp\)1, institution\(\sharp\)1 )
      • Bank subclass-of Institution
    • Noun phrase heuristics
      • "image processing software"
    • Instance clustering (e.g. Columbus and Washington)
      • Hierarchical clustering of context vectors
    • Knowledge Base Completion / Formal Concept Analysis (FCA)
      • asks knowledge engineer questions to complete a knowledge base
      • tool: OntoComp [Sertkaya et al.]

    Tools and Frameworks

    Table: Lexical ontology learning: informal or semi-formal data (e.g. texts)

    Tools and Frameworks II

    Problems and Challenges

    • Homonymy and polysemy e.g. [Ovchinnikova et al. 2006]
      • "Peter is sitting on the bank in front of the bank."
      • "An interesting book is lying on the table."
    • Semantics of adjectives
      • "red flower", "false friend"
    • Empty heads e.g. [Völker et al. 2005], [Cimiano and Wenderoth 2005]
      • "Tuna is a kind of fish. The Southern Bluefin is one of the most endangered types of Tuna."
    • Ellipsis and underspecification
      • "Mary started the book."
    • Anaphora (e.g. pronouns) e.g. [Cimiano and Völker 2005]
      • "There is an apple on the table. It is red."

    Problems and Challenges (Ctd.)

    • Metaphors and analogies e.g. [Gust et al. 2007]
      • "Life is a journey."
    • Opinions, quotations and reported speech
      • "Tom thinks that dolphins are mammals."
    • What should be represented as an individual? e.g. [Zirn et al. 2008]
      • "The kangaroo is an animal living in Australia."
    • Class, relation (object property) or attribute (datatype property)?
      • "All elephants are grey."
      • "Easter Monday is a national holiday."
    • Knowledge is changing e.g. [Stojanovic 2004], [Zablith et al. 2009]
      • "Pluto is a planet."

    Learning OWL Class Expressions

    • given:
      • background knowledge (particularly OWL/DL knowledge base)
      • positive and negative examples (particularly individuals in knowledge base)
    • goal:
      • logical formula (particularly OWL Class Expression) covering positive examples and not covering negative examples

    ILP and Semantic Web

    • Inductive Logic Programming (ILP) since the early 90s
    • only few approaches based on description logics
    • Web Ontology Language (OWL) becomes W3C standard in 2004
    • increasing number of RDF/OWL knowledge bases, but ILP still mainly focuses on logic programs --> research gap

    Why ILP in the Semantic Web?

    • Ontology Learning:
      • given class A in K
      • instances of A as positive examples
      • non-instances as negative examples
      • definitions can be learned if ABox data is available
    • improvement of existing ML problem solutions
    • direct usage of knowledge in the Semantic Web instead of conversion into e.g. Horn clauses to apply ML methods

    Refinement Operators - Definitions

    • given a DL \(\mathcal{L}\), consider the quasi-ordered space \(\langle\mathcal{C}(\mathcal{L}),\sqsubseteq_ T\rangle\) over concepts of \(\mathcal{L}\)
    • \(\rho: \mathcal{C}(\mathcal{L})\to 2^{\mathcal{C}(\mathcal{L})}\) is a downward \(\mathcal{L}\) refinement operator if for any \(C \in \mathcal{C}(\mathcal{L})\):\[D \in \rho(C) \text{ implies } D \sqsubseteq_ T C\]
    • notation: Write \(C \to D\) instead of \(D \in \rho(C)\)
    • example refinement chain in \(\langle\mathcal{C}(EL), \sqsubseteq_ T\rangle\): \[ \top \to_{\rho} male \to male \sqcap \exists hasChild.\top \]
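    A toy downward refinement operator, as a sketch under strong assumptions: concepts are restricted to conjunctions of atomic classes (represented as frozensets, with the empty set standing for \(\top\)), and one step adds one conjunct. Adding a conjunct can only make a concept more specific, so every refinement is subsumed by the concept it refines. The atom names are hypothetical.

```python
ATOMS = ("Male", "Parent", "Person")  # hypothetical atomic classes
TOP = frozenset()                     # the empty conjunction stands for the top concept

def rho(concept):
    """Downward refinements: add one atomic class as a further conjunct."""
    return {concept | {atom} for atom in ATOMS if atom not in concept}

def count_chains(start, goal):
    """Count the refinement chains that lead from start to goal."""
    if start == goal:
        return 1
    return sum(count_chains(nxt, goal) for nxt in rho(start) if nxt <= goal)

goal = frozenset({"Male", "Parent"})
print(sorted(sorted(c) for c in rho(TOP)))
print(count_chains(TOP, goal))
```

    This operator is finite, because there are only finitely many atoms, and it reaches "Male and Parent" along two different chains, first Male or first Parent.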

    Learning with Refinement Operators

    Properties of Refinement Operators

    An \(\mathcal{L}\) downward refinement operator \(\rho\) is called
    • finite iff \(\rho(C)\) is finite for any concept \(C \in \mathcal{C}(\mathcal{L})\)
    • redundant iff there exist two different \(\rho\) refinement chains from a concept \(C\) to a concept \(D\)
    • proper iff for \(C, D \in \mathcal{C}(\mathcal{L})\), \(C \to D\) implies \(C \not\equiv_T D\)
    • complete iff for \(C, D \in \mathcal{C}(\mathcal{L})\) with \(D \sqsubseteq_T C\) there is a concept \(E\) with \(E \equiv_T D\) and a refinement chain \(C \to \cdots \to E\)
    • weakly complete iff for any concept \(C\) with \(C \sqsubseteq_T \top\) we can reach a concept \(E\) with \(E \equiv_T C\) from \(\top\) by \(\rho\)
    • ideal iff it is finite, complete, and proper

    Properties of Refinement Operators II

    • Properties indicate how suitable a refinement operator is for solving the learning problem:
      • Incomplete operators may miss solutions
      • Redundant operators may lead to duplicate concepts in the search tree
      • Improper operators may produce equivalent concepts (which cover the same examples)
      • For infinite operators it may not be possible to compute all refinements of a given concept
    • We researched properties of refinement operators in Description Logics
    • Key question: Which properties can be combined?

    Refinement Operator Property Theorem

    Theorem

    Maximal sets of properties of \(\mathcal{L}\) refinement operators which can be combined for \(\mathcal{L} \in \{\mathcal{ALC}, \mathcal{ALCN}, \mathcal{SHOIN}, \mathcal{SROIQ} \}\):

    1. {weakly complete, complete, finite}
    2. {weakly complete, complete, proper}
    3. {weakly complete, non-redundant, finite}
    4. {weakly complete, non-redundant, proper}
    5. {non-redundant, finite, proper}
    "Foundations of Refinement Operators for Description Logics",
    J. Lehmann, P. Hitzler, ILP conference, 2008

    "Concept Learning in Description Logics Using Refinement Operators",
    J. Lehmann, P. Hitzler, Machine Learning journal, 2010

    Refinement Operator Property Theorem II

    • no ideal refinement operator in OWL and many description logics
    • indicates that learning in DLs is hard
    • algorithms need to counteract disadvantages
    • goal: develop operators close to theoretical limits

    Definition of \(\rho\)

    (definition shown as figures on four slides; not reproduced)

    \(\rho\) Properties

    • \(\rho\) is complete
    • \(\rho\) is infinite, e.g. there are infinitely many refinement steps of the form: \( \top \to C_1 \sqcup C_2 \sqcup C_3 \sqcup \dots \)
    • \(\rho\) is not proper, but can be extended to a proper operator \(\rho^{cl}\) (refinements more expensive to compute)
    • \(\rho\) is redundant

    "A Refinement Operator Based Learning Algorithm for the \(\mathcal{ALC}\) Description Logic",
    J. Lehmann, P. Hitzler, ILP conference, 2008

    "Concept Learning in Description Logics Using Refinement Operators",
    J. Lehmann, P. Hitzler, Machine Learning journal, 2010

    OCEL

    • uses \(\rho\) for top-down search
    • OCEL is complete: it always finds a solution if one exists
    • highly configurable, e.g. flexible target language, termination criteria and heuristics
    • implements a redundancy elimination technique with polynomial complexity w.r.t. search tree size, based on ordered negation normal form
    • can handle infinite refinement operators by stepwise length-limited horizontal expansion

    Stepwise Node Expansion

    Scalability: Reasoning

    \[\mathcal{K} = \{ \mathit{Male} \sqsubseteq \mathit{Person},\ \mathit{OnlyMaleChildren}(a),\ \mathit{Person}(a),\ \mathit{Male}(a_1),\ \mathit{Male}(a_2),\ \mathit{hasChild}(a,a_1),\ \mathit{hasChild}(a,a_2) \}\]

    • given \(\mathcal{K}\), we want to learn a description of \(\mathit{OnlyMaleChildren}\)
    • \(C = \mathit{Person} \sqcap \forall \mathit{hasChild}.\mathit{Male}\) appears to be a good solution, but \(a\) is not an instance of \(C\) under the open world assumption (OWA), since \(a\) could have further, unknown children
    • idea: dematerialise \(\mathcal{K}\) using a standard (OWA) DL reasoner, but perform instance checks under the closed world assumption (CWA)
    • closer to intuition and provides order of magnitude performance improvements
    • optimised for thousands of instance checks on a static knowledge base
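    A sketch of a closed-world instance check on the example knowledge base, assuming a reasoner has already computed all class memberships (so the children are also listed as Person). The tuple encoding of class expressions and the function name are hypothetical; this is not the DL-Learner implementation.

```python
# Class memberships after reasoning (Male is a subclass of Person)
types = {
    "a": {"Person", "OnlyMaleChildren"},
    "a1": {"Male", "Person"},
    "a2": {"Male", "Person"},
}
roles = {"hasChild": {"a": {"a1", "a2"}}}

def instance_of(individual, concept):
    """Closed-world check: only the asserted role fillers are considered."""
    if isinstance(concept, str):                      # atomic class
        return concept in types.get(individual, set())
    operator = concept[0]
    if operator == "and":                             # conjunction
        return all(instance_of(individual, c) for c in concept[1:])
    if operator == "all":                             # universal restriction
        _, role, filler = concept
        fillers = roles.get(role, {}).get(individual, set())
        return all(instance_of(f, filler) for f in fillers)
    raise ValueError(f"unsupported constructor: {operator}")

C = ("and", "Person", ("all", "hasChild", "Male"))
print(instance_of("a", C))
```

    Under the CWA, `a` is an instance of `C` because both known children are male. Individuals without any known children satisfy the universal restriction vacuously.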

    Scalability: Stochastic Coverage Computation

    Heuristics often require expensive instance checks or retrieval, e.g.:

    \[ \mathrm{acc}(C) = \frac{1}{2} \cdot \left( \frac{|R(A) \cap R(C)|}{|R(A)|} + \sqrt{\frac{|R(A) \cap R(C)|}{|R(C)|}} \right) \]

    Scalability: Stochastic Coverage Computation II

    Heuristics often require expensive instance checks or retrieval, e.g.:

    \[ \mathrm{acc}(C) = \frac{1}{2} \cdot \left( \frac{a}{|R(A)|} + \sqrt{\frac{a}{b}} \right) \]
    • replace \(|R(A) \cap R(C)|\) and \(|R(C)|\) by variables \(a\) and \(b\), which we want to estimate
    • Wald method for computing the 95% confidence interval
    • first estimate \(a\), then the whole expression
    • method can be applied to various heuristics
    • in tests on real ontologies: up to 99% fewer instance checks and an algorithm up to 30 times faster
    • low influence on learning results empirically shown in 380 learning problems on 7 real ontologies (differs by ca. \(0.2\% \pm 0.4\%\))
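    For reference, the Wald interval for a proportion estimated from a sample of instance checks can be computed as below. The function name and the clipping to [0, 1] are assumptions; how the interval is used to stop sampling early is not shown here.

```python
from math import sqrt

def wald_interval(successes, trials, z=1.96):
    """Wald confidence interval for a proportion (z = 1.96 gives about 95%)."""
    p = successes / trials
    margin = z * sqrt(p * (1 - p) / trials)
    return max(0.0, p - margin), min(1.0, p + margin)

# 50 of 100 sampled individuals turned out to be instances of C
low, high = wald_interval(50, 100)
print(low, high)
```

    The interval narrows with the square root of the sample size, which is why a rough estimate often needs far fewer instance checks than an exact count.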

    Scalability: Fragment Extraction

    Extraction of Fragments from SPARQL Endpoints / Linked Data:

    "Learning of OWL Class Descriptions on Very Large Knowledge Bases",
    Hellmann, Lehmann, Auer, Int. Journal Semantic Web Inf. Syst, 2009

    Evaluation Setup

    • lack of evaluation standards in OWL/DL learning
    • procedure: convert existing benchmarks to OWL (time consuming, requires domain knowledge)
    • measure predictive accuracy in ten-fold cross validation
    • part 1: evaluation against other OWL/DL learning systems
    • part 2: evaluation against other ML systems (carcinogenesis problem)
    • part 3: evaluation of ontology engineering

    Evaluation: Accuracy

    • Collection of 6 Benchmarks
    • OCEL is statistically significantly better than other algorithms on most benchmarks

    Evaluation: Readability

    • YinYang generates significantly longer solutions

    Evaluation: Runtime

    Carcinogenesis

    • goal: predict whether chemical compounds cause cancer
    • Why?
      • more than 1000 new substances each year
      • substances can often only be tested via long and expensive experiments on rats and mice
    • background knowledge:
      • database of US National Toxicology Program (NTP)
      • converted from Prolog to OWL

    "Obtaining accurate structural alerts for the causes of chemical cancers is a problem of great scientific and humanitarian value." (A. Srinivasan, R.D. King, S.H. Muggleton, M.J.E. Sternberg 1997)

    Carcinogenesis II

    • very challenging problem: low accuracy, high standard deviation
    • OCEL statistically significantly better than most other approaches

    Ontology Learning Evaluation

    • 5 PhD students
    • 5 real ontologies in different domains
    • 998 decisions by each test person for 92 classes
    • suggestions for ontology enhancements accepted in 35% of the cases
    • problem: ontology quality, modelling errors (unsatisfiable classes, disjunction and conjunction confused etc.)

    DL-Learner Project

    • DL-Learner open-source project: http://dl-learner.org, http://sf.net/projects/dl-learner
    • extensible platform for different learning problems and algorithms
    • Interfaces: command line, GUI, Web-Service
    • supports common OWL formats
    • allows different reasoners (via OWL API, DIG, OWLLink)
    • mloss.org (ML & Open Source Software): 1600 Downloads

    Applications

    • "classical" ML problems
      • carcinogenesis
      • other biomedical tasks
    • Ontology Learning
      • Protégé Plugin
      • OntoWiki Plugin
      • ORE
    • Recommendation/Navigation
      • moosique.net
      • DBpedia Navigator
    • other/external:
      • ISS (Gerken et al.)
      • Learning in Probabilistic DLs (Ochoa Luna et al.)
      • TIGER Corpus Navigator (Hellmann et al.)

    Conclusions

    • Ontology Learning is a diverse research area involving several research disciplines (NLP, Machine Learning, Ontology Engineering)
    • approaches vary in used data sources and the expressiveness of the created ontologies
    • refinement operator based learning as one method for learning definitions (with applications outside of learning ontologies)
    • new Wiki (under construction): http://ontology-learning.net
    • new ontology learning book in 2011

    How to set links?

    • Manually
      • Uriqr or Sindice to search for existing URIs
    • Automatic generation
      • Link Discovery
        • LIMES – Link Discovery Framework for Metric Spaces provides time-efficient approaches for discovery and computing the results of link specifications.
        • Silk - A Link Discovery Framework for the Web of Data: a tool for discovering relationships between data items within different Linked Data sources. Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web.
        • TopBraid Composer (ontology editor made by TopQuadrant) has a wizard for linking ontology instances to corresponding DBpedia concepts.
        • SemMF is a framework for calculating semantic similarity between objects that are represented as arbitrary RDF graphs. The framework allows taxonomic and non-taxonomic concept matching techniques to be applied to selected object properties.
        • Yves Equivalence Miner together with an experience report about the problems he ran into while interlinking Jamendo and Musicbrainz.

    Link Discovery

    • Cannot be carried out manually at Web scale
      • 31 billion triples
      • Freebase contains over 20 million entities
      • Over 250 knowledge bases
    • Automatic approaches
      • Ontology Matching
      • Instance Matching

    Ontology Matching

    • Goal: Find OWL class expressions that express the relations between the ontologies

    OM: Approaches

    • Sense-Based: WordNet hierarchy distance

    OM: Approaches

    • Extensional techniques: Compare the instances

    OM: Approaches

    • Often smaller datasets than in instance matching
    • For most ontologies, simple matching suffices
    • Problem: Accuracy in the long tail
    • Need for formally correct statements (DL statements)

    Link Discovery

    • Goal: Discover related entities across knowledge bases

    Formal Definition

    • Goal: For all s ∈ S and t ∈ T, find all pairs (s, t) such that σ(s, t) ≥ θ, where σ is a similarity measure and θ a threshold
    • Equivalent formulation: Find a classifier C: S×T → {-1, +1} such that C(s, t) = -1 iff σ(s, t) < θ, else C(s, t) = +1
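    The definition can be read directly as a brute-force procedure that compares every source with every target. The sketch below assumes string labels and a Jaccard similarity over character trigrams as the similarity measure; the padding scheme and all names are illustrative assumptions.

```python
def trigrams(text):
    """Character trigrams of a lower-cased, padded string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def trigram_similarity(a, b):
    """Jaccard similarity of the two trigram sets, between 0 and 1."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

def discover_links(source, target, theta):
    """Brute force: |S| * |T| comparisons, keep pairs with similarity >= theta."""
    return [(s, t) for s in source for t in target
            if trigram_similarity(s, t) >= theta]

links = discover_links(["Aspirin", "Ibuprofen"], ["aspirin", "Paracetamol"], 0.9)
print(links)
```

    The nested loop is what makes the naive approach quadratic in the number of instances.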

    Link Discovery

    • Two main problems
      • Runtime
      • Complexity of specifications
    • Runtime
      • Large number of instances
      • Brute-force approach in O(|S||T|)
      • Comparison of strings comprising m tokens is in O(m²)

    • Complexity of specifications
      • Which properties should be used?
      • Which similarity measures work best?
      • Which threshold settings should be used?

    Link Discovery

    LD: Runtime

    • Aggregation and Blocking (SILK)

    LD: Runtime

    • Hybrid (LIMES)

    LD: Runtime

    • PassJoin

    HYPPO

    • Δ = θ/α

    HYPPO

    • Approximation rate:
    • Number of cubes:
    • Tradeoff: high granularity leads to better approximation but to more cubes
    • Figure panels: α = 1, α = 2, α = 4
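    A simplified grid-blocking sketch in the spirit of the cube idea, not the HYPPO algorithm itself. It assumes numeric points, a Euclidean distance threshold `theta`, a granularity `alpha`, and cubes of width `theta / alpha`. For each source point it inspects the full block of cubes within `alpha` index steps, which is a superset of the cubes a hypersphere approximation would select.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor

def grid_links(source, target, theta, alpha=1):
    """Find all pairs within distance theta, comparing only nearby cubes."""
    delta = theta / alpha                      # cube width
    cells = defaultdict(list)
    for t in target:                           # index the targets by cube
        cells[tuple(floor(x / delta) for x in t)].append(t)
    links = []
    for s in source:
        base = tuple(floor(x / delta) for x in s)
        # points within theta differ by at most alpha cube indices per dimension
        for offset in product(range(-alpha, alpha + 1), repeat=len(base)):
            cell = tuple(b + o for b, o in zip(base, offset))
            for t in cells.get(cell, ()):
                if dist(s, t) <= theta:
                    links.append((s, t))
    return links

source = [(0.0, 0.0), (5.0, 5.0)]
target = [(0.5, 0.5), (5.0, 5.9), (9.0, 9.0)]
print(grid_links(source, target, theta=1.0, alpha=2))
```

    Larger `alpha` means smaller cubes and more of them per source point, which mirrors the tradeoff stated above.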

    Learning Link Specifications

    • Supervised Learning
      • Batch learning
      • Active Learning
    • Unsupervised Learning
      • Self-Configuration
      • Optimization of objective function

    RAVEN

    • Hospital/Residents + classifier model
    • Active perceptron learning
    • Learning classifier C involves learning
      • two sets of restrictions that specify the sets S and T, respectively,
      • a specification of a complex similarity measure σ as the combination of several atomic similarity measures σ1, ..., σn and
      • a set of weights/thresholds θ1, ..., θn such that θi is the threshold for σi.

    Hospital/Residents

    Formal Definition

    • Learning classifier C involves learning
      1. two sets of restrictions that specify the sets S and T, respectively,
      2. a specification of a complex similarity measure σ as the combination of several atomic similarity measures σ1, ..., σn and
      3. a set of thresholds θ1, ..., θn such that θi is the threshold for σi.
    • NB: Assume restrictions are class restrictions

    Restriction Discovery

    1. Start with source and target knowledge bases KS and KT
    2. Sample instances randomly across KS and KT
    3. Count the number of owl:sameAs links between Si and Tj
    4. Solve equivalent Hospital/Residents problem

    Problem: Not enough owl:sameAs links

    3. Instead, count the number of instances of Si and Tj that share common property values
    4. Solve equivalent Hospital/Residents problem

    Restriction Discovery

    Source      Target      S              T
    Drugbank    Diseasome   Targets        Genes
    Sider       Diseasome   Side-Effect    Diseases
    DBpedia     Dailymed    Organization   Organization
    Sider       Dailymed    Drugs          Offer
    Drugbank    DBpedia     Targets        Protein

    RAVEN

    • Begin with unclassified links
    • Initialize classifier (formula not reproduced)
    • Get most informative positive (L+) and negative candidates (L-)
    • Ask the oracle for classification
    • Update L (formula not reproduced)
    • Fetch most informative positive and negative candidates
    • Ask oracle
    • Terminate when the classifier agrees with the oracle on all queried candidates, and return the classification
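    The loop can be sketched with a plain perceptron over similarity vectors. Everything specific here is an assumption, since the formulas are not reproduced above: the initial classifier (mean similarity of at least 0.9), the learning rate, taking the candidates closest to the decision boundary as most informative, and stopping once a round of queries produces no disagreement.

```python
def predict(weights, bias, x):
    """Return the predicted label (+1 or -1) and the distance-like margin."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return (1 if score >= 0 else -1), abs(score)

def active_perceptron(candidates, oracle, k=1, rate=0.1, max_rounds=50):
    n = len(candidates[0])
    weights, bias = [1.0 / n] * n, -0.9     # assumed start: mean similarity >= 0.9
    labelled = {}
    for _ in range(max_rounds):
        pool = [x for x in candidates if x not in labelled]
        queries = []
        for label in (1, -1):               # k most informative positives and negatives
            side = [x for x in pool if predict(weights, bias, x)[0] == label]
            side.sort(key=lambda x: predict(weights, bias, x)[1])
            queries.extend(side[:k])
        if not queries:
            break
        agreed = True
        for x in queries:
            y = oracle(x)                   # ask the oracle
            labelled[x] = y
            if predict(weights, bias, x)[0] != y:
                agreed = False              # perceptron update on a mistake
                weights = [w + rate * y * xi for w, xi in zip(weights, x)]
                bias += rate * y
        if agreed:
            break
    return weights, bias, labelled

# Hypothetical similarity vectors and a stand-in oracle
candidates = [(0.95, 0.9), (0.85, 0.4), (0.3, 0.2), (0.7, 0.95)]
oracle = lambda x: 1 if x[0] >= 0.8 else -1
weights, bias, labelled = active_perceptron(candidates, oracle)
print(weights, bias, labelled)
```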

    Goal

    • Example link specification (figure not reproduced): Drugbank (db:Drugs) and Dailymed (dm:Offer), comparing rdfs:label with dm:name and db:brandName with dm:name using trigrams, with threshold > 0.9

    EAGLE

    • Idea: Specifications are trees
    • Goal: Learn elements of trees through genetic operations until best specification is found

    EAGLE

    • Step 1: Generate initial population
      • Random process (property pairs, thresholds)
      • Compute fitness
      • Fitness = F1-measure w.r.t known data

    EAGLE

    • Step 2: Evolve population
      • Tournament between two individuals
      • Two operators: Mutation and crossover

    EAGLE

    • Step 3: Computation of most informative links
      • Previous approaches define amount of information of link as closeness to the decision boundary
      • Here, use disagreement amongst elements of population of size n
      • Function is maximal when n/2 count (s,t) as positive and n/2 as negative
      • Can be modelled with other functions such as entropy
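    A sketch of committee disagreement for Step 3. The concrete function, the product of positive and negative votes, is an assumption chosen because it peaks when the population splits evenly; the entropy variant mentioned above is shown next to it. All names are hypothetical.

```python
from math import log2

def disagreement(votes):
    """Positive votes times negative votes; maximal for an even split."""
    positives = sum(votes)
    return positives * (len(votes) - positives)

def vote_entropy(votes):
    """Binary entropy of the share of positive votes."""
    p = sum(votes) / len(votes)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def most_informative(link_votes, k):
    """The k links on which the population disagrees most."""
    return sorted(link_votes, key=lambda link: disagreement(link_votes[link]),
                  reverse=True)[:k]

# One vote per specification in a population of size 4
link_votes = {
    ("s1", "t1"): [True, True, True, True],
    ("s2", "t2"): [True, False, True, False],
}
print(most_informative(link_votes, 1))
```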

    EAGLE

    • Step 4: Active Learning
      • Compute d((s,t)) for all (s,t) returned by a spec
      • Pick k most informative
      • Request labels from the user
      • Update list of positive and negative examples

    EAGLE

    • Step 5: Remove least fit elements
    • Fitness = F1-measure w.r.t known data

    EAGLE

    • Step 6: If the termination conditions are not met, go to Step 2. Otherwise terminate and pick the fittest specification.

    Implementation (LIMES)

    Conclusion...

    Outline

    • What is data quality?
    • Why is data quality important?
    • Challenges
    • Data quality concepts
    • Data quality dimensions and metrics
    • Data quality assessment tools

    What is data quality?

    • Commonly conceived as a multi-dimensional construct with a popular definition as
      • "fitness for use"*


    * Juran J. The Quality Control Handbook. McGraw-Hill, New York, 1974

    Why is data quality important?

    • Data is only as good as its quality
    • Exposes flaws in data before it is used in a use case
    • Affects potential use of the data
      • Limits full exploitation of such data by data consumers

    Challenges

    • Several definitions for same dimension (different schools of thought)
    • Measuring data quality (automatically or semi-automatically)
    • Specifying data quality metrics (quantitative and qualitative) for each dimension
    • Scalability of assessment
    • Making results of the assessment explicit
    • Improving data quality after assessment

    Data quality concepts

    Data Quality Problems 

    • set of issues that can affect the potential of the applications that use the data

    Data Quality Assessment Methodology

    • evaluating whether a piece of data meets the information consumer's needs in a specific use case

    Data Quality Dimension

    • data quality assessment involves measuring of quality dimensions or criteria that are relevant to the consumer

    Data Quality Metric

    • a metric or measure is a procedure for measuring a quality dimension

    Dimension Groups

    • Four groups:
      • Accessibility
      • Intrinsic
      • Contextual
      • Representational

    Dimensions and their relations

    An overview of the 18 dimensions categorized into four groups and their inter-relations.

    User Scenario

    • Consider an intelligent flight search engine
      • relies on aggregating data from several datasets
      • obtains information about airports and airlines from an airline dataset (e.g. OurAirports, OpenFlights)
      • information about the location of countries, cities and particular addresses is obtained from a spatial dataset (e.g. LinkedGeoData)
      • aggregators pull all the information related to flights from different booking services (e.g., Expedia)

    Note: We will use this scenario throughout as an example to explain each quality dimension through a quality issue.  

    Accessibility dimensions

    • Availability
    • Licensing
    • Interlinking
    • Security
    • Performance

    Availability

    Definition
    • extent to which information (or some portion of it) is present, obtainable and ready for use.

    Example

    • User looks up a flight in our flight search engine. She requires additional information such as car rental and hotel booking at the destination, which is present in another dataset and interlinked with the flight dataset. 
    • Instead of retrieving the results, she receives an error response code "404 Not Found"
    • This indicates that either the referenced resource cannot be dereferenced or the information is unavailable

    Availability Metrics

    • A1: checking whether the server responds to a SPARQL query [19]
    • A2: checking whether an RDF dump is provided and can be downloaded [19]
    • A3: detection of dereferenceability of URIs by checking:
      • for dead or broken links [31], i.e. that when an HTTP-GET request is sent, the status code 404 Not Found is not returned [19]
      • that useful data (particularly RDF) is returned upon lookup of a URI [31]
      • for changes in the URI, i.e. compliance with the recommended way of implementing redirections using the status code 303 See Other [19]
    • A4: detect whether the HTTP response contains the header field stating the appropriate content type of the returned file, e.g. application/rdf+xml [31]
    • A5: dereferenceability of all forward links: all available triples where the local URI is mentioned in the subject (i.e. the description of the resource) [32]
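    Parts of A3 and A4 reduce to simple checks on an HTTP response. The sketch below assumes the lookup has already been made elsewhere and only evaluates the status code and the Content-Type header; the function name and the list of accepted RDF media types are illustrative assumptions.

```python
RDF_MEDIA_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/n-triples",
    "application/ld+json",
}

def assess_lookup(status, content_type):
    """Evaluate one URI lookup against parts of metrics A3 and A4."""
    media_type = (content_type or "").split(";")[0].strip().lower()
    return {
        "not_broken": status != 404,                       # A3: no dead link
        "uses_303_redirect": status == 303,                # A3: 303 See Other
        "rdf_content_type": media_type in RDF_MEDIA_TYPES  # A4: content type
    }

print(assess_lookup(200, "application/rdf+xml; charset=utf-8"))
print(assess_lookup(404, None))
```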

    Licensing

    Definition

    • granting of permission for a consumer to re-use a dataset under defined conditions

    Example

    • LinkedGeoData dataset is licensed under the Open Database License*
    • This allows others to copy, distribute and use the data and produce work from the data allowing modifications and transformations.

    * http://opendatacommons.org/licenses/odbl/

    Licensing Metrics

    • L1: machine-readable indication of a license in the VoID description or in the dataset itself [19,32]
    • L2: human-readable indication of a license in the documentation of the dataset [19,32]
    • L3: detection of whether the dataset is attributed under the same license as the original [19]

    Interlinking

    Definition

    • degree to which entities that represent the same concept are linked to each other, be it within or between two or more data sources.

    Example

    • The instance of the country "United States" in an airline dataset should be interlinked with the instance "America" in the spatial dataset.
    • This interlinking can help when a user queries for a flight, as the search engine can display the correct route from the origin to the destination by correctly combining information for the same country from both datasets.

    Interlinking Metrics

    • I1: detection of:
      • interlinking degree: how many hubs there are in a network* [26]
      • clustering coefficient: how dense is the network [26]
      • centrality: indicates the likelihood of a node being on the shortest path between two other nodes [26]
      • whether there are open sameAs chains in the network [26]
      • how much value is added to the description of a resource through the use of sameAs edges [26]
    • I2: detection of the existence and usage of external URIs (e.g. using owl:sameAs links) [31,32]
    • I3: detection of all local in-links or back-links: all triples from a dataset that have the resource’s URI as the object [32]

    * In [26], a network is described as a set of facts provided by the graph of the Web of Data, excluding blank nodes.
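
Metrics I2 and I3 can be sketched over a plain list of (subject, predicate, object) tuples. This is only an illustration: the example.org-style URIs and the function names are hypothetical, and a real assessment would run against a triple store.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def external_same_as_links(triples, local_prefix):
    """I2: owl:sameAs triples that link a local subject to an external object."""
    return [
        (s, p, o)
        for s, p, o in triples
        if p == OWL_SAME_AS
        and s.startswith(local_prefix)
        and not o.startswith(local_prefix)
    ]


def back_links(triples, resource_uri):
    """I3: all triples that have the resource's URI as the object."""
    return [t for t in triples if t[2] == resource_uri]


# Hypothetical data following the airline example above.
UNITED_STATES = "http://airline.example/country/United_States"
triples = [
    (UNITED_STATES, OWL_SAME_AS, "http://spatial.example/America"),
    ("http://airline.example/flight/A123",
     "http://airline.example/destination",
     UNITED_STATES),
]

print(len(external_same_as_links(triples, "http://airline.example/")))
print(back_links(triples, UNITED_STATES))
```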

    Security

    Definition

    • extent to which data is protected against alteration and misuse. 

    Example     

    • If we assume that the flight search engine obtains flight information from arbitrary airline websites, there is a risk of receiving incorrect information from malicious websites.
    • For instance, an airline or sales agency website can pose as its competitor and display incorrect flight fares.
    • Thus, by this spoofing attack, this airline can prevent users from booking with the competitor airline.
    • In this case, the use of standard security techniques such as digital signatures allows verifying the identity of the publisher.  

    Security Metrics

    • S1: using digital signatures to sign documents containing an RDF serialization, a SPARQL result set or signing an RDF graph [19]
    • S2: verifying authenticity of the dataset based on provenance information such as the author and his contributors, the publisher of the data and its sources (if present in the dataset) [19]

    Performance

    Definition

    • efficiency of a system that binds to a large dataset, that is, the more performant a data source the more efficiently a system can process data.

    Example

    • Performance depends on the type and complexity of the queries issued by a large number of users of a dataset.
    • Our flight search engine can perform well by considering response-time when deciding which sources to use to answer a query.

    Performance Metrics

    • P1: checking for usage of slash-URIs where large amounts of data are provided* [19]
    • P2: low latency**: (minimum) delay between submission of a request by the user and reception of the response from the system [19]
    • P3: high throughput: (maximum) number of answered HTTP-requests per second [19]
    • P4: scalability: detection of whether the time to answer an amount of ten requests divided by ten is not longer than the time it takes to answer one request [19]

    * http://www.w3.org/wiki/HashVsSlash

    ** Latency is the amount of time from issuing the query until the first information reaches the user [51].
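
The scalability check P4 reduces to a single comparison. The sketch below assumes the two timings have already been measured; `time_requests` is a hypothetical helper that times sequential calls to any request function supplied by the caller.

```python
import time


def time_requests(send_request, count):
    """Return the wall-clock seconds needed to call send_request() count times in sequence."""
    start = time.perf_counter()
    for _ in range(count):
        send_request()
    return time.perf_counter() - start


def is_scalable(time_one_request, time_ten_requests):
    """P4: the average time per request over ten requests must not exceed the time for one."""
    return time_ten_requests / 10 <= time_one_request


# Example with made-up timings in seconds.
print(is_scalable(0.2, 1.5))  # average 0.15 s per request
print(is_scalable(0.2, 3.0))  # average 0.30 s per request
```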

    Intrinsic dimensions

    • Semantic accuracy
    • Syntactic validity
    • Consistency
    • Conciseness
    • Completeness

    Syntactic validity

    Definition

    • degree to which an RDF document conforms to the specification of the serialization format.

    Example

    • Let us assume that the ID of the flight between Paris and New York is A123
    • But in our search engine the same flight instance is represented as A231.
    • Since this ID is included in one of the datasets, it is considered to be syntactically accurate since it is a valid ID (even though it is incorrect).  

    Syntactic validity Metrics

    • SV1: detecting syntax errors using (i) validators [19,31], (ii) via crowdsourcing [1,64]
    • SV2: detecting use of:
      • (i) explicit definition of the allowed values for a certain datatype
      • (ii) syntactic rules (type of characters allowed and/or the pattern of literal values) [20]
      • (iii) detecting whether the data conforms to the specific RDF pattern and that the “types” are defined for specific resources [37]
      • (iv) use of different outlier techniques and clustering for detecting wrong values [63]
    • SV3: detection of ill-typed literals, which do not abide by the lexical syntax for their respective datatype that can occur if a value is (i) malformed, (ii) is a member of an incompatible datatype [18,31]
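
A rough sketch of SV3 for a handful of XSD datatypes. The regular expressions only test the lexical shape of a literal (they would accept a month of 13, for instance), and the datatype table is a small illustrative subset rather than a full validator.

```python
import re

# Lexical patterns for a few XSD datatypes (shape only, no value-range checks).
LEXICAL_PATTERNS = {
    "xsd:integer": re.compile(r"[+-]?[0-9]+"),
    "xsd:boolean": re.compile(r"true|false|1|0"),
    "xsd:date": re.compile(r"-?[0-9]{4,}-[0-9]{2}-[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2})?"),
}


def is_ill_typed(lexical_form, datatype):
    """Return True if the literal does not match the lexical pattern of its datatype."""
    pattern = LEXICAL_PATTERNS.get(datatype)
    if pattern is None:
        raise ValueError(f"no pattern registered for {datatype}")
    return pattern.fullmatch(lexical_form) is None


print(is_ill_typed("A123", "xsd:integer"))
print(is_ill_typed("2013-12-21", "xsd:date"))
```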

    Semantic accuracy

    Definition

    • degree to which data values correctly represent the real world facts.

    Example 

    • Assume that the ID of the flight between Paris and New York is A123.
    • But in our search engine the same flight instance is represented as A231 (possibly introduced manually through a data acquisition error).
    • In this case, the instance is semantically inaccurate since the flight ID does not represent its real-world state, i.e. A123.

    Semantic accuracy Metrics

    • SA1: detection of outliers by (i) using distance-based, deviation-based and distribution-based methods [9,18], (ii) using the statistical distributions of a certain type to assess the statement’s correctness [52]
    • SA2: detection of inaccurate values by (i) using functional dependencies* [20] between the values of two or more different properties [20], (ii) comparison between two literal values of a resource [37], (iii) via crowdsourcing [1,64]
    • SA3: detection of inaccurate annotations**, labellings*** or classifications# using the formula:
      1 - (inaccurate instances/total no. of instances) * (balanced distance metrics/total no. of instances)## [42]
    • SA4: detection of misuse of properties#* by using profiling statistics, which support the detection of discordant values or misused properties and help to find valid values for specific properties [11]
    • SA5: ratio of the number of semantically valid rules+ to the number of nontrivial rules++ [15]

    * Functional dependencies are dependencies between the values of two or more different properties.

    ** Where an instance of the semantic metadata set can be mapped back to more than one real world object or in other cases, where there is no object to be mapped back to an instance.

    *** Where mapping from the instance to the object is correct but not properly labeled.

    #In which the knowledge of the source object has been correctly identified but not accurately classified.

    ##Balanced distance metric is an algorithm that calculates the distance between the extracted (or learned) concept and the target concept [45].

    #*Properties are often misused when no applicable property exists.

    +Valid rules are generated from the real data and validated against a set of principles specified in the semantic network.

    ++The intuition is that the larger a dataset is, the more closely it should reflect the basic domain principles and the fewer semantically incorrect rules will be generated.
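
As an illustration of SA1, the sketch below flags outliers with the median absolute deviation (MAD). This is one simple deviation-based check chosen for brevity, not the specific method of the cited works; the threshold of 3 and the sample durations are assumptions.

```python
from statistics import median


def mad_outliers(values, threshold=3.0):
    """Return the values whose distance from the median exceeds threshold * MAD."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        return []  # no spread to compare against
    return [v for v in values if abs(v - centre) / mad > threshold]


# Hypothetical flight durations in minutes for the same route; 48 looks like a typo for 480.
durations = [480, 485, 490, 495, 500, 48]
print(mad_outliers(durations))
```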

    Consistency

    Definition

    • means that a knowledge base is free of (logical/formal) contradictions with respect to particular knowledge representation and inference mechanisms.

    Example

    • A user is looking for flights between Paris and New York on the 21st of December, 2013. Her query returns the following results:

      Flight   From    To          Arrival   Departure
      A123     Paris   New York    14:50     22:35
      A123     Paris   Singapore   14:50     22:35
    • The results show that the flight number A123 has two different destinations at the same date and same time of arrival and departure.

    • This is inconsistent with the ontology definition that one flight can only have one destination at a specific time and date. This contradiction arises due to inconsistency in data representation, which is detected by using inference and reasoning.  

    Consistency Metrics

    • CS1: detection of use of entities as members of disjoint classes using the formula: no. of entities described as members of disjoint classes/total no. of entities described in the dataset [19,31,37]
    • CS2: detection of misplaced classes or properties* using entailment rules that indicate the position of a term in a triple [18,31]
    • CS3: detection of misuse of owl:DatatypeProperty or owl:ObjectProperty through the ontology maintainer** [31]
    • CS4: detection of use of members of owl:DeprecatedClass or owl:DeprecatedProperty through the ontology maintainer or by specifying manual mappings from deprecated terms to compatible terms [18,31]
    • CS5: detection of bogus owl:InverseFunctionalProperty values (i) by checking the uniqueness and validity of the inverse-functional values [31], (ii) by defining a SPARQL query as a constraint [37]
    • CS6: detection of the re-definition by third parties of external classes/properties (ontology hijacking) such that reasoning over data using those external terms is not affected [31]
    • CS7: detection of negative dependencies/correlation among properties using association rules [11]
    • CS8: detection of inconsistencies in spatial data through semantic and geometric constraints [50]
    • CS9: the attribution of a resource’s property (with a certain value) is only valid if the resource (domain), value (range) or literal value (rdfs ranged) is of a certain type - detected by use of SPARQL queries as a constraint [37]
    • CS10: detection of inconsistent values by the generation of a particular set of schema axioms for all properties in a dataset and the manual verification of these axioms [64]

    * For example, a URI defined as a class is used as a property or vice versa.

    ** For example, attribute properties used between two resources and relation properties used with literal values.
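
The ratio in CS1 can be computed directly once the class memberships are known. The sketch assumes a dictionary from entity to its set of classes and a list of disjoint class pairs; the `ex:` names are hypothetical.

```python
def disjoint_class_violation_ratio(entity_types, disjoint_pairs):
    """CS1: entities that belong to two disjoint classes / all entities described."""
    if not entity_types:
        return 0.0
    violating = sum(
        1
        for classes in entity_types.values()
        if any(a in classes and b in classes for a, b in disjoint_pairs)
    )
    return violating / len(entity_types)


entity_types = {
    "ex:A123": {"ex:Flight"},
    "ex:Paris": {"ex:City", "ex:Flight"},  # typed as both a city and a flight
}
disjoint_pairs = [("ex:City", "ex:Flight")]
print(disjoint_class_violation_ratio(entity_types, disjoint_pairs))
```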

    Conciseness

    Definition

    • refers to the minimization of redundancy of entities at the schema and the data level. Conciseness is classified into
    • (i) intensional conciseness (schema level) which refers to the case when the data does not contain redundant schema elements (properties and classes) and
    • (ii) extensional conciseness (data level) which refers to the case when the data does not contain redundant objects (instances).

    Example

    • Intensional conciseness: Particular flight, say A123, represented by two different properties in the same dataset, such as http://flights.org/airlineID and http://flights.org/name
    • This redundancy can ideally be solved by fusing the two properties and keeping only one unique identifier.
    • Extensional conciseness: Both these identifiers of the same flight have the same information associated with them in both the datasets, thus duplicating the information.

    Conciseness Metrics

    • CN1: intensional conciseness measured by
      (no. of unique properties or classes of a dataset / total no. of properties/classes in a target schema) [47]
    • CN2: extensional conciseness measured by:
      • no. of unique instances of a dataset / total number of instance representations in the dataset [47]
      • 1 - (total no. of instances that violate the uniqueness rule / total no. of relevant instances) [20,37,42]
    • CN3: detection of unambiguous annotations using the formula:
      1 − (no. of ambiguous instances/no. of instances contained in the semantic metadata set)* [42,55]

    * Detection of an instance mapped back to more than one real world object leading to more than one interpretation.
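
A small sketch of the first CN2 ratio, assuming each instance representation has already been reduced to a key that identifies the real-world object it stands for (how that key is obtained is outside the sketch).

```python
def extensional_conciseness(object_keys):
    """CN2: no. of unique instances / total no. of instance representations."""
    if not object_keys:
        return 1.0  # an empty dataset contains no redundancy
    return len(set(object_keys)) / len(object_keys)


# Flight A123 is represented twice, so 3 unique instances out of 4 representations.
print(extensional_conciseness(["A123", "A123", "B456", "C789"]))
```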

    Completeness

    Definition

    • refers to the degree to which all required information is present in a particular dataset. In terms of LD, completeness comprises the following aspects:
    • (i) Schema completeness, the degree to which the classes and properties of an ontology are represented, thus can be called “ontology completeness”,
    • (ii) Property completeness, measure of the missing values for a specific property,
    • (iii) Population completeness is the percentage of all real-world objects of a particular type that are represented in the datasets and
    • (iv) Interlinking completeness, which has to be considered especially in LD, refers to the degree to which instances in the dataset are interlinked.

    Example

    • In our use case, the flight search engine contains complete information to include all the airports and airport codes such that it allows a user to find an optimal route from the start to the end destination (even in cases where there is no direct flight).
    • The user wants to travel from Santa Barbara to San Francisco. Since the flight search engine contains interlinks between these close airports, the user is able to locate a direct flight easily.


    Completeness Metrics

    • CM1: schema completeness:
      no. of classes and properties represented / total no. of classes and properties [20,47]
    • CM2: property completeness:
      • (i) no. of values represented for a specific property / total no. of values for a specific property [18,20]
      • (ii) exploiting statistical distributions of properties and types to characterize the property and then detect completeness [52]
    • CM3: population completeness:
      no. of real-world objects represented / total no. of real-world objects [18,20,47]
    • CM4: interlinking completeness:
      • (i) no. of instances in the dataset that are interlinked
      • (ii) calculating percentage of mappable types in a dataset that have not yet been considered in the linksets when assuming an alignment among types [2]
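
Property completeness (CM2, variant i) can be sketched as below, under the assumption that every resource is expected to carry exactly one value for the property. The resources are plain dictionaries and the `iata` key is a hypothetical property name.

```python
def property_completeness(resources, prop):
    """CM2 (i): resources with a value for prop / all resources expected to have one."""
    if not resources:
        return 1.0
    present = sum(1 for r in resources if r.get(prop) not in (None, ""))
    return present / len(resources)


airports = [
    {"name": "Leipzig/Halle", "iata": "LEJ"},
    {"name": "San Francisco", "iata": "SFO"},
    {"name": "Santa Barbara"},  # airport code missing
]
print(property_completeness(airports, "iata"))
```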

    Contextual dimensions

    • Relevancy
    • Trustworthiness
    • Understandability
    • Timeliness

    Relevancy

    Definition

    • refers to the provision of information which is in accordance with the task at hand and important to the users’ query.

    Example

    • When a user is looking for flights between any two cities, only relevant information, i.e. departure and arrival airports, start and end time, duration and cost per person, should be provided.
    • Some datasets also contain much irrelevant information such as car rental, hotel booking, travel insurance etc.
    • Providing irrelevant data distracts service developers and potential users and also wastes network resources.
    • Instead, restricting the dataset to only flight-related information simplifies application development and increases the likelihood of returning only relevant results to users.

    Relevancy Metrics

    • R1: obtaining relevant data by:
      • (i) ranking (a numerical value similar to PageRank), which determines the centrality of RDF documents and statements [12],
      • (ii) via crowdsourcing [1,64]
    • R2: measuring the coverage (i.e. number of entities described in a dataset) and level of detail (i.e. number of properties) in a dataset to ensure that the data retrieved is appropriate for the task at hand [19]

    Trustworthiness

    Definition

    • defined as the degree to which the information is accepted to be correct, true, real and credible.

    Example

    • In our use case, if the flight information is provided by trusted and well-known airlines, then the user is more likely to trust the information.
    • Generally, information about a product or service (e.g. a flight) can be trusted when it is directly published by the producer or service provider (e.g. the airline).
    • On the other hand, if a user retrieves information from a previously unknown source, she can decide whether to believe this information by checking whether the source is well-known or if it is contained in a list of trusted providers.

       

    Trustworthiness Metrics 1

    • T1: computing statement trust values based on:
      • provenance information which can be either unknown or a value in the interval [−1,1] where 1: absolute belief, −1: absolute disbelief and 0: lack of belief/disbelief [18,28]
      • opinion-based methods, which use trust annotations made by several individuals [23,28]
      • provenance information and trust annotation in Semantic Web-based social-networks [24]
      • annotating triples with provenance data and usage of provenance history to evaluate the trustworthiness of facts [16]
    • T2: using annotations for data to encode two facets of information:
      • blacklists (indicates that the referent data is known to be harmful) [12] and
      • authority (a boolean value which uses the Linked Data principles to conservatively determine whether or not information can be trusted) [12]
    • T3: using trust ontologies that assign trust values that can be transferred from known to unknown data [33] using:
      • content-based methods (from content or rules) and
      • metadata-based methods (based on reputation assignments, user ratings, and provenance, rather than the content itself)

    Trustworthiness Metrics 2

    • T4: computing trust values between two entities through a path by using:
      • a propagation algorithm based on statistical techniques [58] 
      • in case there are several paths, trust values from all paths are aggregated based on a weighting mechanism [58]
    • T5: computing trustworthiness of the information provider by:
      • construction of decision networks informed by provenance graphs [21]
      • checking whether the provider/contributor is contained in a list of trusted providers [9]
      • indicating the level of trust for the publisher on a scale of 1 − 9 [22,25]
    • T6: checking content trust based on associations (e.g. anything having a relationship to a resource such as author of the dataset) that transfer trust from content to resources [22]
    • T7: assignment of explicit trust ratings to the dataset by humans or analyzing external links or page ranks [47]

    Understandability

    Definition
    • refers to the ease with which data can be comprehended without ambiguity and be used by a human information consumer.

    Example

    • A user wants to search for flights between Boston and San Francisco using our flight search engine.
    • From the data related to Boston in the integrated dataset for the required flight, the following URIs and a label are retrieved:
      • http://rdf.freebase.com/ns/m.049jnng
      • http://rdf.freebase.com/ns/m.043j22x
      • "Boston Logan Airport"@en
    • For the first two items, no human-readable label is available, only URIs are displayed.
    • This does not represent anything meaningful to the user besides perhaps that the information is from Freebase.
    • The third entity contains a human-readable label, which the user can easily understand.

    Understandability Metrics

    • U1: detection of human-readable labelling of classes, properties and entities as well as indication of metadata (e.g. name, description, website) of a dataset [18,19,32]
    • U2: detect whether the pattern of the URIs is provided [19]
    • U3: detect whether a regular expression that matches the URIs is present [19]
    • U4: detect whether examples of SPARQL queries are provided [19]
    • U5: checking whether a list of vocabularies used in the dataset is provided [19]
    • U6: checking the effectiveness and the efficiency of the usage of the mailing list and/or the message boards [19]

    Timeliness

    Definition

    • measures how up-to-date data is relative to a specific task. 

    Example

    • Consider a user checking the flight time-table.
    • Suppose that the result is a list of triples comprising the description of the resource A such as the connecting airports, time of departure and arrival, gate etc.
    • This flight time-table is updated every 10 minutes (volatility).
    • Assume there is a change of the flight departure time, specifically delay of one hour.
    • This information is updated in the system after 30 minutes. This renders the data out-of-date. 

    Timeliness Metrics

    • TI1: detecting freshness of datasets based on currency and volatility using the formula:
      max{0, 1 − currency / volatility} [29], which gives a value on a continuous scale from 0 to 1, where a score of 1 implies that the data is timely and 0 means it is completely outdated and thus unacceptable. In the formula, currency is the age of the data when delivered to the user [18,47,56] and volatility is the length of time the data remains valid [20].
    • TI2: detecting freshness of datasets based on their data source by measuring the distance between the last modified time of the data source and last modified time of the dataset [20]
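
The TI1 formula is easy to evaluate directly. In the sketch below both arguments use the same time unit; reading the flight example as volatility = 10 minutes and currency = 30 minutes is an interpretation of the slide, not something stated in the cited sources.

```python
def timeliness(currency, volatility):
    """TI1: max{0, 1 - currency / volatility}, with both values in the same time unit."""
    if volatility <= 0:
        raise ValueError("volatility must be positive")
    return max(0.0, 1.0 - currency / volatility)


print(timeliness(currency=30, volatility=10))  # data older than its validity period
print(timeliness(currency=5, volatility=10))   # data halfway through its validity period
```

With these readings the delayed update in the example scores 0, i.e. the data is completely outdated.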

    Representational dimensions

    • Representational-conciseness
    • Interoperability
    • Interpretability
    • Versatility

    Representational-conciseness

    Definition

    • refers to the representation of the data, which is compact and well formatted on the one hand and clear and complete on the other hand. 

    Example

    • Use of airport code for URI, e.g. LEJ for Leipzig: http://airlines.org/LEJ
    • Such short representation of the URIs helps users share and memorize them easily.

    Representational-conciseness Metrics

    • RC1: detection of long URIs or those that contain query parameters [18,32]
    • RC2: detection of RDF primitives i.e. RDF reification, RDF containers and RDF collections [18,32]
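
A minimal sketch of RC1 using the standard library URL parser. The length limit of 80 characters is an arbitrary assumption; the cited works do not prescribe a value here.

```python
from urllib.parse import urlsplit


def is_long_or_parameterised(uri, max_length=80):
    """RC1: flag URIs that exceed max_length characters or carry query parameters."""
    return len(uri) > max_length or bool(urlsplit(uri).query)


print(is_long_or_parameterised("http://airlines.org/LEJ"))
print(is_long_or_parameterised("http://airlines.org/airport?code=LEJ"))
```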

    Interoperability

    Definition
    • degree to which the format and structure of the information conforms to previously returned information as well as data from other sources.

    Example

    • In different airline datasets, different notations for representing the geo-coordinates of a particular flight location are used.
    • One dataset uses WGS 84 geodetic system, another one uses the GeoRSS point system.
    • This makes querying the integrated dataset difficult, as it requires users and the machines to understand the heterogeneous schema.
    • Also, consumers are faced with the problem of how the data can be interpreted and displayed.

      Interoperability Metrics

      • IO1: detection of whether existing terms from all relevant vocabularies for that particular domain have been reused [32]
      • IO2: usage of relevant vocabularies for that particular domain [19]

      Interpretability

      Definition  

      • refers to technical aspects of the data, that is, whether information is represented using an appropriate notation and whether the machine is able to process the data.

      Example: 

      • Consider our flight search engine and a user looking for a flight from Mumbai to Boston with a two-days stop-over in Berlin.
      • The user specifies the dates correctly. 
      • However, since the flights are operated by different airlines, they have a different way of representing the date: first as dd/mm/yyyy and then as mm/dd/yy.
      • Thus, the machine is unable to correctly interpret the data and cannot provide an optimal result for this query.
      • This lack of consensus in the format of the data hinders the ability of the machine to interpret the data and thus provide the appropriate flights.
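
The date problem in this example can be made concrete with a short sketch. It assumes the consumer knows which format each airline uses and normalises both to ISO 8601; the helper name is hypothetical.

```python
from datetime import datetime


def to_iso_date(text, source_format):
    """Parse a date string in a known source format and return it as YYYY-MM-DD."""
    return datetime.strptime(text, source_format).date().isoformat()


# The same string means different days depending on the assumed format.
print(to_iso_date("03/04/2013", "%d/%m/%Y"))  # read as dd/mm/yyyy
print(to_iso_date("03/04/2013", "%m/%d/%Y"))  # read as mm/dd/yyyy
print(to_iso_date("12/23/13", "%m/%d/%y"))    # two-digit year, mm/dd/yy
```

Without agreement on the format, the machine cannot tell which reading is intended, which is exactly the interpretability problem described above.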

      Interpretability Metrics

      • IN1: identifying objects and terms used to define these objects with globally unique identifiers [18]
      • IN2: detecting the use of appropriate language, symbols, units, datatypes and clear definitions [19]
      • IN3: detection of invalid usage of undefined classes and properties (i.e. those without any formal definition) [31]
      • IN4: detecting the use of blank nodes* [32]

      * Blank nodes are not recommended since they cannot be externally referenced.

      Versatility

      Definition

      • refers to the availability of the data in different representations and in an internationalized way.

      Example

      • Consider a user who does not understand English but only Chinese and wants to use our flight search engine.
      • In order to cater to the needs of such a user, the dataset should provide labels and other language-dependent information in Chinese so that she can understand and thus use the service. 

      Versatility Metrics

      • V1: checking whether data is available in different serialization formats [19]
      • V2: checking whether data is available in different languages [19]

      Data Quality Assessment Tools

      Comparison of Tools

      Comparison of quality assessment tools according to several attributes

      References 1

      • Acosta et al., 2013 [1] Crowdsourcing Linked Data Quality Assessment
      • Albertoni et al., 2013 [2] Assessing Linkset Quality for Complementing Third-Party Datasets
      • Bizer et al., 2009 [9] Quality-driven information filtering using the WIQA policy framework
      • Böhm et al., 2010 [11] Profiling linked open data with ProLOD
      • Bonatti et al., 2011 [12] Robust and scalable linked data reasoning incorporating provenance and trust annotations
      • Chen et al., 2010 [15] Hypothesis generation and data quality assessment through association mining
      • Ciancaglini et al., 2012 [16] Tracing where and who provenance in Linked Data: a calculus
      • Feeney et al., 2014 [18] Improving curated web-data quality with structured harvesting and assessment
      • Flemming, 2010 [19] Assessing the quality of a Linked Data source
      • Fürber et al., 2011 [20] SWIQA - a semantic web information quality assessment framework
      • Gamble et al., 2011 [21] Quality, Trust and Utility of Scientific Data on the Web: Towards a Joint Model
      • Gil et al., 2007 [22] Towards content trust of Web resources
      • Gil et al., 2002 [23] Trusting Information Sources One Citizen at a Time
      • Golbeck, 2006 [24] Using Trust and Provenance for Content Filtering on the Semantic Web
      • Golbeck et al., 2003 [25] Trust Networks on the Semantic Web

      References 2

      • Guéret et al., 2012 [26] Assessing Linked Data Mappings Using Network Measures
      • Hartig, 2008 [28] Trustworthiness of Data on the Web
      • Hogan et al., 2010 [31] Weaving the Pedantic Web
      • Hogan et al., 2012 [32] An empirical survey of Linked Data conformance
      • Jacobi et al., 2011 [33] Rule-Based Trust Assessment on the Semantic Web
      • Kontokostas et al., 2014 [37] Test-driven Evaluation of Linked Data Quality
      • Lei et al., 2007 [42] A framework for evaluating semantic metadata
      • Mendes et al., 2012 [47] Sieve: Linked Data Quality Assessment and Fusion
      • Mostafavi et al., 2004 [50] An ontology-based method for quality assessment of spatial data bases
      • Paulheim et al., 2014 [52] Improving the Quality of Linked Data Using Statistical Distributions
      • Ruckhaus et al., 2014 [55] Analyzing Linked Data Quality with LiQuate
      • Rula et al., 2012 [56] Capturing the Age of Linked Open Data: Towards a Dataset-independent Framework
      • Shekarpour et al., 2010 [58] Modeling and evaluation of trust with an extension in semantic web
      • Wienand et al., 2014 [63] Detecting Incorrect Numerical Data in DBpedia
      • Zaveri et al., 2013 [64] User-driven Quality evaluation of DBpedia

      NLP Interchange Format

      DBpedia

      • DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.
      • DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data.
      • Semi-structured Wiki markup -> structured information
      • Common goal with Wikidata, but a different approach
       
      DBpedia is a community project, please see http://dbpedia.org for a full list of contributors

      Wikipedia Limitations

      Simple Questions – hard to answer with Wikipedia:

      • What do Innsbruck and Leipzig have in common?
      • Who are the mayors of central European towns located at an elevation of more than 1000m?
      • Which movies star both Brad Pitt and Angelina Jolie?
      • All soccer players, who played as goalkeeper for a club that has a stadium with more than 40.000 seats and who are born in a country with more than 10 million inhabitants

                                               

      (More on this later)

      Structure in Wikipedia

      • Title
      • Abstract
      • Infoboxes
      • Geo-coordinates
      • Categories
      • Images
      • Links
        • other language versions
        • other Wikipedia pages
        • To the Web
        • Redirects
        • Disambiguation
      • ...

      DBpedia Information Extraction Framework

      DBpedia Information Extraction Framework (DIEF)

      • Started in 2007
      • Hosted on Sourceforge and Github
      • Initially written in PHP, but fully rewritten in Scala & Java
      • Around 40 Contributors
      • See https://www.ohloh.net/p/dbpedia for detailed overview

      Can potentially be adapted to other MediaWikis

      DIEF - Overview

      DIEF - Input / Parsing

      Input

      • Wikipedia pages are read from an external source.
      • Pages can either be read from
        • a Wikipedia / Mediawiki dump
        • a MediaWiki installation using the MediaWiki API.
      Parsing
      • Each Wikipedia page is parsed by the wiki parser.
      • The Wikipedia page is transformed into an Abstract Syntax Tree.

      DIEF - Extraction/ Serialization

      Extractors

      • DBpedia offers extractors for many different purposes, for instance, to extract labels, abstracts or geographical coordinates.
      • The Abstract Syntax Tree of each Wikipedia page is forwarded to the extractors.
      • Each extractor consumes an Abstract Syntax Tree and yields a graph of RDF statements.

      Serialization

      • The collected RDF statements are written to a sink.
      • Different formats, such as N-Triples, Quads, Turtle, JSON are supported.

      DIEF - Extractors

      Feature Extractors
      • specialized in extracting a single feature from an article
      • e.g. abstracts, labels, coordinates, categories, templates, ...
      • db:Berlin rdfs:label "Berlin" .
        db:Oliver_Twist dc:subject db:Category:English_novels .

      Infobox Extractors

      • Raw infobox extractor
      • Mapping-based Infobox extractor

      DIEF - Raw Infobox extractor

      WikiText syntax
      {{Infobox Korean settlement
      | title       = Busan Metropolitan City
      | hangul      = 부산 광역시
      ...
      | area_km2        = 763.46
      | pop         = 3635389
      | region      = [[Yeongnam]]
      }}

      RDF serialization
      dbp:Busan   dbp:title    "Busan Metropolitan City"
      dbp:Busan   dbp:hangul   "부산 광역시"@Hang
      dbp:Busan   dbp:area_km2 "763.46"^^xsd:float
      dbp:Busan   dbp:pop      "3635389"^^xsd:int
      dbp:Busan   dbp:region   dbp:Yeongnam

      DIEF - Raw Infobox extractor / Diversity

      DIEF - Raw Infobox extractor

      • Straightforward approach & big coverage
      • Inconsistency in property naming
        • Different infoboxes might use different names for the same meaning (e.g. born vs birth_date vs birthDate)
      • Inconsistency in property datatype
        • Datatypes are calculated per instance in a greedy manner
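
The raw infobox extraction can be illustrated with a toy parser. This is a Python sketch for illustration only, not the DIEF implementation (which is written in Scala and Java): it handles just `| key = value` lines and guesses a datatype per value, which also shows why per-instance datatypes can end up inconsistent.

```python
import re

LINE = re.compile(r"\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_raw_infobox(subject, wikitext):
    """Return (subject, property, value, kind) tuples for each infobox line."""
    statements = []
    for line in wikitext.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # template header, closing braces, elided lines
        key, value = match.groups()
        if not value:
            continue
        link = WIKI_LINK.fullmatch(value)
        if link:
            # A wiki link becomes a resource; spaces turn into underscores.
            obj, kind = "dbp:" + link.group(1).replace(" ", "_"), "uri"
        elif re.fullmatch(r"[+-]?\d+", value):
            obj, kind = value, "xsd:int"
        elif re.fullmatch(r"[+-]?\d*\.\d+", value):
            obj, kind = value, "xsd:float"
        else:
            obj, kind = value, "string"
        statements.append((subject, "dbp:" + key, obj, kind))
    return statements


SAMPLE = """{{Infobox Korean settlement
| title       = Busan Metropolitan City
| area_km2        = 763.46
| pop         = 3635389
| region      = [[Yeongnam]]
}}"""

for statement in extract_raw_infobox("dbp:Busan", SAMPLE):
    print(statement)
```

Because the datatype is guessed from each value in isolation, a population written as "3.6 million" on another page would come out as a string rather than an integer.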

      DIEF - Mapping-Based Infobox extractor

      Correct Semantics:

      • Combine what belongs together (birth_place, birthplace)
      • Separate what is different (bornIn, birthplace)
      • Big boost for Precision / Recall
      Mappings Wiki:
      • http://mappings.dbpedia.org/
      • Everybody can contribute to new mappings or improve existing ones
      • ~170 editors 

      DIEF - Mapping-Based Infobox extractor

      URI/IRI schemes

      http://{lang.}dbpedia.org is the main domain

      For every article there exists a DBpedia resource in the form:
      http://{lang.}dbpedia.org/resource/{ArticleName}

      Properties from the raw infobox extractor use the http://{lang.}dbpedia.org/property/ namespace

      The ontology is global for all languages and uses the namespace
      http://dbpedia.org/ontology/

      Note that for the English language no language code is used:

      • http://dbpedia.org as main domain
      • http://dbpedia.org/resource/{title} for articles
      • http://dbpedia.org/property/{title} for properties
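
The URI scheme above can be sketched as two small helpers. The function names are hypothetical, and replacing spaces with underscores follows the Wikipedia title convention, which is assumed here rather than stated on the slide.

```python
def _host(lang):
    """English uses the bare domain; other languages use a language subdomain."""
    return "dbpedia.org" if lang == "en" else f"{lang}.dbpedia.org"


def resource_uri(article_name, lang="en"):
    """Build http://{lang.}dbpedia.org/resource/{ArticleName}."""
    return f"http://{_host(lang)}/resource/{article_name.replace(' ', '_')}"


def property_uri(property_name, lang="en"):
    """Build the raw infobox property URI in the /property/ namespace."""
    return f"http://{_host(lang)}/property/{property_name}"


print(resource_uri("Oliver Twist"))
print(resource_uri("Berlin", lang="de"))
print(property_uri("area_km2"))
```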

      RDF Dumps

      DBpedia has 2 extraction modes:

      • DBpedia Live, where extraction results directly update the SPARQL endpoint (more on that later)
      • Dump-based extraction, where results are serialized as RDF and later loaded into a triple store

      DBpedia Dumps

      SPARQL Endpoint

      Question answering (1/2)

      Back to our Wikipedia questions:

      • What do Innsbruck and Leipzig have in common?
      • Who are the mayors of central European towns at an elevation of more than 1000 m?
      • Which movies star both Brad Pitt and Angelina Jolie?
      • All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants

      Using the data extracted from Wikipedia and the public SPARQL endpoint, DBpedia can answer these questions.

      Question answering (2/2)

      All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants
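
      One possible way to phrase this question as a SPARQL query is sketched below, wrapped in Python so the thresholds can be varied. The class and property names (dbo:SoccerPlayer, dbo:position, dbo:team, dbo:capacity, dbo:birthPlace, dbo:country, dbo:populationTotal) and the goalkeeper resource are assumptions about how the data is modelled and may not match the current DBpedia release. The code only builds the request URL and does not contact the endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint


def goalkeeper_query(min_seats: int = 40000, min_population: int = 10000000) -> str:
    """Return a SPARQL query for the goalkeeper question (property names assumed)."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?player ?club ?country WHERE {{
  ?player a dbo:SoccerPlayer ;
          dbo:position <http://dbpedia.org/resource/Goalkeeper_(association_football)> ;
          dbo:team ?club ;
          dbo:birthPlace ?birthPlace .
  ?birthPlace dbo:country ?country .
  ?club dbo:capacity ?seats .
  ?country dbo:populationTotal ?population .
  FILTER (?seats > {min_seats})
  FILTER (?population > {min_population})
}}
ORDER BY ?player
""".strip()


def request_url(query: str) -> str:
    """Encode the query as a GET request URL using the standard 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})
```

      The URL returned by `request_url(goalkeeper_query())` can be opened with any HTTP client; the result depends on the data currently loaded in the endpoint.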

      DBpedia Live

      • DBpedia dumps are generated on a bi-annual basis
      • Wikipedia has around 100,000 – 150,000 page edits per day 
      • DBpedia Live pulls page updates in real-time and extraction results update the triple store
      • In practice, a 5 minute update delay increases performance by 15%

      Links

      • SPARQL Endpoint: http://live.dbpedia.org/sparql
      • Documentation: http://wiki.dbpedia.org/DBpediaLive
      • Statistics: http://live.dbpedia.org/statistics/

      DBpedia Live - Overview

      DBpedia Live - Components

      • Local Wikipedia mirror (so that extraction is not bound by Wikipedia's access limits)
      • Mapping Wiki as input
        • A change in a mapping may affect several pages
      • Extraction Manager: handles the Wikipedia and mapping feeds and updates the triple store
        • Pages left unmodified for more than a month are also updated
      • Sync Tool: Publishes updates to allow up-to-date DBpedia Live mirrors
      • Data is isolated in separate graphs
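
      How a mirror might consume the updates published by the Sync Tool can be sketched with plain Python sets. This is a simplified model, not the actual tool: it assumes each update consists of a set of deleted triples and a set of added triples, and that deletions are applied before additions.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def apply_changeset(
    store: Set[Triple], removed: Iterable[Triple], added: Iterable[Triple]
) -> Set[Triple]:
    """Return a new store with one changeset applied (deletions first, then additions)."""
    updated = set(store)
    updated.difference_update(removed)
    updated.update(added)
    return updated


# Illustrative data: the population value of Busan changes after a page edit.
mirror = {
    ("dbp:Busan", "dbp:pop", "3635389"),
    ("dbp:Busan", "dbp:region", "dbp:Yeongnam"),
}
mirror = apply_changeset(
    mirror,
    removed=[("dbp:Busan", "dbp:pop", "3635389")],
    added=[("dbp:Busan", "dbp:pop", "3600000")],  # hypothetical new value
)
```

      Applying every published changeset in order keeps the mirror in step with the live triple store without reloading a full dump.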

      DBpedia Internationalization (I18n)

      • DBpedia Internationalization Committee founded:
        • http://wiki.dbpedia.org/Internationalization
      • DBpedia language editions are available in:
        • Korean, Greek, German, Polish, Russian, Dutch, Portuguese, Spanish, Italian, Japanese, French
        • Each edition uses the corresponding Wikipedia language edition as input
      • Mappings available for 23 languages

      DBpedia I18n - Overview

      Annotation / Entity Disambiguation

      Named entity recognition and disambiguation

      Tools such as: DBpedia Spotlight, AlchemyAPI, Semantic API, Open Calais, Zemanta and Apache Stanbol
        

      Question Answering

      Faceted Browsers

      Search and Querying


      Digital Libraries & Archives

      • Virtual International Authority File (VIAF) project as Linked Data
      • DBpedia can also provide:
        • Context information for bibliographic and archive records (e.g. an author’s demographics, a film’s homepage, an image etc.)
        • Stable and curated identifiers for linking. 
        • The broad range of Wikipedia topics can form the basis for a thesaurus for subject indexing. 

      DBpedia Mobile

      DBpedia Mobile is a location-centric DBpedia client application for mobile devices consisting of a map view, the Marbles Linked Data Browser and a GPS-enabled launcher application.


      DBpedia Wiktionary

      • Wiktionary is a Wikimedia project: http://wiktionary.org
        • 171 languages, 3M words for English.
      • Extracted using the DBpedia Information Extraction Framework
      • Easily configurable for every Wiktionary language edition

      Other Applications

      See http://wiki.dbpedia.org/Applications for a complete list

      DBpedia Timeline

      Year  Month  Event
      2006  Nov    Start of infobox extraction from Wikipedia
      2007  Mar    DBpedia 1.0 release
            Jun    ESWC DBpedia article
            Nov    DBpedia 2.0 release
            Nov    ISWC DBpedia article
            Dec    DBpedia 3.0 release candidate
      2008  Feb    DBpedia 3.0 release
            Aug    DBpedia 3.1 release
            Nov    DBpedia 3.2 release
      2009  Jan    DBpedia 3.3 release
            Sep    JWS DBpedia article
      2010  Feb    Information extraction framework in Scala
                   Mappings Wiki release
            Mar    DBpedia 3.4 release
            Apr    DBpedia 3.5 release
            May    Start of DBpedia Internationalization effort
      2011  Feb    DBpedia Spotlight release
            Mar    DBpedia 3.6 release
            Jul    DBpedia Live release
            Sep    DBpedia 3.7 release (with I18n datasets)
      2012  Aug    DBpedia 3.8 release
            Sep    Publication of DBpedia Internationalization article
      2013  Sep    DBpedia 3.9 release

      Static releases size

      Compressed size of DBpedia releases over time:

      (*) From 3.7 i18n onwards, DBpedia offers additional formats for download: NT, NQuads, Turtle, Turtle-Quads

      DBpedia Live changesets

             2012                    2013
      Month  Additions  Deletions    Additions  Deletions
      Jan    -          -            2.84 GB    0.64 GB
      Feb    -          -            2.50 GB    0.44 GB
      Mar    -          -            2.45 GB    0.60 GB
      Apr    0.96 GB    0.12 GB      2.18 GB    0.48 GB
      May    0.68 GB    0.21 GB      4.81 GB    0.64 GB
      Jun    0.80 GB    0.25 GB      3.56 GB    0.42 GB
      Jul    0.52 GB    0.14 GB      3.67 GB    0.33 GB
      Aug    0.48 GB    0.39 GB      2.25 GB    0.27 GB
      Sep    0.61 GB    0.43 GB      -          -
      Oct    0.91 GB    0.42 GB      -          -
      Nov    1.17 GB    0.46 GB      5.54 GB    0.23 GB
      Dec    1.02 GB    0.39 GB

      (*) Data represent compressed RDF in N-Triples format
      (**) From Nov 2013 DBpedia Live was restarted with a clean database

      Open Government Data

       

      Distributed Social Semantic Networking

       

      Catalogus Professorum Lipsiensium

       
      • In 2009, the University of Leipzig celebrated its 600th anniversary
      • One of the oldest universities in Germany
      • Prosopographical catalogue containing facts about
        • 1,300 professors
        • 10,000 associated periods of life
        • 400 institutions
      • The resulting OWL knowledge base consists of
        • 200,000 triples in core CPL
        • 173,000 manually added

      Prosopographical research

      • Analysis of common characteristics of historical groups
      • Statistically relevant quantities of individual biographies
      • Research use cases:
        • Historical social network analysis
        • Academic self-complementation analysis
        • Relationship between religion and university

      Archival work

       
      Figure: Personal file of Levin Schücking from university archive

      Publication of historical catalogues

       
      Figure: Page taken from Dresden professors catalogue book

      Motivation

      • First version of the catalogue: a single database table that reached the limit of 255 columns
      • High effort for exchanging and synchronizing the table
      • Requirements for the new database:
        • collaboration during data collection
        • online publication
        • limited resources

      Paradigm shift in historical research

      • Research:
        • from individual to collaborative work
      • Publication:
        • from book to instant web publication

      Objectives

      • Wiki-based adding, editing and structuring of data
      • Definition of a vocabulary
      • Instant publication on the web
      • Exploring, accessing and interlinking information 

      Public website

       

      Research representation using OntoWiki

       

      Visual Query Builder

       

      Relationship Finder

       

      Linked Data Browser

       
      Figure: Tabulator Tool

      Architectural overview of the project platforms

       

      Catalogus Professorum Model (CPM)

      Main Concepts


      Catalogus Professorum Model (CPM)

      Concept Models


      Catalogus Professorum Model (CPM)

      Interlinked Concepts


      Collaboration using a semantic data wiki

      • Intuitive authoring of semantic content
      • Collaborative and spatially distributed
      • Keeping track of changes
      • Allowing comments and discussions
      • Instant visual representation
      • Different views on instance data

      The semantic data wiki OntoWiki

       

      Methodology

      Pragmatic approach of CPL engineering

      • Starting with legacy data
      • No use of FOAF (Friend of a Friend Vocabulary)
      • Focus on concrete representation issues
      • Ontology / application co-design process

         

      Engineering co-design methodology

      Before CPL

      1. Information analysis
      2. Initialization (September 2008)
      3. Wiki-based knowledge acquisition and ontology refinement
      4. Publication of the catalogue (May 2009)
      5. Interlinking other datasets
      6. Alignment to other ontologies (future work)

      Statistics about various CPL usage indicators

       

      Lessons learned

      • Projects involving people with very different backgrounds and
        very limited resources require establishing a working
        knowledge base / application co-design
      • Timely visibility of the knowledge base to a wider community:
        interaction with the community triggers additional
        refinements
      • Motivation boost due to the early public availability
      • A growing added value for domain experts is the availability of
        background knowledge on the Linked Data Web
      • CPL is one of the first prosopographical knowledge bases on
        the Data Web
         

      Conclusions

      • Demonstration of a successful application of
        • semantic knowledge representation techniques
        • an agile collaboration methodology for the humanities
      • Completely new research opportunities for historians
      • Paradigm shift in historical research
        • from individual-centred research aiming to solve a specific
          research task towards collaborative research

      The Europeana Data Model for Cultural Heritage

      1. What is the issue?
      2. What is the solution?
      3. How was EDM developed?
      4. How will Europeana resource discovery change with EDM?
      5. How can EDM contribute to enriching my data?
      6. How does EDM contribute to the Web of Data or Semantic Web?
      7. How do I deliver EDM?
      8. Where can I find more information?

      Europeana Data Model Issues

      • Europe's cultural heritage objects are digitised by a wide range of data providers from the library, museum, archive and audio-visual sectors
      • These providers all use different metadata standards
      • Data needs to appear in a meaningful way in a cross-cultural, multilingual context
      • The Linked Open Data environment lacks authoritative data from the cultural heritage community to contribute to the development of new knowledge
       

      About Europeana Data Model

       For a more technical description of the EDM model, see: