IBM 1620 data processing machine, 1962

Who is this?

The Web

The Web accompanies the transition from an industrial to an information society. It provides the infrastructure for a new quality of information handling, in both acquisition and provisioning:

  • high availability
  • high relevance
  • low cost


The Web penetrates society

  • Social contacts (social networking platforms, blogging, ...)
  • Economics (buying, selling, advertising, ...)
  • Administration (eGovernment)
  • Work life (information gathering and sharing)
  • Recreation (games, role play, creativity, ...)
  • Education (eLearning, Web as information system, ...)

The current Web

 Immensely successful.

  • Huge amounts of information and data.
  • Syntax standards for transfer of structured data.
  • Machine-processable, human-readable documents.
  • Content/knowledge cannot be accessed by machines.
  • Meaning (semantics) of transferred data is not accessible.

Limitations of the Web

Too much information with too little structure, made for human consumption
  • Content search is very simplistic
  • → the future requires better methods
Web content is heterogeneous
  • in terms of content
  • in terms of structure
  • in terms of character encoding
  • → the future requires intelligent information integration
Humans can derive new (implicit) information from given pieces of information, but on the current Web we can only deal with syntax
  • → this requires automated reasoning techniques

Source: Pascal Hitzler

What Google does not find

There are many information needs that current search engines cannot satisfy:

  • Apartments for rent close to well-rated Thai restaurants
  • Bilingual English-German child care in Berlin reachable in 15 minutes from my place of work
  • Kid-friendly holiday destinations with culture and sports activities
  • Researchers working in south-east Asia on information retrieval topics
  • ERP service providers with offices in Vienna and Berlin
  • ...

We have subconsciously learned not to ask search engines such questions.

In principle, all the required knowledge is on the Web – most of it even in machine-readable form. However, without automated data integration, processing, and reasoning, we cannot obtain a useful answer.
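As a rough illustration of what "automated data integration" means for the first query above, the following sketch joins two independently published data sets. All records, field names, and thresholds are hypothetical and chosen only for illustration; real sources would first have to be mapped to a common vocabulary.

```python
from math import radians, sin, cos, asin, sqrt

# Hypothetical records standing in for two independent Web sources.
apartments = [
    {"id": "apt-1", "lat": 52.5200, "lon": 13.4050},
    {"id": "apt-2", "lat": 52.4000, "lon": 13.1000},
]
restaurants = [
    {"name": "Thai A", "cuisine": "thai", "rating": 4.6, "lat": 52.5210, "lon": 13.4060},
    {"name": "Thai B", "cuisine": "thai", "rating": 3.1, "lat": 52.4005, "lon": 13.1005},
    {"name": "Pizza C", "cuisine": "italian", "rating": 4.8, "lat": 52.4001, "lon": 13.1001},
]

def distance_km(a, b):
    """Great-circle (haversine) distance between two records with lat/lon."""
    lat1, lon1, lat2, lon2 = map(radians, (a["lat"], a["lon"], b["lat"], b["lon"]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def apartments_near_good_thai(apartments, restaurants, max_km=0.5, min_rating=4.0):
    """Join both sources: keep apartments with a well-rated Thai restaurant nearby."""
    good = [r for r in restaurants
            if r["cuisine"] == "thai" and r["rating"] >= min_rating]
    return [a["id"] for a in apartments
            if any(distance_km(a, r) <= max_km for r in good)]

result = apartments_near_good_thai(apartments, restaurants)
print(result)
```

The join itself is trivial once both sources share identifiers and field meanings; agreeing on those is the hard part.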

What's the problem with the Web?

  • inability to integrate and fuse information from different sources
  • lack of comprehensive background knowledge to interpret information found on the Web
  • current Web search is restricted to text in a certain language; for many "smaller" languages much less information is available than for English

Basic ingredients for the Semantic Web

  • Open Standards for describing information on the Web
  • Methods for obtaining further information from such descriptions

We’ll talk about these matters in this course.

Data Models, Access & Integration

Data Integration:
Enterprise Information Integration
  • sets of heterogeneous data sources appear as a single, homogeneous data source
Data Warehousing
  • Based on extract, transform, load (ETL)
  • Global-As-View (GAV)
  • Mediators
  • Ontology-based
  • P2P
  • Web service-based
Data Web
  • URIs as entity identifiers
  • HTTP as data access protocol
  • Local-As-View (LAV)

Data Access:
Object-relational mappings (ORM)
  • NeXT’s EOF / WebObjects
  • ADO.NET Entity Framework
  • Hibernate
Procedural APIs
  • ODBC
  • JDBC
Query Languages
  • Datalog, SQL
  • XPath/XQuery
Linked Data
  • de-referenceable URIs
  • RDF serialization

Data Models:
RDBMS
  • organize data in relations, rows, cells
  • Oracle, DB2, MS-SQL
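
The "Data Web" and "Linked Data" entries above combine URIs as identifiers with HTTP as the access protocol: a client dereferences the URI and asks for an RDF serialization via content negotiation. A minimal sketch, assuming a hypothetical resource URI on example.org and Turtle (`text/turtle`) as the requested serialization; the request is only constructed here, not sent.

```python
from urllib.request import Request

def rdf_request(uri, media_type="text/turtle"):
    """Build an HTTP GET request that asks for an RDF serialization of `uri`."""
    return Request(uri, headers={"Accept": media_type})

# Hypothetical entity URI, used only for illustration.
req = rdf_request("http://example.org/resource/Berlin")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# Sending it would look like this (requires network access and a server
# that actually supports content negotiation):
#   from urllib.request import urlopen
#   with urlopen(req) as response:
#       turtle_text = response.read().decode("utf-8")
```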

LOD Cloud May 2007

LOD = Linked Open Data

LOD Cloud October 2007

LOD Cloud February 2008

LOD Cloud September 2008

LOD Cloud March 2009

Colours indicate different domains; e.g.: orange = social networks

LOD Cloud September 2010

LOD Cloud September 2011

LOD Cloud August 2014

First update after 3 years – what do you think were the reasons?

LOD Cloud February 2017

The Web of Data

  • >70 billion facts
  • covering many different domains (life sciences, geo, user-generated content, government, bibliographic, ...)

Map to the Semantic Web

The Semantic Data Web Stack

  • User Interface & Applications
  • Trust
  • Crypto
  • Proof
  • Unifying Logic
  • Rules: RIF
  • Ontology: OWL
  • Query: SPARQL
  • RDF-Schema
  • Data Interchange: RDF
  • XML
  • URI
  • Unicode

… also known as “layer cake”

How Mark A. Greenwood realised the Semantic Web:

URIs and Unicode

  • URI = Uniform Resource Identifier
  • Used to create globally unique names for resources
  • Every object with a clear identity can be a resource
    • Books, places, organizations ...
  • In the books domain the ISBN serves the same purpose
  • IRIs: Unicode-aware extension of URIs (I = Internationalized)
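
To make the relation between IRIs and URIs concrete: an IRI can be mapped to a plain URI by percent-encoding the UTF-8 bytes of its non-ASCII characters. The sketch below is a simplification under stated assumptions: the host part is assumed to be ASCII already (internationalized host names need separate handling), and the example IRI is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, quote

def iri_to_uri(iri):
    """Percent-encode non-ASCII characters in path, query, and fragment.

    Assumption: the authority (host) part is already ASCII.
    """
    parts = urlsplit(iri)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(parts.path, safe="/%"),
        quote(parts.query, safe="=&%"),
        quote(parts.fragment, safe="%"),
    ))

# Hypothetical IRI with a non-ASCII character in the path.
uri = iri_to_uri("http://example.org/wiki/Zürich")
print(uri)
```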

Resource Description Framework – RDF

Information is represented in RDF as triples of subject, predicate, and object (also called statements or facts):

  • Modeled on linguistic categories, but not always consistent
  • Allowed assignments:
    • Subject: URI or blank node
    • Predicate: URI (a.k.a. property)
    • Object: URI, blank node or literal
  • Node and edge labels should be unambiguous, so that the original graph is reconstructable from a list of triples
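
The allowed assignments listed above can be sketched as a small data structure. This is an illustrative model only, not the API of any RDF library; the class names, the helper `make_triple`, and the example.org URIs are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class BlankNode:
    label: str

@dataclass(frozen=True)
class Literal:
    value: str

def make_triple(subject, predicate, obj):
    """Return a (subject, predicate, object) tuple after checking each position."""
    if not isinstance(subject, (URI, BlankNode)):
        raise TypeError("subject must be a URI or blank node")
    if not isinstance(predicate, URI):
        raise TypeError("predicate must be a URI")
    if not isinstance(obj, (URI, BlankNode, Literal)):
        raise TypeError("object must be a URI, blank node, or literal")
    return (subject, predicate, obj)

# A graph is simply a set of triples; the graph can be rebuilt from this set.
book = URI("http://example.org/book/1")
graph = {
    make_triple(book, URI("http://example.org/title"),
                Literal("Foundations of Semantic Web Technologies")),
    make_triple(book, URI("http://example.org/author"), BlankNode("b0")),
}
print(len(graph))
```

A literal in subject position, such as `make_triple(Literal("x"), ...)`, is rejected with a `TypeError`.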


RDF Schema

Not all triples make sense:

Cinema  AlbertEinstein  2012

How can we constrain the use of RDF? 

RDFS (S = “Schema”) allows us to define classes and properties and to restrict their use.
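
In RDFS semantics, domain and range declarations are read as entailment rules: using a property lets a reasoner infer the classes of its subject and object, and class membership propagates along rdfs:subClassOf. A minimal sketch of these three rules over string triples; the `ex:` vocabulary is hypothetical and the naive fixed-point loop is chosen for readability, not efficiency.

```python
RDF_TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
SUBCLASS = "rdfs:subClassOf"

def rdfs_closure(triples):
    """Apply domain, range, and subclass rules until no new triple appears."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if p2 == DOMAIN and s2 == p:
                    new.add((s, RDF_TYPE, o2))   # subject gets the domain class
                if p2 == RANGE and s2 == p:
                    new.add((o, RDF_TYPE, o2))   # object gets the range class
                if p == RDF_TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, RDF_TYPE, o2))   # membership moves to the superclass
        if new <= facts:
            return facts
        facts |= new

# Hypothetical schema and data.
triples = [
    ("ex:playsAt", DOMAIN, "ex:Movie"),
    ("ex:playsAt", RANGE, "ex:Cinema"),
    ("ex:Cinema", SUBCLASS, "ex:Venue"),
    ("ex:Metropolis", "ex:playsAt", "ex:Babylon"),
]
inferred = rdfs_closure(triples) - set(triples)
print(sorted(inferred))
```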

SPARQL – Query Language for RDF

SELECT * WHERE { jwebsp:John  foaf:knows  ?friend }
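
Conceptually, the query above matches a triple pattern against the graph and binds the variable ?friend for every match. The sketch below imitates this for a single pattern over an in-memory list of string triples; it is not a SPARQL engine, and the individuals jwebsp:Mary and jwebsp:Peter are hypothetical data added for illustration.

```python
def match_pattern(triples, pattern):
    """Return one variable binding (dict) per triple that matches the pattern.

    Terms starting with "?" are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results

data = [
    ("jwebsp:John", "foaf:knows", "jwebsp:Mary"),
    ("jwebsp:John", "foaf:knows", "jwebsp:Peter"),
    ("jwebsp:Mary", "foaf:knows", "jwebsp:John"),
]
friends = match_pattern(data, ("jwebsp:John", "foaf:knows", "?friend"))
print(friends)
```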

Web Ontology Language – OWL

  • OWL: acronym for Web Ontology Language, more easily pronounced than WOL
  • family of languages for authoring ontologies
  • W3C Recommendation since 2004, OWL 2 since 2009
  • Semantic fragment of first-order logic (FOL)


  • Instantiation of classes by individuals
  • Concept hierarchies (taxonomies, inheritance): classes, terms
  • Binary relations between individuals: Properties, Roles
  • Properties of relations (e.g., range, transitivity)
  • Data types (e.g., numbers): concrete domains
  • Logical means of expression
  • Clear semantics!
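
As one example of "properties of relations", declaring a property transitive lets a reasoner derive facts that were never stated. A minimal sketch of that single inference, assuming a hypothetical locatedIn relation given as pairs of individuals; an actual OWL reasoner covers far more than this.

```python
def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived

# Hypothetical assertions for a transitive property "locatedIn".
located_in = {
    ("Kreuzberg", "Berlin"),
    ("Berlin", "Germany"),
    ("Germany", "Europe"),
}
closure = transitive_closure(located_in)
print(("Kreuzberg", "Europe") in closure)
```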

RDFa Content Editor – RDFaCE

supports the automatic semantic annotation of texts


Literature

  • Pascal Hitzler, Markus Krötzsch, Sebastian Rudolph: Foundations of Semantic Web Technologies, Chapman & Hall/CRC, 2009, 455 pages, hardcover, ISBN: 9781420090505.
  • Amit Sheth, Krishnaprasad Thirunarayan: Semantics Empowered Web 3.0: Managing Enterprise, Social, Sensor, and Cloud-based Data and Services for Advanced Applications (Synthesis Lectures on Data Management), Morgan & Claypool Publishers (December 19, 2012), ISBN: 1608457168.
  • Tom Heath, Christian Bizer: Linked Data (Synthesis Lectures on the Semantic Web: Theory and Technology), Morgan & Claypool Publishers; 1 edition (February 20, 2011), ISBN: 1608454304.


All the corresponding questions for the Introductions are covered in the Questions part of the Deck.

Creator: soeren (TIB)

ali1k (VU Amsterdam), clange, nagikasa, mirette, RohanAsmat

Licensed under the Creative Commons
Attribution ShareAlike CC-BY-SA license

This deck was created using SlideWiki.