IBM 1620 data processing machine, 1962

Who is this?

The Web

The Web accompanies the transition from an industrial to an information society and provides the infrastructure for a new quality of information handling, in both acquisition and provisioning:

  • high availability
  • high relevance
  • low cost

 

The Web penetrates society

  • Social contacts (social networking platforms, blogging, ...)
  • Economics (buying, selling, advertising, ...)
  • Administration (eGovernment)
  • Work life (information gathering and sharing)
  • Recreation (games, role play, creativity, ...)
  • Education (eLearning, Web as information system, ...)

The current Web

 Immensely successful.

  • Huge amounts of information and data.
  • Syntax standards for transfer of structured data.
  • Machine-processable, human-readable documents.
BUT:
  • Content/knowledge cannot be accessed by machines.
  • Meaning (semantics) of transferred data is not accessible.

Limitations of the Web

Too much information with too little structure and made for human consumption
  • Content search is very simplistic
  • →  future requires better methods
Web content is heterogeneous
  • in terms of content
  • in terms of structure
  • in terms of character encoding
  • → future requires intelligent information integration
Humans can derive new (implicit) information from given pieces of information, but on the current Web we can only deal with syntax
  • → requires automated reasoning techniques

What Google does not find

There are many information needs that current search engines cannot satisfy:

  • Apartments for rent close to well-rated Thai restaurants
  • Bilingual English-German child care in Berlin reachable in 15 minutes from my place of work
  • Kid-friendly holiday destinations with culture and sports activities
  • Researchers working in south-east Asia on information retrieval topics
  • ERP service providers with offices in Vienna and Berlin
  • ...

We have subconsciously learned not to ask search engines such questions.

In principle, all the required knowledge is on the Web – most of it even in machine-readable form. However, without automated data integration, processing (and reasoning) we cannot obtain a useful answer.

What's the problem with the Web

  • inability to integrate and fuse information from different sources
  • lack of comprehensive background knowledge to interpret information found on the Web
  • current Web search is restricted to text in a certain language; for many “smaller” languages much less information is available than for English

Basic ingredients for the Semantic Web

  • Open Standards for describing information on the Web
  • Methods for obtaining further information from such descriptions

 
We’ll talk about these matters in this course.

Data Models, Access & Integration

Data Integration
Enterprise Information Integration
  • sets of heterogeneous data sources appear as a single, homogeneous data source
Data Warehousing
  • Based on extract, transform load (ETL)
  • Global-As-View (GAV)
Research
  • Mediators
  • Ontology-based
  • P2P
  • Web service-based
Data Web
  • URIs as entity identifiers
  • HTTP as data access protocol
  • Local-As-View (LAV)
Data Access
Object-relational mappings (ORM)
  • NeXT’s EOF / WebObjects
  • ADO.NET Entity Framework
  • Hibernate
Procedural APIs
  • ODBC
  • JDBC
Query Languages
  • Datalog, SQL
  • XPath/XQuery
  • SPARQL
Linked Data
  • de-referenceable URIs
  • RDF serialization
Data Models
RDBMS
  • organize data in relations, rows, cells
  • Oracle, DB2, MS-SQL
     

LOD Cloud May 2007

LOD = Linked Open Data

LOD Cloud October 2007

LOD Cloud February 2008

LOD Cloud September 2008

LOD Cloud March 2009

Colours indicate different domains; e.g.: orange = social networks

LOD Cloud September 2010

LOD Cloud September 2011

LOD Cloud August 2014

First update after 3 years – what do you think were the reasons?

LOD Cloud February 2017


The Web of Data

 
  • >70 billion facts
  • covering many different domains (life sciences, geo, user-generated content, government, bibliographic, ...)

Map to the Semantic Web

The Semantic Data Web Stack

  • User Interface & Applications
  • Trust
  • Crypto
  • Proof
  • Unifying Logic
  • Rules: RIF
  • Ontology: OWL
  • Query: SPARQL
  • RDF-Schema
  • Data Interchange: RDF
  • XML
  • URI
  • Unicode

… also known as “layer cake”

How Mark A. Greenwood realised the Semantic Web:

URIs and Unicode

  • URI = Uniform Resource Identifier
  • Used to create globally unique names for resources
  • Every object with clear identity can be a resource
    • Books, places, organizations ...
  • In the books domain the ISBN serves the same purpose
  • IRIs: Unicode-aware extension of URIs (I = Internationalized)
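
As a small illustration of the IRI point, the sketch below maps a non-ASCII path to its percent-encoded URI form using Python's standard library. The example IRI and the helper name `iri_path_to_uri` are hypothetical. Only the path is converted here; internationalized host names need a separate encoding that is not shown.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_path_to_uri(iri: str) -> str:
    """Percent-encode the non-ASCII characters in the path of an IRI (as UTF-8)."""
    parts = urlsplit(iri)
    # quote() replaces characters outside the safe set with UTF-8 percent escapes
    encoded_path = quote(parts.path, safe="/")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


# Hypothetical IRI, used only for illustration
uri = iri_path_to_uri("http://example.org/resource/Zürich")
print(uri)
```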

Resource Description Framework – RDF

Information in RDF is represented as triples (also called statements or facts), each consisting of a subject, a predicate and an object:

  • Modeled on linguistic categories, but not always consistently
  • Allowed assignments:
    • Subject: URI or blank node
    • Predicate: URI (a.k.a. property)
    • Object: URI, blank node or literal
  • Node and edge labels should be unambiguous, so that the original graph is reconstructable from a list of triples
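
The allowed assignments listed above can be sketched in plain Python. This is a minimal illustration, not an RDF library: the classes `URI`, `BlankNode` and `Literal`, the helper `make_triple`, and the `http://example.org/` namespace are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


Term = Union[URI, BlankNode, Literal]


def make_triple(subject: Term, predicate: Term, obj: Term) -> tuple:
    """Return (subject, predicate, object) if every position holds an allowed term."""
    if not isinstance(subject, (URI, BlankNode)):
        raise TypeError("subject must be a URI or a blank node")
    if not isinstance(predicate, URI):
        raise TypeError("predicate must be a URI")
    if not isinstance(obj, (URI, BlankNode, Literal)):
        raise TypeError("object must be a URI, a blank node or a literal")
    return (subject, predicate, obj)


EX = "http://example.org/"  # hypothetical namespace

# A graph is simply a set of triples
graph = {
    make_triple(URI(EX + "Leipzig"), URI(EX + "locatedIn"), URI(EX + "Germany")),
    make_triple(URI(EX + "Leipzig"), URI(EX + "name"), Literal("Leipzig")),
}


def literal_subject_is_rejected() -> bool:
    """Check that a literal in subject position is refused."""
    try:
        make_triple(Literal("Leipzig"), URI(EX + "name"), URI(EX + "Leipzig"))
    except TypeError:
        return True
    return False
```

Because the terms are immutable and hashable, a set of such tuples behaves like a graph: adding the same statement twice has no effect.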

 

RDF Schema

Not all triples make sense:

Cinema  AlbertEinstein  2012

How can we constrain the use of RDF? 

RDFS (S = “Schema”) allows us to define classes and properties and to restrict their use.
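
A minimal sketch of the idea, assuming a toy schema in which each property declares a domain and a range class. The vocabulary (`bornIn`, `Person`, `City`, `Building`) and the function `conforms` are hypothetical. Note that RDFS proper uses domain and range declarations to infer types rather than to reject triples; the sketch treats them as checks only to mirror the question on this slide.

```python
# Hypothetical type assignments and schema; names are illustrative only
types = {"AlbertEinstein": "Person", "Ulm": "City", "Cinema": "Building"}
schema = {
    # property: (domain class, range class)
    "bornIn": ("Person", "City"),
}


def conforms(triple, schema, types):
    """Return True if the predicate is declared and subject/object have the declared classes."""
    subject, predicate, obj = triple
    if predicate not in schema:
        return False  # not a declared property
    domain, range_ = schema[predicate]
    return types.get(subject) == domain and types.get(obj) == range_


sensible = conforms(("AlbertEinstein", "bornIn", "Ulm"), schema, types)
nonsense = conforms(("Cinema", "AlbertEinstein", "2012"), schema, types)
print(sensible, nonsense)
```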

SPARQL – Query Language for RDF

SELECT * WHERE { jwebsp:John  foaf:knows  ?friend }
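
What the query above asks for can be sketched as matching a single triple pattern against a list of triples. This is a simplified illustration, not a SPARQL engine: terms are plain strings, the function `match_pattern` is hypothetical, and the data is invented (only `jwebsp:John` and `foaf:knows` come from the query).

```python
def match_pattern(pattern, triples):
    """Yield one variable binding per triple that matches the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in triples:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if p_term.startswith("?"):
                # A variable that occurs twice must bind to the same value
                if binding.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            yield binding


# Hypothetical data for illustration
triples = [
    ("jwebsp:John", "foaf:knows", "jwebsp:Mary"),
    ("jwebsp:John", "foaf:knows", "jwebsp:Peter"),
    ("jwebsp:Mary", "foaf:knows", "jwebsp:John"),
]

results = list(match_pattern(("jwebsp:John", "foaf:knows", "?friend"), triples))
print(results)
```

Each result is a mapping from variable names to values, which corresponds to one row of a SELECT result.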

Web Ontology Language – OWL

  • OWL: acronym for Web Ontology Language, more easily pronounced than WOL
  • family of languages for authoring ontologies
  • W3C Recommendation since 2004, OWL 2 since 2009
  • Semantic fragment of first-order logic (FOL)

Features

  • Instantiation of classes by individuals
  • Concept hierarchies (taxonomies, inheritance): classes, terms
  • Binary relations between individuals: Properties, Roles
  • Properties of relations (e.g., range, transitive)
  • Data types (e.g. Numbers): concrete domains
  • Logical means of expression
  • Clear semantics!
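
Two of these features, concept hierarchies and properties of relations, can be illustrated with a small sketch that derives implicit facts. It is not an OWL reasoner: the class names, the individual `alice` and the function `transitive_closure` are assumptions made for this example.

```python
def transitive_closure(pairs):
    """Return all (a, c) pairs implied by treating the relation as transitive."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical ontology fragment
subclass_of = {("Professor", "Researcher"), ("Researcher", "Person")}
instance_of = {("alice", "Professor")}

# The subclass relation is transitive: Professor is also a subclass of Person
all_subclasses = transitive_closure(subclass_of)

# Individuals inherit membership in every superclass of their class
inferred_types = set(instance_of) | {
    (individual, superclass)
    for individual, cls in instance_of
    for sub, superclass in all_subclasses
    if sub == cls
}
print(sorted(inferred_types))
```

Only `alice` being a `Professor` is stated explicitly; her membership in `Researcher` and `Person` follows from the hierarchy.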

RDFa Content Editor – RDFaCE

supports the automatic semantic annotation of texts

http://rdface.aksw.org/

Literature

  • Pascal Hitzler, Markus Krötzsch, Sebastian Rudolph: Foundations of Semantic Web Technologies. Chapman & Hall/CRC, 2009, 455 pages. ISBN: 9781420090505. http://www.semantic-web-book.org
  • Amit Sheth, Krishnaprasad Thirunarayan: Semantics Empowered Web 3.0: Managing Enterprise, Social, Sensor, and Cloud-based Data and Services for Advanced Applications (Synthesis Lectures on Data Management). Morgan & Claypool Publishers, 2012. ISBN: 1608457168
  • Tom Heath, Christian Bizer: Linked Data (Synthesis Lectures on the Semantic Web: Theory and Technology). Morgan & Claypool Publishers, 1st edition, 2011. ISBN: 1608454304. http://linkeddatabook.com

Questions

The questions for this introduction are covered in the Questions part of the deck.