The Web today


Agenda

  • Motivation
  • Web Science
  • Web Evolution
    • Web 1.0 - Traditional Web
    • Web 2.0
      • Major breakthroughs of Web 2.0
    • Web 3.0 - Semantic Web
  • What Web Science could be
    • The computer science of the 21st century
  • Summary
  • References

Motivation

  • “[…] As the Web has grown in complexity and the number and types of interactions that take place have ballooned, it remains the case that we know more about some complex natural phenomena (the obvious example is the human genome) than we do about this particular engineered one.”
  • A new science that studies the complex phenomenon called the Web is needed!

Web Science

  • A new science that focuses on how huge decentralized Web systems work. 
  • “The Web isn’t about what you can do with computers. It’s people and, yes, they are connected by computers. But computer science, as the study of what happens in a computer, doesn’t tell you about what happens on the Web.” 
    Tim Berners-Lee
  • “A new field of science that involves a multi-disciplinary study and inquiry for the understanding of the Web and its relationships to us”
    Bebo White, SLAC, Stanford University
  • Shift from how a single computer works to how huge decentralized Web systems work

Endorsements for Web Science

  • “Web science represents a pretty big next step in the evolution of information. This kind of research is likely to have a lot of influence on the next generation of researchers, scientists and, most importantly, the next generation of entrepreneurs who will build new companies from this.” 
    Eric E. Schmidt, CEO Google
  • “Web science research is a prerequisite to designing and building the kinds of complex, human-oriented systems that we are after in services science.” 
    Irving Wladawsky-Berger, IBM

Web Science – multi-disciplinary approach

The goals of Web Science

  • To understand what the Web is
  • To engineer the Web’s future and provide its infrastructure
  • To ensure the Web’s social benefit

Scientific method

  • Natural Sciences such as physics, chemistry, etc. are analytic disciplines that aim to find laws that generate or explain observed phenomena
  • Computer Science on the other hand is synthetic. It is about creating formalisms and algorithms in order to support particular desired behaviour.
  • The scientific method of Web Science has to combine these two paradigms

What Could Scientific Theories for the Web Look Like?

  • Some simple examples:
    • Every page on the Web can be reached by following fewer than 10 links
    • The average number of words per search query is greater than 3
    • Web page download times follow a lognormal distribution function (Huberman)
    • The Web is a “scale-free” graph
  • Can these statements be easily validated? Are they good theories? What constitutes good theories about the Web?
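
A small sketch of how two of these statements could be confronted with data; the query log and download times below are invented placeholders, not measurements:

```python
# Sketch: testing two candidate "Web theories" against hypothetical data.
import math
import statistics

# Theory: the average number of words per search query is greater than 3.
queries = ["semantic web", "web science lecture notes", "larkc reasoning"]
avg_words = statistics.mean(len(q.split()) for q in queries)
print(f"average words per query: {avg_words:.2f}")

# Theory: download times follow a lognormal distribution. Crude check:
# if so, the *logarithms* of the times should look normal, e.g. their
# mean and median should roughly coincide.
times = [0.12, 0.35, 0.08, 1.4, 0.6, 0.25, 2.1]  # seconds, invented
logs = [math.log(t) for t in times]
print(f"log-times: mean={statistics.mean(logs):.2f}, "
      f"median={statistics.median(logs):.2f}")
```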

Food for thought

  • What are the analogies for Web Science and Design? Is our understanding of the Web like the understanding of electricity around 1800?

Evolution of the Web

Introduction

  • Web evolution
    • Web 1.0 - Traditional Web
    • Web 2.0 
    • Web 3.0 - Semantic Web
  • Future steps to realize Web science
    • Large scale reasoning
    • Rethinking Computer Science for the 21st century

Web 1.0

  • The World Wide Web ("WWW" or simply the "Web") is a system of interlinked, hypertext documents that runs over the Internet. With a Web browser, a user views Web pages that may contain text, images, and other multimedia, and navigates between them using hyperlinks.
  • The Web was created around 1990 by Tim Berners-Lee working at CERN in Geneva, Switzerland. 
  • A distributed document delivery system implemented using application-level protocols on the Internet
  • A tool for collaborative writing and community building
  • A framework of protocols that support e-commerce
  • A network of co-operating computers interoperating using HTTP and related protocols to form a ‘subnet’ of the Internet
  • A large, cyclic, directed graph made up of Web pages and links

The breakthrough

WWW components

  • Structural Components
    • Clients/browsers – two dominant implementations
    • Servers – run on sophisticated hardware
    • Caches – many interesting implementations
    • Internet – the global infrastructure which facilitates data transfer
  • Language and Protocol Components
    • Uniform Resource Identifiers (URIs)
    • Hyper Text Transfer Protocol (HTTP)
    • Hyper Text Markup Language (HTML)

Uniform Resource Identifiers (URIs)

  • Uniform Resource Identifiers (URIs) are used to name/identify resources on the Web
  • URIs are pointers to resources to which request methods can be applied to generate potentially different responses
  • Resources can reside anywhere on the Internet
  • Most popular form of a URI is the Uniform Resource Locator (URL)
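
A minimal sketch dissecting a URL, the most common form of URI, using Python's standard library (the URI itself is a made-up example):

```python
# Sketch: the components of a URL.
from urllib.parse import urlparse

uri = "http://www.example.org/lectures/web-science?slide=5#motivation"
parts = urlparse(uri)
print(parts.scheme)    # 'http'            - how to dereference the resource
print(parts.netloc)    # 'www.example.org' - the resource can reside anywhere
print(parts.path)      # '/lectures/web-science'
print(parts.query)     # 'slide=5'
print(parts.fragment)  # 'motivation'
```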

Hypertext Transfer Protocol (HTTP)

  • Protocol for client/server communication
    • The heart of the Web
    • Very simple request/response protocol
    • Client sends request message, server replies with response message
    • Provides a way to publish and retrieve HTML pages
    • Stateless
    • Relies on URI naming mechanism

HTTP Request Messages

  • GET – retrieve the document specified by the URL
  • PUT – store the specified document under the given URL
  • HEAD – retrieve information about the document specified by the URL
  • OPTIONS – retrieve information about available options
  • POST – send information (e.g. an annotation) to the server
  • DELETE – remove the document specified by the URL
  • TRACE – loopback request message
  • CONNECT – reserved for use with proxies that can tunnel connections
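
A minimal sketch of the request/response cycle with two of the methods above (GET and HEAD), using Python's standard library; example.org stands in for any Web server:

```python
# Sketch: HTTP as a simple, stateless request/response protocol.
import http.client

conn = http.client.HTTPConnection("example.org", 80)

conn.request("HEAD", "/")             # metadata only, no body
resp = conn.getresponse()
print(resp.status, resp.reason)       # e.g. 200 OK
print(resp.getheader("Content-Type"))
resp.read()                           # drain so the connection can be reused

conn.request("GET", "/")              # retrieve the document itself
resp = conn.getresponse()
print(len(resp.read()), "bytes")      # each request stands alone: stateless
conn.close()
```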

HTML

  • Hyper-Text Markup Language
    • A subset of the Standard Generalized Markup Language (SGML)
    • Facilitates a hyper-media environment
  • Documents use elements to “mark up” or identify sections of text for different purposes or display characteristics
  • Markup elements are not seen by the user when the page is displayed
  • Documents are rendered by browsers
  • HTML markup consists of several types of entities, including: elements, attributes, data types and character references 
    • DTD (Document Type Definition)
    • Element: e.g. the document element <html> … </html>, or head elements such as <head> and <title>
    • Attribute: name-value pairs inside a start tag, e.g. lang="en" in <html lang="en">
    • Data type: CDATA, URIs, dates, link types, language codes, colors, text strings, etc. 
    • Character references: for referring to rarely used characters: 
      • "&#x6C34;" (in hexadecimal) represents the Chinese character for water 

Web 2.0

  • “Web 2.0 is a notion for a range of interactive and collaborative systems on the Internet“
  • Web 2.0 is a vaguely defined phrase referring to various topics such as social networking sites, wikis, communication tools, and folksonomies.
  • Tim O'Reilly provided a definition of Web 2.0 in 2006: "Web 2.0 is the business revolution in the computer industry caused by the move to the internet as platform, and an attempt to understand the rules for success on that new platform. Chief among those rules is this: Build applications that harness network effects to get better the more people use them.”

Elements of the Web's next generation

  • People, Services, Technologies

"Definition" by O'Reilly

  Web 1.0                    Web 2.0          improvement
  DoubleClick                Google AdSense   personalized
  Ofoto                      Flickr           tagging, community
  Britannica Online content  Wikipedia        community, free
  Web sites                  blogging         dialogue
  publishing                 participation
  CMS                        wikis            flexibility, freedom
  directories                tagging          community
  taxonomy                   folksonomy

    Characteristics of Web 2.0 applications

    • Typical characteristics of Web 2.0 applications
      • Users can produce and consume data on a Web 2.0 site
      • Web is used as a participation platform
      • Users can run software applications entirely through a Web browser
      • Data and services can be easily combined to create mashups
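
    The last point can be made concrete with a minimal mashup sketch; the two "services" are inlined as hypothetical JSON responses so the sketch runs offline:

```python
# Sketch: a mashup joins data from independent services into a new view.
import json
from urllib.request import urlopen

def fetch_json(url):
    """How each live source would be read: fetch and decode JSON."""
    with urlopen(url) as resp:
        return json.load(resp)

# Hypothetical responses from a photo-sharing API and a geocoding API.
photos = [{"title": "Alps at dawn", "place": "Innsbruck"},
          {"title": "Old town", "place": "Innsbruck"}]
coords = {"Innsbruck": {"lat": 47.27, "lon": 11.39}}

# The mashup itself: annotate each photo with coordinates for a map view.
for photo in photos:
    photo["position"] = coords.get(photo["place"])
print(json.dumps(photos, indent=2))
```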

    Examples

    • Gmail
    • Google Notebooks (collaborative notepad on the Web)
    • Wikis
    • Wikipedia
      • World’s biggest encyclopedia, top-30 web site, 100 languages
    • Del.icio.us (social tagging for bookmarks)
    • Flickr (Photo Sharing and Tagging) 
    • Blogs, RSS, Blogger.com
    • Programmableweb.com: 150 Web APIs

    Blogs

    • Easy-to-use interfaces for updating content
    • Easy organization of content
    • Easy consumption of content
    • Easy publishing of comments
    • Social: collaborative (single users, but strongly connected)

    Wikis: Introduction


    • Wiki was invented by Ward Cunningham
    • A collection of HTML pages that anyone can read and edit
    • Most famous and biggest wiki: Wikipedia (MediaWiki)
      • But: also often used in intranets (e.g. our group)
    • Problems solved socially instead of technically
    • Flexible structure
    • Background algorithms + human intelligence
    • No new technologies
    • Social: collaborative (nobody owns the contents)

    Wikis: Design Principles

    • Open
      • Should a page be found to be incomplete or poorly organized, any reader can edit it as they see fit. 
    • Incremental
      • Pages can cite other pages, including pages that have not been written yet. 
    • Organic
      • The structure and text content of the site are open to editing and evolution. 
    • Mundane
      • A small number of (irregular) text conventions will provide access to the most useful page markup. 
    • Universal
      • The mechanisms of editing and organizing are the same as those of writing so that any writer is automatically an editor and organizer. 
    • Overt
      • The formatted (and printed) output will suggest the input required to reproduce it. 
    • Unified
      • Page names will be drawn from a flat space so that no additional context is required to interpret them. 
    • Precise
      • Pages will be titled with sufficient precision to avoid most name clashes, typically by forming noun phrases.

    Wikis: Design Principles (cont')

    • Tolerant
      • Interpretable (even if undesirable) behavior is preferred to error messages. 
    • Observable
      • Activity within the site can be watched and reviewed by any other visitor to the site.
    • Convergent
      • Duplication can be discouraged or removed by finding and citing similar or related content. 

    Social tagging

    • Idea: enrich content with user-chosen keywords (tags)
    • Replace folder-based structures by an organization based on tags
    • New: simple user interfaces for tagging and tag-based search
    • First steps towards the Semantic Web?
    • Technically: user interfaces
    • Social: collaborative (own contents, shared tags)

    Collaborative Tagging

    Collaborative Tagging: Delicious

    • Browser plug-ins available from http://del.icio.us
    • Allows the tagging of bookmarks
    • Community aspect: 
      • Suggestion of tags that were used by other users
      • Availability of tag clouds for bookmarks of the whole community
      • Possibility to browse related bookmarks based on tags

    Tagging: Flickr.com

    Folksonomies

    • Folksonomies are the knowledge structures that emerge from the data created by tagging

    Tag Clouds
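
    A tag cloud is computed directly from folksonomy data: each tag's frequency across the community is mapped to a font size. A minimal sketch with invented tags:

```python
# Sketch: from tag frequencies to a tag cloud.
from collections import Counter

tagged_items = [
    ["web", "science"], ["web", "semantic"], ["web"],
    ["tagging", "folksonomy"], ["semantic", "web"],
]
freq = Counter(tag for tags in tagged_items for tag in tags)

min_f, max_f = min(freq.values()), max(freq.values())
for tag, f in freq.most_common():
    # Linear scale between the smallest and largest font size.
    size = 10 + 20 * (f - min_f) / max(1, max_f - min_f)
    print(f"{tag}: used {f}x -> font-size {size:.0f}px")
```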

    Major breakthroughs of Web 2.0

    • The four major breakthroughs of Web 2.0 are:
      • Blurring the distinction between content consumers and content providers.
      • Moving from media for individuals towards media for communities.
      • Blurring the distinction between service consumers and service providers.
      • Integrating human and machine computing in a new way.

    Blurring the distinction between content consumers and providers

    • Interactive Web applications through asynchronous JavaScript and XML (AJAX)
    • Weblogs (blogs), wikis

    Blurring the distinction between content consumers and providers (cont')

    • Flickr, YouTube

    • Tagging – del.icio.us, shazam.com

    Blurring the distinction between content consumers and providers (cont')

    • RDFa, microformats

    Moving from media for individuals towards media for communities


    • Folksonomies, FOAF
    • Community pages (friend-of-a-friend, flickr, LinkedIn, myspace, …)

    Moving from media for individuals towards media for communities (cont')

    • Second Life
    • Wikipedia


    Blurring the distinction between service consumers and service providers

    • RSS feeds

    • Yahoo Pipes allows people to connect Internet data sources, process them, and redirect the output.
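
    A small sketch of consuming an RSS feed with Python's standard library; the feed XML is inlined (and invented) rather than fetched:

```python
# Sketch: reading an RSS feed, the building block of pipes and aggregators.
import xml.etree.ElementTree as ET

rss = """<rss version="2.0"><channel>
  <title>Example blog</title>
  <item><title>Post 1</title><link>http://example.org/1</link></item>
  <item><title>Post 2</title><link>http://example.org/2</link></item>
</channel></rss>"""

channel = ET.fromstring(rss).find("channel")
print(channel.findtext("title"))
for item in channel.findall("item"):
    # A pipe/mashup would now filter, merge, or redirect these entries.
    print("-", item.findtext("title"), item.findtext("link"))
```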

    Blurring the distinction between service consumers and service providers (cont')

    • Widgets, gadgets, and mashups.

    Integrating human and machine computing in a new way

    • Amazon Mechanical Turk

    Web Evolution - summary

    Web 1.0                                       | Web 2.0                              | Semantic Web
    Personal Websites                             | Blogs                                | Semantic Blogs: semiBlog, Haystack, Semblog, Structured Blogging
    Content Management Systems, Britannica Online | Wikis, Wikipedia                     | Semantic Wikis: Semantic MediaWiki, SemperWiki, Platypus, dbpedia, Rhizome
    Altavista, Google                             | Google Personalised, DumbFind, Hakia | Semantic Search: SWSE, Swoogle, Intellidimension
    CiteSeer, Project Gutenberg                   | Google Scholar, Book Search          | Semantic Digital Libraries: JeromeDL, BRICKS, Longwell
    Message Boards                                | Community Portals                    | Semantic Forums and Community Portals: SIOC, OpenLink DataSpaces
    Buddy Lists, Address Books                    | Online Social Networks               | Semantic Social Networks: FOAF, PeopleAggregator
                                                  |                                      | Semantic Social Information Spaces: Nepomuk, Gnowsis

    Web Evolution - summary (cont')

    • Traditional Web (Web1.0)
      • Normal User: browsing
      • Communication style: one-directional communication (e.g. reading a book)
      • Data: web data (string and syntactic format)
      • Data contributor: webmaster or experienced user
      • How to add data: compose HTML pages
    • Social Web (Web2.0) 
      • Normal User: browsing + publishing and organizing web data
      • Communication style: human-human (sharing) 
      • Data: web data + tags
      • Data contributor: normal user – revolution!
      • How to add data: tagging
    • Semantic Web 
      • Normal User: interacting (human-machine)
      • Communication style: humanmachine
      • Data: web data + tags + metadata (in SW Language)
      • Data contributor: normal user, machine
      • How to add data: machine generate or user publish

    Web principles

    • In the context of the traditional Web (Web 1.0) a set of principles was proposed:
      • Web resources are identified by URIs (Uniform Resource Identifiers)
      • Namespaces should be used to denote consistent information spaces
      • Make use of HTML, XML and other W3C Web technology recommendations, as well as the decentralization of resources

    Web 1.0 + semantics = Semantic Web

    • The traditional Web represents information using:
      • natural language (English, German, Italian,…)
      • graphics, multimedia, page layout
    • Humans can process this easily
      • can deduce facts from partial information
      • can create mental associations
      • are used to various sensory information
    • However... Machines are ignorant!
      • partial information is unusable
      • difficult to make sense from, e.g., an image
      • drawing analogies automatically is difficult
      • difficult to combine information automatically
      • is one representation the same as another?
      • how to combine different XML hierarchies?
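
    The last question in miniature: the same fact in two invented XML hierarchies that share no structure a generic tool could exploit:

```python
# Sketch: identical content, incompatible XML hierarchies.
import xml.etree.ElementTree as ET

a = ET.fromstring("<person><name>Tim</name><employer>CERN</employer></person>")
b = ET.fromstring('<employee name="Tim" org="CERN"/>')

print(a.findtext("name"), a.findtext("employer"))  # Tim CERN
print(b.get("name"), b.get("org"))                 # Tim CERN
# Structurally the two documents share nothing, so generic XML tooling
# cannot combine them; the mapping must be supplied as explicit semantics.
```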

    Semantic Web

    • Semantic Web is about applying semantics to the traditional Web, Web 1.0
    • Some of the benefits of Semantic Web:
      • More precise queries
      • Smarter apps with less work
      • Share & link data between apps
      • Information has machine-processable and machine-understandable semantics
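
    A hedged sketch of what machine-processable semantics looks like in practice, assuming the third-party rdflib package (not part of the slides):

```python
# Sketch: facts as RDF triples that any application can merge and query.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF

g = Graph()
tim = URIRef("http://example.org/people/tim")    # made-up identifier
g.add((tim, FOAF.name, Literal("Tim")))          # subject, predicate, object
g.add((tim, FOAF.knows, URIRef("http://example.org/people/bebo")))

# Because the data are triples over shared URIs, another application can
# merge its own graph with this one and query across both.
print(g.serialize(format="turtle"))
```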

    Limitations of applying semantics to the traditional Web

    • The principal limits of describing large, heterogeneous, and distributed systems
    • The principal limits of self-representation and self-reflection
    • These necessitate incompleteness and incorrectness of semantic descriptions.


    Limitations of applying semantics to the traditional Web (cont')

    • The meta layer should apply heuristics that may help
      • Speed up the overall reasoning process. 
      • Increase its flexibility.
    • Therefore, it needs to be incomplete in various aspects and resemble important aspects of our consciousness.
      • Introspection
      • Reflection
    • Unbounded rationality, constrained rationality, limited rationality.
    • Description of data by metadata or programs by metaprograms
      • Always larger (even infinitely large) 
      • … or always an approximation

    Data look-up on the Web

    • In a large, distributed, and heterogeneous environment, classical ACID guarantees of the database world no longer scale in any sense.
    • Even a simple read operation in an environment such as the Web, a peer-to-peer storage network, a set of distributed repositories, or a space, cannot guarantee completeness in the sense of assuming that if data was not returned, then it was not there.
    • Similarly, a write cannot guarantee a consistent state, i.e., that it is immediately replicated to all the storage facilities at once.

    Information retrieval on the Web

    • Modern information retrieval applies the same principles
      • In information retrieval, the notion of completeness (recall) becomes more and more meaningless in the context of Web scale information infrastructures.
      • It is very unlikely that a user requests all the information relevant to a certain topic that exists on a worldwide scale, since this could easily go far beyond the amount of information processing he or she is investing in achieving a certain goal. 
      • Therefore, instead of investigating the full space of precision and recall, information retrieval is starting to focus more on improving precision and proper ranking of results.
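
    For reference, the two classical measures in a minimal sketch (the document IDs are invented):

```python
# Sketch: precision and recall of a result list against the relevant set.
def precision_recall(retrieved, relevant):
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)

# On the Web, the true relevant set is essentially unknowable, which is
# why recall loses its meaning at Web scale.
p, r = precision_recall(["d1", "d2", "d3"], ["d2", "d3", "d7", "d9"])
print(f"precision={p:.2f} recall={r:.2f}")  # 0.67, 0.50
```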

    Reasoning on the Web

    • What holds for a simple data look-up holds in an even stronger sense for reasoning on Web scale.
    • The notion of 100% completeness and correctness as usually assumed in logic-based reasoning does not even make sense anymore since the underlying fact base is changing faster than any reasoning process can process it.
    • Therefore, we have to develop a notion of usability of inferred results and relate them with the resources that are requested for it.

    LarKC – The Large Knowledge Collider

    • An open source, modular, and distributed platform for inference on the Web that makes use of new reasoning techniques
    • A plug-in architecture that supports cooperation between distributed, heterogeneous, cooperating modules enabling research into new and different reasoning techniques
    • A platform for infinitely scalable reasoning on the data-web
    • First real attempt at Reasoning at a Web scale
    • Not just adding a Web syntax to reasoning, but reflecting on the underlying assumptions of reasoning and the Web
      • Bringing Web principles to reasoning
      • Bringing reasoning to the Web
    • Thus LarKC is true Web Science
    • A number of broken assumptions of reasoning and logic in a Web context:
      • The Web is small
      • The Web does not change
      • The Web does not contradict itself

    LarKC – The Large Knowledge Collider

    • In fact:
      • The Web is huge
      • The Web changes faster than we can reason over it
      • The Web contains contradictions and different points of view
    • The essence of the Web (search) must be included in the reasoning process, generating something new called 'reasearch' (reasoning + search)
    • After 4000 years of separation LarKC merges induction and deduction

    Web Science – The Computer Science of the 21st Century

    • With the Web we have an open, heterogeneous, distributed, and fast changing computing environment.
    • Therefore we need computing to be understood as
      • A goal driven approach where the solution process is only partially determined and actually decided during runtime, based on available data and services.
      • A heuristic approach that gives up on absolute notion of completeness and correctness in order to gain scalability.
    • The times of 100% complete and correct solutions are gone.
    • The need for trade-offs:
      • In all areas one has to define the trade-off between three factors: the guarantees one provides in terms of service level agreements (completeness and correctness are just examples of very strong guarantees), the assumptions this requires, and the computational complexity involved.
      • Different heuristic problem solving approaches are just different combinations of these three factors.
    • Service level agreements (or goals) define what has to be provided as result of solving a problem.
    • Do we request an optimal solution, a semi-optimal solution, or just any solution?

    Web Science – The Computer Science of the 21st Century (cont')

    • Assumptions describe the generality of the problem solving approach:
      • Assuming that there is only one solution allows stopping the search for an optimum immediately after a solution has been found. 
      • Instead of a global optimization method, a much simpler heuristic search method can be used in this case, which would still deliver a global optimum.
    • Computational complexity (scalability) or the resources that are required to fill the gap between the assumptions and the goals.
    • Computer science in the 20th century was about perfect solutions in closed domains and applications. 
    • Web science, the new computer science of the 21st century, will be about approximate solutions and frameworks that capture the relationships of partial solutions and requirements in terms of computational costs, i.e., the proper balance of their ratio.

    Web Science – The Computer Science of the 21st Century (cont')

    • This shift is comparable to the transition in physics from classical physics to relativity theory and quantum mechanics,
    • ...where the notion of absolute space and time is replaced by relativistic notions and principal limits of precision:
    • the more precisely we know a particle's position, the less precisely we know its momentum, and vice versa.
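
    For reference, the quantitative form of that limit is Heisenberg's uncertainty relation, stated for position and momentum:

```latex
\Delta x \, \Delta p \;\ge\; \frac{\hbar}{2}
```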

    Summary

    • The Web is an ongoing success story, with more than 2 billion users and more than 50 billion pages
    • There is a need for a new science that focuses on how huge decentralized Web systems work - Web Science
    • Semantics will play a central role in the future development of the Web
    • However there are limitations of applying semantics to traditional Web due to the principal limits of self representation and self reflection
    • These limitations should be addressed by considering two levels: a meta layer and an object layer
    • Meta layer should apply heuristics that may help speed up the overall reasoning process and increase its flexibility.
    • Introspection and Reflection can be used to move from one layer to the other

    References

    • T. Berners-Lee, W. Hall, J. Hendler, N. Shadbolt, D. Weitzner (2006): Creating a science of the Web. http://eprints.ecs.soton.ac.uk/12615/ 
    • T. Berners-Lee, W. Hall, J. Hendler, K. O’Hara, N. Shadbolt, D. Weitzner (2006): A Framework for Web Science. http://eprints.ecs.soton.ac.uk/13347/
    • D. Fensel and F. van Harmelen: Unifying Reasoning and Search to Web Scale. IEEE Internet Computing, 11(2), 2007 
    • D. Fensel, D. Wolf: The Scientific Role of Computer Science in the 21st Century. In Proceedings of the third International Workshop on Philosophy and Informatics (WSPI 2006), Saarbruecken, Germany, May 3-4, 2006. 
    • D. Fensel, F. van Harmelen, B. Andersson, P. Brennan, H. Cunningham, E. Della Valle, F. Fischer, Z. Huang, A. Kiryakov, T. Kyung-il Lee, L. Schooler, V. Tresp, S. Wesner, M. Witbrock and N. Zhong: Towards LarKC: A Platform for Web-scale Reasoning. IEEE Computer Society Press, Los Alamitos, CA, USA, 2008.
    • F. Fischer, G. Unel, B. Bishop and D. Fensel, Towards a scalable, pragmatic Knowledge Representation Language for the Web, 2009.
    • N. Shadbolt: Web Science Research Initiative Seminar, November 2008. http://www.ecs.soton.ac.uk/podcasts/video.php?id=153
    • http://en.wikipedia.org/wiki/Web_science
    • http://en.wikipedia.org/wiki/Web_2.0
    • http://en.wikipedia.org/wiki/Semantic_Web
    • Web Science Research Initiative http://webscience.org/
    • http://www.larkc.eu/