Intelligent Systems
INTRODUCTION
Where are we?

Title 

1 
Introduction 

2 
Propositional Logic 

3 
Predicate Logic 

4 
Reasoning 

5 
Search Methods 

6 
CommonKADS 

7 
ProblemSolving Methods 

8 
Planning 

9 
Software Agents 

10 
Rule Learning 

11 
Inductive Logic Programming 

12 
Formal Concept Analysis 

13 
Neural Networks 

14 
Semantic Web and Services 
Textbooks







Overview of the course: What is the course about?

Introduction

Propositional logic

Predicate logic

Reasoning

Search methods

CommonKADS

Problemsolving methods

Planning

Software Agents

Rule learning

Inductive logic programming

Formal concept analysis

Neural networks

Semantic Web and Services
Outline

Motivation

What is “Intelligence”?

What is “Artificial Intelligence” (AI)?

Strong AI vs. Weak AI

Technical Solution

Symbolic AI vs. Subsymbolic AI

Knowledgebased systems

Popular AI systems

Subdomains of AI

Some relevant people in AI

Summary
MOTIVATION
Introduction to Artificial Intelligence
What is “Intelligence”?

"Intelligence denotes the ability of an individual to adapt his thinking to new demands; it is the common mental adaptability to new tasks and conditions of life" (William Stern, 1912)

Being "intelligent" means to be able to cognitively grasp phenomena, being able to judge, to trade of between different possibilities, or to be able to learn.

An important aspect of "Intelligence" is the way and efficiency how humans are able to adapt to their environment or assimilate their environment for solving problems.

Intelligence manifests itself in logical thinking, computations, the memory capabilities of the brain, through the application of words and language rules or through the recognition of things and events.

The combination of information, creativity, and new problem solutions is crucial for acting "intelligent".
Testing “Intelligence” with the Turing Test

Turing test is a proposal to test a machine’s ability to demonstrate “intelligence”
Testing “Intelligence” with the Turing Test (1)

Turing test proceeds as follows:

A human judge C engages in a natural language conversation with one human B and one machine A , each of which tries to appear human.

All participants are placed in isolated locations.

If the judge C cannot reliably tell the machine A from the human B , the machine is said to have passed the test.

In order to test the machine's intelligence rather than its ability to render words into audio, the conversation is limited to a textonly channel such as a computer keyboard or screen

Turing test is an operational test for intelligent behaviour. For more details see [2].
“Chinese Room”

The “Chinese room” experiment developed by John Searle in 1980 attempts to show that a symbolprocessing machine like a computer can never be properly described as having a ”mind” or “understanding”, regardless of how intelligently it may behave.

With the “Chinese room” John Searle argues that it is possible to pass the Turing Test, yet not (really) think.

Source: http://en.wikipedia.org/wiki/Chinese_room
“Chinese Room” (1)

The “Chinese room” experiment proceeds as follows:

Searle, a human, who does not knows Chinese, is locked in a room with an enormous batch of Chinese script.

Slips of paper with still more Chinese script come through a slot in the wall.

Searle has been given a set of rules in English for correlating the Chinese script coming through with the batches of script already in the room.
“Chinese Room” (2)

Searle is instructed to push back through the slot the Chinese script with which the scripts coming in through the slot are correlated according to the rules.

Searle identifies the scripts coming in and going out on the basis of their shapes alone. He does not speak Chinese, he does not understand them

The scripts going in are called ‘the questions’, the scripts coming out are ‘the answers’, and the rules that Searle follows is ‘the program’.

Suppose also that the set of rules, the program is so good and Searle gets so good at following it that Searle’s answers are indistinguishable from those of a native Chinese speaker.
“Chinese Room” (3)

The result:

It seems clear that Searle nevertheless does not understand the questions or the answers

But Searle is behaving just a computer does, “performing computational operations on formally specified elements”

Hence, manipulating formal symbols, which is just what a computer running a program does, is not sufficient for understanding or thinking
What is “Artificial Intelligence”?

Many definitions exist, among them:

“The study of the computations that make it possible to perceive, reason, and act” (Winston, 1992)

“A field of study that seeks to explain and emulate [human] intelligent behaviour in terms of computational processes” (Schalkoff, 1990)

It is an interdisciplinary field that is based on results from philosphy, psychology, linguistics, or brain sciences

Difference to “traditional” computer science: Emphasis on cognition, reasoning, and acting

Generative theory of intelligence:

Intelligence emerges from the orchestration of multiple processes

Process models of intelligent behaviour can be investigated and simulated on machines
Early developments of Artificial Intelligence

Two main aspects begin to manifest in the early days of AI

Cognitive modelling, i.e., the simulation of cognitive processes through information processing models

The construction of “intelligent systems” that make certain aspects of human cognition and reasoning available.
Strong AI vs. Weak AI

Strong AI
 “An artificial intelligence system can think and have a mind . “ (John Searle 1986)
 “Machine intelligence with the full range of human intelligence” (Kurzweil 2005)
 Ai that matches or exceeds human intelligence.
 Intelligence can be reduced to information processing.
 “Science Fiction AI”

Weak AI
 Intelligence can partially be mapped to computational processes.
 Intelligence is information processing
 Intelligence can be simulated
TECHNICAL SOLUTIONS
Symbolic vs. Subsymbolic AI
Knowledgebased Systems
Information Processing and symbolic representation

Research on Information Processing in AI by

Exact formulisations.

Exemplary realisation via implementations.

Core aspect: Representation and processing of symbols as a foundation of internal processes.
Symbolic AI

Symbols are naming objects which provide access to meaning (Newell, 1958)

“Spoken words are the symbols of mental experience, and written words are the symbols of spoken words.” (Aristotle) [3]

Mental abilities of humans can be inspected on a symbolic level independent of neuronal architectures or processes.

Subject of Symbolic AI is thus the meaning of processes (or their symbolic representations respectively).

Symbolic AI aims to imitate intelligence via formal models.

Main persons behind symbolic AI are: Simon, Newell, Minsky
The “(General) Intelligent Agent”

Core paradigm of symbolic AI is the “Intelligent Agent” [4]:

has a memory and the capability to act in his world based on it.

has sensors to perceive information from his environment.

has actuators to influence the external world.

has the capability to probe actions. By that he is able to choose the best possible action.

has internal memory for methods and the exploration of the world is guided by knowledge kept in it.
Subymbolic AI

Subsymbolic AI (SSAI) aims to model intelligence empirically.

SSAI was inspired by biological systems: A model which imitates neural nets in the brain is the basis for the creation of artificial intelligence.

Neural nets consist of a network of
neurons which have weighted connections
with each other.

Early work by Rosenblatt (1962):
the “Perceptron” [6]

Advantages of artificial neuronal nets:

Distributed representation

Representation and processing of fuzziness

Highly parallel and distributed action

Speed and faulttolerance
Image: http://www.neuronalesnetz.de
new slide
KNOWLEDGEBASED SYSTEMS
Development

General Problem Solver

Knowledgeispower hypothesis

Knowledge levels

3a. Newell’s 3 levels of knowledge

3b. Brachman’s 5 levels of knowledge


Problem Solving Methods
1. General Problem Solver

The General Problem Solver (GPS) is a universal problem solving approach.

GPS is the first approach that makes the distinction between knowledge of problems domains and how to solve problems

GPS is domain and task independent approach.

GPS does not put any restrictions both on the domain knowledge and on the task.

Examples of GPS are: automated theorem proving and generic search methods
Automated theorem proving

Automatic theorem provers are GPS for which every problem can be expressed as logical inference

Automated theorem proving is about proving of mathematical theorems by a computer program
Generic Search Methods

Generic Search Methods are GPS for which every problem can be expressed as search

One particular example of a Generic Search Method is the A* algorithm.

A* works for problems that can be represented as a state space i.e. a graph of states. Initial conditions of the problem are represented as start state , goal conditions are represented as end state

A* is an informed search or heuristic search approach that uses the estimation function:

f(n)=g(n)+h(n)

g(n) the cost to get from the star state to current state n

h(n) estimated cost to get from current state n to end state

f(n) estimated total cost from start state through current state n to the end state
More in Lecture 5
1. General Problem Solver (1)

However, GPS has a set of limitations :

It works in theory but in practice works only on toy problems (e.g. Tower of Hanoi)

Could not solve realworld problems because search was easily lost in the combinatorial explosion of intermediate states
2. Knowledgeispower hypothesis
Knowledgeispower hypothesis , also called the Knowledge Principle was formulated by E.A. Feigenbaum in 1977:
“knowledge of the specific task domain in which the program is to do its problem solving was more important as a source of power for competent problem solving than the reasoning method employed” [15]
2. Knowledgeispower hypothesis (1)

The Knowledgeispower hypothesis shifted the focus on how to build intelligent systems from inference to the knowledge base .

Problem solving is guided by experiential, qualitative, heuristic knowledge.

The meaning of intelligence as knowledge is the common meaning in English world.

The Central Intelligence Agency (CIA) defines intelligence as knowledge.

The Knowledgeispower hypothesis lead to the emergence of a new filed i.e. expert systems and a new profession i.e. knowledge engineer
3. Knowledge levels
3a. Newell’s 3 levels of knowledge
3b. Brachman’s 5 levels of knowledge
3a. Newell’s 3 levels of knowledge [5]

In his work from 1981, Newell tried to answer questions such as

How can knowledge be characterised?

What is the relation of this characterisation and the representation of knowledge?

What is characteristic about a system which holds knowledge?

Newell distinguished 3 levels in the context of knowledge representation:

Knowledge Level

Logical Level

Implementation Level
3a. Newell’s 3 levels of knowledge (1)

Knowledge Level

The most abstract level of representing knowledge.

Concerns the total knowledge contained in the Knowledge Base

Example: The automated DBInformation system knows that a trip from Innsbruck to Vienna costs 120€
3a. Newell’s 3 levels of knowledge (2)

Logical Level

Encoding of knowledge in a formal language.

Example: Price(Innsbruck, Vienna, 120)
3a. Newell’s 3 levels of knowledge (3)

Implementation Level

The internal representation of the sentences.

Example:

As a String “Price(Innsbruck, Vienna, 120)”

As a value in a matrix
3b. Brachman’s 5 Levels of Knowledge [12]

Brachman defines 5 levels for different types of representations.

Levels interpret the transition from data to knowledge.

Each level corresponds to an explicit set of primitives offered to the knowledge engineer.

Ordering of knowledge levels from simple/abstract to complex/concrete:

Implementational Level

Logical Level

Epistemological Level

Conceptual Level

Linguistic Level
3b. Brachman’s 5 Levels of Knowledge (1)

Implementational Level

The primitives are pointers and memory cells.

Allows the construction of data structures with no a priori semantics
3b. Brachman’s 5 Levels of Knowledge (2)

Logical Level

The primitives are logical predicates, operators, and propositions.

An index is available to structure primitives.

A formal semantic is given to primitives in terms of relations among objects in the real world

No particular assumption is made however as to the nature of such relations
3b. Brachman’s 5 Levels of Knowledge (3)

Epistemological Level

The primitives are concept types and structuring relations.

Structuring relations provide structure in a network of conceptual types or units. (i.e. inheritance: conceptual units, conceptual subunits)

The epistemological level links formal structure to conceptual units

It contains structural connections in our knowledge needed to justify conceptual inferences.
3b. Brachman’s 5 Levels of Knowledge (4)

Conceptual Level

The primitives are conceptual relations, primitive objects and actions.

The primitives have a definite cognitive interpretation, corresponding to languageindependent concepts like elementary actions or thematic roles
3b. Brachman’s 5 Levels of Knowledge (5)

Linguistic Level

The primitives are words, and (lingustic) expressions.

The primitives are associated directly to nouns and verbs of a specific natural language

Arbitrary relations and nodes that exist in a domain
Problem Solving Methods

Problem Solving Methods (PSM) abstract from details of the implementation of the reasoning process.

Characteristics of PSM [10]:

A PSM specifies which inference actions have to be carried out for solving a given task.

A PSM determines the sequence in which these actions have to be activated.

Knowledge roles determine which role the domain knowledge plays in each inference action.
Heuristic Classification

Generic inference pattern “Heuristic Classification” describes the problemsolving behaviour of these systems on the Knowledge Level in a generic way.
Propose & Revise
Propose & Revise

The propose & revise method is an efficient method for solving the task of parametric design . (see more details in [14])

The method dependes on the following inferences:

propose – derives an initial design based on the requirements;

Ctest – requires knowledge that describes which possible designs are valid (i.e., the domain constraints);

revise – tries to improve an incorrect design based on the feedback of the Ctest step.
More in Lecture 7
Knowledgebased systems (KBS)

KBS are realized based on a knowledge base (KB).

KB contains a model represented in a (logical) formalism which can be interpreted by an interpreter (inference engine) that is able draw conclusions from it.

KBs typically capture knowledge of a domain.

Methodologies for the development of KBS: e.g. CommonKADS

Examples: CYC
One of the first systems that aimed to capture common knowledge in a knowledge base
Expert systems (ES)

Special form of a KBS.

Definition: An expert system is a software application that stores knowledge about a certain domain. It is able to draw conclusions from that knowledge and offers concrete solutions for problems in that domain.

ES simulate human experts and thus the knowledge base typically consists of highly specialized expert knowledge.

Reasoning of human experts vs. reasoning in ES:

Human experts are able to master unforeseen effects and situations.

Human experts are able to learn from experiences.

Human experts expand their knowledge continuously.

Human experts derive new knowledge not only based on drawn. conclusions but via analogy and intuition.
POPULAR AI SYSTEMS
ELIZA

Early computer program capable of natural language processing.

Written by J. Weizenbaum between 1964 and 1966.

ELIZA simulated a psychotherapist by reformulating questions posed by the user.

Sample ELIZA conversation:
(Source: Wikipedia)
More information: [9]
Deep Blue

Chessplaying computer developed by IBM that won against world champion Garry Kasparov in 1997.

Applied a brute force strategy, processing
was highly parallel.

Evaluation of 200 million positions per second.

Deep Blue's knowledge base contained over
4,000 positions and 700,000 grandmaster
games.

It was finetuned by chess grand masters.

Admission from IBM: „Deep Blue, as it stands

today, is not a "learning system." It is therefore
not capable of utilizing artificial intelligence to
either learn from its opponent or "think" about
the current position of the chessboard.“

Link: http://www.research.ibm.com/deepblue/
The Humanoid Robot COG

Project at the MIT Artificial Intelligence Lab

The goal of the COG project was to build a robot capable of interacting with humans and objects in a humanlike way.

"As I pondered [this] and thought about HAL, I decided to try to build the first serious attempt at a robot with humanlevel capabilities, the first serious attempt at a HALclass being." (Rodney Brooks, Inventor of COG)

Link: http://groups.csail.mit.edu/lbr/humanoidroboticsgroup/cog/
CALO („Cognitive Assistant that Learns and Organizes“)

DARPA funded project, “Personal assistant that learns” – program

Involves 25 partners, 300+ researchers, including top researchers in AI

500+ publications in first four years

“The goal of the project is to create cognitive software systems, that is, systems that can reason, learn from experience, be told what to do, explain what they are doing, reflect on their experience, and respond robustly to surprise. “ (calosystem.org)

CALO assists its user with six highlevel functions:

Organizing and Prioritizing Information

Preparing Information Artifacts

Mediating Human Communications

Task Management

Scheduling and Reasoning in Time

Resource allocation

Link: http://www.calosystem.org/
HAL 9000

An advanced device capable of performing a variety of tasks and interacting with its human users (companions?).

The HAL9000 communicates by voice and can control a auxiliary devices on a spaceship.

It (he?) has an unfortunate tendency towards obsessing over minor details or inconsistencies in the instructions given it, however.

In the events described in Arthur C. Clarke's “2001: A Space Odyssey,” HAL's tendency toward obsessive literalism led to the unfortunate death of most of its spaceship's human crew
Further popular applications

SEAS (“Synthetic Environment for Analysis and Simulation”)

Can be used to simulate realistic events; has a world model

http://www.krannert.purdue.edu/centers/perc/html/aboutperc/seaslabs/seaslabs.htm

SYSTRAN

Early machine translation system

Foundation for Yahoo’s Babelfish or Google Translator

http://www.systransoft.com/

VirtualWoman

Virtualreality based chatbot

http://virtualwoman.net/

For further references, see http://en.wikipedia.org/wiki/List_of_notable_artificial_intelligence_projects
SUBDOMAINS OF AI
Subdomains of AI

Cognition as information processing

Artificial neuronal networks

Heuristic search methods

Knowledge representation and logic

Automatic theorem proving

Nonmonotonic reasoning

Casebased reasoning

Planning

Machine Learning

Knowledge Engineering

Natural Language Processing

Image Understanding

Cognitive Robotics

Software Agents
Cognition

Deals with complex software systems that directly interact and communicate with human users.

Characteristics of cognitive systems (CS):

CS are directly integrated in their environment, act in it, and are able to communicate with it.

CS are able to direct and adapt their actions based on the environment they are situated in.

CS typically represent systemrelevant aspects of the environment.

Their information processing capabilities are characterized through learning aptitude and anticipation

Examples of cognitive system:

Organisms / biological cognitive systems

Technical systems such as robots or agents

Mixed humanmachine systems
Neural networks

Neural networks are networks

of neurons as in the real

biological brain.

Neurons are highly specialized

cells that transmit impulses within

animals to cause a change in a target

cell such as a muscle effector cell or glandular cell.

The axon , is the primary conduit through which the neuron transmits impulses to neurons downstream in the signal chain

Humans: 1011 neurons of > 20 types, 1014 synapses, 1ms10ms cycle time

Signals are noisy “spike trains” of electrical potential
Neural networks (2)

What we refer to as Neural Networks in the course are mostly Artificial Neural Networks (ANN).

ANN are approximation of biological neural networks and are built of physical devices, or simulated on computers.

ANN are parallel computational entities that consist of multiple simple processing units that are connected in specific ways in order to perform the desired tasks.

Remember: ANN are computationally primitive approximations of the real biological brains .

Application examples : e.g., handwriting recognition, time series prediction, kernel machines (support vector machines, data compression, financial predication, speech recognition, computer vision, protein structures
Search Methods

Search Methods are typically helping humans to solve complex tasks by generating (optimal) plans (i.e. a set of operations / states) that includes sequences / actions to reach a goal state.

Example problem: Tower of Hanoi



Definition: A search method is defined by picking the order of node expansion.

Search strategies are evaluated according to completeness, time complexity, space complexity, optimality.

Time and space complexity are measured in terms of maximum branching, depth of the leastcost solution, maximum depth of the state space

Distinction between informed / uninformed search techniques
Knowledge Representation and Logic

The term knowledge representation describes the design and implementation of formalisms, to model a part of the reality (a domain).

A model represented using a formalisms and implemented by an interpreter is often called a knowledge base.

A knowledge base is a collection of facts and beliefs.

Modelling of knowledge bases happens on a conceptual level.

Intention: To model a domain of discourse and to draw inferences about the objects in the domain (reasoning)

Logic studies the principles of reasoning and offers

Formal languages for expressing knowledge

Well understood formal semantics

Reasoning methods to make implicit knowledge explicit
Automatic theorem proving

Automatic theorem proving deals with the design and implementation of computer programmes that are capable of making mathematical proofs.

Theorem provers deduce new formulas from given formulas via logical deduction rules until the target formula is found.

Theoretical foundation of automated theorem proving: mathematical logic; typically firstorderlogic.

Formulas are mathematically precisely defined via interpretations (provide semantics for function and predicate symbols via mappings and relations respectively)
Nonmonotonic reasoning

Classical logic is monotonic in the following sense: whenever a sentence A is a logical consequence of a set of sentences T, then A is also a consequence of an arbitrary superset of T [13].

Nonmonotonic reasoning:

Additional information may invalidate conclusions.

Nonmonotonic reasoning is closer to (human) commonsense reasoning.

Most rules in commonsense reasoning only hold with exceptions (i.e. university_professors_teach)

Important approaches to formalise nonmonotonic reasoning:

DefaultLogics: Nonclassical inference rules are use to represent defaults

The modale approach: Modal operators are used to explicitely declare if sth. is believed in or is consistent.

Circumscription: Validity can be restricted to specific models.

Conditional approaches: A conditional junctor is used to represent defaults in a logical language.
Casebased reasoning

Definition: A “case” is an experience made during the solving of a problem.

A case is typically informally given and covers the problem and the solution.

Experiences (resp. cases) are used to solve newly occurring problems.

Cases are collected in a socalled casebase (analogous to a knowledge base in KBS)

Casebased reasoning is inspired by human problem solving capabilities.

Application scenarios are characterized through:

A considerable amount of cases has to be available

Using the cases to solve the problem has to be easier than solving the problem directly.

Available information is incomplete or unsecure and imprecise.

The construction of a KBS and the modelling of cases is not easily possible.

Typical application scenarios can be found in the area of diagnostics, electronic sales, configaration, or planning.
Planning

What is Planning?

“Planning is the process of thinking about the activities required to create a desired goal on some scale” [Wikipedia]

We take a more pragmatic view – planning is a flexible approach for taking complex decisions:

decide about the schedule of a production line;

decide about the movements of an elevator;

decide about the flow of paper through a copy machine;

decide about robot actions.

By “flexible” we mean:

the problem is described to the planning system in some generic language;

a (good) solution is found fully automatically;

if the problem changes, all that needs to be done is to change the description.

Planning looks at methods to solve any problem that can be described in the language chosen for the particular planning system.

Approaches for the generation of action sequences: action planning and situated activity planning.
Machine Learning

Machine Learning (ML) is a central research area in AI to acquire knowledge.

ML deals with the computeraided design and realisation of learning problems.

Learning is defined as the process that enables a systems to perform better during solving of the same or similar tasks in the future (Simon, 1983)

Reduction of learning to mathematical theories: Deduction, Induction, Abduction.

Learning task is typically characterized through the description of inputs, expected outputs, and environmental conditions.

Typical machine learning applications: Data mining, Speech recognition, text analysis, control learning, hidden markov networks, etc.
Knowledge Engineering

Knowledge engineering is concerned with the acquisition, management, use and transformation of knowledge.

Goals are similar to software engineering, i.e. to systematically develop expert systems using existing methods and tools.

Core process in knowledge engineering: knowledge acquisition; During knowledge acquisition knowledge is formalised, i.e. transformed from a natural language representation to a formal representation.

Process models for knowledge acquisition: Model by Puppe; model by Buchanan; Harmon/Mauss/Morrissey; Waterman; or MIKE

Methodical approachs and tools: D3; CommonKADS; MIKE; ProtégéII; RASSI

Application cases include the development of expert systems, workflow systems or knowledge management
Natural Language Processing

Goal: Processing and understanding of speech or written language.

Early applications include questionanswer systems, naturallanguage based access to databases or speechbased control of robots.

Challenges include information reconstruction from spoken words or information selection and reduction during speech production.

Application areas: Tools for interhuman communication, tools for text generation or text correction (i.e. identification of grammatical errors based on language models), information classification or filtering, or humanmachine communication.
Image Understanding

Image Understanding (IU) deals with the analysis and interpretation of visual information. IU denotes the reconstruction and interpretation of scenes based on images.

Early approaches based on pattern recognition (still one of the most important foundations of this field)

Prominent application: object recognition of still and moving objects

Application areas: symbol recognition, medical image analysis, vehicle navigation, image archiving, gesture recognition,
Cognitive Robotics

AI deals with the development of robots as autonomous and intelligent systems.

Robotic covers many subareas of AI and involves interdisciplinary work including mechanic and electrical design and cognitive areas.

Types of robots: static robots, mobile robots, and humanoid robots.

Application areas: construction, planning, or observation.
Software Agents

Core paradigm in AI.

Definition: A software agent is a longterm operating program whose function can be described as autonomous execution of tasks or tracing of goals via interaction with his environment.

Agent (see earlier slide)

has a memory and the capability to act in his world based on it.

has sensors to perceive information from his environment.

has actuators to influence the external world.

has the capability to probe actions

has internal memory for methods and the exploration of the world is guided by knowledge kept in it.

Applications: Data collection and filtering, event notification, planning and optimization in various application areas (commerce, production, military, education)
SOME RELEVANT PEOPLE IN AI
Some relevant people in AI




SUMMARY
Summary

Birth of AI in the 1950s

Broad spectrum of subdomains and combination of disciplines

Distinction between

Weak and strong AI

Symbolic and subsymbolic AI

Central role: symbols and knowledge representation

Knowledgebased systems and intelligent agents are core concepts in AI
REFERENCES
References

Wikipedia links:

http://en.wikipedia.org/wiki/List_of_notable_artificial_intelligence_projects

http://en.wikipedia.org/wiki/Turing_test

http://en.wikipedia.org/wiki/General_Problem_Solver
References

Mandatory reading:

[1] G. Görz, C.R. Rollinger, J. Schneeberger (Hrsg.) “Handbuch der künstlichen Intelligenz” Oldenbourg Verlag, 2003, Fourth edition

Further reading:

[2] A. Turing. "Computing Machinery and Intelligence", Mind LIX (236): 433–460, Ocotober, 1950.

[3] Aristotle “On Interpretation”, 350 B.C.E, see: http://classics.mit.edu/Aristotle/interpretation.html

[4] A. Newell, H.A. Simon, “Human Problem Solving” Englewood Cliffs, N.J.: Prentice Hall, 1972

[5] A. Newell. “The Knowledge Level”, AI Magazine 2 (2), 1981, p. 120.

[6] F. Rosenblatt. “Strategic Approaches to the Study of Brain Models” In: Förster, H.: Principles of SelfOrganization. Elmsford, N.Y.: Pergamon Press, 1962.

[7] S. Russell, E.H. Wefald. "Do the Right Thing: Studies in Limited Rationality" MIT Press, 1991.

[8] C. Beierle and G. KernIsberner "Methoden wissensbasierter Systeme. Grundlagen, Algorithmen, Anwendungen" Vieweg, 2005.
References

[9] J. Weizenbaum. "ELIZA  A Computer Program For the Study of Natural Language Communication Between Man And Machine", Communications of the ACM 9 (1): p. 36–45, 1966.

[10] W. Birmingham and G. Klinker “Knowledge Acquisition Tools with Explicit ProblemSolving Methods” The Knowledge Engineering Review 8, 1 (1993), 525

[11] A. Newell and H. Simon "GPS, a program that simulates human thought" In: Computation & intelligence: collected readings, pp. 415  428, 1995.

[12] R. J. Brachman “On the Epistemological Status of Semantic Networks” In: N.V. Findler (ed.): Associative Networks: Representation and Use of Knowledge by Computers. New York: Academic Press, 1979, 350.

[13] G. Brewka, I. Niemelä, M. Truszczynski “Nonmonotonic Reasoning” In: V. Lifschitz, B. Porter, F. van Harmelen (eds.), Handbook of Knowledge Representation, Elsevier, 2007, 239284

[14] D. Fensel “ProblemSolving Methods: Understanding, Description, Development and Reuse”,, Springer LNAI 1791, 2000

[15] E.A. Feigenbaum. “The Art of Artificial Intelligence: Themes and Case Studies of Knowledge Engineering,” Proceedings of the International Joint Conference on Artificial Intelligence, Cambridge, MA, 1977

[16] W.J. Clancey. “Heuristic Classification”, Artificial Intelligence , 27:289350, 1985
Next Lecture

Title 

1 
Introduction 

2 
Propositional Logic 

3 
Predicate Logic 

4 
Reasoning 

5 
Search Methods 

6 
CommonKADS 

7 
ProblemSolving Methods 

8 
Planning 

9 
Software Agents 

10 
Rule Learning 

11 
Inductive Logic Programming 

12 
Formal Concept Analysis 

13 
Neural Networks 

14 
Semantic Web and Services 
Questions?
Intelligent Systems
Propositional Logic
Where are we?

Title 

1 
Introduction 

2 
Propositional Logic 

3 
Predicate Logic 

4 
Reasoning 

5 
Search Methods 

6 
CommonKADS 

7 
ProblemSolving Methods 

8 
Planning 

9 
Software Agents 

10 
Rule Learning 

11 
Inductive Logic Programming 

12 
Formal Concept Analysis 

13 
Neural Networks 

14 
Semantic Web and Services 
Outline

Motivation

Technical Solution

Syntax

Semantics

Inference

Illustration by a Larger Example

Extensions

Summary

References
MOTIVATION
Logic and Deduction

Logic is used to formalize deduction

Deduction = derivation of true statements (called conclusions ) from statements that are assumed to be true (called premises )

Natural language is not precise, so the careless use of logic can lead to claims that false statements are true, or to claims that a statement is true, even tough its truth does not necessarily follow from the premises
=> Logic provides a way to talk about truth and correctness in a rigorous way, so that we can prove things, rather than make intelligent guesses and just hope they are correct
Why Propositional Logic?

Propositional logic is a good vehicle to introduce basic properties of logic; used to:

Associate natural language expressions with semantic representations

Evaluate the truth or falsity of semantic representations relative to a knowledge base

Compute inferences over semantic representations

One of the simplest and most common logic

The core of (almost) all other logics
What is Propositional Logic?

An unambiguous formal language, akin to a programming language

Syntax: Vocabulary for expressing concepts without ambiguity

Semantics: Connection to what we're reasoning about

Interpretation  what the syntax means

Reasoning: How to prove things

What steps are allowed
TECHNICAL SOLUTIONS
SYNTAX
Syntax

Logical constants: true, false

Propositional symbols: P, Q, S, ...

Wrapping parentheses: ( … )

Atomic formulas: Propositional Symbols or logical constants

Formulas are either atomic formulas, or can be formed by combining atomic formulas with the following connectives:

∧ ...and [conjunction]

∨ ...or [disjunction]

→...implies [implication / conditional]

↔ ...is equivalent [biconditional]

¬ ...not [negation]
Syntax (cont’)

A sentence (well formed formula) is defined as follows:

A symbol is a sentence

If S is a sentence, then ¬S is a sentence

If S is a sentence, then (S) is a sentence

If S and T are sentences, then (S ∨ T), (S ∧ T), (S → T), and (S ↔ T) are sentences

A sentence results from a finite number of applications of the above rules
Syntax – BNF Grammar

Sentence → AtomicSentence  ComplexSentence

AtomicSentence → True  False  P  Q  R  ...

ComplexSentence → ( Sentence )  Sentence Connective Sentence  ¬ Sentence

Connective → ∧  ∨  →  ↔

Ambiguities are resolved through precedence ¬ ∧ ∨ ↔
or parentheses
e.g. ¬ P ∨ Q ∧ R ⇒ S is equivalent to (¬ P) ∨ (Q ∧ R)) ⇒ S
Syntax – Examples
P means "It is hot. "  ¬ p ∧ ¬ q 
Q means "It is humid. "  ¬(p ∧ q) 
R means "It is raining. "  (p ∧ q) ∨ r 
(P ∧ Q) → R "If it is hot and humid, then it is raining " 
p ∧ q ∧ r (((¬ p) ∧ q) → r) ↔ ((¬ r) ∨ p) 
Q → P "If it is humid, then it is hot " 
(¬ (p ∨ q) → q) → r ((¬ p) ∨ (¬ q)) ↔ (¬ r) Etc. 
SEMANTICS
Semantics
 Interpretations
 Equivalence
 Substitution
 Models and Satisfiability
 Validity
 Logical Consequence (Entailment)
 Theory
Semantics – Some Informal Definitions

Given the truth values of all symbols in a sentence, it can be “evaluated” to determine its truth value (True or False)

A model for a KB is a “possible world” (assignment of truth values to propositional symbols) in which each sentence in the KB is True

A valid sentence or tautology is a sentence that is True under all interpretations, no matter what the world is actually like or how the semantics are defined (example: “It’s raining or it’s not raining”)

An inconsistent sentence or contradictio n is a sentence that is False under all interpretations (the world is never like what it describes, as in “It’s raining and it’s not raining”)

P entails Q , written P ⊧ Q, means that whenever P is True, so is Q; in other words, all models of P are also models of Q
Interpretations

In propositional logic, truth values are assigned to the atoms of a formula in order to evaluate the truth value of the formula

An assignment is a function
v : P → {T,F}
v
assigns a truth value to any atom in a given formula (
P
is the set of all propositional letters, i.e. atoms)
Suppose
F
denotes the set of all propositional formulas. We can extend an assignment v to a function
v : F → {T,F}
which assigns the truth value v(A) to any formula A in F. v is called an interpretation.
Interpretations (cont’)
 Example:
 Suppose v is an assignment for which
v(p) = F, v(q) = T.

If A = (¬p → q) ↔ (p V q) , what is v(A) ?
Solution:
v(A) = v ((¬ p → q ) ↔ ( p V q ))
= v (¬ p → q ) ↔ v ( p V q )
= ( v (¬ p ) → v ( q )) ↔ ( v ( p ) V v ( q ))
= (¬ v ( p ) → v ( q )) ↔ ( v ( p ) V v ( q ))
= (¬F → T) ↔ (F V T)
= (T → T) ↔ (F V T)
= T ↔ T
= T
Equivalence

If A,B are formulas are such that
v ( A ) = v ( B )
for all interpretations v , A is (logically) equivalent to B:
A ≡ B

Example: ¬p V q ≡ p → q since both formulas are true in all interpretations except when v ( p ) = T, v ( q ) = F and are false for that particular interpretation

Caution : ≡ does not mean the same thing as ↔ :

A ↔ B is a formula (syntax)

A ≡ B is a relation between two formula (semantics)

Theorem : A ≡ B if and only if A ↔ B is true in every interpretation; i.e. A ↔ B is a tautology .
Equivalence and Substitution – Examples

Examples of logically equivalent formulas
 Example: Simplify

Solution:
Models and Satisfiability

A propositional formula A is satisfiable iff v ( A ) = T in some interpretation v; s uch an interpretation is called a model for A .

A is unsatisfiable (or, contradictory) if it is false in every interpretation

A set of formulas U = { A 1 ,A 2 ,…,A n } is satisfiable iff there exists an interpretation v such that v ( A 1 ) = v ( A 2 ) = … = v ( A n ) = T; such an interpretation is called a model of U.

U is unsatisfiable if no such interpretation exists

Relevant properties:

If U is satisfiable, then so is U − {Ai} for any i = 1, 2,…, n

If U is satisfiable and B is valid, then U U { B } is also satisfiable

If U is unsatisfiable and B is any formula, U U { B } is also unsatisfiable

If U is unsatisfiable and some A i is valid, then U − {Ai} is also unsatisfiable
Validity

A is valid (or, a tautology), denoted ⊧ A , iff v ( A ) = T , for all i nterpretations v

A is not valid (or, falsifiable), denoted ⊭ A if we can find some interpretation v , such that v(A) = F

Relationship between validity, satisfiability, falsifiability, and unsatisfiability:
Validity (cont’)

Examples:

Valid (tautology):

Not valid, but satisfiable:

False (contradiction):

Theorem:

(a) A is valid if and only if ¬A is unsatisfiable

(b) A is satisfiable if and only if ¬A is falsifiable
Logical Consequence (i.e. Entailment)

Let U be a set of formulas and A a formula. A is a (logical) consequence of U , if any interpretation v which is a model of U is also a model for A :

U ⊧ A

Example:
If some interpretation v is a model for the set
it must satisfy but in this interpretation, we also have
Theory

A set of formulas T is a theory if it is closed under logical consequence. This means that, for every formula A , if T ⊧ A , then A is in T

Let U be a set of formulas. Then, the set of all consequences of U
T ( U ) = { A  U ⊧ A }
is called the theory of U . The formulas in U are called the axioms for the theory T(U) .
INFERENCE
Inference Methods

Several basic methods for determining whether a given set of premises propositionally entails a given conclusion

Truth Table Method

Deductive (Proof) Systems

Resolution
Truth Table Method

One way of determining whether or not a set of premises logically entails a possible conclusion is to check the truth table for the logical constants of the language

This is called the truth table method and can be formalized as follows:

Step 1: Starting with a complete truth table for the propositional constants, iterate through all the premises of the problem, for each premise eliminating any row that does not satisfy the premise

Step 2: Do the same for the conclusion

Step 3: Finally, compare the two tables; If every row that remains in the premise table, i.e. is not eliminated, also remains in the conclusion table, i.e. is not eliminated, then the premises logically entail the conclusion
Example

Simple sentences:

Amy loves Pat: lovesAmyPat

Amy loves Quincy: lovesAmyQuincy

It is Monday: ismonday

Premises:

If Amy loves Pat, Amy loves Quincy:

lovesAmyPat → lovesAmyQuincy

If it is Monday, Amy loves Pat or Quincy:

ismonday → lovesAmyPat ∨ lovesAmyQuincy

Question:

If it is Monday, does Amy love Quincy?

i.e. is ismonday → lovesAmyQuincy entailed by the premises?
Step 1: Truth table for the premises


lovesAmyPat 
lovesAmyQuincy 
ismonday 
lovesAmyPat → lovesAmyQuincy 
ismonday → lovesAmyPat ∨ lovesAmyQuincy 

T 
T 
T 
T 
T 

T 
T 
F 
T 
T 

T 
F 
T 
F 
T 

T 
F 
F 
F 
T 

F 
T 
T 
T 
T 

F 
T 
F 
T 
T 

F 
F 
T 
T 
F 

F 
F 
F 
T 
T 
Step 1: Eliminate nonsat interpretations


lovesAmyPat 
lovesAmyQuincy 
ismonday 
lovesAmyPat → lovesAmyQuincy 
ismonday → lovesAmyPat ∨ lovesAmyQuincy 

T 
T 
T 
T 
T 

T 
T 
F 
T 
T 

T 
F 
T 
F 
T 

T 
F 
F 
F 
T 

F 
T 
T 
T 
T 

F 
T 
F 
T 
T 

F 
F 
T 
T 
F 

F 
F 
F 
T 
T 
Step 2: Truth table for the conclusion


lovesAmyPat 
lovesAmyQuincy 
ismonday 
ismonday → lovesAmyQuincy 

T 
T 
T 
T 

T 
T 
F 
T 

T 
F 
T 
F 

T 
F 
F 
T 

F 
T 
T 
T 

F 
T 
F 
T 

F 
F 
T 
F 

F 
F 
F 
T 
Step 2: Eliminate nonsat interpretations


lovesAmyPat 
lovesAmyQuincy 
ismonday 
ismonday → lovesAmyQuincy 

T 
T 
T 
T 

T 
T 
F 
T 

T 
F 
T 
F 

T 
F 
F 
T 

F 
T 
T 
T 

F 
T 
F 
T 

F 
F 
T 
F 

F 
F 
F 
T 
Step 3: Comparing tables

Finally, in order to make the determination of logical entailment, we compare the two rightmost tables and notice that every row remaining in the premise table also remains in the conclusion table.

In other words, the premises logically entail the conclusion.

The truth table method has the merit that it is easy to understand

It is a direct implementation of the definition of logical entailment.

In practice, it is awkward to manage two tables, especially since there are simpler approaches in which only one table needs to be manipulated

Validity Checking

Unsatisfability Checking
Validity checking

Approach: To determine whether a set of sentences
{ φ_{1},…,φ_{n} }
logically entails a sentence j, form the sentence
( φ_{1} ∧…∧ φ_{n} → φ )
and check that it is valid.

To see how this method works, consider the previous example and write the tentative conclusion as shown below.
(lovesAmyPat → lovesAmyQuincy) ∧ (ismonday → lovesAmyPat ∨ lovesAmyQuincy) → (ismonday → lovesAmyQuincy)

Then, form a truth table for our language with an added column for this sentence and check its satisfaction under each of the possible interpretations for our logical constants
Unsatisfability Checking

It is almost exactly the same as the validity checking method, except that it works negatively instead of positively.

To determine whether a finite set of sentences {φ_{1},…,φ_{n}} logically entails a sentence φ, we form the sentence
(φ_{1} ∧…∧ φ_{n} ∧ ¬ φ)
and check that it is unsatisfiable .

Both the validity checking method and the satisfiability checking method require about the same amount of work as the truth table method, but they have the merit of manipulating only one table
Example – A truth table


p 
q 
r 
p → q 
p → r 
p → r ∨ q 
( p → q ) ∧ ( p → r ) → ( p → r ∨ q ) 
p → r ∧ q 
( p → q ) ∧ ( p → r ) → ( p → r ∧ q ) 

T 
T 
T 
T 
T 
T 
T 
T 
T 

T 
T 
F 
T 
F 
T 
T 
F 
T 

T 
F 
T 
F 
T 
T 
T 
F 
T 

T 
F 
F 
F 
F 
F 
T 
F 
T 

F 
T 
T 
T 
T 
T 
T 
T 
T 

F 
T 
F 
T 
T 
T 
T 
T 
T 

F 
F 
T 
T 
T 
T 
T 
T 
T 

F 
F 
F 
T 
T 
T 
T 
T 
T 
Deductive (proof) systems

Semantic methods for checking logical entailment have the merit of being conceptually simple; they directly manipulate interpretations of sentences

Unfortunately, the number of interpretations of a language grows exponentially with the number of logical constants.

When the number of logical constants in a propositional language is large, the number of interpretations may be impossible to manipulate.

Deductive (proof) systems provide an alternative way of checking and communicating logical entailment that addresses this problem

In many cases, it is possible to create a “proof” of a conclusion from a set of premises that is much smaller than the truth table for the language;

Moreover, it is often possible to find such proofs with less work than is necessary to check the entire truth table
Schemata

An important component in the treatment of proofs is the notion of a schema

A schema is an expression satisfying the grammatical rules of our language except for the occurrence of metavariables in place of various subparts of the expression.

For example, the following expression is a pattern with metavariables φ and ψ.
φ → (ψ → φ)

An instance of a sentence schema is the expression obtained by substituting expressions for the metavariables.

For example, the following is an instance of the preceding schema.
p →( q → p )
Rules of Inference

The basis for proof systems is the use of correct rules of inference that can be applied directly to sentences to derive conclusions that are guaranteed to be correct under all interpretations

Since the interpretations are not enumerated, time and space can often be saved

A rule of inference is a pattern of reasoning consisting of:

One set of sentence schemata, called premises , and

A second set of sentence schemata, called conclusions

A rule of inference is sound if and only if, for every instance, the premises logically entail the conclusions
E.g. Modus Ponens (MP)
φ → ψ φ ψ 
P →(Q →R) P Q →R 
raining → wet raining wet 
(P →Q) →R P → Q R 
wet → slippery wet slippery 

Axiom schemata

The implication introduction schema ( II ), together with Modus Ponens, allows us to infer implications
φ → (ψ → φ)

The implication distribution schema ( ID ) allows us to distribute one implication over another
(φ → (ψ → c)) → ((φ → ψ) → (φ → c))

The contradiction realization schemata ( CR ) permit us to infer a sentence if the negation of that sentence implies some sentence and its negation
(ψ → ¬ φ) → ((ψ → φ) → ¬ ψ)
(¬ ψ → ¬ φ) → ((¬ ψ → φ) → ψ)
Axiom schemata (cont’)

The equivalence schemata ( EQ ) captures the meaning of the ↔ operator
(φ ↔ ψ) → (φ → ψ)
(φ ↔ ψ) → (ψ → φ)
(φ → ψ) → ((ψ → φ) → (ψ ↔ φ))

The meaning of the other operators in propositional logic is captured in the following axiom schemata
(φ ← ψ) ↔ (ψ → φ)
(φ ∨ ψ) ↔ (¬φ → ψ)
(φ ∧ ψ) ↔ ¬(¬φ ∨ ¬ψ)

The above axiom schemata are jointly called the standard axiom schemata for Propositional Logic

They all are valid
Proofs

A proof of a conclusion from a set of premises is a sequence of sentences terminating in the conclusion in which each item is either

a premise,

an instance of an axiom schema, or

the result of applying a rule of inference to earlier items in sequence

Example:
1. p → q
Premise 2. q → r
Premise 3. ( q → r ) → ( p → ( q → r ))
II 4. p → ( q → r )
MP : 3,2 5. ( p → ( q → r )) → (( p → q ) →( p → r ))
ID 6. ( p → q ) → ( p → r )
MP : 5,4 7. p → r
MP : 6,1
Proofs (cont’)

If there exists a proof of a sentence φ from a set Δ of premises and the standard axiom schemata using Modus Ponens, then φ is said to be provable from Δ, written as
Δ ⊢ φ

There is a close connection between provability and logical entailment (⊧):
A set of sentences Δ logically entails a sentence φ if and only if φ is provable from Δ

Soundness Theorem:
If φ is provable from Δ, then Δ logically entails φ.

Completeness Theorem:
If Δ logically entails φ, then φ is provable from Δ.
Proofs (cont’)

The concept of provability is important because it suggests how we can automate the determination of logical entailment

Starting from a set of premises Δ, we enumerate conclusions from this set

If a sentence φ appears, then it is provable from Δ and is, therefore, a logical consequence

If the negation of φ appears, then ¬φ is a logical consequence of Δ and φ is not logically entailed (unless Δ is inconsistent)

Note that it is possible that neither φ nor ¬φ will appear
Resolution

Propositional resolution is an extremely powerful rule of inference for Propositional Logic

Using propositional resolution (without axiom schemata or other rules of inference), it is possible to build a theorem prover that is sound and complete for all of Propositional Logic

The search space using propositional resolution is much smaller than for standard propositional logic

Propositional resolution works only on expressions in clausal form

Before the rule can be applied, the premises and conclusions must be converted to this form
Clausal Forms

A clause is a set of literals which is assumed (implicitly) to be a disjunction of those literals

Example:

Unit clause: clause with only one literal; e.g. {¬ q }

Clausal form of a formula: Implicit conjunction of clauses

Example:
Abbreviated notation:

Notation:

l literal, l c complement of l

C clause (a set of literals)

S a clausal form (a set of clauses)
Resolution – Properties of Clausal Forms
(1) If l appears in some clause of S , but l ^{c} does not appear in any clause, then, if we delete all clauses in S containing l , the new clausal form S' is satisfiable if and only if S is satisfiable
Example: Satisfiability of is equivalent to satisfiability of
(2) Suppose C = { l } is a unit clause and we obtain S' from S by deleting C and l ^{c} from all clauses that contain it; then, S is satisfiable if and only if S' is satisfiable
Example: is satisfiable iff is satisfiable
Resolution – Properties of Clausal Forms (cont’)
(3) If S contains two clauses C and C' , such that C is a subset of C' , we can delete C‘ without affecting the (un)satisfiability of S
Example: is satisfiable iff is satisfiable
(4) If a clause
C
in
S
contains a pair of complementary literals
l
,
l
^{
c
}
, then
C
can be deleted from
S
without affecting its (un)satisfiability
Example: is satisfiable iff is such
Converting to clausal form
Theorem : Every propositional formula can be transformed into an equivalent formula in CNF
1. Implications:
φ1 → φ2 → ¬φ1 ∨ φ2
φ1 ← φ2 → φ1 ∨ ¬φ2
φ1 ↔ φ2 → (¬φ1 ∨ φ2 ) ∧ (φ1 ∨ ¬φ2)
2. Negations:
¬¬φ → φ
¬(φ1 ∧ φ2 ) → ¬φ1 ∨ ¬φ2
¬(φ1 ∨ φ2 ) → ¬φ1 ∧ ¬φ2
3. Distribution:
φ1 ∨ (φ2 ∧ φ3 ) → (φ1 ∨ φ2) ∧ (φ1 ∨ φ3 )
(φ1 ∧ φ2) ∨ φ3 → (φ1 ∨ φ3 ) ∧ (φ2 ∨ φ3)
(φ1 ∨ φ2) ∨ φ3 → φ1 ∨ (φ2 ∨ φ3)
(φ1 ∧ φ2) ∧ φ3 → φ1 ∧ (φ2 ∧ φ3)
4. Operators:
φ1 ∨... ∨ φn → {φ1,...,φn}
φ1 ∧ ...∧ φn → {φ1}...{φn}
Example

Transform the formula
(p → q) → (¬q → ¬p)
into an equivalent formula in CNF
Solution:
Resolution Rule

Suppose C 1 ,C 2 are clauses such that l in C 1 , l ^{c} in C 2 . The clauses C 1 and C 2 are said to be clashing clauses and they clash on the complementary literals l , l ^{c}
C , the resolvent of C 1 ,C 2 is the clause
C 1 and C 2 are called the parent clauses of C.
Resolution Rule (cont’)

Example:
The clauses
clash on P, ¬ P
C 1 ,C 2 also clash on Q, ¬ Q so, another way to find a resolvent for these two clauses is
Resolution (cont’)

Theorem : Resolvent C is satisfiable if and only if the parent clauses C 1 ,C 2 are simultaneously satisfiable

Resolution Algorithm:
Input : S – a set of clauses
Output : “S is satisfiable” or “S is not satisfiable”

Set S _{0} := S

Suppose S _{i} has already been constructed

To construct S _{i+1} , choose a pair of clashing literals and clauses C 1 ,C 2 in S (if there are any) and derive
C := Res( C 1 ,C 2 )
S _{i+1} := S i U {C}

If C is the empty clause, output “S is not satisfiable”; if S _{i+1} = S _{i} , output “S is satisfiable”

Otherwise, set i := i + 1 and go back to Step 2
Resolution (cont’)

Example: Show that (p → q) → (¬q → ¬p) is a valid formula
Solution: We will show that
¬[(p → q) → (¬q → ¬p)]
is not satisfiable.
(1) Transform the formula into CNF:
C is the empty clause
(2) Show, using resolution, that

A derivation of the empty clause from S is called a refutation of S
Resolution (cont’)

Theorem : If the set of a clauses labeling the leaves of a resolution tree is satisfiable, then the clause at the root is satisfiable

Theorem ( Soundness ): If the empty clause is derived from a set of clauses, then the set of clauses is unsatisfiable

Theorem ( Completeness ) If a set of clauses is unsatisfiable, then the empty clause can be derived from it using resolution algorithm
ILLUSTRATION BY LARGER EXAMPLE
Problem Example

For each of these sets of premises, what relevant conclusion or conclusions can be drawn? Explain the rules of inference used to obtain each conclusion from the premises.
(a) “If I eat spicy foods, then I have strange dreams.” “I have strange dreams if there is thunder while I sleep.” “I did not have strange dreams.”
(b) “I am dreaming or hallucinating.” “I am not dreaming.” “If I am hallucinating, I see elephants running down the road.”
(c) “If I work, it is either sunny or partly sunny.” “I worked last Monday or I worked last Friday.” “It was not sunny on Tuesday.” “It was not partly sunny on Friday.”
Solution (a)
 (a) “If I eat spicy foods, then I have strange dreams.” “I have strange dreams if there is thunder while I sleep.” “I did not have strange dreams.”
 The relevant conclusions are: “I did not eat spicy food” and “There is no thunder while I sleep”.
 Let the primitive statements be:
 s, ‘I eat spicy foods’
 d, ‘I have strange dreams’
 t, ‘There is thunder while I sleep’
 Then the premises are translated as: s → d, t → d, and ¬d.
 And the conclusions: ¬s, ¬t.
 Steps Reason
 s → d premise
 ¬d premise
 ¬s Modus Tollens to Steps 1 and 2
 t → d premise
 ¬t Modus Tollens to Steps 4 and 2.
Solution (b)

(b) “I am dreaming or hallucinating.” “I am not dreaming.” “If I am hallucinating, I see elephants running down the road.”

The relevant conclusion is: “I see elephants running down the road.”.

Let the primitive statements be:

d, ‘I am dreaming’

h, ‘I am hallucinating’

e, ‘I see elephants running down the road’

Then the premises are translated as: d ∨ h, ¬d, and h → e.

And the conclusion: e.

Steps Reason

d ∨ h premise

¬d premise

h rule of disjunctive syllogism to Steps 1 and 2

h → e premise

e Modus Ponens to Steps 4 and 3
Solution (c)

(c) “If I work, it is either sunny or partly sunny.” “I worked last Monday or I worked last Friday.” “It was not sunny on Tuesday.” “It was not partly sunny on Friday.”

There is no single relevant conclusion in this problem, its main difficulty is to to represent the premises so that one is able infer anything at all. One possible relevant conclusion is: “It was sunny or partly sunny last Monday or it was sunny last Friday.”.

Let the primitive statements be:

wm, ‘I worked last Monday’

wf , ‘I worked last Friday’

sm, ‘It was sunny last Monday’

st, ‘It was sunny last Tuesday’

sf , ‘It was sunny last Friday’

pm, ‘It was partly sunny last Monday’

pf , ‘It was partly sunny last Friday’

Then the premises are translated as: wm ∨ wf , wm → (sm ∨ pm), wf → (sf ∨ pf ), ¬st, and ¬pf .

And the conclusion: sf ∨ sm ∨ pm.
Solution (c) – Method 1
Steps  Reason  
1  wf → (sf ∨ pf )  premise 
2  ¬wf ∨ sf ∨ pf  expression for implication 
3  ¬pf → (¬wf ∨ sf )  expression for implication 
4  ¬pf  premise 
5  ¬wf ∨ sf  modus ponens to Steps 3 and 4 
6  wf → sf  expression for implication 
7  wm ∨ wf  premise 
8  ¬wm → wf  expression for implication 
9  ¬wm → sf  rule of syllogism to Steps 8 and 6 
10  wm ∨ sf  expression for implication 
11  ¬sf → wm  expression for implication 
12  wm → (sm ∨ pm)  premise 
13  ¬sf → (sm ∨ pm)  rule of syllogism to Steps 11 and 12 
14  sf ∨ sm ∨ pm  expression for implication. 
Solution (c) – Method 2 (Use the rule of resolution)
Steps  Reason  
1  wf → (sf ∨ pf )  premise 
2  ¬wf ∨ sf ∨ pf  expression for implication 
3  ¬pf  premise 
4  ¬wf ∨ sf  rule of resolution to Steps 2 and 3 
5  wm ∨ wf  premise 
6  wm ∨ sf  rule of resolution to Steps 4 and 5 
7  wm → (sm ∨ pm)  premise 
8  ¬wm ∨ sm ∨ pm  expression for implication 
9  sf ∨ sm ∨ pm  rule of resolution to Steps 7 and 8 
EXTENSIONS
Extensions

Propositional logic is not adequate for formalizing valid arguments that rely on the internal structure of the propositions involved

In propositional logic the smallest atoms represent whole propositions (propositions are atomic)

Propositional logic does not capture the internal structure of the propositions

It is not possible to work with units smaller than a proposition

Example:

“A Mercedes Benz is a Car” and “A car drives” are two individual, unrelated propositions

We cannot conclude “A Mercedes Benz drives”
Extensions

It is possible to represent everything you want in propositional logic

But often this is not very efficient

Basic idea : A proposition is expressed as predicate about (on or more) objects in the world

Propositions are predicates and arguments

I.e. Car(Mercedes Benz).

The most immediate way to develop a more complex logical calculus is to introduce rules that are sensitive to more finegrained details of the sentences being used

When the atomic sentences of propositional logic are broken up into terms, variables, predicates, and quantifiers, they yield firstorder logic , which keeps all the rules of propositional logic and adds some new ones
SUMMARY
Summary

Propositional logic is one of the simplest and most common logic and is the core of (almost) all other logics

Propositional logic commits only to the existence of facts that may or may not be the case in the world being represented

Propositional logic quickly becomes impractical, even for very small worlds

This lecture focused on three core aspects of the propositional logic:

Syntax: Vocabulary for expressing concepts without ambiguity

Semantics: Connection to what we're reasoning about

Interpretation  what the syntax means

Reasoning: How to prove things

What steps are allowed
REFERENCES
References

Mandatory Reading:

FirstOrder Logic and Automated Theorem Proofing (2nd edition) by Melvin Fitting

Further Reading:

Mathematical Logic for Computer Science (2nd edition) by Mordechai BenAri

http://www.springer.com/computer/foundations/book/9781852333195

Propositional Logic at The Internet Encyclopedia of Philosophy

http://www.iep.utm.edu/p/proplog.htm

Wikipedia links:

http://en.wikipedia.org/wiki/Propositional_calculus
Next Lecture

Title 

1 
Introduction 

2 
Propositional Logic 

3 
Predicate Logic 

4 
Reasoning 

5 
Search Methods 

6 
CommonKADS 

7 
ProblemSolving Methods 

8 
Planning 

9 
Software Agents 

10 
Rule Learning 

11 
Inductive Logic Programming 

12 
Formal Concept Analysis 

13 
Neural Networks 

14 
Semantic Web and Services 
Questions?
Intelligent Systems
Predicate Logic
Outline

Motivation

Technical Solution

Syntax

Semantics

Inference

Illustration by Larger Example

Extensions

Summary

References
MOTIVATION
Propositional logic is not expressive enough

Suppose we want to capture the knowledge that
Anyone standing in the rain will get wet.
and then use this knowledge. For example, suppose we also learn that
Jan is standing in the rain.

We'd like to conclude that Jan will get wet. But each of these sentences would just be a represented by some proposition, say P, Q and R . What relationship is there between these propositions? We can say
P /\ Q → R
Then, given P /\ Q , we could indeed conclude R . But now, suppose we were told
Pat is standing in the rain.
Propositional logic is not expressive enough (cont’)

We'd like to be able to conclude that Pat will get wet, but nothing we have stated so far will help us do this

The problem is that we aren't able to represent any of the details of these propositions

It's the internal structure of these propositions that make the reasoning valid.

But in propositional calculus we don't have anything else to talk about besides propositions!
⇒ A more expressive logic is needed
⇒ Predicate logic (occasionally referred to as Firstorder logic (FOL) )
TECHNICAL SOLUTIONS
SYNTAX
Objects in Predicate Logic

Constants

Names of specific objects

E.g. doreen, gord, william, 32

Functions

Map objects to objects

E.g. father(doreen), age(gord), max(23,44)

Variables

For statements about unidentified objects or general statements

E.g. x, y, z, …
Terms

Terms represent objects

The set of terms is inductively defined by the following rules:

Constants: Any constant is a term

Variables: Any variable is a term

Functions: Any expression f(t1,…,tn) of n arguments (where each argument is a term and f is a function of arity n ) is a term

Terms without variables are called ground terms

Examples:

c

f(c)

g(x,x)

g(f(c), g(x,x))
Predicate Symbols and Signatures

Predicate symbols represent relations between zero or more objects

The number of objects define a predicate‘s aritiy

Examples:

Likes(george, kate)

Likes(x,x)

Likes(joe, kate, susy)

Friends (father_of(david), father_of(andrew))

Signature: A signature is a collection of constants, function symbols and predicate symbols with specified arities
Connectives

FOL formulas are joined together by logical operators to form more complex formulas (just like in propositional logic)

The basic logical operators are the same as in propositional logic as well:

Negation: ¬p („it is not the case that p“)

Conjunction: p ∧ q („p and q“)

Disjunction: p ∨ q („p or q“)

Implication: p → q („p implies q“ or “q if p“)

Equivalence: p ↔ q („p if and only if q“)
Quantifiers

Two quantifiers: Universal (∀) and Existential (∃)

Allow us to express properties of collections of objects instead of enumerating objects by name

Apply to sentence containing variable

Universal ∀: true for all substitutions for the variable

“for all”: ∀

Existential ∃: true for at least one substitution for the variable

“there exists”: ∃

Examples:

∃ x: Mother(art) = x

∀ x ∀ y: Mother(x) = Mother(y) → Sibling(x,y)

∃ y ∃ x: Mother(y) = x
Formulas

The set of formulas is inductively defined by the following rules:

Preciate symbols : If P is an nary predicate symbol and t1,…,tn are terms then P(t1,…,tn) is a formula.

Negation : If φ is a formula, then ¬φ is a formula

Binary connectives : If φ and ψ are formulas, then (φ → ψ) is a formula. Same for other binary logical connectives.

Quantifiers : If φ is a formula and x is a variable, then ∀xφ and ∃xφ are formulas.

Atomic formulas are formulas obtained only using the first rule

Example: If f is a unary function symbol, P a unary predicate symbol, and Q a ternary predicate symbol, then the following is a formula:
∀x∀y(P(f(x)) → ¬(P(x)) → Q(f(y), x, x)))

Formulas

Any occurrence a variable in a formulate not in the scope of a quantifier is said to be a free occurrence

Otherwise it is called a bound occurrence

Thus, if x is a free variable in φ it is bound in ∀xφ and ∃xφ

A formula with no free variables is called a closed formula

Example: x and y are bound variables, z is a free variable
∀x∀y(P(f(x)) → ¬(P(x)) → Q(f(y), x, z)))
BNF for FOL Sentences

S := <sentence>

<sentence> := <atomicsentence>
 <sentence> <connective> <sentence>
 <quantifier> <variable> ,... <sentence>
 ¬ <sentence>
 ( <sentence> )

<atomicsentence> := <predicate> ( <term> , ... )

<term> := <function> ( <term> , ... )
 <constant>
 <variable>

<connective> := ∧  v  →  ↔

<quantifier> := ∃  ∀

<constant> := “c"  “x1"  “john"  ...

<variable> := "a"  "x"  "s"  ...

<predicate> := “before"  “hasColor"  “raining"  ...

<function> := “mother"  “leftLegOf"  ...
TECHNICAL SOLUTIONS
Semantics
Semantics

Interpretations

Models and Satisfiability

Logical Consequence (Entailment)
Semantics – Overview

Interpretation – Maps symbols of the formal language (predicates, functions, variables, constants) onto objects, relations, and functions of the “world” (formally: Domain, relational Structure, or Universe)

Valuation – Assigns domain objects to variables

The Valuation function can be used for describing value assignments and constraints in case of nested quantifiers.

The Valuation function otherwise determines the satisfaction of a formula only in case of open formulae.

Constructive Semantics – Determines the semantics of complex expressions inductively, starting with basic expressions
Interpretation
Domain, relational Structure, Universe
D finite set of Objects d ^{ 1}, d ^{ 2}, ... , d ^{ n }
R,... Relations over D R ⊆ D ^{ n }
F,... Functions over D F: D ^{ n } → D
Basic Interpretation Mapping
constant I [c] = d Object
function I [f] = F Function
predicate I [P] = R Relation
Valuation V
variable V(x) = d ∈ D
Next, determine the semantics for complex terms and formulae constructively, based on the basic interpretation mapping and the valuation function above.
Interpretation (cont’)
Terms with variables
I [f(t_{1},...,t_{n})) = I [f] (I [t_{1}],..., I [t_{n}]) = F(I [t_{1}],..., I [t_{n}]) ∈D
where I[t_{i}] = V(t_{i}) if t_{i} is a variable
Atomic Formula
I [P(t_{1},...,t_{n})] true if (I [t_{1}],..., I [t_{n}]) ∈ I [P] = R
Negated Formula
I [¬α] true if I [α] is not true
Complex Formula
I[α∨β] true if I [α] or I [β] true
I [α∧β] true if I [α] and I [β] true
I [α→β] if I [α] not true or I [β] true
Interpretation (cont’)
Quantified Formula (relative to Valuation function)
I [∃x:α]  true if α is true with V’(x)=d for some d∈D where V’ is otherwise identical to the prior V. 
I [∀x:α]  true if α is true with V’(x)=d for all d∈D and where V’ is otherwise identical to the prior V. 
Note: ∀x∃y: α is different from ∃y∀x: α
In the first case ∀x∃y:α , we go through all value assignments V'(x), and for each value assignment V'(x) of x, we have to find a suitable value V'(y) for y.
In the second case ∃y∀x:α, we have to find one value V'(y) for y, such that all value assignments V'(x) make α true.
Models and Satisfiability
Given is an interpretation I into a domain D with a valuation V, and a formula φ .
We say that:
φ is satisfied in this interpretation or
this interpretation is a model of φ iff
I[φ] is true.
That means the interpretation function I into the domain D (with valuation V) makes the formula φ true.
Logical Consequence (Entailment)
Given a set of formulae Φ and a formula α .
α is a logical consequence of Φ iff
α is true in every model in which Φ is true.
Notation:
Φ ⊧ α
That means that for every model (interpretation into a domain) in which Φ is true, α must also be true.
TECHNICAL SOLUTIONS
Inference
Entailment and Derivation

Entailment: KB ⊧ Q

Entailment is a relation that is concerned with the semantics of statements

Q is entailed by KB (a set of premises or assumptions) if and only if there is no logically possible world in which Q is false while all the premises in KB are true

Stated positively: Q is entailed by KB if and only if the conclusion is true in every possible world in which all the premises in KB are true

Provability: KB ⊢ Q

Provability is a syntactic relation

We can derive Q from KB if there is a proof consisting of a sequence of valid inference steps starting from the premises in KB and resulting in Q
Important Properties for Inference

Soundness: If KB ⊢ Q then KB ⊧ Q

If Q is derived from a set of sentences KB using a set of inference rules, then Q is entailed by KB

Hence, inference produces only real entailments, or any sentence that follows deductively from the premises is valid

Only sentences that logically follow will be derived

Completeness: If KB ⊧ Q then KB ⊢ Q

If Q is entailed by a set of sentences KB, then Q can also be derived from KB using the rules of inference

Hence, inference produces all entailments, or all valid senteces can be proved from the premises

All the sentences that logically follow can be derived
Inference rules for Predicate Logic

Inference rules for propositional logic apply to propositional logic as well

Modus Ponens, Modus tollens etc.

New (sound) inference rules for use with quantifiers:

Modus Ponens, Modus Tollens

Universal elimination

Existential elimination

Existential introduction

Generalized Modus Ponens (GMP)
Modus Ponens and Modus Tollens

Modus Ponens (Law of Detachment )

Based on claiming 1.) that P is true, and 2.) the implication P → Q , we can conclude that Q is true.

If P, then Q. P, therefore, Q

Example:
RegularlyAttends(joe, lecture),
RegularlyAttends(joe, lecture) → Pass(joe, lecture)
Allow us to conclude:
Pass(joe, lecture)
Modus Ponens and Modus Tollens

Modus Tollens (Denying the consequent)

We again make two claims. 1.) The implication P → Q, and 2.) that Q is false. We can conclude that P is false as well.

If P , then Q . ¬Q Therefore, ¬P

Example:

Detects(alarm, intruder) → GoesOff(alarm),

¬GoesOff(alarm)

Allows us to conclude:

¬Detects(alarm, intruder)
Universal and Existential elimination

Universal elimination

If ∀x P(x) is true, then P(c) is true, where c is any constant in the domain of x

Variable symbol can be replaced by any ground term

Example:

∀x RegularlyAttends(x, lecture) → Pass(x, lecture)

Allow us to conclude (assuming Joe is in the domain of x):

RegularlyAttends(joe, lecture) → Pass(joe, lecture)
Universal and Existential elimination

Existential elimination

From ∃x P(x) infer P(c)

Note that the variable is replaced by a brandnew constant not occurring in this or any other sentence in the KB

Also known as skolemization; constant is a skolem constant

In other words, we don’t want to accidentally draw other inferences by introducing the constant

Convenient to use this to reason about the unknown object, rather than constantly manipulating the existential quantifier

Example:

∃x Pass(x)

Allows us to conclude:

Pass(someStudent) , where someStudent is a new constant
Existential Introduction

If P(c) is true, then ∃x P(x) is inferred.

The inverse of existential elimination

All instances of the given constant symbol are replaced by the new variable symbol

Note that the variable symbol cannot already exist anywhere in the expression

Example:

Eats(Ziggy, IceCream)

∃x Eats(Ziggy,x)
Generalized Modus Ponens (GMP)

Apply modus ponens reasoning to generalized rules

Combines UniversalElimination, and Modus Ponens

E.g, from P(c) and Q(c) and ∀ x (P(x) ∧ Q(x)) → R(x) derive R(c)

GMP requires substitutions for variable symbols

subst(θ, α) denotes the result of applying a set of substitutions defined by θ to the sentence α

A substitution list θ = {v_{1}/t_{1}, v_{2}/t_{2}, ..., v_{n}/t_{n}} means to replace all occurrences of variable symbol v_{i} by term t_{i}

Substitutions are made in lefttoright order in the list

subst ({x/IceCream, y/Ziggy}, eats(y,x)) = eats (Ziggy, IceCream)
Generalized Modus Ponens (GMP)
 General definition: Given
 atomic sentences P_{1}, P_{2}, ..., P_{N}
 implication sentence (Q_{1} ∧ Q_{2} ∧ ... ∧ Q_{N}) → R
 Q_{1}, ..., Q_{N} and R are atomic sentences
 substitution subst(θ, P_{i}) = subst(θ, Q_{i}) for i=1,...,N
 Derive new sentence: subst( θ , R)
 GMP is usually written formally as following:
Generalized Modus Ponens (GMP)  Example
 A clause is a disjunction of literals
 Example: C_{1} ∨ … ∨ C_{n}
 A definite clause is a clause containing exactly one positive literal
 Example: ¬C_{1} ∨ … ∨ ¬C_{n} ∨ H
 GMP is used with a collection of definite clauses
 Example:

Person(John) , Rich(x), ( Person(x) ∧ Rich(x) → Popular(x))

Popular(John)

With the substitution θ = {x/John,y/John} , and

Rich θ = Popular θ
Completeness & Decidability (1)

Completeness : If a knowledgebase KB entails a statement S , we can prove S

Gödel Completeness Theorem : There exists a complete proof system for FOL

KB ⊨ Q ↔ KB ⊢ Q

Gödel proved that there exists a complete proof system for FOL.

Gödel did not come up with such a concrete proof system.

Robinson’s Completeness Theorem : Resolution is such a concrete complete proof system for FOL
Completeness & Decidability (2)

FOL is only semidecidable : If a conclusion follows from premises, then a complete proof system (like resolution) will find a proof.

If there’s a proof, we’ll halt with it (eventually)

However, If there is no proof (i.e. a statement does not follow from a set of premises), the attempt to prove it may never halt

From a practical point of view this is problematic

We cannot distinguish between the nonexistence of a proof or the failure of an implementation to simply find a proof in reasonable time.

Theoretical completeness of an inference procedure does not make a difference in this cases

Does a proof simply take too long or will the computation never halt anyway?
ILLUSTRATION BY LARGER EXAMPLE
Knowledge engineering in FOL

Consider the following english text:
“Anyone passing his Intelligent System exam and winning the lottery is happy. But any student who studies for an exam or is lucky can pass all his exams. John did not study but John is lucky. Anyone who is lucky wins the lottery. Mary did not win the lottery, however Mary passed her IS exam. Gary won the lottery. Gary, John, and Mary are all students.„
 Is John happy?
 Is Mary happy? Is she lucky?
 Did every student pass the IS exam?
Formalization in FOL

Identify the task

Knowledge to be captured, questions to be answered


Assemble the relevant knowledge

Relavant: Relation between exams, lottery jackpots, passing of an exam information about individuals, etc.

Irrelevant: Dates of exams, teacher of the exam, amount of money in the lottery pot, etc


Decide on a vocabulary

Predicates

Constants symbols

E.g. Win(x, Lottery) or Win(x, y) ∧ Lottery(y)?

Formalization in FOL (cont’)

Encode general knowledge of the domain

Anyone passing the IS exams and winning the lottery is happy
∀x Pass(x, IS) ∧ Win(x, Lottery) ⇒ Happy(x)

Any student who studies or is lucky can pass all his exams
∀x ∀y Student(x) ∧ Exam(y) ∧ (StudiesFor(x, y) ∨ Lucky(x)) ⇒ Pass(x,y)

Anyone who is lucky wins the lottery
∀x Lucky(x) ⇒ Win(x, Lottery)

Formalization in FOL (cont’)

Encode the specific problem instance

John did not study, but John is lucky
¬Study(John) ∧ Lucky(John)

Mary did not win the lottery, however Mary passed her IS exam ¬ Win(Mary, Lottery)
Pass(Mary,IS)

Gary won the lottery
Win(Gary, Lottery)

Gary, John, and Mary are all students
Student(Gary), Student(John), Student(Mary)

Formalization in FOL (cont’)
6. Pose queries to the inference procedure
 Is John happy?
 Is Mary happy? Is she lucky?
 Did every student pass the IS exam?

Specific inference procedures (e.g. Resolution ) can be used to systematically answer those queries (see next lecture)
Happy(John) ?
Happy(Mary), Lucky(Mary)?
∃x Student(x) ∧ Pass(x, IS) ?
EXTENSIONS
Extensions

Ordinary firstorder interpretations have a single domain of discourse over which all quantifiers range; manysorted firstorder logic allows variables to have different sorts , which have different domains

The characteristic feature of firstorder logic is that individuals can be quantified, but not predicates; secondorder logic extends firstorder logic by adding the latter type of quantification
Extensions (cont’)

Intuitionistic firstorder logic uses intuitionistic rather than classical propositional calculus; for example, ¬¬φ need not be equivalent to φ

Infinitary logic allows infinitely long sentences; for example, one may allow a conjunction or disjunction of infinitely many formulas, or quantification over infinitely many variables

Firstorder modal logic has extra modal operators with meanings which can be characterised informally as, for example "it is necessary that φ" and "it is possible that φ“
SUMMARY
Summary

Predicate logic differentiates from propositional logic by its use of quantifiers

Each interpretation of predicate logic includes a domain of discourse over which the quantifiers range.

There are many deductive systems for predicate logic that are sound (only deriving correct results) and complete (able to derive any logically valid implication)

The logical consequence relation in predicate logic is only semidecidable

This lecture focused on three core aspects of the predicate logic: Syntax, Semantics, and Inference
REFERENCES
References

Mandatory Reading:

M. Fitting: FirstOrder Logic and Automated Theorem Proving, 1996 SpringerVerlag New York (Chapter 5)

Further Reading:

Michael Huth and Mark Ryan, Logic in Computer Science(second edition) , 2004, Cambridge University Press

http://plato.stanford.edu/entries/modeltheory/

Wikipedia links:

http://en.wikipedia.org/wiki/Firstorder_logic
Questions?
Intelligent Systems
REASONING
Agenda

Motivation

Technical Solution

Introduction to Theorem Proving and Resolution

Description Logics

Logic Programming

Summary

References
MOTIVATION
Motivation
 Basic results of mathematical logic show:
 We can do logical reasoning with a limited set of simple (computable) rules in restricted formal languages like Firstorder Logic (FOL)

Computers can do reasoning
 FOL is interesting for this purpose because:
 It is expressive enough to capture many foundational theorems of mathematics
 Many realworld problems can be formalized in FOL
 It is the most expressive logic that one can adequately approach with automated theorem proving techniques
Motivation
 Due to its theoretical properties (decidability & complexity) FirstOrder Logic is not always an ideal solution
 This motivates research towards formalisms with more practically oriented computitional properties and expressivity
 Description Logics
 Syntactic fragments of FOL
 Focus on decidability and optimized algorithms key reasoning tasks (terminological reasoning / schema reasoning)
 Logic Programming
 Provides different expressivity than classical FOL (nonmonotonic reasoning with nonclassical negation)
 Provides a very intuitive way to model knowledge
 Efficient reasoning procedures for large datasets
TECHNICAL SOLUTIONS
Theorem Proving and Resolution, Description Logics, and Logic Programming
THEOREM PROVING
Introduction to Theorem Proving and Resolution
Introduction – Basic Notions
 A proof system is collection of inference rules of the form:


P _{ 1 } … P _{ n }
____________
C

where C is a conclusion sequent, and P _{ i } ‘s are premises sequents .
 If an infererence rule does not have any premises that are taken to be true (called an axiom ), its conclusion automatically holds.

Example:
Modus Ponens: From P, P → Q infer Q,
Universal instantiation: From (∀x)p(x) infer p(A)
 Theorems:
 Expressions that can be derived from the axioms and the rules of inference.
Resolution  Principle

Resolution is a refutation system .

To prove a statement we attempt to refute its negation

Basic idea: Given a consistent set of axioms KB and goal sentence Q , we want to show that KB ⊧ Q

This means that every interpretation I that satisfies KB also satisfies Q

Any interpretation I only satisfies either Q or ¬ Q, but not both

Therefore, if in fact KB ⊧ Q holds , an interpretation that satisfies KB , satisfies Q and does not satisfy ¬ Q

⇒
Hence KB
∪
{
¬
Q
}
is unsatisfiable, i.e., it is false under all interpretations
Resolution  Principle
 Resolution commonly requires sentences to be in clause normal form (also called conjunctive normal form or CNF)
 A clause is a disjunction of literals (implicitly universally quantified)
 E.g. : l_{1} ∨ … ∨ l_{n}
 A formula in clause normal form conjunction of a set of clauses
 E.g.: C_{1} ∧ … ∧ C_{n} = (l_{n} ∨ … ∨ l_{m}) ∧ … ∧ (l_{i} ∨ … ∨ l_{j})
 High level steps in a resolution proof:
 Put the premises or axioms into clause normal form (CNF)
 Add the negation of the to be proven statement, in clause form, to the set of axioms
 Resolve these clauses together, using the resolution rule , producing new clauses that logically follow from them
 Derive a contradiction by generating the empty clause.
 The substitutions used to produce the empty clause are those under which the opposite of the negated goal is true
Resolution  The Resolution Rule

The resolution rule is a single inference that produces a new clause implied by two clauses (called the parent clauses) containing complementary literals
 Two literals are said to be complements if one is the negation of the other (in the following, ai is taken to be the complement to bj).

The resulting clause contains all the literals that do not have complements. Formally:
 Where a_{1}, …, a_{n}, b_{1}, …, b_{m} are literals, and a_{i} is the complement to b_{j}
 The clause produced by the resolution rule is called the resolvent of the two input clauses.
 A literal and its negation produce a resolvent only if they unify under some substitution σ.
 σ Is then applied to the resolvent
Resolution  The Resolution Rule

When the two initial clauses contain more than one pair of complementary literals, the resolution rule can be applied (independently) for each such pair

However, only the pair of literals that are resolved upon can be removed: All other pairs of literals remain in the resolvent clause

Modus ponens (see last lecture) can be seen as a special case of resolution of a oneliteral clause and a twoliteral clause

Example: Given clauses

P(x, f(a)) v P(x, f(y)) v Q(y) and ¬P(z, f(a)) v ¬ Q(z)

P(x, f(a)) and ¬P(z, f(a)) unify with substitution σ = {x/z}

Therefore, we derive the resolvent clause (to which σ is applied):

P(z, f(y)) v Q(y) v ¬ Q(z)
Resolution – Complete Resolution Algorithm

Convert all the propositions of a knowledge base KB to clause normal form (CNF)

Negate Q (the to be proven statement) and convert the result to clause form. Add it to the set of clauses obtained in step 1.

Repeat until either a contradiction is found, no progress can be made, or a predetermined amount of effort has been expended:
a) Select two clauses. Call these the parent clauses.
b) Resolve them together using the resolution rule . The resolvent will be the disjunction of all the literals of both parent clauses with appropriate substitutions.
c) If the resolvent is the empty clause , then a contradiction has been found. If it is not, then add it to the set of clauses available to the procedure.
Resolution – Algorithm in Pseudocode
procedure resolutionrefutation(KB, Q)
;; KB a set of consistent, true sentences
;; Q is a goal sentence that we want to derive
;; return success if KB ⊢ Q, and failure otherwise
KB = union(KB, ¬Q)
while false not in KB do
pick 2 sentences, S1 and S2, in KB that contain
literals that unify
if none return "failure“
resolvent = resolutionrule(S1, S2)
KB = union(KB, resolvnt)
return "success"
Resolution  Converting to Clause Normal Form
 Resolution expects input in clause normal form
 Step 1: Eliminate the logical connectives → and ↔
 a ↔ b = (a → b) ∧ (b → a)
 a → b = ¬ a ∨ b
 Step 2: Reduce the scope of negation
 ¬ ( ¬ a) = a
 ¬ (a ∧ b) = ¬ a ∨ ¬ b
 ¬ (a ∨ b) = ¬ a ∧ ¬ b
 ¬ ( ∃ X) a(X) = ( ∀ X) ¬ a(X)
 ¬ ( ∀ X) b(X) = ( ∃ X) ¬ b(X)
Resolution  Converting to Clause Normal Form

Step 3: Standardize by renaming all variables so that variables bound by different quantifiers have unique names

( ∀ X) a(X) ∨ ( ∀ X) b(X) = ( ∀ X) a(X) ∨ ( ∀ Y) b(Y)

Step 4: Move all quantifiers to the left to obtain a prenex normal form

A formula is inn prenex normal form if it is written as a string of quantifiers followed by a quantifierfree part

Step 5: Eliminate existential quantifiers by using skolemization
Resolution  Converting to Clause Normal Form
 Step 5: Skolemization
 Skolem constant
 (∃X)(dog(X)) may be replaced by dog(fido) where the name fido is picked from the domain of definition of X to represent that individual X.
 Skolem function
 If the predicate has more than one argument and the existentially quantified variable is within the scope of universally quantified variables, the existential variable must be a function of those other variables.

(∀X)(∃Y)(mother(X,Y)) ⇒ (∀X)mother(X,m(X))
(∀X)(∀Y)(∃Z)(∀W)(foo (X,Y,Z,W))
⇒ (∀X)(∀Y)(∀W)(foo(X,Y,f(X,Y),w))
Resolution  Converting to Clause Normal Form
 Step 6: Drop all universal quantifiers
 Step 7: Convert the expression to the conjunctions of disjuncts

(a ∧ b) ∨ (c ∧ d)

= (a ∨ (c ∧ d)) ∧ (b ∨ (c ∧ d))

= (a ∨ c) ∧ (a ∨ d) ∧ (b ∨ c) ∧ (b ∨ d)
 step 8: Split each conjunction into a separate clauses, which are just a disjunction of negated and nonnegated predicates, called literals
 step 9: Rename so that no variable symbol appears in more than one clause.

(∀X)(a(X) ∧ b(X)) = (∀X)a(X) ∧ (∀ Y)b(Y)
Resolution  Complete Example
 (Nearly) Classical example: Prove “Fido will die.” from the statements
 “Fido is a dog.”
 “All dogs are animals.”
 “All animals will die.”
 Changing premises to predicates
 ∀ (x) (dog(X) → animal(X))
 dog(fido)
 Modus Ponens and {fido/X}
 animal(fido)
 ∀ (Y) (animal(Y) → die(Y))
 Modus Ponens and {fido/Y}
 die(fido)
Resolution – Complete Example
 Equivalent proof by Resolution
 Convert predicates to clause normal form


Predicate form

Clause form


1.
∀
(x) (dog(X)
→
animal(X))

¬
dog(X)
∨
animal(X)


2. dog(fido)

dog(fido)


3.
∀
(Y) (animal(Y)
→
die(Y))

¬
animal(Y)
∨
die(Y)

 Negate the conclusion

4.
¬
die(fido)
¬
die(fido)
Resolution  Complete Example
Resolution – Further Considerations
 Resolution can be used to establish that a given sentence is entailed by a set of axioms
 However, it cannot , in general, be used to generate all logical consequences of a set axioms

Also, the resolution cannot be used to prove that a given sentence is not entailed by a set of axioms
 Resolution defines a search space
 The decision which clauses will be resolved against which others define the operators in the space

A search method is required

Worst case: Resolution is exponential in the number of clauses to resolve
Resolution – Strategies

Order of clause combination is important

N clauses → N^{2} ways of combinations or checking to see whether they can be combined

Search heuristics are very important in resolution proof procedures

Strategies

BreadthFirst Strategy

Set of Support Strategy

Unit Preference Strategy

Linear Input Form Strategy
Concrete System: Prover9
 FirstOrder theorem prover
 Homepage: http://www.cs.unm.edu/~mccune/prover9/
 Successor of the well known “Otter” theorem prover
 Under active development, available for several platforms and released under the GPL
 Graphical userinterface and extensive documentation make Prover 9 comparatively user friendly
 Core of system builds on resolution / paramodulation as inference method
 Prover9 works in parallel with external component „Mace4“
 Mace4 searches for finite models and counterexamples
 This helps to avoid wasting time searching for a proof by first finding a counterexample
Concrete System: Prover9 Input
Concrete System: Prover9 Proof
Concrete System: Prover9 Options
Limitations of FirstOrderTheorem Proving
 Entailment: KB ⊧ Q
 Entailment is a relation that is concerned with the semantics of statements

Q is entailed by KB (a set of premises or assumptions) if and only if there is no logically possible world in which Q is false while all the premises in KB are true
 Entailment for FOL is only semidecidable : If a conclusion follows from premises, then a complete proof system (like resolution) will find a proof.
 If there’s a proof, we’ll halt with it (eventually)
 However, If there is no proof (i.e. a statement does not follow from a set of premises), the attempt to prove it may never halt
Limitations of FirstOrderTheorem Proving
 From a practical point of view this is problematic
 We cannot distinguish between the nonexistence of a proof or the failure of an implementation to simply find a proof in reasonable time.
 Theoretical completeness of an inference procedure does not make a difference in this cases

Does a proof simply take too long or will the computation never halt anyway?

Due to its complexity and remaining limitations FOL is often not suitable for practical applications
 Often less expressive (but decideable) formalisms or formalisms with different expressivity are more suitable:
 Description Logics

Logic Programming
DESCRIPTION LOGICS
Description Logic
 Most Description Logics are based on a 2variable fragment of First Order Logic
 Classes (concepts) correspond to unary predicates
 Properties correspond to binary predicates
 Restrictions in general:
 Quantifiers range over no more than 2 variables
 Transitive properties are an exception to this rule
 No function symbols (decidability!)
 Most DLs are decidable and usually have decision procedures for key reasoning tasks
 DLs have more efficient decision problems than First Order Logic
 We later show the very basic DL ALC as example
 More complex DLs work in the same basic way but have different expressivity
Description Logic Syntax

Concepts/classes (unary predicates/formulae with one free variable)

E.g. Person, Female

Roles (binary predicates/formulae with two free variables)

E.g. hasChild

Individuals (constants)

E.g. Mary, John

Constructors allow to form more complex concepts/roles

Union ⊔: Man ⊔ Woman

Intersection ⊓ : Doctor ⊓ Mother

Existential restriction ∃: ∃hasChild.Doctor (some child is a doctor)

Value(universal) restriction ∀: ∀hasChild.Doctor (all children are doctors)

Complement /negation¬: Man ⊑ ¬Mother

Number restriction ≥n, ≤n

Axioms

Subsumption ⊑ : Mother ⊑ Parent
Description Logic Syntax  Concepts

Classes/concepts are actually a set of individuals

We can distinguish different types of concepts:

Atomic concepts: Cannot be further decomposed (i.e. Person)

Incomplete concepts (defined by ⊑)

Complete concepts (defined by ≡)

Example incomplete concept defintion:

Man ⊑ Person ⊓ Male

Intended meaning: If an individual is a man, we can conclude that it is a person and male.

Man(x) ⇒ Person(x) ∧ Male(x)

Example complete concept definition:

Man ≡ Person ⊓ Male

Intended meaning: Every individual which is a male person is a man, and every man is a male person.

Man(x) ⇔ Person(x) ∧ Male(x)
Description Logic Syntax Roles
 Roles relate to individuals to each other

I.e. directedBy(Pool Sharks, Edwin Middleton), hasChild(Jonny, Sue)

Roles have a domain and a range
 Example:
 Domain(directedBy, Movie)
 Range(directedBy, Person)

Given the above definitions we can conclude that Pool Sharks is a move and that Edwin Middleton is (was) a person.
 Functional Roles
 Roles which have exactly one value
 Usually used with primitive datavalues

A special case of (unqualified) number restriction ≤1 R
Description Logic Syntax Roles
 Transitive Roles

Example: hasAncestor
Simple in a rule language: hasAncestor(X,Z) : hasAncestor(X,Y), hasAncestor(Y,Z).
 Requires more than one variable!

Transitivity can be captured in DLs by role hierarchies and transitive roles:
 Symmetric Roles
 Roles which hold in both directions
 I.e. hasSpouse, hasSibling
 Inverse Roles
 Roles are directed, but each role can have an inverse

I.e. hasParent ≡ hasChild
hasChild(X,Y) ⇔ hasChild(Y,X)
Description Logic Knowledge Bases
 Typically a DL knowledge base (KB) consists of two components
 Tbox (terminology): A set of inclusion/equivalence axioms denoting the conceptual schema/vocabulary of a domain
 Bear ⊑ Animal ⊓ Large
 transitive(hasAncestor)

hasChild ≡ hasParent
 Abox (assertions): Axioms, which describe concrete instance data and holds assertions about individuals
 hasAncestor(Susan, Granny)

Bear(Winni Puh)

From a theoretical point of view this division is arbitrary but it is a useful simplification
A basic Description Logic  ALC
 Smallest propositionally closed DL is ALC
 Only atomic roles
 Concept constructors: ⊔ , ⊓, ¬

Restricted use of quantifiers: ∃, ∀
 “Propositionally closed” Logic in general:

Provides (implicitly or explicitly) conjunction, union and negation of class descriptions
 Example:

Person ⊓ ∀hasChild.(Doctor ⊔ ∃hasChild.Doctor)
A basic Description Logic  ALC
 What can we express in ALC?
 ALC concept descriptions can be constructed as following:
A basic Description Logic  ALC
 Individual assertions :
 a ∈ C

Mary is a Woman.
 Role assertions:
 〈a, b〉 ∈ R

E.g. Marry loves Peter.
 Axioms:
 C ⊑ D
 C ≡ D, because C ≡ D ⇔ C ⊑ D and D ⊑ C

E.g.: A Dog is an animal. A man is a male Person.
The Description Logic Family
 Description Logics are actually a family of related logics
 Difference in expressivity and features, as well as complexity of inference
 Description Logics follow a naming schema according to their features
 ALC = Attributive Language with Complements
 S often used for ALC extended with transitive roles
 Additional letters indicate other extensions, e.g.:
 H for role hierarchy
 O for nominals, singleton classes
 I for inverse roles (e.g., isChildOf ≡ hasChild–)
 N for number restrictions
 Q for qualified number restrictions
 F for functional properties
 R for limited complex role inclusion axioms, role disjointness
 (D) for datatype support
Description Logic  Semantics

Semantics follow standard FOL model theory

Description Logics are a fragment of FOL

The vocabulary is the set of names (concepts and roles) used

I.e. Mother, Father, Person, knows, isRelatedTo, hasChild, …

An interpretation I is a tuple (∆^{I}, ⋅^{I})

∆^{I} is the domain (a set)

⋅^{I} is a mapping that maps:

Names of objects (individuals) to elements of the domain

Names of unary predicates (classes/concepts) to subsets of the domain

Names of binary predicates (properties/roles) to subsets of ∆^{I} x ∆^{I}
Description Logic  Semantics

The semantics of DL are based on standard First Order Model theory

A translation is usually very straightforward, according to the following correspondences (for ALC):

A description is translated to a firstorder formula with one free variable

An individual assertion is translated to a ground atomic formula

An axiom is translated to an implication, closed under universal implication

More complex DLs can be handled in a similar way
Description Logic  Semantics
 Mapping ALC to First Order Logic:
Description Logic  Reasoning

Main reasoning tasks for DL systems:

Satisfiability : Check if the assertions in a KB have a model

Instance checking : Check if an instance belongs to a certain concept

Concept satisfiability : Check if the definition of a concept can be satisfied

Subsumption : Check if a concept A subsumes a concept B (if every individual of a concept B is also of concept A)

Equivalence : A ≡ B ⇔ B ⊑ A and A ⊑ B

Retrieval : Retrieve a set of instances that belong to a certain concept
Description Logic  Reasoning

Reasoning Task are typically reduced to KB satisfiability sat(A) w.r.t. to a knowledge base A

Instance checking : instance(a,C, A) ⇔¬sat(A ⋃ {a: ¬ C})

Concept satisfiability : csat(C) ⇔ sat(A ⋃ {a: ¬ C})

Concept subsumption : B ⊑ A ⇔ A ⋃ {¬B ⊓ C} is not satisfiable ⇔ ¬sat(A ⋃ {¬B ⊓ C})

Retrieval : Instance checking for each instance in the Abox

Note: Reduction of reasoning tasks to one another in polynomial time only in propositionally closed logics

DL reasoners typically employ tableaux algorithms to check satisfiability of a knowledge base
Concrete System: Pellet
 Description Logic reasoner
 Homepage: http://clarkparsia.com/pellet
 Written in Java and available from Duallicensed AGPL license for opensource applications
 Proprietary license available for commercial applications
 Sound and complete reasoner aimed at OWLDL inference based on tableaux procedure
 Covers all constructs in OWLDL
 Supporting major part of OWL 2 specification
 Integrates with popular toolkits and editors
 E.g. Jena, Protege, TopBraid Composer
 Comprehensive handson tutorial available:
 http://clarkparsia.com/pellet/tutorial/iswc09
Concrete System: Pellet

Pellet supports expected standard DL reasoning tasks

Consistency, classification, realization

Conjunctive query answering

Concept satisfiability

Additionally Pellet supports:

SPARQLDL queries

Datatype reasoning

Userdefined datatypes

Nary datatype predicates

Rules support (DLsafe SWRL rules)

Explanation and debugging features

Axiom pinpointing service
Concrete System: Pellet

Explanation & Debugging support

Motivation: It is hard to understand large and/or complex ontologies

Examples:

Why is a specific subclass relation inferred?

Why is an ontology inconsistent?

Pellet provides axiom pinpointing service:

For any inference, returns the (minimal set of) source axioms that cause the inference

Applications can track additional provenance information

Axiom pinpointing is the first step to explanations

Precise justifications (pinpoint parts of axioms) are ongoing work in Pellet‘s developement
Concrete System: Pellet
 Pellet is integrated in OwlSight and performs classification of loaded ontologies
 Lightweight , webbased ontology browser: http://pellet.owldl.com/owlsight/
Concrete System: Pellet
 Classification Results in the Protege Editor
LOGIC PROGRAMMING
Logic Programming
 What is Logic Programming?
 Various different perspectives and definitions possible:
 Computations as deduction
 Use formal logic to express data and programs
 Theorem Proving
 Logic programs evaluated by a theorem prover
 Derivation of answer from a set of initial axioms
 High level (nonprecedural) programming language
 Logic programs do not specifcy control flow
 Instead of specifying how something should be computed, one states what should be computed
 Procedural interpretation of a declarative specification of a problem
 A LP systems procedurally interprets (in some way) a general declarative statement which only defines truth conditions that should hold
Logic Programming Basics

Logic Programming is based on a subset of First Order Logic called Horn Logic

Horn Logic can serve as a simple KR formalism and allows to express

IF <condition> THEN <result> rules

Under certain restrictions reasoning over knowledge bases based on such rules is decideable (in contrast to general ATP within First Order Logic)
Logic Programming Basics – Horn Logic
 Syntactically a LP rule is a First Order Logic Horn Clause
 The semantics of LP are different form the standard Tarski style FOL semantics (details later)
 Herbrand semantics differ from FOL smeantics in the structures it considers to be models (details later)

→ Minimal model semantics
 A FOL Horn clause is a disjunction of literals with one positive literal, with all variables universally quantified:
 (∀) ¬C_{1} ∨ … ∨ ¬C_{n} ∨ H
 This can be rewritten to closer correspond to a rulelike form:
 (∀) C_{1} ∧ … ∧ C_{n} → H
 In LP systems usually the following (non First Order Logic) syntax is used:
 H : C_{1},...,C_{n}
 Such rules can be evaluated very efficiently , since the resolution of two Horn clauses is a again a Horn clause
Logic Programming – Syntax
 The LP vocabulary consists of:
 Constants: b, cow, “somestring”
 Predicates: p, loves
 Function symbols: f, fatherOf
 Variables: x, y
 Terms can be:
 Constants
 Variables
 Constructed terms (i.e. function symbol with arguments)
 Examples:
 cow, b, Jonny,
 loves(John)
Logic Programming – Syntax
 From terms and predicates we can build atoms:
 For nary predicate symbol p and terms t_{1}, ..., t_{n}, p(t_{1}, ..., t_{n}) is an atom
 A ground atom is an atom without variables
 Examples:
 p(x)
 loves(Jonny, Mary), worksAt(Jonny, SomeCompany)
 worksAt(loves(Mary), SomeCompany)
 Literals
 A literal is a an atom or its negation
 A positive literal is an atom
 A negative literal is a negated atom
 A ground literals is a literal without variables
Logic Programming – Syntax
 Rules
 Given a rule of the form H : B_{1},...,B_{n} we call
 H the head of the rule (its consequent)
 B_{1} …. B_{n} the body of the rule (the antecedent or conditions)
 The head of the rule consists of one positive literal H
 The body of the rule consists of a number of literals B_{1}, ..., B_{n}
 B_{1}, …, B_{n} are also called subgoals
 Examples:
 parent(x) : hasChild(x,y)
 father(x) : parent(x), male(x)
 hasAunt(z,y) : hasSister(x,y), hasChild(x,z)
Logic Programming – Syntax
 Facts denote assertions about the world:
 A rule without a body (no conditions)
 A ground atom
 Examples:
 hasChild(Jonny, Sue)
 Male(Jonny)).
 Queries allow to ask questions about the knowledge base:
 Denoted as a rule without a head:
 ? B_{1},...,B_{n}.
 Examples:
 ?  hasSister(Jonny,y), hasChild(Jonny , z) gives all the sisters and children of Jonny
 ?  hasAunt(Mary,y) gives all the aunts of Mary
 ? father(Jonny) ansers if Jonny is a father
Logic Programming – Semantics

There are two main approaches to define the semantics of LP
 Model theoretic semantics
 Computional semanitcs
 Modeltheoretic semantics
 Defines the meaning of a model in terms of its minimal Herbrand model
 Computational semantics (proof theoretic semantics)
 Define the semantics in terms of an evaluation strategy which describes how to compute a model
 These two semantics are different in style, but agree on the minimal model
 LP semantics is only equivalent to standard FOL semantics
 Concerning ground entailment
 As long as LP is not extended with negation
Logic Programming – Semantics
 Semantics of LP vs Semantics of FOL
 Semantics LP defined in terms of minimal Herbrand model
 Only one minimal model
 Semantics FOL defined in terms of FirstOrder models
 Typically, infinitely many FirstOrder models
 The minimal Herbrand model is a FirstOrder model
 Every Herbrand model is a FirstOrder model
 There exist FirstOrder models which are not Herbrand models
 Example:
 p(a), p(x) → q(x)
 The Minimal Herbrand model: {p(a), q(a)}
 This is actually the only Herbrand model!
 FirstOrder models: {p(a), q(a)}, {p(a), q(a), p(b)}, etc.
Logic Programming – Semantics
 Recall:
 Terms not containing any variables are ground terms
 Atoms not containing any variables are ground atoms
 The Herbrand Universe U is the set of all ground terms which can be formed from
 Constancts in a program
 Function symbols in a program
 Example: a, b, c, f(a)
 The Herbrand Base B is the set of all ground atoms which can be built from
 Predicate symbols in a program
 Ground terms from U
 Example: p(a), q(b), q(f(a))
Logic Programming – Semantics

A Herbrand Interpretation I is a subset of the Herbrand Base B for a program

The domain of a Herbrand interpretation is the Herbrand Universe U

Constants are assigned to themselves

Every function symbol is interpreted as the function that applies it

If f is an nary function symbol (n>0) then the mapping from U^{n} to U defined by (t_{1}, …, t_{n}) → f(t_{1}, …, t_{n}) is assigned to f
Logic Programming – Semantics
 A Herbrand Model M is a Herbrand Interpretation which makes every formula true, so:
 Every fact from the program is in M
 For every rule in the program: If every positive literal in the body is in M, then the literal in the head is also in M
 The model of a Logic Program P is the minimal Herbrand Model
 This least Herbrand Model is the intersection of all Herbrand Models
 Every program has a Herbrand Model. Thus every model also has a minimal Herbrand Model.
 This model is uniquely defined only for programs without negation

⇒ A very intuitive and easy way to capture the semantics of LP
 As soon as negation is allowed a unique minimal model is not guaranteed anymore
Logic Programming  Negation
 How do we handle negation in Logic Programs?
 Horn Logic only permits negation in limited form
 Consider (∀) ¬C_{1} ∨ … ∨ ¬C_{n} ∨ H
 Special solution: Negationasfailure (NAF):
 Whenever a fact is not entailed by the knowledge base, its negation is entailed
 This is a form of “Default reasoning”
 This introduces nonmonotonic behavior (previous conclusions might need to be revised during the inference process)
 NAF is not classical negation and pushes LP beyond classical First Order Logic
 This allows a form of negation in rules:
 (∀) C_{1} ∧ … ∧ C_{i} ∧ not C_{n} → H
 H : B_{1}, … B_{i}, not B_{n}
Logic Programming  Negation

Logic Programs can also contain recursion
 Example:
 ancestor (x,y) : hasParent(x, y)

ancestor(x,z) : ancestor(x,y), ancestor(y,z).

This is a problem as soon as negation is allowed since the minimal model is not uniquely defined anymore
 It is useful to consider this using a dependency graph
 A predicate is a node in the graph
 There is a directed edge between predicates q and p if they occur in a rule where q occurs in the head and p in the body.

If the dependency graph contains a cycle then the program is recursive
Logic Programming  Negation
 As soon as negation is allowed, cycles in a dependency graph become problematic.

Example: What is the meaning of win(x) : not win(x) ?
 A Solution: Stratification
 Mark edges with negation in the dependency graph
 Separate predicates which are connected through a positive edge in a individual stratum
 Strata can be (partially) ordered
 If each predicate occurs only in one stratum, then the program is called stratifiable
 Each stratum can be evaluated as usual and independently from other strata
→ This guarantees a unique interpretation of a Logic Program using negation
Logic Programming  Subsets
 Classical Logic Programming
 Allows function symbols
 Does not allow negation

Is turing complete
 Full Logic Programming is not decideable

Prolog programs are not guaranteed to terminate
 Several ways to guarantee the evaluation of a Logic Program
 One is to enforce syntactical restrictions
 This results in subsets of full logic programming

Datalog is such a subset
Logic Programming  Datalog
 Datalog is a syntactic subset of Prolog

Originally a rule and query language for deductive databases
 Considers knowledge bases to have two parts

Extensional Database (EDB) consists of facts

Intentional Database(IDB) consists of nonground rules

Restrictions:

Datalog disallows function symbols

Imposes stratification restrictions on the use of recursion + negation

Allows only range restricted variables ( safe variables )

 Safe Variables:
 Only allows range restricted variables, i.e. each variable in the conclusion of a rule must also appear in a not negated clause in the premise of this rule.

This limits evaluation of variables to finitely many possible bindings
Logic Programming  Reasoning Tasks
 The typical reasoning task for LP systems is query answering
 Ground queries, i.e. ? loves(Mary, Joe)

Nonground query, i.e. ? loves(Mary, x)
 Nonground queries can be reduced to a series of ground queries
 ? loves(Mary, x)

Replace x by every possible value
 In Logic Programming ground queries are equivalent to entailment of facts
 Answering ? loves(Mary, Joe) w.r.t. a knowledge base A is equivalent to checking
A ⊧ loves(Mary, Joe)
Concrete Logic Programming System: IRIS
 Java based Datalog reasoner
 Developed at STI Innsbruck
 Freely available open source project

Homepage: http://www.irisreasoner.org/
 Extensions:
 Stratified / Wellfounded default negation
 XML Schema data types

Various builtin predicates (Equality, inequality, assignment, unification, comparison, type checking, arithmetic, regular expressions,... )
 Highly modular and includes different reasoning strategies
 Bottomup evaluation with Magic Sets optimizations (forwardchaining)

Topdown evaluation by SLDNF resolution (backwardchaining)
Concrete Logic Programming System: IRIS




Concrete Logic Programming System: IRIS
 Example program (also see online demo at http://www.irisreasoner.org/demo):

man('homer').

woman('marge').

hasSon('homer','bart').

isMale(?x) : man(?x).

isFemale(?x) : woman(?x).

isMale(?y) : hasSon(?x,?y).
 Query:

?isMale(?x).
 Output:

Init time: 14ms



Query: ? isMale(?x). ==>> 2 rows in 1ms

Variables: ?x

('bart')

('homer')
Concrete Logic Programming System: IRIS

IRIS performing computations using seminaiv evaluation in combination with stratification and complex datatypes
SUMMARY
Theorem Proving Summary

Resolution is a sound and complete inference procedure for FirstOrder Logic

Due to its complexity and remaining limitations FOL is often not suitable for practical applications

Often formalisms expressivity and complexity results are more appropriate:

Description Logics

Logic Programming
Description Logic Summary
 Description Logics are a syntactic fragment of First Order Logic, based on basic building blocks
 Concepts
 Roles

Individuals
 Limited constructs for building complex concepts, roles, etc.

Many different Description Logics exist, depending on choice of constructs
 Inference in Description Logics focuses on consistency checking and classification
 Main reasoning task: Subsumption

Usually reasoning tasks in DLs can all be reduced to satisfiablity checking
 Efficient Tbox (schema) reasoning
 ABox reasoning (query answering) do not scale so well
Logic Programming Summary
 Logic Programming (without negation) is syntactically equivalent to Horn subset of First Order Logic

The semantics of a Logic Program are however based on its minimal Herbrand Model
 Logic Programming comes in various variants for different applications (as programming language, for knowledge representation)
 Full Logic Programming including function symbols is not decidable
 Datalog is a syntactic restriction of LP, with desirable computational properties

Negationasfailure introduced nonmonotonic behavior and pushes LP beyond the expressivity of First Order Logic

A typical inference task for LP engines is conjunctive query answering
REFERENCES
References and Further Information
 Mandatory Reading:
 Schöning, U., Logic for Computer Scientists (2nd edition), 2008, Birkhäuser
 Chapter 1 & 2:Normal Forms, Resolution
 Chapter 3: Horn Logic, Logic Programming
 Baader, F. et al., The Description Logic Handbook, 2007, Cambridge University Press
 Chapter 2: Basic Description Logics
 Further Reading:
 Lloyd, J.W. , Foundations of logic programming, 1984, Springer
 Robinson, A. and Voronkov, A. Handbook of Automated Reasoning, Volume I, 2001, MIT Press
 Chapter 2: Resolution Theorem Proving
 Ullman, J. D., Principles of Database and KnowledgeBase Systems, Volume I, 1988, Computer Science Press
 Chapter 3: Logic as a Data Model (Logic Programming & Datalog)
 Wikipedia links:
 http://en.wikipedia.org/wiki/Logic_programming
 http://en.wikipedia.org/wiki/Description_logic
 http://en.wikipedia.org/wiki/Theorem_proving
Questions?
Intelligent Systems
Search Methods
Outline
 Motivation
 Technical Solution
 Uninformed Search
 DepthFirst Search
 BreadthFirst Search
 Informed Search
 BestFirst Search
 Hill Climbing
 A*
 Illustration by a Larger Example
 Extensions
 Summary
MOTIVATION
Problem Solving as Search
Motivation
 One of the major goals of AI is to help humans in solving complex tasks
 How can I fill my container with pallets?
 Which is the shortest way from Milan to Innsbruck?
 Which is the fastest way from Milan to Innsbruck?
 How can I optimize the load of my freight to maximize my revenue?
 How can I solve my Sudoku game?
 What is the sequence of actions I should apply to win a game?
 Sometimes finding a solution is not enough, you want the optimal solution according to some “cost” criteria
 All the example presented above involve looking for a plan
 A plan that can be defined as the set of operations to be performed of an initial state, to reach a final state that is considered the goal state
 Thus we need efficient techniques to search for paths, or sequences of actions, that can enable us to reach the goal state, i.e. to find a plan
 Such techniques are commonly called Search Methods
Examples of Problems: Towers of Hanoi

Move disk to peg
to the initial state ((123)()()) a new state is reached ((23)()(1))

Examples of Problems: Blocksworld


TECHNICAL SOLUTION
Search Methods
Search Space Representation




Search Space Representation

Example: Towers of Hanoi*
* A partial tree search space representation
Example: Towers of Hanoi*
* A complete direct graph representation
[http://en.wikipedia.org/wiki/Tower_of_Hanoi]
Search Methods
 A search method is defined by picking the order of node expansion
 Strategies are evaluated along the following dimensions:
 completeness: does it always find a solution if one exists?
 time complexity: number of nodes generated
 space complexity: maximum number of nodes in memory
 optimality: does it always find the shortest path solution?
 Time and space complexity are measured in terms of
 b: maximum branching factor of the search tree
 d: depth of the shortest path solution
 m: maximum depth of the state space (may be ∞)
Search Methods
 Uninformed techniques
 Systematically search complete graph, unguided
 Also known as brute force, naïve, or blind
 Informed methods
 Use problem specific information to guide search in promising directions
UNINFORMED SEARCH
Brute force approach to explore search space
Uninformed Search
 A class of general purpose algorithms that operates in a brute force way
 The search space is explored without leveraging on any information on the problem
 Also called blind search, or naïve search
 Since the methods are generic they are intrinsically inefficient
 E.g. Random Search
 This method selects randomly a new state from the current one
 If the goal state is reached, the search terminates
 Otherwise the methods randomly select an other operator to move to the next state
 Prominent methods:
 DepthFirst Search
 BreadthFirst Search
 UniformCost Search
DepthFirst Search
 DepthFirst Search (DFS) begins at the root node and exhaustively search each branch to it maximum depth till a solution is found
 The successor node is selected going in depth using from right to left (w.r.t. graph representing the search space)
 If greatest depth is reach with not solution, we backtrack till we find an unexplored branch to follow
 DFS is not complete
 If cycles are presented in the graph, DFS will follow these cycles indefinitively
 If there are no cycles, the algorithm is complete
 Cycles effects can be limited by imposing a maximal depth of search (still the algorithm is incomplete)
 DFS is not optimal
 The first solution is found and not the shortest path to a solution
 The algorithm can be implemented with a Last In First Out (LIFO) stack or recursion
DepthFirst Search: Algorithm
List open, closed, successors={};
Node root_node, current_node;
insertfirst(
root_node,open)
while notempty (open );

current_node= removefirst(open);
insertfirst ( current_node,closed);
if ( goal (current_node)) return current_node;
else

successors=
successorsOf
(current_node);
for(x in successors)
 if(
notin
(x,closed))
insertfirst
(x,open);
N.B.= this version is not saving the path for simplicity
DepthFirst Search: Example
DepthFirst Search: Example
Result is: S>A>B>F
DepthFirst Search: Complexity
=1 + b + b^{2}+ ......... + b^{m} = O (b ^{m} )



BreadthFirst Search
 BreadthFirst Search (BFS) begins at the root node and explore levelwise al the branches

BFS is complete
 If there is a solution, BFS will found it

BFS is optimal
 The solution found is guaranteed to be the shortest path possible
 The algorithm can be implemented with a First In First Out (FIFO) stack
BreadthFirst Search: Algorithm
List open, closed, successors={};
Node root_node, current_node;
insertlast(
root_node,open)
while notempty (open );

current_node= removefirst(open);
insertlast ( current_node,closed);
if ( goal (current_node)) return current_node;
else

successors=
successorsOf
(current_node);
for(x in successors)
 if(
notin
(x,closed))
insertlast
(x,open);
N.B.= this version is not saving the path for simplicity
BreadthFirst Search: Example
BreadthFirst Search: Example
Result is: S>A>F
BreadthFirst Search: Complexity
 Time complexity is the same magnitude as DFS
 O (b ^{ m } )
 where m is the depth of the solution
 Space Complexity
 how many nodes can be in the queue (worstcase)?
 assume (worst case) that there is 1 goal leaf at the RHS

so BFS will store all nodes
=1 + b + b ^{ 2 } + ......... + b ^{ m }
= O (b ^{ m } )
Further Uninformed Search Strategies

Depthlimited search (DLS) : Impose a cutoff (e.g. n for searching a path of length n 1), expand nodes with max. depth first until cutoff depth is reached (LIFO strategy, since it is a variation of depthfirst search).

Bidirectional search (BIDI) : forward search from initial state & backward search from goal state, stop when the two searches meet. Average effort O(b ^{d/2} ) if testing whether the search fronts intersect has constant effort

In AI, the problem graph is typically not known. If the graph is known, to find all optimal paths in a graph with labelled arcs, standard graph algorithms can be used
INFORMED SEARCH
Using knowledge on the search space to reduce search costs
Informed Search

Blind search methods take O(b^{m}) in the worst case

May make blind search algorithms prohibitively slow where d is large
 How can we reduce the running time?

Use problemspecific knowledge to pick which states are better candidates
Informed Search

Also called heuristic search

In a heuristic search each state is assigned a “heuristic value” (hvalue) that the search uses in selecting the “best” next step

A heuristic is an operationallyeffective nugget of information on how to direct search in a problem space

Heuristics are only approximately correct
Informed Search: Prominent methods

BestFirst Search

A*

Hill Climbing
Cost and Cost Estimation
f(n)=g(n)+h(n)

g(n) the cost (so far) to reach the node n

h(n) estimated cost to get from the node to the goal

f(n) estimated total cost of path through n to goal
Informed Search: BestFirst Search
 Special case of breadthfirst search
 Uses h(n) = heuristic function as its evaluation function
 Ignores cost so far to get to that node (g(n))
 Expand the node that appears closest to goal
 Best First Search is complete
 Best First Search is not optimal
 A solution can be found in a longer path (higher h(n) with a lower g(n) value)
 Special cases:
 uniform cost search: f(n) = g(n) = path to n
 A* search
BestFirst Search: Algorithm
List open, closed, successors={};
Node root_node, current_node;
insertlast(
root_node,open)
while notempty (open );

current_node= removefirst(open);
insertlast ( current_node,closed);
if ( goal (current_node)) return current_node;
else

successors=
estimationOrderedSuccessorsOf
(current_node);
for(x in successors)
 if(
notin
(x,closed))
insertlast
(x,open);
estimationOrderedSuccessorsOf
returns the list of direct
descendants of the current
node in shortest cost order
N.B.= this version is not saving the path for simplicity
BestFirst Search: Example
In this case we estimate the cost as the distance from the root node (in term of nodes)
BestFirst Search: Example

Result is: S>A>F!

If we consider real costs, optimal solution is: S>B>F
A*

Derived from BestFirst Search

Uses both g(n) and h(n)

A* is optimal

A* is complete
A* : Algorithm
List open, closed, successors={};
Node root_node, current_node, goal;
insertback(
root_node,open)
while notempty (open );

current_node= removefront(open);
insertback ( current_node,closed);
if (current_node==goal) return current_node;
else

successors=
totalEstOrderedSuccessorsOf
(current_node);
for(x in successors)
 if(
notin
(x,closed))
insertback
(x,open);
totalEstOrderedSuccessorsOf
returns the list of direct
descendants of the current
node in shortest total
estimation order
N.B.= this version is not saving the path for simplicity
A* : Example
In this case we estimate the cost as the distance from the root node (in term of nodes)
A* : Example
Result is: S>B>F!
Hill Climbing
 Special case of depthfirst search
 Uses h(n) = heuristic function as its evaluation function
 Ignores cost so far to get to that node (g(n))
 Expand the node that appears closest to goal
 Hill Climbing is not complete
 Unless we introduce backtracking
 Hill Climbing is not optimal
 Solution found is a local optimum
Hill Climbing: Algorithm

List successors={}; Node root_node, current_node, nextNode;

current_node=root_node
while (current_node!=null)
 if (
goal
(current_node)) return current_node;
else
 successors=successorsOf(current_node);
nextEval = ∞; nextNode=null;
for(x in successors)
 if(
eval
(x)
>
nextEval)
 nexEval=eval(x);
nextNode=x;
N.B.= this version is not saving the path for simplicity
Hill Climbing: Example
In this case we estimate the cost as the distance from the root node (in term of nodes)
Hill Climbing: Example
Result is: S>A>B>F!
Not optimal, more if at step 1 h(S)=2 we would have completed without funding a solution
Informed Search Algorithm Comparison
Algorithm

Time

Space

Optimal

Complete

Derivative


Best First Search

O(bm)

O(bm)

No

Yes

BFS


Hill Climbing

O(∞)

O(b)

No

No



A*

O(2^{N})

O(b^{d})

Yes

Yes

Best First Search


b
, branching factor
d , tree depth of the solution
m , maximum tree depth
ILLUSTRATION BY A LARGER EXAMPLE
Route Search

Graph Representation

DepthFirst Search
N.B.: by building the tree, we are exploring the search space!
BreadthFirst search
N.B.: by building the tree, we are exploring the search space!
DepthFirst Search vs BreadthFirst search
 Distance
 DFS: 464 km
 BFS: 358 km
 Q1: Can we use an algorithm to optimize according to distance?
 Time
 DFS: 4 hours 37 mins
 BFS: 5 hours 18 mins
 Q2: Can we use an algorithm to optimize according to time?
 Search space:
 DFS: 5 expansions
 BFS: 26 expansions
 Not very relevant… depends a lot on how you pick the order of node expansion, never the less BFS is usually more expensive
 To solve Q1 and Q2 we can apply for example and BestFirst Search
 Q1: the heuristic maybe the air distance between cities
 Q2: the heuristic maybe the air distance between cities x average speed (e.g. 90km/h)
Graph Representation with approximate distance
BestFirst search
N.B.: by building the tree, we are exploring the search space!
EXTENSIONS
Variants to presented algorithms
 Combine Depth First Search and Breadth First Search, by performing Depth Limited Search with increased depths until a goal is found
 Enrich Hill Climbing with random restart to hinder the local maximum and foothill problems
 Stochastic Beam Search: select w nodes randomly; nodes with higher values have a higher probability of selection
 Genetic Algorithms: generate nodes like in stochastic beam search, but from two parents rather than from one
SUMMARY
Summary
 Uninformed Search
 If the branching factor is small, BFS is the best solution
 If the tree is depth IDS is a good choice
 Informed Search
 Heuristic function selection determines the efficiency of the algorithm
 If actual cost is very expensive to be computed, then Best First Search is a good solution
 Hill climbing tends to stack in local optimal solutions
REFERENCES
References
 Mandatory reading:
 Chap. 4: Görz et. al. (eds.): Handbuch der Künstlichen Intelligenz , 2000
 Further reading and references:
 Chap. 3: Russell, Stuart J.; Norvig, Peter: Artificial Intelligence: A Modern Approach (2nd ed.), 2003
 Chap.23: M. Tim Jones: Artificial Intelligence: A Systems Approach
 Wikipedia links:
 http://en.wikipedia.org/wiki/Depthfirst_search
 http://en.wikipedia.org/wiki/Breadthfirst_search
 http://en.wikipedia.org/wiki/Bestfirst_search
 http://en.wikipedia.org/wiki/A*_search_algorithm
 http://en.wikipedia.org/wiki/Hill_climbing
Questions?
Intelligent Systems
CommonKADS
Agenda
 Motivation

Technical solution, illustrations and extensions
 Overview of CommonKADS
 Knowledge model components
 Template knowledge models
 Knowledge model construction
 Knowledge elicitation techniques
 Example
 Summary
 References

All slides are based on the book: Guus Schreiber, Hans Akkermans, Anjo Anjewierden, Robert de Hoog, Nigel Shadbolt, Walter Van de Velde and Bob Wielinga.
Knowledge Engineering and Management: The CommonKADS Methodology
, MIT Press, ISBN 0262193000. 2000.

And slides are partly based on the CommonKads Course http://www.commonkads.uva.nl
MOTIVATION
Knowledge engineering
 Process of
 eliciting,
 structuring,
 formalizing,
 operationalizing
 information and knowledge involved in a knowledgeintensive problem domain,
 in order to construct a program that can perform a difficult task adequately
Knowledge engineering problems
 Complex information and knowledge is difficult to observe
 Experts and other sources differ
 Multiple representations:
 textbooks
 graphical representations
 skills
Importance of proper knowledge engineering
 Knowledge is valuable and often outlives a particular implementation

knowledge management

Errors in a knowledgebase can cause serious problems
 Heavy demands on extendibility and maintenance

changes over time
TECHNICAL SOLUTION AND ILLUSTRATIONS
Overview of CommonKADS
CommonKADS principles

CommonKADS: a comprehensive methodology for KBS development

Knowledge engineering is not some kind of `mining from the expert's head', but consists of constructing different aspect models of human knowledge

The knowledgelevel principle: in knowledge modeling, first concentrate on the conceptual structure of knowledge, and leave the programming details for later

Knowledge has a stable internal structure that is analyzable by distinguishing specific knowledge types and roles.
CommonKADS Terminology
 Domain
 some area of interest

banking, food industry, photocopiers, car manufacturing
 Task
 something that needs to be done by an agent

monitor a process; create a plan; analyze deviant behavior
 Agent
 the executor of a task in a domain

typically either a human or some software system
 Application
 The context provided by the combination of a task and a domain in which this task is carried out by agents
 Application domain
 The particular area of interest involved in an application
 Application task
 The (toplevel) task that needs to be performed in a certain application
CommonKADS Terminology
 knowledge system (KS)

system that solves a reallife problem using knowledge about the application domain and the application task
 expert system

knowledge system that solves a problem which requires a considerable amount of expertise, when solved by humans.
CommonKADS Model Set
Model Set Overview (1)
 Organization model
 supports analysis of an organization,
 Goal: discover problems, opportunities and possible impacts of KBS (knowledgebased system) development.
 Task model
 describes tasks that are performed or will be performed in the organizational environment
 Agent model
 describes capabilities, norms, preferences and permissions of agents (agent = executor of task).
Model Set Overview (2)
 Knowledge model
 gives an implementationindependent description of knowledge involved in a task.
 Communication model
 models the communicative transactions between agents.
 Design model
 describes the structure of the system that needs to be constructed.
Models exist in various forms
 Model template
 predefined, fixed structure, can be configured
 Model instance
 objects manipulated during a project.
 Model versions
 versions of a model instance can exist.
 Multiple model instances
 separate instances can be developed
 example: ''current'' and ''future'' organization
The Product
 Instantiated models
 represent the important aspects of the environment and the delivered knowledge based system.
 Additional documentation
 information not represented in the filled model templates (e.g. project management information)
 Software
Roles in knowledgesystem development
 knowledge provider
 knowledge engineer/analyst
 knowledge system developer
 knowledge user
 project manager
 knowledge manager
manytomany relations between roles and people
Knowledge provider/specialist
 “traditional” expert
 person with extensive experience in an application domain
 can provide also plan for domain familiarization
 “where would you advise a beginner to start?”
 interprovider differences are common
 need to assure cooperation
Knowledge engineer

specific kind of system analyst

should avoid becoming an "expert"

plays a liaison function between application domain and system
Knowledgesystem developer

person that implements a knowledge system on a particular target platform

needs to have general design/implementation expertise
 needs to understand knowledge analysis

but only on the “use”level
Knowledge user
 interact with the prospective system or are affected indirectly by the system
 Level of skill/knowledge is important factor
 May need extensive interacting facilities
 explanation
 His/her work is often affected by the system
 consider attitude / active role
Project manager

responsible for planning, scheduling and monitoring development work

liaises with client

typically mediumsize projects (46 people)

profits from structured approach
Knowledge manager
 background role
 monitors organizational purpose of
 system(s) developed in a project
 knowledge assets developed/refined
 initiates (followup) projects
 should play key role in reuse
 may help in setting up the right project team
Roles in knowledgesystem development
Knowledge model components
Knowledge model
 specialized tool for specification of knowledgeintensive tasks
 abstracts from communication aspects
 realworld oriented
 reuse is central theme
Knowledge categories
 Domain knowledge
 relevant domain knowledge and information
 static
 Inference knowledge
 basic reasoning steps that can be made in the domain knowledge and are applied by tasks
 Task knowledge
 goaloriented
 functional decomposition
Knowledge Categories: domain knowledge
 domain schema
 schematic description of knowledge and information types
 defined through domain constructs
 knowledge base
 set of knowledge instances
Constructs for domain schema
 Concept
 cf. object class (without operations)
 Relation
 cf. association
 Attribute
 primitive value
 Rule type
 introduces expressions
Example: car concepts
gas dial
value: gas dial 
fuel tank
status: { full, almostempty, empty} 

CONCEPT gas dial; ATTRIBUTES: value: dialvalue; END CONCEPT gasdial; VALUETYPE dialvalue; VALUELIST: {zero, low, normal}; TYPE: ORDINAL; END VALUETYPE dialvalue; 
CONCEPT fueltank; ATTRIBUTES: status: {full, almostempty,empyt}; END CONCEPT fueltank; 
Modelling rules
 knowledge analysis is focused on finding rules with a common structure
 a rule as an instance of a rule type
 models a relation between expressions about feature values (e.g. attribute values)

gasdial.value = zero > fueltank.status = empty
 models set of realworld “rules” with a similar structure
Example rule type

person.income <= 10,000
RESTRICTS
loan.amount <= 2,000
person.income > 10,000 AND person.income <= 20,000
RESTRICTS
loan.amount <= 3,000
Rule type structure
 <antecedent> <connectionsymbol> <consequent>
 example rule:

fuelsupply.status = blocked
CAUSES
gasinengine.status = false;
 flexible use for almost any type of dependency
 multiple types for antecedent and consequent
Inference knowledge
 describes the lowest level of functional decomposition
 basic informationprocessing units:
 inference ⇒ reasoning
 transfer function ⇒ communication with other agents
 why special status?
 indirectly related to domain knowledge
 enables reuse of inference
Example inference: cover

fuel tank is empty leads to lack of gas in engine
if there is no gas in the the engine, then the car does not start
Task knowledge
 describes goals
 assess a mortgage application in order to minimize the risk of losing money
 find the cause of a malfunction of a photocopier in order to restore service.
 design an elevator for a new building.
 describes strategies that can be employed for realizing goals.
 typically described in a hierarchical fashion
Task

Description of the input/output
 Main distinction with traditional functions is that the data manipulated by the task are (also) described in a domainindependent way.

example, the output of a medical diagnosis task would not be a “disease” but an abstract name such as “fault category”
Template knowledge models
Lessons

Knowledge models partially reused in new applications

Type of task = main guide for reuse

Catalog of task templates
The need for reuse

prevent "reinventing the wheel"

cost/time efficient

decreases complexity

qualityassurance
Task template
 reusable combination of model elements
 (provisional) inference structure
 typical control structure
 typical domain schema from task pointofview
 specific for a task type
 supports topdown knowledge modeling
Analytic versus synthetic tasks
 analytic tasks
 system preexists
 it is typically not completely "known"
 input: some data about the system,
 output: some characterization of the system
 synthetic tasks
 system does not yet exist
 input: requirements about system to be constructed
 output: constructed system description
Task hierarchy
Structure of template description in catalog
 General characterization
 typical features of a task
 Default method
 roles, subfunctions, control structure, inference structure
 Typical variations
 frequently occurring refinements/changes
 Typical domainknowledge schema
 assumptions about underlying domainknowledge structure
Classification
 establish correct class for an object
 object should be available for inspection
 "natural" objects
 examples: rock classification, apple classification
 terminology: object, class, attribute, feature
 one of the simplest analytic tasks; many methods
 other analytic tasks: sometimes reduced to classification problem especially diagnosis
Assessment
 find decision category for a case based on domainspecific norms.
 typical domains: financial applications (loan application), community service
 terminology: case, decision, norms
 some similarities with monitoring
 differences:
 timing: assessment is more static
 different output: decision versus discrepancy
Diagnosis
 find fault that causes system to malfunction
 example: diagnosis of a copier
 terminology:
 complaint/symptom, hypothesis, differential, finding(s)/evidence, fault
 nature of fault varies
 state, chain, component
 should have some model of system behavior
 default method: simple causal model
 sometimes reduced to classification task
 direct associations between symptoms and faults
 automation feasible in technical domains
Monitoring
 analyze ongoing process to find out whether it behaves according to expectations
 terminology:
 parameter, norm, discrepancy, historical data
 main features:
 dynamic nature of the system
 cyclic task execution
 output "just" discrepancy => no explanation
 often: coupling monitoring and diagnosis
 output monitoring is input diagnosis
Prediction
 analytic task with some synthetic features
 analyses current system behavior to construct description of a system state at future point in time.
 example: weather forecasting
 often subtask in diagnosis
 also found in knowledgeintensive modules of teaching systems e.g. for physics.
 inverse: retrodiction: bigbang theory
Synthesis
 Given a set of requirements, construct a system description that fulfills these requirements
“Ideal” synthesis method
 Operationalize requirements
 preferences and constraints
 Generate all possible system structures
 Select subset of valid system structures
 obey constraints
 Order valid system structures
 based on preferences
Design
 synthetic task
 system to be constructed is physical artifact
 example: design of a car
 can include creative design of components
 creative design is too hard a nut to crack for current knowledge technology
 subtype of design which excludes creative design => configuration design
Configuration design
 given predefined components, find assembly that satisfies requirements + obeys constraints
 example: configuration of an elevator; or PC
 terminology: component, parameter, constraint, preference, requirement (hard & soft)
 form of design that is well suited for automation
 computationally demanding
Assignment
 create mapping between two sets of objects
 allocation of offices to employees
 allocation of airplanes to gates
 mapping has to satisfy requirements and be consistent with constraints
 terminology
 subject, resource, allocation
 can be seen as a “degenerative” form of configuration design
Planning
 shares many features with design
 main difference: "system" consists of activities plus time dependencies
 examples: travel planning; planning of building activities
 automation only feasible, if the basic plan elements are predefined
 consider use of the general synthesis method (e.g therapy planning) or the configurationdesign method
Planning method
Scheduling
 Given a set of predefined jobs, each of which consists of temporally sequenced activities called units, assign all the units to resources at time slots
 production scheduling in plant floors
 Terminology: job, unit, resource, schedule
 Often done after planning (= specification of jobs)
 Take care: use of terms “planning” and “scheduling” differs
In applications: typical task combinations
 monitoring + diagnosis
 Production process
 monitoring + assessment
 Nursing task
 diagnosis + planning
 Troubleshooting devices
 classification + planning
 Military applications
Knowledge model construction
Process & Product
 so far: focus on knowledge model as product
 bottleneck for inexperienced knowledge modelers
 how to undertake the process of model construction.
 solution: process model
 as prescriptive as possible
 process elements: stage, activity, guideline, technique
 but: modeling is constructive activity
 no single correct solution nor an optimal path
 support through a number of guidelines that have proven to work well in practice.
 knowledge modeling is specialized form of requirements specification
 general software engineering principles apply
Stages in KnowledgeModel Construction
STAGES  TYPICAL ACTIVITIES 
knowledge identification 

knowledge specification 

knowledge refinement 

Stage 1: Knowledge identification
 goal
 survey the knowledge items
 prepare them for specification
 input
 knowledgeintensive task selected
 main knowledge items identified.
 application task classified
 assessment, configuration, combination of task types
 activities
 explore and structure the information sources
 study the nature of the task in more detail
Exploring information sources
 Factors
 Nature of the sources
 wellunderstood?, theoretical basis?
 Diversity of the sources
 no single information source (e.g. textbook or manual)
 diverse sources may be conflicting
 multiple experts is a risk factor.
 Techniques
 text marking in key information sources
 some structured interviews to clarify perceived holes in domain
 main problem:
 find balance between learning about the domain without becoming a full
Guidelines

Talk to people in the organization who have to talk to experts but are not experts themselves

Avoid diving into detailed, complicated theories unless the usefulness is proven

Construct a few typical scenarios which you understand at a global level

Never spend too much time on this activity. Two person weeks should be maximum.
Results exploration
 Tangible
 Listing of domain knowledge sources, including a short characterization.
 Summaries of selected key texts.
 Glossary/lexicon
 Description of scenarios developed.
 Intangible
 your own understanding of the domain
 most important result
List potential components
 goal: pave way for reusing components
 two angles on reuse:
 Task dimension
 check task type assigned in Task Model
 build a list of task templates
 Domain dimension
 type of the domain: e.g. technical domain
 look for standardized descriptions
AAT for art objects ontology libraries, reference models, product model libraries
Stage 2: Knowledge specification
 goal: complete specification of knowledge except for contents of domain models
 domain models need only to contain example instances
 activities
 Choose a task template.
 Construct an initial domain conceptualization.
 Specify the three knowledge categories
Choose task template
 baseline: strong preference for a knowledge model based on an existing application.
 efficient, quality assurance
 selection criteria: features of application task
 nature of the output: fault category, plan
 nature of the inputs: kind of data available
 nature of the system: artifact, biological system
 constraints posed by the task environment:
 required certainty, costs of observations.
Guidelines for template selection
 prefer templates that have been used more than once

empirical evidence

construct annotated inference structure (and domain schema)

if no template fits: question the knowledgeintensity of the task
Guidelines
 use as much as possible existing data models:
 useful to use at least the same terminology basic constructs
 makes future cooperation/exchange easier
 limit use of the knowledgemodeling language to concepts, subtypes and relations
 concentrate on "data"
 similar to building initial class model
 If no existing data models can be found, use standard SE techniques for finding concepts and relations
 use “pruning” method
 Constructing the initial domain conceptualization should be done in parallel with the choice of the task template
 otherwise: fake it
Complete model specification
 Route 1: Middleout
 Start with the inference knowledge
 Preferred approach
 Precondition: task template provides good approximation of inference structure.
 Route 2: Middlein
 Start in parallel with task decomposition and domain modeling
 More timeconsuming
 Needed if task template is too coarsegrained
Middlein and Middleout
Guidelines
 inference structure is detailed enough, if the explanation it provides is sufficiently detailed
 inference structure is detailed enough if it is easy to find for each inference a single type of domain knowledge that can act as a static role for this inference
Guidelines for specifying task knowledge
 begin with the control structure
 "heart" of the method
 neglect details of working memory
 design issue
 choose role names that clearly indicate role
 "modeling is naming"
 do not include static knowledge roles
 realtime applications: consider using a different representation than pseudo code
 but: usage of "receive"
Guidelines for specifying domain knowledge
 domainknowledge type used as static role not required to have exactly the “right’” representation
 design issue;
 key point: knowledge is available.
 scope of domain knowledge is typically broader than what is covered by inferences
 requirements of communication, explanation
Stage 3: Knowledge Refinement
 Validate knowledge model
 Fill contents of knowledge bases
Fill contents of knowledge bases
 schema contains two kinds of domain types:
 information types that have instances that are part of a case
 knowledge types that have instances that are part of a domain model
 goal of this task: find (all) instances of the latter type
 case instances are only needed for a scenario
Guidelines for filling contents
 filling acts as a validation test of the schema
 usually not possible to define full, correct knowledge base in the first cycle
 knowledge bases need to be maintained
 knowledge changes over time
 techniques:
 incorporate editing facilities for KB updating, trace transcripts, structured interview, automated learning, map from existing knowledge bases
Validate knowledge model
 internally and externally
 verification = internal validation
 “is the model right?”
 validation = validation against user requirements
 "is it the right model?"
Validation techniques
 Internal
 structured walktroughs
 software tools for checking the syntax and find missing parts
 External
 usually more difficult and/or more comprehensive.
 main technique: simulation
 paperbased simulation
 prototype system
Paperbased simulation
the user says:
“the car does
not start” 
DIAGNOSIS:
Complaint: engine
behavior.status =
doesnotstart 
Complaint is received,
for which a
diagnostic task is
started 

a possible
cause is that
the fuel tank is
empty 
COVER:
hypothesis; fuel
tank.status =
empty 
One of the three
possible causes is
produced 

in that case we
would expect
the gas
indicator to be
low 
PREDICT:
expectedfinding:
gasdial.value = zer 
The expected finding
provides us with a
way of getting
supporting
evidence for
hypothesis 
Maintenance
 model development is a cyclic process
 models act as information repositories
 continuously updated
 but: makes requirements for support tools stronger
 transformation tools
Domain Documentation Document
 Knowledge model specification
 list of all information sources used.
 list of model components that we considered for reuse.
 scenarios for solving the application problem.
 results of the simulations undertaken during validation
 Elicitation material (appendices)
Summary process
 Knowledge identification
 familiarization with the application domain
 Knowledge specification
 detailed knowledge analysis
 supported by reference models
 Knowledge refinement
 completing the knowledge model
 validating the knowledge model
 Feedback loops may be required
 simulation in third stage may lead to changes in specification
 Knowledge bases may require looking for additional knowledge sources.
 general rule: feedback loops occur less frequently, if the application problem is wellunderstood and similar problems have been tackled
Knowledge elicitation techniques
Elicitation of expertise
 Timeconsuming
 Multiple forms
 e.g. theoretical, howtodoit
 Multiple experts
 Heuristic nature
 distinguish empirical from heuristic
 Managing elicitation efficiently
 knowledge about when to use particular techniques
Expert types
 Academic
 Regards domain as having a logical structure
 Talks a lot
 Emphasis on generalizations and laws
 Feels a need to present a consistent “story”: teacher
 Often remote from daytoday problem solving
 Practitioner
 Heavily into daytoday problem solving
 Implicit understanding of the domain
 Emphasis on practical problems and constraints
 Many heuristics
Human limitations and biases
 Limited memory capacity
 Context may be required for knowledge recollection
 Prior probabilities are typically undervalued
 Limited deduction capabilities
Elicitation techniques
 Interview
 Self report / protocol analysis
 Laddering
 Concept sorting
 Repertory grids
Interview
Interview: Session preparation
 Establish goal of the session
 Consider added value for expert
 Describe for yourself a profile of the expert
 List relevant questions
 Write down opening and closing statement
 Check recording equipment
 audio recording is usually sufficient
 Make sure expert is aware of session context: goal, duration, followup, et cetera
Interview: Start of the session
 Introduce yourself (if required)
 Clarify goal and expectations
 Indicate how the results will be used
 Ask permission for tape recording
 Privacy issues
 Check whether the expert has some questions left
 Create as much as possible a mutual trust
Interview: During the session
 Avoid suggestive questions
 Clarify reason of question
 Phrase questions in terms of probes
 e.g, “why …”
 Pay attention to nonverbal aspects
 Be aware of personal biases
 Give summaries at intermediate points
Interview: End of the session
 Restate goal of the session
 Ask for additional/qualifying
 Indicate what will be the next steps
 Make appointments for the next meetings
 Process interview results ASAP.
 Organize feedback round with expert
 Distribute session results
Unstructured interview
 No detailed agenda
 Few constraints
 Delivers diverse, incomplete data
 Used in early stages: feasibility study, knowledge identification
 Useful to establish a common basis with expert
 s/he can talk freely
Structured interview
 Knowledge engineer plans and directs the session
 Takes form of providerelicitor dialogue
 Delivers more focused expertise data
 Often used for “filling in the gaps” in the knowledge base
 knowledge refinement phase
 Also useful at end of knowledge identification or start of knowledge specification
 Always create a transcript
Interview structure for domainknowledge elicitation
 Identify a particular subtask
 should be relatively small task, e.g. an inference
 Ask expert to identify “rules” used in this task
 Take each rule, and ask when it is useful and when not
 Use fixed set of probes:
 “Why would you do that?”
 “How would you do that?”
 “When would you do that?”
 “What alternatives are there for this action?”
 “What if …?”
 “Can you tell me more about ..?”
Interview pitfalls
 Experts can only produce what they can verbalize
 Experts seek to justify actions in any way they can
 “spurious justification”
 Therefore: supplement with techniques that observe expertise “in action”
 e.g. self report
Self report
Self report
 Expert performs a task while providing a running commentary
 expert is “thinking aloud”
 Session protocol is always transcribed
 input for protocol analysis
 Variations:
 shadowing: one expert performs, a second expert gives a running commentary
 retrospection: provide a commentary after the problemsolving session
 Theoretical basis: cognitive psychology
Requirements for selfreport session
 Knowledge engineer must be sufficiently acquainted with the domain
 Task selection is crucial
 only a few problems can be tackled
 selection typically guided by available scenario’s and templates
 Expert should not feel embarrassed
 consider need for training session
Analyzing the selfreport protocol
 Use a reference model as a coding scheme for text fragments
 Task template
 Look out for “when”knowledge
 Taskcontrol knowledge
 Annotations and markups can be used for domainknowledge acquisition
 Consider need for tool support
Self report guidelines and pitfalls
 Present problems in a realistic way
 Transcribe sessions as soon as possible
 Avoid long sessions (maximum = 20 minutes)
 Presence of knowledge engineer is important
 Be aware of scope limitations
 Verbalization may hamper performance
 Knowledge engineer may lack background knowledge to notice distinctions
Use of self reports
 Knowledge specification stage
 Validation of the selection of a particular reference model
 Refining / customizing a task template for a specific application
 If no adequate task template model is available: use for bottomup reasoning model construction
 but: timeconsuming
Laddering
Laddering
 Organizing entities in a hierarchy
 Hierarchies are meant as preformal structures
 Nodes can be of any type
 class, process, relation, ….
 Useful for the initial phases of domainknowledge structuring
 in particular knowledge identification
 Can be done by expert
 tool support
Example ladder
Concept sorting
Concept sorting
 Technique:
 present expert with shuffled set of cards with concept names
 expert is asked to sort cards in piles
 Helps to find relations among a set of concepts
 Useful in case of subtle dependencies
 Simple to apply
 Complementary to repertory grids
 concept sort: nominal categories
 repertory grid: ordinal categories
Card sort tool
Repertory grids
Repertory grid
 Based on personal construct theory (Kelly, 1955)
 Subject: discriminate between triads of concepts
 Mercury and Venus versus Jupiter
 Subject is asked for discriminating feature
 E.g. “planet size”
 Reiterate until no new features are found
 Rate all concepts with respect to all features
 Matrix is analyzed with cluster analysis
 Result: suggestions for concept relations
 Tool support is required
Example grid
When to use which technique?
 Knowledge identification
 Unstructured interview, laddering
 Knowledge specification
 Domain schema: concept sorting, repertory grid
 Template selection: self report
 Task & inference knowledge: self report
 Knowledge refinement
 Structured interview
EXAMPLE
Housing application
 An application for assigning houses to potential renters
 We now sketch the organization, task and agent model and build the knowledge model on top.
Problem description
 Local government institution is responsible for assignment of rental houses to applicants
 Transparent assignment procedure
 twoweekly magazine with house offers
 publication of results
 Partially automated process
 Existing databases of applicants and residences
Organization models
Organization model 1
Organization Model

Problems and Opportunities Worksheet Organization model 1


Problems and opportunities

assessment takes too much time
not sufficient time for urgent cases


Organizational context

Mission: transparency of procedure, clear applicant responsibility External actors: local council, public opinion, national regulations, … Strategy: broaden scope of market 

Solutions


Organization model 2
Organization model

Variant aspects: Worksheet Organization model 2


Resources

Existing database
Priority calculator


Knowledge

Assessment criteria:
Assignment rules:
Urgency rules:


Culture & power

Hierarchical organization
Employees view the future with some trepidation Management style: history as civil servant 
Organization model 3
Task

Performed by

Where

Knowledge asset(s)

KI?

Signifi
cance


1. Magazine production

Magazine editor

Public service



No

3


2. Data entry applications

Data typist / automated telephone

Residence assignment



No

2


3. Application assessment

Assigner

Residence assignment

Assessment criteria

Yes

5


4. Residence assignment

Assigner

Residence
Assignment

Assignment &
urgency rules

Yes

5

Organization model 4
 Knowledge asset:
 “general residenceapplication norms”
 right form?
 no, should be also in electronic form
 right place, time, quality?
 yes
Task model
 Task = subpart of a business process
 goaloriented valueadding activity
 handles inputs and delivers desired outputs
 in a structured and controlled way
 consumes resources;
 requires (and provides) knowledge/skills
 adheres to quality and performance criteria
 carried out by responsible and accountable agents
Task model
Task model: data flow
Task model: control flow
Agent model
 OM and TM => process/task perspective
 AM: perspective of individual agents
 staff, software systems
 large part: rearrangement of information already in other worksheets
 just a single worksheet
 agent view useful for judging impact
 See attitude matrix
 important input for communication model
Agent model
Agent model
Name

Assigner


Organization

Residenceassignment department


Involved In

3. Application assessment
4. Residence assignment 

Communicates with

Database
Priority calculator 

Knowledge

Assessment criteria
Assignment rules Urgency rules 

Other competencies

Ability to handle problematic nonstandard cases


Responsibilities
& constraints 
Make sure that people are treated equally (no favors). This has been a problem in the past

Knowledge model
 Reading the twoweekly magazine in detail
 organizational goal of transparent procedure makes life easy
 Reading the original report of the local government for setting up the house assignment procedure
 identification of detailed information about handling urgent cases
 Short interviews/conversations
 staff member of organization
 two applicants (the “customers”)
 Now we look into
 Domain model
 Inference structure
 Task layer
Domain model
Inference structure
Task layer
SUMMARY
Summary
 Knowledge model components
 Knowledge model:
 specialized tool for specification of knowledgeintensive tasks
 abstracts from communication aspects
 realworld oriented
 reuse is central theme
 Task knowledge
 goaloriented
 functional decomposition
 Domain knowledge
 relevant domain knowledge and information
 static
 Inference knowledge
 basic reasoning steps that can be made in the domain knowledge and are applied by tasks
 Template knowledge models
 Knowledge models partially reused in new applications
 Type of task = main guide for reuse
 Catalog of task templates
 reusable combination of model elements
 (provisional) inference structure
 typical control structure
 typical domain schema from task pointofview
 specific for a task type
 supports topdown knowledge modeling
Summary (cont‘d)


STAGES

TYPICAL ACTIVITIES


knowledge
identification



knowledge
specification



knowledge
refinement


Summary (cont‘d)
 Knowledge elicitation techniques
 Interview
 Self report / protocol analysis
 Laddering
 Concept sorting
 Repertory grids
 Automated learning techniques
 Induction
REFERENCES
References
 Mandatory reading:
 Guus Schreiber, Hans Akkermans, Anjo Anjewierden, Robert de Hoog, Nigel Shadbolt, Walter Van de Velde and Bob Wielinga. Knowledge Engineering and Management: The CommonKADS Methodology , MIT Press, ISBN 0262193000. 2000.
 Chapters 1, 2, 4, 68
References
 Further reading:
 Guus Schreiber, Hans Akkermans, Anjo Anjewierden, Robert de Hoog, Nigel Shadbolt, Walter Van de Velde and Bob Wielinga. Knowledge Engineering and Management: The CommonKADS Methodology , MIT Press, ISBN 0262193000. 2000.
 http://www.commonkads.uva.nl
References
 Wikipedia links:
 http://en.wikipedia.org/wiki/Knowledge_engineering
 http://en.wikipedia.org/wiki/Knowledgebased_systems
 http://en.wikipedia.org/wiki/Knowledge_modeling
 http://en.wikipedia.org/wiki/Expert_system
Questions?
Intelligent Systems
ProblemSolving Methods
Agenda
 Motivation
 Technical Solution
 Development of Knowledgebased systems towards ProblemSolving Methods
 ProblemSolving Methods
 Illustration by a Larger Example
 Extensions
 Summary
 References
MOTIVATION
Motivation

In order to allow automation in the achievement of complex problems we should like a general solution with the following characteristics:
 Based on reasoning with knowledge;
 Works with a declarative , rather than an algorithmic, representation of knowedge;

Knowledge
 Represents knowledge on the problemsolving process , i.e., the dynamics of the solution;

Process
 Allows reuse of reasoning knowledge;
 Abstracts from implementation and domain to increase reusability.

Reuse
Motivating Example

As a motivating example we consider the task (i.e., goal) of parametric design , which can be defined over:

Design space – the space that contains all possible designs;

Requirements – a finite set of which are assumed to be provided by the user;

Constraints – a finite set of which model additional conditions for a valid design;

Preference – a function over the design space which can be used to discriminate between different solutions.
Motivating Example

Domain knowledge is implicit in the parametric design description in several places:

Design space – the concrete definition of the design space is domain specific knowledge
[Requirements]

Constraints – domain knowledge concerning regularities in the domain in which the design is constructed;
[Preference]
Motivating Example

Formally, a design artifact is described by a set of attributevalue pairs.
Let A_{1}, ..., A_{n} be a fixed set of parameters with fixed ranges R_{1}, ..., R_{n}:

Design space is the cartesian product R_{1} × ... × R_{n};

Requirements – a relation R which defines a subset of this space;

Constraints – a relation C which defines a subset of this space;

Preference – a partial function P having all possible designs as its domain and preference values as its range.
Motivating Example
Motivating Example
 Several types of design are implicitly defined by these aspects of the problem:
 Possible design – An element in design space (∈ R_{1} × ... × R_{n});
 Desired design – A design that fulfills all requirements (∈ R);
 Valid design – A design that fulfills all constraints (∈ C);
 Solution – A design which is desired and valid (∈ R ∩ C);

Optimal solution
 A solution for which no other solution which has a higher preference value exists
(any solution x such that ∀y ∈ R_{1} × ... × R_{n} . P(y) ≤ P(x)).
 There is also the possible extension:
 Acceptable solution  Given a threshold t, an acceptable solution is any solution x such that P(x) > t.
Motivating Example
Motivating Example

An inefficient naive approach to parametric design is called generate & test , which depends on the following inferences:

generate – requires knowledge that describes what constitutes a possible design (i.e., the design space);

Rtest – requires knowledge that describes which possible designs are desired (i.e. the user requirements);

Ctest – requires knowledge that describes which possible designs are valid (i.e., the domain constraints);

select – requires knowledge that evaluates solutions (i.e., knowledge that describes what constitutes a preferred design).
Motivating Example
Motivating Example
 This naive solution can also be represented as:

possible design :=
generate
_{
all
};

valid design :=
Ctest
(possible design);

desired design :=
Rtest
(possible design);

solution := valid design
∩
desired design;

optimal solution :=
select
(solution)
 Using the definition of acceptable solution this can be made somewhat more efficient as:

repeat

possible design :=
generate
_{
one
}
;

valid design :=
Ctest
(possible design);

desired design :=
Rtest
(possible design);

solution := valid design
∩
desired design;

acceptable solution :=
select
(solution)

until
ø ≠ acceptable solution
Lessons from Example
 generate & test has the following characteristics:
 it separates the different types of knowledge;
 it is not efficient (all possible designs are generated);
 It may not terminate if the design space is infinite.
 From the literature on expert systems [Stefik et al., 1983]:
 “an important issue is the distribution of knowledge between the generator and the tester: putting as much knowledge as possible into the generator often leads to a more efficient search.”
 A much more clever strategy is therefore to use these knowledge types to guide the generation of possible designs.
 However to do so requires strong assumptions about the domain knowledge.
TECHNICAL SOLUTIONS
DEVELOPMENT OF KNOWLEDGEBASED SYSTEMS TOWARDS PROBLEMSOLVING METHODS
Development

General Problem Solver

Knowledgeispower hypothesis

Knowledge levels
3a. Newell’s 3 levels of knowledge
3b. Brachman’s 5 levels of knowledge

Problem Solving Methods
1. General Problem Solver

The General Problem Solver (GPS) is a universal problem solving approach.

GPS is the first approach that makes the distinction between knowledge of problems domains and how to solve problems

GPS is domain and task independent approach.

GPS does not put any restrictions both on the domain knowledge and on the task.

Examples of GPS are: automated theorem proving and generic search methods
Automated theorem proving
 Automatic theorem provers are GPS for which every problem can be expressed as logical inference
 Automated theorem proving is about proving of mathematical theorems by a computer program
Generic Search Methods

Generic Search Methods are GPS for which every problem can be expressed as search

One particular example of a Generic Search Method is the A* algorithm.

A* works for problems that can be represented as a state space i.e. a graph of states. Initial conditions of the problem are represented as start state , goal conditions are represented as end state

A* is an informed search or heuristic search approach that uses the estimation function:
f(n)=g(n)+h(n)

g(n) the cost to get from the star state to current state n

h(n) estimated cost to get from current state n to end state

f(n) estimated total cost from start state through current state n to the end state
See Lecture 5
1. General Problem Solver (1)

However, GPS has a set of limitations :

It works in theory but in practice works only on toy problems (e.g. Tower of Hanoi)

Could not solve realworld problems because search was easily lost in the combinatorial explosion of intermediate states
2. Knowledgeispower hypothesis
Knowledgeispower hypothesis , also called the Knowledge Principle was formulated by E.A. Feigenbaum in 1977:
“knowledge of the specific task domain in which the program is to do its problem solving was more important as a source of power for competent problem solving than the reasoning method employed” [Feigenbaum, 1977]
2. Knowledgeispower hypothesis (1)

The Knowledgeispower hypothesis shifted the focus on how to build intelligent systems from inference to the knowledge base .

Problem solving is guided by experiential, qualitative, heuristic knowledge.

The meaning of intelligence as knowledge is the common meaning in English world.

The Central Intelligence Agency (CIA) defines intelligence as knowledge.

The Knowledgeispower hypothesis lead to the emergence of a new filed i.e. expert systems and a new profession i.e. knowledge engineer
3. Knowledge levels
3a. Newell’s 3 levels of knowledge
3b. Brachman’s 5 levels of knowledge
3a. Newell’s 3 levels of knowledge [Newell, 1981]

In his work from 1981, Newell tried to answer questions such as:

How can knowledge be characterised?

What is the relation of this characterisation and the representation of knowledge?

What is characteristic about a system which holds knowledge?

Newell distinguished 3 levels in the context of knowledge representation:

Knowledge Level

Logical Level

Implementation Level
3a. Newell’s 3 levels of knowledge (1)

Knowledge Level

The most abstract level of representing knowledge.

Concerns the total knowledge contained in the Knowledge Base

Example:
The automated DBInformation system knows that a trip from Innsbruck to Vienna costs 120€
3a. Newell’s 3 levels of knowledge (2)

Logical Level

Encoding of knowledge in a formal language.

Example:
Price(Innsbruck, Vienna, 120)
3a. Newell’s 3 levels of knowledge (3)

Implementation Level

The internal representation of the sentences.
 Example:
 As a String “Price(Innsbruck, Vienna, 120)”

As a value in a matrix
3b. Brachman’s 5 Levels of Knowledge [Brachman, 1979]
 Brachman defines 5 levels for different types of representations.
 Levels interpret the transition from data to knowledge.
 Each level corresponds to an explicit set of primitives offered to the knowledge engineer.
 Ordering of knowledge levels from simple/abstract to complex/concrete:
 Implementational Level
 Logical Level
 Epistemological Level
 Conceptual Level
 Linguistic Level
3b. Brachman’s 5 Levels of Knowledge (1)

Implementational Level

The primitives are pointers and memory cells.

Allows the construction of data structures with no a priori semantics
3b. Brachman’s 5 Levels of Knowledge (2)

Logical Level

The primitives are logical predicates, operators, and propositions.

An index is available to structure primitives.

A formal semantic is given to primitives in terms of relations among objects in the real world

No particular assumption is made however as to the nature of such relations
3b. Brachman’s 5 Levels of Knowledge (3)

Epistemological Level

The primitives are concept types and structuring relations.

Structuring relations provide structure in a network of conceptual types or units. (i.e. inheritance: conceptual units, conceptual subunits)

The epistemological level links formal structure to conceptual units

It contains structural connections in our knowledge needed to justify conceptual inferences.
3b. Brachman’s 5 Levels of Knowledge (4)

Conceptual Level

The primitives are conceptual relations, primitive objects and actions.

The primitives have a definite cognitive interpretation, corresponding to languageindependent concepts like elementary actions or thematic roles
3b. Brachman’s 5 Levels of Knowledge (5)

Linguistic Level

The primitives are words, and (lingustic) expressions.

The primitives are associated directly to nouns and verbs of a specific natural language

Arbitrary relations and nodes that exist in a domain
4. Problem Solving Methods

Problem Solving Methods (PSM) abstract from details of the implementation of the reasoning process.
 Characteristics of PSM [Birmingham&Klinker, 1993] :
 A PSM specifies which inference actions have to be carried out for solving a given task.
 A PSM determines the sequence in which these actions have to be activated.

Knowledge roles determine which role the domain knowledge plays in each inference action.
PROBLEMSOLVING METHODS
Technical Solution Overview
 What are problemsolving methods (PSMs) ?

“Reasoning strategies that gain efficiency through assumptions. ”
[Fensel, 2000]
Technical Solution Overview
 Problemsolving methods achieve an efficient realization of functionality by making assumptions .
 Assumptions put restrictions on the context of the problemsolving method, such as the domain knowledge and the possible inputs of the method or the precise definition of the functionality.
 Assumptions play two roles:
 they formulate requirements on reasoning support that is assumed by PSMs;
 they put restrictions on the reasoning support that is provided by PSMs.
 In consequence, assumptions link PSMs with the domain knowledge they use and tasks they are applied to.
Technical Solution Overview
Example Revisited

We consider again the task of parametric design.
 A more efficient method is named propose & revise and depends on the following inferences:
 propose – derives an initial design based on the requirements;
 Ctest – as before;

revise – tries to improve an incorrect design based on the feedback of the Ctest step.

In other words, instead of proposing a complete design which is then repaired, we can also incrementally develop a design and repair it at each step where a constraint violation occurs.
Example Revisited
Example Revisited
 A parameter which should receive a value in the next propose step is nondeterministically selected:
 The selection process does not make further assumptions about knowledge that could guide this second selection step

The implicit assumption is that this selection does not a
affect the performance of the problem solving process and the quality of its result
 These are very strong assumptions because to improve performance, heuristic methods are definitely needed
 At any time there is either precisely one applicable propose rule or one user input to derive the value of the selected parameter
 A parameter should not depend on itself (no recursive derivation rules)
Example Revisited
 revise is decomposed into:
 selectviolation  nondeterministically selects a constraint violation from those detected by Ctest; implicit assumption is that this selection does not influence the performance of the problem solving method and the quality of the result; strong assumption again
 derivefixes  computes the set of all possible fix combinations that could possibly resolve the selected constraint violation; each combination must be finite
 selectfix  selects a fix combination, guided by a costfunction
 applyfix  applies a fix combination
Example Revisited
 Test – propose & revise does not require an explicit Rtest, the method assumes that:
 propose derives only desired designs;

revise delivers designs that are desired or that requirement violations that it does not x must be accepted.
 Selection – does not contain such a process concerning user preferences:

It assumes that the propose step and the revise step deliver acceptable (or optimal) solutions or that the functionality of the task is reduced to finding an arbitrary solution.
Description of ProblemSolving Methods
 The main elements of a specification in the PSM framework are:

the task – specifies the goals that should be solved in order to solve a given problem. A second part of a task specification is the definition of requirements on domain knowledge;

the problemsolving method – describes an reasoning steps to perform a task as an operational specification separately from a description of the competence, and a description of the requirements on domain knowledge;

the domain model – usually ontologybased description in three parts, a metalevel characterisation of properties, the domain knowledge, and (external) assumptions of the domain model;

the adapter – maps the different terminologies of the task definition, problemsolving method and domain model. Moreover, gives further requirements and assumptions needed to relate the competence of the PSM with the functionality of the task.
Description of ProblemSolving Methods
Description of ProblemSolving Methods
 Several proof obligations follow the conceptual model of such a specification of a knowledgebased system:
 POi : the consistency of the task definition to ensure that a model exists, otherwise one could define an unsolvable problem;
 POii : that the operational specification of the PSM describes a terminating process which has the competence as specified;
 POiii : the internal consistency of the domain knowledge and domain model, also that the assumptions on domain knowledge implies its metalevel characterisation;

POiv
: relationships between the specification elements –
 the requirements of the adapter imply the knowledge requirements of the PSM and task,
 the adapter’s additional requirements on domain knowledge and assumption guarantee that the competence of the PSM is strong enough for task,
 The requirements of the adapter are implied by the meta knowledge of the domain model.
PSM Framework Example
 To illustrate the description of PSMs we consider search
 Local search can be refined into other versions using adapters, specifically:
 Hillclimbing : a local search algorithm which stops when it has found a local optimum , based on recursively considering the successors to a start object, selecting better at each stage;
 SetMinimizer : finds a minimal but still correct subset of a given set, with respect to hillclimbing –
 generic ‘object’ becomes a set,
 ‘successor’ relationship is hardwired,
 preference is implicit;
 Abductive Diagnosis : receives a set of observations as input and delivers a complete (explains all input data) and parsimonious (no subset of hypotheses explains all observations) explanation .
PSM Framework Example

Local search can be represented in the following operational specification:
operational specification local search
local search(X)
output := recursion(currents);
successors := generate(X); new := select2(X, successors);
then output := select3(X); else recursion(new); 
select1(x) ⊆ x

PSM Framework Example
 Adapters between refinement levels can be represented as follows:

PSM refinement adapter
setminimizer > abductionmethod

correct(x) = complete(x);

input = {h  h is hypothesis};

H ⊆ H’ → explain(H) ⊆ explain(H’)

endPSM refinement adapter

PSM refinement adapter
hillclimbing > setminimizer

correct(input);

select1(x) = {x};

successor(x, y) ↔ ∃z . (z ∈ x ∧ y = x \ {z});

x < y ↔ correct(y) ∧ y ⊂ x

endPSM refinement adapter

PSM refinement adapter
localsearch > hillclimbing

select1(x) = 1;

¬∃z . (z ∈ y’ ∧ y < z) → select2({y}, y’) = {y};

select2({y}, y’) = 1

endPSM refinement adapter
ILLUSTRATION BY A LARGER EXAMPLE
SisyphusI

The SisyphusI office allocation problem was the first of a series of test applications for approaches to building knowledgebased systems.

The aim is to automate the problemsolving behaviour of ‘Siggi’, a hypothetical domain expert .

Specifically Siggi is required to make assignments of the staff of a research group ‘YQT’ to offices.

The 4page problem statement describes the office layout and relevant data about the 15 group members.

A ‘protocol’ is provided, describing the steps Siggi takes.
SisyphusI Office Layout
SisyphusI Protocol
SisyphusI YQT Members Data

From the protocol can be drawn data such as:
SisyphusI PSMBased Approach

The approach used rests on mapping from task concepts to method concepts as shown [Motta, 1999] :
SisyphusI Values Ranges

From the protocol can be drawn value ranges, based on given justifications in the protocol:
SisyphusI Requirements and Constraints

From the protocol can be drawn the following requirements and constraints:
SisyphusI Preferences

From the protocol can be drawn the following preferences:
SisyphusI Cost Function
 Cost function produces a 4dimensional vector,
< n_{1}, n_{2}, n_{3}, n_{4} > , where:

n_{1} measures the distance between the room of the head of group and that of the secretaries;

n_{2} measures the distance between the manager’s room and the rooms of the head of group and the secretaries;

n_{3} measures the distance between the heads of projects and the head of group and secretaries;

n_{4} provides a measure of the ‘project synergy’ afforded by a solution.
SisyphusI Cost Function (cntd.)

A design model in the SisyphusI domain d1, with cost function
< n_{11}, n_{12}, n_{13}, n_{14} > is cheaper than a design model d2, with cost
< n_{21}, n_{22}, n_{23}, n_{24} > , iff one or more of the following conditions are satisfied:

n_{11} < n_{21}

n_{11} = n_{21} and n_{12} < n_{22}

n_{11} = n_{21} and n_{12} = n_{22} and n_{13} < n_{23}

n_{11} = n_{21} and n_{12} = n_{22} and n_{13} = n_{23} and n_{14} < n_{24}
SisyphusI PSMbased Solutions
 Solving by Gendesignpsm (Generic Model for Parametric Design), which enhances simple depthfirst search via:
 Focus selection – a DSR strategy analyses
 Value ranges associated with each parameter,
 The most constrained parameter in the current design model ;
 Operator selection – operator chosen according to given ordering.
 Solving by HCdesign (Hill climbing) – same competence as Gendesignpsm, but much less efficient (measured at 10%, as compared to 78%).
 Solving by A*design, based on heuristic function using estimated cost function – better competence (optimal solution), but comparable efficiency to HCdesign.
EXTENSIONS
Application to Web Service Models

A standard language was developed for the description of PSMs called the Unified ProblemSolving Method Language (UPML) [Fensel et al, 2002]

UPML was applied to the modelling of Web Services in the Web Service Modelling Framework (WSMF) [Fensel & Bussler, 2002]

The WSMF approach was encoded in the Web Service Modeling Ontology (WSMO) – wherein ontologybased models are builts for goals (~tasks), services (~methods) and these are linked by mediators (~adapters) [Fensel et al., 2007]

WSMO was encoded in a family of ontology languages, WSML, and an opensource implementation was carried out in WSMX.
SUMMARY
Summary

Problemsolving methods offer a means to structure and reuse elements of knowledgebased systems by abstracting from their domain.

Efficiency is achieved by introducing assumptions that either restrict the size of the problem or that postulate strong requirements on the available domain knowledge.

Adapters link tasks and PSMs to domains, allowing reuse of both, and express refinements between PSMs, allowing libraries to be built of these.
REFERENCES
References
 Mandatory reading:

[Fensel, 2000] “ProblemSolving Methods: Understanding, Description, Development and Reuse”, D. Fensel, Springer LNAI 1791, 2000
 Further reading:
 [Motta, 1999] “Reusable Components for Knowledge Modelling: Case Studies in Parametric Design”, E. Motta, IOS Press, 1999
 [Schreiber et al., 1994] “CommonKADS: A Comprehensive Methodology for KBS Development”, A. Th. Schreiber, B. Wielinga and J. M. Akkermans, W. Van de Vedle, and
 R. De Hoog, IEEE Expert , 9(6):28–37, 1994

[Stefik et al., 1983] “Basic Concepts for Building Expert Systems”, M. Stefik, J. Aitkins, R. Balzer, J. Benoit, L. Birnbaum, F. HayesRoth, and E. Sacerdoti in Building Expert Systems , AddisonWesley, 1983
References

[Smith & Lowry, 1990] “Algorithm Theories and Design Tactics”, D. R. Smith and M. R. Lowry in Science of Computer Programming , 14:305–321AddisonWesley, 1990

[Fensel et al, 2002] “The Unified Problemsolving Method Development Language UPML”, D. Fensel, E. Motta, R. Benjamins, M. Crubezy, S. Decker, M. Gaspari, R. Groenboom, W. Grosso, F. van Harmelen, M. Musen, E. Plaza, G. Schreiber, R. Studer, B. Wielinga, Knowledge and Information Systems, 5(1), 83–131, 2002

[Fensel & Bussler, 2002] “The Web Service Modeling Framework (WSMF)”, D. Fensel and C. Bussler, Electronic Commerce Research and Applications, 1(2), 2002

[Fensel et al., 2007] “Enabling Semantic Web Services:The Web Service Modeling Ontology (WSMO)”, D. Fensel, H. Lausen, A. Polleres, J. de Bruijn, M. Stollberg, D. Roman, and J. Domingue, Springer, 2007
References

[Brachman, 1979] “On the Epistemological Status of Semantic Networks” R. J. Brachman. In: N.V. Findler (ed.): Associative Networks: Representation and Use of Knowledge by Computers. New York: Academic Press, 1979, 350.

[Birmingham&Klinker, 1993] “Knowledge Acquisition Tools with Explicit ProblemSolving Methods”, W. Birmingham and G. Klinker. The Knowledge Engineering Review 8, 1 (1993), 525

[Feigenbaum, 1977] “The Art of Artificial Intelligence: Themes and Case Studies of Knowledge Engineering,” E.A. Feigenbaum. Proceedings of the International Joint Conference on Artificial Intelligence, Cambridge, MA, 1977

[Newell, 1981] “The Knowledge Level” Newell, AI Magazine 2 (2), 1981, p. 120
References

Wikipedia and other links:

http://en.wikipedia.org/wiki/Problem_solving

http://www.wsmo.org/

http://www.wsmo.org/wsml/

http://www.wsmx.org/

http://www.wsmo.org/papers/publications/wsmf.paper.pdf
Questions?
Intelligent Systems
Planning
Agenda

Motivation

Technical Solution

Illustration by a Larger Example

Extensions

Summary

References
Motivation
Motivation
 What is Planning?
 “Planning is the process of thinking about the activities required to create a desired goal on some scale” [Wikipedia]
 We take a more pragmatic view – planning is a flexible approach for taking complex decisions:
 decide about the schedule of a production line;
 decide about the movements of an elevator;
 decide about the flow of paper through a copy machine;
 decide about robot actions.
 By “flexible” we mean:
 the problem is described to the planning system in some generic language;
 a (good) solution is found fully automatically;
 if the problem changes, all that needs to be done is to change the description.
 We look at methods to solve any problem that can be described in the language chosen for the particular planning system.
An ‘easy’ planning example

Goal: Buy 1 liter of milk

It may sound like a simple task, but if you break it down, there are many small tasks involved:

obtain car keys,

obtain wallet,

exit home,

start car,

drive to store,

find and obtain milk,

purchase milk,

…



Logistics
 During the 1991 Gulf War, US forces deployed an AI logistics planning and scheduling program ( Dynamic Analysis and Replanning Tool ) that involved up to 50,000 vehicles, cargo, and people
 This one application alone reportedly more than offset all the money DARPA had funneled into AI research in the last 30 years.
Space Exploration
 Deep Space 1

NASA's onboard autonomous planning program controlled the scheduling of operations for a spacecraft
Space Exploration (cont’d)
 Mars Exploration Rover (MER)
 Autonomous planning, scheduling, control for Mars Exploration Rover.
 MER uses a combination of planning techniques including:
 Mixedinitiative planning that involves significant human participation
 Constraintbased  quantitative reasoning with metric time and other numerical quantities
TECHNICAL SOLUTIONS
Planning as State Space Search
 This lecture will consider planning as statespace search, for which there are several options:
 Forward from I (initial state) until G (goal state) is satisfied;
 Backward from G until I is reached;

Use a Heuristic function to estimate G or Idistance, respectively – prefer states that have a lower heuristic value.

It will also introduce partialorder planning as a more flexible approach
 In all cases there are three implicit concerns to consider:
 Representation – how the problem/state space and goal is defined;
 Algorithm – e.g. which (combination) of these approaches is used;
 Complexity and decidability – the feasibility of the approach.
REPRESENTATION
Planning – basic terminology

A problem to be solved by a planning system is called a planning problem

The world in which planning takes place is often called the domain or application domain.

A state is a point in the search space of the application

A planning problem is characterized by an initial state and a goal state .

The initial state is the state of the world just before the plan is executed

The goal state is the desired state of the world that we want to reach

after the plan has been executed. A goal state is also refer as simply goal
STRIPS

Stanford Research Institute Problem Solver (STRIPS) is an automated planner developed by Richard Fikes and Nils Nilsson in 1971 [Nilsson & Fikes, 1970]

STRIPS is also the name of the planning language used for the STRIPS planner
STRIPS

A state (excluding the goal state) is represented in STRIPS as a conjunction of functionfree ground literals
e.g. on(A, Table) ^ on (B, A) ^ ¬ clear(A) ^ clear(B)
(block A is on the Table, block B is on block A, block A has something on top, block B has nothing on top)

A conjunction of functionfree ground literals is also know as a fact

A goal is represented in STRIPS as a conjunction of literals that may contain variables
e.g. on(B, Table) ^ on (A, B) ^ clear(A) ^ ¬ clear(B)
(block B is on the Table, block A is on block B, block B has something on top, block A has nothing on top)
STRIPS
Definition – Action
Let P be a set of facts. An action a is a triple a = ( pre(a ), add(a ), del(a )) of subsets of P , where add(a ) \ del(a ) = Ø
pre(a ) , add(a ) , and del(a ) are called the action precondition , add list , and delete list , respectively
 In other words, an action is defined in terms of:
 Preconditions  conjunction of literals that need to be satisfiable before the action is performed
 Effects  conjunction of literals that need to satisfiable after the action is performed. An effect description includes:
 Add List: list of formulas to be added to the description of the new state resulting after the action is performed

Delete List: list of formulas to be deleted from the description of state before the action is performed
STRIPS

The action pickup in the Blocks World domain can be modeled as follows in STRIPS
pickup(x)
Preconditions : on(x, Table) ^ handEmpty ^ clear(x)
Add List : holding(x)
Delete List : on(x, Table) ^ handEmpty ^ clear(x)
STRIPS
Definition – Task
A task is a quadruple (P,A,I,G) where P is a (finite) set of facts, A is a (finite) set of actions, and I and G are subsets of P, I being the initial state and G the goal state.
 In other words, a task is defined in terms of:
 A (finite) set of facts
 A (finite) set of actions that the planner can perform. Each action is defined in terms of preconditions and effects (i.e. add list and delete list)
 An initial state i.e. the state of the world before the execution of the task

A goal state i.e. the desired state of the world that we want to reach
STRIPS

Definition – Result

Let
P
be a set of facts,
s
⊆
P
a state, and
a
an action. The
result
result(s
,
‹a›
)
of applying
a
to
s
is:

result(s
,
‹a›
) = (
s
∪
add(a)) \ del(a)
iff
pre(a)
⊆
s
; undefined otherwise

In the first case, the action
a
is said to be applicable in
s
. The result of applying an action sequence
‹
a
1
, ... ,
a
n
›
of arbitrary length to
s
is defined by
result(s
,
‹
a
1
, . . . ,
a
n
›
) =
result(result(s
,
‹
a
1
, . . . , a
n1
›
),
‹
a
n
›
)
and
result(s
,
‹›
) =
s
.
STRIPS
 In other words, the result of applying an action a in a given state s is a new state of the world that is described by the initial set of formulas describing state s to which the formulas from the add list are added and the formulas from delete list are removed. Action a can be performed only if action’s preconditions are fulfilled in state s

e.g.

state
s
= on(A, B) ^ handEmpty ^ clear(A)

action
a =
unstack
(x,y)

Preconditions
: on(x, y) ^ handEmpty ^ clear(x)

Add List
: holding(x) ^ clear(y)

Delete List
: on(x, y) ^ handEmpty ^ clear(x)
STRIPS

result(s
,
‹a›
) = (
s
∪
add(a)) \ del(a)

= (
on(A
, B) ^
handEmpty
^
clear(A
)
∪
holding(x
) ^
clear(y
)) \
on(x
,
y
) ^
handEmpty
^
clear(x
)

=
holding(A
) ^
clear(B
)

Precondition
on(x
,
y
) ^
handEmpty
^
clear(x
)
is fulfilled in state
s
=
on(A
, B) ^
handEmpty
^
clear(A
)

For convenience let’s denote
result(s
,
‹a›
)
with
s1

Given a second action
a1

action
a1 =
stack
(x,y)

Preconditions
: hoding(x) ^ clear(y)

Add List
: on(x,y) ^ handEmpty ^ clear(x)

Delete List
: holding(x) ^ clear(y)
STRIPS
result(s , ‹a , a1› ) = result(result(s , ‹ a › ), ‹a1› ) = result(s1, ‹a1› ) = s1 ∪ add(a1)) \ del(a1)
= ((holding (A) ^ clear(B ) ∪ holding(x ) ^ on(x,y ) ^ handEmpty ^ clear(x )) \ holding(x ) ^ clear(y )
= on(A,B ) ^ handEmpty ^ clear(A )
Precondition hoding(x ) ^ clear(y ) is fulfilled in state s1 = holding(A ) ^ clear(B )
STRIPS
 Definition – Plan
 In other words, a plan is an organized collection of actions.
 A plan is said to be a solution to a given problem if the plan is applicable in the problem’s initial state, and if after plan execution, the goal is true.
 Assume that there is some action in the plan that must be executed first. The plan is applicable if all the preconditions for the execution of this first action hold in the initial state.
 A task is called solvable if there is a plan, unsolvable otherwise

Let (P,A,I,G) be a task. An action sequence ‹a1, ... ,an› is a plan for the task iff G ⊆ result(I, ‹a1, ... ,an›).
STRIPS
Definition – Minimal Plan
Let (P,A,I,G) be a task. A plan ‹a_{1}, . . . ,a_{n}› for the task is minimal if ‹a_{1}, . . . , a_{i}1, a_{i}+1, . . . ,a_{n}› is not a plan for any i.
Definition – Optimal Plan
Let (P,A,I,G) be a task. A plan ‹a_{1}, . . . ,a_{n}› for the task is optimal if there is no plan with less than n actions
STRIPS

STRIPS

STRIPS
 Actions in “Block World”
 pickup(x)
 picks up block ‘ x ’ from table
 putdown(x )
 if holding block ‘ x ’, puts it down on table
 stack(x,y )
 if holding block ‘ x ’, puts it on top of block ‘ y ’
 unstack(x,y )
 picks up block ‘ x ’ that is currently on block ‘ y ’
 Action stack(x,y ) in STRIPS

stack
(x,y)

Preconditions
: hoding(x) ^ clear(y)

Add List
: on(x,y) ^ handEmpty ^ clear(x)

Delete List
: holding(x) ^ clear(y)
PDDL
 Planning Domain Description Language (PDDL)
 Based on STRIPS with various extensions
 Created, in its first version, by Drew McDermott and others
 Used in the biennial International Planning Competition (IPC) series
 The representation is lifted, i.e., makes use of variables these are instantiated with objects
 Actions are instantiated operators
 Facts are instantiated predicates
 A task is specified via two files: the domain file and the problem file
 The problem file gives the objects, the initial state, and the goal state
 The domain file gives the predicates and the operators; these may be reused for different problem files
 The domain file corresponds to the transition system, the problem files constitute instances in that system
PDDL

Blocks World Example
domain file
:

(define (domain blocksworld)

(:predicates (clear ?x)

(holding ?x)

(on ?x ?y))
(

:action stack

:parameters (?ob ?underob)

:precondition (and (clear ?underob) (holding ?ob))

:effect (and (holding nil) (on ?ob ?underob)
 (not (clear ?underob)) (not (holding ?ob)))

…
PDDL

Blocks World Example
problem file
:

(define (problem bwxy)

(:domain blocksworld)

(:objects nil table a b c d e)

(:init (on a table) (clear a)

(on b table) (clear b)

(on e table) (clear e)

(on c table) (on d c) (clear d)

(holding nil)

)

(:goal (and (on e c) (on c a) (on b d))))
ALGORITHMS
Algorithms

We now present particular algorithms related to (direct) statespace search:

Progression based search also know as forward search

Regression based search also know as backward search

Heuristic Functions and Informed Search Algorithms
Algorithms
Definition – Search Scheme
A search scheme is a tuple (S, s_{0}, succ, Sol):

the set S of all states s ∈ S,

the start state s_{0} ∈ S,

the successor state function succ : S → 2^{S}, and

the solution states Sol ⊆ S.

Note:

Solution paths s_{0} →, ... , → s_{n} ∈ Sol correspond to solutions to our problem

Progression
 Definition – Progression
 S = 2^{P} is the set of all subsets of P,
 s_{0} = I,
 succ : S → 2^{S} is defined by succ(s) = {s_{0} ∈ S  ∃a ∈ A: pre(a) ⊆ s, s_{0} = result(s, ‹a›)}
 Sol = {s ∈ S  G ⊆ s}

Let (P,A,I,G) be a task. Progression is the quadruple (S, s_{0}, succ, Sol):
 Progression algorithm:
 Start from initial state
 Find all actions whose preconditions are true in the initial state (applicable actions)
 Compute effects of actions to generate successor states

Repeat steps 23, until a new state satisfies the goal

Progression explores only states that are reachable from
I
; they may not be relevant for
G
Progression

Progression in “Blocks World“ – applying the progression algorithm

Step 1:

Initial state s0 = I = on(A, Table) ^ on(B, Table) ^ on(C, B) ^ clear(A) ^ clear(C) ^ handEmpty

Step 2:

Applicable actions: unstack(C,B), pickup(A)

Step 3:

result(s0, ‹unstack(C,B)›) = on(A, Table) ^ on(B, Table) ^ on(C, B) ^ clear(A) ^ clear(C) ^ handEmpty
∪
holding(x) ^ clear(y) \ on(x, y) ^ handEmpty ^ clear(x) = on(A, Table) ^ on(B, Table) ^ clear(A) ^ clear(B) ^ holding(C)

result(s0, ‹pickup(A)›) = on(A, Table) ^ on(B, Table) ^ on(C, B) ^ clear(A) ^ clear(C) ^ handEmpty
∪
holding(x) \ on(x, Table) ^ handEmpty ^ clear(x) = on(B, Table) ^ on(C, B) ^ clear(C) ^ holding(A)
Regression

Definition – Regression

Let
P
be a set of facts,
s
⊆
P
, and
a
an action. The regression
regress(s
, a)
of
s
through
a
is:

regress(s
, a) = (
s
\
add(a
))
∪
pre(a
)
if
add(a)
∩
s
≠
Ø,
del(a
)
∩
s
= Ø
; undefined otherwise

In the first case,
s
is said to be regressable through a.
 If s \ add(a ) = Ø ; then a contributes nothing
 If s \ del(a ) ≠ Ø ; then we can’t use a to achieve ∧ _{ p ∈ s } p
Regression
Regression algorithm:

Start with the goal

Choose an action that will result in the goal

Replace that goal with the action’s preconditions

Repeat steps 23 until you have reached the initial state
 Regression explores only states that are relevant for G; they may not be reachable from I

Regression is not as nonnatural as you might think: if you plan a trip to Thailand, do you start with thinking about the train to Frankfurt?
Regression

Regression in “Blocks World“ – applying the regression algorithm

Step 1:

Goal state s
_{
n
}
= G = on(B, Table) ^ on(C, B) ^ on(A, C) ^ clear(A) ^ handEmpty

Step 2:

Applicable actions: stack(A,C)

Step 3:

regress(G, stack(A,C)) =
G \
add(stack
)
∪
pre(stack
)
=
on(B, Table) ^ on(C, B) ^ on(A, C) ^ clear(A) ^ handEmpty
\
on(A,C) ^ handEmpty ^ clear(A)
∪
hoding(A) ^ clear(C) = on(B, Table) ^ on(C, B) ^ holding(A) ^ clear(C)
Heuristic Functions
Definition – Heuristic Function
Let (S, s _{ 0 } , succ , Sol) be a search scheme. A heuristic function is a function h : S → N _{ 0 } ∪ { ∞ } from states into the natural numbers including 0 and the ∞ symbol. Heuristic functions , or heuristics , are efficiently computable functions that estimate a state’s “solution distance”.
Heuristic Functions
A heuristic function for Blocks World domain could be:
h1(s) = n – sum(b(i )) , i =1… n , where
b(i )=0 in state s if the block i is resting on a wrong thing
b(i )=1 in state s if the block i is resting on the thing it is supposed to be resting on;
n is the total number of blocks
h1(I) = 3 – 1 (because of block B) – 0 (because of block C) – 0 (because of block A) = 2
h1(G) = 3 – 1 (because of block B) – 1 (because of block C) – 1 (because of block A) = 0
Heuristic Functions
Definition – Solution Distance
Let (S, s _{ 0 } , succ , Sol) be a search scheme. The solution distance sd(s ) of a state s ∈ S is the length of a shortest succ path from s to a state in Sol , or 1 if there is no such path. Computing the real solution distance is as hard as solving the problem itself.
Heuristic Functions

In Progression search:

sd(I
) = 4,
sd(result(I
,
‹
pickup(C)
›
)) = 3,
sd(result(result(I
,
‹
pickup(C)
›
)),
‹stack
(C,B)
›
) = 2, …
for all other s ∈ succ(I)

In Regression search:

sd(G
) = 4,
sd(regress(G
,
stack
(A,C)
)) = 3,
sd(regress(regress(G
,
stack
(A,C)
)),
pickup
(A
)
) = 2, …
Heuristic Functions

Definition – Admissible Heuristic Function

In other words, an admissible heuristic function is a heuristic function that never overestimates the actual cost i.e. the solution distance.

e.g. h1 heuristic function defined below is admissible

b(i )=0 in state s if the block i is resting on a wrong thing

b(i )=1 in state s if the block i is resting on the thing it is supposed to be resting on;

n is the total number of blocks
Let (S, s_{0} , succ , Sol) be a search scheme. A heuristic function h is admissible if h(s ) ≤ sd(s ) for all s ∈ S .
h1(s) = n – sum(b(i )) , i =1… n , where
COMPLEXITY AND DECIDABILITY
Complexity and Decidability

Its time complexity : in the worst case, how many search states are expanded?

Its space complexity : in the worst case, how many search states are kept in the open list at any point in time?

Is it complete , i.e. is it guaranteed to find a solution if there is one?

Is it optimal , i.e. is it guaranteed to find an optimal solution?
Complexity and Decidability
Definition  PLANSAT
Let PLANSAT denote the following decision problem. Given a STRIPS task (P, A, I, G), is the task solvable?
Theorem
PLANSAT is PSPACEcomplete. Proof in see [Bylander, 1994]
Definition – Bounded PLANSAT
Let BoundedPLANSAT denote the following decision problem. Given a STRIPS task (P, A, I, G) and an integer b . Is there a plan with at most b actions?
ILLUSTRATION BY A LARGER EXAMPLE
PROGRESSION
Progression

Progression in “Blocks World“ – applying the progression algorithm

Step1:

Initial state s
_{
0
}
= I = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty

Step 2:

Applicable actions: pickup(A), pickup(B), unstack(D,C)

Step 3:

s11 = result(s
_{
0
}
, ‹pickup(A)›) = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty
∪
holding(x) \ on(x, Table) ^ handEmpty ^ clear(x) = on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(B) ^ clear(D) ^ holding(A)
Progression
s12 = result(s0, ‹pickup(B)›) = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty ∪ holding(x) \ on(x, Table) ^ handEmpty ^ clear(x) = on(A, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(D) ^ holding(B)
s13 = result(s0, ‹unstack(D,C)›) = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty ∪ holding(x) ^ clear(y) \ on(x, y) ^ handEmpty ^ clear(x) = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ clear(A) ^ clear(B) ^ clear(C) ^ holding(D)
Progression

2
^{nd}
iteration

Step1:

s11 = on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(B) ^ clear(D) ^ holding(A)

Step 2:

Applicable actions:
putdown (A)
, stack(A,D)

Step 3:

s111 = result(s11, ‹putdown (A)›) = on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(B) ^ clear(D) ^ holding(A)
∪
on(x, Table) ^ handEmpty ^ clear(x)
\ holding(x) = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty

we are back in the initial state!
Progression

3
^{rd}
iteration

Step1:

s11 = on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(B) ^ clear(D) ^ holding(A)

Step 2:

Applicable actions: putdown (A),
stack(A,D
)

Step 3:

s112 = result(s11, ‹stack (A,D)›) = on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(B) ^ clear(D) ^ holding(A)
∪
on(x, y) ^ handEmpty ^ clear(x)
\ holding(x) ^ clear(y)= on(B, Table) ^ on(C, Table) ^ on(D, C) ^ on(A,D) ^ handEmpty ^ clear(A) ^ clear(B)
Progression

4
^{th}
iteration

Step1:

s_{12} = on(A, Table) on(C, Table) on(D, C) ^ clear(A) ^ clear(D) ^ holding(B)

Step 2:

Applicable actions: putdown (B), stack(B,D)

Step 3:

s_{121} = result(s_{12}, 〈putdown (B)〉) = on(A, Table) on(C, Table) on
(D, C) ^ clear(A) ^ clear(D) ^ holding(B) ∪ on(x, Table) ^
handEmpty ^ clear(x) \ holding(x) = on(A, Table) on(B, Table) on
(C, Table) on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty
we are back in the initial state!
Progression

5
^{th}
iteration

Step1:

s_{12} = on(A, Table) on(C, Table) on(D, C) ^ clear(A) ^ clear(D) ^ holding(B)

Step 2:

Applicable actions: putdown (B), stack(B,D)

Step 3:

s_{122} = result(s_{11}, 〈stack (B,D)〉) = on(A, Table) on(C, Table) on(D,
C) ^ clear(A) ^ clear(D) ^ holding(B) ∪ on(x, y) ^ handEmpty ^
clear(x) \ holding(x) ^ clear(y)= on(A, Table) on(C, Table) on(D,
C) on(B,D) ^ handEmpty ^ clear(A) ^ clear(B)
Progression

6
^{th}
iteration

Step1:

s_{13} = on(A, Table) on(B, Table) on(C, Table) ^ clear(A) ^ clear(B)
^ clear(C) ^ holding(D)

Step 2:

Applicable actions: putdown(D), stack(D,A), stack(D,B), stack(D,C)

Step 3:

s_{131} = result(s_{13}, 〈putdown(D)〉) = on(A, Table) on(B, Table) on(C,
Table) ^ clear(A) ^ clear(B) ^ clear(C) ^ holding(D) ∪ on(x, Table) ^
handEmpty ^ clear(x) \ holding(x) = on(A, Table) on(B, Table) on
(C, Table) on(D, Table) ^ clear(A) ^ clear(B) ^ clear(C) ^ clear(D) ^
handEmpty
 In iterations 7,8,9, new states are obtained from s_{13} by applying the
actions stack(D,A), stack(D,B), stack(D,C)
Progression

10
^{
th
}
iteration

Step1:

s131 = on(A, Table) on(B, Table) on(C, Table) on(D, Table) ^
clear(A) ^ clear(B) ^ clear(C) ^ clear(D) ^ handEmpty

Step 2:

Applicable actions: pickup(A), pickup(B),
pickup(C)
, pickup(D)

Step 3:

s
_{
1311
}
= result(s
_{
131
}
, 〈putdown(C)〉) = on(A, Table) on(B, Table) on
(C, Table) on(D, Table) ^ clear(A) ^ clear(B) ^ clear(C) ^ clear(D) ^
handEmpty ∪ holding(x) \ on(x, Table) ^ clear(x) ^ handEmpty = on
(A, Table) on(B, Table) on(D, Table) ^ clear(A) ^ clear(B) ^ clear
(D) ^ holding(C)
Progression

n
^{
th
}
iteration

Step1:

s
_{
n1
}
= on(A, Table) on(D, Table) on(B,C) ^ clear(A) ^ clear(B) ^
holding(C)

Step 2:

Applicable actions: putdown(C),
stack(C,A)
, stack(C,B)

Step 3:

s
_{
n
}
= result(s
_{
n1
}
, 〈stack(C,A)〉) = on(A, Table) on(D, Table) on
(B,C) ^ clear(A) ^ clear(B) ^ holding(C) ∪ on(x,y) ^ clear(x) ^
handEmpty \ clear(y) ^ holding(x) = on(A, Table) on(D, Table) on
(C, A) on(B,D) ^ clear(C) ^ clear(B) ^ handEmpty
s _{ n } is the goal state G
REGRESSION
Regression

Regression in “Blocks World“ – applying the regression algorithm

1
^{
th
}
iteration

Step1:

Goal state sn = G = on(A, Table) ôn(D, Table) ôn(C, A) ôn(B,D) ^
clear(C) ^ clear(B) ^ handEmpty

Step 2:

Applicable actions: stack(C,A), stack(B,D)
Regression
Step 3:
sn1=regress(G, stack(C,A)) = G \ add(stack ) ∪ pre(stack ) = on(A, Table) ^ on(D, Table) ^ on(C, A) ^ on(B,D) ^ clear(C) ^ clear(B) ^ handEmpty \ on(C,A) ^ handEmpty ^ clear(C) ∪ holding(C) ^ clear(A) = on(A, Table) ^ on(D, Table) ^ on(B, D) ^ clear(B) ^ clear(A) ^ holding(C)
sn2=regress(G, stack(B,D)) = G \ add(stack ) ∪ pre(stack ) = on(A, Table) ^ on(D, Table) ^ on(C, A) ^ on(B,D) ^ clear(C) ^ clear(B) ^ handEmpty \ on(B,D) ^ handEmpty ^ clear(B) ∪ holding(B) ^ clear(D) = on(A, Table) ^ on(D, Table) ^ on(C, A) ^ clear(C) ^ clear(D) ^ holding(B)
Regression

2
^{
nd
}
iteration

Step1:

s
_{
n1
}
= on(A, Table) on(D, Table) on(B, D) ^ clear(B) ^ clear(A) ^ holding
(C)

Step 2:

Applicable actions:
pickup(C)
, unstack(C,A)

Step 3:

s
_{
n2
}
=regress(s
_{
n1
}
, pickup(C)) = s
_{
n1
}
\ add(pickup) ∪ pre(pickup) =on(A,
Table) on(D, Table) on(B, D) ^ clear(B) ^ clear(A) ^ holding(C) \ holding
(C) ∪ clear(C) on(C, Table) ^ handEmpty = on(A, Table) on(D, Table) ^
on(B, D) ^ clear(B) ^ clear(A) ^ clear (C) on(C, Table) ^ handEmpty
Regression

3
^{
rd
}
iteration

Step1:

s_{n1} = on(A, Table) on(D, Table) on(B, D) ^ clear(B) ^ clear(A) ^ holding
(C)

Step 2:

Applicable actions: pickup(C), unstack(C,A)

Step 3:

s_{n3}=regress(s_{n1}, unstack(C,A)) = s_{n1} \ add(unstack) ∪ pre(unstack) = on
(A, Table) on(D, Table) on(B, D) ^ clear(B) ^ clear(A) ^ holding(C) \
holding(C) ^ clear(A) ∪ clear(C) on(C, A) ^ handEmpty = on(A, Table) ^
on(D, Table) on(B, D) ^ clear(B) ^ clear (C) on(C, A)
 we are back in the goal state!
Regression

4
^{
th
}
iteration

Step1:

s_{n2} = on(A, Table) on(D, Table) on(B, D) ^ clear(B) ^ clear(A) ^ clear
(C) on(C, Table) ^ handEmpty

Step 2:

Applicable actions: putdown(C), putdown(A), stack(B,D)

Step 3:

s_{n4}=regress(s_{n2}, stack(B,D)) = s_{n2} \ add(stack) ∪ pre(stack) = on(A, Table) ^
on(D, Table) on(B, D) ^ clear(B) ^ clear(A) ^ clear (C) on(C, Table) ^
handEmpty \ on(B,D) ^ clear(B) ^ handEmpty ∪ clear(D) ^ holding(B) = on
(A, Table) on(D, Table) on(C, Table) ^ clear(A) ^ clear (C) ^ clear(D) ^
holding(B)
Regression

n
^{
th
}
iteration

Step1:

s_{1} = on(A, Table) on(B, Table) on(C, Table) ^ clear(A) ^ clear(B) ^ clear
(C) ^ holding(D)

Step 2:

Applicable actions: pickup(D), unstack(D,C)

Step 3:

s_{0}=regress(s_{1}, unstack(D,C)) = s_{1} \ add(unstack) ∪ pre(unstack) =on(A,
Table) on(B, Table) on(C, Table) ^ clear(A) ^ clear(B) ^ clear (C) ^
holding(D) \ holding(D) ^ clear(C) ∪ on(D,C) ^ clear(D) ^ handEmpty = on
(A, Table) on(B, Table) on(C, Table) on(D, C) ^ clear(A) ^ clear(B) ^
clear(D) ^ handEmpty
EXTENSIONS
PartialOrder Planning

Forward and backward statespace search are both forms of totallyordered plan search – explore linear sequences of actions from initial state or goal.

As such they cannot accomodate problem decomposition and work on subgraphs separately.
 A partialorder planner depends on:
 A set of ordering constraints , wherein A ≺ B means that A is executed some time before B (not necessarily directly before)
 A set of causal links , A –p> B, meaning A achieves p for B

A set of open preconditions not achieved by any action in the current plan
PartialOrder Plans


Partial Order Plan:

Total Order Plans:




ConstraintBased Search

Compared with a classical search procedure, viewing a problem as one of constraint satisfaction can reduce substantially the amount of search.

Constrainbased search involved complex numerical or symbolic variable and state constraints

ConstraintBased Search:
1. Constraints are discovered and propagated as far as possible.
2. If there is still not a solution, then search begins, adding new constraints.
ConstraintBased Search

Constrain the task by a time horizon, b

Formulate this as a constraint satisfaction problem and use backtracking methods to solve it: “do x at time t yes/no”, “do y at time t ’ yes/no”,

If there is no solution, increment b

Inside backtracking : constraint propagation, conflict analysis are used

ConstraintBased Searcg is an “undirected” search i.e. there is no fixed timeorder to the decisions done by backtracking
SUMMARY
Planning Summary
 Planning is a flexible approach for taking complex decisions:
 the problem is described to the planning system in some generic language;
 a (good) solution is found fully automatically;
 if the problem changes, all that needs to be done is to change the description.
 Following concerns must be addressed:
 Syntax  we have examined representations in PDDL and STRIPS
 Semantics – we have examined progression, regression and the use of heuristics in algorithms
 Complexity and decidability
REFERENCES
References
 Mandatory reading:
 D. McDermott. “PDDL – the planning domain definition language”, Technical Report CVC TR98003/DCS TR1165, Yale Center for Computational Vision and Control, 1998.
 N. Nilsson and R. Fikes. “STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving”, SRI Technical Note 43, 1970. http://www.ai.sri.com/pubs/files/tn043rfikes71.pdf
 Further reading:
 S. Russell and P. Norvig. “AI: A Modern Approach” (2nd Edition), Prentice Hall, 2002
 G. Malik; D.S. Nau, P. Traverso (2004), Automated Planning: Theory and Practice, Morgan Kaufmann, ISBN 1558608567
 T .Bylander,The Computationa lComplexity of Propositional STRIPS Planning, Artificial Intelligence Journal, 1994
 Wikipedia links:
 http://en.wikipedia.org/wiki/Automated_planning_and_scheduling
 http://en.wikipedia.org/wiki/STRIPS
 http://en.wikipedia.org/wiki/Hierarchical_task_network
 http://en.wikipedia.org/wiki/Planning_Domain_Definition_Language
Questions?
Intelligent Systems
Software Agents
Agenda
 Motivation
 Technical solution and illustrations
 Definitions
 Properties
 Environments
 Agents as intentional systems
 Abstract architecture
 Multiagent systems
 Large example: Robocup
 Extensions
 Summary
 References
MOTIVATION
Motivation
 Do tasks that normally human user needs to do automatically.
 Agents are supposed to carry out (probably complex) tasks autonomously.
 Idea: “intelligent agents” react to changes and act in order to reach specific goals.
Motivation
 The agent perceives the environment via sensors and influences it through actuators.
 Agents can carry out tasks autonomously and react to changes in its environment.
Motivating example

TECHNICAL SOLUTION AND ILLUSTRATIONS
Definition
Definition
 There are many different definitions from various areas, such as software engineering, classical logics, logic programming, robotic:
 Genesereth/Ketchpel: A program is a software agent if it communicates correctly in an agent language, such as ACL (Agent Communication Language) or KQML (Knowledge Query and Manipulation Language).
 BDI (Belief, Desire and Intentions) agents are described by their believes, desires, and intentions. This complies with three modalities of a complex modal logic, which can be found in the data structures of the system.
Definition
 Kowalski follows a traditional approach of logic based agents and uses logic programming for the implementation of agents.
 Shoham‘s definition is more focused: A hard or software is an agent if one analyses it with the help of mental terms.
 Wooldridge/Jennings consider hard or software an agent if it is:
 autonomous (independently follows its goals)
 social (is cooperating with a human or other agents)
 proactive (takes initiative) und
 reactive (perceives its environment and reacts to changes).
 An agent is a computer system capable of autonomous action in some environment in order to meet its design objectives.
Definition
 Autonomous entities that perceive their environment and act upon it.
 Autonomy is the ability to control their own behavior and act without human intervention.
 Agents pursue goals in a such a way as to optimize some given performance measure
 They operate flexibly and rationally in a variety of circumstances
 Does NOT include omniscience, omnipotence, or perfection
Properties
Properties
 Interaction
 Reactivity
 Proactiveness
 Balancing Reactive and GoalOriented Behavior
 Social Ability
 Mobility
 Veracity
 Benevolence
 Rationality
 Learning/adaption
Interaction
 Agents may be affected by other agents (including humans) in pursuing their goals
 May take place directly via a communication language
 May take place indirectly via the environment
 Agents sense the actions of other agents and react accordingly
Reactivity
 If a program’s environment is guaranteed to be fixed, the program need never worry about its own success or failure – program just executes blindly
 Example of fixed environment: compiler
 The real world is not like that: things change, information is incomplete. Many (most?) interesting environments are dynamic
 Software is hard to build for dynamic domains: program must take into account possibility of failure – ask itself whether it is worth executing!
 A reactive system is one that maintains an ongoing interaction with its environment, and responds to changes that occur in it (in time for the response to be useful)
Proactiveness
 Reacting to an environment is easy (e.g., stimulus → response rules)
 But we generally want agents to do things for us
 Hence goal directed behavior
 Proactiveness = generating and attempting to achieve goals; not driven solely by events; taking the initiative
 Recognizing opportunities
Balancing Reactive and GoalOriented Behavior
 We want our agents to be reactive, responding to changing conditions in an appropriate (timely) fashion
 We want our agents to systematically work towards longterm goals
 These two considerations can be at odds with one another
 Designing an agent that can balance the two remains an open research problem
Social Ability
 The real world is a multiagent environment: we cannot go around attempting to achieve goals without taking others into account
 Some goals can only be achieved with the cooperation of others
 Similarly for many computer environments: witness the Internet
 Social ability in agents is the ability to interact with other agents (and possibly humans) via some kind of agentcommunication language, and perhaps cooperate with others
Other Properties
 mobility : the ability of an agent to move around an electronic network
 veracity : an agent will not knowingly communicate false information
 benevolence : agents do not have conflicting goals, and that every agent will therefore always try to do what is asked of it
 rationality : agent will act in order to achieve its goals, and will not act in such a way as to prevent its goals being achieved — at least insofar as its beliefs permit
 learning/adaption : agents improve performance over time
Other properties, sometimes discussed in the context of agents:
Environments
Environments – Accessible vs. inaccessible
 An accessible environment is one in which the agent can obtain complete, accurate, uptodate information about the environment’s state
 Most moderately complex environments (including, for example, the everyday physical world and the Internet) are inaccessible
 The more accessible an environment is, the simpler it is to build agents to operate in it
Environments –Deterministic vs. nondeterministic
 A deterministic environment is one in which any action has a single guaranteed effect — there is no uncertainty about the state that will result from performing an action
 The physical world can to all intents and purposes be regarded as nondeterministic
 Nondeterministic environments present greater problems for the agent designer
Environments  Episodic vs. nonepisodic
 In an episodic environment, the performance of an agent is dependent on a number of discrete episodes, with no link between the performance of an agent in different scenarios
 Episodic environments are simpler from the agent developer’s perspective because the agent can decide what action to perform based only on the current episode — it need not reason about the interactions between this and future episodes
Environments  Static vs. dynamic
 A static environment is one that can be assumed to remain unchanged except by the performance of actions by the agent
 A dynamic environment is one that has other processes operating on it, and which hence changes in ways beyond the agent’s control
 Other processes can interfere with the agent’s actions (as in concurrent systems theory)
 The physical world is a highly dynamic environment
Environments – Discrete vs. continuous
 An environment is discrete if there are a fixed, finite number of actions and percepts in it
 Russell and Norvig give a chess game as an example of a discrete environment, and taxi driving as an example of a continuous one
 Continuous environments have a certain level of mismatch with computer systems
 Discrete environments could in principle be handled by a kind of “lookup table”
Agents as intentional systems
Agents as Intentional Systems
 When explaining human activity, it is often useful to make statements such as the following:
 Janine took her umbrella because she believed it was going to rain.
 Michael worked hard because he wanted to possess a PhD.
 These statements make use of a folk psychology, by which human behavior is predicted and explained through the attribution of attitudes, such as believing and wanting (as in the above examples), hoping, fearing, and so on.
 The attitudes employed in such folk psychological descriptions are called the intentional notions.
Agents as Intentional Systems
 The philosopher Daniel Dennett coined the term intentional system to describe entities ‘whose behavior can be predicted by the method of attributing belief, desires and rational acumen’
 Dennett identifies different ‘grades’ of intentional system:‘A firstorder intentional system has beliefs and desires (etc.) but no beliefs and desires about beliefs and desires. …A secondorder intentional system is more sophisticated; it has beliefs and desires (and no doubt other intentional states) about beliefs and desires (and other intentional states) — both those of others and its own’
Agents as Intentional Systems
 The intentional notions are thus abstraction tools, which provide us with a convenient and familiar way of describing, explaining, and predicting the behavior of complex systems
 Remember: most important developments in computing are based on new abstractions:
 procedural abstraction
 abstract data types
 objects
 Agents, and agents as intentional systems, represent a further, and increasingly powerful abstraction
 So agent theorists start from the (strong) view of agents as intentional systems: one whose simplest consistent description requires the intentional stance
Agents as Intentional Systems
 This intentional stance is an abstraction tool — a convenient way of talking about complex systems, which allows us to predict and explain their behavior without having to understand how the mechanism actually works
 Now, much of computer science is concerned with looking for abstraction mechanisms (witness procedural abstraction, ADTs, objects,…)
 So why not use the intentional stance as an abstraction tool in computing — to explain, understand, and, crucially, program computer systems?
 This is an important argument in favor of agents
Agent architecture
Abstract Architecture for Agents
 Assume the environment may be in any of a finite set E of discrete, instantaneous states:
 Agents are assumed to have a repertoire of possible actions available to them, which transform the state of the environment:
 A run , r , of an agent in an environment is a sequence of interleaved environment states and actions:
Abstract Architecture for Agents
 Let:
 R be the set of all such possible finite sequences (over E and Ac )
 R ^{Ac} be the subset of these that end with an action
 R ^{E} be the subset of these that end with an environment state
State Transformer Functions
 A state transformer function represents behavior of the environment:
 Note that environments are…
 history dependent
 nondeterministic
 If τ( r )=∅, then there are no possible successor states to r . In this case, we say that the system has ended its run
 Formally, we say an environment Env is a triple Env = ⟨ E , e_{0} , τ ⟩ where: E is a set of environment states, e_{0} ∈ E is the initial state, and τ is a state transformer function
Agents
 Agent is a function which maps runs to actions:
Systems
 A system is a pair containing an agent and an environment
 Any system will have associated with it a set of possible runs; we denote the set of runs of agent Ag in environment Env by R (Ag, Env)
 (We assume R (Ag, Env) contains only terminated runs)
Systems

Formally, a sequence
 e_{0} is the initial state of Env
 α_{0} = Ag ( e_{0} ); and
 For u > 0,
Purely Reactive Agents
 Some agents decide what to do without reference to their history — they base their decision making entirely on the present, with no reference at all to the past
 We call such agents purely reactive :
 A thermostat is a purely reactive agent
Perception
 Now introduce perception system:
Perception
 The see function is the agent’s ability to observe its environment, whereas the action function represents the agent’s decision making process
 Output of the see function is a percept :

see
:
E
→
Per

action
:
Per*
→
A
Agents with State
 We now consider agents that maintain state:
Agents with State
 These agents have some internal data structure, which is typically used to record information about the environment state and history.Let I be the set of all internal states of the agent.
 The perception function see for a statebased agent is unchanged:

see
:
E
→
Per

action
:
I
→
Ac

next
:
I
×
Per
→
I
Agent Control Loop
 Agent starts in some initial internal state i_{0}
 Observes its environment state e , and generates a percept see ( e )
 Internal state of the agent is then updated via next function, becoming next ( i_{0} , see ( e ))
 The action selected by the agent is action ( next ( i_{0} , see ( e )))
 Goto 2
Tasks for Agents
 We build agents in order to carry out tasks for us
 The task must be specified by us…
 But we want to tell agents what to do without telling them how to do it
Utility Functions over States
 One possibility: associate utilities with individual states — the task of the agent is then to bring about states that maximize utility
 A task specification is a function

u
:
E
→ ◊
Utility Functions over States
 But what is the value of a run…
 minimum utility of state on run?
 maximum utility of state on run?
 sum of utilities of states on run?
 average?
 Disadvantage: difficult to specify a long term view when assigning utilities to individual states (One possibility: a discount for states later on.)
Utilities over Runs
 Another possibility: assigns a utility not to individual states, but to runs themselves:
 Such an approach takes an inherently long term view
 Other variations: incorporate probabilities of different states emerging
 Difficulties with utilitybased approaches:
 where do the numbers come from?
 we don’t think in terms of utilities!
 hard to formulate tasks in these terms

u
: R → ◊
Expected Utility & Optimal Agents

Write
P
(
r

Ag
,
Env
) to denote probability that run
r
occurs when agent
Ag
is placed in environment
Env
Note:  Then optimal agent Ag _{opt} in an environment Env is the one that maximizes expected utility :
Bounded Optimal Agents

Some agents cannot be implemented on some computers
(A function Ag : R ^{ E } → Ac may need more than available memory to implement)  Write AG _{ m } to denote the agents that can be implemented on machine (computer) m :
 We can replace equation (1) with the following, which defines the bounded optimal agent Ag _{ opt } :
Predicate Task Specifications
 A special case of assigning utilities to histories is to assign 0 (false) or 1 (true) to a run
 If a run is assigned 1, then the agent succeeds on that run, otherwise it fails
 Call these predicate task specifications

Denote predicate task specification by Ψ.
Thus Ψ : R → {0, 1}.
Task Environments

A
task environment
is a pair ⟨
Env
, Ψ ⟩ where
Env
is an environment,
 A task environment specifies:
 the properties of the system the agent will inhabit
 the criteria by which an agent will be judged to have either failed or succeeded
 Ψ : R → {0, 1}
Task Environments
 Write R _{ Ψ } ( Ag , Env ) to denote set of all runs of the agent Ag in environment Env that satisfy Ψ:
 We then say that an agent Ag succeeds in task environment ⟨ Env , Ψ ⟩ if
Achievement & Maintenance Tasks
 Two most common types of tasks are achievement tasks and maintenance tasks:
 Achievement tasks are those of the form “achieve state of affairs Φ”
 Maintenance tasks are those of the form “maintain state of affairs Ψ”
Achievement & Maintenance Tasks

An achievement task is specified by a set
G
of “good” or “goal” states:
G
⊆
E
The agent succeeds if it is guaranteed to bring about at least one of these states (we do not care which one — they are all considered equally good). 
A maintenance goal is specified by a set
B
of “bad” states:
B
⊆
E
The agent succeeds in a particular environment if it manages to avoid all states in B — if it never performs actions which result in any state in B occurring
Agent Synthesis

Agent synthesis
is automatic programming: goal is to have a program that will take a task environment, and from this task environment automatically generate an agent that succeeds in this environment:
 Synthesis algorithm is:
 sound if, whenever it returns an agent, then this agent succeeds in the task environment that is passed as input
 complete if it is guaranteed to return an agent whenever there exists an agent that will succeed in the task environment given as input
Agent Synthesis

Synthesis algorithm
syn
is sound if it satisfies the following condition:
Multiagent systems
Multi Agent Systems
 Traditional Multiagent Systems
 Several agents coordinate their knowledge and activities by reasoning about the problem solving process
 Distributed Problem Solving
 A particular problem is solved by dividing tasks among a number of generally equivalent nodes who divide and share knowledge about the problem
 Modern multiagent systems actually cover both
Characteristics of Multiagent Systems
 Each agent has incomplete information
 Control is decentralized
 Data is decentralized
 Computation is asynchronous
Diversity of Multiagent Systems
Agents 
Number Uniformity Goals Architecture Abilities (sensor & effectors) 

Interaction 
Frequency Persistence Level Pattern (flow of control) Variability Purpose 

Environment 
Predictability Accessibility Dynamics Diversity Availability of resources 
Common Application Characteristics
 Inherent Distribution
 Geographically
 Temporally
 Semantics – requires different ontologies and languages
 Functional – requires different cognitive capabilities
 Inherent Complexity
 Too large to be solved by single, centralized system
Properties of Multiagent Systems
Speed & Efficiency
Robustness & Reliability
Scalability & Flexibility
Cost
Distributed Development
Reusability
Challenging Issues in MultiAgent Systems
 When and how should agents interact – cooperate and compete – to successfully meet their design objectives?
 Two approaches
 Bottom up – search for specific agentlevel capabilities that result in sufficient group capabilities
 Top down – search for grouplevel conventions that appropriately constrain interaction at the agent level
 Leads to several interesting issues …
Challenging Issues in MultiAgent Systems

How to enable agents to decompose their tasks and goals (and allocate subgoals and subtasks to other agents) and synthesize partial results

How to enable agents to communicate, what languages and protocols to use

How to enable agents to represent and reason about the actions, plans, and knowledge of other agents in order to interact with them
Challenging Issues in MultiAgent Systems
How to enable agents to represent and reason about the state of their interactions
How to enable agents to recognize and handle conflicts between agents
How to engineer practical multiagent systems
Challenging Issues in MultiAgent Systems
How to effectively balance local computational versus communication
How to avoid or mitigate harmful (chaotic or oscillatory) system wide behavior
How to enable agents to negotiate and contract with each other
How to form and dissolve organizational structures to meet specific goals and objectives
Challenging Issues in MultiAgent Systems
How to formally describe multiagent systems and the interaction between agents and how to ensure multiagent systems are correctly specified
 How to realize intelligent processes such as problem solving, planning, decision making, and learning in a multiagent systems context
Applications of Multiagent Systems
Electronic commerce
Realtime monitoring and control of networks
Modeling and control of transportation systems
Information handling
Automatic meeting scheduling
Applications of Multiagent Systems (cont.)
Industrial manufacturing and production
Electronic entertainment
Reengineering of information flow in large organizations
Investigation of complex social phenomena such as evolution of roles, norms, and organizational structures
LARGE EXAMPLE
Overview of RoboCup
 The Robot World Cup Initiative

“By the year 2050, develop a team of fully autonomous humanoid robots that can win against the human world soccer champion team.”
www.robocup.org  A standard problem for AI research
 Started in 1992 as the Robot JLeague (Japan)
 First games and conferences in 1997
 Workshops, conferences and yearly competition

Check out this YouTube video:
http://www.youtube.com/watch?v=VaXtnqjk4Bc
(Graz 2009)

Slides adapted from Kevin Lam
RoboCup Leagues
Robocup Leagues: humanoid

Robocup Leagues: small size
 A Small Size robot soccer game takes place between two teams of five robots each.
 Each robot must conform to the dimensions as specified in the F180 rules:
 The robot must fit within an 180mm diameter circle.
 It must be no higher than 15cm unless they use onboard vision.
 Robots come in two flavours, those with local onboard vision sensors and those with global vision.
 Global vision robots use an overhead camera and offfield PC to identify and track the robots as they move around the field.
 Local vision robots have their sensing on the robot itself.
 The vision information is either processed onboard the robot or is transmitted back to the offfield PC for processing.
 Communications is wireless and typically uses dedicated commercial FM transmitter/receiver units.
Robocup Leagues: middle size
 The middle size league involves robots that must fit inside a 50 cm x 50 cm x 80 cm box.
 The maximum weight of the robot is 40 kg.
 Two teams of 5 midsized robots with all sensors onboard play soccer on a field.
 Communications is wireless and typically uses dedicated commercial FM transmitter/receiver units.
Robocup Leagues: simulation

Challenges involved in Robocup Simulation League


Robocup System – Overview for simulation league
 Provides a platform to develop software techniques, without the necessity of creating physical robots.
 Consists of three main applications:
 Soccer Server
 Soccer Monitor
 Soccer Player(s)  agents
Robocup System – Soccer Server
A system allowing several autonomous program agents to play a virtual soccer game.
The games are carried out in a client/server style – where each client = one player.
The communication can be done over a network connection, or even the Internet.
Robocup System – Monitor
Used to display the visual progress of the game.
Several Monitors can be connected to the server.
It can also be used to interrupt game play by doing simple tasks such as dropping a ball.
Robocup System – Soccer Team  players
The agents are the brains of the players.
Players receive sensory information from the server, upon which decisions are made.
Commands are formatted and sent to the server.
Robocup System – Soccer Team  players
Robocup System – Team – coaches
 Privileged clients used to provide assistance.
 Receives noisefree view of the whole field.
 Can only send occasional messages to players (info, advice, freeform, etc.).
 Used for
 opponent modelling,
 game analysis,
 giving strategic tips to teammates.
Some RoboCup Teams

EXTENSIONS
Agents and Objects
 Are agents just objects by another name?
 Object:
 encapsulates some state
 communicates via message passing
 has methods, corresponding to operations that may be performed on this state
Agents and Objects
 Main differences:

agents are autonomous:
agents embody stronger notion of autonomy than objects, and in particular, they decide for themselves whether or not to perform an action on request from another agent 
agents are smart:
capable of flexible (reactive, proactive, social) behavior, and the standard object model has nothing to say about such types of behavior 
agents are active:
a multiagent system is inherently multithreaded, in that each agent is assumed to have at least one thread of active control
Agents and Expert Systems
 Aren’t agents just expert systems by another name?
 Expert systems typically disembodied ‘expertise’ about some (abstract) domain of discourse (e.g., blood diseases)
 Example: MYCIN knows about blood diseases in humans
 It has a wealth of knowledge about blood diseases, in the form of rules
 A doctor can obtain expert advice about blood diseases by giving MYCIN facts, answering questions, and posing queries
Agents and Expert Systems
 Main differences:

agents
situated in an environment:
MYCIN is not aware of the world — only information obtained is by asking the user questions 
agents
act:
MYCIN does not operate on patients  Some realtime (typically process control) expert systems are agents
Intelligent Agents and AI
 When building an agent, we simply want a system that can choose the right action to perform, typically in a limited domain
 We do not have to solve all the problems of AI to build a useful agent:
 Oren Etzioni, speaking about the commercial experience of NETBOT, Inc:“We made our agents dumber and dumber and dumber…until finally they made money.”

a little intelligence goes a long way!
SUMMARY
Summary
 Agent preceives the environment and acts.
 Agent = architecture + program
 Ideal agent takes action that maximizes performance at a given perception.
 Actions of an autonomous agent depend on experience.
 Mapping of perceptiuon to actions.
 Reactive agents act on perceptions, goaloriented agents act to reach a goal, utilitybased agents maximize their profit.
 Representation of knowledge!
 Various environments. Most difficult: not accessible, episodic, dynamic, and continuous.
Summary
Multiagent systems are systems in which multiple interacting agents interact to solve problems
Key concepts of multiagent systems are agents and agent coordination
There are many important issues for multiagent systems that attempt to answer when and how to interact with whom
Summary
Common characteristics of multiagent systems are their inherent distribution and complexity
Distributed and flexible nature of multiagent systems leads to increased speed, robustness, scalability and reusability
REFERENCES
References
 Mandatory Reading:

G. Görz et al., Handbuch der künstlichen Intelligenz,
Oldenbourg, 2003.
Kapitel 24: Software Agenten. 
Bradshaw, J.; „An Introduction to Software Agents“;
AAAI Press/The MIT Press  Finin, Tim and Fritzson, Richard and McKay, Don and McEntire, Robin: KQML as an agent communication language. CIKM '94: Proceedings of the third international conference on Information and knowledge management. ACM. 1994.
References
 Further Reading:
 Wooldridge, M. & Jennings, M; „Intelligent Agents: Theory and Practice“; The Knowledge Engineering Review 10
 Nwana, H.; „Software Agents: An Overview“; The Knowledge Engineering Review, Vol. 11 No 3
 Rao A. & Geogreff M.; „BDI Agents: From Theory to Practice“; Tech. Rep. 56, Australian Artificial Intelligence Institue, Melbourne, Australia, Apr 1995
 http://www.robocup.org/
References
 Wikipedia links:
 http://en.wikipedia.org/wiki/Agent
 http://en.wikipedia.org/wiki/Software_agent
 http://en.wikipedia.org/wiki/Intelligent_agent
 http://en.wikipedia.org/wiki/Mycin
 http://de.wikipedia.org/wiki/Knowledge_Query_and_Manipulation_Language
 http://en.wikipedia.org/wiki/RoboCup
Questions?
Intelligent Systems
Rule Learning
Tom Mitchell “Machine Learning”

Slides in this lecture are partially based
on [1], Section 3 “Decision Tree Learning”.
Outline
 Motivation
 Technical Solution
 Rule learning tasks
 Rule learning approaches
 Specialization
 The ID3 Algorithm
 Generalization
 The RELAX Algorithm
 Combining specialization and generalization
 The JoJo Algorithm
 Refinement of rule sets with JoJo
 Illustration by a Larger Example
 Extensions
 The C4.5 Algorithm
 Summary
MOTIVATION
Motivation: Definition of Learning in AI

Research on learning in AI is made up of diverse subfields (cf. [5])
 “Learning as adaptation”: Adaptive systems monitor their own performance and attempt to improve it by adjusting internal parameters, e.g.,
 Selfimproving programs for playing games
 Pole balancing

Problem solving methods
 “Learning as the acquisition of structured knowledge” in the form of
 Concepts
 Production rules

Discrimination nets
Machine Learning
 Machine learning (ML) is concerned with the design and development of algorithms that allow computers to change behavior based on data. A major focus is to automatically learn to recognize complex patterns and make intelligent decisions based on data (Source: Wikipedia)
 Driving question: “How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?” (cf. [2])
 Three niches for machine learning [1]:
 Data mining: using historical data to improve decisions, e.g. from medical records to medical knowledge
 Software applications hard to program by hand, e.g. autonomous driving or speech recognition
 Selfcustomizing programs, e.g., a newsreader that learns users interest
 Practical success of ML:
 Speech recognition
 Computer vision, i.e. to recognize faces, to automatically classify microscope images of cells, or for recognition of hand writings
 Bio surveillance, i.e. to detect and track disease outbreaks
 Robot control
Rule learning

Machine Learning is a central research area in AI to acquire knowledge [1].
 Rule learning
 is a popular approach for discovering interesting relations in data sets and to acquire structured knowledge.

is a means to alleviate the „bottleneck“ (knowledge aquisition) – problem in expert systems [4].
Simple Motivating Example

I = {milk, bread, butter, beer}

Sample database
ID

Milk

Bread

Butter

Beer

1

1

1

1

0

2

0

1

1

0

3

0

0

0

1

4

1

1

1

0

5

0

1

0

0


A possible rule would be {milk, bread} → {butter}
TECHNICAL SOLUTIONS
Decision Trees

Many inductive knowledge acquisition algorithms generate („induce“) classifiers in form of decision trees.
 A decision tree is a simple recursive structure for expressing a sequential classification process.
 Leaf nodes denote classes

Intermediate nodes represent tests

Decision trees classify instances by sorting them down the tree from the root to some leaf node which provides the classification of the instance.
Decision Tree: Example

{Outlook = Sunny, Temperature = Hot,
Humidity = High, Wind = Strong}:
Negative Instance
Decision Trees and Rules
 Rules can represent a decision tree:

if item1 then subtree1
elseif item2 then subtree2
elseif...
 There are as many rules as there are leaf nodes in the decision tree.
 Advantage of rules over decision trees:
 Rules are a widelyused and wellunderstood representation formalism for knowledge in expert systems;
 Rules are easier to understand, modify and combine; and
 Rules can significantly improve classification performance by eliminating unnecessary tests.
 Rules make it possible to combine different decision trees for the same task.
Advantages of Decision Trees
 Decision trees are simple to understand.
People are able to understand decision trees model after a brief explanation.
 Decision trees have a clear evaluation strategy
Decision trees are easily interpreted
 Decision trees are able to handle both nominal and categorical data
Other techniques are usually specialised in analysing datasets that have only one type of variable. Ex: neural networks can be used only with numerical variables
Advantages of Decision Trees (1)
 Decision trees are a white box model.
If a given situation is observable in a model the explanation for the condition is easily explained by boolean logic. An example of a black box model is an artificial neural network since the explanation for the results is excessively complex to be comprehended.
 Decision trees are robust, perform well with large data in a short time.
Large amounts of data can be analysed using personal computers in a time short enough to enable stakeholders to take decisions based on its analysis.
Converting a Decision Tree to Rules
RULE LEARNING TASKS
Rule learning tasks

There are two major rule learning tasks:

Learning of correct rule that cover a maximal number of positive examples but no negative ones. We call this maximal requirement for each rule.

Learning of a minimal rule set that covers all positive example. We call this minimal requirement for the rule set.

Rule learning tasks
To illustrate the two rule learning tasks let’s consider a data set, containing both positive and negative examples, as depicted in the following chart
Maximal requirement for each rule
We want to learn a correct rule that cover a maximal number of positive examples but no negative examples (maximal requirement for each rule).
Maximal requirement for each rule

We start with deriving a rule that cover the positive example (x=3;y=3).
If x=3 and y=3 then class = positive
Maximal requirement for each rule

We try to cover more positive examples by refining the rule. We need to make sure that we don’t cover negative examples. The new rule

if x>=2 and x<=4 and y=3 then class = positive

covers positive example (x=2;y=3),(x=3;y=3),(x=4;y=3)
Maximal requirement for each rule

We try to cover more positive examples by further refining the rule. The new rule

if x>=2 and x<=4 and y>=3 and y<=4 then class = positive

covers positive example (x=2;y=3),(x=3;y=3),(x=4;y=3), (x=2;y=4),(x=3;y=4),(x=4;y=4)
Maximal requirement for each rule

We try to cover more positive examples by further refining the rule. The new rule

if x>=2 and x<=4 and y>=2 and y<=5 then class = positive

covers all positive example in the lower part of the space
Maximal requirement for each rule

We can not refine anymore the rule

if x>=2 and x<=4 and y>=2 and y<=5 then class = positive to cover more positive without covering negative examples as well. To cover the rest of the positive examples we need to learn a new rule.
Maximal requirement for each rule

Following the same approach as illustrated before we can learn a second rule that covers the rest of the positive examples:

if x>=5 and x<=6 and y>=6 and y<=8 then class = positive
Maximal requirement for each rule

The full set of positive examples is covered by the following two rules:

if x>=2 and x<=4 and y>=2 and y<=5 then class = positive
if x>=5 and x<=6 and y>=6 and y<=8 then class = positive
Minimal requirement for the rule set

In general, for a decision problem, there are many possible sets of rules that cover all positive examples without covering negative examples. Possible sets of rules for our previous example are:
Solution 1:
if x>=2 and x<=4 and y>=2 and y<=5 then class = positive
if x>=5 and x<=6 and y>=6 and y<=8 then class = positive
Minimal requirement for the rule set
Solution 2:
if x>=2 and x<=4 and y>=2 and y<=3 then class = positive
if x>=2 and x<=4 and y>3 and y<=5 then class = positive
if x>=5 and x<=6 and y>=6 and y<=7 then class = positive
if x>=5 and x<=6 and y>7 and y<=8 then class = positive
Minimal requirement for the rule set

The minimal requirement for the rule set is about learning of a minimal rule set that covers all positive example.

The minimal requirement for the rule set is similar with the minimal set cover problem [18]

The minimal requirement for the rule set, which basically translates to building the optimal linear decision tree, is NPhard [19].
RULE LEARNING APPROACHES
Latice of possible rules
Rule learning approaches

Rule learning can be seen as a search problem in the latice of possible rules
 Depending on the starting point and on the direction of search we can distinguished three classes of rule learning approaches:
 Specialization – the search procedure starts at the top of the lattice and searches towards the bottom of the latice, towards the concrete descriptions. Specialization is topdown .
 Generalization – the search procedure starts at the bottom of the lattice and advances up the lattice towards the t o p element. Generalization is bottomup .

Combining of specialization and generalization – the search procedure can start at any arbitrary point in the lattice and can move freely up or down as needed.
Rule learning approaches

Specialization and Generalization are dual search directions in a given rule set.
SPECIALIZATION
Specialization
 Specialization algorithms start from very general descriptions and specializes those until they are correct.

This is done by adding additional premises to the antecedent of a rule, or by restricting the range of an attribute which is used in an antecedent.
 Algorithms relying on specialization generally have the problem of overspecialization: previous specialization steps could become unnecessary due to subsequent specialization steps.

This brings along the risk for ending up with results that are not maximalgeneral.

Some examples of (heuristic) specialization algorithms are the following: ID3, AQ, C4, CN2, CABRO, FOIL, or PRISM; references at the end of the lecture.
THE ID3 ALGORITHM
ID3
 Most algorithms apply a topdown, greedy search through the space of possible trees.
 e.g., ID3 [5] or its successor C4.5 [6]
 ID3
 Learns trees by constructing them top down.
 Initial question: “Which attribute should be tested at the root of the tree?” >each attribute is evaluated using a statistical test to see how well it classifies.
 A descendant of the root node is created for each possible value of this attribute.
 Entire process is repeated using the training examples associated with each descendant node to select the best attribute to test at that point in the tree.
The Basic ID3 Algorithm
Selecting the Best Classifier

Core question: Which attribute is the best classifier?

ID3 uses a statistical measure called information gain that measures how well a given example separates the training examples according to their target classification.
Entropy
 Entropy characterizes the (im)purity of an arbitrary collection of examples.
 Given a collection S (containing positive and negative examples), the entropy of S relative to this boolean classification is




Information Gain

Information gain measures the expected reduction in entropy caused by partitioning the examples according to an attribute:

First term: entropy of the original collection S; second term: expected value of entropy after S is partitioned using attribute A (Sv subset of S).

Gain(S,A): The expected reduction in entropy caused by knowing the value of attribute A.

ID3 uses information gain to select the best attribute at each step in growing the tree.
Example: Which attribute is the best classifier?
Hypothesis Space Search in Decision Tree Learning
 ID3 performs a simpletocomplex hillclimbing search through the space of possible decision trees (the hypothesis search).
 Start: empty tree; progressively more elaborate hypotheses in search of a decision tree are tested.
Capabilities and Limitations of ID3

ID3’s hypothesis space is the complete space of finite discretevalued functions, relative to the available attributes.

ID3 maintains only a single current hypothesis, as it searches through the space of trees (earlier (perhaps better) versions are eliminated).

ID3 performs no backtracking in search; it converges to locally optimal solutions that are not globally optimal.

ID3 uses all training data at each step to make statistically based decisions regarding how to refine its current hypothesis.
Inductive Bias in Decision Tree Learning
 Definition: Inductive bias is the set of assumptions that, together with the training data, deductively justifies the classifications assigned by the learner to future instances [1].
 Central question: How does ID3 generalize from observed training examples to classify unseen instances? What is its inductive bias?
 ID3 search strategy:
 ID3 chooses the first acceptable tree it encounters.
 ID3 selects in favour of shorter trees over longer ones.
 ID3 selects trees that place the attributes with the highest information gain closest to the root.
 Approximate inductive bias of ID3: Shorter trees are preferred over larger trees. Trees that place high information gain attributes close to the root are preferred over those that do not.
ID3’s Inductive Bias

ID3 searches a complete hypothesis search.
 It searches incompletely through this space, from simple to complex hypotheses, until its termination condition is met.

Termination condition: Discovery of a hypothesis consistent with the data

ID3’s inductive bias is a preference bias (it prefers certain hypotheses over others).

Alternatively: A restriction bias categorically restricts considered hypotheses.
Occam’s Razor

…or why prefer short hypotheses?

William Occam was one of the first to discuss this question around year 1320.

Several arguments have been discussed since then.
 Arguments:
 Based on combinatorial arguments there are fewer short hypothesis that coincidentally fit the training data.

Very complex hypotheses fail to generalize correctly to subsequent data.
ID3 by Example ([1])

Target attribute: PlayTennis (values: yes / no)
Example: Which attribute should be tested first in the tree?

ID3 determines information gain for each candidate attribute (e.g., Outlook, Temperature, Humidity, and Wind) and selects the one with the highest information gain

Gain(S, Outlook) = 0.246 ; Gain(S, Humidity) = 0.151; Gain(S, Wind) = 0.048; Gain(S, Temperature)=0.029
Example: Resulting Tree
Example: Continue Selecting New Attributes

Determing new attributes at the „Sunny“ – branch only using the examples classified there:

Process continues for each new leaf node until either of two conditions are met:
 Every attribute has already been included along this path through the tree, or

The training examples associated with this leaf node all have the same target attribute value (i.e., their entropy is zero).
Example: Final Decision Tree
GENERALIZATION
Generalization
 Generalization starts from very special descriptions and generalizes them as long as they are not incorrect, i.e. in every step some unnecessary premises are deleted from the antecedent.

The generalization procedure stops if no more premises to remove exist.
 Generalization avoids the maximalgeneral issue of specialization, in fact it guarantees mostgeneral descriptions.

However, generalization of course risks to derive final results that are not mostspecific.

RELAX is an example of a generalizationbased algorithm; references at the end of the lecture.
THE RELAX ALGORITHM
RELAX
 RELAX is a generalization algorithm, and proceeds as long as the resulting rule set is not incorrect.
 The RELAX algorithm:
 RELAX considers every example as a specific rule that is generalized.
 The algorithm then starts from a first rule and relaxes the first premise.
 The resulting rule is tested against the negative examples.
 If the new (generalized) rule covers negative examples, the premise is added again, and the next premise is relaxed.
 A rule is considered minimal, if any further relaxation would destroy the correctness of the rule.
 The search for minimal rules starts from any not yet considered example, i.e. examples that are not covered by already discovered minimal rules.
RELAX: Ilustration by an example

Consider the following positive example for a consequent C:
(pos, (x=1, y=0, z=1))
 This example is represented as a rule: x ∩ ¬ y ∩ z → C

In case of no negative examples, RELAX constructs and tests the following set of rules:
1) x ∩ ¬ y ∩ z → C5) x → C2) ¬ y ∩ z → C6) ¬ y → C3) x ∩ z → C7) z → C4) x ∩ ¬ y → C8) → C
COMBINING SPECIALIZATION AND GENERALIZATION
THE JOJO ALGORITHM
JoJo

In general, it cannot be determined which search direction is the better one.

JoJo is an algorithm that combines both search directions in one heuristic search procedure.

JoJo can start at an arbitrary point in the lattice of complexes and generalizes and specializes as long as the quality and correctness can be improved, i.e. until a local optimum can be found, or no more search resources are available (e.g., time, memory).
JoJo (1)
 While specialization moves solely from ‘Top‘ towards ‘Bottom‘ and generalization from ‘Bottom‘ towards ‘Top‘, JoJo is able to move freely in the search space.

Either of the two strategies can be used interchangeable, which makes JoJo more expressive than comparable algorithms that apply the two in sequential order (e.g. ID3).
JoJo (2)
 A starting point in JoJo is described by two parameters:
 Vertical position (length of the description)

Horizontal position (chosen premises)

Reminder: JoJo can start at any arbitrary point, while specialization requires a highly general point and generalization requires a most specific point.

In general, it is possible to carry out several runs of JoJo with different starting points. Rules that were already produced can be used as subsequent starting points.
JoJo (4)

Criteria for choosing a vertical position:

Approximation of possible lenght or experience from earlier runs.

Random production of rules; distribution by means of the average correctness of the rules with the same length (socalled quality criterion).

Start with a small sample or very limited resources to discover a real starting point from an arbitrary one.

Randomly chosen starting point (same average expectation of success as starting with ‘Top‘ or ‘Bottom‘).

Heuristic: few positive examples and maximalspecific descriptions suggest long rules, few negative examples and maximalgeneric descriptions rather short rules.

JoJo Principle Components
 JoJo consists of three components: The Specializer , Generalizer , and Scheduler
 The former two can be provided by any such components depending on the chosen strategies and preference criterias.
 The Scheduler is responsible for selecting the next description out of all possible generalizations and specializations available (by means of a tpreference, total preference).
 Simple example scheduler:
 Specialize, if the error rate is above threshold;
 Otherwise, choose the best generalization with allowable error rate;
 Otherwise stop.
REFINEMENT OF RULES WITH JOJO
Refinement of rules with JoJo

Refinement of rules refers to the modification of a given rule set based on additonal examples.

The input to the task is a socalled hypothesis (a set of rules) and a set of old and new positive and negative examples.

The output of the algorithm are a refined set of rules and the total set of examples.

The new set of rules is correct, complete, nonredundant and (if necessary) minimal.
Refinement of rules with JoJo (1)

There is a four step procedure for the refinment of rules:

Rules that become incorrect because of new examples are refined: incorrect rules are replaced by new rules that cover the positive examples, but not the new negative ones.

Complete the rule set to cover new positive examples.

Redundant rules are deleted to correct the rule set.
 Minimize the rule set.


Steps 1 and 2 are subject to the algorithm JoJo that integrates generalization and specification via a heuristic search procedure.
Refinement of rules with JoJo (3)
 Correctness:
 Modify overly general rules that cover too many negative examples.

Replace a rule by a new set of rules that cover the positive examples, but not the negative ones.
 Completeness:

Compute new correct rules that cover the not yet considered positive examples (up to a threshold).
 Nonredundancy:

Remove rules that are more specific than other rules (i.e. rules that have premises that are a superset of the premises of another rule).
ILLUSTRATION BY A LARGER EXAMPLE: ID3
ID3 Example: The Simpsons
ID3 Example: The Simpsons
ID3 Example: The Simpsons
ID3 Example: The Simpsons
ID3 Example: The Simpsons
ID3 Example: The Simpsons
ID3 Example: The Simpsons
EXTENSIONS
Issues in Decision Tree Learning
 How deeply to grow the decision tree?
 Handling continuous attributes.
 Choosing an appropriate attribute selection measure.
 Handling training data with missing attribute values.
 Handling attributes with differing costs.
 Improving computational efficiency.
 Successor of ID3, named C4.5 , addresses most of these issues (see [6]).
THE C4.5 ALGORITHM
Avoiding Overfitting the Data
 Definition: Given a hypothesis space H, a hypothesis h є H is said to overfit the training data if there exists some alternative hypothesis h’ є H, such that h’ has smaller error than h’ over the training examples, but h’ has a smaller error than h over the entire distribution of instances. [1]
 ID3 grows each branch of the tree just deeply enough to perfectly classify the training examples.
 This can lead to difficulties when there is noise in the data or when the number of training examples is too small to produce a representative sample of the true target function.
 Overfitting trees could be produced!

Example: Overfitting due to Random Noise
 Original decision tree based on correct data:

Incorrectly labelled data leads to construction of a more complex tree, e.g., {Outlook=Sunny, Temperature=Hot, Humidity=Normal,
Wind=Strong, PlayTennis=No}

ID3 would search for new refinements at the bottom of the left tree.
Strategies to Avoid Overfitting
 Overfitting might occur based on erroneous input data or based on coincidental regularities.
 Different types of strategies:
 Stopping to grow the tree earlier, before it reaches the point where it perfectly classifies the training data.
 Allowing the tree to overfit the data, and then postprune the tree (more successful approach).
 Key question: How to determine the correct final tree size?
 Use of a separate set of examples to evaluate the utility of postpruning nodes from the tree (“Training and Validation Set” – approach); two approaches applied by Quinlain: “Reduced Error Pruning” and “RulePost Pruning”
 Use all available data for training, but apply a statistical test to estimate whether expanding (or pruning) a particular node is likely to produce an improvement beyond the training set.
 Use an explicit measure of the complexity for encoding the training examples and the decision trees, halting growth of the tree when this encoding size is minimized (“Minimum Decision Length” – principle, see [1] Chapter 6)
Rule PostPruning
 Applied in C4.5.

Steps ( [1])

Infer the decision tree from the training set, growing the set until the training data is fit as well as possible and allowing overfitting to occur.

Convert the learned tree into an equivalent set of rules by creating one rule for each path from the root node to a leaf node.

Prune (generalize) each rule by removing any preconditions that result in improving its estimated accuracy.
 Sort the pruned rules by their estimated accuracy, and consider them in this sequence when classifying subsequent instances.


Rule accuracy estimation based on the training set using a pessimistic estimate: C4.5 calculates standard deviation and takes the lower bound as estimate for rule performance.
Incorporating ContinuousValued Attributes
 ID3 is restricted to attributes with discrete values.

Continuousvalued attributes  example:
 C4.5 allows continuousvalued attributes in the decision nodes inside the tree.
 Approach: Dynamic definition of new discretevalued attributes that partition the continuous attribute value into a discrete set of intervals (e.g. A_{c} is true if A < c, false otherwise).

Central question: How to select the best value for a threshold c?
Computing a Threshold
 Goal: Determining a threshold c that produces the greatest information gain.

Approach:

Sort examples according to the continuous attribute A.

Identify adjacent examples that differ in their target classification.

Generate a set of candidate thresholds midway between the corresponding values of A (c must always lie at such a boundary, see [5]).
 Evaluate candidate thresholds by computing the information gain associated with each.

 Example:
 Two candidate thresholds at which the value of PlayTennis changes: (48+60)/2, (80+90)/2

Information gain for the first threshold is higher.
Alternative Measures for Selecting Attributes
 Natural bias in the information gain measure favours attributes with many values over those with few values (e.g. a “Date” – field)
 Example:
 Date – attributes has (too) many possible values; thus it separates the training examples into very small subsets.
 It would have a very high information gain.

But it would be a very poor predictor of the target function over unseen instances

Alternative selection measure: gain ratio (see [5]).
Gain Ratio

Gain ratio penalizes attributes such as Date by incorporating split information (entropy of S with respect to A) that is sensitive to how broadly and uniformly the attribute splits the data:

Definition GainRatio:
Handling Training Examples with Missing Attribute Values
 If data with missing attribute values exist, it is common to estimate the missing attribute value based on other examples for which the attribute has a known value.
 Common strategy: Assign value that is most common among training examples at a certain node n that have the same classification as the current example.
 More complex strategy used in C4.5: Assign a probability to each of the possible values.
 Probabilities can be estimated based on the observed frequencies of various values of a certain attribute among examples at a certain node n, e.g., A(x) = 1: 0.6
 Fractional of probability 0.6 is propagated down the branch of the tree for A=1.
 Fractional examples are used to compute information gain.
 Classification of a new instance is simply the most probable classification (computed by summing weights of instance fragments classified in different ways at leaf nodes of the tree).
SUMMARY
Summary
 Machine learning is a prominent topic in the field of AI.
 Rule learning is a means to learn rules from instance data to classify unseen instances.
 Decision tree learning can be used for concept learning, rule learning, or for learning of other discrete valued functions.
 The ID3 family of algorithms infers decision trees by growing them from the root downward in a greedy manner.
 ID3 searches a complete hypothesis space.
 ID3’s inductive bias includes a preference for smaller trees; it grows trees only as large as needed.
 A variety of extensions to basic ID3 have been developed; extensions include: methods for postpruning trees, handling realvalued attributes, accommodating training examples with missing attribute values, or using alternative selection measures.
Summary (2)
 Rules cover positive examples and should not cover negative examples.
 There are two main approaches for determining rules:
 Generalization

Specification
 RELAX is a presented example of a generalization algorithm
 JoJo combines generalization and specialization and allows the algorithm to traverse the entire search space by either generalizing or specializing rules interchangeably.

JoJo can also be applied to incrementally refine rule sets.
REFERENCES
References
 Mandatory reading:
 [1] Mitchell, T. "Machine Learning", McGraw Hill, 1997. (Section 3)
 [2] Mitchell, T. "The Discipline of Machine Learning" Technical Report CMUML06108, July 2006. Available online: http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf
 [3] Fensel, D. “Ein integriertes System zum maschinellen Lernen aus Beispielen” Künstliche Intelligenz (3). 1993.
 Further reading:
 [4] Feigenbaum, E. A. “Expert systems in the 1980s” In: A. Bond (Ed.), State of the art report on machine intelligence. Maidenhea: Pergamoninfotech.
 [5] Quinlan, J. R. “Induction of decision trees”. Machine Learning 1 (1), pp. 81106, 1986.
References

[6] Quinlan, J. R. “C4.5: Programs for Machine Learning” Morgan Kaufmann, 1993.

[7] Fayyad, U. M. “On the induction of decision trees for multiple concept learning” PhD dissertation, University of Michigan, 1991.

[8] Agrawal, Imielinsky and Swami: Mining Association Rules between Sets of Items in Large Databases. ACM SIGMOD Conference, 1993, pp. 207216.

[9] Quinlan: Generating Production Rules From Decision Trees. 10th Int’l Joint Conference on Artificial Intelligence, 1987, pp. 304307.

[10] Fensel and Wiese: Refinement of Rule Sets with JoJo. European Conference on Machine Learning, 1993, pp. 378383.
References
 [11] AQ: Michalski, Mozetic, Hong and Lavrac: The MultiPurpose Incremental Learning System AQ15 and its Testing Application to Three Medical Domains. AAAI86, pp. 10411045.
 [12] C4: Quinlan: Learning Logical Definitions from Relations: Machine Learning 5(3), 1990, pp. 239266.
 [13] CN2: Clark and Boswell: Rule Induction with CN2: Some recent Improvement. EWSL91, pp. 151163.
 [14] CABRO: Huyen and Bao: A method for generating rules from examples and its application. CSNDAL91, pp. 493504.
 [15] FOIL: Quinlan: Learning Logical Definitions from Relations: Machine Learning 5(3), 1990, pp. 239266.
 [16] PRISM: Cendrowska: PRISM: An algorithm for inducing modular rules. Journal ManMachine Studies 27, 1987, pp. 349370.
 [17] RELAX: Fensel and Klein: A new approach to rule induction and pruning. ICTAI91.
References

[18] http://www.ensta.fr/~diam/ro/online/viggo_wwwcompendium/node146.html.

[19] Goodrich, M.T., Mirelli, V. Orletsky, M., and Salowe, J. Decision tree conctruction in fixed dimensions: Being global is hard but local greed is good. Technical Report TR951, Johns Hopkins University, Department of Computer Science, Baltimore, MD 21218, May 1995
References

Wikipedia links:

Machine Learning, http://en.wikipedia.org/wiki/Machine_learning

Rule learning, http://en.wikipedia.org/wiki/Rule_learning

Association rule learning, http://en.wikipedia.org/wiki/Association_rule_learning

ID3 algorithm, http://en.wikipedia.org/wiki/ID3_algorithm

C4.5 algorithm, http://en.wikipedia.org/wiki/C4.5_algorithm
Questions?
Intelligent Systems
Inductive Logic Programming
Agenda
 Motivation
 Technical Solution
 Model Theory of ILP
 A Generic ILP Algorithm
 Proof Theory of ILP
 ILP Systems
 Illustration by a Larger Example
 Summary
 References
MOTIVATION
Motivation
 There is a vast array of different machine learning techniques, e.g.:
 Decision Tree Learning (see previous lecture)
 Neural networks
 and… Inductive Logic Programming (ILP)
 Advantages over other ML approaches
 ILP uses an expressive FirstOrder framework instead of simple attributevalue framework of other approaches
 ILP can take background knowledge into account
=
Inductive Learning ∩ Logic Programming
The inductive learning and logic programming sides of ILP
 From inductive machine learning, ILP inherits its goal: to develop tools and techniques to
 Induce hypotheses from observations (examples)
 Synthesise new knowledge from experience
 By using computational logic as the representational mechanism for hypotheses and observations, ILP can overcome the two main limitations of classical machine learning techniques:
 The use of a limited knowledge representation formalism (essentially a propositional logic)
 Difficulties in using substantial background knowledge in the learning process
The inductive learning and logic programming sides of ILP (cont’)
 ILP inherits from logic programming its
 Representational formalism
 Semantical orientation
 Various wellestablished techniques
 ILP systems benefit from using the results of logic programming
 E.g. by making use of work on termination, types and modes, knowledgebase updating, algorithmic debugging, abduction, constraint logic programming, program synthesis and program analysis
The inductive learning and logic programming sides of ILP (cont’)
 Inductive logic programming extends the theory and practice of logic programming by investigating induction rather than deduction as the basic mode of inference
 Logic programming theory describes deductive inference from logic formulae provided by the user

ILP theory describes the inductive inference of logic programs from instances and background knowledge

ILP contributes to the practice of logic programming by providing tools that assist logic programmers to develop and verify programs
Introduction – Basic example
 Imagine learning about the relationships between people in your close family circle
 You have been told that your grandfather is the father of one of your parents, but do not yet know what a parent is
 You might have the following beliefs ( B ):
 You are now given the following positive examples concerning the relationships between particular grandfathers and their grandchildren ( E + ):

grandfather(X, Y) ← father(X, Z), parent(Z, Y)

father(henry, jane) ←

mother(jane. john) ←

mother(jane, alice) ←

grandfather(henry, john) ←

grandfather(henry, alice) ←
Introduction – Basic example
 You might be told in addition that the following relationships do not hold (negative examples) ( E  )

← grandfather(john, henry)

← grandfather(alice, john)
 Believing B , and faced with examples E + and E  you might guess the following hypothesis H _{1} ∈ H

parent(X, Y) ← mother(X, Y)
 H is the set of hypotheses and contain an arbitrary number of individual speculations that fit the background knowledge and examples
 Several conditions have to be fulfilled by a hypothesis
 Those conditions are related to completeness and consistency with respect to the background knowledge and examples
Introduction – Basic example
 Consistency:
 First, we must check that our problem has a solution:

B ∪ E ⊭ □ (
prior satisfiability
)
 If one of the negative examples can be proved to be true from the background information alone, then any hypothesis we find will not be able to compensate for this. The problem is not satisfiable.
 B and H are consistent with E:

B ∪ H ∪ E ⊭ □ (
posterior satisfiability
)
 After adding a hypothesis it should still not be possible to prove a negative example.
 Completeness:
 However, H allows us to explain E+ relative to B:

B ∪ H ⊧ E+ (
posterior sufficiency
)
 This means that H should fit the positive examples given.
TECHNICAL SOLUTIONS
Model Theory of ILP
Model Theory – Normal Semantics
 The problem of inductive inference:
 Given is background (prior) knowledge B and evidence E
 The evidence E = E+ ∪ E consists of positive evidence E+ and negative evidence E
 The aim is then to find a hypothesis H such that the following conditions hold:
 Prior Satisfiability: B ∪ E ⊭ □
Posterior Satisfiability: B ∪ H ∪ E ⊭ □
Prior Necessity: B ⊭ E+
Posterior Sufficiency: B ∪ H ⊧ E+
 The Sufficiency criterion is sometimes named completeness with regard to positive evidence
 The Posterior Satisfiability criterion is also known as consistency with the negative evidence
 In this general setting, backgroundtheory, examples, and hypotheses can be any (wellformed) formula
Model Theory – Definite Semantics
 In most ILP practical systems background theory and hypotheses are restricted to being definite clauses
 Clause: A disjunction of literals
 Horn Clause: A clause with at most one positive literal
 Definite Clause: A Horn clause with exactly one positive literal
 This setting has the advantage that definite clause theory T has a unique minimal Herbrand model M ^{+}( T )
 Any logical formulae is either true or false in this minimal model (all formulae are decidable and the Closed World Assumption holds)
Model Theory – Definite Semantics
 The definite semantics again require a set of conditions to hold
 We can now refer to every formula in E since they are guaranteed to have a truth value in the minimal model
 Consistency:

Prior Satisfiability: all
e in E
^{}
are false in
M
^{+}(
B
)
 Negative evidence should not be part of the minimal model

Posterior Satisfiability: all
e in E
^{}
are false in
M
^{+}(
B
∪
H
)
 Negative evidence should not be supported by our hypotheses
 Completeness

Prior Necessity: some
e in E
^{+}
are false in
M
^{+}(
B
)
 If all positive examples are already true in the minimal model of the background knowledge, then no hypothesis we derive will add useful information

Posterior Sufficiency: all
e in E
^{+}
are true in
M
^{+}(
B
∪
H
)
 All positive examples are true (explained by the hypothesis) in the minimal model of the background theory and the hypothesis
Model Theory – Definite Semantics
 An additional restriction in addition to those of the definite semantics is to only allow true and false ground facts as examples (evidence)

This is called the example setting
 The example setting is the main setting employed by ILP systems
 Only allows factual and not causal evidence (which usually captures more knowledge)

Example:

B:
grandfather(X, Y) ← father(X, Z), parent(Z, Y)
father(henry, jane) ←
etc. 
E:
grandfather(henry, john) ←
grandfather(henry, alice) ←

B:
grandfather(X, Y) ← father(X, Z), parent(Z, Y)
 Not allowed in example setting:
 ← grandfather(X, X) [Not allowed in definite semantics]
grandfather(henry, john) ← father(henry, jane), mother(jane, john)
Model Theory – Nonmonotonic Semantics
 In the nonmonotonic setting:

The background theory is a set of definite clauses
 The evidence is empty
 The positive evidence is considered part of the background theory

The negative evidence is derived implicitly, by making the closed world assumption (realized by the minimal Herbrand model)

The hypotheses are sets of general clauses expressible using the same alphabet as the background theory
Model Theory – Nonmonotonic Semantics (2)
 Since only positive evidence is present, it is assumed to be part of the background theory:
 B’ = B ∪ E
 The following conditions should hold for H and B’:
 Validity: all h in H are true in M +( B’ )
 All clauses belonging to a hypothesis hold in the database B, i.e. that they are true properties of the data
 Completeness: if general clause g is true in M +( B’ ) then H ⊧ g
 All information that is valid in the minimal model of B’ should follow from the hypothesis
 Additionally the following can be a requirement:
 Minimality: there is no proper subset G of H which is valid and complete
 The hypothesis should not contain redundant clauses
Model Theory – Nonmonotonic Semantics (3)
 Example for B (definite clauses):

male(luc) ←

female(lieve) ←

human(lieve) ←

human(luc) ←
 A possible solution is then H (a set of general clauses):

← female(X), male(X)

human(X) ← male(X)

human(X) ← female(X)

female(X), male(X) ← human(X)
Model Theory – Nonmonotonic Semantics (4)
 One more example to illustrate the difference between the example setting and the nonmonotonic setting
 Consider:
 Background theory B

bird(tweety) ←

bird(oliver) ←
 Examples E^{+}:

flies(tweety)
 For the nonmonotonic setting B’ = B ∪ E^{+} because positive examples are considered part of the background knowledge
Model Theory – Nonmonotonic Semantics (5)
 Example setting:
 An acceptable hypothesis H_{1} would be

flies(X) ← bird(X)
 It is acceptable because if fulfills the completeness and consistency criteria of the definite semantics
 This realizes can inductive leap because flies(oliver) is true in

M
+
( B
∪ H
) = { bird(tweety), bird(oliver), flies(tweety), flies(oliver) }
 Nonmonotonic setting:
 H_{1} is not a solution since there exists a substitution {X ← oliver} which makes the clause false in M +( B’ ) (the validity criteria is violated:

M
+
( B’ ) = { bird(tweety), bird(oliver), flies(tweety) }
{X ← oliver}: flies(oliver) ← bird(oliver)
{X ← tweety}: flies(tweety) ← bird(tweety)
TECHNICAL SOLUTIONS
A Generic ILP Algorithm
ILP as a Search Problem
 ILP can be seen as a search problem  this view follows immediately from the modeltheory of ILP
 In ILP there is a space of candidate solutions, i.e. the set of hypotheses, and an acceptance criterion characterizing solutions to an ILP problem
 Question: how the space of possible solutions can be structured in order to allow for pruning of the search?
 The search space is typically structured by means of the dual notions of generalisation and specialisation
 Generalisation corresponds to induction
 Specialisation to deduction
 Induction is viewed here as the inverse of deduction
Specialisation and Generalisation Rules
 A hypothesis G is more general than a hypothesis S if and only if G ⊧ S
 S is also said to be more specific than G.
 In search algorithms, the notions of generalisation and specialisation are incorporated using inductive and deductive inference rules:
 A deductive inference rule r maps a conjunction of clauses G onto a conjunction of clauses S such that G ⊧ S

r
is called a specialisation rule
 An inductive inference rule r maps a conjunction of clauses S onto a conjunction of clauses G such that G ⊧ S

r
is called a generalisation rule
Pruning the search space
 Generalisation and specialisation form the basis for pruning the search space; this is because:
 When B ∪ H ⊭ e , where e ∈ E^{+}, B is the background theory, H is the hypothesis, then none of the specialisations H’ of H will imply the evidence
 They can therefore be pruned from the search.
 When B ∪ H ∪ {e} ⊧ □ , where e ∈ E^{}, B is the background theory, H is the hypothesis, then all generalisations H’ of H will also be inconsistent with B ∪ E
 We can again drop them
A Generic ILP Algorithm
 Given the key ideas of ILP as search a generic ILP system is defined as:
 The algorithm works as follows:
 It keeps track of a queue of candidate hypotheses QH
 It repeatedly deletes a hypothesis H from the queue and expands that hypotheses using inference rules; the expanded hypotheses are then added to the queue of hypotheses QH, which may be pruned to discard unpromising hypotheses from further consideration

This process continues until the stopcriterion is satisfied
Algorithm – Generic Parameters
 Initialize denotes the hypotheses started from
 R denotes the set of inference rules applied
 Delete influences the search strategy
 Using different instantiations of this procedure, one can realise a depthfirst (Delete = LIFO), breadthfirst Delete = FIFO) or bestfirst algorithm
 Choose determines the inference rules to be applied on the hypothesis H
Algorithm – Generic Parameters (2)
 Prune determines which candidate hypotheses are to be deleted from the queue
 This can also be done by relying on the user (employing an “oracle”)

Combining Delete with Prune it is easy to obtain advanced search
 The Stopcriterion states the conditions under which the algorithm stops

Some frequently employed criteria require that a solution be found, or that it is unlikely that an adequate hypothesis can be obtained from the current queue
TECHNICAL SOLUTIONS
Proof Theory of ILP
Proof Theory of ILP
 Inductive inference rules can be obtained by inverting deductive ones
 Deduction: Given B ⋀ H ⊧ E ^{ + } , derive E ^{ + } from B ⋀ H
 Induction: Given B ⋀ H ⊧ E ^{ + } , derive H from B and B and E ^{ + }
 Inverting deduction paradigm can be studied under various assumptions, corresponding to different assumptions about the deductive rule for ⊧ and the format of background theory B and evidence E ^{ + }

⇒ Different models of inductive inference are obtained
 Example: θ subsumption
 The background knowledge is supposed to be empty, and the deductive inference rule corresponds to θsubsumption among single clauses
θsubsumption

θ subsumes is the simplest model of deduction for ILP which regards clauses as sets of (positive and negative) literals

A clause c _{ 1 } θ subsumes a clause c _{ 2 } if and only if there exists a substitution θ such that c _{ 1 } θ ⊆ c _{ 2 }

c _{ 1 } is called a generalisation of c _{ 2 } (and c _{ 2 } a specialisation of c _{ 1 } ) under θ subsumption

θ subsumes The θ subsumption inductive inference rule is:
θsubsumption
 For example, consider:

c
_{1}
=
{ father(X,Y) ← parent(X,Y), male(X) }
c _{2} = { father(jef,paul) ← parent(jef,paul), parent(jef,ann), male(jef), female(ann) }

With
θ
= {X = jef, Y = paul}
c
_{1}
θ
subsumes c_{2} because

{ father(jef,paul) ← parent(jef, paul), male(jef) }
⊆
father(jef,paul) ← parent(jef,paul), parent(jef,ann), male(jef), female(ann) }
Some properties of θsubsumption
 θ subsumption has a range of relevant properties
 Example: Implication
 If c _{1} θ subsumes c _{2}, then c _{1} ⊧ c _{2}
 Example: See previous slide
 This property is relevant because typical ILP systems aim at deriving a hypothesis H (a set of clauses) that implies the facts in conjunction with a background theory B, i.e. B ∪ H ⊧ E+
 Because of the implication property, this is achieved when all the clauses in E+ are θ subsumed by clauses in B ∪ H
Some properties of θsubsumption
 Example: Equivalence
 There exist different clauses that are equivalent under θ subsumption
 E.g. parent(X,Y) ← mother(X,Y), mother(X,Z) θ subsumes parent(X,Y) ← mother(X,Y) and vice versa
 Two clauses equivalent under θ subsumption are also logically equivalent, i.e. by implication
 This is used for optimization purposes in practical systems
TECHNICAL SOLUTIONS
ILP Systems
Characteristics of ILP systems
 Incremental/nonincremental: describes the way the evidence E (examples) is obtained
 In nonincremental or empirical ILP, the evidence is given at the start and not changed afterwards
 In incremental ILP, the examples are input one by one by the user, in a piecewise fashion.
 Interactive/ Noninteractive
 In interactive ILP, the learner is allowed to pose questions to an oracle (i.e. the user) about the intended interpretation
 Usually these questions query the user for the intended interpretation of an example or a clause.
 The answers to the queries allow to prune large parts of the search space
 Most systems are noninteractive
Concrete ILP implementations
 A well known family of related, popular systems: Progol
 CProgol, PProgol, Aleph
 Progol allows arbitrary Prolog programs as background knowledge and arbitrary definite clauses as examples
 Most comprehensive implementation: CProgol
 Homepage: http://www.doc.ic.ac.uk/~shm/progol.html
 General instructions (download, installation, etc.)
 Background information
 Example datasets
 Open source and free for research and teaching
An ILP system: CProgol
 CProgol uses a covering approach: It selects an example to be generalised and finds a consistent clause covering the example

Basic algorithm for CProgol:

Select an example to be generalized.

Build mostspecificclause. Construct the most specific clause that entails the example selected, and is within language restrictions provided. This is usually a definite clause with many literals, and is called the "bottom clause."

Find a clause more general than the bottom clause. This is done by searching for some subset of the literals in the bottom clause that has the "best" score.

Remove redundant examples. The clause with the best score is added to the current theory, and all examples made redundant are removed. Return to Step 1 unless all examples are covered.
An ILP system: CProgol
 Example: CProgol can be used to learn legal moves of chess pieces (Based on rank and File difference for knight moves)
 Example included in CProgol distrubtion

An ILP system: CProgol

ILLUSTRATION BY A LARGER EXAMPLE
Michalski’s train problem
Michalski’s train problem

Assume ten railway trains: five are travelling east and five are travelling west; each train comprises a locomotive pulling wagons; whether a particular train is travelling towards the east or towards the west is determined by some properties of that train

The learning task: determine what governs which kinds of trains are Eastbound and which kinds are Westbound
Michalski’s train problem (2)

Michalski’s train problem can be viewed as a classification task: the aim is to generate a classifier (theory) which can classify unseen trains as either Eastbound or Westbound

The following knowledge about each car can be extracted: which train it is part of, its shape, how many wheels it has, whether it is open (i.e. has no roof) or closed, whether it is long or short, the shape of the things the car is loaded with. In addition, for each pair of connected wagons, knowledge of which one is in front of the other can be extracted.
Michalski’s train problem (3)
 Examples of Eastbound trains

Positive examples:
eastbound(east1).
eastbound(east2).
eastbound(east3).
eastbound(east4).
eastbound(east5).

Negative examples:
eastbound(west6).
eastbound(west7).
eastbound(west8).
eastbound(west9).
eastbound(west10).
Michalski’s train problem (4)
 Background knowledge for train east1 . Cars are uniquely identified by constants of the form car_xy , where x is number of the train to which the car belongs and y is the position of the car in that train. For example car_12 refers to the second car behind the locomotive in the first train
 short(car_12). short(car_14).
 long(car_11). long(car_13).
 closed(car_12).
 open(car_11). open(car_13). open(car_14).
 infront(east1,car_11). infront(car_11,car_12).
 infront(car_12,car_13). infront(car_13,car_14).
 shape(car_11,rectangle). shape(car_12,rectangle).
 shape(car_13,rectangle). shape(car_14,rectangle).
 load(car_11,rectangle,3). load(car_12,triangle,1).
 load(car_13,hexagon,1). load(car_14,circle,1).
 wheels(car_11,2). wheels(car_12,2).
 wheels(car_13,3). wheels(car_14,2).
 has_car(east1,car_11). has_car(east1,car_12).
 has_car(east1,car_13). has_car(east1,car_14).
Michalski’s train problem (5)
 An ILP systems could generate the following hypothesis:

eastbound(A) ← has_car(A,B), not(open(B)), not(long(B)).

i.e. A train is eastbound if it has a car which is both not open and not long.
 Other generated hypotheses could be:
 If a train has a short closed car, then it is Eastbound and otherwise Westbound
 If a train has two cars, or has a car with a corrugated roof, then it is Westbound and otherwise Eastbound
 If a train has more than two different kinds of load, then it is Eastbound and otherwise Westbound
 For each train add up the total number of sides of loads (taking a circle to have one side); if the answer is a divisor of 60 then the train is Westbound andotherwise Eastbound
Michalski’s train problem – Demo
 Download Progrol

http://www.doc.ic.ac.uk/~shm/Software/progol5.0
 Use the Progol input file for Michalski's train problem

http://www.comp.rgu.ac.uk/staff/chb/teaching/cmm510/michalski_train_data

Generate the hypotheses
SUMMARY
Summary
 ILP is a subfield of machine learning which uses logic programming as a uniform representation for
 Examples
 Background knowledge
 Hypotheses
 Many existing ILP systems
 Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples
 Lots of applications of ILP
 E.g. bioinformatics, natural language processing, engineering
 IPL is an active research filed
REFERENCES
References
 Mandatory Reading:
 S.H. Muggleton. Inductive Logic Programming. New Generation Computing , 8(4):295318, 1991.
 S.H. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming , 19,20:629679, 1994.
 Further Reading:
 N. Lavrac and S. Dzeroski. Inductive Logic Programming: Techniques and Applications. 1994.
 http://wwwai.ijs.si/SasoDzeroski/ILPBook
 Wikipedia:
 http://en.wikipedia.org/wiki/Inductive_logic_programming
Questions?
Intelligent Systems
Formal Concept Analysis
Outline
 Motivation
 Technical Solution
 Definitions
 Concept Lattice
 Illustration by a Larger Example
 Extension
 Summary
MOTIVATION
From data to their conceptualization
Introduction
 In many applications we deal with huge amount of data
 E.g. insurance company records
 Data by itself are not useful to support decisions
 E.g. can I make an insurance contract to this new customer or it is too risky?
 Thus we need a set of methods to generate from a data set a “summary” that represent a conceptualization of the data set
 E.g. what are similarities among different customers of an insurance company that divide them in different risk classes?
 Age >25, City=Innsbruck => Low Risk
 This is a common task that is needed in several domains to support data analysis
 Analysis of children suffering from diabetes
 Marketing analysis of store departments or supermarkets
 Formal Concept Analysis is a technique that enables resolution of such problems
TECHNICAL SOLUTION
Formal Concept Analysis
What is a Concept?
 What drives us to call an object a “bird”?
 Every object having certain attributes is called “bird”:
 A bird has feathers
 A bird has two legs
 A bird has a beak
 …
 All objects having these attributes are called “birds”:
 Duck, goose, owl and parrot are birds
 Penguins are birds, too
 …
What is a Concept?

This description of the concept “bird” is based on sets of

Objects, attributes and a relation form a formal concept
What is a Concept?
 So, a formal concept is constituted by two parts
 having a certain relation:
 every object belonging to this concept has all the attributes in B
 every attribute belonging to this concept is shared by all objects in A
 A is called the concept's extent
 B is called the concept's intent
The Universe of Discourse

A repertoire of objects and attri