### Where are we?

1. Introduction
2. Propositional Logic
3. Predicate Logic
4. Reasoning
5. Search Methods
6. CommonKADS
7. Problem-Solving Methods
8. Planning
9. Software Agents
10. Rule Learning
11. Inductive Logic Programming
12. Formal Concept Analysis
13. Neural Networks
14. Semantic Web and Services

### Textbooks

• G. Görz, C.-R. Rollinger, J. Schneeberger (eds.): “Handbuch der künstlichen Intelligenz”, Oldenbourg Verlag, 2003, fourth edition

• G. Luger: “Artificial Intelligence – Structures and Strategies for Complex Problem Solving”, Addison-Wesley, 2005, fifth edition

### Overview of the course: What is the course about?

1. Introduction

2. Propositional logic

3. Predicate logic

4. Reasoning

5. Search methods

6. CommonKADS

7. Problem-solving methods

8. Planning

9. Software Agents

10. Rule learning

11. Inductive logic programming

12. Formal concept analysis

13. Neural networks

14. Semantic Web and Services

### Outline

• Motivation

• What is “Intelligence”?

• What is “Artificial Intelligence” (AI)?

• Strong AI vs. Weak AI

• Technical Solution

• Symbolic AI vs. Subsymbolic AI

• Knowledge-based systems

• Popular AI systems

• Subdomains of AI

• Some relevant people in AI

• Summary

### What is “Intelligence”?

• "Intelligence denotes the ability of an individual to adapt his thinking to new demands; it is the common mental adaptability to new tasks and conditions of life" (William Stern, 1912)

• Being "intelligent" means to be able to cognitively grasp phenomena, being able to judge, to trade of between different possibilities, or to be able to learn.

• An important aspect of "Intelligence" is how efficiently humans are able to adapt to their environment, or assimilate their environment, when solving problems.

• Intelligence manifests itself in logical thinking, computations, the memory capabilities of the brain, through the application of words and language rules or through the recognition of things and events.

• The combination of information, creativity, and new problem solutions is crucial for acting "intelligent".

### Testing “Intelligence” with the Turing Test

• Turing test is a proposal to test a machine’s ability to demonstrate “intelligence”

### Testing “Intelligence” with the Turing Test (1)

• Turing test proceeds as follows:

• A human judge C engages in a natural language conversation with one human B and one machine A , each of which tries to appear human.

• All participants are placed in isolated locations.

• If the judge C cannot reliably tell the machine A from the human B , the machine is said to have passed the test.

• In order to test the machine's intelligence rather than its ability to render words into audio, the conversation is limited to a text-only channel such as a computer keyboard and screen.

• Turing test is an operational test for intelligent behaviour. For more details see [2].

### “Chinese Room”

• The “Chinese room” experiment developed by John Searle in 1980 attempts to show that a symbol-processing machine like a computer can never be properly described as having a ”mind” or “understanding”, regardless of how intelligently it may behave.

• With the “Chinese room” John Searle argues that it is possible to pass the Turing Test, yet not (really) think.

• Source: http://en.wikipedia.org/wiki/Chinese_room

### “Chinese Room” (1)

• The “Chinese room” experiment proceeds as follows:

• Searle, a human who does not know Chinese, is locked in a room with an enormous batch of Chinese script.

• Slips of paper with still more Chinese script come through a slot in the wall.

• Searle has been given a set of rules in English for correlating the Chinese script coming through with the batches of script already in the room.

### “Chinese Room” (2)

• Searle is instructed to push back through the slot the Chinese script with which the scripts coming in through the slot are correlated according to the rules.

• Searle identifies the scripts coming in and going out on the basis of their shapes alone. He does not speak Chinese and does not understand them.

• The scripts going in are called ‘the questions’, the scripts coming out are ‘the answers’, and the rules that Searle follows are ‘the program’.

• Suppose also that the set of rules (the program) is so good, and Searle gets so good at following it, that Searle’s answers are indistinguishable from those of a native Chinese speaker.

### “Chinese Room” (3)

• The result:

• It seems clear that Searle nevertheless does not understand the questions or the answers

• But Searle is behaving just as a computer does, “performing computational operations on formally specified elements”

• Hence, manipulating formal symbols, which is just what a computer running a program does, is not sufficient for understanding or thinking

### What is “Artificial Intelligence”?

• Many definitions exist, among them:

• “The study of the computations that make it possible to perceive, reason, and act” (Winston, 1992)

• “A field of study that seeks to explain and emulate [human] intelligent behaviour in terms of computational processes” (Schalkoff, 1990)

• It is an interdisciplinary field that builds on results from philosophy, psychology, linguistics, and the brain sciences

• Difference from “traditional” computer science: emphasis on cognition, reasoning, and acting

• Generative theory of intelligence:

• Intelligence emerges from the orchestration of multiple processes

• Process models of intelligent behaviour can be investigated and simulated on machines

### Early developments of Artificial Intelligence

• Two main aspects begin to manifest in the early days of AI

1. Cognitive modelling, i.e., the simulation of cognitive processes through information processing models

2. The construction of “intelligent systems” that make certain aspects of human cognition and reasoning available.

### Strong AI vs. Weak AI

• Strong AI
• “An artificial intelligence system can think and have a mind.” (John Searle, 1986)
• “Machine intelligence with the full range of human intelligence” (Kurzweil, 2005)
• AI that matches or exceeds human intelligence.
• Intelligence can be reduced to information processing.
• “Science fiction AI”
• Weak AI
• Intelligence can partially be mapped to computational processes.
• Intelligence is information processing
• Intelligence can be simulated

### Information Processing and symbolic representation

• Research on information processing in AI proceeds via:

• Exact formalisations.

• Exemplary realisations via implementations.

• Core aspect: Representation and processing of symbols as a foundation of internal processes.

### Symbolic AI

• Symbols name objects and thereby provide access to their meaning (Newell, 1958)

• “Spoken words are the symbols of mental experience, and written words are the symbols of spoken words.” (Aristotle) [3]

• Mental abilities of humans can be inspected on a symbolic level independent of neuronal architectures or processes.

• Subject of Symbolic AI is thus the meaning of processes (or their symbolic representations respectively).

• Symbolic AI aims to imitate intelligence via formal models.

• Main persons behind symbolic AI are: Simon, Newell, Minsky

### The “(General) Intelligent Agent”

• Core paradigm of symbolic AI is the “Intelligent Agent” [4]:

• has a memory and the capability to act in its world based on it.

• has sensors to perceive information from its environment.

• has actuators to influence the external world.

• has the capability to probe actions and can thereby choose the best possible action.

• has internal memory for methods; the exploration of the world is guided by the knowledge kept in it.

### Subsymbolic AI

• Subsymbolic AI (SSAI) aims to model intelligence empirically.

• SSAI was inspired by biological systems: A model which imitates neural nets in the brain is the basis for the creation of artificial intelligence.

• Neural nets consist of a network of neurons which have weighted connections with each other.

• Early work by Rosenblatt (1962): the “Perceptron” [6]

• Advantages of artificial neural nets:

• Distributed representation

• Representation and processing of fuzziness

• Highly parallel and distributed action

• Speed and fault-tolerance

Image: http://www.neuronalesnetz.de
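Rosenblatt’s perceptron idea [6] can be illustrated with a minimal sketch: a single artificial neuron with weighted inputs, trained by nudging its weights after each misclassification. The training data (the AND function) and learning rate are illustrative choices, not from the original work.

```python
# Rosenblatt-style perceptron: one neuron with weighted connections,
# trained by adjusting weights after each misclassified example.
def train_perceptron(samples, epochs=20, lr=1.0):
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:          # target is 0 or 1
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            out = 1 if activation > 0 else 0
            err = target - out             # -1, 0, or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Learn the linearly separable AND function (illustrative data).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
```

For linearly separable data such as AND, the perceptron convergence theorem guarantees that this update rule finds a separating weight vector; for non-separable data (e.g. XOR) a single perceptron cannot succeed.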

### Development

1. General Problem Solver

2. Knowledge-is-power hypothesis

3. Knowledge levels

• 3a. Newell’s 3 levels of knowledge

• 3b. Brachman’s 5 levels of knowledge

4. Problem Solving Methods

### 1. General Problem Solver

• The General Problem Solver (GPS) is a universal problem solving approach.

• GPS is the first approach that makes the distinction between knowledge of problems domains and how to solve problems

• GPS is a domain- and task-independent approach.

• GPS places no restrictions on either the domain knowledge or the task.

• Examples of GPS are: automated theorem proving and generic search methods

### Automated theorem proving

• Automatic theorem provers are GPS for which every problem can be expressed as logical inference

• Automated theorem proving is about proving of mathematical theorems by a computer program

### Generic Search Methods

• Generic Search Methods are GPS for which every problem can be expressed as search

• One particular example of a Generic Search Method is the A* algorithm.

• A* works for problems that can be represented as a state space, i.e. a graph of states. The initial conditions of the problem are represented as the start state, the goal conditions as the end state.

• A* is an informed search (heuristic search) approach that uses the estimation function f(n) = g(n) + h(n), where:

• g(n) is the cost to get from the start state to the current state n

• h(n) is the estimated cost to get from the current state n to the end state

• f(n) is the estimated total cost from the start state through the current state n to the end state
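The estimation function above can be sketched in code: a minimal A* that always expands the frontier node with the lowest f(n) = g(n) + h(n). The graph and heuristic values are illustrative, not from the lecture.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* search: expand the node with the lowest f(n) = g(n) + h(n).

    neighbors(state) yields (next_state, step_cost) pairs;
    h(state) estimates the remaining cost to the goal.
    """
    frontier = [(h(start), 0, start, [start])]   # (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g
        for nxt, cost in neighbors(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float("inf")

# Illustrative toy state space with an admissible heuristic.
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)],
         "C": [("D", 1)], "D": []}
heur = {"A": 2, "B": 2, "C": 1, "D": 0}
path, cost = a_star("A", "D", lambda s: graph[s], lambda s: heur[s])
```

With an admissible heuristic (one that never overestimates), A* is guaranteed to return a least-cost path; here it finds A→B→C→D at cost 3 instead of the direct but costlier A→C→D.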

More in Lecture 5

### 1. General Problem Solver (1)

• However, GPS has a set of limitations :

• It works in theory but in practice works only on toy problems (e.g. Tower of Hanoi)

• Could not solve real-world problems because search was easily lost in the combinatorial explosion of intermediate states

### 2. Knowledge-is-power hypothesis

Knowledge-is-power hypothesis , also called the Knowledge Principle was formulated by E.A. Feigenbaum in 1977:

“knowledge of the specific task domain in which the program is to do its problem solving was more important as a source of power for competent problem solving than the reasoning method employed” [15]

### 2. Knowledge-is-power hypothesis (1)

• The Knowledge-is-power hypothesis shifted the focus on how to build intelligent systems from inference to the knowledge base .

• Problem solving is guided by experiential, qualitative, heuristic knowledge.

• Equating intelligence with knowledge is the common meaning of the word in the English-speaking world.

• The Central Intelligence Agency (CIA) defines intelligence as knowledge.

• The Knowledge-is-power hypothesis led to the emergence of a new field, i.e. expert systems, and a new profession, i.e. the knowledge engineer

### 3. Knowledge levels

3a. Newell’s 3 levels of knowledge

3b. Brachman’s 5 levels of knowledge

### 3a. Newell’s 3 levels of knowledge [5]

• In his work from 1981, Newell tried to answer questions such as

• How can knowledge be characterised?

• What is the relation of this characterisation and the representation of knowledge?

• What is characteristic about a system which holds knowledge?

• Newell distinguished 3 levels in the context of knowledge representation:

• Knowledge Level

• Logical Level

• Implementation Level

### 3a. Newell’s 3 levels of knowledge (1)

• Knowledge Level

• The most abstract level of representing knowledge.

• Concerns the total knowledge contained in the Knowledge Base

• Example: The automated DB-Information system knows that a trip from Innsbruck to Vienna costs 120€

### 3a. Newell’s 3 levels of knowledge (2)

• Logical Level

• Encoding of knowledge in a formal language.

• Example: Price(Innsbruck, Vienna, 120)

### 3a. Newell’s 3 levels of knowledge (3)

• Implementation Level

• The internal representation of the sentences.

• Example:

• As a String “Price(Innsbruck, Vienna, 120)”

• As a value in a matrix
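The two implementation-level alternatives above can be sketched in code: the same logical-level sentence, Price(Innsbruck, Vienna, 120), stored either as a string or as a value in a matrix (sketched here as a Python dict keyed by city pair; the helper function is illustrative).

```python
# The same logical-level sentence at the implementation level,
# in two different internal representations.

# 1. As a string:
as_string = "Price(Innsbruck, Vienna, 120)"

# 2. As a value in a matrix (sketched as a dict indexed by city pair):
price_matrix = {("Innsbruck", "Vienna"): 120}

def price(frm, to):
    # Look up the stored fact; None if nothing is known about this pair.
    return price_matrix.get((frm, to))
```

Both encode the identical knowledge-level fact (the system “knows” that the trip costs 120 €), but they support very different operations: the matrix allows direct lookup, while the string would first have to be parsed.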

### 3b. Brachman’s 5 Levels of Knowledge [12]

• Brachman defines 5 levels for different types of representations.

• The levels describe the transition from data to knowledge.

• Each level corresponds to an explicit set of primitives offered to the knowledge engineer.

• Ordering of knowledge levels from simple/abstract to complex/concrete:

• Implementational Level

• Logical Level

• Epistemological Level

• Conceptual Level

• Linguistic Level

### 3b. Brachman’s 5 Levels of Knowledge (1)

• Implementational Level

• The primitives are pointers and memory cells.

• Allows the construction of data structures with no a priori semantics

### 3b. Brachman’s 5 Levels of Knowledge (2)

• Logical Level

• The primitives are logical predicates, operators, and propositions.

• An index is available to structure primitives.

• Formal semantics are given to the primitives in terms of relations among objects in the real world

• No particular assumption is made, however, as to the nature of such relations

### 3b. Brachman’s 5 Levels of Knowledge (3)

• Epistemological Level

• The primitives are concept types and structuring relations.

• Structuring relations provide structure in a network of conceptual types or units (e.g. inheritance: conceptual units, conceptual sub-units)

• The epistemological level links formal structure to conceptual units

• It contains structural connections in our knowledge needed to justify conceptual inferences.

### 3b. Brachman’s 5 Levels of Knowledge (4)

• Conceptual Level

• The primitives are conceptual relations, primitive objects and actions.

• The primitives have a definite cognitive interpretation, corresponding to language-independent concepts like elementary actions or thematic roles

### 3b. Brachman’s 5 Levels of Knowledge (5)

• Linguistic Level

• The primitives are words and (linguistic) expressions.

• The primitives are associated directly to nouns and verbs of a specific natural language

• They can express arbitrary relations and nodes that exist in a domain

### Problem Solving Methods

• Problem Solving Methods (PSM) abstract from details of the implementation of the reasoning process.

• Characteristics of PSM [10]:

• A PSM specifies which inference actions have to be carried out for solving a given task.

• A PSM determines the sequence in which these actions have to be activated.

• Knowledge roles determine which role the domain knowledge plays in each inference action.

### Heuristic Classification

• Generic inference pattern “Heuristic Classification” describes the problem-solving behaviour of these systems on the Knowledge Level in a generic way.

### Propose & Revise

• The propose & revise method is an efficient method for solving the task of parametric design . (see more details in [14])

• The method depends on the following inferences:

• propose – derives an initial design based on the requirements;

• C-test – requires knowledge that describes which possible designs are valid (i.e., the domain constraints);

• revise – tries to improve an incorrect design based on the feedback of the C-test step.
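The propose / C-test / revise loop can be sketched on a hypothetical parametric-design toy problem. All names, the requirement, and the constraint are illustrative, not taken from [14].

```python
# Hypothetical parametric-design toy: choose a "thickness" parameter.
# propose derives an initial design from the requirements, c_test checks
# the domain constraints, revise repairs a failed design.

def propose(requirements):
    return {"thickness": requirements["load"] // 10}   # naive initial guess

def c_test(design, constraints):
    # Return the names of all violated constraints.
    return [name for name, ok in constraints.items() if not ok(design)]

def revise(design, violated):
    if "min_thickness" in violated:
        design["thickness"] += 1                       # simple fix knowledge
    return design

def propose_and_revise(requirements, constraints, max_iter=100):
    design = propose(requirements)
    for _ in range(max_iter):
        violated = c_test(design, constraints)
        if not violated:
            return design                              # valid design found
        design = revise(design, violated)
    return None

constraints = {"min_thickness": lambda d: d["thickness"] >= 5}
result = propose_and_revise({"load": 20}, constraints)
```

The three inference actions correspond exactly to the three bullets above; the method's efficiency comes from repairing an almost-correct design instead of searching the whole design space.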

More in Lecture 7

### Knowledge-based systems (KBS)

• KBS are realized based on a knowledge base (KB).

• The KB contains a model represented in a (logical) formalism which can be interpreted by an interpreter (inference engine) that is able to draw conclusions from it.

• KBs typically capture knowledge of a domain.

• Methodologies for the development of KBS: e.g. CommonKADS

• Example: CYC, one of the first systems that aimed to capture common knowledge in a knowledge base

### Expert systems (ES)

• Special form of a KBS.

• Definition: An expert system is a software application that stores knowledge about a certain domain. It is able to draw conclusions from that knowledge and offers concrete solutions for problems in that domain.

• ES simulate human experts and thus the knowledge base typically consists of highly specialized expert knowledge.

• Reasoning of human experts vs. reasoning in ES:

• Human experts are able to master unforeseen effects and situations.

• Human experts are able to learn from experiences.

• Human experts expand their knowledge continuously.

• Human experts derive new knowledge not only from drawn conclusions but also via analogy and intuition.

### ELIZA

• Early computer program capable of natural language processing.

• Written by J. Weizenbaum between 1964 and 1966.

• ELIZA simulated a psychotherapist by reformulating questions posed by the user.

• Sample ELIZA conversation:
(Source: Wikipedia)

### Deep Blue

• Chess-playing computer developed by IBM that won against world champion Garry Kasparov in 1997.

• Applied a brute-force strategy; processing was highly parallel.

• Evaluated 200 million positions per second.

• Deep Blue's knowledge base contained over 4,000 positions and 700,000 grandmaster games.

• It was fine-tuned by chess grand masters.

• Admission from IBM: “Deep Blue, as it stands today, is not a ‘learning system.’ It is therefore not capable of utilizing artificial intelligence to either learn from its opponent or ‘think’ about the current position of the chessboard.”

### The Humanoid Robot COG

• Project at the MIT Artificial Intelligence Lab

• The goal of the COG project was to build a robot capable of interacting with humans and objects in a human-like way.

• "As I pondered [this] and thought about HAL, I decided to try to build the first serious attempt at a robot with human-level capabilities, the first serious attempt at a HAL-class being." (Rodney Brooks, Inventor of COG)

### CALO („Cognitive Assistant that Learns and Organizes“)

• DARPA funded project, “Personal assistant that learns” – program

• Involves 25 partners, 300+ researchers, including top researchers in AI

• 500+ publications in first four years

• “The goal of the project is to create cognitive software systems, that is, systems that can reason, learn from experience, be told what to do, explain what they are doing, reflect on their experience, and respond robustly to surprise. “ (calosystem.org)

• CALO assists its user with six high-level functions:

• Organizing and Prioritizing Information

• Preparing Information Artifacts

• Mediating Human Communications

• Scheduling and Reasoning in Time

• Resource allocation

### HAL 9000

• An advanced device capable of performing a variety of tasks and interacting with its human users (companions?).

• The HAL 9000 communicates by voice and can control auxiliary devices on a spaceship.

• It (he?) has an unfortunate tendency towards obsessing over minor details or inconsistencies in the instructions given it, however.

• In the events described in Arthur C. Clarke's “2001: A Space Odyssey,” HAL's tendency toward obsessive literalism led to the unfortunate death of most of its spaceship's human crew

### Further popular applications

• SEAS (“Synthetic Environment for Analysis and Simulation”)

• Can be used to simulate realistic events; has a world model

• SYSTRAN

• Early machine translation system

• Foundation for Yahoo’s Babelfish or Google Translator

• http://www.systransoft.com/

• VirtualWoman

• Virtual-reality based chatbot

• http://virtualwoman.net/

• For further references, see http://en.wikipedia.org/wiki/List_of_notable_artificial_intelligence_projects

### Subdomains of AI

• Cognition as information processing

• Artificial neuronal networks

• Heuristic search methods

• Knowledge representation and logic

• Automatic theorem proving

• Non-monotonic reasoning

• Case-based reasoning

• Planning

• Machine Learning

• Knowledge Engineering

• Natural Language Processing

• Image Understanding

• Cognitive Robotics

• Software Agents

### Cognition

• Deals with complex software systems that directly interact and communicate with human users.

• Characteristics of cognitive systems (CS):

• CS are directly integrated in their environment, act in it, and are able to communicate with it.

• CS are able to direct and adapt their actions based on the environment they are situated in.

• CS typically represent system-relevant aspects of the environment.

• Their information processing capabilities are characterized through learning aptitude and anticipation

• Examples of cognitive system:

• Organisms / biological cognitive systems

• Technical systems such as robots or agents

• Mixed human-machine systems

### Neural networks

• Neural networks are networks of neurons as in the real biological brain.

• Neurons are highly specialized cells that transmit impulses within animals to cause a change in a target cell such as a muscle effector cell or glandular cell.

• The axon is the primary conduit through which the neuron transmits impulses to neurons downstream in the signal chain

• Humans: 10^11 neurons of > 20 types, 10^14 synapses, 1–10 ms cycle time

• Signals are noisy “spike trains” of electrical potential

### Neural networks (2)

• What we refer to as Neural Networks in the course are mostly Artificial Neural Networks (ANN).

• ANN are approximations of biological neural networks, built from physical devices or simulated on computers.

• ANN are parallel computational entities that consist of multiple simple processing units that are connected in specific ways in order to perform the desired tasks.

• Remember: ANN are computationally primitive approximations of the real biological brains .

• Application examples: handwriting recognition, time series prediction, kernel machines (support vector machines), data compression, financial prediction, speech recognition, computer vision, protein structure prediction

### Search Methods

• Search methods typically help humans solve complex tasks by generating (optimal) plans, i.e. sequences of operations/actions that lead to a goal state.

• Example problem: Tower of Hanoi

 Initial status: ((123)()()) Goal status: (()()(123))
• Definition: A search method is defined by picking the order of node expansion.

• Search strategies are evaluated according to completeness, time complexity, space complexity, optimality.

• Time and space complexity are measured in terms of the maximum branching factor, the depth of the least-cost solution, and the maximum depth of the state space

• Distinction between informed / uninformed search techniques
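The definition above — a search method is defined by picking the order of node expansion — can be made concrete: breadth-first and depth-first search differ only in which end of the frontier the next node is taken from. The example tree is illustrative.

```python
from collections import deque

def expansion_order(start, neighbors, strategy):
    """Return the order in which nodes are expanded.

    A FIFO frontier gives breadth-first search, a LIFO frontier
    gives depth-first search; the rest of the algorithm is identical.
    """
    frontier = deque([start])
    visited = {start}
    order = []
    while frontier:
        node = frontier.popleft() if strategy == "bfs" else frontier.pop()
        order.append(node)
        for nxt in neighbors(node):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(nxt)
    return order

# Illustrative search tree: A has children B and C, B has D, C has E.
tree = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
bfs_order = expansion_order("A", tree.get, "bfs")  # level by level
dfs_order = expansion_order("A", tree.get, "dfs")  # deepest branch first
```

On this tree BFS expands A, B, C, D, E (level by level) while DFS expands A, C, E, B, D (following one branch to its end first), illustrating that the strategy is entirely determined by the expansion order.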

### Knowledge Representation and Logic

• The term knowledge representation describes the design and implementation of formalisms to model a part of reality (a domain).

• A model represented using such a formalism and implemented by an interpreter is often called a knowledge base.

• A knowledge base is a collection of facts and beliefs.

• Modelling of knowledge bases happens on a conceptual level.

• Intention: To model a domain of discourse and to draw inferences about the objects in the domain (reasoning)

• Logic studies the principles of reasoning and offers

• Formal languages for expressing knowledge

• Well understood formal semantics

• Reasoning methods to make implicit knowledge explicit

### Automatic theorem proving

• Automatic theorem proving deals with the design and implementation of computer programmes that are capable of making mathematical proofs.

• Theorem provers deduce new formulas from given formulas via logical deduction rules until the target formula is found.

• Theoretical foundation of automated theorem proving: mathematical logic; typically first-order-logic.

• Formulas are mathematically precisely defined via interpretations (provide semantics for function and predicate symbols via mappings and relations respectively)

### Non-monotonic reasoning

• Classical logic is monotonic in the following sense: whenever a sentence A is a logical consequence of a set of sentences T, then A is also a consequence of an arbitrary superset of T [13].

• Non-monotonic reasoning:

• Additional information may invalidate conclusions.

• Non-monotonic reasoning is closer to (human) common-sense reasoning.

• Most rules in common-sense reasoning only hold with exceptions (e.g. university_professors_teach)

• Important approaches to formalise non-monotonic reasoning:

• Default logics: Non-classical inference rules are used to represent defaults

• The modal approach: Modal operators are used to explicitly declare whether something is believed or is consistent.

• Circumscription: Validity can be restricted to specific models.

• Conditional approaches: A conditional junctor is used to represent defaults in a logical language.
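The university_professors_teach default can be sketched as a rule with an explicit exception, showing how additional information invalidates an earlier conclusion. This is a toy illustration of the non-monotonic effect, not an implementation of any of the formal approaches listed above.

```python
# Default-style rule: professors teach, unless an exception is known.
# In monotonic classical logic a conclusion can never be withdrawn;
# here, adding a fact retracts the earlier conclusion.

def teaches(person, facts):
    known = facts.get(person, set())
    return "professor" in known and "on_sabbatical" not in known

facts = {"anna": {"professor"}}
before = teaches("anna", facts)      # default conclusion holds

facts["anna"].add("on_sabbatical")   # additional information...
after = teaches("anna", facts)       # ...invalidates the conclusion
```

The conclusion set shrinks when the fact set grows, which is exactly what the monotonicity property of classical logic forbids.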

### Case-based reasoning

• Definition: A “case” is an experience made during the solving of a problem.

• A case is typically informally given and covers the problem and the solution.

• Experiences (resp. cases) are used to solve newly occurring problems.

• Cases are collected in a so-called case-base (analogous to a knowledge base in KBS)

• Case-based reasoning is inspired by human problem solving capabilities.

• Application scenarios are characterized through:

• A considerable amount of cases has to be available

• Using the cases to solve the problem has to be easier than solving the problem directly.

• Available information is incomplete, uncertain, or imprecise.

• The construction of a KBS and the modelling of cases is not easily possible.

• Typical application scenarios can be found in the areas of diagnostics, electronic sales, configuration, or planning.
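The retrieve-and-reuse idea behind case-based reasoning can be sketched in a hypothetical diagnostic domain: find the most similar stored case and reuse its solution. The case base, attributes, and similarity measure are all illustrative.

```python
# Minimal case-based reasoning sketch (hypothetical diagnostic domain):
# each case pairs a problem description with its known solution.
case_base = [
    ({"fever": 1, "cough": 1, "rash": 0}, "flu"),
    ({"fever": 0, "cough": 0, "rash": 1}, "allergy"),
]

def similarity(a, b):
    # Count matching attribute values (a crude similarity measure).
    return sum(1 for k in a if a[k] == b.get(k))

def solve(problem):
    # Retrieve the most similar case, then reuse its solution.
    best_case = max(case_base, key=lambda c: similarity(problem, c[0]))
    return best_case[1]

diagnosis = solve({"fever": 1, "cough": 1, "rash": 1})
```

A full CBR cycle would additionally revise the reused solution for the new situation and retain the new case in the case base; this sketch shows only the retrieval and reuse steps.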

### Planning

• What is Planning?

• “Planning is the process of thinking about the activities required to create a desired goal on some scale” [Wikipedia]

• We take a more pragmatic view – planning is a flexible approach for taking complex decisions:

• decide about the schedule of a production line;

• decide about the movements of an elevator;

• decide about the flow of paper through a copy machine;

• By “flexible” we mean:

• the problem is described to the planning system in some generic language;

• a (good) solution is found fully automatically;

• if the problem changes, all that needs to be done is to change the description.

• Planning looks at methods to solve any problem that can be described in the language chosen for the particular planning system.

• Approaches for the generation of action sequences: action planning and situated activity planning.

### Machine Learning

• Machine Learning (ML) is a central research area in AI to acquire knowledge.

• ML deals with the computer-aided design and realisation of learning problems.

• Learning is defined as the process that enables a system to perform better when solving the same or similar tasks in the future (Simon, 1983)

• Reduction of learning to mathematical theories: Deduction, Induction, Abduction.

• Learning task is typically characterized through the description of inputs, expected outputs, and environmental conditions.

• Typical machine learning applications: data mining, speech recognition, text analysis, control learning, hidden Markov networks, etc.

### Knowledge Engineering

• Knowledge engineering is concerned with the acquisition, management, use and transformation of knowledge.

• Goals are similar to software engineering, i.e. to systematically develop expert systems using existing methods and tools.

• Core process in knowledge engineering: knowledge acquisition; During knowledge acquisition knowledge is formalised, i.e. transformed from a natural language representation to a formal representation.

• Process models for knowledge acquisition: Model by Puppe; model by Buchanan; Harmon/Mauss/Morrissey; Waterman; or MIKE

• Methodical approaches and tools: D3; CommonKADS; MIKE; Protégé-II; RASSI

• Application cases include the development of expert systems, workflow systems or knowledge management

### Natural Language Processing

• Goal: Processing and understanding of speech or written language.

• Challenges include information re-construction from spoken words or information selection and reduction during speech production.

• Application areas: Tools for inter-human communication, tools for text generation or text correction (i.e. identification of grammatical errors based on language models), information classification or filtering, or human-machine communication.

### Image Understanding

• Image Understanding (IU) deals with the analysis and interpretation of visual information. IU denotes the reconstruction and interpretation of scenes based on images.

• Early approaches based on pattern recognition (still one of the most important foundations of this field)

• Prominent application: object recognition of still and moving objects

• Application areas: symbol recognition, medical image analysis, vehicle navigation, image archiving, gesture recognition.

### Cognitive Robotics

• AI deals with the development of robots as autonomous and intelligent systems.

• Robotics covers many sub-areas of AI and involves interdisciplinary work, including mechanical and electrical design as well as cognitive areas.

• Types of robots: static robots, mobile robots, and humanoid robots.

• Application areas: construction, planning, or observation.

### Software Agents

• Definition: A software agent is a long-term operating program whose function can be described as the autonomous execution of tasks or pursuit of goals via interaction with its environment.

• Agent (see earlier slide)

• has a memory and the capability to act in its world based on it.

• has sensors to perceive information from its environment.

• has actuators to influence the external world.

• has the capability to probe actions.

• has internal memory for methods; the exploration of the world is guided by the knowledge kept in it.

• Applications: Data collection and filtering, event notification, planning and optimization in various application areas (commerce, production, military, education)

### Some relevant people in AI

• Isaac Asimov (http://www.asimovonline.com/)

• Arthur C. Clarke (http://www.clarkefoundation.org/)

• John McCarthy (http://www-formal.stanford.edu/jmc/)

• Marvin Minsky (http://web.media.mit.edu/~minsky/)

• Donald Michie (http://www.aiai.ed.ac.uk/~dm/dm.html)

• Allen Newell (http://www.princeton.edu/~hos/frs122/newellobit.html)

• Herbert A. Simon (http://www.psy.cmu.edu/psy/faculty/hsimon/hsimon.html)

• Alan Turing (http://www.turing.org.uk/turing/)

### Summary

• Birth of AI in the 1950s

• Broad spectrum of subdomains and combination of disciplines

• Distinction between

• Weak and strong AI

• Symbolic and subsymbolic AI

• Central role: symbols and knowledge representation

• Knowledge-based systems and intelligent agents are core concepts in AI

### References

• http://en.wikipedia.org/wiki/List_of_notable_artificial_intelligence_projects

• http://en.wikipedia.org/wiki/Turing_test

• http://en.wikipedia.org/wiki/General_Problem_Solver

### References

• [1] G. Görz, C.-R. Rollinger, J. Schneeberger (Hrsg.) “Handbuch der künstlichen Intelligenz” Oldenbourg Verlag, 2003, Fourth edition

• [2] A. Turing. "Computing Machinery and Intelligence", Mind LIX (236): 433–460, October 1950.

• [3] Aristotle “On Interpretation”, 350 B.C.E, see: http://classics.mit.edu/Aristotle/interpretation.html

• [4] A. Newell, H.A. Simon, “Human Problem Solving” Englewood Cliffs, N.J.: Prentice Hall, 1972

• [5] A. Newell. “The Knowledge Level”, AI Magazine 2 (2), 1981, p. 1-20.

• [6] F. Rosenblatt. “Strategic Approaches to the Study of Brain Models” In: Förster, H.: Principles of Self-Organization. Elmsford, N.Y.: Pergamon Press, 1962.

• [7] S. Russell, E.H. Wefald. "Do the Right Thing: Studies in Limited Rationality" MIT Press, 1991.

• [8] C. Beierle and G. Kern-Isberner "Methoden wissensbasierter Systeme. Grundlagen, Algorithmen, Anwendungen" Vieweg, 2005.

### References

• [9] J. Weizenbaum. "ELIZA - A Computer Program For the Study of Natural Language Communication Between Man And Machine", Communications of the ACM 9 (1): p. 36–45, 1966.

• [10] W. Birmingham and G. Klinker “Knowledge Acquisition Tools with Explicit Problem-Solving Methods” The Knowledge Engineering Review 8, 1 (1993), 5-25

• [11] A. Newell and H. Simon "GPS, a program that simulates human thought" In: Computation & intelligence: collected readings, pp. 415 - 428, 1995.

• [12] R. J. Brachman “On the Epistemological Status of Semantic Networks” In: N.V. Findler (ed.): Associative Networks: Representation and Use of Knowledge by Computers. New York: Academic Press, 1979, 3-50.

• [13] G. Brewka, I. Niemelä, M. Truszczynski “Nonmonotonic Reasoning” In: V. Lifschitz, B. Porter, F. van Harmelen (eds.), Handbook of Knowledge Representation, Elsevier, 2007, 239-284

• [14] D. Fensel “Problem-Solving Methods: Understanding, Description, Development and Reuse”, Springer LNAI 1791, 2000

• [15] E.A. Feigenbaum. “The Art of Artificial Intelligence: Themes and Case Studies of Knowledge Engineering,” Proceedings of the International Joint Conference on Artificial Intelligence, Cambridge, MA, 1977

• [16] W.J. Clancey. “Heuristic Classification”, Artificial Intelligence , 27:289-350, 1985

### Next Lecture

Lecture 2: Propositional Logic

### Where are we?

Lecture 2: Propositional Logic

### Outline

• Motivation

• Technical Solution

• Syntax

• Semantics

• Inference

• Illustration by a Larger Example

• Extensions

• Summary

• References

### Logic and Deduction

• Logic is used to formalize deduction

• Deduction = derivation of true statements (called conclusions ) from statements that are assumed to be true (called premises )

• Natural language is not precise, so careless use of logic can lead to claims that false statements are true, or that a statement is true even though its truth does not necessarily follow from the premises

=> Logic provides a way to talk about truth and correctness in a rigorous way, so that we can prove things, rather than make intelligent guesses and just hope they are correct

### Why Propositional Logic?

• Propositional logic is a good vehicle to introduce basic properties of logic; used to:

• Associate natural language expressions with semantic representations

• Evaluate the truth or falsity of semantic representations relative to a knowledge base

• Compute inferences over semantic representations

• One of the simplest and most common logics

• The core of (almost) all other logics

### What is Propositional Logic?

• An unambiguous formal language, akin to a programming language

• Syntax: Vocabulary for expressing concepts without ambiguity

• Semantics: Connection to what we're reasoning about

• Interpretation - what the syntax means

• Reasoning: How to prove things

• What steps are allowed

### Syntax

• Logical constants: true, false

• Propositional symbols: P, Q, S, ...

• Wrapping parentheses: ( … )

• Atomic formulas: Propositional Symbols or logical constants

• Formulas are either atomic formulas, or can be formed by combining atomic formulas with the following connectives:

• ∧ ...and [conjunction]

• ∨ ...or [disjunction]

• → ...implies [implication / conditional]

• ↔ ...is equivalent [biconditional]

• ¬ ...not [negation]

### Syntax (cont’)

• A sentence (well formed formula) is defined as follows:

• A symbol is a sentence

• If S is a sentence, then ¬S is a sentence

• If S is a sentence, then (S) is a sentence

• If S and T are sentences, then (S ∨ T), (S ∧ T), (S → T), and (S ↔ T) are sentences

• A sentence results from a finite number of applications of the above rules
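The inductive definition above maps directly onto a recursive check. A minimal sketch in Python, assuming a hypothetical tuple encoding (atoms as strings, connectives as tagged tuples; the encoding is an illustrative choice, not part of the lecture):

```python
# Minimal sketch: sentences as nested tuples mirroring the inductive
# definition above. Atoms are strings; ('not', S) is negation;
# ('and'|'or'|'implies'|'iff', S, T) are the binary connectives.

def is_sentence(f):
    """True iff f is a well-formed sentence under this encoding."""
    if isinstance(f, str):                      # a symbol is a sentence
        return True
    if isinstance(f, tuple) and len(f) == 2 and f[0] == 'not':
        return is_sentence(f[1])                # if S is a sentence, so is ¬S
    if isinstance(f, tuple) and len(f) == 3 and \
            f[0] in ('and', 'or', 'implies', 'iff'):
        return is_sentence(f[1]) and is_sentence(f[2])
    return False                                # nothing else is a sentence

# (P ∧ Q) → R
example = ('implies', ('and', 'P', 'Q'), 'R')
```

The tuple nesting plays the role of the parentheses; a string syntax would need a separate parser.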

### Syntax – BNF Grammar

• Sentence → AtomicSentence | ComplexSentence

• AtomicSentence → True | False | P | Q | R | ...

• ComplexSentence → ( Sentence ) | Sentence Connective Sentence | ¬ Sentence

• Connective → ∧ | ∨ | → | ↔

• Ambiguities are resolved through operator precedence (¬, ∧, ∨, →, ↔, from highest to lowest) or through parentheses

• e.g. ¬ P ∨ Q ∧ R → S is equivalent to ((¬ P) ∨ (Q ∧ R)) → S

### Syntax – Examples

• P means "It is hot", Q means "It is humid", R means "It is raining"

• (P ∧ Q) → R means "If it is hot and humid, then it is raining"

• Q → P means "If it is humid, then it is hot"

• Further well-formed formulas: ¬p ∧ ¬q, ¬(p ∧ q), (p ∧ q) ∨ r, p ∧ q ∧ r, (((¬p) ∧ q) → r) ↔ ((¬r) ∨ p), (¬(p ∨ q) → q) → r, ((¬p) ∨ (¬q)) ↔ (¬r), etc.

### Semantics

• Interpretations
• Equivalence
• Substitution
• Models and Satisfiability
• Validity
• Logical Consequence (Entailment)
• Theory

### Semantics – Some Informal Definitions

• Given the truth values of all symbols in a sentence, it can be “evaluated” to determine its truth value (True or False)

• A model for a KB is a “possible world” (assignment of truth values to propositional symbols) in which each sentence in the KB is True

• A valid sentence or tautology is a sentence that is True under all interpretations, no matter what the world is actually like or how the semantics are defined (example: “It’s raining or it’s not raining”)

• An inconsistent sentence or contradiction is a sentence that is False under all interpretations (the world is never like what it describes, as in “It’s raining and it’s not raining”)

• P entails Q , written P ⊧ Q, means that whenever P is True, so is Q; in other words, all models of P are also models of Q

### Interpretations

• In propositional logic, truth values are assigned to the atoms of a formula in order to evaluate the truth value of the formula

• An assignment is a function

v : P → {T,F}

v assigns a truth value to any atom in a given formula ( P is the set of all propositional letters, i.e. atoms)
Suppose F denotes the set of all propositional formulas. We can extend an assignment v to a function

v : F → {T,F}

which assigns the truth value v(A) to any formula A in F. v is called an interpretation.
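The extension of v from atoms to all formulas can be sketched as a recursive evaluator; formulas are nested tuples (an illustrative encoding, not from the slides) and an assignment is a dict from atoms to booleans:

```python
# Sketch: extend an assignment v on atoms to an interpretation on all
# formulas. Atoms are strings; ('not', f) is negation;
# ('and'|'or'|'implies'|'iff', f, g) are the binary connectives.

def evaluate(f, v):
    """Truth value of formula f under the assignment v (dict atom -> bool)."""
    if isinstance(f, str):
        return v[f]
    op = f[0]
    if op == 'not':
        return not evaluate(f[1], v)
    if op == 'and':
        return evaluate(f[1], v) and evaluate(f[2], v)
    if op == 'or':
        return evaluate(f[1], v) or evaluate(f[2], v)
    if op == 'implies':                       # φ → ψ is ¬φ ∨ ψ
        return (not evaluate(f[1], v)) or evaluate(f[2], v)
    if op == 'iff':                           # φ ↔ ψ: same truth value
        return evaluate(f[1], v) == evaluate(f[2], v)
    raise ValueError(op)

# A = (¬p → q) ↔ (p ∨ q), the example worked out on the next slide
A = ('iff', ('implies', ('not', 'p'), 'q'), ('or', 'p', 'q'))
```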

### Interpretations (cont’)

• Example:
• Suppose v is an assignment for which

v(p) = F,           v(q) = T.

• If A = (¬p → q) ↔ (p V q) , what is v(A) ?

Solution:

v(A) = v((¬p → q) ↔ (p ∨ q))

= v(¬p → q) ↔ v(p ∨ q)

= (v(¬p) → v(q)) ↔ (v(p) ∨ v(q))

= (¬v(p) → v(q)) ↔ (v(p) ∨ v(q))

= (¬F → T) ↔ (F ∨ T)

= (T → T) ↔ (F ∨ T)

= T ↔ T

= T

### Equivalence

• If A, B are formulas such that

v ( A ) = v ( B )

for all interpretations v , A is (logically) equivalent to B:

A ≡ B

• Example: ¬p V q ≡ p → q since both formulas are true in all interpretations except when v ( p ) = T, v ( q ) = F and are false for that particular interpretation

• Caution: ≡ does not mean the same thing as ↔:

• A ↔ B is a formula (syntax)

• A ≡ B is a relation between two formulas (semantics)

• Theorem: A ≡ B if and only if A ↔ B is true in every interpretation, i.e. A ↔ B is a tautology
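The theorem above suggests a direct (exponential) equivalence test: enumerate every interpretation and compare. A sketch, with formulas encoded as Python predicates over an assignment dict (an illustrative choice, not from the slides):

```python
from itertools import product

# Sketch: decide A ≡ B by checking v(A) = v(B) for every
# interpretation v, exactly as the definition states.

def equivalent(a, b, symbols):
    """True iff formulas a and b (functions dict -> bool) agree
    under every assignment to the given symbols."""
    for values in product([True, False], repeat=len(symbols)):
        v = dict(zip(symbols, values))
        if a(v) != b(v):
            return False
    return True

# The example above: ¬p ∨ q ≡ p → q
lhs = lambda v: (not v['p']) or v['q']      # ¬p ∨ q
rhs = lambda v: v['q'] if v['p'] else True  # p → q
```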

### Equivalence and Substitution – Examples

• Examples of logically equivalent formulas:

¬¬p ≡ p,  p ∧ q ≡ q ∧ p,  ¬(p ∧ q) ≡ ¬p ∨ ¬q,  ¬(p ∨ q) ≡ ¬p ∧ ¬q,  p → q ≡ ¬p ∨ q,  p ↔ q ≡ (p → q) ∧ (q → p)

• Example: Simplify ¬(¬p ∨ q)
Solution: ¬(¬p ∨ q) ≡ ¬¬p ∧ ¬q ≡ p ∧ ¬q

### Models and Satisfiability

• A propositional formula A is satisfiable iff v(A) = T in some interpretation v; such an interpretation is called a model for A

• A is unsatisfiable (or, contradictory) if it is false in every interpretation

• A set of formulas U = {A1, A2, …, An} is satisfiable iff there exists an interpretation v such that v(A1) = v(A2) = … = v(An) = T; such an interpretation is called a model of U

• U is unsatisfiable if no such interpretation exists

• Relevant properties:

• If U is satisfiable, then so is U − {Ai} for any i = 1, 2, …, n

• If U is satisfiable and B is valid, then U ∪ {B} is also satisfiable

• If U is unsatisfiable and B is any formula, U ∪ {B} is also unsatisfiable

• If U is unsatisfiable and some Ai is valid, then U − {Ai} is also unsatisfiable

### Validity

• A is valid (or, a tautology), denoted ⊧ A, iff v(A) = T for all interpretations v

• A is not valid (or, falsifiable), denoted ⊭ A if we can find some interpretation v , such that v(A) = F

• Relationship between validity, satisfiability, falsifiability, and unsatisfiability: every valid formula is satisfiable, every unsatisfiable formula is falsifiable, and A is valid if and only if ¬A is unsatisfiable

### Validity (cont’)

• Examples:

• Valid (tautology): p ∨ ¬p

• Not valid, but satisfiable: p ∨ q

• Theorem:

• (a) A is valid if and only if ¬A is unsatisfiable

• (b) A is satisfiable if and only if ¬A is falsifiable

### Logical Consequence (i.e. Entailment)

• Let U be a set of formulas and A a formula. A is a (logical) consequence of U , if any interpretation v which is a model of U is also a model for A :

• U ⊧ A

• Example: {p, p → q} ⊧ q

If some interpretation v is a model for the set {p, p → q},

it must satisfy v(p) = T and v(p → q) = T; but in this interpretation, we also have v(q) = T

### Theory

• A set of formulas T is a theory if it is closed under logical consequence. This means that, for every formula A, if T ⊧ A, then A is in T

• Let U be a set of formulas. Then, the set of all consequences of U

T(U) = { A | U ⊧ A }

is called the theory of U . The formulas in U are called the axioms for the theory T(U) .

### Inference Methods

• Several basic methods for determining whether a given set of premises propositionally entails a given conclusion

• Truth Table Method

• Deductive (Proof) Systems

• Resolution

### Truth Table Method

• One way of determining whether or not a set of premises logically entails a possible conclusion is to check the truth table for the logical constants of the language

• This is called the truth table method and can be formalized as follows:

• Step 1: Starting with a complete truth table for the propositional constants, iterate through all the premises of the problem, for each premise eliminating any row that does not satisfy the premise

• Step 2: Do the same for the conclusion

• Step 3: Finally, compare the two tables; If every row that remains in the premise table, i.e. is not eliminated, also remains in the conclusion table, i.e. is not eliminated, then the premises logically entail the conclusion
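The three-step method can be sketched as a brute-force check in Python; formulas are represented as Boolean predicates over an assignment dict (an illustrative encoding, not from the slides), and the two tables are fused into one pass:

```python
from itertools import product

# Sketch of the truth table method: a row that survives Step 1
# (all premises True) must also survive Step 2 (conclusion True);
# otherwise entailment fails.

def entails(premises, conclusion, symbols):
    """True iff every interpretation satisfying all premises
    also satisfies the conclusion."""
    for values in product([True, False], repeat=len(symbols)):
        v = dict(zip(symbols, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False        # a surviving premise row was eliminated
    return True

# Premises p → q and m → (p ∨ q) entail m → q
premises = [lambda v: (not v['p']) or v['q'],
            lambda v: (not v['m']) or v['p'] or v['q']]
conclusion = lambda v: (not v['m']) or v['q']
```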

### Example

• Simple sentences:

• Amy loves Pat: lovesAmyPat

• Amy loves Quincy: lovesAmyQuincy

• It is Monday: ismonday

• Premises:

• If Amy loves Pat, Amy loves Quincy:

• lovesAmyPat → lovesAmyQuincy

• If it is Monday, Amy loves Pat or Quincy:

• ismonday → lovesAmyPat ∨ lovesAmyQuincy

• Question:

• If it is Monday, does Amy love Quincy?

• i.e. is ismonday → lovesAmyQuincy entailed by the premises?

### Step 1: Truth table for the premises

| lovesAmyPat | lovesAmyQuincy | ismonday | lovesAmyPat → lovesAmyQuincy | ismonday → lovesAmyPat ∨ lovesAmyQuincy |
|---|---|---|---|---|
| T | T | T | T | T |
| T | T | F | T | T |
| T | F | T | F | T |
| T | F | F | F | T |
| F | T | T | T | T |
| F | T | F | T | T |
| F | F | T | T | F |
| F | F | F | T | T |

### Step 1: Eliminate non-sat interpretations

| lovesAmyPat | lovesAmyQuincy | ismonday | lovesAmyPat → lovesAmyQuincy | ismonday → lovesAmyPat ∨ lovesAmyQuincy |
|---|---|---|---|---|
| T | T | T | T | T |
| T | T | F | T | T |
| T | F | T | F | T |
| T | F | F | F | T |
| F | T | T | T | T |
| F | T | F | T | T |
| F | F | T | T | F |
| F | F | F | T | T |

The rows in which at least one premise is F (rows 3, 4, and 7) are eliminated; rows 1, 2, 5, 6, and 8 remain.

### Step 2: Truth table for the conclusion

| lovesAmyPat | lovesAmyQuincy | ismonday | ismonday → lovesAmyQuincy |
|---|---|---|---|
| T | T | T | T |
| T | T | F | T |
| T | F | T | F |
| T | F | F | T |
| F | T | T | T |
| F | T | F | T |
| F | F | T | F |
| F | F | F | T |

### Step 2: Eliminate non-sat interpretations

| lovesAmyPat | lovesAmyQuincy | ismonday | ismonday → lovesAmyQuincy |
|---|---|---|---|
| T | T | T | T |
| T | T | F | T |
| T | F | T | F |
| T | F | F | T |
| F | T | T | T |
| F | T | F | T |
| F | F | T | F |
| F | F | F | T |

The rows in which the conclusion is F (rows 3 and 7) are eliminated; every row remaining in the premise table (1, 2, 5, 6, and 8) also remains here.

### Step 3: Comparing tables

• Finally, in order to make the determination of logical entailment, we compare the two rightmost tables and notice that every row remaining in the premise table also remains in the conclusion table.

• In other words, the premises logically entail the conclusion.

• The truth table method has the merit that it is easy to understand

• It is a direct implementation of the definition of logical entailment.

• In practice, it is awkward to manage two tables, especially since there are simpler approaches in which only one table needs to be manipulated

• Validity Checking

• Unsatisfiability Checking

### Validity checking

• Approach: To determine whether a set of sentences

{ φ1,…,φn }

logically entails a sentence φ, form the sentence

( φ1 ∧…∧ φn → φ )

and check that it is valid.

• To see how this method works, consider the previous example and write the tentative conclusion as shown below.

(lovesAmyPat → lovesAmyQuincy) ∧ (ismonday → lovesAmyPat ∨ lovesAmyQuincy) → (ismonday → lovesAmyQuincy)

• Then, form a truth table for our language with an added column for this sentence and check its satisfaction under each of the possible interpretations for our logical constants

### Unsatisfiability Checking

• It is almost exactly the same as the validity checking method, except that it works negatively instead of positively.

• To determine whether a finite set of sentences {φ1,…,φn} logically entails a sentence φ, we form the sentence

(φ1 ∧ … ∧ φn ∧ ¬φ)

and check that it is unsatisfiable .

• Both the validity checking method and the unsatisfiability checking method require about the same amount of work as the truth table method, but they have the merit of manipulating only one table
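Both single-table approaches can be sketched the same way; `valid` and `unsatisfiable` are hypothetical helper names, and formulas are again Boolean predicates over an assignment dict (an illustrative encoding):

```python
from itertools import product

# Sketch: validity and unsatisfiability checking over a single table.

def interpretations(symbols):
    """Generate every assignment of truth values to the symbols."""
    for values in product([True, False], repeat=len(symbols)):
        yield dict(zip(symbols, values))

def valid(f, symbols):
    """f is True under every interpretation (a tautology)."""
    return all(f(v) for v in interpretations(symbols))

def unsatisfiable(f, symbols):
    """f is False under every interpretation."""
    return not any(f(v) for v in interpretations(symbols))
```

To decide entailment, apply valid to (φ1 ∧ … ∧ φn) → φ, or unsatisfiable to φ1 ∧ … ∧ φn ∧ ¬φ.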

### Example – A truth table

| p | q | r | p → q | p → r | p → r ∨ q | (p → q) ∧ (p → r) → (p → r ∨ q) | p → r ∧ q | (p → q) ∧ (p → r) → (p → r ∧ q) |
|---|---|---|---|---|---|---|---|---|
| T | T | T | T | T | T | T | T | T |
| T | T | F | T | F | T | T | F | T |
| T | F | T | F | T | T | T | F | T |
| T | F | F | F | F | F | T | F | T |
| F | T | T | T | T | T | T | T | T |
| F | T | F | T | T | T | T | T | T |
| F | F | T | T | T | T | T | T | T |
| F | F | F | T | T | T | T | T | T |

### Deductive (proof) systems

• Semantic methods for checking logical entailment have the merit of being conceptually simple; they directly manipulate interpretations of sentences

• Unfortunately, the number of interpretations of a language grows exponentially with the number of logical constants.

• When the number of logical constants in a propositional language is large, the number of interpretations may be impossible to manipulate.

• Deductive (proof) systems provide an alternative way of checking and communicating logical entailment that addresses this problem

• In many cases, it is possible to create a “proof” of a conclusion from a set of premises that is much smaller than the truth table for the language;

• Moreover, it is often possible to find such proofs with less work than is necessary to check the entire truth table

### Schemata

• An important component in the treatment of proofs is the notion of a schema

• A schema is an expression satisfying the grammatical rules of our language except for the occurrence of metavariables in place of various subparts of the expression.

• For example, the following expression is a pattern with metavariables φ and ψ.

φ → (ψ → φ)

• An instance of a sentence schema is the expression obtained by substituting expressions for the metavariables.

• For example, the following is an instance of the preceding schema.

p → (q → p)

### Rules of Inference

• The basis for proof systems is the use of correct rules of inference that can be applied directly to sentences to derive conclusions that are guaranteed to be correct under all interpretations

• Since the interpretations are not enumerated, time and space can often be saved

• A rule of inference is a pattern of reasoning consisting of:

• One set of sentence schemata, called premises , and

• A second set of sentence schemata, called conclusions

• A rule of inference is sound if and only if, for every instance, the premises logically entail the conclusions

### E.g. Modus Ponens (MP)

φ → ψ, φ ⊢ ψ

• Examples:

• From P → (Q → R) and P, infer Q → R

• From (P → Q) → R and P → Q, infer R

• From raining → wet and raining, infer wet

• From wet → slippery and wet, infer slippery

• I.e. we can substitute complex sentences for the metavariables

• Note that, by stringing together applications of rules of inference, it is possible to derive conclusions that cannot be derived in a single step. This idea of stringing together rule applications leads to the notion of a proof.
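As a sketch, Modus Ponens on tuple-encoded sentences, with ('implies', S, T) standing for S → T (an illustrative encoding, not from the slides), is a one-line pattern check:

```python
# Sketch: Modus Ponens as an operation on tuple-encoded sentences.

def modus_ponens(implication, premise):
    """From φ → ψ and φ, conclude ψ; raise if the rule does not apply."""
    if isinstance(implication, tuple) and implication[0] == 'implies' \
            and implication[1] == premise:
        return implication[2]
    raise ValueError('Modus Ponens does not apply')

# From P → (Q → R) and P, infer Q → R; stringing applications
# together then yields R from Q as well.
step1 = modus_ponens(('implies', 'P', ('implies', 'Q', 'R')), 'P')
step2 = modus_ponens(step1, 'Q')
```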

### Axiom schemata

• The implication introduction schema ( II ), together with Modus Ponens, allows us to infer implications

φ → (ψ → φ)

• The implication distribution schema ( ID ) allows us to distribute one implication over another

(φ → (ψ → χ)) → ((φ → ψ) → (φ → χ))

• The contradiction realization schemata ( CR ) permit us to infer a sentence if the negation of that sentence implies some sentence and its negation

(ψ → ¬ φ) → ((ψ → φ) → ¬ ψ)

(¬ ψ → ¬ φ) → ((¬ ψ → φ) → ψ)

### Axiom schemata (cont’)

• The equivalence schemata (EQ) capture the meaning of the ↔ operator

(φ ↔ ψ) → (φ → ψ)

(φ ↔ ψ) → (ψ → φ)

(φ → ψ) → ((ψ → φ) → (ψ ↔ φ))

• The meaning of the other operators in propositional logic is captured in the following axiom schemata

(φ ← ψ) ↔ (ψ → φ)

(φ ∨ ψ) ↔ (¬φ → ψ)

(φ ∧ ψ) ↔ ¬(¬φ ∨ ¬ψ)

• The above axiom schemata are jointly called the standard axiom schemata for Propositional Logic

• They all are valid

### Proofs

• A proof of a conclusion from a set of premises is a sequence of sentences terminating in the conclusion in which each item is either

1. a premise,

2. an instance of an axiom schema, or

3. the result of applying a rule of inference to earlier items in sequence

• Example:

1. p → q [Premise]
2. q → r [Premise]
3. (q → r) → (p → (q → r)) [II]
4. p → (q → r) [MP: 3, 2]
5. (p → (q → r)) → ((p → q) → (p → r)) [ID]
6. (p → q) → (p → r) [MP: 5, 4]
7. p → r [MP: 6, 1]

### Proofs (cont’)

• If there exists a proof of a sentence φ from a set Δ of premises and the standard axiom schemata using Modus Ponens, then φ is said to be provable from Δ, written as

Δ ⊢ φ

• There is a close connection between provability and logical entailment (⊧):

A set of sentences Δ logically entails a sentence φ if and only if φ is provable from Δ

• Soundness Theorem:

If φ is provable from Δ, then Δ logically entails φ.

• Completeness Theorem:

If Δ logically entails φ, then φ is provable from Δ.

### Proofs (cont’)

• The concept of provability is important because it suggests how we can automate the determination of logical entailment

• Starting from a set of premises Δ, we enumerate conclusions from this set

• If a sentence φ appears, then it is provable from Δ and is, therefore, a logical consequence

• If the negation of φ appears, then ¬φ is a logical consequence of Δ and φ is not logically entailed (unless Δ is inconsistent)

• Note that it is possible that neither φ nor ¬φ will appear

### Resolution

• Propositional resolution is an extremely powerful rule of inference for Propositional Logic

• Using propositional resolution (without axiom schemata or other rules of inference), it is possible to build a theorem prover that is sound and complete for all of Propositional Logic

• The search space using propositional resolution is much smaller than when using the standard axiom schemata with Modus Ponens

• Propositional resolution works only on expressions in clausal form

• Before the rule can be applied, the premises and conclusions must be converted to this form

### Clausal Forms

• A clause is a set of literals which is assumed (implicitly) to be a disjunction of those literals

• Example: {¬p, q, r} stands for ¬p ∨ q ∨ r

• Unit clause: clause with only one literal; e.g. {¬q}

• Clausal form of a formula: Implicit conjunction of clauses

• Example: the clausal form {{¬p, q}, {¬q}, {p}} stands for (¬p ∨ q) ∧ ¬q ∧ p

• Notation:

• l: a literal; lᶜ: the complement of l

• C: a clause (a set of literals)

• S: a clausal form (a set of clauses)

### Resolution – Properties of Clausal Forms

(1) If l appears in some clause of S , but l c does not appear in any clause, then, if we delete all clauses in S containing l , the new clausal form S' is satisfiable if and only if S is satisfiable

Example: Satisfiability of {{p, q}, {¬q, r}} is equivalent to satisfiability of {{¬q, r}} (p occurs but ¬p does not, so the clause containing p is deleted)

(2) Suppose C = { l } is a unit clause and we obtain S' from S by deleting C and l c from all clauses that contain it; then, S is satisfiable if and only if S' is satisfiable

Example: {{p}, {¬p, q}, {¬q, r}} is satisfiable iff {{q}, {¬q, r}} is satisfiable

### Resolution – Properties of Clausal Forms (cont’)

(3) If S contains two clauses C and C' , such that C is a subset of C' , we can delete C‘ without affecting the (un)satisfiability of S

Example: {{p}, {p, q}} is satisfiable iff {{p}} is satisfiable

(4) If a clause C in S contains a pair of complementary literals l , l c , then C can be deleted from S without affecting its (un)satisfiability

Example: {{p, ¬p, q}, {r}} is satisfiable iff {{r}} is such

### Converting to clausal form

Theorem : Every propositional formula can be transformed into an equivalent formula in CNF

1. Implications:

φ1 → φ2 ⇒ ¬φ1 ∨ φ2

φ1 ← φ2 ⇒ φ1 ∨ ¬φ2

φ1 ↔ φ2 ⇒ (¬φ1 ∨ φ2) ∧ (φ1 ∨ ¬φ2)

2. Negations:

¬¬φ ⇒ φ

¬(φ1 ∧ φ2) ⇒ ¬φ1 ∨ ¬φ2

¬(φ1 ∨ φ2) ⇒ ¬φ1 ∧ ¬φ2

3. Distribution:

φ1 ∨ (φ2 ∧ φ3) ⇒ (φ1 ∨ φ2) ∧ (φ1 ∨ φ3)

(φ1 ∧ φ2) ∨ φ3 ⇒ (φ1 ∨ φ3) ∧ (φ2 ∨ φ3)

(φ1 ∨ φ2) ∨ φ3 ⇒ φ1 ∨ (φ2 ∨ φ3)

(φ1 ∧ φ2) ∧ φ3 ⇒ φ1 ∧ (φ2 ∧ φ3)

4. Operators:

φ1 ∨ ... ∨ φn ⇒ {φ1, ..., φn}

φ1 ∧ ... ∧ φn ⇒ {φ1}, ..., {φn}
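Steps 1-3 can be sketched as three recursive rewriting passes over tuple-encoded formulas (atoms as strings, connectives as tagged tuples; the encoding and function names are illustrative assumptions):

```python
# Sketch of CNF conversion: eliminate implications, push negations
# inward (negation normal form), then distribute ∨ over ∧.

def eliminate_implications(f):          # step 1
    if isinstance(f, str):
        return f
    op, args = f[0], [eliminate_implications(a) for a in f[1:]]
    if op == 'implies':                 # φ → ψ  ⇒  ¬φ ∨ ψ
        return ('or', ('not', args[0]), args[1])
    if op == 'iff':                     # φ ↔ ψ  ⇒  (¬φ ∨ ψ) ∧ (φ ∨ ¬ψ)
        return ('and', ('or', ('not', args[0]), args[1]),
                       ('or', args[0], ('not', args[1])))
    return (op, *args)

def push_negations(f):                  # step 2
    if isinstance(f, str):
        return f
    if f[0] == 'not':
        g = f[1]
        if isinstance(g, str):
            return f
        if g[0] == 'not':               # ¬¬φ ⇒ φ
            return push_negations(g[1])
        if g[0] == 'and':               # De Morgan
            return ('or', push_negations(('not', g[1])),
                          push_negations(('not', g[2])))
        if g[0] == 'or':                # De Morgan
            return ('and', push_negations(('not', g[1])),
                           push_negations(('not', g[2])))
    return (f[0], *[push_negations(a) for a in f[1:]])

def distribute(f):                      # step 3
    if isinstance(f, str) or f[0] == 'not':
        return f
    a, b = distribute(f[1]), distribute(f[2])
    if f[0] == 'or':
        if not isinstance(a, str) and a[0] == 'and':
            return distribute(('and', ('or', a[1], b), ('or', a[2], b)))
        if not isinstance(b, str) and b[0] == 'and':
            return distribute(('and', ('or', a, b[1]), ('or', a, b[2])))
    return (f[0], a, b)

def to_cnf(f):
    return distribute(push_negations(eliminate_implications(f)))
```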

### Example

• Transform the formula

(p → q) → (¬q → ¬p)

into an equivalent formula in CNF

Solution:

(p → q) → (¬q → ¬p)

⇒ ¬(¬p ∨ q) ∨ (¬¬q ∨ ¬p) [eliminate implications]

⇒ (p ∧ ¬q) ∨ q ∨ ¬p [push negations inward]

⇒ (p ∨ q ∨ ¬p) ∧ (¬q ∨ q ∨ ¬p) [distribute ∨ over ∧]

### Resolution Rule

• Suppose C1, C2 are clauses such that l ∈ C1, lᶜ ∈ C2. The clauses C1 and C2 are said to be clashing clauses and they clash on the complementary literals l, lᶜ

• C, the resolvent of C1, C2, is the clause

Res(C1, C2) = (C1 − {l}) ∪ (C2 − {lᶜ})

• C1 and C2 are called the parent clauses of C

### Resolution Rule (cont’)

• Example:

The clauses C1 = {p, q} and C2 = {¬p, ¬q}

clash on p, ¬p, giving the resolvent {q, ¬q}

• C1, C2 also clash on q, ¬q, so another way to find a resolvent for these two clauses is to resolve on that pair, giving {p, ¬p}

### Resolution (cont’)

• Theorem : Resolvent C is satisfiable if and only if the parent clauses C 1 ,C 2 are simultaneously satisfiable

• Resolution Algorithm:

Input : S – a set of clauses

Output : “S is satisfiable” or “S is not satisfiable”

1. Set S 0 := S

2. Suppose S i has already been constructed

3. To construct S i+1, choose a pair of clashing literals and clauses C1, C2 in S i (if there are any) and derive

C := Res( C 1 ,C 2 )

S i+1 := S i U {C}

4. If C is the empty clause, output “S is not satisfiable”; if S i+1 = S i , output “S is satisfiable”

5. Otherwise, set i := i + 1 and go back to Step 2
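The algorithm above can be sketched directly, with clauses as Python frozensets of literal strings and '~' marking complementation (an illustrative encoding, not from the slides):

```python
from itertools import combinations

# Sketch of the resolution algorithm: saturate the clause set under
# resolution; unsatisfiable iff the empty clause is derived.

def complement(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolvents(c1, c2):
    """All clauses obtainable by resolving c1 and c2 on a clashing pair."""
    out = set()
    for lit in c1:
        if complement(lit) in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {complement(lit)})))
    return out

def satisfiable(clauses):
    s = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(s, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return False        # empty clause derived: refutation
                new.add(r)
        if new <= s:
            return True                 # S_{i+1} = S_i: saturated
        s |= new

# ¬[(p → q) → (¬q → ¬p)] in clausal form: {{¬p, q}, {¬q}, {p}}
S = [frozenset({'~p', 'q'}), frozenset({'~q'}), frozenset({'p'})]
```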

### Resolution (cont’)

• Example: Show that (p → q) → (¬q → ¬p) is a valid formula
Solution: We will show that

¬[(p → q) → (¬q → ¬p)]

is not satisfiable.

(1) Transform the formula into CNF: ¬[(p → q) → (¬q → ¬p)] ⇒ (¬p ∨ q) ∧ ¬q ∧ p, i.e. the clausal form S = {{¬p, q}, {¬q}, {p}}

(2) Show, using resolution, that S is not satisfiable: resolving {¬p, q} and {p} gives {q}; resolving {q} and {¬q} gives the empty clause

• A derivation of the empty clause from S is called a refutation of S

• A derivation of the empty clause from S is called a refutation of S

### Resolution (cont’)

• Theorem: If the set of clauses labeling the leaves of a resolution tree is satisfiable, then the clause at the root is satisfiable

• Theorem ( Soundness ): If the empty clause is derived from a set of clauses, then the set of clauses is unsatisfiable

• Theorem (Completeness): If a set of clauses is unsatisfiable, then the empty clause can be derived from it using the resolution algorithm

### Problem Example

• For each of these sets of premises, what relevant conclusion or conclusions can be drawn? Explain the rules of inference used to obtain each conclusion from the premises.

(a) “If I eat spicy foods, then I have strange dreams.” “I have strange dreams if there is thunder while I sleep.” “I did not have strange dreams.”

(b) “I am dreaming or hallucinating.” “I am not dreaming.” “If I am hallucinating, I see elephants running down the road.”

(c) “If I work, it is either sunny or partly sunny.” “I worked last Monday or I worked last Friday.” “It was not sunny on Tuesday.” “It was not partly sunny on Friday.”

### Solution (a)

• (a) “If I eat spicy foods, then I have strange dreams.” “I have strange dreams if there is thunder while I sleep.” “I did not have strange dreams.”
• The relevant conclusions are: “I did not eat spicy food” and “There is no thunder while I sleep”.
• Let the primitive statements be:
• s, ‘I eat spicy foods’
• d, ‘I have strange dreams’
• t, ‘There is thunder while I sleep’
• Then the premises are translated as: s → d, t → d, and ¬d.
• And the conclusions: ¬s, ¬t.
• Steps        Reason
1. s → d        premise
2. ¬d             premise
3. ¬s             Modus Tollens to Steps 1 and 2
4. t → d        premise
5. ¬t             Modus Tollens to Steps 4 and 2.

### Solution (b)

• (b) “I am dreaming or hallucinating.” “I am not dreaming.” “If I am hallucinating, I see elephants running down the road.”

• The relevant conclusion is: “I see elephants running down the road.”.

• Let the primitive statements be:

• d, ‘I am dreaming’

• h, ‘I am hallucinating’

• e, ‘I see elephants running down the road’

• Then the premises are translated as: d ∨ h, ¬d, and h → e.

• And the conclusion: e.

• Steps           Reason

1. d ∨ h            premise

2. ¬d                premise

3. h                  rule of disjunctive syllogism to Steps 1 and 2

4. h → e          premise

5. e                 Modus Ponens to Steps 4 and 3

### Solution (c)

• (c) “If I work, it is either sunny or partly sunny.” “I worked last Monday or I worked last Friday.” “It was not sunny on Tuesday.” “It was not partly sunny on Friday.”

• There is no single relevant conclusion in this problem; its main difficulty is to represent the premises so that one is able to infer anything at all. One possible relevant conclusion is: “It was sunny or partly sunny last Monday or it was sunny last Friday.”

• Let the primitive statements be:

• wm, ‘I worked last Monday’

• wf , ‘I worked last Friday’

• sm, ‘It was sunny last Monday’

• st, ‘It was sunny last Tuesday’

• sf , ‘It was sunny last Friday’

• pm, ‘It was partly sunny last Monday’

• pf , ‘It was partly sunny last Friday’

• Then the premises are translated as: wm ∨ wf , wm → (sm ∨ pm), wf → (sf ∨ pf ), ¬st, and ¬pf .

• And the conclusion: sf ∨ sm ∨ pm.

### Solution (c) – Method 1

1. wf → (sf ∨ pf) [premise]
2. ¬wf ∨ sf ∨ pf [expression for implication]
3. ¬pf → (¬wf ∨ sf) [expression for implication]
4. ¬pf [premise]
5. ¬wf ∨ sf [modus ponens to Steps 3 and 4]
6. wf → sf [expression for implication]
7. wm ∨ wf [premise]
8. ¬wm → wf [expression for implication]
9. ¬wm → sf [rule of syllogism to Steps 8 and 6]
10. wm ∨ sf [expression for implication]
11. ¬sf → wm [expression for implication]
12. wm → (sm ∨ pm) [premise]
13. ¬sf → (sm ∨ pm) [rule of syllogism to Steps 11 and 12]
14. sf ∨ sm ∨ pm [expression for implication]

### Solution (c) – Method 2 (Use the rule of resolution)

1. wf → (sf ∨ pf) [premise]
2. ¬wf ∨ sf ∨ pf [expression for implication]
3. ¬pf [premise]
4. ¬wf ∨ sf [rule of resolution to Steps 2 and 3]
5. wm ∨ wf [premise]
6. wm ∨ sf [rule of resolution to Steps 4 and 5]
7. wm → (sm ∨ pm) [premise]
8. ¬wm ∨ sm ∨ pm [expression for implication]
9. sf ∨ sm ∨ pm [rule of resolution to Steps 6 and 8]
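As a sanity check, the premises and the proposed conclusion of (c) can be verified by brute force over all assignments (a sketch; the symbol names follow the list above):

```python
from itertools import product

# Every interpretation satisfying the five premises must also
# satisfy the conclusion sf ∨ sm ∨ pm.

SYMS = ['wm', 'wf', 'sm', 'st', 'sf', 'pm', 'pf']

def implies(a, b):
    return (not a) or b

def conclusion_follows():
    for values in product([True, False], repeat=len(SYMS)):
        v = dict(zip(SYMS, values))
        premises_hold = (
            (v['wm'] or v['wf']) and
            implies(v['wm'], v['sm'] or v['pm']) and
            implies(v['wf'], v['sf'] or v['pf']) and
            (not v['st']) and
            (not v['pf'])
        )
        if premises_hold and not (v['sf'] or v['sm'] or v['pm']):
            return False        # counterexample found
    return True
```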

### Extensions

• Propositional logic is not adequate for formalizing valid arguments that rely on the internal structure of the propositions involved

• In propositional logic the smallest atoms represent whole propositions (propositions are atomic)

• Propositional logic does not capture the internal structure of the propositions

• It is not possible to work with units smaller than a proposition

• Example:

• “A Mercedes Benz is a Car” and “A car drives” are two individual, unrelated propositions

• We cannot conclude “A Mercedes Benz drives”

### Extensions

• It is possible to represent everything you want in propositional logic

• But often this is not very efficient

• Basic idea: A proposition is expressed as a predicate about (one or more) objects in the world

• Propositions are predicates and arguments

• I.e. Car(Mercedes Benz).

• The most immediate way to develop a more complex logical calculus is to introduce rules that are sensitive to more fine-grained details of the sentences being used

• When the atomic sentences of propositional logic are broken up into terms, variables, predicates, and quantifiers, they yield first-order logic , which keeps all the rules of propositional logic and adds some new ones

### Summary

• Propositional logic is one of the simplest and most common logics and is the core of (almost) all other logics

• Propositional logic commits only to the existence of facts that may or may not be the case in the world being represented

• Propositional logic quickly becomes impractical, even for very small worlds

• This lecture focused on three core aspects of the propositional logic:

• Syntax: Vocabulary for expressing concepts without ambiguity

• Semantics: Connection to what we're reasoning about

• Interpretation - what the syntax means

• Reasoning: How to prove things

• What steps are allowed

### References

• First-Order Logic and Automated Theorem Proving (2nd edition) by Melvin Fitting

• Mathematical Logic for Computer Science (2nd edition) by Mordechai Ben-Ari

• http://www.springer.com/computer/foundations/book/978-1-85233-319-5

• Propositional Logic at The Internet Encyclopedia of Philosophy

• http://www.iep.utm.edu/p/prop-log.htm

• http://en.wikipedia.org/wiki/Propositional_calculus

### Next Lecture

Lecture 3: Predicate Logic

### Outline

• Motivation

• Technical Solution

• Syntax

• Semantics

• Inference

• Illustration by Larger Example

• Extensions

• Summary

• References

### Propositional logic is not expressive enough

• Suppose we want to capture the knowledge that

Anyone standing in the rain will get wet.

and then use this knowledge. For example, suppose we also learn that

Jan is standing in the rain.

• We'd like to conclude that Jan will get wet. But each of these sentences would just be represented by some proposition, say P, Q and R. What relationship is there between these propositions? We can say

P ∧ Q → R

Then, given P ∧ Q, we could indeed conclude R. But now, suppose we were told

Pat is standing in the rain.

### Propositional logic is not expressive enough (cont’)

• We'd like to be able to conclude that Pat will get wet, but nothing we have stated so far will help us do this

• The problem is that we aren't able to represent any of the details of these propositions

• It's the internal structure of these propositions that make the reasoning valid.

• But in propositional calculus we don't have anything else to talk about besides propositions!

⇒ A more expressive logic is needed

⇒ Predicate logic (occasionally referred to as First-order logic (FOL) )

### Objects in Predicate Logic

• Constants

• Names of specific objects

• E.g. doreen, gord, william, 32

• Functions

• Map objects to objects

• E.g. father(doreen), age(gord), max(23,44)

• Variables

• For statements about unidentified objects or general statements

• E.g. x, y, z, …

### Terms

• Terms represent objects

• The set of terms is inductively defined by the following rules:

• Constants: Any constant is a term

• Variables: Any variable is a term

• Functions: Any expression f(t1,…,tn) of n arguments (where each argument is a term and f is a function of arity n ) is a term

• Terms without variables are called ground terms

• Examples:

• c

• f(c)

• g(x,x)

• g(f(c), g(x,x))

### Predicate Symbols and Signatures

• Predicate symbols represent relations between zero or more objects

• The number of objects defines a predicate's arity

• Examples:

• Likes(george, kate)

• Likes(x,x)

• Likes(joe, kate, susy)

• Friends(father_of(david), father_of(andrew))

• Signature: A signature is a collection of constants, function symbols and predicate symbols with specified arities

### Connectives

• FOL formulas are joined together by logical operators to form more complex formulas (just like in propositional logic)

• The basic logical operators are the same as in propositional logic as well:

• Negation: ¬p („it is not the case that p“)

• Conjunction: p ∧ q („p and q“)

• Disjunction: p ∨ q („p or q“)

• Implication: p → q („p implies q“ or “q if p“)

• Equivalence: p ↔ q („p if and only if q“)

### Quantifiers

• Two quantifiers: Universal (∀) and Existential (∃)

• Allow us to express properties of collections of objects instead of enumerating objects by name

• Apply to a sentence containing a variable

• Universal ∀: true for all substitutions for the variable

• “for all”: ∀

• Existential ∃: true for at least one substitution for the variable

• “there exists”: ∃

• Examples:

• ∃ x: Mother(art) = x

• ∀ x ∀ y: Mother(x) = Mother(y) → Sibling(x,y)

• ∃ y ∃ x: Mother(y) = x

### Formulas

• The set of formulas is inductively defined by the following rules:

1. Predicate symbols : If P is an n-ary predicate symbol and t1,…,tn are terms, then P(t1,…,tn) is a formula.

2. Negation : If φ is a formula, then ¬φ is a formula

3. Binary connectives : If φ and ψ are formulas, then (φ → ψ) is a formula. Same for other binary logical connectives.

4. Quantifiers : If φ is a formula and x is a variable, then ∀xφ and ∃xφ are formulas.

• Atomic formulas are formulas obtained only using the first rule

• Example: If f is a unary function symbol, P a unary predicate symbol, and Q a ternary predicate symbol, then the following is a formula:

∀x∀y(P(f(x)) → ¬(P(x) → Q(f(y), x, x)))

### Formulas

• Any occurrence of a variable in a formula not in the scope of a quantifier is said to be a free occurrence

• Otherwise it is called a bound occurrence

• Thus, if x is a free variable in φ it is bound in ∀xφ and ∃xφ

• A formula with no free variables is called a closed formula

• Example: x and y are bound variables, z is a free variable

∀x∀y(P(f(x)) → ¬(P(x) → Q(f(y), x, z)))
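The free/bound distinction can be computed mechanically. Below is a Python sketch with an assumed tuple encoding of formulas; for brevity, terms inside atoms are plain variable or constant strings (no nested function terms), and variable names are restricted to x, y, z.

```python
# Sketch: formulas are nested tuples: ("P", t1, ...) for atoms,
# ("not", f), ("and"/"or"/"implies", f, g), ("forall"/"exists", var, f).

def free_vars(formula):
    op = formula[0]
    if op in ("forall", "exists"):
        _, var, body = formula
        return free_vars(body) - {var}      # the quantifier binds var
    if op == "not":
        return free_vars(formula[1])
    if op in ("and", "or", "implies"):
        return free_vars(formula[1]) | free_vars(formula[2])
    # atomic formula: collect the variables among its terms
    return {t for t in formula[1:] if isinstance(t, str) and t[0] in "xyz"}

# forall x forall y: P(x) -> not(P(x) -> Q(y, x, z)) has free variable z
f = ("forall", "x", ("forall", "y",
     ("implies", ("P", "x"),
      ("not", ("implies", ("P", "x"), ("Q", "y", "x", "z"))))))
print(free_vars(f))   # {'z'}
```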

### BNF for FOL Sentences

• S := <sentence>

• <sentence> := <atomicsentence>
| <sentence> <connective> <sentence>
| <quantifier> <variable> ,... <sentence>
| ¬ <sentence>
| ( <sentence> )

• <atomicsentence> := <predicate> ( <term> , ... )

• <term> := <function> ( <term> , ... )
| <constant>
| <variable>

• <connective> := ∧ | ∨ | → | ↔

• <quantifier> := ∃ | ∀

• <constant> := "c" | "x1" | "john" | ...

• <variable> := "a" | "x" | "s" | ...

• <predicate> := "before" | "hasColor" | "raining" | ...

• <function> := "mother" | "leftLegOf" | ...

### Semantics

• Interpretations

• Models and Satisfiability

• Logical Consequence (Entailment)

### Semantics – Overview

• Interpretation – Maps symbols of the formal language (predicates, functions, variables, constants) onto objects, relations, and functions of the “world” (formally: Domain, relational Structure, or Universe)

• Valuation – Assigns domain objects to variables

• In the case of nested quantifiers, the valuation function is used to describe value assignments and constraints for the quantified variables.

• Otherwise, the valuation function affects the satisfaction of a formula only in the case of open formulae (formulae with free variables).

• Constructive Semantics – Determines the semantics of complex expressions inductively, starting with basic expressions

### Interpretation

Domain, relational Structure, Universe

• D – a set of objects d1, d2, ..., dn

• R, ... – relations over D, R ⊆ Dⁿ

• F, ... – functions over D, F: Dⁿ → D

Basic Interpretation Mapping

• constant: I[c] = d (an object)

• function: I[f] = F (a function)

• predicate: I[P] = R (a relation)

Valuation V

• variable: V(x) = d ∈ D

Next, determine the semantics for complex terms and formulae constructively, based on the basic interpretation mapping and the valuation function above.

### Interpretation (cont’)

Terms with variables

I [f(t1,...,tn)] = I [f] (I [t1],..., I [tn]) = F(I [t1],..., I [tn]) ∈ D

where I[ti] = V(ti) if ti is a variable

Atomic Formula

I [P(t1,...,tn)]           true if (I [t1],..., I [tn]) ∈ I [P] = R

Negated Formula

I [¬α]                      true if I [α] is not true

Complex Formula

I[α∨β]                    true if I [α] or I [β] true

I [α∧β]                   true if I [α] and I [β] true

I [α→β]                 true if I [α] not true or I [β] true

### Interpretation (cont’)

Quantified Formula (relative to Valuation function)

I [∃x:α] true if α is true with V'(x)=d for some d∈D, where V' is otherwise identical to the prior V.

I [∀x:α] true if α is true with V'(x)=d for all d∈D, where V' is otherwise identical to the prior V.

Note: ∀x∃y: α is different from ∃y∀x: α

In the first case ∀x∃y:α , we go through all value assignments V'(x), and for each value assignment V'(x) of x, we have to find a suitable value V'(y) for y.

In the second case ∃y∀x:α, we have to find one value V'(y) for y, such that all value assignments V'(x) make α true.
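The difference between the two quantifier orders can be checked directly over a finite domain. A Python sketch with a hypothetical domain and a hypothetical Loves relation (both made up for illustration):

```python
# Evaluate forall x exists y: Loves(x, y) vs exists y forall x: Loves(x, y)
# over a small finite interpretation.

domain = {"alice", "bob", "carol"}
loves = {("alice", "bob"), ("bob", "carol"), ("carol", "alice")}

# forall x exists y: for every x we may pick a different y
forall_exists = all(any((x, y) in loves for y in domain) for x in domain)

# exists y forall x: one single y must work for all x
exists_forall = any(all((x, y) in loves for x in domain) for y in domain)

print(forall_exists)  # True: everyone loves someone
print(exists_forall)  # False: no single person is loved by everyone
```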

### Models and Satisfiability

Given is an interpretation I into a domain D with a valuation V, and a formula φ .

We say that:

φ is satisfied in this interpretation       or

this interpretation is a model of φ        iff

I[φ] is true.

That means the interpretation function I into the domain D (with valuation V) makes the formula φ true.

### Logical Consequence (Entailment)

Given a set of formulae Φ and a formula α .

α is a logical consequence of Φ iff

α is true in every model in which Φ is true.

Notation:

Φ ⊧ α

That means that for every model (interpretation into a domain) in which Φ is true, α must also be true.

### Entailment and Derivation

• Entailment: KB ⊧ Q

• Entailment is a relation that is concerned with the semantics of statements

• Q is entailed by KB (a set of premises or assumptions) if and only if there is no logically possible world in which Q is false while all the premises in KB are true

• Stated positively: Q is entailed by KB if and only if the conclusion is true in every possible world in which all the premises in KB are true

• Provability: KB ⊢ Q

• Provability is a syntactic relation

• We can derive Q from KB if there is a proof consisting of a sequence of valid inference steps starting from the premises in KB and resulting in Q

### Important Properties for Inference

• Soundness: If KB ⊢ Q then KB ⊧ Q

• If Q is derived from a set of sentences KB using a set of inference rules, then Q is entailed by KB

• Hence, inference produces only real entailments, or any sentence that follows deductively from the premises is valid

• Only sentences that logically follow will be derived

• Completeness: If KB ⊧ Q then KB ⊢ Q

• If Q is entailed by a set of sentences KB, then Q can also be derived from KB using the rules of inference

• Hence, inference produces all entailments, or all valid sentences can be proved from the premises

• All the sentences that logically follow can be derived

### Inference rules for Predicate Logic

• Inference rules for propositional logic apply to predicate logic as well

• Modus Ponens, Modus tollens etc.

• New (sound) inference rules for use with quantifiers:

• Modus Ponens, Modus Tollens

• Universal elimination

• Existential elimination

• Existential introduction

• Generalized Modus Ponens (GMP)

### Modus Ponens and Modus Tollens

• Modus Ponens (Law of Detachment )

• Based on claiming 1.) that P is true, and 2.) the implication P → Q, we can conclude that Q is true.

• If P, then Q. P; therefore, Q

• Example:

RegularlyAttends(joe, lecture),

RegularlyAttends(joe, lecture) → Pass(joe, lecture)

Allow us to conclude:

Pass(joe, lecture)

### Modus Ponens and Modus Tollens

• Modus Tollens (Denying the consequent)

• We again make two claims: 1.) the implication P → Q, and 2.) that Q is false. We can conclude that P is false as well.

• If P, then Q. ¬Q; therefore, ¬P

• Example:

• Detects(alarm, intruder) → GoesOff(alarm),

• ¬GoesOff(alarm)

• Allows us to conclude:

• ¬Detects(alarm, intruder)

### Universal and Existential elimination

• Universal elimination

• If ∀x P(x) is true, then P(c) is true, where c is any constant in the domain of x

• Variable symbol can be replaced by any ground term

• Example:

• ∀x RegularlyAttends(x, lecture) → Pass(x, lecture)

• Allow us to conclude (assuming Joe is in the domain of x):

• RegularlyAttends(joe, lecture) → Pass(joe, lecture)

### Universal and Existential elimination

• Existential elimination

• From ∃x P(x) infer P(c)

• Note that the variable is replaced by a brand-new constant not occurring in this or any other sentence in the KB

• Also known as skolemization; the constant is a Skolem constant

• In other words, we don’t want to accidentally draw other inferences by introducing the constant

• Convenient to use this to reason about the unknown object, rather than constantly manipulating the existential quantifier

• Example:

• ∃x Pass(x)

• Allows us to conclude:

• Pass(someStudent) , where someStudent is a new constant

### Existential Introduction

• If P(c) is true, then ∃x P(x) is inferred.

• The inverse of existential elimination

• All instances of the given constant symbol are replaced by the new variable symbol

• Note that the variable symbol cannot already exist anywhere in the expression

• Example:

• Eats(Ziggy, IceCream)

• ∃x Eats(Ziggy,x)

### Generalized Modus Ponens (GMP)

• Apply modus ponens reasoning to generalized rules

• Combines Universal-Elimination, and Modus Ponens

• E.g., from P(c) and Q(c) and ∀x ((P(x) ∧ Q(x)) → R(x)) derive R(c)

• GMP requires substitutions for variable symbols

• subst(θ, α) denotes the result of applying a set of substitutions defined by θ to the sentence α

• A substitution list θ = {v1/t1, v2/t2, ..., vn/tn} means to replace all occurrences of variable symbol vi by term ti

• Substitutions are made in left-to-right order in the list

• subst ({x/IceCream, y/Ziggy}, eats(y,x)) = eats (Ziggy, IceCream)
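The subst operator is easy to sketch in Python over an assumed tuple encoding of terms and atoms (variables are lowercase strings; an atom like eats(y, x) is the tuple ("eats", "y", "x")):

```python
# Sketch of subst(theta, alpha): apply a substitution {v1: t1, ...}
# to a term or atom represented as nested tuples.

def subst(theta, expr):
    if isinstance(expr, tuple):           # f(t1,...,tn) or P(t1,...,tn)
        return (expr[0],) + tuple(subst(theta, t) for t in expr[1:])
    return theta.get(expr, expr)          # replace a variable if bound

# subst({x/IceCream, y/Ziggy}, eats(y, x)) = eats(Ziggy, IceCream)
theta = {"x": "IceCream", "y": "Ziggy"}
print(subst(theta, ("eats", "y", "x")))   # ('eats', 'Ziggy', 'IceCream')
```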

### Generalized Modus Ponens (GMP)

• General definition: Given
• atomic sentences P1, P2, ..., PN
• implication sentence (Q1 ∧ Q2 ∧ ... ∧ QN) → R
• Q1, ..., QN and R are atomic sentences
• substitution subst(θ, Pi) = subst(θ, Qi) for i=1,...,N
• Derive new sentence: subst( θ , R)
• GMP is usually written formally as follows:

P1, P2, ..., PN, (Q1 ∧ Q2 ∧ ... ∧ QN) → R  ⊢  subst(θ, R)

### Generalized Modus Ponens (GMP) - Example

• A clause is a disjunction of literals
• Example: C1 ∨ … ∨ Cn
• A definite clause is a clause containing exactly one positive literal
• Example: ¬C1 ∨ … ∨ ¬Cn ∨ H
• GMP is used with a collection of definite clauses
• Example:
Person(John), Rich(y), ( Person(x) ∧ Rich(x) → Popular(x))
Popular(John)
With the substitution θ = {x/John, y/John}, so that
subst(θ, Person(John)) = subst(θ, Person(x)) and subst(θ, Rich(y)) = subst(θ, Rich(x))

### Completeness & Decidability (1)

• Completeness : If a knowledge-base KB entails a statement S , we can prove S

• Gödel Completeness Theorem : There exists a complete proof system for FOL

• KB ⊨ Q ↔ KB ⊢ Q

• Gödel proved that there exists a complete proof system for FOL.

• Gödel did not come up with such a concrete proof system.

• Robinson’s Completeness Theorem : Resolution is such a concrete complete proof system for FOL

### Completeness & Decidability (2)

• FOL is only semi-decidable : If a conclusion follows from premises, then a complete proof system (like resolution) will find a proof.

• If there’s a proof, we’ll halt with it (eventually)

• However, if there is no proof (i.e. a statement does not follow from a set of premises), the attempt to prove it may never halt

• From a practical point of view this is problematic

• We cannot distinguish between the non-existence of a proof or the failure of an implementation to simply find a proof in reasonable time.

• Theoretical completeness of an inference procedure does not make a difference in such cases

• Does a proof simply take too long or will the computation never halt anyway?

### Knowledge engineering in FOL

• Consider the following English text:

"Anyone passing his Intelligent System exam and winning the lottery is happy. But any student who studies for an exam or is lucky can pass all his exams. John did not study but John is lucky. Anyone who is lucky wins the lottery. Mary did not win the lottery, however Mary passed her IS exam. Gary won the lottery. Gary, John, and Mary are all students."

• Is John happy?
• Is Mary happy? Is she lucky?
• Did every student pass the IS exam?

### Formalization in FOL

1. Identify the task

• Knowledge to be captured, questions to be answered

2. Assemble the relevant knowledge

• Relevant: Relations between exams and the lottery, passing of an exam, information about individuals, etc.

• Irrelevant: Dates of exams, teacher of the exam, amount of money in the lottery pot, etc

3. Decide on a vocabulary

• Predicates

• Constants symbols

• E.g. Win(x, Lottery) or Win(x, y) ∧ Lottery(y)?

### Formalization in FOL (cont’)

4. Encode general knowledge of the domain

• Anyone passing the IS exams and winning the lottery is happy

∀x Pass(x, IS) ∧ Win(x, Lottery) ⇒ Happy(x)

• Any student who studies or is lucky can pass all his exams

∀x ∀y Student(x) ∧ Exam(y) ∧ (StudiesFor(x, y) ∨ Lucky(x)) ⇒ Pass(x,y)

• Anyone who is lucky wins the lottery

∀x Lucky(x) ⇒ Win(x, Lottery)

### Formalization in FOL (cont’)

5. Encode the specific problem instance

• John did not study, but John is lucky

¬Study(John) ∧ Lucky(John)

• Mary did not win the lottery, however Mary passed her IS exam

¬Win(Mary, Lottery)

Pass(Mary, IS)

• Gary won the lottery

Win(Gary, Lottery)

• Gary, John, and Mary are all students

Student(Gary), Student(John), Student(Mary)

### Formalization in FOL (cont’)

6. Pose queries to the inference procedure

• Is John happy?

Happy(John) ?

• Is Mary happy? Is she lucky?

Happy(Mary), Lucky(Mary)?

• Did every student pass the IS exam?

∀x (Student(x) → Pass(x, IS)) ?

• Specific inference procedures (e.g. Resolution ) can be used to systematically answer those queries (see next lecture)

### Extensions

• Ordinary first-order interpretations have a single domain of discourse over which all quantifiers range; many-sorted first-order logic allows variables to have different sorts , which have different domains

• The characteristic feature of first-order logic is that individuals can be quantified, but not predicates; second-order logic extends first-order logic by adding the latter type of quantification

### Extensions (cont’)

• Intuitionistic first-order logic uses intuitionistic rather than classical propositional calculus; for example, ¬¬φ need not be equivalent to φ

• Infinitary logic allows infinitely long sentences; for example, one may allow a conjunction or disjunction of infinitely many formulas, or quantification over infinitely many variables

• First-order modal logic has extra modal operators with meanings which can be characterised informally as, for example, "it is necessary that φ" and "it is possible that φ"

### Summary

• Predicate logic differentiates from propositional logic by its use of quantifiers

• Each interpretation of predicate logic includes a domain of discourse over which the quantifiers range.

• There are many deductive systems for predicate logic that are sound (only deriving correct results) and complete (able to derive any logically valid implication)

• The logical consequence relation in predicate logic is only semidecidable

• This lecture focused on three core aspects of the predicate logic: Syntax, Semantics, and Inference

### References

• M. Fitting: First-Order Logic and Automated Theorem Proving, 1996 Springer-Verlag New York (Chapter 5)

• Michael Huth and Mark Ryan, Logic in Computer Science(second edition) , 2004, Cambridge University Press

• http://plato.stanford.edu/entries/model-theory/

• http://en.wikipedia.org/wiki/First-order_logic

### Agenda

• Motivation

• Technical Solution

• Introduction to Theorem Proving and Resolution

• Description Logics

• Logic Programming

• Summary

• References

### Motivation

• Basic results of mathematical logic show:
• We can do logical reasoning with a limited set of simple (computable) rules in restricted formal languages like First-order Logic (FOL)
• Computers can do reasoning

• FOL is interesting for this purpose because:
• It is expressive enough to capture many foundational theorems of mathematics
• Many real-world problems can be formalized in FOL
• It is the most expressive logic that one can adequately approach with automated theorem proving techniques

### Motivation

• Due to its theoretical properties (decidability & complexity) First-Order Logic is not always an ideal solution
• This motivates research towards formalisms with more practically oriented computational properties and expressivity
• Description Logics
• Syntactic fragments of FOL
• Focus on decidability and optimized algorithms for key reasoning tasks (terminological reasoning / schema reasoning)
• Logic Programming
• Provides different expressivity than classical FOL (non-monotonic reasoning with non-classical negation)
• Provides a very intuitive way to model knowledge
• Efficient reasoning procedures for large data-sets

### Introduction – Basic Notions

• A proof system is a collection of inference rules of the form:

P1 … Pn
____________
C

where C is the conclusion sequent, and the Pi's are the premise sequents.

• If an inference rule does not have any premises, it is called an axiom: its conclusion holds unconditionally.
• Example:
Modus Ponens: From P, P → Q infer Q,
Universal instantiation: From (∀x)p(x) infer p(A)
• Theorems:
• Expressions that can be derived from the axioms and the rules of inference.

### Resolution - Principle

• Resolution is a refutation system .

• To prove a statement we attempt to refute its negation

• Basic idea: Given a consistent set of axioms KB and goal sentence Q, we want to show that KB ⊧ Q

• This means that every interpretation I that satisfies KB also satisfies Q

• Any interpretation I only satisfies either Q or ¬Q, but not both

• Therefore, if in fact KB ⊧ Q holds, an interpretation that satisfies KB satisfies Q and does not satisfy ¬Q

Hence KB ∪ {¬Q} is unsatisfiable, i.e., it is false under all interpretations

### Resolution - Principle

• Resolution commonly requires sentences to be in clause normal form (also called conjunctive normal form or CNF)
• A clause is a disjunction of literals (implicitly universally quantified)
• E.g. : l1 ∨ … ∨ ln
• A formula in clause normal form is a conjunction of clauses
• E.g.: C1 ∧ … ∧ Cn = (l1 ∨ … ∨ lm) ∧ … ∧ (li ∨ … ∨ lj)
• High level steps in a resolution proof:
• Put the premises or axioms into clause normal form (CNF)
• Add the negation of the statement to be proven, in clause form, to the set of axioms
• Resolve these clauses together, using the resolution rule , producing new clauses that logically follow from them
• Derive a contradiction by generating the empty clause.
• The substitutions used to produce the empty clause are those under which the opposite of the negated goal is true

### Resolution - The Resolution Rule

• The resolution rule is a single inference that produces a new clause implied by two clauses (called the parent clauses) containing complementary literals

• Two literals are said to be complements if one is the negation of the other (in the following, ai is taken to be the complement to bj).
• The resulting clause contains all the literals that do not have complements. Formally:

a1 ∨ … ∨ an        b1 ∨ … ∨ bm
____________________________________
a1 ∨ … ∨ ai-1 ∨ ai+1 ∨ … ∨ an ∨ b1 ∨ … ∨ bj-1 ∨ bj+1 ∨ … ∨ bm

• Where a1, …, an, b1, …, bm are literals, and ai is the complement to bj
• The clause produced by the resolution rule is called the resolvent of the two input clauses.
• A literal and its negation produce a resolvent only if they unify under some substitution σ.
• σ is then applied to the resolvent

### Resolution - The Resolution Rule

• When the two initial clauses contain more than one pair of complementary literals, the resolution rule can be applied (independently) for each such pair

• However, only the pair of literals that are resolved upon can be removed: All other pairs of literals remain in the resolvent clause

• Modus ponens (see last lecture) can be seen as a special case of resolution of a one-literal clause and a two-literal clause

• Example: Given clauses

• P(x, f(a)) ∨ P(x, f(y)) ∨ Q(y) and ¬P(z, f(a)) ∨ ¬Q(z)

• P(x, f(a)) and ¬P(z, f(a)) unify with substitution σ = {x/z}

• Therefore, we derive the resolvent clause (to which σ is applied):

• P(z, f(y)) ∨ Q(y) ∨ ¬Q(z)
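The unification step of this example can be reproduced with a small version of Robinson's unification algorithm. This is a simplified Python sketch; the tuple encoding and the lowercase-variable convention are assumptions for illustration.

```python
# Terms: variables (lowercase strings w/x/y/z), constants (other strings),
# or ("f", args...) tuples for functions and predicates.

def is_var(t):
    return isinstance(t, str) and t[0] in "wxyz"

def walk(t, theta):
    while is_var(t) and t in theta:       # follow variable bindings
        t = theta[t]
    return t

def occurs(v, t, theta):
    t = walk(t, theta)
    if v == t:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, theta) for a in t[1:])

def unify(s, t, theta=None):
    """Return a most general unifier of s and t extending theta, or None."""
    if theta is None:
        theta = {}
    s, t = walk(s, theta), walk(t, theta)
    if s == t:
        return theta
    if is_var(s):
        return None if occurs(s, t, theta) else {**theta, s: t}
    if is_var(t):
        return unify(t, s, theta)
    if isinstance(s, tuple) and isinstance(t, tuple) \
            and len(s) == len(t) and s[0] == t[0]:
        for a, b in zip(s[1:], t[1:]):    # unify arguments pairwise
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

# P(x, f(a)) and P(z, f(a)) unify with sigma = {x/z}
sigma = unify(("P", "x", ("f", "a")), ("P", "z", ("f", "a")))
print(sigma)   # {'x': 'z'}
```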

### Resolution – Complete Resolution Algorithm

1. Convert all the propositions of a knowledge base KB to clause normal form (CNF)

2. Negate Q (the to be proven statement) and convert the result to clause form. Add it to the set of clauses obtained in step 1.

3. Repeat until either a contradiction is found, no progress can be made, or a predetermined amount of effort has been expended:

a) Select two clauses. Call these the parent clauses.

b) Resolve them together using the resolution rule . The resolvent will be the disjunction of all the literals of both parent clauses with appropriate substitutions.

c) If the resolvent is the empty clause , then a contradiction has been found. If it is not, then add it to the set of clauses available to the procedure.

### Resolution – Algorithm in Pseudocode

procedure resolution-refutation(KB, Q)
  ;; KB a set of consistent, true sentences
  ;; Q is a goal sentence that we want to derive
  ;; returns success if KB ⊢ Q, and failure otherwise
  KB = union(KB, ¬Q)
  while false not in KB do
    pick 2 sentences, S1 and S2, in KB that contain literals that unify
    if none, return "failure"
    resolvent = resolution-rule(S1, S2)
    KB = union(KB, resolvent)
  return "success"

### Resolution - Converting to Clause Normal Form

• Resolution expects input in clause normal form
• Step 1: Eliminate the logical connectives → and ↔
• a ↔ b = (a → b) ∧ (b → a)
• a → b = ¬a ∨ b
• Step 2: Reduce the scope of negation
• ¬(¬a) = a
• ¬(a ∧ b) = ¬a ∨ ¬b
• ¬(a ∨ b) = ¬a ∧ ¬b
• ¬(∀X) a(X) = (∃X) ¬a(X)
• ¬(∃X) b(X) = (∀X) ¬b(X)

### Resolution - Converting to Clause Normal Form

• Step 3: Standardize by renaming all variables so that variables bound by different quantifiers have unique names

• (∀X) a(X) ∨ (∃X) b(X) = (∀X) a(X) ∨ (∃Y) b(Y)

• Step 4: Move all quantifiers to the left to obtain a prenex normal form

• A formula is in prenex normal form if it is written as a string of quantifiers followed by a quantifier-free part

• Step 5: Eliminate existential quantifiers by using skolemization

### Resolution - Converting to Clause Normal Form

• Step 5: Skolemization
• Skolem constant
• (∃X)(dog(X)) may be replaced by dog(fido) where the name fido is picked from the domain of definition of X to represent that individual X.
• Skolem function
• If the predicate has more than one argument and the existentially quantified variable is within the scope of universally quantified variables, the existential variable must be a function of those other variables.
(∀X)(∃Y)(mother(X,Y)) ⇒ (∀X)mother(X,m(X))
(∀X)(∀Y)(∃Z)(∀W)(foo (X,Y,Z,W))
⇒ (∀X)(∀Y)(∀W)(foo(X,Y,f(X,Y),W))

### Resolution - Converting to Clause Normal Form

• Step 6: Drop all universal quantifiers
• Step 7: Convert the expression to the conjunctions of disjuncts
(a ∧ b) ∨ (c ∧ d)
= (a ∨ (c ∧ d)) ∧ (b ∨ (c ∧ d))
= (a ∨ c) ∧ (a ∨ d) ∧ (b ∨ c) ∧ (b ∨ d)
• Step 8: Split each conjunct into a separate clause, which is just a disjunction of negated and nonnegated predicates, called literals
• Step 9: Rename variables so that no variable symbol appears in more than one clause.
(∀X)(a(X) ∧ b(X)) = (∀X)a(X) ∧ (∀Y)b(Y)
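Step 7 (distributing ∨ over ∧) can be sketched in Python over an assumed tuple encoding of propositional formulas, reproducing the worked example above:

```python
# Formulas: literals are strings; ("and", f, g) / ("or", f, g) are tuples.

def distribute(f):
    """Distribute 'or' over 'and' to reach a conjunction of disjuncts."""
    if isinstance(f, str):
        return f
    op, a, b = f[0], distribute(f[1]), distribute(f[2])
    if op == "or":
        if isinstance(a, tuple) and a[0] == "and":   # (a1 ∧ a2) ∨ b
            return ("and", distribute(("or", a[1], b)),
                           distribute(("or", a[2], b)))
        if isinstance(b, tuple) and b[0] == "and":   # a ∨ (b1 ∧ b2)
            return ("and", distribute(("or", a, b[1])),
                           distribute(("or", a, b[2])))
    return (op, a, b)

# (a ∧ b) ∨ (c ∧ d) = (a ∨ c) ∧ (a ∨ d) ∧ (b ∨ c) ∧ (b ∨ d)
print(distribute(("or", ("and", "a", "b"), ("and", "c", "d"))))
```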

### Resolution - Complete Example

• (Nearly) Classical example: Prove “Fido will die.” from the statements
• “Fido is a dog.”
• “All dogs are animals.”
• “All animals will die.”
• Changing premises to predicates
• (∀X) (dog(X) → animal(X))
• dog(fido)
• Modus Ponens and {fido/X}
• animal(fido)
• (∀Y) (animal(Y) → die(Y))
• Modus Ponens and {fido/Y}
• die(fido)

### Resolution – Complete Example

• Equivalent proof by Resolution
• Convert predicates to clause normal form
   Predicate form                      Clause form
1. (∀X) (dog(X) → animal(X))          ¬dog(X) ∨ animal(X)
2. dog(fido)                          dog(fido)
3. (∀Y) (animal(Y) → die(Y))          ¬animal(Y) ∨ die(Y)
• Negate the conclusion
4. ¬ die(fido)                      ¬ die(fido)
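The refutation itself can be reproduced with a small ground resolution loop following the algorithm above. In this Python sketch, clauses are encoded as frozensets of literals and the universally quantified variables are already instantiated with fido (both simplifying assumptions):

```python
# Ground resolution refutation of the Fido example. A literal is either
# an atom like ("dog", "fido") or a negation ("not", atom).

def negate(lit):
    return lit[1] if lit[0] == "not" else ("not", lit)

def resolve(c1, c2):
    """Yield all resolvents of two ground clauses."""
    for lit in c1:
        if negate(lit) in c2:
            yield (c1 - {lit}) | (c2 - {negate(lit)})

def refute(clauses):
    clauses = set(clauses)
    while True:
        new = {r for c1 in clauses for c2 in clauses if c1 != c2
               for r in resolve(c1, c2)}
        if frozenset() in new:
            return True        # empty clause: contradiction found
        if new <= clauses:
            return False       # no progress: no refutation found
        clauses |= new

kb = [
    frozenset({("not", ("dog", "fido")), ("animal", "fido")}),   # clause 1
    frozenset({("dog", "fido")}),                                # clause 2
    frozenset({("not", ("animal", "fido")), ("die", "fido")}),   # clause 3
    frozenset({("not", ("die", "fido"))}),                       # negated goal
]
print(refute(kb))   # True: "Fido will die" is proved
```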

### Resolution – Further Considerations

• Resolution can be used to establish that a given sentence is entailed by a set of axioms
• However, it cannot, in general, be used to generate all logical consequences of a set of axioms
• Also, resolution cannot be used to prove that a given sentence is not entailed by a set of axioms

• Resolution defines a search space
• The decision which clauses will be resolved against which others define the operators in the space
• A search method is required

• Worst case: Resolution is exponential in the number of clauses to resolve

### Resolution – Strategies

• Order of clause combination is important

• N clauses → N² ways of combining them, or checking to see whether they can be combined

• Search heuristics are very important in resolution proof procedures

• Strategies

• Set of Support Strategy

• Unit Preference Strategy

• Linear Input Form Strategy

### Concrete System: Prover9

• First-Order theorem prover
• Homepage: http://www.cs.unm.edu/~mccune/prover9/
• Successor of the well known “Otter” theorem prover
• Under active development, available for several platforms and released under the GPL
• Graphical user-interface and extensive documentation make Prover9 comparatively user friendly
• Core of system builds on resolution / paramodulation as inference method
• Prover9 works in parallel with the external component "Mace4"
• Mace4 searches for finite models and counterexamples
• This helps to avoid wasting time searching for a proof by first finding a counterexample

### Limitations of First-Order Theorem Proving

• Entailment: KB ⊧ Q
• Entailment is a relation that is concerned with the semantics of statements
• Q is entailed by KB (a set of premises or assumptions) if and only if there is no logically possible world in which Q is false while all the premises in KB are true

• Entailment for FOL is only semi-decidable : If a conclusion follows from premises, then a complete proof system (like resolution) will find a proof.
• If there’s a proof, we’ll halt with it (eventually)
• However, if there is no proof (i.e. a statement does not follow from a set of premises), the attempt to prove it may never halt

### Limitations of First-Order Theorem Proving

• From a practical point of view this is problematic
• We cannot distinguish between the non-existence of a proof or the failure of an implementation to simply find a proof in reasonable time.
• Theoretical completeness of an inference procedure does not make a difference in such cases
• Does a proof simply take too long or will the computation never halt anyway?

• Due to its complexity and remaining limitations FOL is often not suitable for practical applications

• Often less expressive (but decidable) formalisms or formalisms with different expressivity are more suitable:
• Description Logics
• Logic Programming

### Description Logic

• Most Description Logics are based on a 2-variable fragment of First Order Logic
• Classes (concepts) correspond to unary predicates
• Properties correspond to binary predicates
• Restrictions in general:
• Quantifiers range over no more than 2 variables
• Transitive properties are an exception to this rule
• No function symbols (decidability!)
• Most DLs are decidable and usually have decision procedures for key reasoning tasks
• DLs have decision problems of lower computational complexity than First Order Logic
• We later show the very basic DL ALC as example
• More complex DLs work in the same basic way but have different expressivity

### Description Logic Syntax

• Concepts/classes (unary predicates/formulae with one free variable)

• E.g. Person, Female

• Roles (binary predicates/formulae with two free variables)

• E.g. hasChild

• Individuals (constants)

• E.g. Mary, John

• Constructors allow to form more complex concepts/roles

• Union ⊔: Man ⊔ Woman

• Intersection ⊓ : Doctor ⊓ Mother

• Existential restriction ∃: ∃hasChild.Doctor (some child is a doctor)

• Value(universal) restriction ∀: ∀hasChild.Doctor (all children are doctors)

• Complement/negation ¬: Man ⊑ ¬Mother

• Number restriction ≥n, ≤n

• Axioms

• Subsumption ⊑ : Mother ⊑ Parent

### Description Logic Syntax - Concepts

• Classes/concepts are actually a set of individuals

• We can distinguish different types of concepts:

• Atomic concepts: Cannot be further decomposed (e.g. Person)

• Incomplete concepts (defined by ⊑)

• Complete concepts (defined by ≡)

• Example of an incomplete concept definition:

• Man ⊑ Person ⊓ Male

• Intended meaning: If an individual is a man, we can conclude that it is a person and male.

• Man(x) ⇒ Person(x) ∧ Male(x)

• Example complete concept definition:

• Man ≡ Person ⊓ Male

• Intended meaning: Every individual which is a male person is a man, and every man is a male person.

• Man(x) ⇔ Person(x) ∧ Male(x)

### Description Logic Syntax- Roles

• Roles relate individuals to each other
• E.g. directedBy(Pool Sharks, Edwin Middleton), hasChild(Jonny, Sue)

• Roles have a domain and a range

• Example:
• Domain(directedBy, Movie)
• Range(directedBy, Person)
• Given the above definitions we can conclude that Pool Sharks is a movie and that Edwin Middleton is (was) a person.

• Functional Roles
• Roles which have exactly one value
• Usually used with primitive data values
• A special case of (unqualified) number restriction ≤1 R

### Description Logic Syntax- Roles

• Transitive Roles
• Example: hasAncestor
Simple in a rule language: hasAncestor(X,Z) :- hasAncestor(X,Y), hasAncestor(Y,Z).
• Requires more than one variable!
• Transitivity can be captured in DLs by role hierarchies and transitive roles:

• Symmetric Roles
• Roles which hold in both directions
• E.g. hasSpouse, hasSibling
• Inverse Roles
• Roles are directed, but each role can have an inverse
• E.g. hasParent ≡ hasChild⁻
hasParent(X,Y) ⇔ hasChild(Y,X)

### Description Logic Knowledge Bases

• Typically a DL knowledge base (KB) consists of two components
• Tbox (terminology): A set of inclusion/equivalence axioms denoting the conceptual schema/vocabulary of a domain
• Bear ⊑ Animal ⊓ Large
• transitive(hasAncestor)
• hasChild ≡ hasParent⁻

• Abox (assertions): Axioms, which describe concrete instance data and holds assertions about individuals
• hasAncestor(Susan, Granny)
• Bear(Winni Puh)

• From a theoretical point of view this division is arbitrary but it is a useful simplification

### A basic Description Logic - ALC

• Smallest propositionally closed DL is ALC
• Only atomic roles
• Concept constructors: ⊔ , ⊓, ¬
• Restricted use of quantifiers: ∃, ∀

• “Propositionally closed” Logic in general:
• Provides (implicitly or explicitly) conjunction, union and negation of class descriptions

• Example:
• Person ⊓ ∀hasChild.(Doctor ⊔ ∃hasChild.Doctor)

### A basic Description Logic - ALC

• What can we express in ALC?
• ALC concept descriptions can be constructed according to the following grammar:

C, D → A | ⊤ | ⊥ | ¬C | C ⊓ D | C ⊔ D | ∃R.C | ∀R.C

### A basic Description Logic - ALC

• Individual assertions :
• a ∈ C
• Mary is a Woman.

• Role assertions:
• 〈a, b〉 ∈ R
• E.g. Mary loves Peter.

• Axioms:
• C ⊑ D
• C ≡ D, because C ≡ D ⇔ C ⊑ D and D ⊑ C
• E.g.: A Dog is an animal. A man is a male Person.

### The Description Logic Family

• Description Logics are actually a family of related logics
• Difference in expressivity and features, as well as complexity of inference
• Description Logics follow a naming schema according to their features
• ALC = Attributive Language with Complements
• S often used for ALC extended with transitive roles
• Additional letters indicate other extensions, e.g.:
• H for role hierarchy
• O for nominals, singleton classes
• I for inverse roles (e.g., isChildOf ≡ hasChild–)
• N for number restrictions
• Q for qualified number restrictions
• F for functional properties
• R for limited complex role inclusion axioms, role disjointness
• (D) for datatype support

### Description Logic - Semantics

• Semantics follow standard FOL model theory

• Description Logics are a fragment of FOL

• The vocabulary is the set of names (concepts and roles) used

• E.g. Mother, Father, Person, knows, isRelatedTo, hasChild, …

• An interpretation I is a tuple (∆I, ⋅I)

• ∆I is the domain (a set)

• ⋅I is a mapping that maps:

• Names of objects (individuals) to elements of the domain

• Names of unary predicates (classes/concepts) to subsets of the domain

• Names of binary predicates (properties/roles) to subsets of ∆I x ∆I

### Description Logic - Semantics

• The semantics of DL are based on standard First Order Model theory

• A translation is usually very straightforward, according to the following correspondences (for ALC):

• A description is translated to a first-order formula with one free variable

• An individual assertion is translated to a ground atomic formula

• An axiom is translated to an implication, closed under universal quantification

• More complex DLs can be handled in a similar way

### Description Logic - Semantics

• Mapping ALC to First Order Logic:
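
The mapping can be sketched as follows (πₓ(C) denotes the translation of concept C into a first-order formula with free variable x; A is an atomic concept, R a role):

```latex
\begin{align*}
\pi_x(A)             &= A(x)\\
\pi_x(\neg C)        &= \neg\,\pi_x(C)\\
\pi_x(C \sqcap D)    &= \pi_x(C) \land \pi_x(D)\\
\pi_x(C \sqcup D)    &= \pi_x(C) \lor \pi_x(D)\\
\pi_x(\exists R.C)   &= \exists y\,\big(R(x,y) \land \pi_y(C)\big)\\
\pi_x(\forall R.C)   &= \forall y\,\big(R(x,y) \rightarrow \pi_y(C)\big)\\
\pi(C \sqsubseteq D) &= \forall x\,\big(\pi_x(C) \rightarrow \pi_x(D)\big)
\end{align*}
```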

### Description Logic - Reasoning

• Main reasoning tasks for DL systems:

• Satisfiability : Check if the assertions in a KB have a model

• Instance checking : Check if an instance belongs to a certain concept

• Concept satisfiability : Check if the definition of a concept can be satisfied

• Subsumption : Check if a concept A subsumes a concept B (if every individual of a concept B is also of concept A)

• Equivalence : A ≡ B ⇔ B ⊑ A and A ⊑ B

• Retrieval : Retrieve a set of instances that belong to a certain concept

### Description Logic - Reasoning

• Reasoning tasks are typically reduced to checking satisfiability sat(A) of a knowledge base A

• Instance checking: instance(a, C, A) ⇔ ¬sat(A ⋃ {a: ¬C})

• Concept satisfiability: csat(C) ⇔ sat(A ⋃ {a: C}) for a fresh individual a

• Concept subsumption: B ⊑ C ⇔ ¬sat(A ⋃ {a: B ⊓ ¬C})

• Retrieval : Instance checking for each instance in the Abox

• Note: Reduction of reasoning tasks to one another in polynomial time only in propositionally closed logics

• DL reasoners typically employ tableaux algorithms to check satisfiability of a knowledge base

### Concrete System: Pellet

• Description Logic reasoner
• Homepage: http://clarkparsia.com/pellet
• Written in Java; dual-licensed: AGPL license for open-source applications
• Proprietary license available for commercial applications
• Sound and complete reasoner aimed at OWL-DL inference based on tableaux procedure
• Covers all constructs in OWL-DL
• Supports a major part of the OWL 2 specification
• Integrates with popular toolkits and editors
• E.g. Jena, Protege, TopBraid Composer
• Comprehensive hands-on tutorial available:
• http://clarkparsia.com/pellet/tutorial/iswc09

### Concrete System: Pellet

• Pellet supports expected standard DL reasoning tasks

• Consistency, classification, realization

• Concept satisfiability

• SPARQL-DL queries

• Datatype reasoning

• User-defined datatypes

• N-ary datatype predicates

• Rules support (DL-safe SWRL rules)

• Explanation and debugging features

• Axiom pinpointing service

### Concrete System: Pellet

• Explanation & Debugging support

• Motivation: It is hard to understand large and/or complex ontologies

• Examples:

• Why is a specific subclass relation inferred?

• Why is an ontology inconsistent?

• Pellet provides axiom pinpointing service:

• For any inference, returns the (minimal set of) source axioms that cause the inference

• Applications can track additional provenance information

• Axiom pinpointing is the first step to explanations

• Precise justifications (pinpointing parts of axioms) are ongoing work in Pellet's development

### Concrete System: Pellet

• Pellet is integrated in OwlSight and performs classification of loaded ontologies
• Lightweight , web-based ontology browser: http://pellet.owldl.com/owlsight/

### Concrete System: Pellet

• Classification Results in the Protege Editor

### Logic Programming

• What is Logic Programming?
• Various different perspectives and definitions possible:
• Computations as deduction
• Use formal logic to express data and programs
• Theorem Proving
• Logic programs evaluated by a theorem prover
• Derivation of answer from a set of initial axioms
• High level (non-procedural) programming language
• Logic programs do not specify control flow
• Instead of specifying how something should be computed, one states what should be computed
• Procedural interpretation of a declarative specification of a problem
• An LP system procedurally interprets (in some way) a general declarative statement which only defines truth conditions that should hold

### Logic Programming Basics

• Logic Programming is based on a subset of First Order Logic called Horn Logic

• Horn Logic can serve as a simple KR formalism and allows to express

• IF <condition> THEN <result> rules

• Under certain restrictions, reasoning over knowledge bases based on such rules is decidable (in contrast to general ATP within First Order Logic)

### Logic Programming Basics – Horn Logic

• Syntactically a LP rule is a First Order Logic Horn Clause
• The semantics of LP are different from the standard Tarski-style FOL semantics (details later)
• Herbrand semantics differ from FOL semantics in the structures they consider to be models (details later)
→ Minimal model semantics
• A FOL Horn clause is a disjunction of literals with one positive literal, with all variables universally quantified:
• (∀) ¬C1 ∨ … ∨ ¬Cn ∨ H
• This can be rewritten to closer correspond to a rule-like form:
• (∀) C1 ∧ … ∧ Cn → H
• In LP systems usually the following (non First Order Logic) syntax is used:
• H :- C1,...,Cn
• Such rules can be evaluated very efficiently, since the resolvent of two Horn clauses is again a Horn clause

### Logic Programming – Syntax

• The LP vocabulary consists of:
• Constants: b, cow, “somestring”
• Predicates: p, loves
• Function symbols: f, fatherOf
• Variables: x, y
• Terms can be:
• Constants
• Variables
• Constructed terms (i.e. function symbol with arguments)
• Examples:
• cow, b, Jonny,
• fatherOf(Jonny)

### Logic Programming – Syntax

• From terms and predicates we can build atoms:
• For n-ary predicate symbol p and terms t1, ..., tn, p(t1, ..., tn) is an atom
• A ground atom is an atom without variables
• Examples:
• p(x)
• loves(Jonny, Mary), worksAt(Jonny, SomeCompany)
• worksAt(fatherOf(Mary), SomeCompany)
• Literals
• A literal is an atom or its negation
• A positive literal is an atom
• A negative literal is a negated atom
• A ground literal is a literal without variables

### Logic Programming – Syntax

• Rules
• Given a rule of the form H :- B1,...,Bn we call
• H the head of the rule (its consequent)
• B1 …. Bn the body of the rule (the antecedent or conditions)
• The head of the rule consists of one positive literal H
• The body of the rule consists of a number of literals B1, ..., Bn
• B1, …, Bn are also called subgoals
• Examples:
• parent(x) :- hasChild(x,y)
• father(x) :- parent(x), male(x)
• hasAunt(z,y) :- hasSister(x,y), hasChild(x,z)

### Logic Programming – Syntax

• Facts denote assertions about the world:
• A rule without a body (no conditions)
• A ground atom
• Examples:
• hasChild(Jonny, Sue)
• male(Jonny).
• Queries ask about the world:
• Denoted as a rule without a head:
• ?- B1,...,Bn.
• Examples:
• ?- hasSister(Jonny,y), hasChild(Jonny,z) gives all the sisters and children of Jonny
• ?- hasAunt(Mary,y) gives all the aunts of Mary
• ?- father(Jonny) answers whether Jonny is a father

### Logic Programming – Semantics

• There are two main approaches to define the semantics of LP
1. Model-theoretic semantics
2. Computational semantics
• Model-theoretic semantics
• Defines the meaning of a program in terms of its minimal Herbrand model
• Computational semantics (proof-theoretic semantics)
• Defines the semantics in terms of an evaluation strategy which describes how to compute a model
• These two semantics are different in style, but agree on the minimal model
• LP semantics is equivalent to standard FOL semantics only
• Concerning ground entailment
• As long as LP is not extended with negation

### Logic Programming – Semantics

• Semantics of LP vs Semantics of FOL
• Semantics of LP is defined in terms of the minimal Herbrand model
• Only one minimal model
• Semantics of FOL is defined in terms of First-Order models
• Typically, infinitely many First-Order models
• The minimal Herbrand model is a First-Order model
• Every Herbrand model is a First-Order model
• There exist First-Order models which are not Herbrand models
• Example:
• p(a), p(x) → q(x)
• The Minimal Herbrand model: {p(a), q(a)}
• This is actually the only Herbrand model!
• First-Order models: {p(a), q(a)}, {p(a), q(a), p(b)}, etc.

### Logic Programming – Semantics

• Recall:
• Terms not containing any variables are ground terms
• Atoms not containing any variables are ground atoms
• The Herbrand Universe U is the set of all ground terms which can be formed from
• Constants in a program
• Function symbols in a program
• Example: a, b, c, f(a)
• The Herbrand Base B is the set of all ground atoms which can be built from
• Predicate symbols in a program
• Ground terms from U
• Example: p(a), q(b), q(f(a))

### Logic Programming – Semantics

• A Herbrand Interpretation I is a subset of the Herbrand Base B for a program

• The domain of a Herbrand interpretation is the Herbrand Universe U

• Constants are assigned to themselves

• Every function symbol is interpreted as the function that applies it

• If f is an n-ary function symbol (n>0) then the mapping from Un to U defined by (t1, …, tn) → f(t1, …, tn) is assigned to f

### Logic Programming – Semantics

• A Herbrand Model M is a Herbrand Interpretation which makes every formula true, so:
• Every fact from the program is in M
• For every rule in the program: If every positive literal in the body is in M, then the literal in the head is also in M
• The model of a Logic Program P is the minimal Herbrand Model
• This least Herbrand Model is the intersection of all Herbrand Models
• Every program has a Herbrand Model. Thus every program also has a minimal Herbrand Model.
• This model is uniquely defined only for programs without negation
⇒ A very intuitive and easy way to capture the semantics of LP
• As soon as negation is allowed a unique minimal model is not guaranteed anymore
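
The minimal Herbrand model of a program without negation can be computed bottom-up as the least fixpoint of the immediate-consequence operator. A minimal Python sketch (the atom encoding as tuples and the helper names are our own assumptions, not taken from any system mentioned here):

```python
# Naive bottom-up evaluation of a function-free, negation-free program.
# The least fixpoint of the immediate-consequence operator is exactly
# the minimal Herbrand model.

def is_var(t):
    """Variables are strings starting with '?' in this encoding."""
    return isinstance(t, str) and t.startswith("?")

def match(atom, fact, subst):
    """Try to extend the substitution so that atom matches the ground fact."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for a, f in zip(atom[1:], fact[1:]):
        if is_var(a):
            if subst.get(a, f) != f:
                return None
            subst[a] = f
        elif a != f:
            return None
    return subst

def minimal_model(facts, rules):
    """facts: set of ground atoms (tuples); rules: list of (head, [body atoms])."""
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # collect all substitutions that satisfy the whole body
            substs = [{}]
            for atom in body:
                substs = [s2 for s in substs for f in model
                          if (s2 := match(atom, f, s)) is not None]
            for s in substs:
                new = tuple(s.get(t, t) for t in head)
                if new not in model:
                    model.add(new)
                    changed = True
    return model

# The example from the slides: p(a), p(x) -> q(x)
facts = {("p", "a")}
rules = [(("q", "?x"), [("p", "?x")])]
print(sorted(minimal_model(facts, rules)))  # [('p', 'a'), ('q', 'a')]
```

The loop re-applies every rule until no new ground atom can be derived, which mirrors the fixpoint characterization above: the result {p(a), q(a)} is the minimal Herbrand model from the earlier example.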

### Logic Programming - Negation

• How do we handle negation in Logic Programs?
• Horn Logic only permits negation in limited form
• Consider (∀) ¬C1 ∨ … ∨ ¬Cn ∨ H
• Special solution: Negation-as-failure (NAF):
• Whenever a fact is not entailed by the knowledge base, its negation is entailed
• This is a form of “Default reasoning”
• This introduces non-monotonic behavior (previous conclusions might need to be revised during the inference process)
• NAF is not classical negation and pushes LP beyond classical First Order Logic
• This allows a form of negation in rules:
• (∀) C1 ∧ … ∧ Cm ∧ not Cm+1 ∧ … ∧ not Cn → H
• H :- B1, …, Bm, not Bm+1, …, not Bn

### Logic Programming - Negation

• Logic Programs can also contain recursion

• Example:
• ancestor (x,y) :- hasParent(x, y)
• ancestor(x,z) :- ancestor(x,y), ancestor(y,z).

• This is a problem as soon as negation is allowed since the minimal model is not uniquely defined anymore

• It is useful to consider this using a dependency graph
• A predicate is a node in the graph
• There is a directed edge between predicates q and p if they occur in a rule where q occurs in the head and p in the body.
• If the dependency graph contains a cycle then the program is recursive

### Logic Programming - Negation

• As soon as negation is allowed, cycles in a dependency graph become problematic.
• Example: What is the meaning of win(x) :- not win(x) ?

• A Solution: Stratification
• Mark edges that involve negation in the dependency graph
• Predicates connected only through positive edges can share a stratum; a negative edge separates predicates into different strata
• Strata can be (partially) ordered
• If each predicate occurs only in one stratum, then the program is called stratifiable
• Each stratum can be evaluated as usual and independently from other strata

→ This guarantees a unique interpretation of a Logic Program using negation

### Logic Programming - Subsets

• Classical Logic Programming
• Allows function symbols
• Does not allow negation
• Is Turing complete

• Full Logic Programming is not decidable
• Prolog programs are not guaranteed to terminate

• Several ways to guarantee the evaluation of a Logic Program
• One is to enforce syntactical restrictions
• This results in subsets of full logic programming
• Datalog is such a subset

### Logic Programming - Datalog

• Datalog is a syntactic subset of Prolog
• Originally a rule and query language for deductive databases

• Considers knowledge bases to have two parts
• Extensional Database (EDB) consists of facts

• Intensional Database (IDB) consists of non-ground rules

• Restrictions:

1. Datalog disallows function symbols

2. Imposes stratification restrictions on the use of recursion + negation

3. Allows only range-restricted variables (safe variables)

• Safe Variables:
• Only range-restricted variables are allowed, i.e. each variable in the head of a rule must also appear in a non-negated atom in the body of this rule
• This limits evaluation of variables to finitely many possible bindings
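
The safety condition can be checked rule by rule. A small hypothetical Python sketch (the rule encoding is our own assumption):

```python
# Range-restriction (safety) check: every variable appearing in the head
# must also appear in at least one non-negated body atom.

def is_safe(head_vars, body):
    """body: list of (variables_of_atom, positive?) pairs."""
    positive_vars = set()
    for vars_, positive in body:
        if positive:
            positive_vars.update(vars_)
    return set(head_vars) <= positive_vars

# father(x) :- parent(x), male(x)      -- safe: x occurs positively
print(is_safe(["x"], [(["x"], True), (["x"], True)]))        # True
# bad(x,y) :- parent(x), not male(y)   -- unsafe: y only appears negated
print(is_safe(["x", "y"], [(["x"], True), (["y"], False)]))  # False
```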

### Logic Programming - Reasoning Tasks

• Ground queries, e.g. ?- loves(Mary, Joe)
• Non-ground queries, e.g. ?- loves(Mary, x)

• Non-ground queries can be reduced to a series of ground queries
• ?- loves(Mary, x)
• Replace x by every possible value

• In Logic Programming ground queries are equivalent to entailment of facts
• Answering ?- loves(Mary, Joe) w.r.t. a knowledge base A is equivalent to checking

A ⊧ loves(Mary, Joe)

### Concrete Logic Programming System: IRIS

• Java based Datalog reasoner
• Developed at STI Innsbruck
• Freely available open source project
• Homepage: http://www.iris-reasoner.org/

• Extensions:
• Stratified / Well-founded default negation
• XML Schema data types
• Various built-in predicates (Equality, inequality, assignment, unification, comparison, type checking, arithmetic, regular expressions,... )

• Highly modular and includes different reasoning strategies
• Bottom-up evaluation with Magic Sets optimizations (forward-chaining)
• Top-down evaluation by SLDNF resolution (backward-chaining)

### Concrete Logic Programming System: IRIS

An example of a concrete combination of components within IRIS:
• Program Optimization: rewriting techniques from deductive DB research (e.g. Magic Sets rewriting)
• Safety Processing & Stratification: ensure specific syntactic restrictions
• Rule Re-ordering: minimize evaluation effort based on dependencies between expressions
• Rule Optimizations: join condition optimization, literal re-ordering
• Rule compilation
• Pre-indexing: creation of "views" on required parts of information

### Concrete Logic Programming System: IRIS

• Example program (also see online demo at http://www.iris-reasoner.org/demo):
man('homer').
woman('marge').
hasSon('homer','bart').
isMale(?x) :- man(?x).
isFemale(?x) :- woman(?x).
isMale(?y) :- hasSon(?x,?y).
• Query:
?-isMale(?x).
• Output:
Init time: 14ms
----------------------------------
Query: ?- isMale(?x). ==>> 2 rows in 1ms
Variables: ?x
('bart')
('homer')

### Concrete Logic Programming System: IRIS

• IRIS performs computations using semi-naive evaluation in combination with stratification and complex datatypes

### Theorem Proving Summary

• Resolution is a sound and complete inference procedure for First-Order Logic

• Due to its complexity and remaining limitations FOL is often not suitable for practical applications

• Often formalisms with restricted expressivity and better complexity results are more appropriate:

• Description Logics

• Logic Programming

### Description Logic Summary

• Description Logics are a syntactic fragment of First Order Logic, based on basic building blocks
• Concepts
• Roles
• Individuals

• Limited constructs for building complex concepts, roles, etc.
• Many different Description Logics exist, depending on choice of constructs

• Inference in Description Logics focuses on consistency checking and classification
• Usually reasoning tasks in DLs can all be reduced to satisfiability checking

• Efficient Tbox (schema) reasoning
• ABox reasoning (query answering) does not scale as well

### Logic Programming Summary

• Logic Programming (without negation) is syntactically equivalent to Horn subset of First Order Logic
• The semantics of a Logic Program are however based on its minimal Herbrand Model

• Logic Programming comes in various variants for different applications (as programming language, for knowledge representation)
• Full Logic Programming including function symbols is not decidable
• Datalog is a syntactic restriction of LP, with desirable computational properties
• Negation-as-failure introduces non-monotonic behavior and pushes LP beyond the expressivity of First Order Logic

• A typical inference task for LP engines is conjunctive query answering

### References and Further Information

• Schöning, U., Logic for Computer Scientists (2nd edition), 2008, Birkhäuser
• Chapter 1 & 2:Normal Forms, Resolution
• Chapter 3: Horn Logic, Logic Programming
• Baader, F. et al., The Description Logic Handbook, 2007, Cambridge University Press
• Chapter 2: Basic Description Logics
• Lloyd, J.W. , Foundations of logic programming, 1984, Springer
• Robinson, A. and Voronkov, A. Handbook of Automated Reasoning, Volume I, 2001, MIT Press
• Chapter 2: Resolution Theorem Proving
• Ullman, J. D., Principles of Database and Knowledge-Base Systems, Volume I, 1988, Computer Science Press
• Chapter 3: Logic as a Data Model (Logic Programming & Datalog)
• http://en.wikipedia.org/wiki/Logic_programming
• http://en.wikipedia.org/wiki/Description_logic
• http://en.wikipedia.org/wiki/Theorem_proving

### Outline

• Motivation
• Technical Solution
• Uninformed Search
• Depth-First Search
• Informed Search
• Best-First Search
• Hill Climbing
• A*
• Illustration by a Larger Example
• Extensions
• Summary

### Motivation

• One of the major goals of AI is to help humans in solving complex tasks
• How can I fill my container with pallets?
• Which is the shortest way from Milan to Innsbruck?
• Which is the fastest way from Milan to Innsbruck?
• How can I optimize the load of my freight to maximize my revenue?
• How can I solve my Sudoku game?
• What is the sequence of actions I should apply to win a game?
• Sometimes finding a solution is not enough, you want the optimal solution according to some “cost” criteria
• All the examples presented above involve looking for a plan
• A plan can be defined as the set of operations to be performed on an initial state, to reach a final state that is considered the goal state
• Thus we need efficient techniques to search for paths, or sequences of actions, that can enable us to reach the goal state, i.e. to find a plan
• Such techniques are commonly called Search Methods

### Examples of Problems: Towers of Hanoi

• 3 pegs: A, B, C
• 3 discs represented as natural numbers (1, 2, 3) which correspond to the size of the discs
• The three discs can be arbitrarily distributed over the three pegs, such that the following constraint holds: di is on top of dj → di < dj
• Initial state: ((123)()())
• Goal state: (()()(123))
• Operators: Move disc to peg
• Applying Move 1 to C (1 → C) to the initial state ((123)()()), a new state is reached: ((23)()(1))
• Cycles may appear in the solution!
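
The state representation and the "move disc to peg" operator can be sketched in Python; the tuple-of-tuples encoding of pegs (smallest disc first, i.e. on top) is an illustrative assumption:

```python
# States are triples of tuples of disc sizes; index 0 of a peg is its top disc.

def successors(state):
    """All legal states reachable by moving one top disc to another peg."""
    result = []
    for src in range(3):
        if not state[src]:
            continue
        disc = state[src][0]  # top disc of the source peg
        for dst in range(3):
            # a disc may only go onto an empty peg or a larger disc
            if dst != src and (not state[dst] or disc < state[dst][0]):
                new = list(state)
                new[src] = state[src][1:]
                new[dst] = (disc,) + state[dst]
                result.append(tuple(new))
    return result

initial = ((1, 2, 3), (), ())  # the ((123)()()) state from the slide
print(successors(initial))     # moving disc 1 to peg B or peg C
# [((2, 3), (1,), ()), ((2, 3), (), (1,))]
```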

### Examples of Problems: Blocksworld

• Objects: blocks
• Attributes (1-ary relations): cleartop(x), ontable(x)
• Relations: on(x,y)
• Operators: puttable(x), where x must be cleartop; put(x,y), where x and y must be cleartop
• Initial state: ontable(E), cleartop(E); ontable(A), cleartop(A); ontable(B), cleartop(B); ontable(C); on(D,C), cleartop(D)
• Applying the move put(E,A): on(E,A), cleartop(E); ontable(A); ontable(B), cleartop(B); ontable(C); on(D,C), cleartop(D)

### Search Space Representation

• Representing the search space is the first step to enable the problem resolution
• Search space is mostly represented through graphs
• A graph is a finite set of nodes that are connected by arcs
• A loop may exist in a graph, where an arc leads back to the original node
• In general, such a graph is not explicitly given; the search space is constructed during search

### Search Space Representation

• A graph is undirected if arcs do not imply a direction, directed otherwise
• A graph is connected if every pair of nodes is connected by a path
• A connected graph with no loops is called a tree
• A weighted graph is a graph in which a value is associated with each arc

### Example: Towers of Hanoi*

* A partial tree search space representation

### Example: Towers of Hanoi*

* A complete direct graph representation
[http://en.wikipedia.org/wiki/Tower_of_Hanoi]

### Search Methods

• A search method is defined by picking the order of node expansion
• Strategies are evaluated along the following dimensions:
• completeness: does it always find a solution if one exists?
• time complexity: number of nodes generated
• space complexity: maximum number of nodes in memory
• optimality: does it always find the shortest path solution?
• Time and space complexity are measured in terms of
• b: maximum branching factor of the search tree
• d: depth of the shortest path solution
• m: maximum depth of the state space (may be ∞)

### Search Methods

• Uninformed techniques
• Systematically search complete graph, unguided
• Also known as brute force, naïve, or blind
• Informed methods
• Use problem specific information to guide search in promising directions

### Uninformed Search

• A class of general purpose algorithms that operate in a brute force way
• The search space is explored without leveraging any information on the problem
• Also called blind search, or naïve search
• Since the methods are generic they are intrinsically inefficient

• E.g. Random Search
• This method selects randomly a new state from the current one
• If the goal state is reached, the search terminates
• Otherwise the method randomly selects another operator to move to the next state
• Prominent methods:
• Depth-First Search
• Uniform-Cost Search

### Depth-First Search

• Depth-First Search (DFS) begins at the root node and exhaustively searches each branch to its maximum depth until a solution is found
• The successor node is selected going in depth from right to left (w.r.t. the graph representing the search space)
• If the greatest depth is reached with no solution, we backtrack until we find an unexplored branch to follow
• DFS is not complete
• If cycles are present in the graph, DFS will follow these cycles indefinitely
• If there are no cycles, the algorithm is complete
• The effect of cycles can be limited by imposing a maximal depth of search (still the algorithm is incomplete)
• DFS is not optimal
• The first solution is found and not the shortest path to a solution
• The algorithm can be implemented with a Last In First Out (LIFO) stack or recursion

### Depth-First Search: Algorithm

List open, closed, successors = {};
Node root_node, current_node;
insert-first(root_node, open)

while not-empty(open)
    current_node = remove-first(open);
    insert-first(current_node, closed);
    if (goal(current_node)) return current_node;
    else
        successors = successorsOf(current_node);
        for (x in successors)
            if (not-in(x, closed)) insert-first(x, open);
    endIf
endWhile

N.B.= this version is not saving the path for simplicity
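
A runnable Python sketch of the pseudocode above; the example graph and goal node are illustrative assumptions, not taken from the slides:

```python
# Depth-first search over an adjacency-list graph; returns the visit order
# (as in the pseudocode, the path itself is not reconstructed).

def depth_first_search(graph, root, goal):
    open_list, closed = [root], []
    while open_list:
        current = open_list.pop(0)            # remove-first
        closed.append(current)                # insert-first (membership only)
        if current == goal:
            return closed
        # insert-first: prepend successors, so the search goes depth-first
        open_list = [x for x in graph.get(current, [])
                     if x not in closed and x not in open_list] + open_list
    return None

graph = {"S": ["A", "B"], "A": ["C"], "C": ["F"], "B": ["F"]}
print(depth_first_search(graph, "S", "F"))  # ['S', 'A', 'C', 'F']
```

Note how the whole A branch (A, then C) is explored before B is ever touched: prepending the successors is what makes the frontier behave like a LIFO stack.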

### Depth-First Search: Example

Result is: S->A->B->F

### Depth-First Search: Complexity

• Time Complexity
• assume (worst case) that there is 1 goal leaf at the RHS, so DFS will expand all nodes
• = 1 + b + b² + … + b^m = O(b^m)
• where m is the max depth of the tree
• Space Complexity
• how many nodes can be in the queue (worst case)?
• at each depth l < m we have b-1 nodes; at depth m we have b nodes
• total = (m-1)·(b-1) + b = O(b·m)

### Breadth-First Search

• Breadth-First Search (BFS) begins at the root node and explores all branches level by level
• BFS is complete
• If there is a solution, BFS will find it
• BFS is optimal
• The solution found is guaranteed to be the shortest path possible
• The algorithm can be implemented with a First In First Out (FIFO) queue

### Breadth-First Search: Algorithm

List open, closed, successors = {};
Node root_node, current_node;
insert-last(root_node, open)

while not-empty(open)
    current_node = remove-first(open);
    insert-last(current_node, closed);
    if (goal(current_node)) return current_node;
    else
        successors = successorsOf(current_node);
        for (x in successors)
            if (not-in(x, closed)) insert-last(x, open);
    endIf
endWhile

N.B.= this version is not saving the path for simplicity
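
A runnable Python sketch of the pseudocode above; the example graph and goal node are illustrative assumptions, not taken from the slides:

```python
# Breadth-first search over an adjacency-list graph; returns the visit order
# (as in the pseudocode, the path itself is not reconstructed).

def breadth_first_search(graph, root, goal):
    open_list, closed = [root], []
    while open_list:
        current = open_list.pop(0)            # remove-first
        closed.append(current)                # insert-last
        if current == goal:
            return closed
        # insert-last: append successors, so nodes are expanded level by level
        open_list += [x for x in graph.get(current, [])
                      if x not in closed and x not in open_list]
    return None

graph = {"S": ["A", "B"], "A": ["C"], "C": ["F"], "B": ["F"]}
print(breadth_first_search(graph, "S", "F"))  # ['S', 'A', 'B', 'C', 'F']
```

The only change versus the depth-first version is where successors are inserted: appending to the open list turns it into a FIFO queue, so both children of S (A and B) are visited before any grandchild.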

### Breadth-First Search: Example

Result is: S->A->F

### Breadth-First Search: Complexity

• Time complexity is of the same order of magnitude as DFS
• O(b^d)
• where d is the depth of the shallowest solution
• Space Complexity
• how many nodes can be in the queue (worst case)?
• assume (worst case) that the goal is the last node examined at depth d
• so BFS will store all nodes

= 1 + b + b² + … + b^d
= O(b^d)

### Further Uninformed Search Strategies

• Depth-limited search (DLS) : Impose a cut-off (e.g. n for searching a path of length n -1), expand nodes with max. depth first until cut-off depth is reached (LIFO strategy, since it is a variation of depth-first search).

• Bidirectional search (BIDI): forward search from initial state & backward search from goal state, stop when the two searches meet. Average effort O(b^(d/2)) if testing whether the search fronts intersect has constant effort

• In AI, the problem graph is typically not known. If the graph is known, to find all optimal paths in a graph with labelled arcs, standard graph algorithms can be used

### Informed Search

• Blind search methods take O(b^m) time in the worst case

• This may make blind search algorithms prohibitively slow when the maximum depth m is large

• How can we reduce the running time?
• Use problem-specific knowledge to pick which states are better candidates

### Informed Search

• Also called heuristic search

• In a heuristic search each state is assigned a “heuristic value” (h-value) that the search uses in selecting the “best” next step

• A heuristic is an operationally-effective nugget of information on how to direct search in a problem space

• Heuristics are only approximately correct

### Informed Search: Prominent methods

• Best-First Search

• A*

• Hill Climbing

### Cost and Cost Estimation

f(n)=g(n)+h(n)

• g(n) the cost (so far) to reach the node n

• h(n) estimated cost to get from the node to the goal

• f(n) estimated total cost of path through n to goal

### Informed Search: Best-First Search

• Special case of breadth-first search
• Uses h(n) = heuristic function as its evaluation function
• Ignores cost so far to get to that node (g(n))
• Expand the node that appears closest to goal

• Best First Search is complete
• Best First Search is not optimal
• A solution can be found along a longer path: the node with the lowest estimate h(n) need not lie on the path with the lowest actual cost g(n)

• Special cases:
• uniform cost search: f(n) = g(n) = path cost to n
• A* search

### Best-First Search: Algorithm

List open, closed, successors = {};
Node root_node, current_node;
insert-last(root_node, open)

while not-empty(open)
    current_node = remove-first(open);
    insert-last(current_node, closed);
    if (goal(current_node)) return current_node;
    else
        successors = estimationOrderedSuccessorsOf(current_node);
        for (x in successors)
            if (not-in(x, closed)) insert-last(x, open);
    endIf
endWhile

estimationOrderedSuccessorsOf
returns the list of direct descendants of the current node in shortest cost order

N.B.= this version is not saving the path for simplicity
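
A runnable Python sketch of greedy best-first search; here the frontier is kept as a priority queue ordered by h(n) alone, and the graph, heuristic values and goal are illustrative assumptions:

```python
import heapq

def best_first_search(graph, h, root, goal):
    """Always expand the node with the smallest heuristic estimate h(n)."""
    frontier = [(h[root], root, [root])]   # (estimate, node, path so far)
    closed = set()
    while frontier:
        _, current, path = heapq.heappop(frontier)
        if current == goal:
            return path
        if current in closed:
            continue
        closed.add(current)
        for succ in graph.get(current, []):
            if succ not in closed:
                heapq.heappush(frontier, (h[succ], succ, path + [succ]))
    return None

graph = {"S": ["A", "B"], "A": ["F"], "B": ["F"]}
h = {"S": 3, "A": 1, "B": 2, "F": 0}   # A looks closer to the goal than B
print(best_first_search(graph, h, "S", "F"))  # ['S', 'A', 'F']
```

With these values the search commits to A because h(A) < h(B), reproducing the behavior discussed above: the route through A is preferred even when, under real edge costs, the route through B could be cheaper.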

### Best-First Search: Example

In this case we estimate the cost as the distance from the root node (in term of nodes)

### Best-First Search: Example

Result is: S->A->F!
If we consider real costs, optimal solution is: S->B->F

### A*

• Derived from Best-First Search

• Uses both g(n) and h(n)

• A* is optimal (provided h(n) is admissible, i.e. it never overestimates the actual cost to the goal)

• A* is complete

### A* : Algorithm

List open, closed, successors = {};
Node root_node, current_node, goal;
insert-back(root_node, open)

while not-empty(open)
    current_node = remove-front(open);
    insert-back(current_node, closed);
    if (current_node == goal) return current_node;
    else
        successors = totalEstOrderedSuccessorsOf(current_node);
        for (x in successors)
            if (not-in(x, closed)) insert-back(x, open);
    endIf
endWhile

totalEstOrderedSuccessorsOf
returns the list of direct descendants of the current node in shortest total estimation order

N.B.= this version is not saving the path for simplicity
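
A runnable Python sketch of A*, ordering the frontier by f(n) = g(n) + h(n) as defined earlier; the weighted graph, heuristic and goal are illustrative assumptions:

```python
import heapq

def a_star(graph, h, root, goal):
    """Expand by f(n) = g(n) + h(n); optimal when h is admissible."""
    frontier = [(h[root], 0, root, [root])]   # (f, g, node, path)
    best_g = {}
    while frontier:
        f, g, current, path = heapq.heappop(frontier)
        if current == goal:
            return path, g
        if best_g.get(current, float("inf")) <= g:
            continue                           # already reached more cheaply
        best_g[current] = g
        for succ, cost in graph.get(current, []):
            g2 = g + cost
            heapq.heappush(frontier, (g2 + h[succ], g2, succ, path + [succ]))
    return None

# edge costs make the B branch cheaper overall, as in the slide's example
graph = {"S": [("A", 1), ("B", 2)], "A": [("F", 4)], "B": [("F", 1)]}
h = {"S": 3, "A": 1, "B": 1, "F": 0}           # admissible estimates
print(a_star(graph, h, "S", "F"))  # (['S', 'B', 'F'], 3)
```

Unlike the greedy version, the actual cost g(n) is carried along, so the cheaper route through B (total cost 3) wins over the heuristically tempting route through A (total cost 5).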

### A* : Example

In this case we estimate the cost as the distance from the root node (in term of nodes)

### A* : Example

Result is: S->B->F!

### Hill Climbing

• Special case of depth-first search
• Uses h(n) = heuristic function as its evaluation function
• Ignores cost so far to get to that node (g(n))
• Expand the node that appears closest to goal

• Hill Climbing is not complete
• Unless we introduce backtracking
• Hill Climbing is not optimal
• Solution found is a local optimum

### Hill Climbing: Algorithm

List successors = {};
Node root_node, current_node, nextNode;
current_node = root_node;

while (current_node != null)
    if (goal(current_node)) return current_node;
    else
        successors = successorsOf(current_node);
        nextEval = -∞; nextNode = null;
        for (x in successors)
            if (eval(x) > nextEval)
                nextEval = eval(x);
                nextNode = x;
        current_node = nextNode;
    endIf
endWhile

N.B.= this version is not saving the path for simplicity
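
A runnable Python sketch of the loop above, where eval plays the role of the heuristic (higher means closer to the goal); the graph and values are illustrative assumptions:

```python
# Hill climbing: greedily move to the best-valued successor, no backtracking.

def hill_climbing(graph, value, root, goal):
    current, visited = root, [root]
    while current is not None:
        if current == goal:
            return visited
        successors = graph.get(current, [])
        # pick the successor with the highest evaluation; None if there is none
        current = max(successors, key=lambda n: value[n], default=None)
        if current is not None:
            visited.append(current)
    return None  # stuck before reaching the goal (e.g. a local optimum)

graph = {"S": ["A", "B"], "A": ["B"], "B": ["F"]}
value = {"S": 0, "A": 2, "B": 1, "F": 3}
print(hill_climbing(graph, value, "S", "F"))  # ['S', 'A', 'B', 'F']
```

Because the algorithm keeps only the current node, its space use is O(b), but a misleading evaluation function can lead it into a dead end from which it cannot recover, which is exactly the incompleteness discussed above.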

### Hill Climbing: Example

In this case we estimate the cost as the distance from the root node (in term of nodes)

### Hill Climbing: Example

Result is: S->A->B->F!

Not optimal; moreover, if at step 1 we had h(S)=2, the search would have terminated without finding a solution

### Informed Search Algorithm Comparison

| Algorithm | Time | Space | Optimal | Complete | Derivative |
|-----------|------|-------|---------|----------|------------|
| Best First Search | O(b^m) | O(b^m) | No | Yes | BFS |
| Hill Climbing | O(∞) | O(b) | No | No | DFS |
| A* | O(2^N) | O(b^d) | Yes | Yes | Best First Search |

b , branching factor
d , tree depth of the solution
m , maximum tree depth

### Route Search

• Start point: Milan
• End point: Innsbruck
• Search space: Cities
• Nodes: Cities
• Arcs: Roads
• Let's find a possible route!

### Graph Representation

• We start from the root node, and pick the leaves
• The same applies to each leaf
• But we do not reconsider already used arcs
• The first node picked is the first node on the right

### Depth-First Search

N.B.: by building the tree, we are exploring the search space!


### Depth-First Search vs Breadth-First search

• Distance
• DFS: 464 km
• BFS: 358 km
• Q1: Can we use an algorithm to optimize according to distance?
• Time
• DFS: 4 hours 37 mins
• BFS: 5 hours 18 mins
• Q2: Can we use an algorithm to optimize according to time?
• Search space:
• DFS: 5 expansions
• BFS: 26 expansions
• Not very relevant… it depends a lot on how you pick the order of node expansion; nevertheless BFS is usually more expensive
• To solve Q1 and Q2 we can apply, for example, Best-First Search
• Q1: the heuristic may be the air distance between cities
• Q2: the heuristic may be the air distance between cities divided by an average speed (e.g. 90 km/h)

### Best-First search

N.B.: by building the tree, we are exploring the search space!

### Variants to presented algorithms

• Combine Depth First Search and Breadth First Search, by performing Depth Limited Search with increasing depths until a goal is found
• Enrich Hill Climbing with random restarts to mitigate the local maximum and foothill problems
• Stochastic Beam Search: select w nodes randomly; nodes with higher values have a higher probability of selection
• Genetic Algorithms: generate nodes like in stochastic beam search, but from two parents rather than from one

### Summary

• Uninformed Search
• If the branching factor is small, BFS is the best solution
• If the tree is deep, IDS (Iterative Deepening Search) is a good choice
• Informed Search
• Heuristic function selection determines the efficiency of the algorithm
• If the actual cost is very expensive to compute, then Best First Search is a good solution
• Hill climbing tends to get stuck in locally optimal solutions

### References

• Chap. 4: Görz et. al. (eds.): Handbuch der Künstlichen Intelligenz , 2000
• Chap. 3: Russell, Stuart J.; Norvig, Peter: Artificial Intelligence: A Modern Approach (2nd ed.), 2003
• Chap.2-3: M. Tim Jones: Artificial Intelligence: A Systems Approach
• http://en.wikipedia.org/wiki/Depth-first_search
• http://en.wikipedia.org/wiki/Best-first_search
• http://en.wikipedia.org/wiki/A*_search_algorithm
• http://en.wikipedia.org/wiki/Hill_climbing

### Agenda

1. Motivation
2. Technical solution, illustrations and extensions
   1. Knowledge model components
   2. Template knowledge models
   3. Knowledge model construction
   4. Knowledge elicitation techniques
3. Example
4. Summary
5. References
All slides are based on the book: Guus Schreiber, Hans Akkermans, Anjo Anjewierden, Robert de Hoog, Nigel Shadbolt, Walter Van de Velde and Bob Wielinga. Knowledge Engineering and Management: The CommonKADS Methodology , MIT Press, ISBN 0262193000. 2000.

### Knowledge engineering

• Process of
• eliciting,
• structuring,
• formalizing,
• operationalizing
• information and knowledge involved in a knowledge-intensive problem domain,
• in order to construct a program that can perform a difficult task adequately

### Knowledge engineering problems

• Complex information and knowledge is difficult to observe
• Experts and other sources differ
• Multiple representations:
• textbooks
• graphical representations
• skills

### Importance of proper knowledge engineering

• Knowledge is valuable and often outlives a particular implementation
• knowledge management

• Errors in a knowledge-base can cause serious problems

• Heavy demands on extendibility and maintenance
• changes over time

### TECHNICAL SOLUTION AND ILLUSTRATIONS

• CommonKADS: a comprehensive methodology for KBS development

• Knowledge engineering is not some kind of “mining from the expert's head”, but consists of constructing different aspect models of human knowledge

• The knowledge-level principle: in knowledge modeling, first concentrate on the conceptual structure of knowledge, and leave the programming details for later

• Knowledge has a stable internal structure that is analyzable by distinguishing specific knowledge types and roles.

• Domain
• some area of interest
• banking, food industry, photocopiers, car manufacturing
• Task
• something that needs to be done by an agent
• monitor a process; create a plan; analyze deviant behavior
• Agent
• the executor of a task in a domain
• typically either a human or some software system
• Application
• the context provided by the combination of a task and a domain in which this task is carried out by agents
• Application domain
• the particular area of interest involved in an application
• Application task
• the (top-level) task that needs to be performed in a certain application

• knowledge system (KS)
• system that solves a real-life problem using knowledge about the application domain and the application task

• expert system
• knowledge system that solves a problem which requires a considerable amount of expertise, when solved by humans.

### Model Set Overview (1)

• Organization model
• supports analysis of an organization,
• Goal: discover problems, opportunities and possible impacts of KBS (knowledge-based system) development.
• describes tasks that are performed or will be performed in the organizational environment
• Agent model
• describes capabilities, norms, preferences and permissions of agents (agent = executor of task).

### Model Set Overview (2)

• Knowledge model
• gives an implementation-independent description of knowledge involved in a task.
• Communication model
• models the communicative transactions between agents.
• Design model
• describes the structure of the system that needs to be constructed.

### Models exist in various forms

• Model template
• predefined, fixed structure, can be configured
• Model instance
• objects manipulated during a project.
• Model versions
• versions of a model instance can exist.
• Multiple model instances
• separate instances can be developed
• example: ''current'' and ''future'' organization

### The Product

• Instantiated models
• represent the important aspects of the environment and the delivered knowledge based system.
• information not represented in the filled model templates (e.g. project management information)
• Software

### Roles in knowledge-system development

• knowledge provider
• knowledge engineer/analyst
• knowledge system developer
• knowledge user
• project manager
• knowledge manager

many-to-many relations between roles and people

### Knowledge provider/specialist

• person with extensive experience in an application domain
• can provide also plan for domain familiarization
• “where would you advise a beginner to start?”
• inter-provider differences are common
• need to assure cooperation

### Knowledge engineer

• specific kind of system analyst

• should avoid becoming an "expert"

• plays a liaison function between application domain and system

### Knowledge-system developer

• person that implements a knowledge system on a particular target platform

• needs to have general design/implementation expertise

• needs to understand knowledge analysis
• but only on the “use”-level

### Knowledge user

• interact with the prospective system or are affected indirectly by the system
• Level of skill/knowledge is important factor
• May need extensive interacting facilities
• explanation
• His/her work is often affected by the system
• consider attitude / active role

### Project manager

• responsible for planning, scheduling and monitoring development work

• liaises with client

• typically medium-size projects (4-6 people)

• profits from structured approach

### Knowledge manager

• background role
• monitors organizational purpose of
• system(s) developed in a project
• knowledge assets developed/refined
• initiates (follow-up) projects
• should play key role in reuse
• may help in setting up the right project team

### Knowledge model

• specialized tool for specification of knowledge-intensive tasks
• abstracts from communication aspects
• real-world oriented
• reuse is central theme

### Knowledge categories

• Domain knowledge
• relevant domain knowledge and information
• static
• Inference knowledge
• basic reasoning steps that can be made in the domain knowledge and are applied by tasks
• Task knowledge
• goal-oriented
• functional decomposition

### Knowledge Categories: domain knowledge

• domain schema
• schematic description of knowledge and information types
• defined through domain constructs
• knowledge base
• set of knowledge instances

### Constructs for domain schema

• Concept
• cf. object class (without operations)
• Relation
• cf. association
• Attribute
• primitive value
• Rule type
• introduces expressions

### Example: car concepts

Two car concepts, `gas-dial` (with attribute `value`) and `fuel-tank` (with attribute `status: {full, almost-empty, empty}`), in CommonKADS notation:

```
CONCEPT gas-dial;
   ATTRIBUTES:
      value: dial-value;
END CONCEPT gas-dial;

VALUE-TYPE dial-value;
   VALUE-LIST: {zero, low, normal};
   TYPE: ORDINAL;
END VALUE-TYPE dial-value;

CONCEPT fuel-tank;
   ATTRIBUTES:
      status: {full, almost-empty, empty};
END CONCEPT fuel-tank;
```

### Modelling rules

• knowledge analysis is focused on finding rules with a common structure
• a rule as an instance of a rule type
• models a relation between expressions about feature values (e.g. attribute values)
• gas-dial.value = zero -> fuel-tank.status = empty
• models set of real-world “rules” with a similar structure

### Example rule type

person.income <= 10,000
RESTRICTS
loan.amount <= 2,000

person.income > 10,000 AND person.income <= 20,000
RESTRICTS
loan.amount <= 3,000

### Rule type structure

• <antecedent> <connection-symbol> <consequent>
• example rule:
• fuel-supply.status = blocked
CAUSES
gas-in-engine.status = false;
• flexible use for almost any type of dependency
• multiple types for antecedent and consequent
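A rule type of this shape can be represented as plain data. The sketch below uses the slides' car example values, but the `Rule` class and `applies` helper are assumptions for illustration, not CommonKADS machinery:

```python
from dataclasses import dataclass

# Rule-type sketch: <antecedent> <connection-symbol> <consequent>,
# where antecedent/consequent are expressions over attribute values.
@dataclass
class Rule:
    antecedent: tuple      # (concept, attribute, value)
    connection: str        # e.g. "CAUSES", "RESTRICTS"
    consequent: tuple

state_dependency = Rule(
    antecedent=("fuel-supply", "status", "blocked"),
    connection="CAUSES",
    consequent=("gas-in-engine", "status", False),
)

def applies(rule, facts):
    """A rule fires when its antecedent expression matches the known facts."""
    return facts.get(rule.antecedent[:2]) == rule.antecedent[2]

facts = {("fuel-supply", "status"): "blocked"}
if applies(state_dependency, facts):
    concept, attribute, value = state_dependency.consequent
    facts[(concept, attribute)] = value
print(facts[("gas-in-engine", "status")])  # False
```

Because the connection symbol is just data, the same structure accommodates `CAUSES`, `RESTRICTS`, or any other dependency type.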

### Inference knowledge

• describes the lowest level of functional decomposition
• basic information-processing units:
• inference ⇒ reasoning
• transfer function ⇒ communication with other agents
• why special status?
• indirectly related to domain knowledge
• enables reuse of inference

### Example inference: cover

fuel tank is empty leads to lack of gas in engine
if there is no gas in the engine, then the car does not start

### Task knowledge

• describes goals
• assess a mortgage application in order to minimize the risk of losing money
• find the cause of a malfunction of a photocopier in order to restore service.
• design an elevator for a new building.
• describes strategies that can be employed for realizing goals.
• typically described in a hierarchical fashion

• Description of the input/output

• The main distinction from traditional functions is that the data manipulated by the task are (also) described in a domain-independent way.
• For example, the output of a medical diagnosis task would not be a “disease” but an abstract name such as “fault category”.

### Lessons

• Knowledge models partially reused in new applications

• Type of task = main guide for reuse

### The need for reuse

• prevent "re-inventing the wheel"

• cost/time efficient

• decreases complexity

• quality-assurance

### Task template

• reusable combination of model elements
• (provisional) inference structure
• typical control structure
• typical domain schema from task point-of-view
• specific for a task type
• supports top-down knowledge modeling

### Analytic vs. synthetic tasks

• analytic tasks: the system pre-exists
• it is typically not completely "known"
• input: some data about the system
• output: some characterization of the system
• synthetic tasks: the system does not yet exist
• input: requirements about the system to be constructed
• output: constructed system description

### Structure of template description in catalog

• General characterization
• typical features of a task
• Default method
• roles, sub-functions, control structure, inference structure
• Typical variations
• frequently occurring refinements/changes
• Typical domain-knowledge schema
• assumptions about underlying domain-knowledge structure

### Classification

• establish correct class for an object
• object should be available for inspection
• "natural" objects
• examples: rock classification, apple classification
• terminology: object, class, attribute, feature
• one of the simplest analytic tasks; many methods
• other analytic tasks, especially diagnosis, are sometimes reduced to a classification problem

### Assessment

• find decision category for a case based on domain-specific norms.
• typical domains: financial applications (loan application), community service
• terminology: case, decision, norms
• some similarities with monitoring
• differences:
• timing: assessment is more static
• different output: decision versus discrepancy

### Diagnosis

• find fault that causes system to malfunction
• example: diagnosis of a copier
• terminology:
• complaint/symptom, hypothesis, differential, finding(s)/evidence, fault
• nature of fault varies
• state, chain, component
• should have some model of system behavior
• default method: simple causal model
• sometimes reduced to classification task
• direct associations between symptoms and faults
• automation feasible in technical domains
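When diagnosis is reduced to classification via direct symptom-fault associations, the core becomes a simple lookup and intersection. The association table below is invented for illustration:

```python
# Diagnosis-as-classification sketch: direct associations between
# symptoms and candidate faults (illustrative table, not from the slides).
symptom_to_faults = {
    "does-not-start": ["empty fuel tank", "broken starter", "blocked fuel supply"],
    "overheats": ["low coolant", "broken fan"],
}

def classify(symptoms):
    """Intersect the candidate-fault sets of all observed symptoms."""
    candidates = None
    for s in symptoms:
        faults = set(symptom_to_faults.get(s, []))
        candidates = faults if candidates is None else candidates & faults
    return candidates or set()

print(classify(["does-not-start"]))
```

This flat mapping is exactly what the reduction gives up: unlike the causal-model method, it cannot explain *why* a fault produces a symptom.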

### Monitoring

• analyze ongoing process to find out whether it behaves according to expectations
• terminology:
• parameter, norm, discrepancy, historical data
• main features:
• dynamic nature of the system
• output "just" discrepancy => no explanation
• often: coupling monitoring and diagnosis
• output monitoring is input diagnosis

### Prediction

• analytic task with some synthetic features
• analyses current system behavior to construct description of a system state at future point in time.
• example: weather forecasting
• also found in knowledge-intensive modules of teaching systems e.g. for physics.
• inverse: retrodiction: big-bang theory

### Synthesis

• Given a set of requirements, construct a system description that fulfills these requirements

### “Ideal” synthesis method

• Operationalize requirements
• preferences and constraints
• Generate all possible system structures
• Select sub-set of valid system structures
• obey constraints
• Order valid system structures
• based on preferences

### Design

• system to be constructed is physical artifact
• example: design of a car
• can include creative design of components
• creative design is too hard a nut to crack for current knowledge technology
• sub-type of design which excludes creative design => configuration design

### Configuration design

• given predefined components, find assembly that satisfies requirements + obeys constraints
• example: configuration of an elevator; or PC
• terminology: component, parameter, constraint, preference, requirement (hard & soft)
• form of design that is well suited for automation
• computationally demanding

### Assignment

• create mapping between two sets of objects
• allocation of offices to employees
• allocation of airplanes to gates
• mapping has to satisfy requirements and be consistent with constraints
• terminology
• subject, resource, allocation
• can be seen as a “degenerate” form of configuration design

### Planning

• shares many features with design
• main difference: "system" consists of activities plus time dependencies
• examples: travel planning; planning of building activities
• automation only feasible, if the basic plan elements are predefined
• consider use of the general synthesis method (e.g. therapy planning) or the configuration-design method

### Scheduling

• Given a set of predefined jobs, each of which consists of temporally sequenced activities called units, assign all the units to resources at time slots
• production scheduling in plant floors
• Terminology: job, unit, resource, schedule
• Often done after planning (= specification of jobs)
• Take care: use of terms “planning” and “scheduling” differs

### In applications: typical task combinations

• monitoring + diagnosis
• Production process
• monitoring + assessment
• diagnosis + planning
• Troubleshooting devices
• classification + planning
• Military applications

### Process & Product

• so far: focus on knowledge model as product
• bottleneck for inexperienced knowledge modelers
• how to undertake the process of model construction.
• solution: process model
• as prescriptive as possible
• process elements: stage, activity, guideline, technique
• but: modeling is constructive activity
• no single correct solution nor an optimal path
• support through a number of guidelines that have proven to work well in practice.
• knowledge modeling is specialized form of requirements specification
• general software engineering principles apply

### Stages in Knowledge-Model Construction

| Stage | Typical activities |
|---|---|
| Knowledge identification | Domain familiarization (information sources, glossary, scenarios); list potential model components for reuse (task- and domain-related components) |
| Knowledge specification | Choose task template (provides initial task decomposition); construct initial domain conceptualization (main domain information types); complete knowledge-model specification (knowledge model with partial knowledge bases) |
| Knowledge refinement | Validate knowledge model (paper simulation, prototype of reasoning system); knowledge-base refinement (complete the knowledge bases) |

### Stage 1: Knowledge identification

• goal
• survey the knowledge items
• prepare them for specification
• input
• main knowledge items identified.
• e.g. assessment, configuration, or a combination of task types
• activities
• explore and structure the information sources
• study the nature of the task in more detail

### Exploring information sources

• Factors
• Nature of the sources
• well-understood?, theoretical basis?
• Diversity of the sources
• no single information source (e.g. textbook or manual)
• diverse sources may be conflicting
• multiple experts is a risk factor.
• Techniques
• text marking in key information sources
• some structured interviews to clarify perceived holes in domain
• main problem:
• find a balance: learn about the domain without becoming a full expert yourself

### Guidelines

• Talk to people in the organization who have to talk to experts but are not experts themselves

• Avoid diving into detailed, complicated theories unless the usefulness is proven

• Construct a few typical scenarios which you understand at a global level

• Never spend too much time on this activity. Two person weeks should be maximum.

### Results exploration

• Tangible
• Listing of domain knowledge sources, including a short characterization.
• Summaries of selected key texts.
• Glossary/lexicon
• Description of scenarios developed.
• Intangible
• your own understanding of the domain
• most important result

### List potential components

• goal: pave the way for reusing components
• two angles on reuse:
• Task dimension
• build a list of task templates
• Domain dimension
• type of the domain, e.g. technical domain
• look for standardized descriptions

examples: AAT for art objects; ontology libraries; reference models; product model libraries

### Stage 2: Knowledge specification

• goal: complete specification of knowledge except for contents of domain models
• domain models need only to contain example instances
• activities
• Construct an initial domain conceptualization.
• Specify the three knowledge categories

• baseline: strong preference for a knowledge model based on an existing application.
• efficient, quality assurance
• selection criteria: features of application task
• nature of the output: fault category, plan
• nature of the inputs: kind of data available
• nature of the system: artifact, biological system
• constraints posed by the task environment:
• required certainty, costs of observations.

### Guidelines for template selection

• prefer templates that have been used more than once
• empirical evidence

• construct annotated inference structure (and domain schema)

• if no template fits: question the knowledge-intensity of the task

### Guidelines

• use existing data models as much as possible:
• useful to use at least the same terminology for basic constructs
• makes future cooperation/exchange easier
• limit use of the knowledge-modeling language to concepts, sub-types and relations
• concentrate on "data"
• similar to building initial class model
• If no existing data models can be found, use standard SE techniques for finding concepts and relations
• use “pruning” method
• Constructing the initial domain conceptualization should be done in parallel with the choice of the task template
• otherwise: fake it

### Complete model specification

• Route 1: Middle-out
• Preferred approach
• Precondition: task template provides good approximation of inference structure.
• Route 2: Middle-in
• Start in parallel with task decomposition and domain modeling
• More time-consuming
• Needed if task template is too coarse-grained

### Guidelines

• an inference structure is detailed enough if the explanation it provides is sufficiently detailed
• an inference structure is detailed enough if, for each inference, it is easy to find a single type of domain knowledge that can act as a static role for that inference

### Guidelines for specifying task knowledge

• begin with the control structure
• "heart" of the method
• neglect details of working memory
• design issue
• choose role names that clearly indicate role
• "modeling is naming"
• do not include static knowledge roles
• real-time applications: consider using a different representation than pseudo code

### Guidelines for specifying domain knowledge

• domain-knowledge type used as static role not required to have exactly the “right’” representation
• design issue;
• key point: knowledge is available.
• scope of domain knowledge is typically broader than what is covered by inferences
• requirements of communication, explanation

### Stage 3: Knowledge Refinement

• Validate knowledge model
• Fill contents of knowledge bases

### Fill contents of knowledge bases

• schema contains two kinds of domain types:
• information types that have instances that are part of a case
• knowledge types that have instances that are part of a domain model
• goal of this task: find (all) instances of the latter type
• case instances are only needed for a scenario

### Guidelines for filling contents

• filling acts as a validation test of the schema
• usually not possible to define full, correct knowledge base in the first cycle
• knowledge bases need to be maintained
• knowledge changes over time
• techniques:
• incorporate editing facilities for KB updating, trace transcripts, structured interview, automated learning, map from existing knowledge bases

### Validate knowledge model

• internally and externally
• verification = internal validation
• “is the model right?”
• validation = validation against user requirements
• "is it the right model?"

### Validation techniques

• Internal
• structured walk-throughs
• software tools for checking the syntax and finding missing parts
• External
• usually more difficult and/or more comprehensive.
• main technique: simulation
• paper-based simulation
• prototype system

### Paper-based simulation

| Dialogue | Knowledge-model element | Comment |
|---|---|---|
| The user says: “the car does not start” | DIAGNOSIS: complaint: engine-behavior.status = does-not-start | A complaint is received, for which a diagnostic task is started |
| A possible cause is that the fuel tank is empty | COVER: hypothesis: fuel-tank.status = empty | One of the three possible causes is produced |
| In that case we would expect the gas indicator to be low | PREDICT: expected-finding: gas-dial.value = zero | The expected finding provides us with a way of getting supporting evidence for the hypothesis |

### Maintenance

• model development is a cyclic process
• models act as information repositories
• continuously updated
• but: makes requirements for support tools stronger
• transformation tools

### Domain Documentation Document

• Knowledge model specification
• list of all information sources used.
• list of model components that we considered for reuse.
• scenarios for solving the application problem.
• results of the simulations undertaken during validation
• Elicitation material (appendices)

### Summary process

• Knowledge identification
• familiarization with the application domain
• Knowledge specification
• detailed knowledge analysis
• supported by reference models
• Knowledge refinement
• completing the knowledge model
• validating the knowledge model
• Feedback loops may be required
• simulation in third stage may lead to changes in specification
• Knowledge bases may require looking for additional knowledge sources.
• general rule: feedback loops occur less frequently, if the application problem is well-understood and similar problems have been tackled

### Elicitation of expertise

• Time-consuming
• Multiple forms
• e.g. theoretical, how-to-do-it
• Multiple experts
• Heuristic nature
• distinguish empirical from heuristic
• Managing elicitation efficiently
• knowledge about when to use particular techniques

### Expert types

• Academic
• Regards domain as having a logical structure
• Talks a lot
• Emphasis on generalizations and laws
• Feels a need to present a consistent “story”: teacher
• Often remote from day-to-day problem solving
• Practitioner
• Heavily into day-to-day problem solving
• Implicit understanding of the domain
• Emphasis on practical problems and constraints
• Many heuristics

### Human limitations and biases

• Limited memory capacity
• Context may be required for knowledge recollection
• Prior probabilities are typically under-valued
• Limited deduction capabilities

### Elicitation techniques

• Interview
• Self report / protocol analysis
• Concept sorting
• Repertory grids

### Interview: Session preparation

• Establish goal of the session
• Consider added value for expert
• Describe for yourself a profile of the expert
• List relevant questions
• Write down opening and closing statement
• Check recording equipment
• audio recording is usually sufficient
• Make sure expert is aware of session context: goal, duration, follow-up, et cetera

### Interview: Start of the session

• Introduce yourself (if required)
• Clarify goal and expectations
• Indicate how the results will be used
• Ask permission for tape recording
• Privacy issues
• Check whether the expert has some questions left
• Create as much as possible a mutual trust

### Interview: During the session

• Avoid suggestive questions
• Clarify reason of question
• Phrase questions in terms of probes
• e.g. “why …?”
• Pay attention to non-verbal aspects
• Be aware of personal biases
• Give summaries at intermediate points

### Interview: End of the session

• Restate goal of the session
• Indicate what will be the next steps
• Make appointments for the next meetings
• Process interview results ASAP.
• Organize feedback round with expert
• Distribute session results

### Unstructured interview

• No detailed agenda
• Few constraints
• Delivers diverse, incomplete data
• Used in early stages: feasibility study, knowledge identification
• Useful to establish a common basis with expert
• s/he can talk freely

### Structured interview

• Knowledge engineer plans and directs the session
• Takes form of provider-elicitor dialogue
• Delivers more focused expertise data
• Often used for “filling in the gaps” in the knowledge base
• knowledge refinement phase
• Also useful at end of knowledge identification or start of knowledge specification
• Always create a transcript

### Interview structure for domain-knowledge elicitation

• select a relatively small task, e.g. an inference
• Take each rule, and ask when it is useful and when not
• Use fixed set of probes:
• “Why would you do that?”
• “How would you do that?”
• “When would you do that?”
• “What alternatives are there for this action?”
• “What if …?”
• “Can you tell me more about ..?”

### Interview pitfalls

• Experts can only produce what they can verbalize
• Experts seek to justify actions in any way they can
• “spurious justification”
• Therefore: supplement with techniques that observe expertise “in action”
• e.g. self report

### Self report

• Expert performs a task while providing a running commentary
• expert is “thinking aloud”
• Session protocol is always transcribed
• input for protocol analysis
• Variations:
• shadowing: one expert performs, a second expert gives a running commentary
• retrospection: provide a commentary after the problem-solving session
• Theoretical basis: cognitive psychology

### Requirements for self-report session

• Knowledge engineer must be sufficiently acquainted with the domain
• only a few problems can be tackled
• selection typically guided by available scenarios and templates
• Expert should not feel embarrassed
• consider need for training session

### Analyzing the self-report protocol

• Use a reference model as a coding scheme for text fragments
• Look out for “when”-knowledge
• Annotations and mark-ups can be used for domain-knowledge acquisition
• Consider need for tool support

### Self report guidelines and pitfalls

• Present problems in a realistic way
• Transcribe sessions as soon as possible
• Avoid long sessions (maximum = 20 minutes)
• Presence of knowledge engineer is important
• Be aware of scope limitations
• Verbalization may hamper performance
• Knowledge engineer may lack background knowledge to notice distinctions

### Use of self reports

• Knowledge specification stage
• Validation of the selection of a particular reference model
• Refining / customizing a task template for a specific application
• If no adequate task template model is available: use for bottom-up reasoning model construction
• but: time-consuming

### Laddering

• Organizing entities in a hierarchy
• Hierarchies are meant as pre-formal structures
• Nodes can be of any type
• class, process, relation, ….
• Useful for the initial phases of domain-knowledge structuring
• in particular knowledge identification
• Can be done by expert
• tool support

### Concept sorting

• Technique:
• present expert with shuffled set of cards with concept names
• expert is asked to sort cards in piles
• Helps to find relations among a set of concepts
• Useful in case of subtle dependencies
• Simple to apply
• Complementary to repertory grids
• concept sort: nominal categories
• repertory grid: ordinal categories

### Repertory grid

• Based on personal construct theory (Kelly, 1955)
• Subject: discriminate between triads of concepts
• Mercury and Venus versus Jupiter
• Subject is asked for discriminating feature
• E.g. “planet size”
• Re-iterate until no new features are found
• Rate all concepts with respect to all features
• Matrix is analyzed with cluster analysis
• Result: suggestions for concept relations
• Tool support is required
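The rating-and-clustering step can be illustrated in a few lines. The planet ratings below are rough invented values, and the "analysis" here is just finding the closest pair, a stand-in for the real cluster analysis a grid tool would run:

```python
# Repertory-grid sketch: rate concepts on elicited features, then find
# the most similar pair of concepts (illustrative ratings, scale 1..5).
ratings = {                      # concept -> [size, closeness-to-sun]
    "Mercury": [1, 5],
    "Venus":   [2, 4],
    "Jupiter": [5, 2],
}

def distance(a, b):
    """Euclidean distance between two rating vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

pairs = [("Mercury", "Venus"), ("Mercury", "Jupiter"), ("Venus", "Jupiter")]
closest = min(pairs, key=lambda p: distance(ratings[p[0]], ratings[p[1]]))
print(closest)  # the pair a cluster analysis would group first
```

The suggested grouping (here Mercury with Venus) is then fed back to the expert as a candidate concept relation.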

### When to use which technique?

• Knowledge identification
• Unstructured interview
• Knowledge specification
• Domain schema: concept sorting, repertory grid
• Template selection: self report
• Task & inference knowledge: self report
• Knowledge refinement
• Structured interview

### Housing application

• An application for assigning houses to potential renters
• We now sketch the organization, task and agent model and build the knowledge model on top.

### Problem description

• Local government institution is responsible for assignment of rental houses to applicants
• Transparent assignment procedure
• two-weekly magazine with house offers
• publication of results
• Partially automated process
• Existing databases of applicants and residences

### Organization model 1

Organization Model — Problems and Opportunities Worksheet (Organization model 1)

• Problems and opportunities
• assessment takes too much time
• not sufficient time for urgent cases
• Organizational context
• Mission: transparency of procedure, clear applicant responsibility
• External actors: local council, public opinion, national regulations, …
• Strategy: broaden scope of market
• Solutions
• automated assessment system
• training program for assessors to become urgency handlers

### Organization model 2

Organization Model — Variant Aspects Worksheet (Organization model 2)

• Resources
• existing database of applicants and residences
• priority calculator for computing a priority list of applicants for a residence
• Knowledge
• assessment criteria: knowledge for judging correctness of individual applications
• assignment rules: knowledge used for selecting an applicant for a particular house
• urgency rules: special rules and regulations for urgent cases (e.g., handicapped people)
• Culture & power
• hierarchical organization
• employees view the future with some trepidation
• management style: history as civil servant

### Organization model 3

| Task | Performed by | Where | Knowledge asset(s) | KI? | Significance |
|---|---|---|---|---|---|
| 1. Magazine production | Magazine editor | Public service | - | No | 3 |
| 2. Data entry applications | Data typist / automated telephone | Residence assignment | - | No | 2 |
| 3. Application assessment | Assigner | Residence assignment | Assessment criteria | Yes | 5 |
| 4. Residence assignment | Assigner | Residence assignment | Assignment & urgency rules | Yes | 5 |

### Organization model 4

• Knowledge asset:
• “general residence-application norms”
• right form?
• no, should be also in electronic form
• right place, time, quality?
• yes

### Task model

• a task handles inputs and delivers desired outputs
• in a structured and controlled way
• consumes resources;
• requires (and provides) knowledge/skills
• adheres to quality and performance criteria
• carried out by responsible and accountable agents

### Agent model

• OM and TM => process/task perspective
• AM: perspective of individual agents
• staff, software systems
• large part: rearrangement of information already in other worksheets
• just a single worksheet
• agent view useful for judging impact
• See attitude matrix
• important input for communication model

### Agent model

| Field | Value |
|---|---|
| Name | Assigner |
| Organization | Residence-assignment department |
| Involved in | 3. Application assessment; 4. Residence assignment |
| Communicates with | Database; Priority calculator |
| Knowledge | Assessment criteria; Assignment rules; Urgency rules |
| Other competencies | Ability to handle problematic non-standard cases |
| Responsibilities & constraints | Make sure that people are treated equally (no favors). This has been a problem in the past |

### Knowledge model

• Reading the two-weekly magazine in detail
• organizational goal of transparent procedure makes life easy
• Reading the original report of the local government for setting up the house assignment procedure
• identification of detailed information about handling urgent cases
• Short interviews/conversations
• staff member of organization
• two applicants (the “customers”)
• Now we look into
• Domain model
• Inference structure

### Summary

• Knowledge model components
• Knowledge model:
• specialized tool for specification of knowledge-intensive tasks
• abstracts from communication aspects
• real-world oriented
• reuse is central theme
• goal-oriented
• functional decomposition
• Domain knowledge
• relevant domain knowledge and information
• static
• Inference knowledge
• basic reasoning steps that can be made in the domain knowledge and are applied by tasks
• Template knowledge models
• Knowledge models partially reused in new applications
• Type of task = main guide for reuse
• reusable combination of model elements
• (provisional) inference structure
• typical control structure
• typical domain schema from task point-of-view
• specific for a task type
• supports top-down knowledge modeling

### Summary (cont‘d)

| Stage | Typical activities |
|---|---|
| Knowledge identification | Domain familiarization (information sources, glossary, scenarios); list potential model components for reuse (task- and domain-related components) |
| Knowledge specification | Choose task template (provides initial task decomposition); construct initial domain conceptualization (main domain information types); complete knowledge-model specification (knowledge model with partial knowledge bases) |
| Knowledge refinement | Validate knowledge model (paper simulation, prototype of reasoning system); knowledge-base refinement (complete the knowledge bases) |

### Summary (cont‘d)

• Knowledge elicitation techniques
• Interview
• Self report / protocol analysis
• Concept sorting
• Repertory grids
• Automated learning techniques
• Induction

### References

• Guus Schreiber, Hans Akkermans, Anjo Anjewierden, Robert de Hoog, Nigel Shadbolt, Walter Van de Velde and Bob Wielinga. Knowledge Engineering and Management: The CommonKADS Methodology , MIT Press, ISBN 0262193000. 2000.
• Chapters 1, 2, 4, 6-8

### References

• http://en.wikipedia.org/wiki/Knowledge_engineering
• http://en.wikipedia.org/wiki/Knowledge-based_systems
• http://en.wikipedia.org/wiki/Knowledge_modeling
• http://en.wikipedia.org/wiki/Expert_system

### Agenda

• Motivation
• Technical Solution
• Development of Knowledge-based systems towards Problem-Solving Methods
• Problem-Solving Methods
• Illustration by a Larger Example
• Extensions
• Summary
• References

### Motivation

• To automate the solution of complex problems, we would like a general approach with the following characteristics:

Knowledge
• Based on reasoning with knowledge;
• Works with a declarative , rather than an algorithmic, representation of knowledge;
Process
• Represents knowledge on the problem-solving process , i.e., the dynamics of the solution;
Reuse
• Allows reuse of reasoning knowledge;
• Abstracts from implementation and domain to increase reusability.

### Motivating Example

• As a motivating example we consider the task (i.e., goal) of parametric design , which can be defined over:

• Design space – the space that contains all possible designs;

• Requirements – a finite set of conditions, assumed to be provided by the user;

• Constraints – a finite set of which model additional conditions for a valid design;

• Preference – a function over the design space which can be used to discriminate between different solutions.

### Motivating Example

• Domain knowledge is implicit in the parametric design description in several places:

• Design space – the concrete definition of the design space is domain specific knowledge

[Requirements]

• Constraints – domain knowledge concerning regularities in the domain in which the design is constructed;

[Preference]

### Motivating Example

• Formally, a design artifact is described by a set of attribute-value pairs.

Let A1, ..., An be a fixed set of parameters with fixed ranges R1, ..., Rn:

• Design space is the cartesian product R1 × ... × Rn;

• Requirements – a relation R which defines a subset of this space;

• Constraints – a relation C which defines a subset of this space;

• Preference – a partial function P having all possible designs as its domain and preference values as its range.

### Motivating Example

• Several types of design are implicitly defined by these aspects of the problem:
• Possible design – An element in design space (∈ R1 × ... × Rn);
• Desired design – A design that fulfills all requirements (∈ R);
• Valid design – A design that fulfills all constraints (∈ C);
• Solution – A design which is desired and valid (∈ R ∩ C);
• Optimal solution - A solution for which no other solution which has a higher preference value exists
(any solution x ∈ R ∩ C such that ∀y ∈ R ∩ C . P(y) ≤ P(x)).
• There is also the possible extension:
• Acceptable solution - Given a threshold t, an acceptable solution is any solution x such that P(x) > t.

### Motivating Example

• An inefficient naive approach to parametric design is called generate & test , which depends on the following inferences:

• generate – requires knowledge that describes what constitutes a possible design (i.e., the design space);

• R-test – requires knowledge that describes which possible designs are desired (i.e. the user requirements);

• C-test – requires knowledge that describes which possible designs are valid (i.e., the domain constraints);

• select – requires knowledge that evaluates solutions (i.e., knowledge that describes what constitutes a preferred design).

### Motivating Example

• This naive solution can also be represented as:
possible design := generate all ;
valid design := C-test (possible design);
desired design := R-test (possible design);
solution := valid design ∩ desired design;
optimal solution := select (solution)
• Using the definition of acceptable solution this can be made somewhat more efficient as:
repeat
possible design := generate one ;
valid design := C-test (possible design);
desired design := R-test (possible design);
solution := valid design ∩ desired design;
acceptable solution := select (solution)
until acceptable solution ≠ ∅
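The naive generate & test loop above can be sketched in Python. All function and parameter names below are illustrative placeholders for the knowledge roles (generate, R-test, C-test, select), not part of any cited system:

```python
from itertools import product

def generate_and_test(ranges, requirements, constraints, preference):
    """Naive generate & test for parametric design (illustrative sketch).

    ranges:       iterables R1..Rn; the design space is their cartesian product
    requirements: predicate R(design) -> bool  (R-test: desired designs)
    constraints:  predicate C(design) -> bool  (C-test: valid designs)
    preference:   function P(design) -> comparable preference value (select)
    """
    solutions = []
    for design in product(*ranges):           # generate all possible designs
        if constraints(design) and requirements(design):
            solutions.append(design)          # desired AND valid => solution
    if not solutions:
        return None
    return max(solutions, key=preference)     # select an optimal solution

# Toy example: two parameters, each ranging over 0..4
best = generate_and_test(
    ranges=[range(5), range(5)],
    requirements=lambda d: d[0] + d[1] >= 4,   # R: sum at least 4
    constraints=lambda d: d[0] != d[1],        # C: parameters must differ
    preference=lambda d: -(d[0] + d[1]),       # P: prefer smaller sums
)
```

Note how the sketch makes the inefficiency visible: every point of the design space is generated before any test is applied.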

### Lessons from Example

• generate & test has the following characteristics:
• it separates the different types of knowledge;
• it is not efficient (all possible designs are generated);
• It may not terminate if the design space is infinite.
• From the literature on expert systems [Stefik et al., 1983]:
• “an important issue is the distribution of knowledge between the generator and the tester: putting as much knowledge as possible into the generator often leads to a more efficient search.”
• A much more clever strategy is therefore to use these knowledge types to guide the generation of possible designs.
• However to do so requires strong assumptions about the domain knowledge.

### Development

1. General Problem Solver

2. Knowledge-is-power hypothesis

3. Knowledge levels

3a. Newell’s 3 levels of knowledge

3b. Brachman’s 5 levels of knowledge

4. Problem Solving Methods

### 1. General Problem Solver

• The General Problem Solver (GPS) is a universal problem solving approach.

• GPS was the first approach to make a distinction between knowledge of problem domains and knowledge of how to solve problems.

• GPS is a domain- and task-independent approach.

• GPS places no restrictions on either the domain knowledge or the task.

• Examples of GPS approaches are automated theorem proving and generic search methods.

### Automated theorem proving

• Automated theorem provers are GPS in which every problem is expressed as logical inference
• Automated theorem proving is concerned with proving mathematical theorems by a computer program

### Generic Search Methods

• Generic Search Methods are GPS for which every problem can be expressed as search

• One particular example of a Generic Search Method is the A* algorithm.

• A* works for problems that can be represented as a state space , i.e. a graph of states. The initial conditions of the problem are represented as the start state , the goal conditions as the end state

• A* is an informed search or heuristic search approach that uses the estimation function:

f(n)=g(n)+h(n)

• g(n) – the cost to get from the start state to the current state n

• h(n) – the estimated cost to get from the current state n to the end state

• f(n) – the estimated total cost from the start state through the current state n to the end state

See Lecture 5
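To make f(n) = g(n) + h(n) concrete, here is a minimal A* sketch in Python. The toy graph, the heuristic, and all names are invented for this illustration (with h = 0 the search degenerates to Dijkstra's algorithm):

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* search sketch. neighbors(n) yields (next_state, step_cost) pairs;
    h(n) is the heuristic estimate of the remaining cost from n to goal."""
    frontier = [(h(start), 0, start, [start])]   # entries: (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)  # expand cheapest f first
        if state == goal:
            return path, g
        for nxt, cost in neighbors(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):   # found a cheaper route
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float("inf")

# Toy graph: A -> B -> D costs 6, A -> C -> D costs 5
graph = {"A": [("B", 1), ("C", 4)], "B": [("D", 5)], "C": [("D", 1)], "D": []}
path, cost = a_star("A", "D", lambda n: graph[n], h=lambda n: 0)
```

The heuristic only reorders expansion; with an admissible h the returned path is still optimal.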

### 1. General Problem Solver (1)

• However, GPS has a set of limitations :

• It works in theory, but in practice it only solves toy problems (e.g. the Tower of Hanoi)

• It could not solve real-world problems because the search was easily lost in the combinatorial explosion of intermediate states

### 2. Knowledge-is-power hypothesis

Knowledge-is-power hypothesis , also called the Knowledge Principle was formulated by E.A. Feigenbaum in 1977:

“knowledge of the specific task domain in which the program is to do its problem solving was more important as a source of power for competent problem solving than the reasoning method employed” [Feigenbaum, 1977]

### 2. Knowledge-is-power hypothesis (1)

• The Knowledge-is-power hypothesis shifted the focus on how to build intelligent systems from inference to the knowledge base .

• Problem solving is guided by experiential, qualitative, heuristic knowledge.

• The meaning of intelligence as knowledge is the common meaning of the word in English.

• The Central Intelligence Agency (CIA) defines intelligence as knowledge.

• The Knowledge-is-power hypothesis led to the emergence of a new field , expert systems, and a new profession , the knowledge engineer

### 3. Knowledge levels

3a. Newell’s 3 levels of knowledge

3b. Brachman’s 5 levels of knowledge

### 3a. Newell’s 3 levels of knowledge [Newell, 1981]

• In his work from 1981, Newell tried to answer questions such as:

• How can knowledge be characterised?

• What is the relation of this characterisation and the representation of knowledge?

• What is characteristic about a system which holds knowledge?

• Newell distinguished 3 levels in the context of knowledge representation:

• Knowledge Level

• Logical Level

• Implementation Level

### 3a. Newell’s 3 levels of knowledge (1)

• Knowledge Level

• The most abstract level of representing knowledge.

• Concerns the total knowledge contained in the Knowledge Base

• Example:

The automated DB-Information system knows that a trip from Innsbruck to Vienna costs 120€

### 3a. Newell’s 3 levels of knowledge (2)

• Logical Level

• Encoding of knowledge in a formal language.

• Example:

Price(Innsbruck, Vienna, 120)

### 3a. Newell’s 3 levels of knowledge (3)

• Implementation Level

• The internal representation of the sentences.

• Example:
• As a String “Price(Innsbruck, Vienna, 120)”
• As a value in a matrix

### 3b. Brachman’s 5 Levels of Knowledge [Brachman, 1979]

• Brachman defines 5 levels for different types of representations.
• Levels interpret the transition from data to knowledge.
• Each level corresponds to an explicit set of primitives offered to the knowledge engineer.
• Ordering of knowledge levels from simple/abstract to complex/concrete:
• Implementational Level
• Logical Level
• Epistemological Level
• Conceptual Level
• Linguistic Level

### 3b. Brachman’s 5 Levels of Knowledge (1)

• Implementational Level

• The primitives are pointers and memory cells.

• Allows the construction of data structures with no a priori semantics

### 3b. Brachman’s 5 Levels of Knowledge (2)

• Logical Level

• The primitives are logical predicates, operators, and propositions.

• An index is available to structure primitives.

• A formal semantic is given to primitives in terms of relations among objects in the real world

• No particular assumption is made however as to the nature of such relations

### 3b. Brachman’s 5 Levels of Knowledge (3)

• Epistemological Level

• The primitives are concept types and structuring relations.

• Structuring relations provide structure in a network of conceptual types or units. (i.e. inheritance: conceptual units, conceptual sub-units)

• The epistemological level links formal structure to conceptual units

• It contains structural connections in our knowledge needed to justify conceptual inferences.

### 3b. Brachman’s 5 Levels of Knowledge (4)

• Conceptual Level

• The primitives are conceptual relations, primitive objects and actions.

• The primitives have a definite cognitive interpretation, corresponding to language-independent concepts like elementary actions or thematic roles

### 3b. Brachman’s 5 Levels of Knowledge (5)

• Linguistic Level

• The primitives are words and (linguistic) expressions.

• The primitives are associated directly to nouns and verbs of a specific natural language

• Arbitrary relations and nodes that exist in a domain

### 4. Problem Solving Methods

• Problem Solving Methods (PSM) abstract from details of the implementation of the reasoning process.

• Characteristics of PSM [Birmingham&Klinker, 1993] :
• A PSM specifies which inference actions have to be carried out for solving a given task.
• A PSM determines the sequence in which these actions have to be activated.
• Knowledge roles determine which role the domain knowledge plays in each inference action.

### Technical Solution Overview

• What are problem-solving methods (PSMs) ?
• “Reasoning strategies that gain efficiency through assumptions.” [Fensel, 2000]

### Technical Solution Overview

• Problem-solving methods achieve an efficient realization of functionality by making assumptions .
• Assumptions put restrictions on the context of the problem-solving method, such as the domain knowledge and the possible inputs of the method or the precise definition of the functionality.
• Assumptions play two roles:
• they formulate requirements on reasoning support that is assumed by PSMs;
• they put restrictions on the reasoning support that is provided by PSMs.
• In consequence, assumptions link PSMs with the domain knowledge they use and tasks they are applied to.

### Example Revisited

• We consider again the task of parametric design.

• A more efficient method is named propose & revise and depends on the following inferences:
• propose – derives an initial design based on the requirements;
• C-test – as before;
• revise – tries to improve an incorrect design based on the feedback of the C-test step.

• In other words, instead of proposing a complete design which is then repaired, we can also incrementally develop a design and repair it at each step where a constraint violation occurs.
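The incremental strategy just described can be sketched as a loop in Python. Every function name here is a hypothetical placeholder for one of the knowledge roles (propose, C-test/violated, derive-fixes, apply-fix), not an API from the literature:

```python
def propose_and_revise(parameters, propose, violated, derive_fixes, apply_fix,
                       max_revisions=100):
    """Schematic propose & revise loop for parametric design (illustrative).

    propose(design, p)      -> value for parameter p given the partial design
    violated(design)        -> list of constraint violations (the C-test)
    derive_fixes(design, v) -> candidate fixes for violation v, cheapest first
    apply_fix(design, fix)  -> new design with the fix applied
    """
    design = {}
    for p in parameters:                  # extend the design step by step
        design[p] = propose(design, p)    # propose a value for one parameter
        revisions = 0
        while violated(design) and revisions < max_revisions:
            v = violated(design)[0]       # select-violation (here: the first)
            for fix in derive_fixes(design, v):   # derive-fixes + select-fix
                candidate = apply_fix(design, fix)
                if v not in violated(candidate):  # fix resolves the violation
                    design = candidate
                    break
            revisions += 1
    return design

# Toy use: two parameters that must differ; the fix bumps y upward
result_design = propose_and_revise(
    parameters=["x", "y"],
    propose=lambda d, p: 0,
    violated=lambda d: ["x=y"] if ("x" in d and "y" in d and d["x"] == d["y"]) else [],
    derive_fixes=lambda d, v: [("y", d["y"] + 1)],
    apply_fix=lambda d, fix: {**d, fix[0]: fix[1]},
)
```

The revision bound makes the strong termination assumption of the method explicit.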

### Example Revisited

• A parameter which should receive a value in the next propose step is nondeterministically selected:
• The selection process does not make further assumptions about knowledge that could guide this second selection step
• The implicit assumption is that this selection does not affect the performance of the problem-solving process or the quality of its result
• These are very strong assumptions, because heuristic methods are definitely needed to improve performance
• At any time there is either precisely one applicable propose rule or one user input to derive the value of the selected parameter
• A parameter should not depend on itself (no recursive derivation rules)

### Example Revisited

• revise is decomposed into:
• select-violation - nondeterministically selects a constraint violation from those detected by C-test; implicit assumption is that this selection does not influence the performance of the problem solving method and the quality of the result; strong assumption again
• derive-fixes - computes the set of all possible fix combinations that could possibly resolve the selected constraint violation; each combination must be finite
• select-fix - selects a fix combination, guided by a cost-function
• apply-fix - applies a fix combination

### Example Revisited

• Test – propose & revise does not require an explicit R-test, the method assumes that:
• propose derives only desired designs;
• revise delivers designs that are desired, or any requirement violations that it does not fix must be accepted.

• Selection – does not contain such a process concerning user preferences:
• It assumes that the propose step and the revise step deliver acceptable (or optimal) solutions or that the functionality of the task is reduced to finding an arbitrary solution.

### Description of Problem-Solving Methods

• The main elements of a specification in the PSM framework are:
• the task – specifies the goals that should be solved in order to solve a given problem. A second part of a task specification is the definition of requirements on domain knowledge;

• the problem-solving method – describes the reasoning steps that perform a task, as an operational specification, separately from a description of the competence and a description of the requirements on domain knowledge;

• the domain model – usually an ontology-based description in three parts: a meta-level characterisation of properties, the domain knowledge, and (external) assumptions of the domain model;

• the adapter – maps between the different terminologies of the task definition, problem-solving method and domain model. Moreover, it gives further requirements and assumptions needed to relate the competence of the PSM with the functionality of the task.

### Description of Problem-Solving Methods

• Several proof obligations follow the conceptual model of such a specification of a knowledge-based system:
• PO-i : the consistency of the task definition to ensure that a model exists, otherwise one could define an unsolvable problem;
• PO-ii : that the operational specification of the PSM describes a terminating process which has the competence as specified;
• PO-iii : the internal consistency of the domain knowledge and domain model, and also that the assumptions on domain knowledge imply its meta-level characterisation;
• PO-iv : relationships between the specification elements –
1. the requirements of the adapter imply the knowledge requirements of the PSM and task,
2. the adapter’s additional requirements on domain knowledge and assumptions guarantee that the competence of the PSM is strong enough for the task,
3. the requirements of the adapter are implied by the meta-knowledge of the domain model.

### PSM Framework Example

• To illustrate the description of PSMs we consider search
• Local search can be refined into other versions using adapters, specifically:
• Hill-climbing : a local search algorithm which stops when it has found a local optimum , based on recursively considering the successors to a start object, selecting better at each stage;
• Set-Minimizer : finds a minimal but still correct subset of a given set, with respect to hill-climbing –
• generic ‘object’ becomes a set,
• ‘successor’ relationship is hard-wired,
• preference is implicit;
• Abductive Diagnosis : receives a set of observations as input and delivers a complete (explains all input data) and parsimonious (no subset of hypotheses explains all observations) explanation .

### PSM Framework Example

• Local search can be represented in the following operational specification:

• Operational specification of local search:

operational specification local-search
  output := local-search(input);
  local-search(X)
  begin
    currents := select1(X);
    output := recursion(currents);
  end
  recursion(X)
  begin
    successors := generate(X);
    new := select2(X, successors);
    if X = new then output := select3(X);
    else recursion(new);
  end
  select1(x) ⊆ x
  x ∈ generate(y) ↔ x ∈ input ∧ ∃z.(z ∈ y ∧ successor(z, x))
  select2(y, y’) ⊆ y ∪ y’
  ¬∃x, z . (z ∈ (y ∪ y’) \ select2(y, y’) ∧ x ∈ select2(y, y’) ∧ x < z)
  y ∪ y’ ≠ ∅ → ∃x.(x ∈ select2(y, y’))
  select3(y) ⊆ y
  ¬∃x, z . (z ∈ (y \ select3(y)) ∧ x ∈ select3(y) ∧ x < z)
  y ≠ ∅ → ∃x.(x ∈ select3(y))
end operational specification
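A hill-climbing reading of the local-search specification can be sketched in Python. The nondeterministic selections select1/select2/select3 are collapsed here into simple deterministic choices, which is only one admissible instantiation of the spec; names and the toy objective are invented for the example:

```python
def local_search(candidates, successor, better):
    """Hill-climbing instance of the local-search spec (illustrative sketch).

    candidates:  iterable of start objects (select1 picks one: here the first)
    successor(x): iterable of successor objects of x        (generate)
    better(a, b): True if b is preferred over a             (the '<' ordering)
    """
    current = next(iter(candidates))           # select1: pick a start object
    while True:
        improvements = [s for s in successor(current) if better(current, s)]
        if not improvements:                   # no better successor exists:
            return current                     # select3: a local optimum
        current = improvements[0]              # select2: move to a better one

# Toy use: maximize f(x) = -(x - 3)^2 over the integers via +/-1 moves
best = local_search(
    candidates=[0],
    successor=lambda x: [x - 1, x + 1],
    better=lambda a, b: -(b - 3) ** 2 > -(a - 3) ** 2,
)
```

Stopping at the first successor set with no improvement is exactly the "stops at a local optimum" behaviour the hill-climbing adapter demands.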

### PSM Framework Example

• Adapters between refinement levels can be represented as follows:
PSM refinement adapter set-minimizer -> abduction-method
correct(x) = complete(x);
input = {h | h is hypothesis};
H ⊆ H’ → explain(H) ⊆ explain(H’)
PSM refinement adapter hill-climbing -> set-minimizer
correct(input);
select1(x) = {x};
successor(x, y) ↔ ∃z . (z ∈ x ∧ y = x \ {z});
x < y ↔ correct(y) ∧ y ⊂ x
PSM refinement adapter local-search -> hill-climbing
|select1(x)| = 1;
¬∃z . (z ∈ y’ ∧ y < z) → select2({y}, y’) = {y};
|select2({y}, y’)| = 1

### Sisyphus-I

• The Sisyphus-I office allocation problem was the first of a series of test applications for approaches to building knowledge-based systems.

• The aim is to automate the problem-solving behaviour of ‘Siggi’, a hypothetical domain expert .

• Specifically Siggi is required to make assignments of the staff of a research group ‘YQT’ to offices.

• The 4-page problem statement describes the office layout and relevant data about the 15 group members.

• A ‘protocol’ is provided, describing the steps Siggi takes.

### Sisyphus-I YQT Members Data

• From the protocol can be drawn data such as:

### Sisyphus-I PSM-Based Approach

• The approach used rests on mapping from task concepts to method concepts as shown [Motta, 1999] :

### Sisyphus-I Values Ranges

• From the protocol can be drawn value ranges, based on given justifications in the protocol:

### Sisyphus-I Requirements and Constraints

• From the protocol can be drawn the following requirements and constraints:

### Sisyphus-I Preferences

• From the protocol can be drawn the following preferences:

### Sisyphus-I Cost Function

• Cost function produces a 4-dimensional vector,

< n1, n2, n3, n4 > , where:

• n1 measures the distance between the room of the head of group and that of the secretaries;

• n2 measures the distance between the manager’s room and the rooms of the head of group and the secretaries;

• n3 measures the distance between the heads of projects and the head of group and secretaries;

• n4 provides a measure of the ‘project synergy’ afforded by a solution.

### Sisyphus-I Cost Function (cntd.)

• A design model in the Sisyphus-I domain d1, with cost function
< n11, n12, n13, n14 > is cheaper than a design model d2, with cost
< n21, n22, n23, n24 > , iff one or more of the following conditions are satisfied:

• n11 < n21

• n11 = n21 and n12 < n22

• n11 = n21 and n12 = n22 and n13 < n23

• n11 = n21 and n12 = n22 and n13 = n23 and n14 < n24
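Since the four conditions above are exactly a lexicographic ordering on the cost vectors, the comparison can be written very compactly. This is a sketch relying on the fact that Python compares tuples lexicographically:

```python
def cheaper(cost1, cost2):
    """True iff cost vector cost1 = <n11, n12, n13, n14> is cheaper than
    cost2 = <n21, n22, n23, n24> under the Sisyphus-I lexicographic rule."""
    return cost1 < cost2   # Python tuple comparison is lexicographic

# d1 ties with d2 on n1 and n2, but wins on n3 (n4 is then irrelevant)
d1_cheaper = cheaper((1, 2, 3, 9), (1, 2, 4, 0))
```

The ordering means an improvement on an earlier component can never be outweighed by later components, so n1 (head of group near secretaries) dominates all other concerns.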

### Sisyphus-I PSM-based Solutions

• Solving by Gen-design-psm (Generic Model for Parametric Design), which enhances simple depth-first search via:
• Focus selection – a DSR strategy analyses
• Value ranges associated with each parameter,
• The most constrained parameter in the current design model ;
• Operator selection – operator chosen according to given ordering.
• Solving by HC-design (Hill climbing) – same competence as Gen-design-psm, but much less efficient (measured at 10%, as compared to 78%).
• Solving by A*-design, based on heuristic function using estimated cost function – better competence (optimal solution), but comparable efficiency to HC-design.

### Application to Web Service Models

• A standard language was developed for the description of PSMs called the Unified Problem-Solving Method Language (UPML) [Fensel et al, 2002]

• UPML was applied to the modelling of Web Services in the Web Service Modelling Framework (WSMF) [Fensel & Bussler, 2002]

• The WSMF approach was encoded in the Web Service Modeling Ontology (WSMO) – wherein ontology-based models are built for goals (~tasks) and services (~methods), and these are linked by mediators (~adapters) [Fensel et al., 2007]

• WSMO was encoded in a family of ontology languages, WSML, and an open-source implementation was carried out in WSMX.

### Summary

• Problem-solving methods offer a means to structure and reuse elements of knowledge-based systems by abstracting from their domain.

• Efficiency is achieved by introducing assumptions that either restrict the size of the problem or that postulate strong requirements on the available domain knowledge.

• Adapters link tasks and PSMs to domains, allowing reuse of both, and express refinements between PSMs, allowing libraries to be built of these.

### References

• [Fensel, 2000] “Problem-Solving Methods: Understanding, Description, Development and Reuse”, D. Fensel, Springer LNAI 1791, 2000

• [Motta, 1999] “Reusable Components for Knowledge Modelling: Case Studies in Parametric Design”, E. Motta, IOS Press, 1999
• [Schreiber et al., 1994] “CommonKADS: A Comprehensive Methodology for KBS Development”, A. Th. Schreiber, B. Wielinga, J. M. Akkermans, W. Van de Velde, and R. de Hoog, IEEE Expert , 9(6):28–37, 1994
• [Stefik et al., 1983] “Basic Concepts for Building Expert Systems”, M. Stefik, J. Aitkins, R. Balzer, J. Benoit, L. Birnbaum, F. Hayes-Roth, and E. Sacerdoti in Building Expert Systems , Addison-Wesley, 1983

### References

• [Smith & Lowry, 1990] “Algorithm Theories and Design Tactics”, D. R. Smith and M. R. Lowry, Science of Computer Programming , 14:305–321, 1990

• [Fensel et al, 2002] “The Unified Problem-solving Method Development Language UPML”, D. Fensel, E. Motta, R. Benjamins, M. Crubezy, S. Decker, M. Gaspari, R. Groenboom, W. Grosso, F. van Harmelen, M. Musen, E. Plaza, G. Schreiber, R. Studer, B. Wielinga, Knowledge and Information Systems, 5(1), 83–131, 2002

• [Fensel & Bussler, 2002] “The Web Service Modeling Framework (WSMF)”, D. Fensel and C. Bussler, Electronic Commerce Research and Applications, 1(2), 2002

• [Fensel et al., 2007] “Enabling Semantic Web Services:The Web Service Modeling Ontology (WSMO)”, D. Fensel, H. Lausen, A. Polleres, J. de Bruijn, M. Stollberg, D. Roman, and J. Domingue, Springer, 2007

### References

• [Brachman, 1979] “On the Epistemological Status of Semantic Networks” R. J. Brachman. In: N.V. Findler (ed.): Associative Networks: Representation and Use of Knowledge by Computers. New York: Academic Press, 1979, 3-50.

• [Birmingham&Klinker, 1993] “Knowledge Acquisition Tools with Explicit Problem-Solving Methods”, W. Birmingham and G. Klinker. The Knowledge Engineering Review 8, 1 (1993), 5-25

• [Feigenbaum, 1977] “The Art of Artificial Intelligence: Themes and Case Studies of Knowledge Engineering,” E.A. Feigenbaum. Proceedings of the International Joint Conference on Artificial Intelligence, Cambridge, MA, 1977

• [Newell, 1981] “The Knowledge Level” Newell, AI Magazine 2 (2), 1981, p. 1-20

### References

• http://en.wikipedia.org/wiki/Problem_solving

• http://www.wsmo.org/

• http://www.wsmo.org/wsml/

• http://www.wsmx.org/

• http://www.wsmo.org/papers/publications/wsmf.paper.pdf

### Agenda

• Motivation

• Technical Solution

• Illustration by a Larger Example

• Extensions

• Summary

• References

### Motivation

• What is Planning?
• “Planning is the process of thinking about the activities required to create a desired goal on some scale” [Wikipedia]
• We take a more pragmatic view – planning is a flexible approach for taking complex decisions:
• decide about the schedule of a production line;
• decide about the movements of an elevator;
• decide about the flow of paper through a copy machine;
• By “flexible” we mean:
• the problem is described to the planning system in some generic language;
• a (good) solution is found fully automatically;
• if the problem changes, all that needs to be done is to change the description.
• We look at methods to solve any problem that can be described in the language chosen for the particular planning system.

### An ‘easy’ planning example

• Goal: Buy 1 liter of milk

• It may sound like a simple task, but if you break it down, there are many small tasks involved:

•  obtain car keys, obtain wallet, exit home, start car, drive to store, find and obtain milk, purchase milk, …

### Logistics

• During the 1991 Gulf War, US forces deployed an AI logistics planning and scheduling program ( Dynamic Analysis and Replanning Tool ) that involved up to 50,000 vehicles, cargo, and people
• This one application alone reportedly more than offset all the money DARPA had funneled into AI research over the preceding 30 years.

### Space Exploration

• Deep Space 1
• NASA's on-board autonomous planning program controlled the scheduling of operations for a spacecraft

### Space Exploration (cont’d)

• Mars Exploration Rover (MER)
• Autonomous planning, scheduling, control for Mars Exploration Rover.
• MER uses a combination of planning techniques including:
• Mixed-initiative planning that involves significant human participation
• Constraint-based - quantitative reasoning with metric time and other numerical quantities

### Planning as State Space Search

• This lecture will consider planning as state-space search, for which there are several options:
• Forward from I (initial state) until G (goal state) is satisfied;
• Backward from G until I is reached;
• Use a Heuristic function to estimate G- or I-distance, respectively – prefer states that have a lower heuristic value.

• It will also introduce partial-order planning as a more flexible approach

• In all cases there are three implicit concerns to consider:
• Representation – how the problem/state space and goal is defined;
• Algorithm – e.g. which (combination) of these approaches is used;
• Complexity and decidability – the feasibility of the approach.

### Planning – basic terminology

• A problem to be solved by a planning system is called a planning problem

• The world in which planning takes place is often called the domain or application domain.

• A state is a point in the search space of the application

• A planning problem is characterized by an initial state and a goal state .

• The initial state is the state of the world just before the plan is executed

• The goal state is the desired state of the world that we want to reach after the plan has been executed. A goal state is also referred to simply as a goal

### STRIPS

• Stanford Research Institute Problem Solver (STRIPS) is an automated planner developed by Richard Fikes and Nils Nilsson in 1971 [Nilsson & Fikes, 1970]

• STRIPS is also the name of the planning language used for the STRIPS planner

### STRIPS

• A state (excluding the goal state) is represented in STRIPS as a conjunction of function-free ground literals

e.g. on(A, Table) ^ on (B, A) ^ ¬ clear(A) ^ clear(B)

(block A is on the Table, block B is on block A, block A has something on top, block B has nothing on top)

• A conjunction of function-free ground literals is also known as a fact

• A goal is represented in STRIPS as a conjunction of literals that may contain variables

e.g. on(B, Table) ^ on (A, B) ^ clear(A) ^ ¬ clear(B)

(block B is on the Table, block A is on block B, block B has something on top, block A has nothing on top)

### STRIPS

Definition – Action

Let P be a set of facts. An action a is a triple a = (pre(a), add(a), del(a)) of subsets of P , where add(a) ∩ del(a) = ∅

pre(a), add(a), and del(a) are called the action precondition , add list , and delete list , respectively

• In other words, an action is defined in terms of:
• Preconditions - conjunction of literals that must hold before the action is performed
• Effects - conjunction of literals that hold after the action is performed. An effect description includes:
• Add List: list of formulas to be added to the description of the new state resulting after the action is performed
• Delete List: list of formulas to be deleted from the description of the state before the action is performed

### STRIPS

• The action pickup in the Blocks World domain can be modeled as follows in STRIPS

pickup(x)

Preconditions : on(x, Table) ^ handEmpty ^ clear(x)

Add List : holding(x)

Delete List : on(x, Table) ^ handEmpty ^ clear(x)

### STRIPS

Definition – Task

A task is a quadruple (P,A,I,G) where P is a (finite) set of facts, A is a (finite) set of actions, and I and G are subsets of P, I being the initial state and G the goal state.

• In other words, a task is defined in terms of:
• A (finite) set of facts
• A (finite) set of actions that the planner can perform. Each action is defined in terms of preconditions and effects (i.e. add list and delete list)
• An initial state i.e. the state of the world before the execution of the task
• A goal state i.e. the desired state of the world that we want to reach

### STRIPS

Definition – Result
Let P be a set of facts, s ⊆ P a state, and a an action. The result result(s, ‹a›) of applying a to s is:
result(s, ‹a›) = (s ∪ add(a)) \ del(a) if pre(a) ⊆ s; undefined otherwise
In the first case, the action a is said to be applicable in s. The result of applying an action sequence ‹a1, ..., an› of arbitrary length to s is defined by result(s, ‹a1, ..., an›) = result(result(s, ‹a1, ..., an-1›), ‹an›) and result(s, ‹›) = s.

### STRIPS

• In other words, the result of applying an action a in a given state s is a new state of the world, described by the set of formulas describing s with the formulas from the add list added and the formulas from the delete list removed. Action a can be performed only if its preconditions are fulfilled in state s
• e.g.
state s = on(A, B) ^ handEmpty ^ clear(A)
action a = unstack (x,y)
Preconditions : on(x, y) ^ handEmpty ^ clear(x)
Add List : holding(x) ^ clear(y)
Delete List : on(x, y) ^ handEmpty ^ clear(x)
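This computation can be checked mechanically. A sketch in Python, treating a state and the three action components as sets of ground facts (the encoding is illustrative, not the original STRIPS implementation):

```python
def result(state, action):
    """result(s, <a>) = (sUadd(a)) \\ del(a), defined iff pre(a) is a subset of s."""
    pre, add, delete = action
    if not pre <= state:            # a is applicable only if pre(a) is a subset of s
        raise ValueError("preconditions not fulfilled in this state")
    return (state | add) - delete

s = {"on(A,B)", "handEmpty", "clear(A)"}
unstack = (
    {"on(A,B)", "handEmpty", "clear(A)"},   # preconditions
    {"holding(A)", "clear(B)"},             # add list
    {"on(A,B)", "handEmpty", "clear(A)"},   # delete list
)
print(sorted(result(s, unstack)))  # ['clear(B)', 'holding(A)']
```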

### STRIPS

result(s, ‹a›) = (s ∪ add(a)) \ del(a)
= (on(A, B) ^ handEmpty ^ clear(A) ∪ holding(A) ^ clear(B)) \ (on(A, B) ^ handEmpty ^ clear(A))
= holding(A) ^ clear(B)
Precondition on(x, y) ^ handEmpty ^ clear(x), instantiated with x = A, y = B, is fulfilled in state s = on(A, B) ^ handEmpty ^ clear(A)

For convenience let’s denote result(s , ‹a› ) with s1

Given a second action a1
action a1 = stack (x,y)
Preconditions : holding(x) ^ clear(y)
Add List : on(x,y) ^ handEmpty ^ clear(x)
Delete List : holding(x) ^ clear(y)

### STRIPS

result(s, ‹a, a1›) = result(result(s, ‹a›), ‹a1›) = result(s1, ‹a1›) = (s1 ∪ add(a1)) \ del(a1)

= (holding(A) ^ clear(B) ∪ on(A, B) ^ handEmpty ^ clear(A)) \ (holding(A) ^ clear(B))

= on(A, B) ^ handEmpty ^ clear(A)

Precondition holding(x) ^ clear(y), instantiated with x = A, y = B, is fulfilled in state s1 = holding(A) ^ clear(B)

### STRIPS

• Definition – Plan
Let (P,A,I,G) be a task. An action sequence ‹a1, ... ,an› is a plan for the task iff G ⊆ result(I, ‹a1, ... ,an›).
• In other words, a plan is an organized collection of actions.
• A plan is said to be a solution to a given problem if the plan is applicable in the problem’s initial state, and if after plan execution, the goal is true.
• Assume that there is some action in the plan that must be executed first. The plan is applicable if all the preconditions for the execution of this first action hold in the initial state.
• A task is called solvable if there is a plan, unsolvable otherwise

### STRIPS

Definition – Minimal Plan

Let (P,A,I,G) be a task. A plan ‹a1, . . . ,an› for the task is minimal if ‹a1, . . . , ai-1, ai+1, . . . ,an› is not a plan for any i.

Definition – Optimal Plan

Let (P,A,I,G) be a task. A plan ‹a1, . . . ,an› for the task is optimal if there is no plan with fewer than n actions
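Both definitions can be tested mechanically: execute the plan, then try dropping each action in turn. A Python sketch (the encoding is my own; actions are (pre, add, delete) triples of fact sets):

```python
def is_plan(init, goal, actions, plan):
    """Check G is contained in result(I, plan), failing if any action
    along the way is inapplicable."""
    state = set(init)
    for name in plan:
        pre, add, delete = actions[name]
        if not pre <= state:
            return False
        state = (state | add) - delete
    return goal <= state

def is_minimal(init, goal, actions, plan):
    """A plan is minimal iff removing any single action breaks it."""
    return is_plan(init, goal, actions, plan) and all(
        not is_plan(init, goal, actions, plan[:i] + plan[i + 1:])
        for i in range(len(plan))
    )

acts = {
    "pickup(A)": ({"on(A,Table)", "clear(A)", "handEmpty"},
                  {"holding(A)"},
                  {"on(A,Table)", "clear(A)", "handEmpty"}),
    "stack(A,C)": ({"holding(A)", "clear(C)"},
                   {"on(A,C)", "clear(A)", "handEmpty"},
                   {"holding(A)", "clear(C)"}),
}
init = {"on(A,Table)", "on(C,Table)", "clear(A)", "clear(C)", "handEmpty"}
goal = {"on(A,C)"}
print(is_minimal(init, goal, acts, ["pickup(A)", "stack(A,C)"]))  # True
```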

### STRIPS

Example: the “Blocks World”
• Domain: a set of cubic blocks sitting on a table
• Actions:
• blocks can be stacked
• a block can be picked up and moved to another location
• only one block can be picked up at a time
• Goal: to build a specified stack of blocks

### STRIPS

Representing states and goals
Initial State: on(A, Table) ^ on(B, Table) ^ on(C, B) ^ clear(A) ^ clear(C) ^ handEmpty
Goal: on(B, Table) ^ on(C, B) ^ on(A, C) ^ clear(A) ^ handEmpty

### STRIPS

• Actions in the “Blocks World”
• pickup(x)
• picks up block ‘x’ from the table
• putdown(x)
• if holding block ‘x’, puts it down on the table
• stack(x,y)
• if holding block ‘x’, puts it on top of block ‘y’
• unstack(x,y)
• picks up block ‘x’ that is currently on block ‘y’
• Action stack(x,y) in STRIPS
stack(x,y)
Preconditions : holding(x) ^ clear(y)
Add List : on(x,y) ^ handEmpty ^ clear(x)
Delete List : holding(x) ^ clear(y)

### PDDL

• Planning Domain Description Language (PDDL)
• Based on STRIPS with various extensions
• Created, in its first version, by Drew McDermott and others
• Used in the biennial International Planning Competition (IPC) series
• The representation is lifted, i.e., it makes use of variables that are instantiated with objects
• Actions are instantiated operators
• Facts are instantiated predicates
• A task is specified via two files: the domain file and the problem file
• The problem file gives the objects, the initial state, and the goal state
• The domain file gives the predicates and the operators; these may be re-used for different problem files
• The domain file corresponds to the transition system, the problem files constitute instances in that system

### PDDL

Blocks World Example domain file :

(define (domain blocksworld)
(:predicates (clear ?x)
(holding ?x)
(on ?x ?y))
(
:action stack
:parameters (?ob ?underob)
:precondition (and (clear ?underob) (holding ?ob))
:effect (and (holding nil) (on ?ob ?underob)
(not (clear ?underob)) (not (holding ?ob)))
))

### PDDL

Blocks World Example problem file :

(define (problem bw-xy)
(:domain blocksworld)
(:objects nil table a b c d e)
(:init (on a table) (clear a)
(on b table) (clear b)
(on e table) (clear e)
(on c table) (on d c) (clear d)
(holding nil)
)
(:goal (and (on e c) (on c a) (on b d))))

### Algorithms

• We now present particular algorithms related to (direct) state-space search:

• Progression-based search, also known as forward search

• Regression-based search, also known as backward search

• Heuristic Functions and Informed Search Algorithms

### Algorithms

Definition – Search Scheme

A search scheme is a tuple (S, s0, succ, Sol) consisting of:

1. the set S of all states,

2. the start state s0 ∈ S,

3. the successor state function succ : S → 2^S, and

4. the solution states Sol ⊆ S.

Note:
• Solution paths s0 → ... → sn with sn ∈ Sol correspond to solutions to our problem

### Progression

• Definition – Progression
Let (P,A,I,G) be a task. Progression is the search scheme (S, s0, succ, Sol) where:
• S = 2^P is the set of all subsets of P,
• s0 = I,
• succ : S → 2^S is defined by succ(s) = {s′ ∈ S | ∃a ∈ A: pre(a) ⊆ s, s′ = result(s, ‹a›)}
• Sol = {s ∈ S | G ⊆ s}
• Progression algorithm:
1. Start from initial state
2. Find all actions whose preconditions are true in the initial state (applicable actions)
3. Compute effects of actions to generate successor states
4. Repeat steps 2-3, until a new state satisfies the goal
Progression explores only states that are reachable from I ; they may not be relevant for G
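The four steps above amount to a breadth-first forward search over sets of facts. A toy sketch in Python (not the original STRIPS code; the action encoding and names follow the Blocks World examples):

```python
from collections import deque

def progression(init, goal, actions):
    """Forward search: returns a list of action names reaching the goal."""
    frontier = deque([(frozenset(init), [])])
    seen = {frozenset(init)}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                      # Sol = {s | G is a subset of s}
            return plan
        for name, pre, add, delete in actions:
            if pre <= state:                   # applicable action
                succ = frozenset((state | add) - delete)
                if succ not in seen:
                    seen.add(succ)
                    frontier.append((succ, plan + [name]))
    return None                                # task unsolvable

# Tiny two-block instance: move A from the table onto C.
actions = [
    ("pickup(A)", {"on(A,Table)", "clear(A)", "handEmpty"},
                  {"holding(A)"},
                  {"on(A,Table)", "clear(A)", "handEmpty"}),
    ("stack(A,C)", {"holding(A)", "clear(C)"},
                   {"on(A,C)", "clear(A)", "handEmpty"},
                   {"holding(A)", "clear(C)"}),
]
init = {"on(A,Table)", "on(C,Table)", "clear(A)", "clear(C)", "handEmpty"}
goal = {"on(A,C)"}
print(progression(init, goal, actions))  # ['pickup(A)', 'stack(A,C)']
```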

### Progression

Progression in “Blocks World“ – applying the progression algorithm

Step 1:
Initial state s0 = I = on(A, Table) ^ on(B, Table) ^ on(C, B) ^ clear(A) ^ clear(C) ^ handEmpty
Step 2:
Applicable actions: unstack(C,B), pickup(A)
Step 3:
result(s0, ‹unstack(C,B)›) = (on(A, Table) ^ on(B, Table) ^ on(C, B) ^ clear(A) ^ clear(C) ^ handEmpty ∪ holding(C) ^ clear(B)) \ (on(C, B) ^ handEmpty ^ clear(C)) = on(A, Table) ^ on(B, Table) ^ clear(A) ^ clear(B) ^ holding(C)
result(s0, ‹pickup(A)›) = (on(A, Table) ^ on(B, Table) ^ on(C, B) ^ clear(A) ^ clear(C) ^ handEmpty ∪ holding(A)) \ (on(A, Table) ^ handEmpty ^ clear(A)) = on(B, Table) ^ on(C, B) ^ clear(C) ^ holding(A)

### Regression

Definition – Regression
Let P be a set of facts, s ⊆ P, and a an action. The regression regress(s, a) of s through a is:

regress(s, a) = (s \ add(a)) ∪ pre(a) if add(a) ∩ s ≠ ∅ and del(a) ∩ s = ∅; undefined otherwise

In the first case, s is said to be regressable through a.
• If add(a) ∩ s = ∅, then a contributes nothing to s
• If del(a) ∩ s ≠ ∅, then a cannot be used to achieve a fact p ∈ s, because it deletes part of s

### Regression

Regression algorithm:

1. Start with the goal

2. Choose an action that will result in the goal

3. Replace that goal with the action’s preconditions

4. Repeat steps 2-3 until you have reached the initial state

• Regression explores only states that are relevant for G; they may not be reachable from I
• Regression is not as unnatural as you might think: if you plan a trip to Thailand, do you start by thinking about the train to Frankfurt?
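A single regression step can be written straight from the definition. An illustrative Python sketch (states and action components are plain sets of ground facts; the encoding is mine):

```python
def regress(s, action):
    """Regress goal set s through a STRIPS action, or return None
    if s is not regressable through the action."""
    pre, add, delete = action
    if not (add & s):       # the action contributes nothing to s
        return None
    if delete & s:          # the action would destroy part of s
        return None
    return (s - add) | pre

goal = {"on(B,Table)", "on(C,B)", "on(A,C)", "clear(A)", "handEmpty"}
stack_A_C = (
    {"holding(A)", "clear(C)"},              # preconditions
    {"on(A,C)", "clear(A)", "handEmpty"},    # add list
    {"holding(A)", "clear(C)"},              # delete list
)
print(sorted(regress(goal, stack_A_C)))
# ['clear(C)', 'holding(A)', 'on(B,Table)', 'on(C,B)']
```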

### Regression

Regression in “Blocks World“ – applying the regression algorithm
Step 1:
Goal state s n = G = on(B, Table) ^ on(C, B) ^ on(A, C) ^ clear(A) ^ handEmpty
Step 2:
Applicable actions: stack(A,C)
Step 3:
regress(G, stack(A,C)) = (G \ add(stack(A,C))) ∪ pre(stack(A,C)) = (on(B, Table) ^ on(C, B) ^ on(A, C) ^ clear(A) ^ handEmpty \ (on(A, C) ^ handEmpty ^ clear(A))) ∪ holding(A) ^ clear(C) = on(B, Table) ^ on(C, B) ^ holding(A) ^ clear(C)

### Heuristic Functions

Definition – Heuristic Function

Let (S, s0, succ, Sol) be a search scheme. A heuristic function is a function h : S → ℕ0 ∪ {∞} from states into the natural numbers including 0, plus the symbol ∞. Heuristic functions, or heuristics, are efficiently computable functions that estimate a state’s “solution distance”.

### Heuristic Functions

A heuristic function for Blocks World domain could be:

h1(s) = n − Σ b(i), i = 1…n, where

b(i) = 0 in state s if block i is resting on a wrong thing

b(i) = 1 in state s if block i is resting on the thing it is supposed to be resting on;

n is the total number of blocks

h1(I) = 3 – 1 (because of block B) – 0 (because of block C) – 0 (because of block A) = 2

h1(G) = 3 – 1 (because of block B) – 1 (because of block C) – 1 (because of block A) = 0

### Heuristic Functions

Definition – Solution Distance

Let (S, s0, succ, Sol) be a search scheme. The solution distance sd(s) of a state s ∈ S is the length of a shortest succ path from s to a state in Sol, or ∞ if there is no such path. Computing the real solution distance is as hard as solving the problem itself.

### Heuristic Functions

In Progression search:
sd(I) = 4, sd(result(I, ‹pickup(C)›)) = 3, sd(result(result(I, ‹pickup(C)›), ‹stack(C,B)›)) = 2, … for all other s ∈ succ(I)

In Regression search:
sd(G) = 4, sd(regress(G, stack(A,C))) = 3, sd(regress(regress(G, stack(A,C)), pickup(A))) = 2, …

### Heuristic Functions

• Definition – Admissible Heuristic Function

Let (S, s0, succ, Sol) be a search scheme. A heuristic function h is admissible if h(s) ≤ sd(s) for all s ∈ S.

• In other words, an admissible heuristic function is a heuristic function that never overestimates the actual cost, i.e. the solution distance.

• e.g. the h1 heuristic function defined below is admissible

h1(s) = n − Σ b(i), i = 1…n, where

• b(i) = 0 in state s if block i is resting on a wrong thing

• b(i) = 1 in state s if block i is resting on the thing it is supposed to be resting on;

• n is the total number of blocks
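The h1 heuristic is easy to state in code. A sketch, assuming each state is encoded as a mapping from every block to the thing it rests on (this dictionary encoding is my own, not from the slides):

```python
def h1(state, goal):
    """h1(s) = n - sum(b(i)): the number of blocks not resting on the
    thing they are supposed to rest on in the goal."""
    n = len(goal)
    correct = sum(1 for block in goal if state.get(block) == goal[block])
    return n - correct

# states map each block to what it rests on
I = {"A": "Table", "B": "Table", "C": "B"}
G = {"B": "Table", "C": "B", "A": "C"}
print(h1(G, G))  # 0: at the goal, every block is placed correctly
```

Each misplaced block needs at least one move, so h1 never overestimates the solution distance.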

### Complexity and Decidability

• Important properties of a search algorithm include:

• Its time complexity : in the worst case, how many search states are expanded?

• Its space complexity : in the worst case, how many search states are kept in the open list at any point in time?

• Is it complete , i.e. is it guaranteed to find a solution if there is one?

• Is it optimal , i.e. is it guaranteed to find an optimal solution?

### Complexity and Decidability

Definition - PLANSAT

Let PLANSAT denote the following decision problem. Given a STRIPS task (P, A, I, G), is the task solvable?

Theorem

PLANSAT is PSPACE-complete. For the proof, see [Bylander, 1994]

Definition – Bounded- PLANSAT

Let Bounded-PLANSAT denote the following decision problem. Given a STRIPS task (P, A, I, G) and an integer b, is there a plan with at most b actions?

### Progression

Progression in “Blocks World“ – applying the progression algorithm
Step 1:
Initial state s0 = I = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty
Step 2:
Applicable actions: pickup(A), pickup(B), unstack(D,C)
Step 3:
s11 = result(s0, ‹pickup(A)›) = (on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty ∪ holding(A)) \ (on(A, Table) ^ handEmpty ^ clear(A)) = on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(B) ^ clear(D) ^ holding(A)

### Progression

s12 = result(s0, ‹pickup(B)›) = (on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty ∪ holding(B)) \ (on(B, Table) ^ handEmpty ^ clear(B)) = on(A, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(D) ^ holding(B)

s13 = result(s0, ‹unstack(D,C)›) = (on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty ∪ holding(D) ^ clear(C)) \ (on(D, C) ^ handEmpty ^ clear(D)) = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ clear(A) ^ clear(B) ^ clear(C) ^ holding(D)

### Progression

2nd iteration
Step 1:
s11 = on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(B) ^ clear(D) ^ holding(A)
Step 2:
Applicable actions: putdown(A), stack(A,D)
Step 3:
s111 = result(s11, ‹putdown(A)›) = (on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(B) ^ clear(D) ^ holding(A) ∪ on(A, Table) ^ handEmpty ^ clear(A)) \ holding(A) = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty
we are back in the initial state!

### Progression

3rd iteration
Step 1:
s11 = on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(B) ^ clear(D) ^ holding(A)
Step 2:
Applicable actions: putdown(A), stack(A,D)
Step 3:
s112 = result(s11, ‹stack(A,D)›) = (on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(B) ^ clear(D) ^ holding(A) ∪ on(A, D) ^ handEmpty ^ clear(A)) \ (holding(A) ^ clear(D)) = on(B, Table) ^ on(C, Table) ^ on(D, C) ^ on(A, D) ^ handEmpty ^ clear(A) ^ clear(B)

### Progression

4th iteration
Step 1:
s12 = on(A, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(D) ^ holding(B)
Step 2:
Applicable actions: putdown(B), stack(B,D)
Step 3:
s121 = result(s12, ‹putdown(B)›) = (on(A, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(D) ^ holding(B) ∪ on(B, Table) ^ handEmpty ^ clear(B)) \ holding(B) = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty

we are back in the initial state!

### Progression

5th iteration
Step 1:
s12 = on(A, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(D) ^ holding(B)
Step 2:
Applicable actions: putdown(B), stack(B,D)
Step 3:
s122 = result(s12, ‹stack(B,D)›) = (on(A, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(D) ^ holding(B) ∪ on(B, D) ^ handEmpty ^ clear(B)) \ (holding(B) ^ clear(D)) = on(A, Table) ^ on(C, Table) ^ on(D, C) ^ on(B, D) ^ handEmpty ^ clear(A) ^ clear(B)

### Progression

6th iteration
Step 1:
s13 = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ clear(A) ^ clear(B) ^ clear(C) ^ holding(D)
Step 2:
Applicable actions: putdown(D), stack(D,A), stack(D,B), stack(D,C)
Step 3:
s131 = result(s13, ‹putdown(D)›) = (on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ clear(A) ^ clear(B) ^ clear(C) ^ holding(D) ∪ on(D, Table) ^ handEmpty ^ clear(D)) \ holding(D) = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, Table) ^ clear(A) ^ clear(B) ^ clear(C) ^ clear(D) ^ handEmpty
In iterations 7, 8, 9, new states are obtained from s13 by applying the actions stack(D,A), stack(D,B), stack(D,C)

### Progression

10th iteration
Step 1:
s131 = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, Table) ^ clear(A) ^ clear(B) ^ clear(C) ^ clear(D) ^ handEmpty
Step 2:
Applicable actions: pickup(A), pickup(B), pickup(C), pickup(D)
Step 3:
s1311 = result(s131, ‹pickup(C)›) = (on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, Table) ^ clear(A) ^ clear(B) ^ clear(C) ^ clear(D) ^ handEmpty ∪ holding(C)) \ (on(C, Table) ^ clear(C) ^ handEmpty) = on(A, Table) ^ on(B, Table) ^ on(D, Table) ^ clear(A) ^ clear(B) ^ clear(D) ^ holding(C)

### Progression

nth iteration
Step 1:
sn-1 = on(A, Table) ^ on(D, Table) ^ on(B, D) ^ clear(A) ^ clear(B) ^ holding(C)
Step 2:
Applicable actions: putdown(C), stack(C,A), stack(C,B)
Step 3:
sn = result(sn-1, ‹stack(C,A)›) = (on(A, Table) ^ on(D, Table) ^ on(B, D) ^ clear(A) ^ clear(B) ^ holding(C) ∪ on(C, A) ^ clear(C) ^ handEmpty) \ (holding(C) ^ clear(A)) = on(A, Table) ^ on(D, Table) ^ on(C, A) ^ on(B, D) ^ clear(C) ^ clear(B) ^ handEmpty

sn is the goal state G

### Regression

Regression in “Blocks World“ – applying the regression algorithm
1st iteration
Step 1:
Goal state sn = G = on(A, Table) ^ on(D, Table) ^ on(C, A) ^ on(B, D) ^ clear(C) ^ clear(B) ^ handEmpty
Step 2:
Applicable actions: stack(C,A), stack(B,D)

### Regression

Step 3:

sn-1 = regress(G, stack(C,A)) = (G \ add(stack(C,A))) ∪ pre(stack(C,A)) = (on(A, Table) ^ on(D, Table) ^ on(C, A) ^ on(B, D) ^ clear(C) ^ clear(B) ^ handEmpty \ (on(C, A) ^ handEmpty ^ clear(C))) ∪ holding(C) ^ clear(A) = on(A, Table) ^ on(D, Table) ^ on(B, D) ^ clear(B) ^ clear(A) ^ holding(C)

sn-2 = regress(G, stack(B,D)) = (G \ add(stack(B,D))) ∪ pre(stack(B,D)) = (on(A, Table) ^ on(D, Table) ^ on(C, A) ^ on(B, D) ^ clear(C) ^ clear(B) ^ handEmpty \ (on(B, D) ^ handEmpty ^ clear(B))) ∪ holding(B) ^ clear(D) = on(A, Table) ^ on(D, Table) ^ on(C, A) ^ clear(C) ^ clear(D) ^ holding(B)

### Regression

2nd iteration
Step 1:
sn-1 = on(A, Table) ^ on(D, Table) ^ on(B, D) ^ clear(B) ^ clear(A) ^ holding(C)
Step 2:
Applicable actions: pickup(C), unstack(C,A)
Step 3:
sn-2 = regress(sn-1, pickup(C)) = (sn-1 \ add(pickup)) ∪ pre(pickup) = (on(A, Table) ^ on(D, Table) ^ on(B, D) ^ clear(B) ^ clear(A) ^ holding(C) \ holding(C)) ∪ clear(C) ^ on(C, Table) ^ handEmpty = on(A, Table) ^ on(D, Table) ^ on(B, D) ^ clear(A) ^ clear(B) ^ clear(C) ^ on(C, Table) ^ handEmpty

### Regression

3rd iteration
Step 1:
sn-1 = on(A, Table) ^ on(D, Table) ^ on(B, D) ^ clear(B) ^ clear(A) ^ holding(C)
Step 2:
Applicable actions: pickup(C), unstack(C,A)
Step 3:
sn-3 = regress(sn-1, unstack(C,A)) = (sn-1 \ add(unstack)) ∪ pre(unstack) = (on(A, Table) ^ on(D, Table) ^ on(B, D) ^ clear(B) ^ clear(A) ^ holding(C) \ (holding(C) ^ clear(A))) ∪ clear(C) ^ on(C, A) ^ handEmpty = on(A, Table) ^ on(D, Table) ^ on(B, D) ^ clear(B) ^ clear(C) ^ on(C, A) ^ handEmpty
we are back in the goal state!

### Regression

4th iteration
Step 1:
sn-2 = on(A, Table) ^ on(D, Table) ^ on(B, D) ^ clear(A) ^ clear(B) ^ clear(C) ^ on(C, Table) ^ handEmpty
Step 2:
Applicable actions: putdown(C), putdown(A), stack(B,D)
Step 3:
sn-4 = regress(sn-2, stack(B,D)) = (sn-2 \ add(stack)) ∪ pre(stack) = (on(A, Table) ^ on(D, Table) ^ on(B, D) ^ clear(A) ^ clear(B) ^ clear(C) ^ on(C, Table) ^ handEmpty \ (on(B, D) ^ clear(B) ^ handEmpty)) ∪ clear(D) ^ holding(B) = on(A, Table) ^ on(D, Table) ^ on(C, Table) ^ clear(A) ^ clear(C) ^ clear(D) ^ holding(B)

### Regression

nth iteration
Step 1:
s1 = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ clear(A) ^ clear(B) ^ clear(C) ^ holding(D)
Step 2:
Applicable actions: pickup(D), unstack(D,C)
Step 3:
s0 = regress(s1, unstack(D,C)) = (s1 \ add(unstack)) ∪ pre(unstack) = (on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ clear(A) ^ clear(B) ^ clear(C) ^ holding(D) \ (holding(D) ^ clear(C))) ∪ on(D, C) ^ clear(D) ^ handEmpty = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty

### Partial-Order Planning

• Forward and backward state-space search are both forms of totally-ordered plan search – explore linear sequences of actions from initial state or goal.

• As such they cannot accommodate problem decomposition and work on subgraphs separately.

• A partial-order planner depends on:
• A set of ordering constraints , wherein A ≺ B means that A is executed some time before B (not necessarily directly before)
• A set of causal links , A –p-> B, meaning A achieves p for B
• A set of open preconditions not achieved by any action in the current plan

### Partial-Order Plans

(Figure: a partial-order plan and the total-order plans it corresponds to)

### Constraint-Based Search

• Compared with a classical search procedure, viewing a problem as one of constraint satisfaction can substantially reduce the amount of search.

• Constraint-based search involves complex numerical or symbolic variable and state constraints

• Constraint-Based Search:

1. Constraints are discovered and propagated as far as possible.

2. If there is still not a solution, then search begins, adding new constraints.

### Constraint-Based Search

• Constrain the task by a time horizon, b

• Formulate this as a constraint satisfaction problem and use backtracking methods to solve it: “do x at time t – yes/no?”, “do y at time t – yes/no?”, etc.

• If there is no solution, increment b

• Inside backtracking : constraint propagation, conflict analysis are used

• Constraint-based search is an “undirected” search, i.e. there is no fixed time-order to the decisions made by backtracking
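The outer "if there is no solution, increment b" loop can be sketched as follows. For brevity, the inner constraint solver is stood in for by a plain depth-bounded search, so only the horizon-increment structure is faithful (the encoding is illustrative):

```python
def bounded_plan(state, goal, actions, b):
    """Look for a plan with at most b actions (a stand-in for the
    constraint solver working within the horizon b)."""
    if goal <= state:
        return []
    if b == 0:
        return None
    for name, pre, add, delete in actions:
        if pre <= state:
            rest = bounded_plan((state | add) - delete, goal, actions, b - 1)
            if rest is not None:
                return [name] + rest
    return None

def plan(state, goal, actions, max_horizon=10):
    """Increment the horizon b until a plan is found."""
    for b in range(max_horizon + 1):
        p = bounded_plan(frozenset(state), goal, actions, b)
        if p is not None:
            return p
    return None

actions = [("putdown(A)", {"holding(A)"},
            {"on(A,Table)", "handEmpty", "clear(A)"},
            {"holding(A)"})]
init = {"holding(A)", "on(B,Table)", "clear(B)"}
goal = {"on(A,Table)"}
print(plan(init, goal, actions))  # ['putdown(A)']
```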

### Planning Summary

• Planning is a flexible approach for taking complex decisions:
• the problem is described to the planning system in some generic language;
• a (good) solution is found fully automatically;
• if the problem changes, all that needs to be done is to change the description.
• The following concerns must be addressed:
• Syntax - we have examined representations in PDDL and STRIPS
• Semantics – we have examined progression, regression and the use of heuristics in algorithms
• Complexity and decidability

### References

• D. McDermott. “PDDL – The Planning Domain Definition Language”, Technical Report CVC TR-98-003/DCS TR-1165, Yale Center for Computational Vision and Control, 1998.
• N. Nilsson and R. Fikes. “STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving”, SRI Technical Note 43, 1970. http://www.ai.sri.com/pubs/files/tn043r-fikes71.pdf
• S. Russell and P. Norvig. “Artificial Intelligence: A Modern Approach” (2nd Edition), Prentice Hall, 2002.
• M. Ghallab, D.S. Nau, and P. Traverso. “Automated Planning: Theory and Practice”, Morgan Kaufmann, 2004, ISBN 1-55860-856-7.
• T. Bylander. “The Computational Complexity of Propositional STRIPS Planning”, Artificial Intelligence Journal, 1994.
• http://en.wikipedia.org/wiki/Automated_planning_and_scheduling
• http://en.wikipedia.org/wiki/STRIPS
• http://en.wikipedia.org/wiki/Planning_Domain_Definition_Language

### Agenda

1. Motivation
2. Technical solution and illustrations
• Definitions
• Properties
• Environments
• Agents as intentional systems
• Abstract architecture
• Multi-agent systems
3. Large example: Robocup
4. Extensions
5. Summary
6. References

### Motivation

• Automatically carry out tasks that a human user would normally have to do.
• Agents are supposed to carry out (possibly complex) tasks autonomously.
• Idea: “intelligent agents” react to changes and act in order to reach specific goals.

### Motivation

• The agent perceives the environment via sensors and influences it through actuators.
• Agents can carry out tasks autonomously and react to changes in their environment.

### Motivating example

Assume the tasks a taxi driver has to complete:
• Perception: camera, speedometer, GPS
• Actions: steer, change gear, brake, talk to guest
• Goals: safe, fast, legal, comfortable ride; maximize profit
• Environment: streets, other actors, guests

### Definition

• There are many different definitions from various areas, such as software engineering, classical logics, logic programming, and robotics:
• Genesereth/Ketchpel: A program is a software agent if it communicates correctly in an agent language, such as ACL (Agent Communication Language) or KQML (Knowledge Query and Manipulation Language).
• BDI (Belief, Desire and Intentions) agents are described by their beliefs, desires, and intentions. These correspond to three modalities of a complex modal logic, which can be found in the data structures of the system.

### Definition

• Kowalski follows a traditional approach of logic based agents and uses logic programming for the implementation of agents.
• Shoham‘s definition is more focused: a piece of hardware or software is an agent if one analyses it with the help of mental terms.
• Wooldridge/Jennings consider hardware or software an agent if it is:
• autonomous (independently follows its goals)
• social (cooperates with humans or other agents)
• pro-active (takes initiative) and
• reactive (perceives its environment and reacts to changes).
• An agent is a computer system capable of autonomous action in some environment in order to meet its design objectives.

### Definition

• Autonomous entities that perceive their environment and act upon it.
• Autonomy is the ability to control their own behavior and act without human intervention.
• Agents pursue goals in such a way as to optimize some given performance measure
• They operate flexibly and rationally in a variety of circumstances
• Does NOT include omniscience, omnipotence, or perfection

### Properties

1. Interaction
2. Reactivity
3. Proactiveness
4. Balancing Reactive and Goal-Oriented Behavior
5. Social Ability
6. Mobility
7. Veracity
8. Benevolence
9. Rationality

### Interaction

• Agents may be affected by other agents (including humans) in pursuing their goals
• May take place directly via a communication language
• May take place indirectly via the environment
• Agents sense the actions of other agents and react accordingly

### Reactivity

• If a program’s environment is guaranteed to be fixed, the program need never worry about its own success or failure – program just executes blindly
• Example of fixed environment: compiler
• The real world is not like that: things change, information is incomplete. Many (most?) interesting environments are dynamic
• Software is hard to build for dynamic domains: program must take into account possibility of failure – ask itself whether it is worth executing!
• A reactive system is one that maintains an ongoing interaction with its environment, and responds to changes that occur in it (in time for the response to be useful)

### Proactiveness

• Reacting to an environment is easy (e.g., stimulus → response rules)
• But we generally want agents to do things for us
• Hence goal directed behavior
• Pro-activeness = generating and attempting to achieve goals; not driven solely by events; taking the initiative
• Recognizing opportunities

### Balancing Reactive and Goal-Oriented Behavior

• We want our agents to be reactive, responding to changing conditions in an appropriate (timely) fashion
• We want our agents to systematically work towards long-term goals
• These two considerations can be at odds with one another
• Designing an agent that can balance the two remains an open research problem

### Social Ability

• The real world is a multi-agent environment: we cannot go around attempting to achieve goals without taking others into account
• Some goals can only be achieved with the cooperation of others
• Similarly for many computer environments: witness the Internet
• Social ability in agents is the ability to interact with other agents (and possibly humans) via some kind of agent-communication language, and perhaps cooperate with others

### Other Properties

Other properties, sometimes discussed in the context of agents:

• mobility : the ability of an agent to move around an electronic network
• veracity : an agent will not knowingly communicate false information
• benevolence : agents do not have conflicting goals, and that every agent will therefore always try to do what is asked of it
• rationality : agent will act in order to achieve its goals, and will not act in such a way as to prevent its goals being achieved — at least insofar as its beliefs permit
• learning/adaption : agents improve performance over time

### Environments – Accessible vs. inaccessible

• An accessible environment is one in which the agent can obtain complete, accurate, up-to-date information about the environment’s state
• Most moderately complex environments (including, for example, the everyday physical world and the Internet) are inaccessible
• The more accessible an environment is, the simpler it is to build agents to operate in it

### Environments –Deterministic vs. non-deterministic

• A deterministic environment is one in which any action has a single guaranteed effect — there is no uncertainty about the state that will result from performing an action
• The physical world can to all intents and purposes be regarded as non-deterministic
• Non-deterministic environments present greater problems for the agent designer

### Environments - Episodic vs. non-episodic

• In an episodic environment, the performance of an agent is dependent on a number of discrete episodes, with no link between the performance of an agent in different scenarios
• Episodic environments are simpler from the agent developer’s perspective because the agent can decide what action to perform based only on the current episode — it need not reason about the interactions between this and future episodes

### Environments - Static vs. dynamic

• A static environment is one that can be assumed to remain unchanged except by the performance of actions by the agent
• A dynamic environment is one that has other processes operating on it, and which hence changes in ways beyond the agent’s control
• Other processes can interfere with the agent’s actions (as in concurrent systems theory)
• The physical world is a highly dynamic environment

### Environments – Discrete vs. continuous

• An environment is discrete if there are a fixed, finite number of actions and percepts in it
• Russell and Norvig give a chess game as an example of a discrete environment, and taxi driving as an example of a continuous one
• Continuous environments have a certain level of mismatch with computer systems
• Discrete environments could in principle be handled by a kind of “lookup table”

### Agents as Intentional Systems

• When explaining human activity, it is often useful to make statements such as the following:
• Janine took her umbrella because she believed it was going to rain.
• Michael worked hard because he wanted to possess a PhD.
• These statements make use of a folk psychology, by which human behavior is predicted and explained through the attribution of attitudes, such as believing and wanting (as in the above examples), hoping, fearing, and so on.
• The attitudes employed in such folk psychological descriptions are called the intentional notions.

### Agents as Intentional Systems

• The philosopher Daniel Dennett coined the term intentional system to describe entities ‘whose behavior can be predicted by the method of attributing belief, desires and rational acumen’
• Dennett identifies different ‘grades’ of intentional system:‘A first-order intentional system has beliefs and desires (etc.) but no beliefs and desires about beliefs and desires. …A second-order intentional system is more sophisticated; it has beliefs and desires (and no doubt other intentional states) about beliefs and desires (and other intentional states) — both those of others and its own’

### Agents as Intentional Systems

• The intentional notions are thus abstraction tools, which provide us with a convenient and familiar way of describing, explaining, and predicting the behavior of complex systems
• Remember: most important developments in computing are based on new abstractions:
• procedural abstraction
• abstract data types
• objects
• Agents, and agents as intentional systems, represent a further, and increasingly powerful abstraction
• So agent theorists start from the (strong) view of agents as intentional systems: one whose simplest consistent description requires the intentional stance

### Agents as Intentional Systems

• This intentional stance is an abstraction tool — a convenient way of talking about complex systems, which allows us to predict and explain their behavior without having to understand how the mechanism actually works
• Now, much of computer science is concerned with looking for abstraction mechanisms (witness procedural abstraction, ADTs, objects,…)
• So why not use the intentional stance as an abstraction tool in computing — to explain, understand, and, crucially, program computer systems?
• This is an important argument in favor of agents

### Abstract Architecture for Agents

• Assume the environment may be in any of a finite set E of discrete, instantaneous states: E = {e, e′, …}
• Agents are assumed to have a repertoire of possible actions available to them, which transform the state of the environment: Ac = {α, α′, …}
• A run, r, of an agent in an environment is a sequence of interleaved environment states and actions:

r : e0 --α0--> e1 --α1--> e2 --α2--> … --α(u−1)--> eu

### Abstract Architecture for Agents

• Let:
• R be the set of all such possible finite sequences (over E and Ac)
• R^Ac be the subset of these that end with an action
• R^E be the subset of these that end with an environment state

### State Transformer Functions

• A state transformer function represents the behavior of the environment:

τ : R^E → ℘(E)
• Note that environments are…
• history dependent
• non-deterministic
• If τ( r )=∅, then there are no possible successor states to r . In this case, we say that the system has ended its run
• Formally, we say an environment Env is a triple Env = ⟨E, e0, τ⟩ where: E is a set of environment states, e0 ∈ E is the initial state, and τ is a state transformer function

### Agents

• An agent is a function which maps runs (ending in an environment state) to actions:

Ag : R^E → Ac
• An agent makes a decision about what action to perform based on the history of the system that it has witnessed to date. Let AG be the set of all agents

### Systems

• A system is a pair containing an agent and an environment
• Any system will have associated with it a set of possible runs; we denote the set of runs of agent Ag in environment Env by R (Ag, Env)
• (We assume R (Ag, Env) contains only terminated runs)

### Systems

• Formally, a sequence

(e0, α0, e1, α1, e2, …)

represents a run of an agent Ag in environment Env = ⟨E, e0, τ⟩ if:
1. e0 is the initial state of Env
2. α0 = Ag(e0); and
3. for u > 0, eu ∈ τ((e0, α0, …, αu−1)) and αu = Ag((e0, α0, …, eu))

### Purely Reactive Agents

• Some agents decide what to do without reference to their history — they base their decision making entirely on the present, with no reference at all to the past
• We call such agents purely reactive:

action : E → Ac
• A thermostat is a purely reactive agent
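The thermostat can be sketched as a purely reactive agent in Python: the action depends only on the current environment state, never on the run history. The 20-degree set point and the action names are illustrative assumptions, not part of the slides:

```python
# Purely reactive agent: action : E -> Ac, a map from the current
# environment state alone (no history) to an action.
def thermostat(temperature: float) -> str:
    """Map the current temperature (the environment state) to an action.
    The 20-degree set point is an illustrative assumption."""
    return "heater_on" if temperature < 20.0 else "heater_off"
```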

### Perception

• Now introduce perception system:

### Perception

• The see function is the agent’s ability to observe its environment, whereas the action function represents the agent’s decision-making process
• Output of the see function is a percept:

see : E → Per

which maps environment states to percepts, and action is now a function

action : Per* → Ac

which maps sequences of percepts to actions

### Agents with State

• We now consider agents that maintain state:

### Agents with State

• These agents have some internal data structure, which is typically used to record information about the environment state and history. Let I be the set of all internal states of the agent.
• The perception function see for a state-based agent is unchanged:

see : E → Per

The action-selection function action is now defined as a mapping

action : I → Ac

from internal states to actions. An additional function next is introduced, which maps an internal state and percept to an internal state:

next : I × Per → I

### Agent Control Loop

1. Agent starts in some initial internal state i0
2. Observes its environment state e , and generates a percept see ( e )
3. Internal state of the agent is then updated via next function, becoming next ( i0 , see ( e ))
4. The action selected by the agent is action ( next ( i0 , see ( e )))
5. Goto 2
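The five steps above can be sketched in Python; `see`, `next_state`, and `select_action` are illustrative stand-ins for the abstract functions see : E → Per, next : I × Per → I, and action : I → Ac:

```python
# Sketch of the state-based agent control loop from the slides.

def see(env_state):
    """Perception: here the percept is simply the raw state (illustrative)."""
    return env_state

def next_state(internal, percept):
    """Update the internal state by remembering the latest percept."""
    return internal + [percept]

def select_action(internal):
    """Decide based on the internal state (here: count of percepts seen)."""
    return f"act_{len(internal)}"

def control_loop(env_states):
    """Observe, update internal state, act - once per environment state."""
    internal = []            # i0: the initial internal state
    actions = []
    for e in env_states:     # step 2: observe environment state e
        percept = see(e)
        internal = next_state(internal, percept)   # step 3: update state
        actions.append(select_action(internal))    # step 4: select action
    return actions
```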

• We build agents in order to carry out tasks for us
• The task must be specified by us…
• But we want to tell agents what to do without telling them how to do it

### Utility Functions over States

• One possibility: associate utilities with individual states — the task of the agent is then to bring about states that maximize utility
• A task specification is a function

u : E → ℝ

which associates a real number with every environment state

### Utility Functions over States

• But what is the value of a run…
• minimum utility of state on run?
• maximum utility of state on run?
• sum of utilities of states on run?
• average?
• Disadvantage: difficult to specify a long term view when assigning utilities to individual states (One possibility: a discount for states later on.)

### Utilities over Runs

• Another possibility: assign a utility not to individual states, but to runs themselves:

u : R → ℝ
• Such an approach takes an inherently long term view
• Other variations: incorporate probabilities of different states emerging
• Difficulties with utility-based approaches:
• where do the numbers come from?
• we don’t think in terms of utilities!
• hard to formulate tasks in these terms

### Expected Utility & Optimal Agents

• Write P(r | Ag, Env) to denote the probability that run r occurs when agent Ag is placed in environment Env

Note:

Σ_{r ∈ R(Ag, Env)} P(r | Ag, Env) = 1

• The optimal agent Ag_opt in an environment Env is then the one that maximizes expected utility:

Ag_opt = arg max_{Ag ∈ AG} Σ_{r ∈ R(Ag, Env)} u(r) · P(r | Ag, Env)    (1)
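The expected utility of an agent sums u(r) · P(r | Ag, Env) over its runs, and the optimal agent maximizes this quantity. A minimal sketch, assuming runs are represented as hypothetical (utility, probability) pairs:

```python
# Expected utility: EU(Ag, Env) = sum over runs r of u(r) * P(r | Ag, Env).
# Runs, utilities, and probabilities are hypothetical illustrative values.

def expected_utility(runs):
    """runs: list of (utility, probability) pairs for one agent.
    The probabilities P(r | Ag, Env) must sum to 1."""
    total_p = sum(p for _, p in runs)
    assert abs(total_p - 1.0) < 1e-9, "P(r | Ag, Env) must sum to 1"
    return sum(u * p for u, p in runs)

def optimal_agent(agents):
    """agents: dict mapping an agent name to its (utility, probability) runs.
    Returns the name of the agent maximizing expected utility."""
    return max(agents, key=lambda ag: expected_utility(agents[ag]))
```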

### Bounded Optimal Agents

• Some agents cannot be implemented on some computers
(a function Ag : R^E → Ac may need more than the available memory to implement)
• Write AG_m to denote the agents that can be implemented on machine (computer) m:

AG_m = {Ag | Ag ∈ AG and Ag can be implemented on m}

• We can replace equation (1) with the following, which defines the bounded optimal agent Ag_opt:

Ag_opt = arg max_{Ag ∈ AG_m} Σ_{r ∈ R(Ag, Env)} u(r) · P(r | Ag, Env)

• A special case of assigning utilities to histories is to assign 0 (false) or 1 (true) to a run
• If a run is assigned 1, then the agent succeeds on that run, otherwise it fails
• Call these predicate task specifications
• Denote predicate task specification by Ψ.
Thus Ψ : R → {0, 1}.

• A task environment is a pair ⟨Env, Ψ⟩ where Env is an environment and

Ψ : R → {0, 1}

is a predicate over runs. Let TE be the set of all task environments.
• A task environment specifies:
• the properties of the system the agent will inhabit
• the criteria by which an agent will be judged to have either failed or succeeded

• Write R_Ψ(Ag, Env) to denote the set of all runs of the agent Ag in environment Env that satisfy Ψ:

R_Ψ(Ag, Env) = {r | r ∈ R(Ag, Env) and Ψ(r) = 1}

• We then say that an agent Ag succeeds in task environment ⟨Env, Ψ⟩ if

R_Ψ(Ag, Env) = R(Ag, Env)

1. Achievement tasks are those of the form “achieve state of affairs Φ”
2. Maintenance tasks are those of the form “maintain state of affairs Ψ”

• An achievement task is specified by a set G of “good” or “goal” states: G ⊆ E

The agent succeeds if it is guaranteed to bring about at least one of these states (we do not care which one — they are all considered equally good).

• A maintenance goal is specified by a set B of “bad” states: B ⊆ E

The agent succeeds in a particular environment if it manages to avoid all states in B — if it never performs actions which result in any state in B occurring
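Both task types can be expressed as predicates over runs. A minimal sketch, assuming a run is represented simply as the list of environment states it visits (actions are omitted for brevity):

```python
# Predicate task specifications over runs.

def achieves(run, good_states):
    """Achievement task: succeed iff some state in G occurs on the run."""
    return any(state in good_states for state in run)

def maintains(run, bad_states):
    """Maintenance task: succeed iff no state in B ever occurs on the run."""
    return all(state not in bad_states for state in run)
```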

### Agent Synthesis

• Agent synthesis is automatic programming: the goal is to have a program that will take a task environment, and from this task environment automatically generate an agent that succeeds in this environment:

syn : TE → (AG ∪ {⊥})

(Think of ⊥ as being like null in Java.)
• Synthesis algorithm is:
• sound if, whenever it returns an agent, then this agent succeeds in the task environment that is passed as input
• complete if it is guaranteed to return an agent whenever there exists an agent that will succeed in the task environment given as input

### Agent Synthesis

• A synthesis algorithm syn is sound if it satisfies the following condition:

syn(⟨Env, Ψ⟩) ∈ AG implies R(syn(⟨Env, Ψ⟩), Env) = R_Ψ(syn(⟨Env, Ψ⟩), Env)

and complete if:

∃Ag ∈ AG such that R(Ag, Env) = R_Ψ(Ag, Env) implies syn(⟨Env, Ψ⟩) ≠ ⊥

### Multi Agent Systems

• Multiagent Systems: several agents coordinate their knowledge and activities by reasoning about the problem-solving process
• Distributed Problem Solving: a particular problem is solved by dividing tasks among a number of generally equivalent nodes that divide and share knowledge about the problem
• Modern multiagent systems actually cover both

### Characteristics of Multiagent Systems

• Each agent has incomplete information
• Control is decentralized
• Data is decentralized
• Computation is asynchronous

### Diversity of Multiagent Systems

• Agents: number, uniformity, goals, architecture, abilities (sensors & effectors)
• Interaction: frequency, persistence, level, pattern (flow of control), variability, purpose
• Environment: predictability, accessibility, dynamics, diversity, availability of resources

### Common Application Characteristics

• Inherent Distribution
• Geographically
• Temporally
• Semantics – requires different ontologies and languages
• Functional – requires different cognitive capabilities
• Inherent Complexity
• Too large to be solved by single, centralized system

### Properties of Multiagent Systems

• Speed & Efficiency

• Robustness & Reliability

• Scalability & Flexibility

• Cost

• Distributed Development

• Reusability

### Challenging Issues in Multi-Agent Systems

• When and how should agents interact – cooperate and compete – to successfully meet their design objectives?
• Two approaches
• Bottom up – search for specific agent-level capabilities that result in sufficient group capabilities
• Top down – search for group-level conventions that appropriately constrain interaction at the agent level
• Leads to several interesting issues …

### Challenging Issues in Multi-Agent Systems

1. How to enable agents to decompose their tasks and goals (and allocate sub-goals and sub-tasks to other agents) and synthesize partial results

2. How to enable agents to communicate, what languages and protocols to use

3. How to enable agents to represent and reason about the actions, plans, and knowledge of other agents in order to interact with them

### Challenging Issues in Multi-Agent Systems

1. How to enable agents to represent and reason about the state of their interactions

2. How to enable agents to recognize and handle conflicts between agents

3. How to engineer practical multiagent systems

### Challenging Issues in Multi-Agent Systems

1. How to effectively balance local computation versus communication

2. How to avoid or mitigate harmful (chaotic or oscillatory) system wide behavior

3. How to enable agents to negotiate and contract with each other

4. How to form and dissolve organizational structures to meet specific goals and objectives

### Challenging Issues in Multi-Agent Systems

1. How to formally describe multiagent systems and the interaction between agents and how to ensure multiagent systems are correctly specified

2. How to realize intelligent processes such as problem solving, planning, decision making, and learning in a multiagent systems context

### Applications of Multiagent Systems

• Electronic commerce

• Real-time monitoring and control of networks

• Modeling and control of transportation systems

• Information handling

• Automatic meeting scheduling

### Applications of Multiagent Systems (cont.)

• Industrial manufacturing and production

• Electronic entertainment

• Re-engineering of information flow in large organizations

• Investigation of complex social phenomena such as evolution of roles, norms, and organizational structures

### Overview of RoboCup

• The Robot World Cup Initiative
• “By the year 2050, develop a team of fully autonomous humanoid robots that can win against the human world soccer champion team.”
www.robocup.org
• A standard problem for AI research
• Started in 1992 as the Robot J-League (Japan)
• First games and conferences in 1997
• Workshops, conferences and yearly competition

### Robocup Leagues: humanoid

In the humanoid league, autonomous robots with a human-like body plan and human-like senses play soccer against each other. In addition to soccer competitions, technical challenges take place. Research issues addressed in the humanoid league include:
• Walking
• Running
• Kicking the ball while maintaining balance
• Visual perception of the ball, other players, and the field
• Self-localization
• Team play
Several of the best autonomous humanoid robots in the world compete in the RoboCup Humanoid League. http://www.tzi.de/humanoid/bin/view/Website/WebHome

### Robocup Leagues: small size

• A Small Size robot soccer game takes place between two teams of five robots each.
• Each robot must conform to the dimensions specified in the F180 rules:
• The robot must fit within a 180 mm diameter circle.
• It must be no higher than 15 cm unless it uses on-board vision.
• Robots come in two flavours, those with local on-board vision sensors and those with global vision.
• Global vision robots use an overhead camera and off-field PC to identify and track the robots as they move around the field.
• Local vision robots have their sensing on the robot itself.
• The vision information is either processed on-board the robot or is transmitted back to the off-field PC for processing.
• Communication is wireless and typically uses dedicated commercial FM transmitter/receiver units.

### Robocup Leagues: middle size

• The middle size league involves robots that must fit inside a 50 cm × 50 cm × 80 cm box.
• The maximum weight of a robot is 40 kg.
• Two teams of 5 mid-sized robots with all sensors on-board play soccer on a field.
• Communication is wireless and typically uses dedicated commercial FM transmitter/receiver units.

### Robocup Leagues: simulation

The simulation league does not involve real robots but simulates their behavior on a screen. It consists of a number of competitions with computer-simulated soccer matches as the main event.
• There are no actual robots in this league, but spectators can watch the action on a large screen, which looks like a giant computer game.
• Each simulated robot player may have its own play strategy and characteristics, and every simulated team actually consists of a collection of programs.
• Many computers are networked together in order for this competition to take place.
• The games last for about 10 minutes, with each half being 5 minutes in duration.

### Challenges involved in Robocup Simulation League

• Dynamic behavior of agents (players) must be enabled.
• There are limited means of communication because of limited shared bandwidth, which might lead to delays.
• Players must be able to perform real-time actions.
• There is a tremendous state space.
• The multi-agent system must allow cooperation and competition.

### Robocup System – Overview for simulation league

• Provides a platform to develop software techniques, without the necessity of creating physical robots.
• Consists of three main applications:
• Soccer Server
• Soccer Monitor
• Soccer Player(s) - agents

### Robocup System – Soccer Server

• A system allowing several autonomous program agents to play a virtual soccer game.

• The games are carried out in a client/server style – where each client = one player.

• The communication can be done over a network connection, or even the Internet.

### Robocup System – Monitor

• Used to display the visual progress of the game.

• Several Monitors can be connected to the server.

• It can also be used to interrupt game play by doing simple tasks such as dropping a ball.

### Robocup System – Soccer Team - players

• The agents are the brains of the players.

• Players receive sensory information from the server, upon which decisions are made.

• Commands are formatted and sent to the server.

### Robocup System – Team – coaches

• Privileged clients used to provide assistance.
• They receive a noise-free view of the whole field.
• They can only send occasional messages to players (info, advice, freeform, etc.).
• Used for
• opponent modelling,
• game analysis,
• giving strategic tips to teammates.

### Some RoboCup Teams

• UvA Trilearn (Amsterdam) (2003 champion)
• CMUnited (Carnegie Mellon)
• Everest (China)
• FC Portugal 2003 (Portugal)
• HELIOS (Japan)
• Magma Furtwangen (Germany)

### Agents and Objects

• Are agents just objects by another name?
• Object:
• encapsulates some state
• communicates via message passing
• has methods, corresponding to operations that may be performed on this state

### Agents and Objects

• Main differences:
• agents are autonomous:
agents embody stronger notion of autonomy than objects, and in particular, they decide for themselves whether or not to perform an action on request from another agent
• agents are smart:
capable of flexible (reactive, pro-active, social) behavior, and the standard object model has nothing to say about such types of behavior
• agents are active:
a multi-agent system is inherently multi-threaded, in that each agent is assumed to have at least one thread of active control

### Agents and Expert Systems

• Aren’t agents just expert systems by another name?
• Expert systems are typically disembodied ‘expertise’ about some (abstract) domain of discourse (e.g., blood diseases)
• Example: MYCIN knows about blood diseases in humans
• It has a wealth of knowledge about blood diseases, in the form of rules
• A doctor can obtain expert advice about blood diseases by giving MYCIN facts, answering questions, and posing queries

### Agents and Expert Systems

• Main differences:
• agents situated in an environment:
MYCIN is not aware of the world — the only information it obtains is by asking the user questions
• agents act:
MYCIN does not operate on patients
• Some real-time (typically process control) expert systems are agents

### Intelligent Agents and AI

• When building an agent, we simply want a system that can choose the right action to perform, typically in a limited domain
• We do not have to solve all the problems of AI to build a useful agent:
• a little intelligence goes a long way!
• Oren Etzioni, speaking about the commercial experience of NETBOT, Inc:“We made our agents dumber and dumber and dumber…until finally they made money.”

### Summary

• An agent perceives the environment and acts.
• Agent = architecture + program
• An ideal agent takes the action that maximizes performance for a given percept sequence.
• Actions of an autonomous agent depend on experience.
• Mapping of percepts to actions.
• Reactive agents act on percepts; goal-oriented agents act to reach a goal; utility-based agents maximize their utility.
• Representation of knowledge!
• Various environments. Most difficult: inaccessible, non-episodic, dynamic, and continuous.

### Summary

• Multiagent systems are systems in which multiple agents interact to solve problems

• Key concepts of multiagent systems are agents and agent coordination

• There are many important issues for multiagent systems concerning when, how, and with whom agents should interact

### Summary

• Common characteristics of multiagent systems are their inherent distribution and complexity

• Distributed and flexible nature of multiagent systems leads to increased speed, robustness, scalability and reusability

### References

• G. Görz et al., Handbuch der künstlichen Intelligenz,
Oldenbourg, 2003.
Kapitel 24: Software Agenten.
• Bradshaw, J.: “An Introduction to Software Agents”. AAAI Press/The MIT Press.
• Finin, Tim and Fritzson, Richard and McKay, Don and McEntire, Robin: KQML as an agent communication language. CIKM '94: Proceedings of the third international conference on Information and knowledge management. ACM. 1994.

### References

• Wooldridge, M. & Jennings, N.: “Intelligent Agents: Theory and Practice”. The Knowledge Engineering Review, Vol. 10.
• Nwana, H.: “Software Agents: An Overview”. The Knowledge Engineering Review, Vol. 11, No. 3.
• Rao, A. & Georgeff, M.: “BDI Agents: From Theory to Practice”. Tech. Rep. 56, Australian Artificial Intelligence Institute, Melbourne, Australia, Apr 1995.
• http://www.robocup.org/

### References

• http://en.wikipedia.org/wiki/Agent
• http://en.wikipedia.org/wiki/Software_agent
• http://en.wikipedia.org/wiki/Intelligent_agent
• http://en.wikipedia.org/wiki/Mycin
• http://de.wikipedia.org/wiki/Knowledge_Query_and_Manipulation_Language
• http://en.wikipedia.org/wiki/RoboCup

### Tom Mitchell “Machine Learning”

• Slides in this lecture are partially based
on [1], Section 3 “Decision Tree Learning”.

### Outline

• Motivation
• Technical Solution
• Rule learning approaches
• Specialization
• The ID3 Algorithm
• Generalization
• The RELAX Algorithm
• Combining specialization and generalization
• The JoJo Algorithm
• Refinement of rule sets with JoJo
• Illustration by a Larger Example
• Extensions
• The C4.5 Algorithm
• Summary

### Motivation: Definition of Learning in AI

• Research on learning in AI is made up of diverse subfields (cf. [5])

• “Learning as adaptation”: Adaptive systems monitor their own performance and attempt to improve it by adjusting internal parameters, e.g.,
• Self-improving programs for playing games
• Pole balancing
• Problem solving methods

• “Learning as the acquisition of structured knowledge” in the form of
• Concepts
• Production rules
• Discrimination nets

### Machine Learning

• Machine learning (ML) is concerned with the design and development of algorithms that allow computers to change behavior based on data. A major focus is to automatically learn to recognize complex patterns and make intelligent decisions based on data (Source: Wikipedia)
• Driving question: “How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?” (cf. [2])
• Three niches for machine learning [1]:
• Data mining: using historical data to improve decisions, e.g. from medical records to medical knowledge
• Software applications hard to program by hand, e.g. autonomous driving or speech recognition
• Self-customizing programs, e.g., a newsreader that learns its user’s interests
• Practical success of ML:
• Speech recognition
• Computer vision, e.g., to recognize faces, to automatically classify microscope images of cells, or to recognize handwriting
• Biosurveillance, e.g., to detect and track disease outbreaks
• Robot control

### Rule learning

• Machine learning is a central research area in AI for acquiring knowledge [1].

• Rule learning
• is a popular approach for discovering interesting relations in data sets and for acquiring structured knowledge.
• is a means to alleviate the knowledge acquisition “bottleneck” problem in expert systems [4].

### Simple Motivating Example

• I = {milk, bread, butter, beer}

• Sample database

| ID | Milk | Bread | Butter | Beer |
|----|------|-------|--------|------|
| 1  | 1    | 1     | 1      | 0    |
| 2  | 0    | 1     | 1      | 0    |
| 3  | 0    | 0     | 0      | 1    |
| 4  | 1    | 1     | 1      | 0    |
| 5  | 0    | 1     | 0      | 0    |
• A possible rule would be {milk, bread} → {butter}
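The quality of a candidate rule such as {milk, bread} → {butter} is commonly measured by support and confidence (these measure names come from standard association-rule mining, not from the slides). A sketch computing both for the sample database above:

```python
# Support and confidence of an association rule X -> Y over the sample
# transaction database from the slides (one set of items per transaction).

transactions = [
    {"milk", "bread", "butter"},   # ID 1
    {"bread", "butter"},           # ID 2
    {"beer"},                      # ID 3
    {"milk", "bread", "butter"},   # ID 4
    {"bread"},                     # ID 5
]

def support(itemset, db):
    """Fraction of transactions containing every item of the itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(antecedent, consequent, db):
    """Estimated P(consequent | antecedent) over the database."""
    return support(antecedent | consequent, db) / support(antecedent, db)
```

For the rule {milk, bread} → {butter}: transactions 1 and 4 contain both milk and bread, and both also contain butter, so the rule has support 0.4 and confidence 1.0.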

### Decision Trees

• Many inductive knowledge acquisition algorithms generate (“induce”) classifiers in the form of decision trees.

• A decision tree is a simple recursive structure for expressing a sequential classification process.
• Leaf nodes denote classes
• Intermediate nodes represent tests

• Decision trees classify instances by sorting them down the tree from the root to some leaf node which provides the classification of the instance.

### Decision Tree: Example

• {Outlook = Sunny, Temperature = Hot,
Humidity = High, Wind = Strong}:
Negative Instance
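Sorting an instance down a tree can be sketched with the tree encoded as nested dicts. The tree below is assumed to be the standard PlayTennis tree from Mitchell [1] (its structure is not reproduced in these notes), under which the instance above classifies as negative:

```python
# The PlayTennis decision tree as nested dicts: an intermediate node is
# {attribute: {value: subtree}}, a leaf is a class label string.
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def classify(node, instance):
    """Sort the instance down the tree, root to leaf, and return the leaf."""
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[instance[attribute]]   # follow the matching branch
    return node
```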

### Decision Trees and Rules

• Rules can represent a decision tree:
if item1 then subtree1
elseif item2 then subtree2
elseif...
• There are as many rules as there are leaf nodes in the decision tree.
• Advantage of rules over decision trees:
• Rules are a widely-used and well-understood representation formalism for knowledge in expert systems;
• Rules are easier to understand, modify and combine; and
• Rules can significantly improve classification performance by eliminating unnecessary tests.
• Rules make it possible to combine different decision trees for the same task.
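Extracting one rule per leaf, as described above, can be sketched over the same nested-dict tree encoding; the small example tree in the test is illustrative:

```python
# Convert a decision tree (nested dicts: {attribute: {value: subtree}},
# leaves are class labels) into one rule per leaf. Each rule is a pair
# (list of (attribute, value) tests, class label).

def tree_to_rules(node, path=()):
    if not isinstance(node, dict):          # leaf: emit one finished rule
        return [(list(path), node)]
    attribute, branches = next(iter(node.items()))
    rules = []
    for value, subtree in branches.items():  # one path per branch value
        rules += tree_to_rules(subtree, path + ((attribute, value),))
    return rules
```

As the slides note, this yields exactly as many rules as the tree has leaf nodes.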

• Decision trees are simple to understand.

People are able to understand decision tree models after a brief explanation.

• Decision trees have a clear evaluation strategy

Decision trees are easily interpreted

• Decision trees are able to handle both nominal and numerical data

Other techniques are usually specialised in analysing datasets that have only one type of variable. E.g., neural networks can be used only with numerical variables.

### Advantages of Decision Trees (1)

• Decision trees are a white box model.

If a given situation is observable in a model, the explanation for the condition is easily derived using Boolean logic. An example of a black box model is an artificial neural network, since the explanation for its results is excessively complex to be comprehended.

• Decision trees are robust and perform well on large data sets in a short time.

Large amounts of data can be analysed using personal computers in a time short enough to enable stakeholders to take decisions based on the analysis.

### Converting a Decision Tree to Rules

• There are two major rule learning tasks:

1. Learning of correct rules that cover a maximal number of positive examples but no negative ones. We call this the maximal requirement for each rule.

2. Learning of a minimal rule set that covers all positive examples. We call this the minimal requirement for the rule set.

To illustrate the two rule learning tasks, let’s consider a data set containing both positive and negative examples, as depicted in the following chart.

### Maximal requirement for each rule

We want to learn a correct rule that covers a maximal number of positive examples but no negative examples (the maximal requirement for each rule).

### Maximal requirement for each rule

We start by deriving a rule that covers the positive example (x=3;y=3):

If x=3 and y=3 then class = positive

### Maximal requirement for each rule

We try to cover more positive examples by refining the rule, while making sure that we don’t cover negative examples. The new rule
if x>=2 and x<=4 and y=3 then class = positive
covers the positive examples (x=2;y=3), (x=3;y=3), (x=4;y=3)

### Maximal requirement for each rule

We try to cover more positive examples by further refining the rule. The new rule
if x>=2 and x<=4 and y>=3 and y<=4 then class = positive
covers the positive examples (x=2;y=3), (x=3;y=3), (x=4;y=3), (x=2;y=4), (x=3;y=4), (x=4;y=4)

### Maximal requirement for each rule

We try to cover more positive examples by further refining the rule. The new rule
if x>=2 and x<=4 and y>=2 and y<=5 then class = positive
covers all positive examples in the lower part of the space

### Maximal requirement for each rule

We cannot refine the rule
if x>=2 and x<=4 and y>=2 and y<=5 then class = positive
any further to cover more positive examples without also covering negative examples. To cover the rest of the positive examples we need to learn a new rule.

### Maximal requirement for each rule

Following the same approach as illustrated before, we can learn a second rule that covers the rest of the positive examples:
if x>=5 and x<=6 and y>=6 and y<=8 then class = positive

### Maximal requirement for each rule

The full set of positive examples is covered by the following two rules:
if x>=2 and x<=4 and y>=2 and y<=5 then class = positive
if x>=5 and x<=6 and y>=6 and y<=8 then class = positive
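A rule set like this one can be checked mechanically against example data: every positive example must be covered by some rule, and no negative example by any. A sketch with the two rules above; since the chart’s data points are not reproduced in these notes, the positive and negative points below are illustrative, chosen to be consistent with the two rules:

```python
# Check that a rule set covers all positive examples and no negative ones.
# Each rule is a predicate on an (x, y) point; these are the two rules
# learned in the walkthrough.

rules = [
    lambda x, y: 2 <= x <= 4 and 2 <= y <= 5,
    lambda x, y: 5 <= x <= 6 and 6 <= y <= 8,
]

def covered(point):
    """True iff at least one rule fires for the point."""
    return any(rule(*point) for rule in rules)

# Illustrative data points consistent with the rules above.
positives = [(3, 3), (2, 3), (4, 3), (2, 4), (3, 4), (4, 4), (5, 6), (6, 8)]
negatives = [(1, 1), (5, 3), (7, 7)]
```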

### Minimal requirement for the rule set

In general, for a decision problem, there are many possible sets of rules that cover all positive examples without covering negative examples. Possible sets of rules for our previous example are:

#### Solution 1:

if x>=2 and x<=4 and y>=2 and y<=5 then class = positive
if x>=5 and x<=6 and y>=6 and y<=8 then class = positive

### Minimal requirement for the rule set

#### Solution 2:

if x>=2 and x<=4 and y>=2 and y<=3 then class = positive
if x>=2 and x<=4 and y>3 and y<=5 then class = positive
if x>=5 and x<=6 and y>=6 and y<=7 then class = positive
if x>=5 and x<=6 and y>7 and y<=8 then class = positive

### Minimal requirement for the rule set

• The minimal requirement for the rule set is about learning a minimal rule set that covers all positive examples.

• The minimal requirement for the rule set is similar to the minimal set cover problem [18]

• The minimal requirement for the rule set, which basically translates to building the optimal linear decision tree, is NP-hard [19].

### Rule learning approaches

• Rule learning can be seen as a search problem in the lattice of possible rules

• Depending on the starting point and on the direction of the search, we can distinguish three classes of rule learning approaches:
1. Specialization – the search procedure starts at the top of the lattice and searches towards the bottom of the lattice, towards the concrete descriptions. Specialization is top-down.
2. Generalization – the search procedure starts at the bottom of the lattice and advances up the lattice towards the top element. Generalization is bottom-up.
3. Combination of specialization and generalization – the search procedure can start at any arbitrary point in the lattice and can move freely up or down as needed.

### Rule learning approaches

• Specialization and Generalization are dual search directions in a given rule set.

### Specialization

• Specialization algorithms start from very general descriptions and specialize those until they are correct.
• This is done by adding additional premises to the antecedent of a rule, or by restricting the range of an attribute which is used in an antecedent.

• Algorithms relying on specialization generally have the problem of overspecialization: previous specialization steps could become unnecessary due to subsequent specialization steps.
• This brings along the risk for ending up with results that are not maximal-general.

• Some examples of (heuristic) specialization algorithms are the following: ID3, AQ, C4, CN2, CABRO, FOIL, or PRISM; references at the end of the lecture.

### ID3

• Most algorithms apply a top-down, greedy search through the space of possible trees.
• e.g., ID3 [5] or its successor C4.5 [6]
• ID3
• Learns trees by constructing them top down.
• Initial question: “Which attribute should be tested at the root of the tree?” Each attribute is evaluated using a statistical test to see how well it alone classifies the training examples.
• A descendant of the root node is created for each possible value of this attribute.
• Entire process is repeated using the training examples associated with each descendant node to select the best attribute to test at that point in the tree.

### Selecting the Best Classifier

• Core question: Which attribute is the best classifier?

• ID3 uses a statistical measure called information gain that measures how well a given attribute separates the training examples according to their target classification.

### Entropy

• Entropy characterizes the (im)purity of an arbitrary collection of examples.
• Given a collection S (containing positive and negative examples), the entropy of S relative to this boolean classification is

Entropy(S) = −p₊ log₂ p₊ − p₋ log₂ p₋

where p₊ / p₋ is the proportion of positive/negative examples in S. Entropy is 0 if all members of S belong to the same class; entropy is 1 when the collection contains an equal number of positive and negative examples. More generally, for a target attribute taking c different values:

Entropy(S) = Σᵢ₌₁..c −pᵢ log₂ pᵢ
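Entropy can be computed directly from the class counts of a collection. A short sketch; the [9+, 5−] collection is the PlayTennis training set from Mitchell [1], whose entropy is about 0.940:

```python
import math

def entropy(class_counts):
    """Entropy(S) = sum over classes of -p_i * log2(p_i),
    where p_i is the proportion of examples in class i."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)   # 0 * log(0) treated as 0
```

As the slides state, a pure collection has entropy 0 and a 50/50 split has entropy 1.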

### Information Gain

• Information gain measures the expected reduction in entropy caused by partitioning the examples according to an attribute:

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sᵥ| / |S|) · Entropy(Sᵥ)

• First term: entropy of the original collection S; second term: expected value of the entropy after S is partitioned using attribute A (Sᵥ is the subset of S for which attribute A has value v).

• Gain(S,A): The expected reduction in entropy caused by knowing the value of attribute A.

• ID3 uses information gain to select the best attribute at each step in growing the tree.
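The gain computation can be sketched from class counts per partition. The example uses the PlayTennis Wind attribute (S = [9+, 5−] split into Weak = [6+, 2−] and Strong = [3+, 3−], the standard counts from Mitchell [1]), which reproduces the Gain(S, Wind) = 0.048 figure quoted later in these slides:

```python
import math

def entropy(counts):
    """Entropy of a collection given its per-class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, partitions):
    """Gain(S, A): entropy of S minus the weighted entropy of the
    partitions S_v induced by attribute A (one count list per value v)."""
    total = sum(parent_counts)
    remainder = sum(sum(part) / total * entropy(part) for part in partitions)
    return entropy(parent_counts) - remainder
```

ID3 evaluates this for every candidate attribute and grows the tree with the highest-gain one.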

### Hypothesis Space Search in Decision Tree Learning

• ID3 performs a simple-to-complex hill-climbing search through the space of possible decision trees (the hypothesis search).
• Start: empty tree; progressively more elaborate hypotheses in search of a decision tree are tested.

### Capabilities and Limitations of ID3

• ID3’s hypothesis space is the complete space of finite discrete-valued functions, relative to the available attributes.

• ID3 maintains only a single current hypothesis, as it searches through the space of trees (earlier (perhaps better) versions are eliminated).

• ID3 performs no backtracking in its search; it may converge to locally optimal solutions that are not globally optimal.

• ID3 uses all training data at each step to make statistically based decisions regarding how to refine its current hypothesis.

### Inductive Bias in Decision Tree Learning

• Definition: Inductive bias is the set of assumptions that, together with the training data, deductively justifies the classifications assigned by the learner to future instances [1].
• Central question: How does ID3 generalize from observed training examples to classify unseen instances? What is its inductive bias?
• ID3 search strategy:
• ID3 chooses the first acceptable tree it encounters.
• ID3 selects in favour of shorter trees over longer ones.
• ID3 selects trees that place the attributes with the highest information gain closest to the root.
• Approximate inductive bias of ID3: Shorter trees are preferred over larger trees. Trees that place high information gain attributes close to the root are preferred over those that do not.

### ID3’s Inductive Bias

• ID3 searches a complete hypothesis space.

• It searches incompletely through this space, from simple to complex hypotheses, until its termination condition is met.
• Termination condition: Discovery of a hypothesis consistent with the data

• ID3’s inductive bias is a preference bias (it prefers certain hypotheses over others).

• Alternatively: A restriction bias categorically restricts considered hypotheses.

### Occam’s Razor

• …or why prefer short hypotheses?

• William of Occam was one of the first to discuss this question, around the year 1320.

• Several arguments have been discussed since then.

• Arguments:
• Based on combinatorial arguments there are fewer short hypotheses that coincidentally fit the training data.
• Very complex hypotheses fail to generalize correctly to subsequent data.

### ID3 by Example ([1])

• Target attribute: PlayTennis (values: yes / no)

### Example: Which attribute should be tested first in the tree?

• ID3 determines information gain for each candidate attribute (e.g., Outlook, Temperature, Humidity, and Wind) and selects the one with the highest information gain

• Gain(S, Outlook) = 0.246 ; Gain(S, Humidity) = 0.151; Gain(S, Wind) = 0.048; Gain(S, Temperature)=0.029

### Example: Continue Selecting New Attributes

• Determining the next attribute at the “Sunny” branch, using only the examples classified there:

• The process continues for each new leaf node until either of two conditions is met:
1. Every attribute has already been included along this path through the tree, or
2. The training examples associated with this leaf node all have the same target attribute value (i.e., their entropy is zero).
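The growing procedure with these two stopping conditions can be sketched recursively; this is a simplified ID3 (dict-based examples and all names are illustrative assumptions):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attr, target):
    g = entropy([e[target] for e in examples])
    for v in set(e[attr] for e in examples):
        sub = [e[target] for e in examples if e[attr] == v]
        g -= len(sub) / len(examples) * entropy(sub)
    return g

def id3(examples, attributes, target):
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:          # condition 2: entropy is zero
        return labels[0]
    if not attributes:                 # condition 1: no attributes left on this path
        return Counter(labels).most_common(1)[0][0]
    # greedy step: pick the attribute with the highest information gain
    best = max(attributes, key=lambda a: gain(examples, a, target))
    return {(best, v): id3([e for e in examples if e[best] == v],
                           [a for a in attributes if a != best], target)
            for v in set(e[best] for e in examples)}

data = [{"outlook": "sunny", "play": "no"},
        {"outlook": "sunny", "play": "no"},
        {"outlook": "rain", "play": "yes"}]
tree = id3(data, ["outlook"], "play")
```

Internal nodes are represented here as dicts keyed by (attribute, value) pairs, and leaves as class labels.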

### Generalization

• Generalization starts from very special descriptions and generalizes them as long as they are not incorrect, i.e. in every step some unnecessary premises are deleted from the antecedent.
• The generalization procedure stops if no more premises to remove exist.

• Generalization avoids the maximally-general issue of specialization; in fact, it guarantees most-general descriptions.
• However, generalization risks deriving final results that are not most-specific.

• RELAX is an example of a generalization-based algorithm; references at the end of the lecture.

### RELAX

• RELAX is a generalization algorithm, and proceeds as long as the resulting rule set is not incorrect.
• The RELAX algorithm:
• RELAX considers every example as a specific rule that is generalized.
• The algorithm then starts from a first rule and relaxes the first premise.
• The resulting rule is tested against the negative examples.
• If the new (generalized) rule covers negative examples, the premise is added again, and the next premise is relaxed.
• A rule is considered minimal, if any further relaxation would destroy the correctness of the rule.
• The search for minimal rules starts from any not yet considered example, i.e. examples that are not covered by already discovered minimal rules.

### RELAX: Illustration by an example

• Consider the following positive example for a consequent C:
(pos, (x=1, y=0, z=1))
• This example is represented as a rule: x ∧ ¬y ∧ z → C
• In case of no negative examples, RELAX constructs and tests the following set of rules:
 1) x ∧ ¬y ∧ z → C
 2) ¬y ∧ z → C
 3) x ∧ z → C
 4) x ∧ ¬y → C
 5) x → C
 6) ¬y → C
 7) z → C
 8) → C
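The relaxation loop can be sketched as follows; this is a simplified reading of RELAX, and encoding premises as attribute–value pairs is an assumption made for illustration:

```python
def relax(example, negatives):
    """Generalize one positive example: drop premises one at a time,
    keeping a drop only if the relaxed rule still covers no negative."""
    rule = sorted(example.items())            # the maximally specific rule
    for premise in list(rule):
        candidate = [p for p in rule if p != premise]
        covered = any(all(neg[k] == v for k, v in candidate)
                      for neg in negatives)
        if not covered:                       # relaxation preserves correctness
            rule = candidate
    return rule

# positive example (x=1, y=0, z=1), one negative example (x=0, y=0, z=0)
print(relax({"x": 1, "y": 0, "z": 1}, [{"x": 0, "y": 0, "z": 0}]))
# [('z', 1)]  i.e. the minimal rule z → C
```

With no negative examples at all, every premise is dropped and the empty rule “→ C” (rule 8 above) remains.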

### JoJo

• In general, it cannot be determined which search direction is the better one.

• JoJo is an algorithm that combines both search directions in one heuristic search procedure.

• JoJo can start at an arbitrary point in the lattice of complexes and generalizes and specializes as long as the quality and correctness can be improved, i.e. until a local optimum is found or no more search resources are available (e.g., time, memory).

### JoJo (1)

• While specialization moves solely from ‘Top‘ towards ‘Bottom‘ and generalization from ‘Bottom‘ towards ‘Top‘, JoJo is able to move freely in the search space.
• Either of the two strategies can be used interchangeably, which makes JoJo more expressive than comparable algorithms that apply the two in sequential order (e.g. ID3).

### JoJo (2)

• A starting point in JoJo is described by two parameters:
• Vertical position (length of the description)
• Horizontal position (chosen premises)

• Reminder: JoJo can start at any arbitrary point, while specialization requires a highly general point and generalization requires a most specific point.

• In general, it is possible to carry out several runs of JoJo with different starting points. Rules that were already produced can be used as subsequent starting points.

### JoJo (4)

• Criteria for choosing a vertical position:

1. Approximation of the possible length, or experience from earlier runs.

2. Random production of rules; distribution by means of the average correctness of the rules with the same length (so-called quality criterion).

3. Start with a small sample or very limited resources to discover a real starting point from an arbitrary one.

4. Randomly chosen starting point (same average expectation of success as starting with ‘Top‘ or ‘Bottom‘).

5. Heuristic: few positive examples and maximally specific descriptions suggest long rules; few negative examples and maximally general descriptions suggest rather short rules.

### JoJo Principle Components

• JoJo consists of three components: the Specializer, the Generalizer, and the Scheduler
• The former two can be any such components, depending on the chosen strategies and preference criteria.
• The Scheduler is responsible for selecting the next description out of all possible generalizations and specializations available (by means of a t-preference, total preference).
• Simple example scheduler:
• Specialize, if the error rate is above threshold;
• Otherwise, choose the best generalization with allowable error rate;
• Otherwise stop.
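The simple example scheduler can be sketched as follows; representing candidate generalizations as (candidate, error-rate) pairs is a hypothetical encoding chosen for illustration:

```python
def schedule(error_rate, threshold, generalizations):
    """Return the next action: specialize, pick a generalization, or stop.

    generalizations: list of (candidate, error_rate) pairs.
    """
    if error_rate > threshold:
        return "specialize"                    # error rate above threshold
    # best generalization with an allowable error rate
    admissible = [(c, e) for c, e in generalizations if e <= threshold]
    if admissible:
        return min(admissible, key=lambda pair: pair[1])[0]
    return "stop"

print(schedule(0.30, 0.10, []))                            # specialize
print(schedule(0.05, 0.10, [("g1", 0.08), ("g2", 0.02)]))  # g2
print(schedule(0.05, 0.10, [("g1", 0.50)]))                # stop
```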

### Refinement of rules with JoJo

• Refinement of rules refers to the modification of a given rule set based on additional examples.

• The input to the task is a so-called hypothesis (a set of rules) and a set of old and new positive and negative examples.

• The output of the algorithm is a refined set of rules and the total set of examples.

• The new set of rules is correct, complete, non-redundant and (if necessary) minimal.

### Refinement of rules with JoJo (1)

• There is a four-step procedure for the refinement of rules:

1. Rules that become incorrect because of new examples are refined: incorrect rules are replaced by new rules that cover the positive examples, but not the new negative ones.

2. Complete the rule set to cover new positive examples.

3. Redundant rules are deleted to correct the rule set.

4. Minimize the rule set.
• Steps 1 and 2 are subject to the algorithm JoJo, which integrates generalization and specialization via a heuristic search procedure.

### Refinement of rules with JoJo (3)

• Correctness:
• Modify overly general rules that cover too many negative examples.
• Replace a rule by a new set of rules that cover the positive examples, but not the negative ones.

• Completeness:
• Compute new correct rules that cover the not yet considered positive examples (up to a threshold).

• Non-redundancy:
• Remove rules that are more specific than other rules (i.e. rules that have premises that are a superset of the premises of another rule).
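The non-redundancy check reduces to a subset test over premise sets; representing rules as frozensets of premises is an assumption made for illustration:

```python
def remove_redundant(rules):
    """Drop rules whose premises are a proper superset of another
    rule's premises (rules given as frozensets of premises)."""
    return [r for r in rules if not any(other < r for other in rules)]

rules = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"c"})]
# {a, b} is more specific than {a}, so it is removed
print(remove_redundant(rules))
```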

### Issues in Decision Tree Learning

• How deeply to grow the decision tree?
• Handling continuous attributes.
• Choosing an appropriate attribute selection measure.
• Handling training data with missing attribute values.
• Handling attributes with differing costs.
• Improving computational efficiency.

• Successor of ID3, named C4.5 , addresses most of these issues (see [6]).

### Avoiding Overfitting the Data

• Definition: Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some alternative hypothesis h’ ∈ H, such that h has smaller error than h’ over the training examples, but h’ has a smaller error than h over the entire distribution of instances. [1]
• ID3 grows each branch of the tree just deeply enough to perfectly classify the training examples.
• This can lead to difficulties when there is noise in the data or when the number of training examples is too small to produce a representative sample of the true target function.
• Overfitting trees could be produced!
(Figure: Impact of overfitting in a typical application of decision tree learning [1])

### Example: Overfitting due to Random Noise

• Original decision tree based on correct data:
• Incorrectly labelled data leads to the construction of a more complex tree, e.g., {Outlook=Sunny, Temperature=Hot, Humidity=Normal, Wind=Strong, PlayTennis=No}
• ID3 would search for new refinements at the bottom of the left tree.

### Strategies to Avoid Overfitting

• Overfitting might occur based on erroneous input data or based on coincidental regularities.
• Different types of strategies:
• Stopping growth of the tree earlier, before it reaches the point where it perfectly classifies the training data.
• Allowing the tree to overfit the data, and then post-pruning it (the more successful approach).
• Key question: How to determine the correct final tree size?
• Use a separate set of examples to evaluate the utility of post-pruning nodes from the tree (the “Training and Validation Set” approach); two approaches applied by Quinlan: “Reduced Error Pruning” and “Rule Post-Pruning”
• Use all available data for training, but apply a statistical test to estimate whether expanding (or pruning) a particular node is likely to produce an improvement beyond the training set.
• Use an explicit measure of the complexity of encoding the training examples and the decision tree, halting growth of the tree when this encoding size is minimized (the “Minimum Description Length” principle, see [1], Chapter 6)

### Rule Post-Pruning

• Applied in C4.5.
• Steps ( [1])
1. Infer the decision tree from the training set, growing the tree until the training data is fit as well as possible, allowing overfitting to occur.

2. Convert the learned tree into an equivalent set of rules by creating one rule for each path from the root node to a leaf node.

3. Prune (generalize) each rule by removing any preconditions whose removal improves its estimated accuracy.

4. Sort the pruned rules by their estimated accuracy, and consider them in this sequence when classifying subsequent instances.
• Rule accuracy estimation is based on the training set, using a pessimistic estimate: C4.5 calculates the standard deviation and takes the lower bound as its estimate of rule performance.

### Incorporating Continuous-Valued Attributes

• ID3 is restricted to attributes with discrete values.
• Continuous-valued attributes - example:
• C4.5 allows continuous-valued attributes in the decision nodes inside the tree.
• Approach: Dynamic definition of new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals (e.g. Ac is true if A < c, false otherwise).
• Central question: How to select the best value for a threshold c?

### Computing a Threshold

• Goal: Determining a threshold c that produces the greatest information gain.
• Approach:

1. Sort examples according to the continuous attribute A.

2. Identify adjacent examples that differ in their target classification.

3. Generate a set of candidate thresholds midway between the corresponding values of A (c must always lie at such a boundary, see [5]).

4. Evaluate candidate thresholds by computing the information gain associated with each.
• Example:
• Two candidate thresholds at which the value of PlayTennis changes: (48+60)/2 = 54 and (80+90)/2 = 85
• Information gain for the first threshold is higher.
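The threshold-candidate steps above can be sketched as follows; the function name is illustrative, and the missing Temperature values are completed from the standard example in [1]:

```python
def candidate_thresholds(values, labels):
    """Midpoints between adjacent (sorted) examples whose class changes.

    The optimal threshold c must lie at such a boundary (see [5]).
    """
    pairs = sorted(zip(values, labels))  # sort by the continuous attribute
    return [(a + b) / 2
            for (a, la), (b, lb) in zip(pairs, pairs[1:])
            if la != lb]                 # class label changes here

# Temperature / PlayTennis example from the text
temps = [40, 48, 60, 72, 80, 90]
plays = ["no", "no", "yes", "yes", "yes", "no"]
print(candidate_thresholds(temps, plays))  # [54.0, 85.0]
```

Each returned candidate would then be evaluated by its information gain, and the best one kept.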

### Alternative Measures for Selecting Attributes

• A natural bias in the information gain measure favours attributes with many values over those with few values (e.g. a “Date” field).
• Example:
• A Date attribute has (too) many possible values; it thus separates the training examples into very small subsets.
• It would have a very high information gain.
• But it would be a very poor predictor of the target function over unseen instances

• Alternative selection measure: gain ratio (see [5]).

### Gain Ratio

• Gain ratio penalizes attributes such as Date by incorporating split information (the entropy of S with respect to the values of A), which is sensitive to how broadly and uniformly the attribute splits the data:

  SplitInformation(S, A) = − Σi=1..c (|Si| / |S|) log2 (|Si| / |S|)

  where S1 … Sc are the subsets of S resulting from partitioning S by the c-valued attribute A.

• Definition of GainRatio:

  GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)

### Handling Training Examples with Missing Attribute Values

• If data with missing attribute values exist, it is common to estimate the missing attribute value based on other examples for which the attribute has a known value.
• Common strategy: Assign the value that is most common among the training examples at node n that have the same classification as the current example.
• More complex strategy used in C4.5: Assign a probability to each of the possible values.
• Probabilities can be estimated from the observed frequencies of the values of the attribute among the examples at node n, e.g. P(A(x) = 1) = 0.6.
• A fractional example with weight 0.6 is then propagated down the branch of the tree for A = 1.
• Fractional examples are used to compute information gain.
• Classification of a new instance is simply the most probable classification (computed by summing weights of instance fragments classified in different ways at leaf nodes of the tree).
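Estimating the value distribution used to split an example with a missing value can be sketched as follows; using None to mark a missing value and the function name are assumptions made for illustration:

```python
def value_distribution(examples, attribute):
    """Observed frequencies of the known values of an attribute at a node;
    an example with a missing value is split into fractions with these weights."""
    known = [e[attribute] for e in examples if e[attribute] is not None]
    return {v: known.count(v) / len(known) for v in set(known)}

# Five examples with a known value of A, one with A missing:
node_examples = [{"A": 1}, {"A": 1}, {"A": 1}, {"A": 0}, {"A": 0}, {"A": None}]
print(value_distribution(node_examples, "A"))  # weights 0.6 for A=1, 0.4 for A=0
```

The example with the missing value would then be passed down the A=1 branch with weight 0.6 and down the A=0 branch with weight 0.4.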

### Summary

• Machine learning is a prominent topic in the field of AI.
• Rule learning is a means to learn rules from instance data to classify unseen instances.
• Decision tree learning can be used for concept learning, rule learning, or for learning of other discrete valued functions.
• The ID3 family of algorithms infers decision trees by growing them from the root downward in a greedy manner.
• ID3 searches a complete hypothesis space.
• ID3’s inductive bias includes a preference for smaller trees; it grows trees only as large as needed.
• A variety of extensions to basic ID3 have been developed; extensions include: methods for post-pruning trees, handling real-valued attributes, accommodating training examples with missing attribute values, or using alternative selection measures.

### Summary (2)

• Rules cover positive examples and should not cover negative examples.
• There are two main approaches for determining rules:
• Generalization
• Specialization

• RELAX was presented as an example of a generalization algorithm
• JoJo combines generalization and specialization, and allows the algorithm to traverse the entire search space by either generalizing or specializing rules interchangeably.
• JoJo can also be applied to incrementally refine rule sets.

### References

• [1] Mitchell, T. "Machine Learning", McGraw Hill, 1997. (Section 3)
• [2] Mitchell, T. "The Discipline of Machine Learning" Technical Report CMU-ML-06-108, July 2006. Available online: http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf
• [3] Fensel, D. “Ein integriertes System zum maschinellen Lernen aus Beispielen” Künstliche Intelligenz (3). 1993.
• [4] Feigenbaum, E. A. “Expert systems in the 1980s” In: A. Bond (Ed.), State of the art report on machine intelligence. Maidenhead: Pergamon-Infotech.
• [5] Quinlan, J. R. “Induction of decision trees”. Machine Learning 1 (1), pp. 81-106, 1986.

### References

• [6] Quinlan, J. R. “C4.5: Programs for Machine Learning” Morgan Kaufmann, 1993.

• [7] Fayyad, U. M. “On the induction of decision trees for multiple concept learning” PhD dissertation, University of Michigan, 1991.

• [8] Agrawal, Imielinsky and Swami: Mining Association Rules between Sets of Items in Large Databases. ACM SIGMOD Conference, 1993, pp. 207-216.

• [9] Quinlan: Generating Production Rules From Decision Trees. 10th Int’l Joint Conference on Artificial Intelligence, 1987, pp. 304-307.

• [10] Fensel and Wiese: Refinement of Rule Sets with JoJo. European Conference on Machine Learning, 1993, pp. 378-383.

### References

• [11] AQ: Michalski, Mozetic, Hong and Lavrac: The Multi-Purpose Incremental Learning System AQ15 and its Testing Application to Three Medical Domains. AAAI-86, pp. 1041-1045.
• [12] C4: Quinlan: Learning Logical Definitions from Relations: Machine Learning 5(3), 1990, pp. 239-266.
• [13] CN2: Clark and Boswell: Rule Induction with CN2: Some recent Improvement. EWSL-91, pp. 151-163.
• [14] CABRO: Huyen and Bao: A method for generating rules from examples and its application. CSNDAL-91, pp. 493-504.
• [15] FOIL: Quinlan: Learning Logical Definitions from Relations: Machine Learning 5(3), 1990, pp. 239-266.
• [16] PRISM: Cendrowska: PRISM: An algorithm for inducing modular rules. Journal Man-Machine Studies 27, 1987, pp. 349-370.
• [17] RELAX: Fensel and Klein: A new approach to rule induction and pruning. ICTAI-91.

### References

• [18] http://www.ensta.fr/~diam/ro/online/viggo_wwwcompendium/node146.html.

• [19] Goodrich, M.T., Mirelli, V. Orletsky, M., and Salowe, J. Decision tree conctruction in fixed dimensions: Being global is hard but local greed is good. Technical Report TR-95-1, Johns Hopkins University, Department of Computer Science, Baltimore, MD 21218, May 1995

### References

• Machine Learning, http://en.wikipedia.org/wiki/Machine_learning

• Rule learning, http://en.wikipedia.org/wiki/Rule_learning

• Association rule learning, http://en.wikipedia.org/wiki/Association_rule_learning

• ID3 algorithm, http://en.wikipedia.org/wiki/ID3_algorithm

• C4.5 algorithm, http://en.wikipedia.org/wiki/C4.5_algorithm

### Agenda

• Motivation
• Technical Solution
• Model Theory of ILP
• A Generic ILP Algorithm
• Proof Theory of ILP
• ILP Systems
• Illustration by a Larger Example
• Summary
• References

### Motivation

• There is a vast array of different machine learning techniques, e.g.:
• Decision Tree Learning (see previous lecture)
• Neural networks
• and… Inductive Logic Programming (ILP)
• Advantages over other ML approaches
• ILP uses an expressive First-Order framework instead of simple attribute-value framework of other approaches
• ILP can take background knowledge into account
Inductive Logic Programming = Inductive Learning ∩ Logic Programming

### The inductive learning and logic programming sides of ILP

• From inductive machine learning, ILP inherits its goal: to develop tools and techniques to
• Induce hypotheses from observations (examples)
• Synthesise new knowledge from experience
• By using computational logic as the representational mechanism for hypotheses and observations, ILP can overcome the two main limitations of classical machine learning techniques:
• The use of a limited knowledge representation formalism (essentially a propositional logic)
• Difficulties in using substantial background knowledge in the learning process

### The inductive learning and logic programming sides of ILP (cont’)

• ILP inherits from logic programming its
• Representational formalism
• Semantical orientation
• Various well-established techniques
• ILP systems benefit from using the results of logic programming
• E.g. by making use of work on termination, types and modes, knowledge-base updating, algorithmic debugging, abduction, constraint logic programming, program synthesis and program analysis

### The inductive learning and logic programming sides of ILP (cont’)

• Inductive logic programming extends the theory and practice of logic programming by investigating induction rather than deduction as the basic mode of inference
• Logic programming theory describes deductive inference from logic formulae provided by the user
• ILP theory describes the inductive inference of logic programs from instances and background knowledge

• ILP contributes to the practice of logic programming by providing tools that assist logic programmers to develop and verify programs

### Introduction – Basic example

• Imagine learning about the relationships between people in your close family circle
• You have been told that your grandfather is the father of one of your parents, but do not yet know what a parent is
• You might have the following beliefs ( B ):
grandfather(X, Y) ← father(X, Z), parent(Z, Y)
father(henry, jane) ←
mother(jane, john) ←
mother(jane, alice) ←
• You are now given the following positive examples concerning the relationships between particular grandfathers and their grandchildren ( E + ):
grandfather(henry, john) ←
grandfather(henry, alice) ←

### Introduction – Basic example

• You might be told in addition that the following relationships do not hold (negative examples) ( E - )
← grandfather(john, henry)
← grandfather(alice, john)
• Believing B, and faced with the examples E+ and E−, you might guess the following hypothesis H1 ∈ H:
parent(X, Y) ← mother(X, Y)
• H is the set of hypotheses and may contain an arbitrary number of individual speculations that fit the background knowledge and examples
• Several conditions have to be fulfilled by a hypothesis
• Those conditions are related to completeness and consistency with respect to the background knowledge and examples

### Introduction – Basic example

• Consistency:
• First, we must check that our problem has a solution:
B ∪ E- ⊭ □ ( prior satisfiability )
• If one of the negative examples can be proved to be true from the background information alone, then any hypothesis we find will not be able to compensate for this. The problem is not satisfiable.
• B and H are consistent with E-:
B ∪ H ∪ E- ⊭ □ ( posterior satisfiability )
• After adding a hypothesis it should still not be possible to prove a negative example.
• Completeness:
• However, H allows us to explain E+ relative to B:
B ∪ H ⊧ E+ ( posterior sufficiency )
• This means that H should fit the positive examples given.

### Model Theory – Normal Semantics

• The problem of inductive inference:
• Given is background (prior) knowledge B and evidence E
• The evidence E = E+ ∪ E- consists of positive evidence E+ and negative evidence E-
• The aim is then to find a hypothesis H such that the following conditions hold:
Prior Satisfiability: B ∪ E- ⊭ □
Posterior Satisfiability: B ∪ H ∪ E- ⊭ □
Prior Necessity: B ⊭ E+
Posterior Sufficiency: B ∪ H ⊧ E+
• The Sufficiency criterion is sometimes named completeness with regard to positive evidence
• The Posterior Satisfiability criterion is also known as consistency with the negative evidence
• In this general setting, background-theory, examples, and hypotheses can be any (well-formed) formula

### Model Theory – Definite Semantics

• In most ILP practical systems background theory and hypotheses are restricted to being definite clauses
• Clause: A disjunction of literals
• Horn Clause: A clause with at most one positive literal
• Definite Clause: A Horn clause with exactly one positive literal
• This setting has the advantage that definite clause theory T has a unique minimal Herbrand model M +( T )
• Any logical formula is either true or false in this minimal model (all formulae are decidable and the Closed World Assumption holds)

### Model Theory – Definite Semantics

• The definite semantics again require a set of conditions to hold
• We can now refer to every formula in E since they are guaranteed to have a truth value in the minimal model

• Consistency:
Prior Satisfiability: all e in E− are false in M+(B)
• Negative evidence should not be part of the minimal model
Posterior Satisfiability: all e in E− are false in M+(B ∪ H)
• Negative evidence should not be supported by our hypotheses
• Completeness
Prior Necessity: some e in E+ are false in M+(B)
• If all positive examples are already true in the minimal model of the background knowledge, then no hypothesis we derive will add useful information
Posterior Sufficiency: all e in E+ are true in M+(B ∪ H)
• All positive examples are true (explained by the hypothesis) in the minimal model of the background theory and the hypothesis

### Model Theory – Definite Semantics

• A further restriction on top of the definite semantics is to allow only true and false ground facts as examples (evidence)
• This is called the example setting
• The example setting is the main setting employed by ILP systems
• Only allows factual and not causal evidence (which usually captures more knowledge)
• Example:
• B: grandfather(X, Y) ← father(X, Z), parent(Z, Y)
father(henry, jane) ←
etc.
• E: grandfather(henry, john) ←
grandfather(henry, alice) ←
• Not allowed in example setting:
• ← grandfather(X, X)     [Not allowed in definite semantics]
grandfather(henry, john) ← father(henry, jane), mother(jane, john)

### Model Theory – Non-monotonic Semantics

• In the non-monotonic setting:
• The background theory is a set of definite clauses

• The evidence is empty
• The positive evidence is considered part of the background theory
• The negative evidence is derived implicitly, by making the closed world assumption (realized by the minimal Herbrand model)

• The hypotheses are sets of general clauses expressible using the same alphabet as the background theory

### Model Theory – Non-monotonic Semantics (2)

• Since only positive evidence is present, it is assumed to be part of the background theory:
B’ = B ∪ E
• The following conditions should hold for H and B’:
• Validity: all h in H are true in M+(B’)
• All clauses belonging to a hypothesis hold in the database B’, i.e. they are true properties of the data
• Completeness: if a general clause g is true in M+(B’) then H ⊧ g
• All information that is valid in the minimal model of B’ should follow from the hypothesis
• Additionally the following can be a requirement:
• Minimality: there is no proper subset G of H which is valid and complete
• The hypothesis should not contain redundant clauses

### Model Theory – Non-monotonic Semantics (3)

• Example for B (definite clauses):
male(luc) ←
female(lieve) ←
human(lieve) ←
human(luc) ←
• A possible solution is then H (a set of general clauses):
← female(X), male(X)
human(X) ← male(X)
human(X) ← female(X)
female(X), male(X) ← human(X)

### Model Theory – Non-monotonic Semantics (4)

• One more example to illustrate the difference between the example setting and the non-monotonic setting
• Consider:
• Background theory B
bird(tweety) ←
bird(oliver) ←
• Examples E+:
flies(tweety)

• For the non-monotonic setting B’ = B ∪ E+ because positive examples are considered part of the background knowledge

### Model Theory – Non-monotonic Semantics (5)

• Example setting:
• An acceptable hypothesis H1 would be
flies(X) ← bird(X)
• It is acceptable because it fulfills the completeness and consistency criteria of the definite semantics
• This realizes an inductive leap, because flies(oliver) is true in
M+(B ∪ H1) = { bird(tweety), bird(oliver), flies(tweety), flies(oliver) }
• Non-monotonic setting:
• H1 is not a solution, since there exists a substitution {X ← oliver} which makes the clause false in M+(B’) (the validity criterion is violated):
M+(B’) = { bird(tweety), bird(oliver), flies(tweety) }
{X ← oliver}: flies(oliver) ← bird(oliver)
{X ← tweety}: flies(tweety) ← bird(tweety)

### ILP as a Search Problem

• ILP can be seen as a search problem - this view follows immediately from the model theory of ILP
• In ILP there is a space of candidate solutions, i.e. the set of hypotheses, and an acceptance criterion characterizing solutions to an ILP problem
• Question: how can the space of possible solutions be structured in order to allow pruning of the search?
• The search space is typically structured by means of the dual notions of generalisation and specialisation
• Generalisation corresponds to induction
• Specialisation to deduction
• Induction is viewed here as the inverse of deduction

### Specialisation and Generalisation Rules

• A hypothesis G is more general than a hypothesis S if and only if G ⊧ S
• S is also said to be more specific than G.
• In search algorithms, the notions of generalisation and specialisation are incorporated using inductive and deductive inference rules:
• A deductive inference rule r maps a conjunction of clauses G onto a conjunction of clauses S such that G ⊧ S
r is called a specialisation rule
• An inductive inference rule r maps a conjunction of clauses S onto a conjunction of clauses G such that G ⊧ S
r is called a generalisation rule

### Pruning the search space

• Generalisation and specialisation form the basis for pruning the search space; this is because:
• When B ∪ H ⊭ e, where e ∈ E+, B is the background theory and H is the hypothesis, then none of the specialisations H’ of H will imply the evidence e
• They can therefore be pruned from the search.
• When B ∪ H ∪ {e} is inconsistent, where e ∈ E−, then all generalisations H’ of H will also be inconsistent with B ∪ E−
• We can again drop them

### A Generic ILP Algorithm

• Given the key idea of ILP as search, a generic ILP system is defined as follows:
• The algorithm works as follows:
• It keeps track of a queue of candidate hypotheses QH
• It repeatedly deletes a hypothesis H from the queue and expands that hypothesis using inference rules; the expanded hypotheses are then added to the queue of hypotheses QH, which may be pruned to discard unpromising hypotheses from further consideration
• This process continues until the stop-criterion is satisfied
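The loop can be sketched generically; Delete, Choose, Prune and the stop-criterion appear here as parameters, and all names are illustrative:

```python
from collections import deque

def generic_ilp(start, expand, is_solution, prune, max_steps=1000):
    """Generic ILP search loop over a queue QH of candidate hypotheses.

    Delete = FIFO here (breadth-first); popping from the right instead
    would give LIFO (depth-first). expand() stands for Choose plus the
    application of the inference rules R.
    """
    qh = deque([start])
    for _ in range(max_steps):
        if not qh:
            return None
        h = qh.popleft()                   # Delete a hypothesis H from QH
        if is_solution(h):                 # Stop-criterion
            return h
        qh.extend(c for c in expand(h)     # expanded hypotheses back onto QH
                  if not prune(c))         # Prune unpromising candidates
    return None

# Toy usage: "hypotheses" are integers, the goal is 5, prune anything past it
print(generic_ilp(0, lambda n: [n + 1, n + 2],
                  lambda n: n == 5, lambda n: n > 5))  # 5
```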

### Algorithm – Generic Parameters

• Initialize denotes the hypotheses started from
• R denotes the set of inference rules applied
• Delete influences the search strategy
• Using different instantiations of this procedure, one can realise a depth-first (Delete = LIFO), breadth-first (Delete = FIFO), or best-first algorithm
• Choose determines the inference rules to be applied on the hypothesis H

### Algorithm – Generic Parameters (2)

• Prune determines which candidate hypotheses are to be deleted from the queue
• This can also be done by relying on the user (employing an “oracle”)
• Combining Delete with Prune, it is easy to obtain advanced search strategies

• The Stop-criterion states the conditions under which the algorithm stops
• Some frequently employed criteria require that a solution be found, or that it is unlikely that an adequate hypothesis can be obtained from the current queue

### Proof Theory of ILP

• Inductive inference rules can be obtained by inverting deductive ones
• Deduction: Given B ⋀ H ⊧ E + , derive E + from B ⋀ H
• Induction: Given B ⋀ H ⊧ E+, derive H from B and E+
• The inverting-deduction paradigm can be studied under various assumptions, corresponding to different assumptions about the deductive rule for ⊧ and the format of background theory B and evidence E+
⇒ Different models of inductive inference are obtained
• Example: θ -subsumption
• The background knowledge is supposed to be empty, and the deductive inference rule corresponds to θ-subsumption among single clauses

### θ-subsumption

• θ-subsumption is the simplest model of deduction for ILP; it regards clauses as sets of (positive and negative) literals

• A clause c1 θ-subsumes a clause c2 if and only if there exists a substitution θ such that c1θ ⊆ c2

• c1 is called a generalisation of c2 (and c2 a specialisation of c1) under θ-subsumption

• The θ-subsumption inductive inference rule: from a clause c2, induce a clause c1 for which there exists a substitution θ such that c1θ ⊆ c2

### θ-subsumption

• For example, consider:
c1 = { father(X,Y) ← parent(X,Y), male(X) }
c2 = { father(jef,paul) ← parent(jef,paul), parent(jef,ann), male(jef), female(ann) }

With θ = {X = jef, Y = paul}, c1 θ-subsumes c2 because

{ father(jef,paul) ← parent(jef,paul), male(jef) }
⊆ { father(jef,paul) ← parent(jef,paul), parent(jef,ann), male(jef), female(ann) }
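The subset check behind this example can be mechanised by searching for a suitable substitution θ. Below is a small Python sketch (not a library routine): clauses are represented as sets of literals, a literal as a tuple `(predicate, arg1, ...)`, and variables follow the Prolog convention of starting with an uppercase letter. For simplicity the sketch ignores the sign (head vs. body) of literals.

```python
def theta_subsumes(c1, c2):
    """Check whether clause c1 theta-subsumes clause c2 (sketch).

    Searches by backtracking for a substitution theta mapping the
    variables of c1 to terms of c2 such that c1theta is a subset of c2.
    """
    def is_var(t):
        return isinstance(t, str) and t[:1].isupper()

    def match(lits, theta):
        if not lits:
            return True  # every literal of c1 has been matched into c2
        lit, rest = lits[0], lits[1:]
        for target in c2:
            if target[0] != lit[0] or len(target) != len(lit):
                continue  # predicate symbol or arity differs
            new_theta, ok = dict(theta), True
            for a, b in zip(lit[1:], target[1:]):
                if is_var(a):
                    if new_theta.setdefault(a, b) != b:
                        ok = False  # variable already bound differently
                        break
                elif a != b:
                    ok = False      # constants must match exactly
                    break
            if ok and match(rest, new_theta):
                return True
        return False

    return match(list(c1), {})
```

On the slide's example the search succeeds with θ = {X = jef, Y = paul}:

```python
c1 = {("father", "X", "Y"), ("parent", "X", "Y"), ("male", "X")}
c2 = {("father", "jef", "paul"), ("parent", "jef", "paul"),
      ("parent", "jef", "ann"), ("male", "jef"), ("female", "ann")}
theta_subsumes(c1, c2)   # True, with theta = {X: jef, Y: paul}
```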

### Some properties of θ-subsumption

• θ-subsumption has a range of relevant properties

• Example: Implication
• If c1 θ-subsumes c2, then c1 ⊧ c2
• Example: See previous slide

• This property is relevant because typical ILP systems aim at deriving a hypothesis H (a set of clauses) that, in conjunction with a background theory B, implies the facts, i.e. B ∪ H ⊧ E+
• Because of the implication property, this is achieved when all the clauses in E+ are θ-subsumed by clauses in B ∪ H

### Some properties of θ-subsumption

• Example: Equivalence
• There exist different clauses that are equivalent under θ-subsumption
• E.g. parent(X,Y) ← mother(X,Y), mother(X,Z) θ-subsumes parent(X,Y) ← mother(X,Y) and vice versa
• Two clauses equivalent under θ-subsumption are also logically equivalent, i.e. equivalent under implication
• This is used for optimization purposes in practical systems

### Characteristics of ILP systems

• Incremental/non-incremental: describes the way the evidence E (examples) is obtained
• In non-incremental or empirical ILP, the evidence is given at the start and not changed afterwards
• In incremental ILP, the examples are input one by one by the user, in a piecewise fashion
• Interactive/non-interactive
• In interactive ILP, the learner is allowed to pose questions to an oracle (i.e. the user) about the intended interpretation
• Usually these questions query the user for the intended interpretation of an example or a clause.
• The answers to the queries allow the learner to prune large parts of the search space
• Most systems are non-interactive

### Concrete ILP implementations

• A well-known family of related, popular systems: Progol
• CProgol, PProgol, Aleph
• Progol allows arbitrary Prolog programs as background knowledge and arbitrary definite clauses as examples
• Most comprehensive implementation: CProgol
• Homepage: http://www.doc.ic.ac.uk/~shm/progol.html
• Background information
• Example datasets
• Open source and free for research and teaching

### An ILP system: CProgol

• CProgol uses a covering approach: It selects an example to be generalised and finds a consistent clause covering the example
• Basic algorithm for CProgol:

1. Select an example to be generalized.

2. Build most-specific-clause. Construct the most specific clause that entails the example selected, and is within language restrictions provided. This is usually a definite clause with many literals, and is called the "bottom clause."

3. Find a clause more general than the bottom clause. This is done by searching for some subset of the literals in the bottom clause that has the "best" score.

4. Remove redundant examples. The clause with the best score is added to the current theory, and all examples made redundant are removed. Return to Step 1 unless all examples are covered.
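The four steps above form a covering loop, which can be sketched as follows. This is a structural Python sketch only: `build_bottom_clause`, `search_best_generalisation`, and `covers` are placeholders for the Progol operations described in steps 2–4, not actual Progol code.

```python
def covering_loop(examples, background, build_bottom_clause,
                  search_best_generalisation, covers):
    """Sketch of CProgol's covering approach."""
    theory = []
    remaining = list(examples)
    while remaining:
        # 1. Select an example to be generalised
        example = remaining[0]
        # 2. Build the most specific ("bottom") clause entailing the example
        bottom = build_bottom_clause(example, background)
        # 3. Search for the best-scoring clause generalising the bottom clause
        clause = search_best_generalisation(bottom)
        # 4. Add the clause and remove all examples it makes redundant
        theory.append(clause)
        remaining = [e for e in remaining if not covers(clause, background, e)]
    return theory
```

The loop terminates once every example is covered by some clause in the learned theory, exactly as in step 4 above.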

### An ILP system: CProgol

• Example: CProgol can be used to learn the legal moves of chess pieces (based on rank and file differences for knight moves)
• Example included in the CProgol distribution
 Input:
% Types
rank(1). rank(2). rank(3). rank(4). rank(5). rank(6). rank(7). rank(8).
file(a). file(b). file(c). file(d). file(e). file(f). file(g). file(h).
knight(pos(b,3),pos(d,2)).
knight(pos(e,7),pos(f,5)).
knight(pos(c,4),pos(a,5)).
knight(pos(c,7),pos(e,6)).
Etc.

### An ILP system: CProgol

 Output:
[Result of search is]
knight(pos(A,B),pos(C,D)) :- rdiff(B,D,E), fdiff(A,C,-2), invent(q4,E).
[17 redundant clauses retracted]
knight(pos(A,B),pos(C,D)) :- rdiff(B,D,E), fdiff(A,C,2), invent(q4,E).
knight(pos(A,B),pos(C,D)) :- rdiff(B,D,E), fdiff(A,C,1), invent(q2,E).
knight(pos(A,B),pos(C,D)) :- rdiff(B,D,E), fdiff(A,C,-1), invent(q2,E).
knight(pos(A,B),pos(C,D)) :- rdiff(B,D,E), fdiff(A,C,-2), invent(q4,E).
[Total number of clauses = 4]
[Time taken 0.50s]
Mem out = 822

### Michalski’s train problem

• Assume ten railway trains: five are travelling east and five are travelling west; each train comprises a locomotive pulling wagons; whether a particular train is travelling towards the east or towards the west is determined by some properties of that train

• The learning task: determine what governs which kinds of trains are Eastbound and which kinds are Westbound

### Michalski’s train problem (2)

• Michalski’s train problem can be viewed as a classification task: the aim is to generate a classifier (theory) which can classify unseen trains as either Eastbound or Westbound

• The following knowledge about each car can be extracted: which train it is part of, its shape, how many wheels it has, whether it is open (i.e. has no roof) or closed, whether it is long or short, the shape of the things the car is loaded with. In addition, for each pair of connected wagons, knowledge of which one is in front of the other can be extracted.

### Michalski’s train problem (3)

• Examples of Eastbound trains
Positive examples:
eastbound(east1).
eastbound(east2).
eastbound(east3).
eastbound(east4).
eastbound(east5).
Negative examples:
eastbound(west6).
eastbound(west7).
eastbound(west8).
eastbound(west9).
eastbound(west10).

### Michalski’s train problem (4)

• Background knowledge for train east1. Cars are uniquely identified by constants of the form car_xy, where x is the number of the train to which the car belongs and y is the position of the car in that train. For example, car_12 refers to the second car behind the locomotive in the first train
• short(car_12). short(car_14).
• long(car_11). long(car_13).
• closed(car_12).
• open(car_11). open(car_13). open(car_14).
• infront(east1,car_11). infront(car_11,car_12).
• infront(car_12,car_13). infront(car_13,car_14).
• shape(car_11,rectangle). shape(car_12,rectangle).
• shape(car_13,rectangle). shape(car_14,rectangle).
• wheels(car_11,2). wheels(car_12,2).
• wheels(car_13,3). wheels(car_14,2).
• has_car(east1,car_11). has_car(east1,car_12).
• has_car(east1,car_13). has_car(east1,car_14).

### Michalski’s train problem (5)

• An ILP system could generate the following hypothesis:
eastbound(A) ← has_car(A,B), not(open(B)), not(long(B)).
i.e. A train is eastbound if it has a car which is both not open and not long.
• Other generated hypotheses could be:
• If a train has a short closed car, then it is Eastbound and otherwise Westbound
• If a train has two cars, or has a car with a corrugated roof, then it is Westbound and otherwise Eastbound
• If a train has more than two different kinds of load, then it is Eastbound and otherwise Westbound
• For each train add up the total number of sides of loads (taking a circle to have one side); if the answer is a divisor of 60 then the train is Westbound and otherwise Eastbound
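The first hypothesis can be checked directly against the background facts for east1 given earlier. The following Python sketch copies those facts into sets and encodes the learned rule as an illustrative function (`eastbound_hypothesis` is our own encoding, not Progol output):

```python
# Background facts for train east1, copied from the earlier slide
has_car = {("east1", c) for c in ("car_11", "car_12", "car_13", "car_14")}
short = {"car_12", "car_14"}
long_ = {"car_11", "car_13"}       # trailing underscores avoid Python keywords
closed = {"car_12"}
open_ = {"car_11", "car_13", "car_14"}

def eastbound_hypothesis(train):
    """eastbound(A) <- has_car(A,B), not(open(B)), not(long(B))."""
    return any(car not in open_ and car not in long_
               for t, car in has_car if t == train)

eastbound_hypothesis("east1")   # True: car_12 is closed and short
```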

### Michalski’s train problem – Demo

• http://www.doc.ic.ac.uk/~shm/Software/progol5.0

• Use the Progol input file for Michalski's train problem
• http://www.comp.rgu.ac.uk/staff/chb/teaching/cmm510/michalski_train_data

• Generate the hypotheses

### Summary

• ILP is a subfield of machine learning which uses logic programming as a uniform representation for
• Examples
• Background knowledge
• Hypotheses
• Many existing ILP systems
• Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples
• Lots of applications of ILP
• E.g. bioinformatics, natural language processing, engineering
• ILP is an active research field

### References

• S.H. Muggleton. Inductive Logic Programming. New Generation Computing , 8(4):295-318, 1991.
• S.H. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming , 19,20:629-679, 1994.
• N. Lavrac and S. Dzeroski. Inductive Logic Programming: Techniques and Applications. 1994.
• http://www-ai.ijs.si/SasoDzeroski/ILPBook
• Wikipedia:
• http://en.wikipedia.org/wiki/Inductive_logic_programming

### Outline

• Motivation
• Technical Solution
• Definitions
• Concept Lattice
• Illustration by a Larger Example
• Extension
• Summary

### Introduction

• In many applications we deal with huge amounts of data
• E.g. insurance company records
• Data by itself is not useful to support decisions
• E.g. can I offer an insurance contract to this new customer, or is it too risky?
• Thus we need methods that generate, from a data set, a “summary” that represents a conceptualization of the data set
• E.g. what are similarities among different customers of an insurance company that divide them in different risk classes?
• Age >25, City=Innsbruck => Low Risk
• This is a common task that is needed in several domains to support data analysis
• Analysis of children suffering from diabetes
• Marketing analysis of store departments or supermarkets
• Formal Concept Analysis is a technique that enables solving such problems

### What is a Concept?

• What drives us to call an object a “bird”?
• Every object having certain attributes is called “bird”:
• A bird has feathers
• A bird has two legs
• A bird has a beak
• All objects having these attributes are called “birds”:
• Duck, goose, owl and parrot are birds
• Penguins are birds, too

### What is a Concept?

• This description of the concept “bird” is based on sets of objects and sets of attributes

• Objects, attributes and a relation form a formal concept

### What is a Concept?

• So, a formal concept is constituted by two parts, a set of objects A and a set of attributes B, having a certain relation:
• every object belonging to this concept has all the attributes in B
• every attribute belonging to this concept is shared by all objects in A
• A is called the concept's extent
• B is called the concept's intent
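The two conditions can be made concrete on a toy context built from the bird example. The Python sketch below uses the bird attributes from the earlier slide plus an illustrative non-bird object (“dog” and its attribute are our own additions); the two derivation functions are the usual FCA operators:

```python
# Toy formal context: which object has which attribute
# ("dog" and "four_legs" are illustrative additions, not from the slides)
objects = {"duck", "penguin", "dog"}
attributes = {"has_feathers", "two_legs", "has_beak", "four_legs"}
incidence = {
    ("duck", "has_feathers"), ("duck", "two_legs"), ("duck", "has_beak"),
    ("penguin", "has_feathers"), ("penguin", "two_legs"), ("penguin", "has_beak"),
    ("dog", "four_legs"),
}

def intent(A):
    """Attributes shared by all objects in A."""
    return {m for m in attributes if all((g, m) in incidence for g in A)}

def extent(B):
    """Objects having all attributes in B."""
    return {g for g in objects if all((g, m) in incidence for m in B)}

A = {"duck", "penguin"}
B = intent(A)        # {'has_feathers', 'two_legs', 'has_beak'}
extent(B) == A       # True: (A, B) is a formal concept -- the "bird" concept
```

Since `extent(intent(A))` returns exactly A, the pair (A, B) satisfies both conditions above and is a formal concept, with A its extent and B its intent.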

### The Universe of Discourse

• A repertoire of objects and attri