INTRODUCTION
Course outline:
1. Introduction
2. Propositional Logic
3. Predicate Logic
4. Reasoning
5. Search Methods
6. CommonKADS
7. Problem-Solving Methods
8. Planning
9. Software Agents
10. Rule Learning
11. Inductive Logic Programming
12. Formal Concept Analysis
13. Neural Networks
14. Semantic Web and Services
Motivation
What is “Intelligence”?
What is “Artificial Intelligence” (AI)?
Strong AI vs. Weak AI
Technical Solution
Symbolic AI vs. Subsymbolic AI
Knowledge-based systems
Popular AI systems
Subdomains of AI
Some relevant people in AI
Summary
Introduction to Artificial Intelligence
"Intelligence denotes the ability of an individual to adapt his thinking to new demands; it is the common mental adaptability to new tasks and conditions of life" (William Stern, 1912)
Being "intelligent" means to be able to cognitively grasp phenomena, being able to judge, to trade of between different possibilities, or to be able to learn.
An important aspect of "Intelligence" is the way and efficiency how humans are able to adapt to their environment or assimilate their environment for solving problems.
Intelligence manifests itself in logical thinking, computations, the memory capabilities of the brain, through the application of words and language rules or through the recognition of things and events.
The combination of information, creativity, and new problem solutions is crucial for acting "intelligent".
The Turing test is a proposal to test a machine’s ability to demonstrate “intelligence”
The Turing test proceeds as follows:
A human judge C engages in a natural language conversation with one human B and one machine A, each of which tries to appear human.
All participants are placed in isolated locations.
If the judge C cannot reliably tell the machine A from the human B, the machine is said to have passed the test.
In order to test the machine's intelligence rather than its ability to render words into audio, the conversation is limited to a text-only channel such as a computer keyboard or screen
Turing test is an operational test for intelligent behaviour. For more details see [2].
The “Chinese room” experiment developed by John Searle in 1980 attempts to show that a symbol-processing machine like a computer can never be properly described as having a ”mind” or “understanding”, regardless of how intelligently it may behave.
With the “Chinese room” John Searle argues that it is possible to pass the Turing Test, yet not (really) think.
Source: http://en.wikipedia.org/wiki/Chinese_room
The “Chinese room” experiment proceeds as follows:
Searle, a human who does not know Chinese, is locked in a room with an enormous batch of Chinese script.
Slips of paper with still more Chinese script come through a slot in the wall.
Searle has been given a set of rules in English for correlating the Chinese script coming through with the batches of script already in the room.
Searle is instructed to push back through the slot the Chinese script with which the scripts coming in through the slot are correlated according to the rules.
Searle identifies the scripts coming in and going out on the basis of their shapes alone; he does not speak Chinese and does not understand them.
The scripts going in are called ‘the questions’, the scripts coming out are ‘the answers’, and the rules that Searle follows are ‘the program’.
Suppose also that the set of rules, the program is so good and Searle gets so good at following it that Searle’s answers are indistinguishable from those of a native Chinese speaker.
The result:
It seems clear that Searle nevertheless does not understand the questions or the answers
But Searle is behaving just as a computer does, “performing computational operations on formally specified elements”
Hence, manipulating formal symbols, which is just what a computer running a program does, is not sufficient for understanding or thinking
Many definitions exist, among them:
“The study of the computations that make it possible to perceive, reason, and act” (Winston, 1992)
“A field of study that seeks to explain and emulate [human] intelligent behaviour in terms of computational processes” (Schalkoff, 1990)
It is an interdisciplinary field that builds on results from philosophy, psychology, linguistics, and brain science
Difference to “traditional” computer science: Emphasis on cognition, reasoning, and acting
Generative theory of intelligence:
Intelligence emerges from the orchestration of multiple processes
Process models of intelligent behaviour can be investigated and simulated on machines
Two main aspects began to manifest themselves in the early days of AI:
Cognitive modelling, i.e., the simulation of cognitive processes through information processing models
The construction of “intelligent systems” that make certain aspects of human cognition and reasoning available.
Symbolic vs. Subsymbolic AI
Knowledge-based Systems
Research on Information Processing in AI proceeds by:
Exact formalisations.
Exemplary realisation via implementations.
Core aspect: Representation and processing of symbols as a foundation of internal processes.
Symbols are naming objects which provide access to meaning (Newell, 1958)
“Spoken words are the symbols of mental experience, and written words are the symbols of spoken words.” (Aristotle) [3]
Mental abilities of humans can be inspected on a symbolic level independent of neuronal architectures or processes.
Subject of Symbolic AI is thus the meaning of processes (or their symbolic representations respectively).
Symbolic AI aims to imitate intelligence via formal models.
The main figures behind symbolic AI are Simon, Newell, and Minsky
Core paradigm of symbolic AI is the “Intelligent Agent” [4], which:
has a memory and the capability to act in its world based on it.
has sensors to perceive information from its environment.
has actuators to influence the external world.
has the capability to probe actions, and is thereby able to choose the best possible action.
has internal memory for methods, and its exploration of the world is guided by knowledge kept in it.
Subsymbolic AI (SSAI) aims to model intelligence empirically.
SSAI was inspired by biological systems: A model which imitates neural nets in the brain is the basis for the creation of artificial intelligence.
Neural nets consist of a network of neurons which have weighted connections with each other.
Early work by Rosenblatt (1962): the “Perceptron” [6]
Advantages of artificial neural nets:
Distributed representation
Representation and processing of fuzziness
Highly parallel and distributed action
Speed and fault-tolerance
Image: http://www.neuronalesnetz.de
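To make the perceptron idea concrete, here is a minimal Python sketch (an illustration added here, not from the lecture): a single unit trained with Rosenblatt's learning rule on the boolean AND function. The data set, learning rate, and epoch count are arbitrary choices for the demo.

```python
# Minimal perceptron sketch: one neuron with weighted inputs learns AND.

def train_perceptron(samples, lr=0.1, epochs=20):
    """samples: list of (inputs, target) pairs with target in {0, 1}."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # Step activation: the unit fires iff the weighted sum is positive.
            output = 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0
            error = target - output
            # Rosenblatt's rule: move the weights towards the desired output.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Boolean AND: only the input (1, 1) maps to 1.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
for x, _ in data:
    print(x, 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0)
```

A single perceptron can only separate linearly separable classes; this limitation (e.g. XOR) is what later multi-layer networks overcome.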
KNOWLEDGE-BASED SYSTEMS
General Problem Solver
Knowledge-is-power hypothesis
Knowledge levels
3a. Newell’s 3 levels of knowledge
3b. Brachman’s 5 levels of knowledge
Problem Solving Methods
The General Problem Solver (GPS) is a universal problem solving approach.
GPS is the first approach that makes the distinction between knowledge of problems domains and how to solve problems
GPS is a domain- and task-independent approach.
GPS places no restrictions on either the domain knowledge or the task.
Examples of GPS are: automated theorem proving and generic search methods
Automatic theorem provers are GPS for which every problem can be expressed as logical inference
Automated theorem proving is about proving of mathematical theorems by a computer program
Generic Search Methods are GPS for which every problem can be expressed as search
One particular example of a Generic Search Method is the A* algorithm.
A* works for problems that can be represented as a state space, i.e. a graph of states. Initial conditions of the problem are represented as the start state; goal conditions are represented as the end state.
A* is an informed search (heuristic search) approach that uses the estimation function f(n) = g(n) + h(n), where:
g(n) is the cost to get from the start state to the current state n
h(n) is the estimated cost to get from the current state n to the end state
f(n) is the estimated total cost from the start state through the current state n to the end state
More in Lecture 5
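As a hedged illustration of these definitions, the sketch below implements A* in Python over an explicit state graph. The toy graph and heuristic values are invented for the example; only the f(n) = g(n) + h(n) bookkeeping follows the definitions above.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* search: neighbors(n) yields (successor, step_cost) pairs and
    h(n) estimates the remaining cost to the goal. Returns a path or None."""
    frontier = [(h(start), 0, start, [start])]      # entries: (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier) # expand cheapest f(n) first
        if state == goal:
            return path
        for succ, cost in neighbors(state):
            g2 = g + cost                           # g(n): cost from the start state
            if g2 < best_g.get(succ, float("inf")):
                best_g[succ] = g2
                # f(n) = g(n) + h(n): estimated total cost through succ
                heapq.heappush(frontier, (g2 + h(succ), g2, succ, path + [succ]))
    return None

# Invented toy state space: edges with costs, plus heuristic estimates.
graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 5)], "B": [("G", 1)], "G": []}
h = {"S": 4, "A": 5, "B": 1, "G": 0}.get
print(a_star("S", "G", lambda n: graph[n], h))      # ['S', 'B', 'G'] with cost 5
```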
However, GPS has a set of limitations:
It works in theory, but in practice it works only on toy problems (e.g. the Towers of Hanoi)
It could not solve real-world problems because the search was easily lost in the combinatorial explosion of intermediate states
The knowledge-is-power hypothesis, also called the Knowledge Principle, was formulated by E.A. Feigenbaum in 1977:
“knowledge of the specific task domain in which the program is to do its problem solving was more important as a source of power for competent problem solving than the reasoning method employed” [15]
The Knowledge-is-power hypothesis shifted the focus on how to build intelligent systems from inference to the knowledge base .
Problem solving is guided by experiential, qualitative, heuristic knowledge.
The meaning of intelligence as knowledge is the common one in the English-speaking world.
The Central Intelligence Agency (CIA) defines intelligence as knowledge.
The knowledge-is-power hypothesis led to the emergence of a new field, expert systems, and a new profession, the knowledge engineer.
3a. Newell’s 3 levels of knowledge
3b. Brachman’s 5 levels of knowledge
In his work from 1981, Newell tried to answer questions such as
How can knowledge be characterised?
What is the relation of this characterisation and the representation of knowledge?
What is characteristic about a system which holds knowledge?
Newell distinguished 3 levels in the context of knowledge representation:
Knowledge Level
Logical Level
Implementation Level
Knowledge Level
The most abstract level of representing knowledge.
Concerns the total knowledge contained in the Knowledge Base
Example: The automated DB-Information system knows that a trip from Innsbruck to Vienna costs 120€
Logical Level
Encoding of knowledge in a formal language.
Example: Price(Innsbruck, Vienna, 120)
Implementation Level
The internal representation of the sentences.
Example:
As a String “Price(Innsbruck, Vienna, 120)”
As a value in a matrix
Brachman defines 5 levels for different types of representations.
Levels interpret the transition from data to knowledge.
Each level corresponds to an explicit set of primitives offered to the knowledge engineer.
Ordering of knowledge levels from simple/abstract to complex/concrete:
Implementational Level
Logical Level
Epistemological Level
Conceptual Level
Linguistic Level
Implementational Level
The primitives are pointers and memory cells.
Allows the construction of data structures with no a priori semantics
Logical Level
The primitives are logical predicates, operators, and propositions.
An index is available to structure primitives.
A formal semantics is given to the primitives in terms of relations among objects in the real world.
However, no particular assumption is made as to the nature of such relations.
Epistemological Level
The primitives are concept types and structuring relations.
Structuring relations provide structure in a network of conceptual types or units. (i.e. inheritance: conceptual units, conceptual sub-units)
The epistemological level links formal structure to conceptual units
It contains structural connections in our knowledge needed to justify conceptual inferences.
Conceptual Level
The primitives are conceptual relations, primitive objects and actions.
The primitives have a definite cognitive interpretation, corresponding to language-independent concepts like elementary actions or thematic roles
Linguistic Level
The primitives are words and (linguistic) expressions.
The primitives are associated directly with nouns and verbs of a specific natural language.
They may stand for arbitrary relations and nodes that exist in a domain.
Problem Solving Methods (PSM) abstract from details of the implementation of the reasoning process.
Characteristics of PSM [10]:
A PSM specifies which inference actions have to be carried out for solving a given task.
A PSM determines the sequence in which these actions have to be activated.
Knowledge roles determine which role the domain knowledge plays in each inference action.
Generic inference pattern “Heuristic Classification” describes the problem-solving behaviour of these systems on the Knowledge Level in a generic way.
The propose & revise method is an efficient method for solving the task of parametric design . (see more details in [14])
The method dependes on the following inferences:
propose – derives an initial design based on the requirements;
C-test – requires knowledge that describes which possible designs are valid (i.e., the domain constraints);
revise – tries to improve an incorrect design based on the feedback of the C-test step.
More in Lecture 7
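A minimal sketch of the propose/C-test/revise loop on an invented toy parametric-design task (the requirement, the area constraint, and the fix rule are assumptions made for this illustration, not part of the method description above):

```python
# Hedged propose & revise sketch on a toy parametric-design task.

def propose(requirements):
    # propose: derive an initial design from the requirements.
    return {"width": requirements["min_width"], "height": 1}

def c_test(design):
    # C-test: return the violated domain constraints (empty list = valid design).
    violations = []
    if design["width"] * design["height"] < 12:     # assumed area requirement
        violations.append("area_too_small")
    return violations

def revise(design, violations):
    # revise: improve an incorrect design based on the C-test feedback.
    if "area_too_small" in violations:
        design["height"] += 1
    return design

design = propose({"min_width": 3})
while (violations := c_test(design)):
    design = revise(design, violations)
print(design)   # {'width': 3, 'height': 4}
```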
KBS are realized based on a knowledge base (KB).
The KB contains a model represented in a (logical) formalism which can be interpreted by an interpreter (inference engine) that is able to draw conclusions from it.
KBs typically capture knowledge of a domain.
Methodologies for the development of KBS: e.g. CommonKADS
Example: CYC, one of the first systems that aimed to capture common knowledge in a knowledge base.
Expert systems (ES) are a special form of KBS.
Definition: An expert system is a software application that stores knowledge about a certain domain. It is able to draw conclusions from that knowledge and offers concrete solutions for problems in that domain.
ES simulate human experts and thus the knowledge base typically consists of highly specialized expert knowledge.
Reasoning of human experts vs. reasoning in ES:
Human experts are able to master unforeseen effects and situations.
Human experts are able to learn from experiences.
Human experts expand their knowledge continuously.
Human experts derive new knowledge not only from drawn conclusions but also via analogy and intuition.
POPULAR AI SYSTEMS
ELIZA was an early computer program capable of natural language processing.
Written by J. Weizenbaum between 1964 and 1966.
ELIZA simulated a psychotherapist by reformulating questions posed by the user.
(A sample ELIZA conversation can be found on Wikipedia.)
More information: [9]
Deep Blue: a chess-playing computer developed by IBM that won against world champion Garry Kasparov in 1997.
It applied a brute-force strategy; processing was highly parallel.
Evaluation of 200 million positions per second.
Deep Blue's knowledge base contained over 4,000 positions and 700,000 grandmaster games.
It was fine-tuned by chess grandmasters.
Admission from IBM: „Deep Blue, as it stands today, is not a "learning system." It is therefore not capable of utilizing artificial intelligence to either learn from its opponent or "think" about the current position of the chessboard.“
Link: http://www.research.ibm.com/deepblue/
Project at the MIT Artificial Intelligence Lab
The goal of the COG project was to build a robot capable of interacting with humans and objects in a human-like way.
"As I pondered [this] and thought about HAL, I decided to try to build the first serious attempt at a robot with human-level capabilities, the first serious attempt at a HAL-class being." (Rodney Brooks, Inventor of COG)
Link: http://groups.csail.mit.edu/lbr/humanoid-robotics-group/cog/
CALO: a DARPA-funded project under the “Personal assistant that learns” program
Involves 25 partners, 300+ researchers, including top researchers in AI
500+ publications in first four years
“The goal of the project is to create cognitive software systems, that is, systems that can reason, learn from experience, be told what to do, explain what they are doing, reflect on their experience, and respond robustly to surprise. “ (calosystem.org)
CALO assists its user with six high-level functions:
Organizing and Prioritizing Information
Preparing Information Artifacts
Mediating Human Communications
Task Management
Scheduling and Reasoning in Time
Resource allocation
Link: http://www.calosystem.org/
HAL 9000: a fictional advanced device capable of performing a variety of tasks and interacting with its human users (companions?).
The HAL 9000 communicates by voice and can control auxiliary devices on a spaceship.
It (he?) has an unfortunate tendency towards obsessing over minor details or inconsistencies in the instructions given it, however.
In the events described in Arthur C. Clarke's “2001: A Space Odyssey,” HAL's tendency toward obsessive literalism led to the unfortunate death of most of its spaceship's human crew
SEAS (“Synthetic Environment for Analysis and Simulation”)
Can be used to simulate realistic events; has a world model
http://www.krannert.purdue.edu/centers/perc/html/aboutperc/seaslabs/seaslabs.htm
SYSTRAN
Early machine translation system
Foundation for Yahoo’s Babel Fish and early versions of Google Translate
http://www.systransoft.com/
VirtualWoman
Virtual-reality based chatbot
http://virtualwoman.net/
For further references, see http://en.wikipedia.org/wiki/List_of_notable_artificial_intelligence_projects
SUBDOMAINS OF AI
Cognition as information processing
Artificial neural networks
Heuristic search methods
Knowledge representation and logic
Automatic theorem proving
Non-monotonic reasoning
Case-based reasoning
Planning
Machine Learning
Knowledge Engineering
Natural Language Processing
Image Understanding
Cognitive Robotics
Software Agents
Deals with complex software systems that directly interact and communicate with human users.
Characteristics of cognitive systems (CS):
CS are directly integrated in their environment, act in it, and are able to communicate with it.
CS are able to direct and adapt their actions based on the environment they are situated in.
CS typically represent system-relevant aspects of the environment.
Their information processing capabilities are characterized through learning aptitude and anticipation
Examples of cognitive system:
Organisms / biological cognitive systems
Technical systems such as robots or agents
Mixed human-machine systems
Neural networks are networks of neurons as in the real biological brain.
Neurons are highly specialized cells that transmit impulses within animals to cause a change in a target cell, such as a muscle effector cell or glandular cell.
The axon is the primary conduit through which the neuron transmits impulses to neurons downstream in the signal chain.
Humans: 10^11 neurons of > 20 types, 10^14 synapses, 1–10 ms cycle time
Signals are noisy “spike trains” of electrical potential
What we refer to as Neural Networks in the course are mostly Artificial Neural Networks (ANN).
ANN are approximation of biological neural networks and are built of physical devices, or simulated on computers.
ANN are parallel computational entities that consist of multiple simple processing units that are connected in specific ways in order to perform the desired tasks.
Remember: ANN are computationally primitive approximations of the real biological brains .
Application examples: handwriting recognition, time series prediction, kernel machines (support vector machines), data compression, financial prediction, speech recognition, computer vision, protein structures
Search methods typically help humans solve complex tasks by generating (optimal) plans, i.e. sets of operations/states that include the sequences of actions needed to reach a goal state.
Example problem: Tower of Hanoi
Definition: A search method is defined by picking the order of node expansion.
Search strategies are evaluated according to completeness, time complexity, space complexity, optimality.
Time and space complexity are measured in terms of the maximum branching factor, the depth of the least-cost solution, and the maximum depth of the state space
Distinction between informed / uninformed search techniques
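To illustrate that a search method is defined by the order of node expansion, the sketch below obtains breadth-first search from a FIFO queue and depth-first search from a LIFO stack, changing nothing else; the toy state space is invented for the example.

```python
from collections import deque

def graph_search(start, goal, neighbors, lifo=False):
    """The open list's discipline defines the method:
    FIFO (lifo=False) gives breadth-first, LIFO (lifo=True) gives depth-first."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.pop() if lifo else frontier.popleft()
        if path[-1] == goal:
            return path
        for succ in neighbors(path[-1]):
            if succ not in visited:
                visited.add(succ)
                frontier.append(path + [succ])
    return None

g = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}   # invented toy graph
print(graph_search("A", "D", g.get))                     # BFS: ['A', 'B', 'D']
print(graph_search("A", "D", g.get, lifo=True))          # DFS: ['A', 'C', 'D']
```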
The term knowledge representation describes the design and implementation of formalisms to model a part of reality (a domain).
A model represented using such a formalism and implemented by an interpreter is often called a knowledge base.
A knowledge base is a collection of facts and beliefs.
Modelling of knowledge bases happens on a conceptual level.
Intention: To model a domain of discourse and to draw inferences about the objects in the domain (reasoning)
Logic studies the principles of reasoning and offers
Formal languages for expressing knowledge
Well understood formal semantics
Reasoning methods to make implicit knowledge explicit
Automatic theorem proving deals with the design and implementation of computer programmes that are capable of making mathematical proofs.
Theorem provers deduce new formulas from given formulas via logical deduction rules until the target formula is found.
Theoretical foundation of automated theorem proving: mathematical logic; typically first-order-logic.
Formulas are mathematically precisely defined via interpretations (provide semantics for function and predicate symbols via mappings and relations respectively)
Classical logic is monotonic in the following sense: whenever a sentence A is a logical consequence of a set of sentences T, then A is also a consequence of an arbitrary superset of T [13].
Non-monotonic reasoning:
Additional information may invalidate conclusions.
Non-monotonic reasoning is closer to (human) common-sense reasoning.
Most rules in common-sense reasoning only hold with exceptions (e.g. university_professors_teach)
Important approaches to formalise non-monotonic reasoning:
Default logics: non-classical inference rules are used to represent defaults
The modal approach: modal operators are used to explicitly declare whether something is believed or is consistent.
Circumscription: Validity can be restricted to specific models.
Conditional approaches: a conditional connective is used to represent defaults in a logical language.
Definition: A “case” is an experience made during the solving of a problem.
A case is typically informally given and covers the problem and the solution.
Experiences (resp. cases) are used to solve newly occurring problems.
Cases are collected in a so-called case-base (analogous to a knowledge base in KBS)
Case-based reasoning is inspired by human problem solving capabilities.
Application scenarios are characterized through:
A considerable amount of cases has to be available
Using the cases to solve the problem has to be easier than solving the problem directly.
Available information is incomplete, uncertain, or imprecise.
The construction of a KBS and the modelling of cases is not easily possible.
Typical application scenarios can be found in the areas of diagnostics, electronic sales, configuration, or planning.
What is Planning?
“Planning is the process of thinking about the activities required to create a desired goal on some scale” [Wikipedia]
We take a more pragmatic view – planning is a flexible approach for taking complex decisions:
decide about the schedule of a production line;
decide about the movements of an elevator;
decide about the flow of paper through a copy machine;
decide about robot actions.
By “flexible” we mean:
the problem is described to the planning system in some generic language;
a (good) solution is found fully automatically;
if the problem changes, all that needs to be done is to change the description.
Planning looks at methods to solve any problem that can be described in the language chosen for the particular planning system.
Approaches for the generation of action sequences: action planning and situated activity planning.
Machine Learning (ML) is a central research area in AI to acquire knowledge.
ML deals with the computer-aided design and realisation of learning problems.
Learning is defined as the process that enables a system to perform better when solving the same or similar tasks in the future (Simon, 1983)
Reduction of learning to mathematical theories: Deduction, Induction, Abduction.
Learning task is typically characterized through the description of inputs, expected outputs, and environmental conditions.
Typical machine learning applications: data mining, speech recognition, text analysis, control learning, hidden Markov models, etc.
Knowledge engineering is concerned with the acquisition, management, use and transformation of knowledge.
Goals are similar to software engineering, i.e. to systematically develop expert systems using existing methods and tools.
Core process in knowledge engineering: knowledge acquisition; during knowledge acquisition, knowledge is formalised, i.e. transformed from a natural language representation into a formal representation.
Process models for knowledge acquisition: Model by Puppe; model by Buchanan; Harmon/Mauss/Morrissey; Waterman; or MIKE
Methodical approaches and tools: D3; CommonKADS; MIKE; Protégé-II; RASSI
Application cases include the development of expert systems, workflow systems or knowledge management
Goal: Processing and understanding of speech or written language.
Early applications include question-answer systems, natural-language based access to databases or speech-based control of robots.
Challenges include information re-construction from spoken words or information selection and reduction during speech production.
Application areas: tools for inter-human communication, tools for text generation or text correction (e.g. identification of grammatical errors based on language models), information classification or filtering, and human-machine communication.
Image Understanding (IU) deals with the analysis and interpretation of visual information. IU denotes the reconstruction and interpretation of scenes based on images.
Early approaches based on pattern recognition (still one of the most important foundations of this field)
Prominent application: object recognition of still and moving objects
Application areas: symbol recognition, medical image analysis, vehicle navigation, image archiving, and gesture recognition.
AI deals with the development of robots as autonomous and intelligent systems.
Robotics covers many sub-areas of AI and involves interdisciplinary work, including mechanical and electrical design as well as cognitive areas.
Types of robots: static robots, mobile robots, and humanoid robots.
Application areas: construction, planning, or observation.
Core paradigm in AI.
Definition: A software agent is a long-term operating program whose function can be described as the autonomous execution of tasks or pursuit of goals via interaction with its environment.
Agent (see earlier slide)
has a memory and the capability to act in its world based on it.
has sensors to perceive information from its environment.
has actuators to influence the external world.
has the capability to probe actions.
has internal memory for methods, and its exploration of the world is guided by knowledge kept in it.
Applications: Data collection and filtering, event notification, planning and optimization in various application areas (commerce, production, military, education)
SOME RELEVANT PEOPLE IN AI
SUMMARY
Birth of AI in the 1950s
Broad spectrum of subdomains and combination of disciplines
Distinction between
Weak and strong AI
Symbolic and subsymbolic AI
Central role: symbols and knowledge representation
Knowledge-based systems and intelligent agents are core concepts in AI
REFERENCES
Wikipedia links:
http://en.wikipedia.org/wiki/List_of_notable_artificial_intelligence_projects
http://en.wikipedia.org/wiki/Turing_test
http://en.wikipedia.org/wiki/General_Problem_Solver
Mandatory reading:
[1] G. Görz, C.-R. Rollinger, J. Schneeberger (eds.): “Handbuch der künstlichen Intelligenz”, Oldenbourg Verlag, 2003, fourth edition
Further reading:
[2] A. Turing. "Computing Machinery and Intelligence", Mind LIX (236): 433–460, October 1950.
[3] Aristotle “On Interpretation”, 350 B.C.E, see: http://classics.mit.edu/Aristotle/interpretation.html
[4] A. Newell, H.A. Simon, “Human Problem Solving” Englewood Cliffs, N.J.: Prentice Hall, 1972
[5] A. Newell. “The Knowledge Level”, AI Magazine 2 (2), 1981, p. 1-20.
[6] F. Rosenblatt. “Strategic Approaches to the Study of Brain Models” In: Förster, H.: Principles of Self-Organization. Elmsford, N.Y.: Pergamon Press, 1962.
[7] S. Russell, E.H. Wefald. "Do the Right Thing: Studies in Limited Rationality" MIT Press, 1991.
[8] C. Beierle and G. Kern-Isberner "Methoden wissensbasierter Systeme. Grundlagen, Algorithmen, Anwendungen" Vieweg, 2005.
[9] J. Weizenbaum. "ELIZA - A Computer Program For the Study of Natural Language Communication Between Man And Machine", Communications of the ACM 9 (1): p. 36–45, 1966.
[10] W. Birmingham and G. Klinker “Knowledge Acquisition Tools with Explicit Problem-Solving Methods” The Knowledge Engineering Review 8, 1 (1993), 5-25
[11] A. Newell and H. Simon "GPS, a program that simulates human thought" In: Computation & intelligence: collected readings, pp. 415 - 428, 1995.
[12] R. J. Brachman “On the Epistemological Status of Semantic Networks” In: N.V. Findler (ed.): Associative Networks: Representation and Use of Knowledge by Computers. New York: Academic Press, 1979, 3-50.
[13] G. Brewka, I. Niemelä, M. Truszczynski “Nonmonotonic Reasoning” In: V. Lifschitz, B. Porter, F. van Harmelen (eds.), Handbook of Knowledge Representation, Elsevier, 2007, 239-284
[14] D. Fensel. “Problem-Solving Methods: Understanding, Description, Development and Reuse”, Springer LNAI 1791, 2000
[15] E.A. Feigenbaum. “The Art of Artificial Intelligence: Themes and Case Studies of Knowledge Engineering,” Proceedings of the International Joint Conference on Artificial Intelligence, Cambridge, MA, 1977
[16] W.J. Clancey. “Heuristic Classification”, Artificial Intelligence , 27:289-350, 1985
Questions?
Propositional Logic
Motivation
Technical Solution
Syntax
Semantics
Inference
Illustration by a Larger Example
Extensions
Summary
References
MOTIVATION
Logic is used to formalize deduction
Deduction = derivation of true statements (called conclusions ) from statements that are assumed to be true (called premises )
Natural language is not precise, so the careless use of logic can lead to claims that false statements are true, or to claims that a statement is true even though its truth does not necessarily follow from the premises
=> Logic provides a way to talk about truth and correctness in a rigorous way, so that we can prove things, rather than make intelligent guesses and just hope they are correct
Propositional logic is a good vehicle to introduce basic properties of logic; used to:
Associate natural language expressions with semantic representations
Evaluate the truth or falsity of semantic representations relative to a knowledge base
Compute inferences over semantic representations
One of the simplest and most common logics
The core of (almost) all other logics
An unambiguous formal language, akin to a programming language
Syntax: Vocabulary for expressing concepts without ambiguity
Semantics: Connection to what we're reasoning about
Interpretation - what the syntax means
Reasoning: How to prove things
What steps are allowed
TECHNICAL SOLUTIONS
SYNTAX
Logical constants: true, false
Propositional symbols: P, Q, S, ...
Wrapping parentheses: ( … )
Atomic formulas: Propositional Symbols or logical constants
Formulas are either atomic formulas, or can be formed by combining atomic formulas with the following connectives:
∧ ...and [conjunction]
∨ ...or [disjunction]
→...implies [implication / conditional]
↔ ...is equivalent [biconditional]
¬ ...not [negation]
A sentence (well formed formula) is defined as follows:
A symbol is a sentence
If S is a sentence, then ¬S is a sentence
If S is a sentence, then (S) is a sentence
If S and T are sentences, then (S ∨ T), (S ∧ T), (S → T), and (S ↔ T) are sentences
A sentence results from a finite number of applications of the above rules
Sentence → AtomicSentence | ComplexSentence
AtomicSentence → True | False | P | Q | R | ...
ComplexSentence → ( Sentence ) | Sentence Connective Sentence | ¬ Sentence
Connective → ∧ | ∨ | → | ↔
Ambiguities are resolved through precedence (¬ binds strongest, then ∧, then ∨, then →, then ↔) or through parentheses
e.g. ¬P ∨ Q ∧ R → S is equivalent to ((¬P) ∨ (Q ∧ R)) → S
P means "It is hot. " | ¬ p ∧ ¬ q |
Q means "It is humid. " | ¬(p ∧ q) |
R means "It is raining. " | (p ∧ q) ∨ r |
(P ∧ Q) → R "If it is hot and humid, then it is raining " |
p ∧ q ∧ r (((¬ p) ∧ q) → r) ↔ ((¬ r) ∨ p) |
Q → P "If it is humid, then it is hot " |
(¬ (p ∨ q) → q) → r ((¬ p) ∨ (¬ q)) ↔ (¬ r) Etc. |
SEMANTICS
Given the truth values of all symbols in a sentence, it can be “evaluated” to determine its truth value (True or False)
A model for a KB is a “possible world” (assignment of truth values to propositional symbols) in which each sentence in the KB is True
A valid sentence or tautology is a sentence that is True under all interpretations, no matter what the world is actually like or how the semantics are defined (example: “It’s raining or it’s not raining”)
An inconsistent sentence or contradiction is a sentence that is False under all interpretations (the world is never like what it describes, as in “It’s raining and it’s not raining”)
P entails Q , written P ⊧ Q, means that whenever P is True, so is Q; in other words, all models of P are also models of Q
In propositional logic, truth values are assigned to the atoms of a formula in order to evaluate the truth value of the formula
An assignment is a function v : P → {T,F}; v assigns a truth value to any atom in a given formula (P is the set of all propositional letters, i.e. atoms).
Suppose F denotes the set of all propositional formulas. We can extend an assignment v to a function v : F → {T,F} which assigns the truth value v(A) to any formula A in F. v is then called an interpretation.
Example: let v(p) = F, v(q) = T.
If A = (¬p → q) ↔ (p ∨ q), what is v(A)?
Solution:
v(A) = v((¬p → q) ↔ (p ∨ q))
= v(¬p → q) ↔ v(p ∨ q)
= (v(¬p) → v(q)) ↔ (v(p) ∨ v(q))
= (¬v(p) → v(q)) ↔ (v(p) ∨ v(q))
= (¬F → T) ↔ (F ∨ T)
= (T → T) ↔ (F ∨ T)
= T ↔ T
= T
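The inductive extension of v from atoms to formulas can be implemented directly. Below is a minimal Python sketch; encoding formulas as nested tuples is this sketch's own choice, not part of the lecture.

```python
# Formulas are nested tuples such as ("and", "p", "q"); atoms are strings.

def v(formula, assignment):
    """Extend an assignment on atoms to an interpretation on formulas."""
    if isinstance(formula, str):                  # an atom: look up its value
        return assignment[formula]
    op, *args = formula
    vals = [v(a, assignment) for a in args]
    if op == "not":  return not vals[0]
    if op == "and":  return vals[0] and vals[1]
    if op == "or":   return vals[0] or vals[1]
    if op == "->":   return (not vals[0]) or vals[1]
    if op == "<->":  return vals[0] == vals[1]
    raise ValueError(f"unknown connective {op}")

# A = (¬p → q) ↔ (p ∨ q) under v(p) = F, v(q) = T evaluates to True.
A = ("<->", ("->", ("not", "p"), "q"), ("or", "p", "q"))
print(v(A, {"p": False, "q": True}))              # True, matching the hand computation
```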
If A, B are formulas such that v(A) = v(B) for all interpretations v, then A is (logically) equivalent to B: A ≡ B
Example: ¬p ∨ q ≡ p → q, since both formulas are true in all interpretations except when v(p) = T, v(q) = F, and are false for that particular interpretation
Caution: ≡ does not mean the same thing as ↔:
A ↔ B is a formula (syntax)
A ≡ B is a relation between two formulas (semantics)
Theorem : A ≡ B if and only if A ↔ B is true in every interpretation; i.e. A ↔ B is a tautology .
A propositional formula A is satisfiable iff v(A) = T in some interpretation v; such an interpretation is called a model for A.
A is unsatisfiable (or contradictory) if it is false in every interpretation
A set of formulas U = {A1, A2, …, An} is satisfiable iff there exists an interpretation v such that v(A1) = v(A2) = … = v(An) = T; such an interpretation is called a model of U.
U is unsatisfiable if no such interpretation exists
Relevant properties:
If U is satisfiable, then so is U − {Ai} for any i = 1, 2, …, n
If U is satisfiable and B is valid, then U ∪ {B} is also satisfiable
If U is unsatisfiable and B is any formula, U ∪ {B} is also unsatisfiable
If U is unsatisfiable and some Ai is valid, then U − {Ai} is also unsatisfiable
A is valid (or a tautology), denoted ⊧ A, iff v(A) = T for all interpretations v
A is not valid (or falsifiable), denoted ⊭ A, if we can find some interpretation v such that v(A) = F
Relationship between validity, satisfiability, falsifiability, and unsatisfiability, with examples:
Valid (tautology): e.g. p ∨ ¬p
Not valid, but satisfiable: e.g. p
False (contradiction): e.g. p ∧ ¬p
Theorem:
(a) A is valid if and only if ¬A is unsatisfiable
(b) A is satisfiable if and only if ¬A is falsifiable
Let U be a set of formulas and A a formula. A is a (logical) consequence of U , if any interpretation v which is a model of U is also a model for A :
U ⊧ A
Example:
If some interpretation v is a model for the set {p, p → q}, it must satisfy v(p) = v(p → q) = T; but in any such interpretation, we also have v(q) = T, so {p, p → q} ⊧ q
A set of formulas T is a theory if it is closed under logical consequence. This means that, for every formula A , if T ⊧ A , then A is in T
Let U be a set of formulas. Then, the set of all consequences of U,
T(U) = {A | U ⊧ A},
is called the theory of U. The formulas in U are called the axioms for the theory T(U).
INFERENCE
Several basic methods for determining whether a given set of premises propositionally entails a given conclusion
Truth Table Method
Deductive (Proof) Systems
Resolution
One way of determining whether or not a set of premises logically entails a possible conclusion is to check the truth table for the logical constants of the language
This is called the truth table method and can be formalized as follows:
Step 1: Starting with a complete truth table for the propositional constants, iterate through all the premises of the problem, for each premise eliminating any row that does not satisfy the premise
Step 2: Do the same for the conclusion
Step 3: Finally, compare the two tables; If every row that remains in the premise table, i.e. is not eliminated, also remains in the conclusion table, i.e. is not eliminated, then the premises logically entail the conclusion
Simple sentences:
Amy loves Pat: lovesAmyPat
Amy loves Quincy: lovesAmyQuincy
It is Monday: ismonday
Premises:
If Amy loves Pat, Amy loves Quincy:
lovesAmyPat → lovesAmyQuincy
If it is Monday, Amy loves Pat or Quincy:
ismonday → lovesAmyPat ∨ lovesAmyQuincy
Question:
If it is Monday, does Amy love Quincy?
i.e. is ismonday → lovesAmyQuincy entailed by the premises?
Truth table for the premises:
| lovesAmyPat | lovesAmyQuincy | ismonday | lovesAmyPat → lovesAmyQuincy | ismonday → lovesAmyPat ∨ lovesAmyQuincy |
| T | T | T | T | T |
| T | T | F | T | T |
| T | F | T | F | T |
| T | F | F | F | T |
| F | T | T | T | T |
| F | T | F | T | T |
| F | F | T | T | F |
| F | F | F | T | T |
Step 1 eliminates every row that does not satisfy all premises (rows 3, 4, and 7), leaving rows 1, 2, 5, 6, and 8 in the premise table.
Truth table for the conclusion:
| lovesAmyPat | lovesAmyQuincy | ismonday | ismonday → lovesAmyQuincy |
| T | T | T | T |
| T | T | F | T |
| T | F | T | F |
| T | F | F | T |
| F | T | T | T |
| F | T | F | T |
| F | F | T | F |
| F | F | F | T |
Step 2 eliminates every row that does not satisfy the conclusion (rows 3 and 7), leaving rows 1, 2, 4, 5, 6, and 8 in the conclusion table.
Finally, in order to make the determination of logical entailment, we compare the two tables and notice that every row remaining in the premise table also remains in the conclusion table.
In other words, the premises logically entail the conclusion.
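The three steps of the truth table method mechanize directly. The hedged Python sketch below enumerates all interpretations of the example, keeps only the rows satisfying every premise, and checks that each of them also satisfies the conclusion; representing sentences as Python functions over assignments is a choice made here for brevity.

```python
from itertools import product

implies = lambda a, b: (not a) or b
premises = [
    lambda m: implies(m["lovesAmyPat"], m["lovesAmyQuincy"]),
    lambda m: implies(m["ismonday"], m["lovesAmyPat"] or m["lovesAmyQuincy"]),
]
conclusion = lambda m: implies(m["ismonday"], m["lovesAmyQuincy"])

atoms = ["lovesAmyPat", "lovesAmyQuincy", "ismonday"]
entailed = all(
    conclusion(row)
    for values in product([True, False], repeat=len(atoms))
    for row in [dict(zip(atoms, values))]
    if all(p(row) for p in premises)      # rows surviving premise elimination
)
print(entailed)   # True: the premises logically entail the conclusion
```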
The truth table method has the merit that it is easy to understand
It is a direct implementation of the definition of logical entailment.
In practice, it is awkward to manage two tables, especially since there are simpler approaches in which only one table needs to be manipulated
Validity Checking
Unsatisfability Checking
Approach: To determine whether a set of sentences {φ1, …, φn} logically entails a sentence φ, form the sentence (φ1 ∧ … ∧ φn) → φ and check that it is valid.
To see how this method works, consider the previous example and write the tentative conclusion as shown below.
(lovesAmyPat → lovesAmyQuincy) ∧ (ismonday → lovesAmyPat ∨ lovesAmyQuincy) → (ismonday → lovesAmyQuincy)
Then, form a truth table for our language with an added column for this sentence and check its satisfaction under each of the possible interpretations for our logical constants
It is almost exactly the same as the validity checking method, except that it works negatively instead of positively.
To determine whether a finite set of sentences {φ1, …, φn} logically entails a sentence φ, we form the sentence (φ1 ∧ … ∧ φn ∧ ¬φ) and check that it is unsatisfiable.
Both the validity checking method and the satisfiability checking method require about the same amount of work as the truth table method, but they have the merit of manipulating only one table
Example truth table for validity checking:
| p | q | r | p → q | p → r | p → r ∨ q | (p → q) ∧ (p → r) → (p → r ∨ q) | p → r ∧ q | (p → q) ∧ (p → r) → (p → r ∧ q) |
| T | T | T | T | T | T | T | T | T |
| T | T | F | T | F | T | T | F | T |
| T | F | T | F | T | T | T | F | T |
| T | F | F | F | F | F | T | F | T |
| F | T | T | T | T | T | T | T | T |
| F | T | F | T | T | T | T | T | T |
| F | F | T | T | T | T | T | T | T |
| F | F | F | T | T | T | T | T | T |
Semantic methods for checking logical entailment have the merit of being conceptually simple; they directly manipulate interpretations of sentences
Unfortunately, the number of interpretations of a language grows exponentially with the number of logical constants.
When the number of logical constants in a propositional language is large, the number of interpretations may be impossible to manipulate.
Deductive (proof) systems provide an alternative way of checking and communicating logical entailment that addresses this problem
In many cases, it is possible to create a “proof” of a conclusion from a set of premises that is much smaller than the truth table for the language;
Moreover, it is often possible to find such proofs with less work than is necessary to check the entire truth table
An important component in the treatment of proofs is the notion of a schema
A schema is an expression satisfying the grammatical rules of our language except for the occurrence of metavariables in place of various subparts of the expression.
For example, the following expression is a pattern with metavariables φ and ψ.
φ → (ψ → φ)
An instance of a sentence schema is the expression obtained by substituting expressions for the metavariables.
For example, the following is an instance of the preceding schema.
p →( q → p )
The basis for proof systems is the use of correct rules of inference that can be applied directly to sentences to derive conclusions that are guaranteed to be correct under all interpretations
Since the interpretations are not enumerated, time and space can often be saved
A rule of inference is a pattern of reasoning consisting of:
One set of sentence schemata, called premises , and
A second set of sentence schemata, called conclusions
A rule of inference is sound if and only if, for every instance, the premises logically entail the conclusions
Modus Ponens (MP): from φ → ψ and φ, conclude ψ. Sound instances of this pattern include:
from P → (Q → R) and P, conclude Q → R
from raining → wet and raining, conclude wet
from (P → Q) → R and P → Q, conclude R
from wet → slippery and wet, conclude slippery
The implication introduction schema (II), together with Modus Ponens, allows us to infer implications:
φ → (ψ → φ)
The implication distribution schema (ID) allows us to distribute one implication over another:
(φ → (ψ → χ)) → ((φ → ψ) → (φ → χ))
The contradiction realization schemata (CR) permit us to infer a sentence if the negation of that sentence implies some sentence and its negation:
(ψ → ¬φ) → ((ψ → φ) → ¬ψ)
(¬ψ → ¬φ) → ((¬ψ → φ) → ψ)
The equivalence schemata (EQ) capture the meaning of the ↔ operator:
(φ ↔ ψ) → (φ → ψ)
(φ ↔ ψ) → (ψ → φ)
(φ → ψ) → ((ψ → φ) → (ψ ↔ φ))
The meaning of the other operators in propositional logic is captured in the following axiom schemata
(φ ← ψ) ↔ (ψ → φ)
(φ ∨ ψ) ↔ (¬φ → ψ)
(φ ∧ ψ) ↔ ¬(¬φ ∨ ¬ψ)
The above axiom schemata are jointly called the standard axiom schemata for Propositional Logic
They all are valid
A proof of a conclusion from a set of premises is a sequence of sentences terminating in the conclusion in which each item is either
a premise,
an instance of an axiom schema, or
the result of applying a rule of inference to earlier items in sequence
Example:
1. p → q | Premise
2. q → r | Premise
3. (q → r) → (p → (q → r)) | II
4. p → (q → r) | MP: 3, 2
5. (p → (q → r)) → ((p → q) → (p → r)) | ID
6. (p → q) → (p → r) | MP: 5, 4
7. p → r | MP: 6, 1
If there exists a proof of a sentence φ from a set Δ of premises and the standard axiom schemata using Modus Ponens, then φ is said to be provable from Δ, written as
Δ ⊢ φ
There is a close connection between provability and logical entailment (⊧):
A set of sentences Δ logically entails a sentence φ if and only if φ is provable from Δ
Soundness Theorem:
If φ is provable from Δ, then Δ logically entails φ.
Completeness Theorem:
If Δ logically entails φ, then φ is provable from Δ.
The concept of provability is important because it suggests how we can automate the determination of logical entailment
Starting from a set of premises Δ, we enumerate conclusions from this set
If a sentence φ appears, then it is provable from Δ and is, therefore, a logical consequence
If the negation of φ appears, then ¬φ is a logical consequence of Δ and φ is not logically entailed (unless Δ is inconsistent)
Note that it is possible that neither φ nor ¬φ will appear
Propositional resolution is an extremely powerful rule of inference for Propositional Logic
Using propositional resolution (without axiom schemata or other rules of inference), it is possible to build a theorem prover that is sound and complete for all of Propositional Logic
The search space using propositional resolution is much smaller than that of the standard proof system for propositional logic
Propositional resolution works only on expressions in clausal form
Before the rule can be applied, the premises and conclusions must be converted to this form
A clause is a set of literals which is assumed (implicitly) to be a disjunction of those literals
Example: {p, ¬q, r} stands for p ∨ ¬q ∨ r
Unit clause: a clause with only one literal; e.g. {¬q}
Clausal form of a formula: an implicit conjunction of clauses
Example: the CNF formula (p ∨ ¬q) ∧ (¬p ∨ r) has the clausal form {{p, ¬q}, {¬p, r}}
Notation:
l – a literal, l^c – the complement of l
C – a clause (a set of literals)
S – a clausal form (a set of clauses)
(1) If l appears in some clause of S, but l^c does not appear in any clause, then, if we delete all clauses in S containing l, the new clausal form S' is satisfiable if and only if S is satisfiable
Example: satisfiability of {{p, q}, {¬q, r}} is equivalent to satisfiability of {{¬q, r}}, since p occurs but ¬p does not
(2) Suppose C = {l} is a unit clause and we obtain S' from S by deleting C and l^c from all clauses that contain it; then, S is satisfiable if and only if S' is satisfiable
Example: {{p}, {¬p, q}} is satisfiable iff {{q}} is satisfiable
(3) If S contains two clauses C and C', such that C is a subset of C', we can delete C' without affecting the (un)satisfiability of S
Example: {{p}, {p, q}} is satisfiable iff {{p}} is satisfiable
(4) If a clause C in S contains a pair of complementary literals l, l^c, then C can be deleted from S without affecting its (un)satisfiability
Example: {{p, ¬p, q}, {r}} is satisfiable iff {{r}} is satisfiable
Theorem: Every propositional formula can be transformed into an equivalent formula in CNF
The transformation proceeds by exhaustively applying the following rewrite rules (written with ⇒):
1. Eliminate implications:
φ1 → φ2 ⇒ ¬φ1 ∨ φ2
φ1 ← φ2 ⇒ φ1 ∨ ¬φ2
φ1 ↔ φ2 ⇒ (¬φ1 ∨ φ2) ∧ (φ1 ∨ ¬φ2)
2. Push negations inward:
¬¬φ ⇒ φ
¬(φ1 ∧ φ2) ⇒ ¬φ1 ∨ ¬φ2
¬(φ1 ∨ φ2) ⇒ ¬φ1 ∧ ¬φ2
3. Distribute and reassociate:
φ1 ∨ (φ2 ∧ φ3) ⇒ (φ1 ∨ φ2) ∧ (φ1 ∨ φ3)
(φ1 ∧ φ2) ∨ φ3 ⇒ (φ1 ∨ φ3) ∧ (φ2 ∨ φ3)
(φ1 ∨ φ2) ∨ φ3 ⇒ φ1 ∨ (φ2 ∨ φ3)
(φ1 ∧ φ2) ∧ φ3 ⇒ φ1 ∧ (φ2 ∧ φ3)
4. Convert to clausal notation:
φ1 ∨ ... ∨ φn ⇒ {φ1, ..., φn}
φ1 ∧ ... ∧ φn ⇒ {φ1}, ..., {φn}
Example: transform the formula (p → q) → (¬q → ¬p) into an equivalent formula in CNF.
Solution:
(p → q) → (¬q → ¬p)
⇒ ¬(¬p ∨ q) ∨ (¬¬q ∨ ¬p) (eliminate implications)
⇒ (p ∧ ¬q) ∨ (q ∨ ¬p) (push negations inward)
⇒ (p ∨ q ∨ ¬p) ∧ (¬q ∨ q ∨ ¬p) (distribute ∨ over ∧)
Both clauses contain complementary literals, so each is valid, which confirms that the original formula is a tautology.
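The four transformation steps can also be coded directly. The sketch below is an illustrative implementation (reusing the nested-tuple formula encoding from the evaluator sketch earlier, a choice of this sketch, not of the lecture): it eliminates implications, pushes negations inward, and distributes ∨ over ∧; on the example above it reproduces (p ∨ q ∨ ¬p) ∧ (¬q ∨ q ∨ ¬p).

```python
# CNF transformation sketch: atoms are strings, formulas are nested tuples.

def elim(f):
    """Step 1: eliminate -> and <->."""
    if isinstance(f, str): return f
    op, *a = f
    a = [elim(x) for x in a]
    if op == "->":  return ("or", ("not", a[0]), a[1])
    if op == "<->": return ("and", ("or", ("not", a[0]), a[1]),
                                   ("or", a[0], ("not", a[1])))
    return (op, *a)

def nnf(f, neg=False):
    """Step 2: push negations inward via double negation and De Morgan."""
    if isinstance(f, str): return ("not", f) if neg else f
    op, *a = f
    if op == "not": return nnf(a[0], not neg)
    if neg: op = {"and": "or", "or": "and"}[op]
    return (op, nnf(a[0], neg), nnf(a[1], neg))

def cnf(f):
    """Step 3: distribute v over ^ until a conjunction of disjunctions remains."""
    if isinstance(f, str) or f[0] == "not": return f
    op, l, r = f
    l, r = cnf(l), cnf(r)
    if op == "and": return ("and", l, r)
    if l[0:1] == ("and",): return cnf(("and", ("or", l[1], r), ("or", l[2], r)))
    if r[0:1] == ("and",): return cnf(("and", ("or", l, r[1]), ("or", l, r[2])))
    return ("or", l, r)

f = ("->", ("->", "p", "q"), ("->", ("not", "q"), ("not", "p")))
print(cnf(nnf(elim(f))))   # (p v q v ¬p) ^ (¬q v q v ¬p) as nested tuples
```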
Suppose C1, C2 are clauses such that l ∈ C1 and l^c ∈ C2. The clauses C1 and C2 are said to be clashing clauses, and they clash on the complementary literals l, l^c.
C, the resolvent of C1, C2, is the clause (C1 − {l}) ∪ (C2 − {l^c}).
C1 and C2 are called the parent clauses of C.
Example: the clauses C1 = {P, Q} and C2 = {¬P, ¬Q} clash on P, ¬P, giving the resolvent {Q, ¬Q}.
C1, C2 also clash on Q, ¬Q, so another way to find a resolvent for these two clauses is {P, ¬P}.
Theorem: The resolvent C is satisfiable if and only if the parent clauses C1, C2 are simultaneously satisfiable
Resolution algorithm:
Input: S – a set of clauses
Output: “S is satisfiable” or “S is not satisfiable”
1. Set S0 := S
2. Suppose Si has already been constructed. To construct Si+1, choose a pair of clashing clauses C1, C2 in Si (if there are any), derive C := Res(C1, C2), and set Si+1 := Si ∪ {C}
3. If C is the empty clause, output “S is not satisfiable”; if Si+1 = Si, output “S is satisfiable”
4. Otherwise, set i := i + 1 and go back to Step 2
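A minimal Python sketch of the algorithm above, in a set-at-a-time variant: each round adds all resolvents of clashing clause pairs instead of a single clause per step. Encoding literals as signed integers (so -l is the complement of l) is this sketch's own choice.

```python
from itertools import combinations

def resolve(c1, c2):
    """All resolvents of two clauses; -l plays the role of l's complement."""
    return [(c1 - {l}) | (c2 - {-l}) for l in c1 if -l in c2]

def resolution(clauses):
    """Saturate the clause set and report (un)satisfiability."""
    s = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(s, 2):
            for r in resolve(c1, c2):
                if not r:
                    return "S is not satisfiable"   # empty clause derived
                new.add(frozenset(r))
        if new <= s:
            return "S is satisfiable"               # fixpoint: no new resolvents
        s |= new

# Clausal form of ¬[(p → q) → (¬q → ¬p)] with p encoded as 1 and q as 2:
print(resolution([{-1, 2}, {-2}, {1}]))             # S is not satisfiable
```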
Example: Show that
(p → q) → (¬q → ¬p)
is a valid formula
Solution: We will show that
¬[(p → q) → (¬q → ¬p)]
is not satisfiable.
(1) Transform the formula into CNF: ¬[(p → q) → (¬q → ¬p)] ⇒ (¬p ∨ q) ∧ ¬q ∧ p, i.e. the clausal form S = {{¬p, q}, {¬q}, {p}}
(2) Show, using resolution, that the empty clause can be derived: resolving {¬p, q} with {p} on p gives {q}; resolving {q} with {¬q} on q gives the empty clause
A derivation of the empty clause from S is called a refutation of S
Theorem: If the set of clauses labeling the leaves of a resolution tree is satisfiable, then the clause at the root is satisfiable
Theorem (Soundness): If the empty clause is derived from a set of clauses, then the set of clauses is unsatisfiable
Theorem (Completeness): If a set of clauses is unsatisfiable, then the empty clause can be derived from it using the resolution algorithm
ILLUSTRATION BY LARGER EXAMPLE
For each of these sets of premises, what relevant conclusion or conclusions can be drawn? Explain the rules of inference used to obtain each conclusion from the premises.
(a) “If I eat spicy foods, then I have strange dreams.” “I have strange dreams if there is thunder while I sleep.” “I did not have strange dreams.”
(b) “I am dreaming or hallucinating.” “I am not dreaming.” “If I am hallucinating, I see elephants running down the road.”
(c) “If I work, it is either sunny or partly sunny.” “I worked last Monday or I worked last Friday.” “It was not sunny on Tuesday.” “It was not partly sunny on Friday.”
(b) “I am dreaming or hallucinating.” “I am not dreaming.” “If I am hallucinating, I see elephants running down the road.”
The relevant conclusion is: “I see elephants running down the road.”
Let the primitive statements be:
d, ‘I am dreaming’
h, ‘I am hallucinating’
e, ‘I see elephants running down the road’
Then the premises are translated as: d ∨ h, ¬d, and h → e.
And the conclusion: e.
Steps | Reason
1 | d ∨ h | premise
2 | ¬d | premise
3 | h | rule of disjunctive syllogism to Steps 1 and 2
4 | h → e | premise
5 | e | Modus Ponens to Steps 4 and 3
(c) “If I work, it is either sunny or partly sunny.” “I worked last Monday or I worked last Friday.” “It was not sunny on Tuesday.” “It was not partly sunny on Friday.”
There is no single relevant conclusion in this problem; its main difficulty is to represent the premises so that one is able to infer anything at all. One possible relevant conclusion is: “It was sunny or partly sunny last Monday, or it was sunny last Friday.”
Let the primitive statements be:
wm, ‘I worked last Monday’
wf , ‘I worked last Friday’
sm, ‘It was sunny last Monday’
st, ‘It was sunny last Tuesday’
sf , ‘It was sunny last Friday’
pm, ‘It was partly sunny last Monday’
pf , ‘It was partly sunny last Friday’
Then the premises are translated as: wm ∨ wf , wm → (sm ∨ pm), wf → (sf ∨ pf ), ¬st, and ¬pf .
And the conclusion: sf ∨ sm ∨ pm.
Steps | Reason | |
1 | wf → (sf ∨ pf ) | premise |
2 | ¬wf ∨ sf ∨ pf | expression for implication |
3 | ¬pf → (¬wf ∨ sf ) | expression for implication |
4 | ¬pf | premise |
5 | ¬wf ∨ sf | modus ponens to Steps 3 and 4 |
6 | wf → sf | expression for implication |
7 | wm ∨ wf | premise |
8 | ¬wm → wf | expression for implication |
9 | ¬wm → sf | rule of syllogism to Steps 8 and 6 |
10 | wm ∨ sf | expression for implication |
11 | ¬sf → wm | expression for implication |
12 | wm → (sm ∨ pm) | premise |
13 | ¬sf → (sm ∨ pm) | rule of syllogism to Steps 11 and 12 |
14 | sf ∨ sm ∨ pm | expression for implication. |
Steps | Reason | |
1 | wf → (sf ∨ pf ) | premise |
2 | ¬wf ∨ sf ∨ pf | expression for implication |
3 | ¬pf | premise |
4 | ¬wf ∨ sf | rule of resolution to Steps 2 and 3 |
5 | wm ∨ wf | premise |
6 | wm ∨ sf | rule of resolution to Steps 4 and 5 |
7 | wm → (sm ∨ pm) | premise |
8 | ¬wm ∨ sm ∨ pm | expression for implication |
9 | sf ∨ sm ∨ pm | rule of resolution to Steps 6 and 8 |
EXTENSIONS
Propositional logic is not adequate for formalizing valid arguments that rely on the internal structure of the propositions involved
In propositional logic the smallest atoms represent whole propositions (propositions are atomic)
Propositional logic does not capture the internal structure of the propositions
It is not possible to work with units smaller than a proposition
Example:
“A Mercedes Benz is a Car” and “A car drives” are two individual, unrelated propositions
We cannot conclude “A Mercedes Benz drives”
It is possible to represent everything you want in propositional logic
But often this is not very efficient
Basic idea: a proposition is expressed as a predicate about (one or more) objects in the world
Propositions are predicates with arguments,
e.g. Car(MercedesBenz)
The most immediate way to develop a more complex logical calculus is to introduce rules that are sensitive to more fine-grained details of the sentences being used
When the atomic sentences of propositional logic are broken up into terms, variables, predicates, and quantifiers, they yield first-order logic , which keeps all the rules of propositional logic and adds some new ones
SUMMARY
Propositional logic is one of the simplest and most common logics and is the core of (almost) all other logics
Propositional logic commits only to the existence of facts that may or may not be the case in the world being represented
Propositional logic quickly becomes impractical, even for very small worlds
This lecture focused on three core aspects of the propositional logic:
Syntax: Vocabulary for expressing concepts without ambiguity
Semantics: Connection to what we're reasoning about
Interpretation - what the syntax means
Reasoning: How to prove things
What steps are allowed
REFERENCES
Mandatory Reading:
First-Order Logic and Automated Theorem Proving (2nd edition) by Melvin Fitting
Further Reading:
Mathematical Logic for Computer Science (2nd edition) by Mordechai Ben-Ari
http://www.springer.com/computer/foundations/book/978-1-85233-319-5
Propositional Logic at The Internet Encyclopedia of Philosophy
http://www.iep.utm.edu/p/prop-log.htm
Wikipedia links:
http://en.wikipedia.org/wiki/Propositional_calculus
Questions?
Predicate Logic
Motivation
Technical Solution
Syntax
Semantics
Inference
Illustration by Larger Example
Extensions
Summary
References
MOTIVATION
Suppose we want to capture the knowledge that
Anyone standing in the rain will get wet.
and then use this knowledge. For example, suppose we also learn that
Jan is standing in the rain.
We'd like to conclude that Jan will get wet. But each of these sentences would just be represented by some proposition, say P, Q and R. What relationship is there between these propositions? We can say
P ∧ Q → R
Then, given P ∧ Q, we could indeed conclude R. But now, suppose we were told
Pat is standing in the rain.
We'd like to be able to conclude that Pat will get wet, but nothing we have stated so far will help us do this
The problem is that we aren't able to represent any of the details of these propositions
It's the internal structure of these propositions that make the reasoning valid.
But in propositional calculus we don't have anything else to talk about besides propositions!
⇒ A more expressive logic is needed
⇒ Predicate logic (occasionally referred to as First-order logic (FOL) )
SYNTAX
Constants
Names of specific objects
E.g. doreen, gord, william, 32
Functions
Map objects to objects
E.g. father(doreen), age(gord), max(23,44)
Variables
For statements about unidentified objects or general statements
E.g. x, y, z, …
Terms represent objects
The set of terms is inductively defined by the following rules:
Constants: Any constant is a term
Variables: Any variable is a term
Functions: Any expression f(t1,…,tn) of n arguments (where each argument is a term and f is a function of arity n ) is a term
Terms without variables are called ground terms
Examples:
c
f(c)
g(x,x)
g(f(c), g(x,x))
Predicate symbols represent relations between zero or more objects
The number of objects defines a predicate's arity
Examples:
Likes(george, kate)
Likes(x,x)
Likes(joe, kate, susy)
Friends(father_of(david), father_of(andrew))
Signature: A signature is a collection of constants, function symbols and predicate symbols with specified arities
FOL formulas are joined together by logical operators to form more complex formulas (just like in propositional logic)
The basic logical operators are the same as in propositional logic as well:
Negation: ¬p („it is not the case that p“)
Conjunction: p ∧ q („p and q“)
Disjunction: p ∨ q („p or q“)
Implication: p → q („p implies q“ or “q if p“)
Equivalence: p ↔ q („p if and only if q“)
Two quantifiers: Universal (∀) and Existential (∃)
Allow us to express properties of collections of objects instead of enumerating objects by name
Apply to sentence containing variable
Universal ∀: true for all substitutions for the variable
“for all”: ∀
Existential ∃: true for at least one substitution for the variable
“there exists”: ∃
Examples:
∃ x: Mother(art) = x
∀ x ∀ y: Mother(x) = Mother(y) → Sibling(x,y)
∃ y ∃ x: Mother(y) = x
The set of formulas is inductively defined by the following rules:
Predicate symbols : If P is an n-ary predicate symbol and t1,…,tn are terms then P(t1,…,tn) is a formula.
Negation : If φ is a formula, then ¬φ is a formula
Binary connectives : If φ and ψ are formulas, then (φ → ψ) is a formula. Same for other binary logical connectives.
Quantifiers : If φ is a formula and x is a variable, then ∀xφ and ∃xφ are formulas.
Atomic formulas are formulas obtained only using the first rule
Example: If f is a unary function symbol, P a unary predicate symbol, and Q a ternary predicate symbol, then the following is a formula:
∀x∀y(P(f(x)) → ¬(P(x) → Q(f(y), x, x)))
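To make the inductive definition concrete, here is a minimal Python sketch (the class names are our own illustrative choices, not from any standard library) that encodes terms and formulas as nested data structures and builds the example formula:

```python
from dataclasses import dataclass
from typing import Tuple

# --- Terms: constants, variables, function applications ---
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Func:
    name: str
    args: Tuple  # tuple of terms; arity = len(args)

# --- Formulas: atoms, negation, implication, universal quantification ---
@dataclass(frozen=True)
class Atom:
    pred: str
    args: Tuple  # tuple of terms

@dataclass(frozen=True)
class Not:
    phi: object

@dataclass(frozen=True)
class Implies:
    phi: object
    psi: object

@dataclass(frozen=True)
class ForAll:
    var: Var
    phi: object

# The example formula ∀x∀y (P(f(x)) → ¬(P(x) → Q(f(y), x, x)))
x, y = Var("x"), Var("y")
phi = ForAll(x, ForAll(y,
        Implies(Atom("P", (Func("f", (x,)),)),
                Not(Implies(Atom("P", (x,)),
                            Atom("Q", (Func("f", (y,)), x, x)))))))
```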
Any occurrence of a variable in a formula not in the scope of a quantifier is said to be a free occurrence
Otherwise it is called a bound occurrence
Thus, if x is a free variable in φ it is bound in ∀xφ and ∃xφ
A formula with no free variables is called a closed formula
Example: x and y are bound variables, z is a free variable
∀x∀y(P(f(x)) → ¬(P(x) → Q(f(y), x, z)))
S := <sentence>
<sentence> := <atomicsentence>
            | <sentence> <connective> <sentence>
            | <quantifier> <variable>,... <sentence>
            | ¬ <sentence>
            | ( <sentence> )
<atomicsentence> := <predicate> ( <term>, ... )
<term> := <function> ( <term>, ... ) | <constant> | <variable>
<connective> := ∧ | ∨ | → | ↔
<quantifier> := ∃ | ∀
<constant> := “c" | “x1" | “john" | ...
<variable> := "a" | "x" | "s" | ...
<predicate> := “before" | “hasColor" | “raining" | ...
<function> := “mother" | “leftLegOf" | ...
Semantics
Interpretations
Models and Satisfiability
Logical Consequence (Entailment)
Interpretation – Maps symbols of the formal language (predicates, functions, variables, constants) onto objects, relations, and functions of the “world” (formally: Domain, relational Structure, or Universe)
Valuation – Assigns domain objects to variables
The Valuation function is used to describe value assignments and constraints in the case of nested quantifiers.
Otherwise, the Valuation function affects the satisfaction of a formula only in the case of open formulae (formulae with free variables).
Constructive Semantics – Determines the semantics of complex expressions inductively, starting with basic expressions
Domain, relational structure, universe:
D: a finite set of objects d1, d2, ..., dn
R, ...: relations over D, R ⊆ D^n
F, ...: functions over D, F: D^n → D
Basic interpretation mapping:
constant: I[c] = d (object)
function: I[f] = F (function)
predicate: I[P] = R (relation)
Valuation V:
variable: V(x) = d ∈ D
Next, determine the semantics for complex terms and formulae constructively, based on the basic interpretation mapping and the valuation function above.
Terms with variables
I[f(t1,...,tn)] = I[f](I[t1],...,I[tn]) = F(I[t1],...,I[tn]) ∈ D
where I[ti] = V(ti) if ti is a variable
Atomic Formula
I[P(t1,...,tn)] true if (I[t1],...,I[tn]) ∈ I[P] = R
Negated Formula
I[¬α] true if I[α] is not true
Complex Formula
I[α∨β] true if I[α] or I[β] true
I[α∧β] true if I[α] and I[β] true
I[α→β] true if I[α] not true or I[β] true
Quantified Formula (relative to Valuation function)
I[∃x:α] true if α is true with V′(x)=d for some d∈D, where V′ is otherwise identical to the prior V.
I[∀x:α] true if α is true with V′(x)=d for all d∈D, where V′ is otherwise identical to the prior V.
Note: ∀x∃y: α is different from ∃y∀x: α
In the first case ∀x∃y:α , we go through all value assignments V'(x), and for each value assignment V'(x) of x, we have to find a suitable value V'(y) for y.
In the second case ∃y∀x:α, we have to find one value V'(y) for y, such that all value assignments V'(x) make α true.
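The difference between the two quantifier orders can be checked mechanically over a finite domain. The following Python sketch (the domain D and relation R are made up for illustration) evaluates both readings by explicit enumeration, mirroring the valuation-update semantics above:

```python
# Finite domain and an example binary relation R ⊆ D × D
D = {1, 2, 3}
R = {(1, 2), (2, 3), (3, 1)}

def alpha(x, y):
    # α(x, y) holds iff (x, y) is in the relation R
    return (x, y) in R

# ∀x ∃y: α  -- for every assignment V'(x) we must find some suitable V'(y)
forall_exists = all(any(alpha(x, y) for y in D) for x in D)

# ∃y ∀x: α  -- one single V'(y) must work for all assignments V'(x)
exists_forall = any(all(alpha(x, y) for x in D) for y in D)

print(forall_exists)  # True: every x is related to some y
print(exists_forall)  # False: no single y is related to by all x
```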
Given is an interpretation I into a domain D with a valuation V, and a formula φ .
We say that:
φ is satisfied in this interpretation or
this interpretation is a model of φ iff
I[φ] is true.
That means the interpretation function I into the domain D (with valuation V) makes the formula φ true.
Given a set of formulae Φ and a formula α .
α is a logical consequence of Φ iff
α is true in every model in which Φ is true.
Notation:
Φ ⊧ α
That means that for every model (interpretation into a domain) in which Φ is true, α must also be true.
Inference
Entailment: KB ⊧ Q
Entailment is a relation that is concerned with the semantics of statements
Q is entailed by KB (a set of premises or assumptions) if and only if there is no logically possible world in which Q is false while all the premises in KB are true
Stated positively: Q is entailed by KB if and only if the conclusion is true in every possible world in which all the premises in KB are true
Provability: KB ⊢ Q
Provability is a syntactic relation
We can derive Q from KB if there is a proof consisting of a sequence of valid inference steps starting from the premises in KB and resulting in Q
Soundness: If KB ⊢ Q then KB ⊧ Q
If Q is derived from a set of sentences KB using a set of inference rules, then Q is entailed by KB
Hence, inference produces only real entailments, or any sentence that follows deductively from the premises is valid
Only sentences that logically follow will be derived
Completeness: If KB ⊧ Q then KB ⊢ Q
If Q is entailed by a set of sentences KB, then Q can also be derived from KB using the rules of inference
Hence, inference produces all entailments, or all valid sentences can be proved from the premises
All the sentences that logically follow can be derived
Inference rules for propositional logic apply to predicate logic as well
Modus Ponens, Modus tollens etc.
New (sound) inference rules for use with quantifiers:
Modus Ponens, Modus Tollens
Universal elimination
Existential elimination
Existential introduction
Generalized Modus Ponens (GMP)
Modus Ponens (Law of Detachment )
Based on claiming 1.) that P is true, and 2.) the implication P → Q , we can conclude that Q is true.
If P, then Q. P. Therefore, Q.
Example:
RegularlyAttends(joe, lecture),
RegularlyAttends(joe, lecture) → Pass(joe, lecture)
Allow us to conclude:
Pass(joe, lecture)
Modus Tollens (Denying the consequent)
We again make two claims. 1.) The implication P → Q, and 2.) that Q is false. We can conclude that P is false as well.
If P, then Q. ¬Q. Therefore, ¬P.
Example:
Detects(alarm, intruder) → GoesOff(alarm),
¬GoesOff(alarm)
Allows us to conclude:
¬Detects(alarm, intruder)
Universal elimination
If ∀x P(x) is true, then P(c) is true, where c is any constant in the domain of x
Variable symbol can be replaced by any ground term
Example:
∀x RegularlyAttends(x, lecture) → Pass(x, lecture)
Allow us to conclude (assuming Joe is in the domain of x):
RegularlyAttends(joe, lecture) → Pass(joe, lecture)
Existential elimination
From ∃x P(x) infer P(c)
Note that the variable is replaced by a brand-new constant not occurring in this or any other sentence in the KB
Also known as skolemization; constant is a skolem constant
In other words, we don’t want to accidentally draw other inferences by introducing the constant
Convenient to use this to reason about the unknown object, rather than constantly manipulating the existential quantifier
Example:
∃x Pass(x)
Allows us to conclude:
Pass(someStudent) , where someStudent is a new constant
Existential introduction
If P(c) is true, then ∃x P(x) is inferred.
The inverse of existential elimination
All instances of the given constant symbol are replaced by the new variable symbol
Note that the variable symbol cannot already exist anywhere in the expression
Example:
Eats(Ziggy, IceCream)
∃x Eats(Ziggy,x)
Generalized Modus Ponens (GMP)
Apply modus ponens reasoning to generalized rules
Combines Universal-Elimination, and Modus Ponens
E.g. from P(c) and Q(c) and ∀x ((P(x) ∧ Q(x)) → R(x)) derive R(c)
GMP requires substitutions for variable symbols
subst(θ, α) denotes the result of applying a set of substitutions defined by θ to the sentence α
A substitution list θ = {v1/t1, v2/t2, ..., vn/tn} means to replace all occurrences of variable symbol vi by term ti
Substitutions are made in left-to-right order in the list
subst({x/IceCream, y/Ziggy}, eats(y,x)) = eats(Ziggy, IceCream)
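A minimal Python sketch of such a substitution function (the encoding of compound terms as tuples is our own choice for illustration):

```python
def subst(theta, term):
    """Apply a substitution list theta (dict: variable -> term) to a term.
    Compound terms are encoded as tuples like ("eats", "y", "x")."""
    if isinstance(term, tuple):  # compound term: apply recursively to arguments
        return (term[0],) + tuple(subst(theta, t) for t in term[1:])
    return theta.get(term, term)  # replace a variable, keep constants as-is

# subst({x/IceCream, y/Ziggy}, eats(y, x)) = eats(Ziggy, IceCream)
print(subst({"x": "IceCream", "y": "Ziggy"}, ("eats", "y", "x")))
# -> ('eats', 'Ziggy', 'IceCream')
```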
Completeness : If a knowledge-base KB entails a statement S , we can prove S
Gödel Completeness Theorem : There exists a complete proof system for FOL
KB ⊨ Q ↔ KB ⊢ Q
Gödel proved that there exists a complete proof system for FOL.
Gödel did not come up with such a concrete proof system.
Robinson’s Completeness Theorem : Resolution is such a concrete complete proof system for FOL
FOL is only semi-decidable : If a conclusion follows from premises, then a complete proof system (like resolution) will find a proof.
If there’s a proof, we’ll halt with it (eventually)
However, if there is no proof (i.e. a statement does not follow from a set of premises), the attempt to prove it may never halt
From a practical point of view this is problematic
We cannot distinguish between the non-existence of a proof and the failure of an implementation to find a proof in reasonable time
Theoretical completeness of an inference procedure does not make a difference in such cases
Does a proof simply take too long or will the computation never halt anyway?
ILLUSTRATION BY LARGER EXAMPLE
Consider the following english text:
“Anyone passing his Intelligent System exam and winning the lottery is happy. But any student who studies for an exam or is lucky can pass all his exams. John did not study but John is lucky. Anyone who is lucky wins the lottery. Mary did not win the lottery, however Mary passed her IS exam. Gary won the lottery. Gary, John, and Mary are all students.”
Identify the task
Knowledge to be captured, questions to be answered
Assemble the relevant knowledge
Relevant: relations between exams, lottery jackpots, passing of an exam, information about individuals, etc.
Irrelevant: dates of exams, teacher of the exam, amount of money in the lottery pot, etc.
Decide on a vocabulary
Predicates
Constants symbols
E.g. Win(x, Lottery) or Win(x, y) ∧ Lottery(y)?
Encode general knowledge of the domain
Anyone passing the IS exams and winning the lottery is happy
∀x Pass(x, IS) ∧ Win(x, Lottery) ⇒ Happy(x)
Any student who studies or is lucky can pass all his exams
∀x ∀y Student(x) ∧ Exam(y) ∧ (StudiesFor(x, y) ∨ Lucky(x)) ⇒ Pass(x,y)
Anyone who is lucky wins the lottery
∀x Lucky(x) ⇒ Win(x, Lottery)
Encode the specific problem instance
John did not study, but John is lucky
¬Study(John) ∧ Lucky(John)
Mary did not win the lottery, however Mary passed her IS exam
¬Win(Mary, Lottery), Pass(Mary, IS)
Gary won the lottery
Win(Gary, Lottery)
Gary, John, and Mary are all students
Student(Gary), Student(John), Student(Mary)
Pose queries to the inference procedure
Happy(John) ?
Happy(Mary), Lucky(Mary)?
∃x Student(x) ∧ Pass(x, IS) ?
Specific inference procedures (e.g. Resolution ) can be used to systematically answer those queries (see next lecture)
EXTENSIONS
Ordinary first-order interpretations have a single domain of discourse over which all quantifiers range; many-sorted first-order logic allows variables to have different sorts , which have different domains
The characteristic feature of first-order logic is that individuals can be quantified, but not predicates; second-order logic extends first-order logic by adding the latter type of quantification
Intuitionistic first-order logic uses intuitionistic rather than classical propositional calculus; for example, ¬¬φ need not be equivalent to φ
Infinitary logic allows infinitely long sentences; for example, one may allow a conjunction or disjunction of infinitely many formulas, or quantification over infinitely many variables
First-order modal logic has extra modal operators with meanings which can be characterised informally as, for example, “it is necessary that φ” and “it is possible that φ”
SUMMARY
Predicate logic differs from propositional logic by its use of quantifiers
Each interpretation of predicate logic includes a domain of discourse over which the quantifiers range.
There are many deductive systems for predicate logic that are sound (only deriving correct results) and complete (able to derive any logically valid implication)
The logical consequence relation in predicate logic is only semidecidable
This lecture focused on three core aspects of the predicate logic: Syntax, Semantics, and Inference
REFERENCES
Mandatory Reading:
M. Fitting: First-Order Logic and Automated Theorem Proving, 1996 Springer-Verlag New York (Chapter 5)
Further Reading:
Michael Huth and Mark Ryan, Logic in Computer Science (second edition), 2004, Cambridge University Press
http://plato.stanford.edu/entries/model-theory/
Wikipedia links:
http://en.wikipedia.org/wiki/First-order_logic
Questions?
REASONING
Motivation
Technical Solution
Introduction to Theorem Proving and Resolution
Description Logics
Logic Programming
Summary
References
MOTIVATION
Computers can do reasoning
Theorem Proving and Resolution, Description Logics, and Logic Programming
Introduction to Theorem Proving and Resolution
P1 … Pn
____________
C

where C is the conclusion sequent, and the Pi's are premise sequents.
Resolution is a refutation system .
To prove a statement we attempt to refute its negation
Basic idea: Given a consistent set of axioms KB and goal sentence Q , we want to show that KB ⊧ Q
This means that every interpretation I that satisfies KB also satisfies Q
Any interpretation I only satisfies either Q or ¬ Q, but not both
Therefore, if in fact KB ⊧ Q holds , an interpretation that satisfies KB , satisfies Q and does not satisfy ¬ Q
The resolution rule is a single inference that produces a new clause implied by two clauses (called the parent clauses) containing complementary literals
The resulting clause contains all the literals that do not have complements. Formally: from C1 ∨ l and C2 ∨ ¬l, infer the resolvent C1 ∨ C2
When the two initial clauses contain more than one pair of complementary literals, the resolution rule can be applied (independently) for each such pair
However, only the pair of literals that are resolved upon can be removed: All other pairs of literals remain in the resolvent clause
Modus ponens (see last lecture) can be seen as a special case of resolution of a one-literal clause and a two-literal clause
Example: Given clauses
P(x, f(a)) ∨ P(x, f(y)) ∨ Q(y) and ¬P(z, f(a)) ∨ ¬Q(z)
P(x, f(a)) and ¬P(z, f(a)) unify with substitution σ = {x/z}
Therefore, we derive the resolvent clause (to which σ is applied):
P(z, f(y)) ∨ Q(y) ∨ ¬Q(z)
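Finding the substitution σ is the job of unification. Below is a minimal Python sketch of a unification routine (the '?' prefix for variables and the tuple encoding of compound terms are our own conventions; the occurs-check is omitted for brevity):

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, theta):
    # Follow variable bindings already recorded in theta
    while is_var(t) and t in theta:
        t = theta[t]
    return t

def unify(s, t, theta):
    """Return a substitution unifying s and t (extending theta), or None."""
    s, t = walk(s, theta), walk(t, theta)
    if s == t:
        return theta
    if is_var(s):
        return {**theta, s: t}
    if is_var(t):
        return {**theta, t: s}
    if isinstance(s, tuple) and isinstance(t, tuple) and len(s) == len(t):
        for a, b in zip(s, t):          # unify argument by argument
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

# P(x, f(a)) and P(z, f(a)) unify with σ = {x/z}
print(unify(("P", "?x", ("f", "a")), ("P", "?z", ("f", "a")), {}))
# -> {'?x': '?z'}
```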
Convert all the propositions of a knowledge base KB to clause normal form (CNF)
Negate Q (the to be proven statement) and convert the result to clause form. Add it to the set of clauses obtained in step 1.
Repeat until either a contradiction is found, no progress can be made, or a predetermined amount of effort has been expended:
a) Select two clauses. Call these the parent clauses.
b) Resolve them together using the resolution rule . The resolvent will be the disjunction of all the literals of both parent clauses with appropriate substitutions.
c) If the resolvent is the empty clause , then a contradiction has been found. If it is not, then add it to the set of clauses available to the procedure.
procedure resolution-refutation(KB, Q)
;; KB is a set of consistent, true sentences
;; Q is a goal sentence that we want to derive
;; returns "success" if KB ⊢ Q, and "failure" otherwise
KB = union(KB, ¬Q)
while false not in KB do
  pick 2 sentences, S1 and S2, in KB that contain literals that unify
  if none return "failure"
  resolvent = resolution-rule(S1, S2)
  KB = union(KB, resolvent)
return "success"
Step 3: Standardize apart by renaming all variables so that variables bound by different quantifiers have unique names
(∀X) a(X) ∨ (∀X) b(X) becomes (∀X) a(X) ∨ (∀Y) b(Y)
Step 4: Move all quantifiers to the left to obtain a prenex normal form
A formula is in prenex normal form if it is written as a string of quantifiers followed by a quantifier-free part
Step 5: Eliminate existential quantifiers by using skolemization
| Predicate form | Clause form |
|---|---|
| 1. ∀(X) (dog(X) → animal(X)) | ¬dog(X) ∨ animal(X) |
| 2. dog(fido) | dog(fido) |
| 3. ∀(Y) (animal(Y) → die(Y)) | ¬animal(Y) ∨ die(Y) |
Also, resolution cannot be used to prove that a given sentence is not entailed by a set of axioms
A search method is required
Worst case: Resolution is exponential in the number of clauses to resolve
Order of clause combination is important
N clauses → N² ways of combinations or checking to see whether they can be combined
Search heuristics are very important in resolution proof procedures
Strategies
Breadth-First Strategy
Set of Support Strategy
Unit Preference Strategy
Linear Input Form Strategy
Q is entailed by KB (a set of premises or assumptions) if and only if there is no logically possible world in which Q is false while all the premises in KB are true
Does a proof simply take too long or will the computation never halt anyway?
Due to its complexity and remaining limitations FOL is often not suitable for practical applications
Logic Programming
DESCRIPTION LOGICS
Concepts/classes (unary predicates/formulae with one free variable)
E.g. Person, Female
Roles (binary predicates/formulae with two free variables)
E.g. hasChild
Individuals (constants)
E.g. Mary, John
Constructors allow to form more complex concepts/roles
Union ⊔: Man ⊔ Woman
Intersection ⊓ : Doctor ⊓ Mother
Existential restriction ∃: ∃hasChild.Doctor (some child is a doctor)
Value(universal) restriction ∀: ∀hasChild.Doctor (all children are doctors)
Complement/negation ¬: Man ⊑ ¬Mother
Number restriction ≥n, ≤n
Axioms
Subsumption ⊑ : Mother ⊑ Parent
Classes/concepts are actually a set of individuals
We can distinguish different types of concepts:
Atomic concepts: Cannot be further decomposed (i.e. Person)
Incomplete concepts (defined by ⊑)
Complete concepts (defined by ≡)
Example incomplete concept definition:
Man ⊑ Person ⊓ Male
Intended meaning: If an individual is a man, we can conclude that it is a person and male.
Man(x) ⇒ Person(x) ∧ Male(x)
Example complete concept definition:
Man ≡ Person ⊓ Male
Intended meaning: Every individual which is a male person is a man, and every man is a male person.
Man(x) ⇔ Person(x) ∧ Male(x)
E.g. directedBy(Pool Sharks, Edwin Middleton), hasChild(Jonny, Sue)
Roles have a domain and a range
Given the above definitions we can conclude that Pool Sharks is a movie and that Edwin Middleton is (was) a person.
A special case of (unqualified) number restriction ≤1 R
Transitivity can be captured in DLs by role hierarchies and transitive roles:
hasChild ≡ hasParent⁻ (inverse roles)
Bear(Winni Puh)
From a theoretical point of view this division is arbitrary but it is a useful simplification
Restricted use of quantifiers: ∃, ∀
Provides (implicitly or explicitly) conjunction, union and negation of class descriptions
Person ⊓ ∀hasChild.(Doctor ⊔ ∃hasChild.Doctor)
Mary is a Woman.
E.g. Marry loves Peter.
E.g.: A Dog is an animal. A man is a male Person.
Semantics follow standard FOL model theory
Description Logics are a fragment of FOL
The vocabulary is the set of names (concepts and roles) used
E.g. Mother, Father, Person, knows, isRelatedTo, hasChild, …
An interpretation I is a tuple (∆I, ⋅I)
∆I is the domain (a set)
⋅I is a mapping that maps:
Names of objects (individuals) to elements of the domain
Names of unary predicates (classes/concepts) to subsets of the domain
Names of binary predicates (properties/roles) to subsets of ∆I x ∆I
The semantics of DL are based on standard First Order Model theory
A translation is usually very straightforward, according to the following correspondences (for ALC):
A description is translated to a first-order formula with one free variable
An individual assertion is translated to a ground atomic formula
An axiom is translated to an implication, closed under universal quantification
More complex DLs can be handled in a similar way
Main reasoning tasks for DL systems:
Satisfiability : Check if the assertions in a KB have a model
Instance checking : Check if an instance belongs to a certain concept
Concept satisfiability : Check if the definition of a concept can be satisfied
Subsumption : Check if a concept A subsumes a concept B (if every individual of a concept B is also of concept A)
Equivalence : A ≡ B ⇔ B ⊑ A and A ⊑ B
Retrieval : Retrieve a set of instances that belong to a certain concept
Reasoning tasks are typically reduced to KB satisfiability sat(A) w.r.t. a knowledge base A:
Instance checking : instance(a, C, A) ⇔ ¬sat(A ∪ {a: ¬C})
Concept satisfiability : csat(C) ⇔ sat(A ∪ {a: C})
Concept subsumption : C ⊑ D ⇔ ¬sat(A ∪ {a: C ⊓ ¬D})
Retrieval : Instance checking for each instance in the ABox
Note: Reduction of reasoning tasks to one another in polynomial time only in propositionally closed logics
DL reasoners typically employ tableaux algorithms to check satisfiability of a knowledge base
Pellet supports expected standard DL reasoning tasks
Consistency, classification, realization
Conjunctive query answering
Concept satisfiability
Additionally Pellet supports:
SPARQL-DL queries
Datatype reasoning
User-defined datatypes
N-ary datatype predicates
Rules support (DL-safe SWRL rules)
Explanation and debugging features
Axiom pinpointing service
Explanation & Debugging support
Motivation: It is hard to understand large and/or complex ontologies
Examples:
Why is a specific subclass relation inferred?
Why is an ontology inconsistent?
Pellet provides axiom pinpointing service:
For any inference, returns the (minimal set of) source axioms that cause the inference
Applications can track additional provenance information
Axiom pinpointing is the first step to explanations
Precise justifications (pinpointing parts of axioms) are ongoing work in Pellet's development
LOGIC PROGRAMMING
Logic Programming is based on a subset of First Order Logic called Horn Logic
Horn Logic can serve as a simple KR formalism and allows to express
IF <condition> THEN <result> rules
Under certain restrictions, reasoning over knowledge bases based on such rules is decidable (in contrast to general ATP within First Order Logic)
A Herbrand Interpretation I is a subset of the Herbrand Base B for a program
The domain of a Herbrand interpretation is the Herbrand Universe U
Constants are assigned to themselves
Every function symbol is interpreted as the function that applies it
If f is an n-ary function symbol (n>0) then the mapping from Un to U defined by (t1, …, tn) → f(t1, …, tn) is assigned to f
Logic Programs can also contain recursion
ancestor(x,z) :- ancestor(x,y), ancestor(y,z).
This is a problem as soon as negation is allowed since the minimal model is not uniquely defined anymore
If the dependency graph contains a cycle then the program is recursive
Example: What is the meaning of win(x) :- not win(x) ?
→ This guarantees a unique interpretation of a Logic Program using negation
Prolog is Turing complete
Prolog programs are not guaranteed to terminate
Datalog is such a subset
Originally a rule and query language for deductive databases
Extensional Database (EDB) consists of facts
Intensional Database (IDB) consists of non-ground rules
Restrictions:
Datalog disallows function symbols
Imposes stratification restrictions on the use of recursion + negation
Allows only range restricted variables ( safe variables )
This limits evaluation of variables to finitely many possible bindings
Non-ground query, e.g. ?- loves(Mary, x)
Replace x by every possible value
A ⊧ loves(Mary, Joe)
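A naive bottom-up evaluation of negation-free Datalog fits in a few lines of Python (the fact and rule encoding is our own; '?'-prefixed strings are variables). It computes the minimal Herbrand model by iterating the rules to a fixpoint and then answers a non-ground query by enumeration:

```python
def match(atom, fact, theta):
    """Extend theta so that atom matches the ground fact, or return None."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    for a, f in zip(atom[1:], fact[1:]):
        if a.startswith("?"):
            if theta.get(a, f) != f:   # conflicting binding
                return None
            theta = {**theta, a: f}
        elif a != f:
            return None
    return theta

def naive_datalog(facts, rules):
    """Fixpoint iteration: apply all rules until nothing new is derived."""
    derived = set(facts)
    while True:
        new = set()
        for head, body in rules:
            substs = [{}]
            for atom in body:  # join the body atoms against derived facts
                substs = [t2 for t in substs for fact in derived
                          if (t2 := match(atom, fact, t)) is not None]
            for t in substs:
                new.add(tuple(t.get(x, x) for x in head))
        if new <= derived:
            return derived     # least fixpoint = minimal Herbrand model
        derived |= new

facts = {("parent", "Ann", "Bob"), ("parent", "Bob", "Carl")}
rules = [(("ancestor", "?x", "?y"), [("parent", "?x", "?y")]),
         (("ancestor", "?x", "?z"), [("parent", "?x", "?y"),
                                     ("ancestor", "?y", "?z")])]
model = naive_datalog(facts, rules)
# Non-ground query ?- ancestor(Ann, x): enumerate matching facts
print(sorted(f for f in model if f[0] == "ancestor" and f[1] == "Ann"))
```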
Homepage: http://www.iris-reasoner.org/
Various built-in predicates (Equality, inequality, assignment, unification, comparison, type checking, arithmetic, regular expressions,... )
Top-down evaluation by SLDNF resolution (backward-chaining)
IRIS performs computations using semi-naive evaluation in combination with stratification and complex datatypes
SUMMARY
Resolution is a sound and complete inference procedure for First-Order Logic
Due to its complexity and remaining limitations FOL is often not suitable for practical applications
Often formalisms with restricted expressivity and better complexity results are more appropriate:
Description Logics
Logic Programming
Description Logics model a domain in terms of concepts, roles, and individuals
Many different Description Logics exist, depending on choice of constructs
Usually reasoning tasks in DLs can all be reduced to satisfiability checking
The semantics of a Logic Program are however based on its minimal Herbrand Model
Negation-as-failure introduced non-monotonic behavior and pushes LP beyond the expressivity of First Order Logic
A typical inference task for LP engines is conjunctive query answering
REFERENCES
Questions?
Search Methods
Problem Solving as Search
By applying the action “move disk to peg” to the initial state ((123)()()) a new state ((23)()(1)) is reached
Search Methods
* A partial tree search space representation
* A complete directed graph representation
[http://en.wikipedia.org/wiki/Tower_of_Hanoi]
Brute force approach to explore search space
List open, closed, successors = {};
Node root_node, current_node;
insert-first(root_node, open)
while not-empty(open);
N.B.: for simplicity, this version does not save the path
Result is: S->A->B->F
1 + b + b² + … + b^m = O(b^m)
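Written out concretely, breadth-first search looks as follows in Python. Unlike the slide version, this sketch does save the path; the example graph is hypothetical, not the one from the figure:

```python
from collections import deque

def breadth_first_search(graph, start, goal):
    """BFS with a FIFO open list (queue of paths) and a closed set."""
    open_list = deque([[start]])
    closed = set()
    while open_list:
        path = open_list.popleft()        # FIFO: oldest path first
        node = path[-1]
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for succ in graph.get(node, []):  # expand all successors
            open_list.append(path + [succ])
    return None

graph = {"S": ["A", "B"], "A": ["C", "F"], "B": ["D"], "C": [], "D": ["F"]}
print(breadth_first_search(graph, "S", "F"))  # -> ['S', 'A', 'F']
```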
List open, closed, successors = {};
Node root_node, current_node;
insert-last(root_node, open)
while not-empty(open);
N.B.: for simplicity, this version does not save the path
Result is: S->A->F
Depth-limited search (DLS) : Impose a cut-off (e.g. n for searching a path of length n -1), expand nodes with max. depth first until cut-off depth is reached (LIFO strategy, since it is a variation of depth-first search).
Bidirectional search (BIDI) : forward search from initial state & backward search from goal state, stop when the two searches meet. Average effort O(b^(d/2)) if testing whether the search fronts intersect has constant effort
In AI, the problem graph is typically not known. If the graph is known, to find all optimal paths in a graph with labelled arcs, standard graph algorithms can be used
Using knowledge on the search space to reduce search costs
Blind search methods take O(b^m) time in the worst case
May make blind search algorithms prohibitively slow where d is large
Use problem-specific knowledge to pick which states are better candidates
Also called heuristic search
In a heuristic search each state is assigned a “heuristic value” (h-value) that the search uses in selecting the “best” next step
A heuristic is an operationally-effective nugget of information on how to direct search in a problem space
Heuristics are only approximately correct
Best-First Search
A*
Hill Climbing
f(n)=g(n)+h(n)
g(n) the cost (so far) to reach the node n
h(n) estimated cost to get from the node to the goal
f(n) estimated total cost of path through n to goal
List open, closed, successors = {};
Node root_node, current_node;
insert-last(root_node, open)
while not-empty(open);
estimationOrderedSuccessorsOf returns the list of direct descendants of the current node in shortest cost order
N.B.: for simplicity, this version does not save the path
In this case we estimate the cost as the distance from the root node (in terms of nodes)
Derived from Best-First Search
Uses both g(n) and h(n)
A* is optimal
A* is complete
List open, closed, successors = {};
Node root_node, current_node, goal;
insert-back(root_node, open)
while not-empty(open);
totalEstOrderedSuccessorsOf returns the list of direct descendants of the current node in shortest total estimation order
N.B.: for simplicity, this version does not save the path
In this case we estimate the cost as the distance from the root node (in terms of nodes)
Result is: S->B->F!
N.B.= this version is not saving the path for simplicity
In this case we estimate the cost as the distance from the root node (in terms of nodes)
Result is: S->A->B->F!
Not optimal; moreover, if at step 1 h(S)=2 we would have terminated without finding a solution
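A minimal Python sketch of A* (the example graph with step costs and the admissible heuristic values are hypothetical):

```python
import heapq

def a_star(graph, h, start, goal):
    """Expand the node with the lowest f(n) = g(n) + h(n).
    graph: node -> list of (successor, step cost); h: node -> estimate."""
    open_list = [(h[start], 0, [start])]   # entries: (f, g, path)
    closed = set()
    while open_list:
        f, g, path = heapq.heappop(open_list)
        node = path[-1]
        if node == goal:
            return path, g
        if node in closed:
            continue
        closed.add(node)
        for succ, cost in graph.get(node, []):
            heapq.heappush(open_list,
                           (g + cost + h[succ], g + cost, path + [succ]))
    return None

graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("F", 5)],
         "B": [("F", 1)], "F": []}
h = {"S": 4, "A": 3, "B": 1, "F": 0}       # admissible: never overestimates
print(a_star(graph, h, "S", "F"))           # -> (['S', 'A', 'B', 'F'], 4)
```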
| Algorithm | Time | Space | Optimal | Complete | Derivative |
|---|---|---|---|---|---|
| Best First Search | O(b^m) | O(b^m) | No | Yes | BFS |
| Hill Climbing | O(∞) | O(b) | No | No | |
| A* | O(2^N) | O(b^d) | Yes | Yes | Best First Search |
ILLUSTRATION BY A LARGER EXAMPLE
N.B.: by building the tree, we are exploring the search space!
EXTENSIONS
SUMMARY
REFERENCES
Questions?
CommonKADS
MOTIVATION
knowledge management
Errors in a knowledge-base can cause serious problems
changes over time
TECHNICAL SOLUTION AND ILLUSTRATIONS
Overview of CommonKADS
CommonKADS: a comprehensive methodology for KBS development
Knowledge engineering is not some kind of `mining from the expert's head', but consists of constructing different aspect models of human knowledge
The knowledge-level principle: in knowledge modeling, first concentrate on the conceptual structure of knowledge, and leave the programming details for later
Knowledge has a stable internal structure that is analyzable by distinguishing specific knowledge types and roles.
Knowledge(-based) system: a system that solves a real-life problem using knowledge about the application domain and the application task
Expert system: a knowledge system that solves a problem which requires a considerable amount of expertise, when solved by humans
many-to-many relations between roles and people
Knowledge engineer/analyst: a specific kind of system analyst
should avoid becoming an "expert"
plays a liaison function between application domain and system
Knowledge system developer: person that implements a knowledge system on a particular target platform
needs to have general design/implementation expertise
but only on the “use”-level
Project manager: responsible for planning, scheduling and monitoring development work
liaises with client
typically medium-size projects (4-6 people)
profits from structured approach
Knowledge model components
gas dial (value: dial-value)
fuel tank (status: {full, almost-empty, empty})

CONCEPT gas-dial;
  ATTRIBUTES:
    value: dial-value;
END CONCEPT gas-dial;

VALUE-TYPE dial-value;
  VALUE-LIST: {zero, low, normal};
  TYPE: ORDINAL;
END VALUE-TYPE dial-value;

CONCEPT fuel-tank;
  ATTRIBUTES:
    status: {full, almost-empty, empty};
END CONCEPT fuel-tank;
Description of the input/output
For example, the output of a medical diagnosis task would not be a “disease” but an abstract name such as “fault category”
Template knowledge models
Knowledge models partially reused in new applications
Type of task = main guide for reuse
Catalog of task templates
prevent "re-inventing the wheel"
cost/time efficient
decreases complexity
quality-assurance
Knowledge model construction
| STAGES | TYPICAL ACTIVITIES |
|---|---|
| knowledge identification | |
| knowledge specification | |
| knowledge refinement | |
Talk to people in the organization who have to talk to experts but are not experts themselves
Avoid diving into detailed, complicated theories unless the usefulness is proven
Construct a few typical scenarios which you understand at a global level
Never spend too much time on this activity. Two person weeks should be maximum.
E.g. AAT for art objects; ontology libraries, reference models, product model libraries
empirical evidence
construct annotated inference structure (and domain schema)
if no template fits: question the knowledge-intensity of the task
| The user says | Formal representation | Explanation |
|---|---|---|
| “the car does not start” | DIAGNOSIS: Complaint: engine-behavior.status = does-not-start | A complaint is received, for which a diagnostic task is started |
| “a possible cause is that the fuel tank is empty” | COVER: hypothesis: fuel-tank.status = empty | One of the three possible causes is produced |
| “in that case we would expect the gas indicator to be low” | PREDICT: expected-finding: gas-dial.value = zero | The expected finding provides us with a way of getting supporting evidence for the hypothesis |
Knowledge elicitation techniques
Interview
Self report
Laddering
Concept sorting
Repertory grids
EXAMPLE
Organization Model
| Problems and Opportunities Worksheet (Organization Model 1) | |
|---|---|
| Problems and opportunities | Assessment takes too much time; not sufficient time for urgent cases |
| Organizational context | Mission: transparency of procedure, clear applicant responsibility. External actors: local council, public opinion, national regulations, … Strategy: broaden scope of market |
| Solutions | |
| Organization Model, Variant Aspects Worksheet (Organization Model 2) | |
|---|---|
| Resources | Existing database; priority calculator |
| Knowledge | Assessment criteria; assignment rules; urgency rules |
| Culture & power | Hierarchical organization; employees view the future with some trepidation; management style: history as civil servant |
| Task | Performed by | Where | Knowledge asset(s) | KI? | Significance |
|---|---|---|---|---|---|
| 1. Magazine production | Magazine editor | Public service | – | No | 3 |
| 2. Data entry applications | Data typist / automated telephone | Residence assignment | – | No | 2 |
| 3. Application assessment | Assigner | Residence assignment | Assessment criteria | Yes | 5 |
| 4. Residence assignment | Assigner | Residence assignment | Assignment & urgency rules | Yes | 5 |
| Name | Assigner |
|---|---|
| Organization | Residence-assignment department |
| Involved in | 3. Application assessment; 4. Residence assignment |
| Communicates with | Database; Priority calculator |
| Knowledge | Assessment criteria; Assignment rules; Urgency rules |
| Other competencies | Ability to handle problematic non-standard cases |
| Responsibilities & constraints | Make sure that people are treated equally (no favors). This has been a problem in the past |
SUMMARY
| STAGES | TYPICAL ACTIVITIES |
|---|---|
| knowledge identification | |
| knowledge specification | |
| knowledge refinement | |
REFERENCES
Questions?
Problem-Solving Methods
MOTIVATION
In order to allow automation in the achievement of complex problems we would like a general solution with the following characteristics:
As a motivating example we consider the task (i.e., goal) of parametric design , which can be defined over:
Design space – the space that contains all possible designs;
Requirements – a finite set of requirements, assumed to be provided by the user;
Constraints – a finite set of constraints that model additional conditions for a valid design;
Preference – a function over the design space which can be used to discriminate between different solutions.
Domain knowledge is implicit in the parametric design description in several places:
Design space – the concrete definition of the design space is domain specific knowledge
[Requirements]
Constraints – domain knowledge concerning regularities in the domain in which the design is constructed;
[Preference]
Formally, a design artifact is described by a set of attribute-value pairs.
Let A1, ..., An be a fixed set of parameters with fixed ranges R1, ..., Rn:
Design space is the cartesian product R1 × ... × Rn;
Requirements – a relation R which defines a subset of this space;
Constraints – a relation C which defines a subset of this space;
Preference – a partial function P having all possible designs as its domain and preference values as its range.
An inefficient naive approach to parametric design is called generate & test , which depends on the following inferences:
generate – requires knowledge that describes what constitutes a possible design (i.e., the design space);
R-test – requires knowledge that describes which possible designs are desired (i.e. the user requirements);
C-test – requires knowledge that describes which possible designs are valid (i.e., the domain constraints);
select – requires knowledge that evaluates solutions (i.e., knowledge that describes what constitutes a preferred design).
TECHNICAL SOLUTIONS
DEVELOPMENT OF KNOWLEDGE-BASED SYSTEMS TOWARDS PROBLEM-SOLVING METHODS
General Problem Solver
Knowledge-is-power hypothesis
Knowledge levels
3a. Newell’s 3 levels of knowledge
3b. Brachman’s 5 levels of knowledge
Problem Solving Methods
The General Problem Solver (GPS) is a universal problem solving approach.
GPS is the first approach that makes the distinction between knowledge of problems domains and how to solve problems
GPS is domain and task independent approach.
GPS does not put any restrictions both on the domain knowledge and on the task.
Examples of GPS are: automated theorem proving and generic search methods
Generic Search Methods are GPS for which every problem can be expressed as search
One particular example of a Generic Search Method is the A* algorithm.
A* works for problems that can be represented as a state space i.e. a graph of states. Initial conditions of the problem are represented as start state , goal conditions are represented as end state
A* is an informed search or heuristic search approach that uses the estimation function:
f(n)=g(n)+h(n)
g(n) the cost to get from the start state to current state n
h(n) estimated cost to get from current state n to end state
f(n) estimated total cost from start state through current state n to the end state
See Lecture 5
However, GPS has a set of limitations :
It works in theory but in practice works only on toy problems (e.g. Tower of Hanoi)
Could not solve real-world problems because search was easily lost in the combinatorial explosion of intermediate states
Knowledge-is-power hypothesis , also called the Knowledge Principle was formulated by E.A. Feigenbaum in 1977:
“knowledge of the specific task domain in which the program is to do its problem solving was more important as a source of power for competent problem solving than the reasoning method employed” [Feigenbaum, 1977]
The Knowledge-is-power hypothesis shifted the focus on how to build intelligent systems from inference to the knowledge base .
Problem solving is guided by experiential, qualitative, heuristic knowledge.
The meaning of intelligence as knowledge is the common meaning in the English-speaking world.
The Central Intelligence Agency (CIA) defines intelligence as knowledge.
The Knowledge-is-power hypothesis led to the emergence of a new field, i.e. expert systems, and a new profession, i.e. knowledge engineer
3a. Newell’s 3 levels of knowledge
3b. Brachman’s 5 levels of knowledge
In his work from 1981, Newell tried to answer questions such as:
How can knowledge be characterised?
What is the relation of this characterisation and the representation of knowledge?
What is characteristic about a system which holds knowledge?
Newell distinguished 3 levels in the context of knowledge representation:
Knowledge Level
Logical Level
Implementation Level
Knowledge Level
The most abstract level of representing knowledge.
Concerns the total knowledge contained in the Knowledge Base
Example:
The automated DB-Information system knows that a trip from Innsbruck to Vienna costs 120€
Logical Level
Encoding of knowledge in a formal language.
Example:
Price(Innsbruck, Vienna, 120)
Implementation Level
The internal representation of the sentences.
As a value in a matrix
Implementational Level
The primitives are pointers and memory cells.
Allows the construction of data structures with no a priori semantics
Logical Level
The primitives are logical predicates, operators, and propositions.
An index is available to structure primitives.
A formal semantic is given to primitives in terms of relations among objects in the real world
No particular assumption is made however as to the nature of such relations
Epistemological Level
The primitives are concept types and structuring relations.
Structuring relations provide structure in a network of conceptual types or units. (i.e. inheritance: conceptual units, conceptual sub-units)
The epistemological level links formal structure to conceptual units
It contains structural connections in our knowledge needed to justify conceptual inferences.
Conceptual Level
The primitives are conceptual relations, primitive objects and actions.
The primitives have a definite cognitive interpretation, corresponding to language-independent concepts like elementary actions or thematic roles
Linguistic Level
The primitives are words and (linguistic) expressions.
The primitives are associated directly to nouns and verbs of a specific natural language
Arbitrary relations and nodes that exist in a domain
Problem Solving Methods (PSM) abstract from details of the implementation of the reasoning process.
Knowledge roles determine which role the domain knowledge plays in each inference action.
PROBLEM-SOLVING METHODS
“Reasoning strategies that gain efficiency through assumptions.” [Fensel, 2000]
We consider again the task of parametric design.
revise – tries to improve an incorrect design based on the feedback of the C-test step.
In other words, instead of proposing a complete design which is then repaired, we can also incrementally develop a design and repair it at each step where a constraint violation occurs.
revise delivers designs that are desired, or requirement violations that it does not fix must be accepted.
It assumes that the propose step and the revise step deliver acceptable (or optimal) solutions or that the functionality of the task is reduced to finding an arbitrary solution.
the task – specifies the goals that should be solved in order to solve a given problem. A second part of a task specification is the definition of requirements on domain knowledge;
the problem-solving method – describes the reasoning steps to perform a task as an operational specification, separately from a description of the competence and a description of the requirements on domain knowledge;
the domain model – usually ontology-based description in three parts, a meta-level characterisation of properties, the domain knowledge, and (external) assumptions of the domain model;
the adapter – maps the different terminologies of the task definition, problem-solving method and domain model. Moreover, gives further requirements and assumptions needed to relate the competence of the PSM with the functionality of the task.
Local search can be represented in the following operational specification:
operational specification local search
local search(X)
output := recursion(currents);
successors := generate(X); new := select2(X, successors);
then output := select3(X); else recursion(new); |
select1(x) ⊆ x
|
ILLUSTRATION BY A LARGER EXAMPLE
The Sisyphus-I office allocation problem was the first of a series of test applications for approaches to building knowledge-based systems.
The aim is to automate the problem-solving behaviour of ‘Siggi’, a hypothetical domain expert .
Specifically Siggi is required to make assignments of the staff of a research group ‘YQT’ to offices.
The 4-page problem statement describes the office layout and relevant data about the 15 group members.
A ‘protocol’ is provided, describing the steps Siggi takes.
From the protocol can be drawn data such as:
The approach used rests on mapping from task concepts to method concepts as shown [Motta, 1999] :
From the protocol can be drawn value ranges, based on given justifications in the protocol:
From the protocol can be drawn the following requirements and constraints:
From the protocol can be drawn the following preferences:
< n1, n2, n3, n4 > , where:
n1 measures the distance between the room of the head of group and that of the secretaries;
n2 measures the distance between the manager’s room and the rooms of the head of group and the secretaries;
n3 measures the distance between the heads of projects and the head of group and secretaries;
n4 provides a measure of the ‘project synergy’ afforded by a solution.
A design model d1 in the Sisyphus-I domain, with cost ⟨n11, n12, n13, n14⟩, is cheaper than a design model d2, with cost ⟨n21, n22, n23, n24⟩, iff one or more of the following conditions are satisfied:
n11 < n21
n11 = n21 and n12 < n22
n11 = n21 and n12 = n22 and n13 < n23
n11 = n21 and n12 = n22 and n13 = n23 and n14 < n24
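Since this is exactly a lexicographic comparison of the two cost vectors, it can be stated very compactly; a small Python sketch:

```python
def cheaper(d1, d2):
    """d1 is cheaper than d2 iff it is smaller in the first differing
    component of the cost vector <n1, n2, n3, n4> (lexicographic order)."""
    for a, b in zip(d1, d2):
        if a != b:
            return a < b
    return False  # equal cost vectors: neither design is cheaper

print(cheaper((2, 5, 1, 7), (2, 5, 3, 0)))  # True: decided by n3
# Python tuples already compare lexicographically, so this is equivalent:
print((2, 5, 1, 7) < (2, 5, 3, 0))          # True
```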
EXTENSIONS
A standard language was developed for the description of PSMs called the Unified Problem-Solving Method Language (UPML) [Fensel et al, 2002]
UPML was applied to the modelling of Web Services in the Web Service Modelling Framework (WSMF) [Fensel & Bussler, 2002]
The WSMF approach was encoded in the Web Service Modeling Ontology (WSMO) – wherein ontology-based models are built for goals (~tasks), services (~methods) and these are linked by mediators (~adapters) [Fensel et al., 2007]
WSMO was encoded in a family of ontology languages, WSML, and an open-source implementation was carried out in WSMX.
SUMMARY
Problem-solving methods offer a means to structure and reuse elements of knowledge-based systems by abstracting from their domain.
Efficiency is achieved by introducing assumptions that either restrict the size of the problem or that postulate strong requirements on the available domain knowledge.
Adapters link tasks and PSMs to domains, allowing reuse of both, and express refinements between PSMs, allowing libraries to be built of these.
REFERENCES
[Fensel, 2000] “Problem-Solving Methods: Understanding, Description, Development and Reuse”, D. Fensel, Springer LNAI 1791, 2000
[Stefik et al., 1983] “Basic Concepts for Building Expert Systems”, M. Stefik, J. Aitkins, R. Balzer, J. Benoit, L. Birnbaum, F. Hayes-Roth, and E. Sacerdoti in Building Expert Systems , Addison-Wesley, 1983
[Smith & Lowry, 1990] “Algorithm Theories and Design Tactics”, D. R. Smith and M. R. Lowry in Science of Computer Programming , 14:305–321Addison-Wesley, 1990
[Fensel et al, 2002] “The Unified Problem-solving Method Development Language UPML”, D. Fensel, E. Motta, R. Benjamins, M. Crubezy, S. Decker, M. Gaspari, R. Groenboom, W. Grosso, F. van Harmelen, M. Musen, E. Plaza, G. Schreiber, R. Studer, B. Wielinga, Knowledge and Information Systems, 5(1), 83–131, 2002
[Fensel & Bussler, 2002] “The Web Service Modeling Framework (WSMF)”, D. Fensel and C. Bussler, Electronic Commerce Research and Applications, 1(2), 2002
[Fensel et al., 2007] “Enabling Semantic Web Services:The Web Service Modeling Ontology (WSMO)”, D. Fensel, H. Lausen, A. Polleres, J. de Bruijn, M. Stollberg, D. Roman, and J. Domingue, Springer, 2007
[Brachman, 1979] “On the Epistemological Status of Semantic Networks” R. J. Brachman. In: N.V. Findler (ed.): Associative Networks: Representation and Use of Knowledge by Computers. New York: Academic Press, 1979, 3-50.
[Birmingham&Klinker, 1993] “Knowledge Acquisition Tools with Explicit Problem-Solving Methods”, W. Birmingham and G. Klinker. The Knowledge Engineering Review 8, 1 (1993), 5-25
[Feigenbaum, 1977] “The Art of Artificial Intelligence: Themes and Case Studies of Knowledge Engineering,” E.A. Feigenbaum. Proceedings of the International Joint Conference on Artificial Intelligence, Cambridge, MA, 1977
[Newell, 1981] “The Knowledge Level” Newell, AI Magazine 2 (2), 1981, p. 1-20
Wikipedia and other links:
http://en.wikipedia.org/wiki/Problem_solving
http://www.wsmo.org/
http://www.wsmo.org/wsml/
http://www.wsmx.org/
http://www.wsmo.org/papers/publications/wsmf.paper.pdf
Questions?
Planning
Motivation
Technical Solution
Illustration by a Larger Example
Extensions
Summary
References
Motivation
Goal: Buy 1 liter of milk
It may sound like a simple task, but if you break it down, there are many small tasks involved:
TECHNICAL SOLUTIONS
Use a heuristic function to estimate the distance to the goal state G (for forward search) or to the initial state I (for backward search), respectively – prefer states that have a lower heuristic value.
It will also introduce partial-order planning as a more flexible approach
REPRESENTATION
A problem to be solved by a planning system is called a planning problem
The world in which planning takes place is often called the domain or application domain.
A state is a point in the search space of the application
A planning problem is characterized by an initial state and a goal state .
The initial state is the state of the world just before the plan is executed
The goal state is the desired state of the world that we want to reach
after the plan has been executed. A goal state is also referred to simply as a goal
Stanford Research Institute Problem Solver (STRIPS) is an automated planner developed by Richard Fikes and Nils Nilsson in 1971 [Nilsson & Fikes, 1970]
STRIPS is also the name of the planning language used for the STRIPS planner
A state (excluding the goal state) is represented in STRIPS as a conjunction of function-free ground literals
e.g. on(A, Table) ^ on (B, A) ^ ¬ clear(A) ^ clear(B)
(block A is on the Table, block B is on block A, block A has something on top, block B has nothing on top)
A conjunction of function-free ground literals is also known as a fact
A goal is represented in STRIPS as a conjunction of literals that may contain variables
e.g. on(B, Table) ^ on (A, B) ^ clear(A) ^ ¬ clear(B)
(block B is on the Table, block A is on block B, block B has something on top, block A has nothing on top)
Definition – Action
Let P be a set of facts. An action a is a triple a = (pre(a), add(a), del(a)) of subsets of P, where add(a) ∩ del(a) = Ø
pre(a ) , add(a ) , and del(a ) are called the action precondition , add list , and delete list , respectively
Delete List: list of formulas to be deleted from the description of state before the action is performed
The action pickup in the Blocks World domain can be modeled as follows in STRIPS
pickup(x)
Preconditions : on(x, Table) ^ handEmpty ^ clear(x)
Add List : holding(x)
Delete List : on(x, Table) ^ handEmpty ^ clear(x)
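Under these definitions, applying an action is plain set manipulation. A minimal Python sketch using the pickup action above (the encoding of facts as strings is our own simplification):

```python
def applicable(state, action):
    """An action is applicable iff its preconditions hold in the state."""
    pre, add, delete = action
    return pre <= state

def apply_action(state, action):
    """STRIPS progression: result(s, a) = (s ∪ add(a)) \\ del(a)."""
    pre, add, delete = action
    assert applicable(state, action)
    return (state | add) - delete

# pickup(A): precondition, add list, and delete list as sets of ground facts
pickup_A = ({"on(A,Table)", "handEmpty", "clear(A)"},
            {"holding(A)"},
            {"on(A,Table)", "handEmpty", "clear(A)"})

s0 = {"on(A,Table)", "on(B,Table)", "handEmpty", "clear(A)", "clear(B)"}
print(apply_action(s0, pickup_A))
# -> {'on(B,Table)', 'clear(B)', 'holding(A)'}
```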
Definition – Task
A task is a quadruple (P,A,I,G) where P is a (finite) set of facts, A is a (finite) set of actions, and I and G are subsets of P, I being the initial state and G the goal state.
A goal state i.e. the desired state of the world that we want to reach
result(s, ‹a, a1›) = result(result(s, ‹a›), ‹a1›) = result(s1, ‹a1›) = (s1 ∪ add(a1)) \ del(a1)
= ((holding(A) ^ clear(B)) ∪ (on(A,B) ^ handEmpty ^ clear(A))) \ (holding(A) ^ clear(B))
= on(A,B) ^ handEmpty ^ clear(A)
The precondition holding(x) ^ clear(y) is fulfilled in state s1 = holding(A) ^ clear(B) (with substitution {x/A, y/B})
Definition – Minimal Plan
Let (P,A,I,G) be a task. A plan ‹a1, . . . ,an› for the task is minimal if ‹a1, . . . , ai-1, ai+1, . . . ,an› is not a plan for any i.
Definition – Optimal Plan
Let (P,A,I,G) be a task. A plan ‹a1, . . . ,an› for the task is optimal if there is no plan with less than n actions
ALGORITHMS
We now present particular algorithms related to (direct) state-space search:
Progression -based search, also known as forward search
Regression -based search, also known as backward search
Heuristic Functions and Informed Search Algorithms
Definition – Search Scheme
A search scheme is a tuple (S, s0, succ, Sol):
the set S of all states s ∈ S,
the start state s0 ∈ S,
the successor state function succ : S → 2^S, and
the solution states Sol ⊆ S.
Solution paths s0 → … → sn ∈ Sol correspond to solutions to our problem
Regression algorithm:
Start with the goal
Choose an action that will result in the goal
Replace that goal with the action’s preconditions
Repeat steps 2-3 until you have reached the initial state
Regression is not as unnatural as you might think: if you plan a trip to Thailand, do you start with thinking about the train to Frankfurt?
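A single regression step can be sketched in Python analogously to progression (the action encoding matches the earlier sketch; the concrete stack action definition is our assumption about the Blocks World operators):

```python
def regress(goal, action):
    """STRIPS regression: regress(G, a) = (G \\ add(a)) ∪ pre(a).
    Applicable only if the action achieves part of the goal
    (add ∩ G ≠ ∅) and deletes none of it (del ∩ G = ∅)."""
    pre, add, delete = action
    assert add & goal and not (delete & goal)
    return (goal - add) | pre

# stack(C,A): pre = {holding(C), clear(A)},
#             add = {on(C,A), handEmpty, clear(C)}, del = pre
stack_CA = ({"holding(C)", "clear(A)"},
            {"on(C,A)", "handEmpty", "clear(C)"},
            {"holding(C)", "clear(A)"})

G = {"on(C,A)", "handEmpty", "clear(C)", "on(A,Table)"}
print(regress(G, stack_CA))
# -> {'on(A,Table)', 'holding(C)', 'clear(A)'}
```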
Definition – Heuristic Function
Let (S, s0, succ, Sol) be a search scheme. A heuristic function is a function h : S → ℕ0 ∪ {∞} from states into the natural numbers including 0 and the ∞ symbol. Heuristic functions , or heuristics , are efficiently computable functions that estimate a state's “solution distance”.
A heuristic function for Blocks World domain could be:
h1(s) = n – sum(b(i )) , i =1… n , where
b(i )=0 in state s if the block i is resting on a wrong thing
b(i )=1 in state s if the block i is resting on the thing it is supposed to be resting on;
n is the total number of blocks
h1(I) = 3 – 1 (because of block B) – 0 (because of block C) – 0 (because of block A) = 2
h1(G) = 3 – 1 (because of block B) – 1 (because of block C) – 1 (because of block A) = 0
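The same heuristic in a Python sketch (representing a state as a mapping from each block to the thing it rests on is our own simplification):

```python
def h1(state, goal):
    """h1(s) = n - sum(b(i)): count the blocks that do not yet rest
    on the thing they rest on in the goal configuration."""
    return sum(1 for block in state if state[block] != goal[block])

# Goal: A on Table, B on A, C on B (illustrative configuration)
goal  = {"A": "Table", "B": "A", "C": "B"}
state = {"A": "Table", "B": "C", "C": "Table"}  # only A is placed correctly
print(h1(state, goal))  # -> 2: blocks B and C rest on the wrong thing
```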
Definition – Solution Distance
Let (S, s0, succ, Sol) be a search scheme. The solution distance sd(s) of a state s ∈ S is the length of a shortest succ path from s to a state in Sol, or ∞ if there is no such path. Computing the real solution distance is as hard as solving the problem itself.
Definition – Admissible Heuristic Function
Let (S, s0 , succ , Sol) be a search scheme. A heuristic function h is admissible if h(s ) ≤ sd(s ) for all s ∈ S .
In other words, an admissible heuristic function is a heuristic function that never overestimates the actual cost i.e. the solution distance.
e.g. h1 heuristic function defined below is admissible
h1(s) = n – sum(b(i )) , i =1… n , where
b(i )=0 in state s if the block i is resting on a wrong thing
b(i )=1 in state s if the block i is resting on the thing it is supposed to be resting on;
n is the total number of blocks
COMPLEXITY AND DECIDABILITY
Its time complexity : in the worst case, how many search states are expanded?
Its space complexity : in the worst case, how many search states are kept in the open list at any point in time?
Is it complete , i.e. is it guaranteed to find a solution if there is one?
Is it optimal , i.e. is it guaranteed to find an optimal solution?
Definition - PLANSAT
Let PLANSAT denote the following decision problem. Given a STRIPS task (P, A, I, G), is the task solvable?
Theorem
PLANSAT is PSPACE-complete. For the proof see [Bylander, 1994]
Definition – Bounded- PLANSAT
Let Bounded-PLANSAT denote the following decision problem. Given a STRIPS task (P, A, I, G) and an integer b . Is there a plan with at most b actions?
ILLUSTRATION BY A LARGER EXAMPLE
PROGRESSION
s12 = result(s0, ‹pickup(B)›) = (on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty ∪ holding(B)) \ (on(B, Table) ^ handEmpty ^ clear(B)) = on(A, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(D) ^ holding(B)
s13 = result(s0, ‹unstack(D,C)›) = (on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ on(D, C) ^ clear(A) ^ clear(B) ^ clear(D) ^ handEmpty ∪ holding(D) ^ clear(C)) \ (on(D, C) ^ handEmpty ^ clear(D)) = on(A, Table) ^ on(B, Table) ^ on(C, Table) ^ clear(A) ^ clear(B) ^ clear(C) ^ holding(D)
REGRESSION
Step 3:
sn-1 = regress(G, stack(C,A)) = (G \ add(stack(C,A))) ∪ pre(stack(C,A)) = (on(A, Table) ^ on(D, Table) ^ on(C, A) ^ on(B, D) ^ clear(C) ^ clear(B) ^ handEmpty \ (on(C, A) ^ handEmpty ^ clear(C))) ∪ (holding(C) ^ clear(A)) = on(A, Table) ^ on(D, Table) ^ on(B, D) ^ clear(B) ^ clear(A) ^ holding(C)
sn-2 = regress(G, stack(B,D)) = (G \ add(stack(B,D))) ∪ pre(stack(B,D)) = (on(A, Table) ^ on(D, Table) ^ on(C, A) ^ on(B, D) ^ clear(C) ^ clear(B) ^ handEmpty \ (on(B, D) ^ handEmpty ^ clear(B))) ∪ (holding(B) ^ clear(D)) = on(A, Table) ^ on(D, Table) ^ on(C, A) ^ clear(C) ^ clear(D) ^ holding(B)
EXTENSIONS
Forward and backward state-space search are both forms of totally-ordered plan search – explore linear sequences of actions from initial state or goal.
As such they cannot accommodate problem decomposition and work on subgraphs separately.
A set of open preconditions not achieved by any action in the current plan
Partial Order Plan vs. the corresponding Total Order Plans
Compared with a classical search procedure, viewing a problem as one of constraint satisfaction can substantially reduce the amount of search.
Constraint-based search involves complex numerical or symbolic variable and state constraints
Constraint-Based Search:
1. Constraints are discovered and propagated as far as possible.
2. If there is still not a solution, then search begins, adding new constraints.
Constrain the task by a time horizon, b
Formulate this as a constraint satisfaction problem and use backtracking methods to solve it: “do x at time t yes/no”, “do y at time t ’ yes/no”,
If there is no solution, increment b
Inside backtracking : constraint propagation, conflict analysis are used
Constraint-Based Search is an “undirected” search, i.e. there is no fixed time-order to the decisions done by backtracking
SUMMARY
REFERENCES
Questions?
Software Agents
MOTIVATION
TECHNICAL SOLUTION AND ILLUSTRATIONS
Definition
Properties
Other properties, sometimes discussed in the context of agents:
Environments
Agents as intentional systems
Agent architecture
Multi-agent systems
Agents: number, uniformity, goals, architecture, abilities (sensors & effectors)
Interaction: frequency, persistence, level, pattern (flow of control), variability, purpose
Environment: predictability, accessibility, dynamics, diversity, availability of resources
Speed & Efficiency
Robustness & Reliability
Scalability & Flexibility
Cost
Distributed Development
Reusability
How to enable agents to decompose their tasks and goals (and allocate sub-goals and sub-tasks to other agents) and synthesize partial results
How to enable agents to communicate, what languages and protocols to use
How to enable agents to represent and reason about the actions, plans, and knowledge of other agents in order to interact with them
How to enable agents to represent and reason about the state of their interactions
How to enable agents to recognize and handle conflicts between agents
How to engineer practical multiagent systems
How to effectively balance local computation versus communication
How to avoid or mitigate harmful (chaotic or oscillatory) system wide behavior
How to enable agents to negotiate and contract with each other
How to form and dissolve organizational structures to meet specific goals and objectives
How to formally describe multiagent systems and the interaction between agents and how to ensure multiagent systems are correctly specified
Electronic commerce
Real-time monitoring and control of networks
Modeling and control of transportation systems
Information handling
Automatic meeting scheduling
Industrial manufacturing and production
Electronic entertainment
Re-engineering of information flow in large organizations
Investigation of complex social phenomena such as evolution of roles, norms, and organizational structures
LARGE EXAMPLE
A system allowing several autonomous program agents to play a virtual soccer game.
The games are carried out in a client/server style, where each client is one player.
The communication can be done over a network connection, or even the Internet.
The monitor is used to display the visual progress of the game.
Several Monitors can be connected to the server.
It can also be used to interrupt game play by doing simple tasks such as dropping a ball.
The agents are the brains of the players.
Players receive sensory information from the server, upon which decisions are made.
Commands are formatted and sent to the server.
EXTENSIONS
SUMMARY
Multiagent systems are systems in which multiple autonomous agents interact to solve problems
Key concepts of multiagent systems are agents and agent coordination
There are many important issues in multiagent systems concerning when and how agents should interact, and with whom
Common characteristics of multiagent systems are their inherent distribution and complexity
The distributed and flexible nature of multiagent systems leads to increased speed, robustness, scalability, and reusability
REFERENCES
Questions?
Rule Learning
Slides in this lecture are partially based on [1], Section 3, "Decision Tree Learning".
MOTIVATION
Research on learning in AI is made up of diverse subfields (cf. [5])
Problem solving methods
Discrimination nets
Machine Learning is a central research area in AI to acquire knowledge [1].
It is a means to alleviate the knowledge acquisition "bottleneck" problem in expert systems [4].
I = {milk, bread, butter, beer}
Sample database
ID | Milk | Bread | Butter | Beer
1  | 1    | 1     | 1      | 0
2  | 0    | 1     | 1      | 0
3  | 0    | 0     | 0      | 1
4  | 1    | 1     | 1      | 0
5  | 0    | 1     | 0      | 0
A possible rule would be {milk, bread} → {butter}
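A minimal sketch of how support and confidence for this rule could be computed over the sample database above; the helper names are illustrative assumptions:

# Transactions from the sample database (1 = item bought).
transactions = [
    {"milk", "bread", "butter"},   # ID 1
    {"bread", "butter"},           # ID 2
    {"beer"},                      # ID 3
    {"milk", "bread", "butter"},   # ID 4
    {"bread"},                     # ID 5
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(antecedent | consequent) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"milk", "bread"}))                 # 0.4
print(confidence({"milk", "bread"}, {"butter"}))  # 1.0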
TECHNICAL SOLUTIONS
Many inductive knowledge acquisition algorithms generate ("induce") classifiers in the form of decision trees.
Intermediate nodes represent tests
Decision trees classify instances by sorting them down the tree from the root to some leaf node which provides the classification of the instance.
{Outlook = Sunny, Temperature = Hot,
Humidity = High, Wind = Strong}:
Negative Instance
People are able to understand decision tree models after a brief explanation.
Decision trees are easily interpreted.
Other techniques are usually specialised for analysing datasets that have only one type of variable; e.g., neural networks can be used only with numerical variables.
If a given situation is observable in the model, the condition is easily explained by Boolean logic. By contrast, an artificial neural network is a black-box model, since the explanation of its results is excessively complex to comprehend.
Large amounts of data can be analysed on personal computers quickly enough to enable stakeholders to make decisions based on the analysis.
RULE LEARNING TASKS
Learning of correct rules that cover a maximal number of positive examples but no negative ones. We call this the maximal requirement for each rule.
Learning of a minimal rule set that covers all positive examples. We call this the minimal requirement for the rule set.
To illustrate the two rule learning tasks, let's consider a data set containing both positive and negative examples, as depicted in the following chart.
We want to learn a correct rule that covers a maximal number of positive examples but no negative examples (the maximal requirement for each rule).
If x=3 and y=3 then class = positive
The minimal requirement for the rule set is about learning a minimal rule set that covers all positive examples.
The minimal requirement for the rule set corresponds to the minimal set cover problem [18].
The minimal requirement for the rule set, which basically translates to building the optimal linear decision tree, is NP-hard [19].
RULE LEARNING APPROACHES
Rule learning can be seen as a search problem in the lattice of possible rules.
Combining specialization and generalization, the search procedure can start at an arbitrary point in the lattice and move freely up or down as needed.
Specialization and generalization are dual search directions in a given rule set.
SPECIALIZATION
This is done by adding additional premises to the antecedent of a rule, or by restricting the range of an attribute which is used in an antecedent.
This brings the risk of ending up with results that are not maximally general.
Some examples of (heuristic) specialization algorithms are the following: ID3, AQ, C4, CN2, CABRO, FOIL, or PRISM; references at the end of the lecture.
THE ID3 ALGORITHM
Core question: Which attribute is the best classifier?
ID3 uses a statistical measure called information gain that measures how well a given attribute separates the training examples according to their target classification.
Entropy(S) = – sum_{i=1..c} p_i log2 p_i
Gain(S, A) = Entropy(S) – sum_{v in Values(A)} (|S_v| / |S|) Entropy(S_v)
Information gain measures the expected reduction in entropy caused by partitioning the examples according to an attribute:
First term: entropy of the original collection S; second term: expected value of entropy after S is partitioned using attribute A (Sv subset of S).
Gain(S,A): The expected reduction in entropy caused by knowing the value of attribute A.
ID3 uses information gain to select the best attribute at each step in growing the tree.
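A minimal sketch of these two measures in Python, assuming examples are given as dicts with a 'target' key; this representation is an illustrative assumption:

from collections import Counter
from math import log2

def entropy(examples):
    """Entropy(S) = sum over classes of -p_i * log2(p_i)."""
    counts = Counter(e["target"] for e in examples)
    total = len(examples)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    """Gain(S, A): entropy of S minus the weighted entropy of the
    partitions S_v induced by the values v of attribute A."""
    total = len(examples)
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(examples) - remainder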
ID3’s hypothesis space is the complete space of finite discrete-valued functions, relative to the available attributes.
ID3 maintains only a single current hypothesis, as it searches through the space of trees (earlier (perhaps better) versions are eliminated).
ID3 performs no backtracking in its search; it may converge to locally optimal solutions that are not globally optimal.
ID3 uses all training data at each step to make statistically based decisions regarding how to refine its current hypothesis.
ID3 searches a complete hypothesis space.
Termination condition: Discovery of a hypothesis consistent with the data
ID3’s inductive bias is a preference bias (it prefers certain hypotheses over others).
Alternatively: A restriction bias categorically restricts considered hypotheses.
…or why prefer short hypotheses?
William of Occam was one of the first to discuss this question, around the year 1320.
Several arguments have been discussed since then.
Very complex hypotheses fail to generalize correctly to subsequent data.
Target attribute: PlayTennis (values: yes / no)
ID3 determines information gain for each candidate attribute (e.g., Outlook, Temperature, Humidity, and Wind) and selects the one with the highest information gain
Gain(S, Outlook) = 0.246 ; Gain(S, Humidity) = 0.151; Gain(S, Wind) = 0.048; Gain(S, Temperature)=0.029
Determining new attributes at the "Sunny" branch, using only the examples classified there:
The training examples associated with this leaf node all have the same target attribute value (i.e., their entropy is zero).
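Putting the pieces together, a compact sketch of ID3's growing loop, reusing the entropy/information_gain helpers sketched above; the example representation remains an assumption:

from collections import Counter

def id3(examples, attributes):
    """Grow a decision tree greedily. Stop when all examples share
    one target value (entropy zero); otherwise split on the attribute
    with the highest information gain and recurse on each partition."""
    targets = {e["target"] for e in examples}
    if len(targets) == 1:
        return targets.pop()           # pure leaf
    if not attributes:                 # no attributes left: majority vote
        return Counter(e["target"] for e in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    return {(best, value): id3([e for e in examples if e[best] == value],
                               attributes - {best})
            for value in {e[best] for e in examples}}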
GENERALIZATION
The generalization procedure stops when there are no more premises to remove.
However, generalization risks deriving final results that are not maximally specific.
RELAX is an example of a generalization-based algorithm; references at the end of the lecture.
THE RELAX ALGORITHM
1) x ∧ ¬y ∧ z → C
2) ¬y ∧ z → C
3) x ∧ z → C
4) x ∧ ¬y → C
5) x → C
6) ¬y → C
7) z → C
8) → C
COMBINING SPECIALIZATION AND GENERALIZATION
THE JOJO ALGORITHM
In general, it cannot be determined which search direction is the better one.
JoJo is an algorithm that combines both search directions in one heuristic search procedure.
JoJo can start at an arbitrary point in the lattice of complexes and generalizes and specializes as long as the quality and correctness can be improved, i.e., until a local optimum is found or no more search resources are available (e.g., time, memory).
Either of the two strategies can be used interchangeably, which makes JoJo more expressive than comparable algorithms that apply the two in sequential order (e.g., ID3).
Horizontal position (chosen premises)
Reminder: JoJo can start at any arbitrary point, while specialization requires a highly general point and generalization requires a most specific point.
In general, it is possible to carry out several runs of JoJo with different starting points. Rules that were already produced can be used as subsequent starting points.
Criteria for choosing a vertical position:
Approximation of the possible length, or experience from earlier runs.
Random production of rules; distribution by means of the average correctness of the rules with the same length (so-called quality criterion).
Start with a small sample or very limited resources to discover a real starting point from an arbitrary one.
Randomly chosen starting point (same average expectation of success as starting with ‘Top‘ or ‘Bottom‘).
Heuristic: few positive examples and maximally specific descriptions suggest long rules; few negative examples and maximally general descriptions suggest rather short rules.
REFINEMENT OF RULES WITH JOJO
Refinement of rules refers to the modification of a given rule set based on additional examples.
The input to the task is a so-called hypothesis (a set of rules) and a set of old and new positive and negative examples.
The output of the algorithm are a refined set of rules and the total set of examples.
The new set of rules is correct, complete, non-redundant and (if necessary) minimal.
There is a four-step procedure for the refinement of rules:
Rules that become incorrect because of new examples are refined: incorrect rules are replaced by new rules that cover the positive examples, but not the new negative ones.
Complete the rule set to cover new positive examples.
Redundant rules are deleted to correct the rule set.
Steps 1 and 2 are handled by the algorithm JoJo, which integrates generalization and specialization via a heuristic search procedure.
Replace a rule by a new set of rules that cover the positive examples, but not the negative ones.
Compute new correct rules that cover the not yet considered positive examples (up to a threshold).
Remove rules that are more specific than other rules (i.e. rules that have premises that are a superset of the premises of another rule).
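A minimal sketch of this redundancy-removal step, representing each rule by its premise set (an illustrative assumption):

def remove_redundant(rules):
    """Drop every rule whose premises are a proper superset of
    another rule's premises (the more general rule covers it)."""
    return [r for r in rules
            if not any(other < r for other in rules)]

rules = [{"x", "z"}, {"x"}, {"x", "not y", "z"}]
print(remove_redundant(rules))  # only {'x'} survives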
ILLUSTRATION BY A LARGER EXAMPLE: ID3
EXTENSIONS
THE C4.5 ALGORITHM
ID3 would search for new refinements at the bottom of the left tree.
Infer the decision tree from the training set, growing the set until the training data is fit as well as possible and allowing overfitting to occur.
Convert the learned tree into an equivalent set of rules by creating one rule for each path from the root node to a leaf node.
Prune (generalize) each rule by removing any preconditions whose removal improves its estimated accuracy.
Rule accuracy estimation based on the training set using a pessimistic estimate: C4.5 calculates standard deviation and takes the lower bound as estimate for rule performance.
Central question: How to select the best value for a threshold c?
Approach (a code sketch follows the list):
Sort examples according to the continuous attribute A.
Identify adjacent examples that differ in their target classification.
Generate a set of candidate thresholds midway between the corresponding values of A (c must always lie at such a boundary, see [5]).
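A minimal sketch of the candidate-threshold step in Python, assuming examples are given as (value, label) pairs; the data shown is illustrative:

def candidate_thresholds(examples):
    """Sort by the continuous attribute, then propose a threshold
    midway between adjacent examples whose target labels differ."""
    ordered = sorted(examples)          # (value, label) pairs
    return [(a + b) / 2
            for (a, la), (b, lb) in zip(ordered, ordered[1:])
            if la != lb]

# e.g. Temperature vs. PlayTennis
data = [(40, "no"), (48, "no"), (60, "yes"),
        (72, "yes"), (80, "yes"), (90, "no")]
print(candidate_thresholds(data))  # [54.0, 85.0]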
Information gain for the first threshold is higher.
But it would be a very poor predictor of the target function over unseen instances
Alternative selection measure: gain ratio (see [5]).
Gain ratio penalizes attributes such as Date by incorporating split information (entropy of S with respect to A) that is sensitive to how broadly and uniformly the attribute splits the data:
Definition GainRatio:
GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A), where
SplitInformation(S, A) = – sum_{i=1..c} (|S_i| / |S|) log2 (|S_i| / |S|), with S_1 … S_c the subsets of S induced by the c values of attribute A.
SUMMARY
Specialization
JoJo can also be applied to incrementally refine rule sets.
REFERENCES
[6] Quinlan, J. R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[7] Fayyad, U. M.: On the Induction of Decision Trees for Multiple Concept Learning. PhD dissertation, University of Michigan, 1991.
[8] Agrawal, Imielinski and Swami: Mining Association Rules between Sets of Items in Large Databases. ACM SIGMOD Conference, 1993, pp. 207-216.
[9] Quinlan: Generating Production Rules from Decision Trees. 10th Int'l Joint Conference on Artificial Intelligence, 1987, pp. 304-307.
[10] Fensel and Wiese: Refinement of Rule Sets with JoJo. European Conference on Machine Learning, 1993, pp. 378-383.
[18] http://www.ensta.fr/~diam/ro/online/viggo_wwwcompendium/node146.html
[19] Goodrich, M. T., Mirelli, V., Orletsky, M., and Salowe, J.: Decision Tree Construction in Fixed Dimensions: Being Global is Hard but Local Greed is Good. Technical Report TR-95-1, Johns Hopkins University, Department of Computer Science, Baltimore, MD, May 1995.
Wikipedia links:
Machine Learning, http://en.wikipedia.org/wiki/Machine_learning
Rule learning, http://en.wikipedia.org/wiki/Rule_learning
Association rule learning, http://en.wikipedia.org/wiki/Association_rule_learning
ID3 algorithm, http://en.wikipedia.org/wiki/ID3_algorithm
C4.5 algorithm, http://en.wikipedia.org/wiki/C4.5_algorithm
Questions?
Inductive Logic Programming
MOTIVATION
ILP theory describes the inductive inference of logic programs from instances and background knowledge
ILP contributes to the practice of logic programming by providing tools that assist logic programmers to develop and verify programs
Model Theory of ILP
The background theory is a set of definite clauses
The negative evidence is derived implicitly, by making the closed world assumption (realized by the minimal Herbrand model)
The hypotheses are sets of general clauses expressible using the same alphabet as the background theory
A Generic ILP Algorithm
Combining Delete with Prune, it is easy to obtain advanced search strategies.
Some frequently employed stopping criteria require that a solution be found, or that it be unlikely that an adequate hypothesis can be obtained from the current queue.
Proof Theory of ILP
θ-subsumption is the simplest model of deduction for ILP; it regards clauses as sets of (positive and negative) literals.
A clause c 1 θ -subsumes a clause c 2 if and only if there exists a substitution θ such that c 1 θ ⊆ c 2
c 1 is called a generalisation of c 2 (and c 2 a specialisation of c 1 ) under θ subsumption
The θ-subsumption inductive inference rule is: given a clause c2, induce a clause c1 that θ-subsumes c2 (i.e., c1θ ⊆ c2 for some substitution θ). A small test sketch follows.
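A minimal sketch of a θ-subsumption test by generate-and-test over substitutions; the literal encoding (tuples, lowercase strings as variables) is an assumption for illustration, not taken from any particular ILP system:

from itertools import product

def theta_subsumes(c1, c2):
    """True iff c1 θ-subsumes c2: some substitution θ for c1's
    variables makes c1θ a subset of c2. Literals are tuples
    (predicate, arg1, ...); lowercase arguments are variables."""
    is_var = lambda a: a[0].islower()
    variables = sorted({a for lit in c1 for a in lit[1:] if is_var(a)})
    terms = sorted({a for lit in c2 for a in lit[1:]})
    for binding in product(terms, repeat=len(variables)):
        theta = dict(zip(variables, binding))
        c1_theta = {(lit[0],) + tuple(theta.get(a, a) for a in lit[1:])
                    for lit in c1}
        if c1_theta <= set(c2):
            return True
    return False

c1 = [("parent", "x", "y"), ("male", "x")]
c2 = [("parent", "Tom", "Bob"), ("male", "Tom"), ("parent", "Tom", "Ann")]
print(theta_subsumes(c1, c2))  # True, e.g. θ = {x: Tom, y: Bob}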
ILP Systems
Basic algorithm for CProgol:
Select an example to be generalized.
Build most-specific-clause. Construct the most specific clause that entails the example selected, and is within language restrictions provided. This is usually a definite clause with many literals, and is called the "bottom clause."
Find a clause more general than the bottom clause. This is done by searching for some subset of the literals in the bottom clause that has the "best" score.
Remove redundant examples. The clause with the best score is added to the current theory, and all examples made redundant are removed. Return to Step 1 unless all examples are covered.
Michalski’s train problem
Assume ten railway trains: five are travelling east and five are travelling west; each train comprises a locomotive pulling wagons; whether a particular train is travelling towards the east or towards the west is determined by some properties of that train
The learning task: determine what governs which kinds of trains are Eastbound and which kinds are Westbound
Michalski’s train problem can be viewed as a classification task: the aim is to generate a classifier (theory) which can classify unseen trains as either Eastbound or Westbound
The following knowledge about each car can be extracted: which train it is part of, its shape, how many wheels it has, whether it is open (i.e. has no roof) or closed, whether it is long or short, the shape of the things the car is loaded with. In addition, for each pair of connected wagons, knowledge of which one is in front of the other can be extracted.
http://www.doc.ic.ac.uk/~shm/Software/progol5.0
http://www.comp.rgu.ac.uk/staff/chb/teaching/cmm510/michalski_train_data
Generate the hypotheses
SUMMARY
REFERENCES
Questions?
Formal Concept Analysis
From data to their conceptualization
Formal Concept Analysis
This description of the concept “bird” is based on sets of
Objects, attributes and a relation form a formal concept
A repertoire of objects and attributes (which might or