Neural Networks
MOTIVATION
A main motivation behind neural networks is that symbolic rules do not reflect the reasoning processes actually performed by humans.
Biological neural systems can capture highly parallel computations based on representations that are distributed over many neurons.
They learn and generalize from training data, so there is no need to program everything explicitly.
They are very noise tolerant, with better resistance to noisy input than symbolic systems.
Neural networks are strong in:
TECHNICAL SOLUTIONS
(Artificial) Neural Networks
but also on the frequency of oscillation; hence neural networks have an even higher complexity.
Metaphorically speaking, a thought is a specific self-oscillation of a network of neurons.
The topology of the network determines its resonance.
However, it is the resonance in the brain's interaction with the environment and with itself that creates, reinforces or decouples interaction patterns.
The brain is not a static device, but a device that is created through usage…
What we refer to as Neural Networks, in this course, are mostly Artificial Neural Networks (ANN).
ANN are approximations of biological neural networks, built from physical devices or simulated on computers.
ANN are parallel computational entities that consist of multiple simple processing units that are connected in specific ways in order to perform the desired tasks.
Remember: ANN are computationally primitive approximations of the real biological brains.
Perceptron
Changing the bias weight W_{0,i} moves the threshold location
McCulloch and Pitts: Boolean functions can be implemented with an artificial neuron (but not XOR, which is not linearly separable).
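To make the threshold behaviour concrete, here is a minimal sketch of such a threshold neuron (weights and threshold values are hand-picked for illustration; they are not from the slides). Note how raising the bias/threshold weight W_0 from 0.5 to 1.5, with the same input weights, turns an OR unit into an AND unit, i.e. changing W_0 moves the threshold location.

```python
def neuron(weights, w0, inputs):
    """Step-activation neuron: output 1 if the weighted sum exceeds the threshold w0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total - w0 > 0 else 0

def OR(x1, x2):          # threshold 0.5: one active input is enough
    return neuron([1.0, 1.0], 0.5, [x1, x2])

def AND(x1, x2):         # same weights, higher threshold 1.5: both inputs needed
    return neuron([1.0, 1.0], 1.5, [x1, x2])

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, OR(*x), AND(*x))
```

No single choice of weights and threshold for one such unit reproduces XOR, which is why it needs a multi-layer network.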
Artificial Neural Network Structures

XOR Problem: Recall that XOR cannot be modeled with a single-layer feed-forward perceptron.
Rule of Thumb 1
Even if the function to learn is slightly nonlinear, generalization may be better with a simple linear model than with a complicated nonlinear one if there is too little data, or too much noise, to estimate the nonlinearities accurately.
Rule of Thumb 2
If there is only one input, there seems to be no advantage to using more than one hidden layer; things get much more complicated when there are two or more inputs.
1st layer draws linear boundaries 
2nd layer combines the boundaries. 
3rd layer can generate arbitrarily complex boundaries. 
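As a hand-wired illustration of how a second layer combines first-layer boundaries, the sketch below solves XOR with fixed (not learned) weights: one hidden unit draws an OR boundary, the other a NAND boundary, and the output unit takes their conjunction. All weight values are illustrative choices, not from the slides.

```python
def neuron(weights, w0, inputs):
    """Step-activation unit: output 1 if the weighted sum exceeds the threshold w0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total - w0 > 0 else 0

def xor(x1, x2):
    h1 = neuron([1.0, 1.0], 0.5, [x1, x2])       # layer 1: OR boundary
    h2 = neuron([-1.0, -1.0], -1.5, [x1, x2])    # layer 1: NAND boundary (x1 + x2 < 1.5)
    return neuron([1.0, 1.0], 1.5, [h1, h2])     # layer 2: AND combines the two boundaries

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor(*x))
```

The two hidden units each draw one linear boundary; only points between the two boundaries activate both hidden units, which is exactly the XOR region.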
Learning and Generalization
Learning is based on training data, and aims at appropriate weights for the perceptrons in a network.
Direct computation of the weights is not feasible in the general case.
Starting from an initial random assignment of weights, learning becomes an iterative adjustment process.
In the case of single perceptrons, learning becomes the process of moving hyperplanes around; parametrized over time t: W_{i}(t+1) = W_{i}(t) + ΔW_{i}(t)
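As a sketch (the function name and defaults are illustrative, not the course's reference implementation), one application of this update rule with ΔW_{i}(t) = η · (target – output) · x_i can be written as follows, treating the threshold W_0 as a weight on a constant input of –1, as in the worked example later in this section:

```python
def perceptron_update(weights, x, target, eta=0.1):
    """One step of the perceptron learning rule W_i(t+1) = W_i(t) + dW_i(t).

    weights[0] is the threshold weight W_0, paired with a constant input
    of -1; weights[1:] are paired with the actual inputs x.
    """
    inputs = [-1.0] + list(x)
    activation = sum(w * xi for w, xi in zip(weights, inputs))
    output = 1 if activation > 0 else 0       # step activation
    delta = eta * (target - output)           # error-scaled step size
    return [w + delta * xi for w, xi in zip(weights, inputs)]

# First misclassified pattern of the worked example in this section:
# input (1, 0), target 0, weights W_0 = 0.22, W_1 = 0.92, W_2 = 0.62.
print(perceptron_update([0.22, 0.92, 0.62], [1, 0], 0))
```

For this pattern the unit outputs 1 instead of 0, so W_1 decreases by 0.1 and the threshold W_0 increases by 0.1, moving the hyperplane away from the misclassified point.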
Learning at the output layer is the same as for a single-layer perceptron:
Typical problems: slow convergence, local minima
ILLUSTRATION BY AN EXAMPLE
Training – epoch 1:
out1 = sign(0.92*0 + 0.62*0 – 0.22) = sign(–0.22) = 0
out2 = sign(0.92*1 + 0.62*0 – 0.22) = sign(0.7) = 1  X 
W_{1}(1) = 0.92 + 0.1 * (0 – 1) * 1 = 0.82
W_{2}(1) = 0.62 + 0.1 * (0 – 1) * 0 = 0.62
W_{0}(1) = 0.22 + 0.1 * (0 – 1) * (–1) = 0.32

out3 = sign(0.82*0 + 0.62*1 – 0.32) = sign(0.3) = 1
out4 = sign(0.82*1 + 0.62*1 – 0.32) = sign(1.12) = 1
Training – epoch 2:
out1 = sign(0.82*0 + 0.62*0 – 0.32) = sign(–0.32) = 0

out2 = sign(0.82*1 + 0.62*0 – 0.32) = sign(0.5) = 1  X
W_{1}(2) = 0.82 + 0.1 * (0 – 1) * 1 = 0.72
W_{2}(2) = 0.62 + 0.1 * (0 – 1) * 0 = 0.62
W_{0}(2) = 0.32 + 0.1 * (0 – 1) * (–1) = 0.42

out3 = sign(0.72*0 + 0.62*1 – 0.42) = sign(0.2) = 1

out4 = sign(0.72*1 + 0.62*1 – 0.42) = sign(0.92) = 1
Training – epoch 3:
out1 = sign(0.72*0 + 0.62*0 – 0.42) = sign(–0.42) = 0

out2 = sign(0.72*1 + 0.62*0 – 0.42) = sign(0.3) = 1  X
W_{1}(3) = 0.72 + 0.1 * (0 – 1) * 1 = 0.62
W_{2}(3) = 0.62 + 0.1 * (0 – 1) * 0 = 0.62
W_{0}(3) = 0.42 + 0.1 * (0 – 1) * (–1) = 0.52
out3 = sign(0.62*0 + 0.62*1 – 0.52) = sign(0.1) = 1

out4 = sign(0.62*1 + 0.62*1 – 0.52) = sign(0.72) = 1
Training – epoch 4:
out1 = sign(0.62*0 + 0.62*0 – 0.52) = sign(–0.52) = 0

out2 = sign(0.62*1 + 0.62*0 – 0.52) = sign(0.1) = 1  X
W_{1}(4) = 0.62 + 0.1 * (0 – 1) * 1 = 0.52
W_{2}(4) = 0.62 + 0.1 * (0 – 1) * 0 = 0.62
W_{0}(4) = 0.52 + 0.1 * (0 – 1) * (–1) = 0.62

out3 = sign(0.52*0 + 0.62*1 – 0.62) = sign(0) = 0  X
W_{1}(4) = 0.52 + 0.1 * (1 – 0) * 0 = 0.52
W_{2}(4) = 0.62 + 0.1 * (1 – 0) * 1 = 0.72
W_{0}(4) = 0.62 + 0.1 * (1 – 0) * (–1) = 0.52

out4 = sign(0.52*1 + 0.72*1 – 0.52) = sign(0.72) = 1
Finally, with the weights reached after further training (W_{1} = 0.12, W_{2} = 0.82, W_{0} = 0.42), all four patterns are classified correctly:
out1 = sign(0.12*0 + 0.82*0 – 0.42) = sign(–0.42) = 0

out2 = sign(0.12*1 + 0.82*0 – 0.42) = sign(–0.3) = 0

out3 = sign(0.12*0 + 0.82*1 – 0.42) = sign(0.4) = 1

out4 = sign(0.12*1 + 0.82*1 – 0.42) = sign(0.52) = 1
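The whole run can be reproduced with a short training loop. The target concept recoverable from the example is out = 1 exactly when x2 = 1; the initial weights (0.92, 0.62), threshold 0.22, and learning rate 0.1 are taken from the example. This is a sketch: depending on how many epochs are run and in which order the patterns are visited, the final weights can differ from those shown above while still classifying all four patterns correctly.

```python
def step(total):
    return 1 if total > 0 else 0

def train(patterns, w, w0, eta=0.1, max_epochs=100):
    """Perceptron training: sweep the patterns until a full epoch has no errors.

    w: input weights; w0: threshold, updated as a weight on a constant -1 input.
    """
    for _ in range(max_epochs):
        errors = 0
        for x, target in patterns:
            out = step(sum(wi * xi for wi, xi in zip(w, x)) - w0)
            if out != target:
                errors += 1
                w = [wi + eta * (target - out) * xi for wi, xi in zip(w, x)]
                w0 += eta * (target - out) * (-1)
        if errors == 0:
            break  # converged: one complete pass without a misclassification
    return w, w0

# Task from the example: output 1 exactly when x2 = 1.
patterns = [((0, 0), 0), ((1, 0), 0), ((0, 1), 1), ((1, 1), 1)]
w, w0 = train(patterns, [0.92, 0.62], 0.22)
for x, target in patterns:
    assert step(sum(wi * xi for wi, xi in zip(w, x)) - w0) == target
print("weights:", w, "threshold:", w0)
```

Because the task is linearly separable, the perceptron convergence theorem guarantees this loop terminates with all four patterns correct.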

SUMMARY
REFERENCES
Questions?