When specific information about the domain comes in, the rules are used to draw conclusions and to point out appropriate actions. This is called inference. Inference takes place as a kind of chain reaction. In the example above, if you are told that there is a power failure, rule 3 will state that there is a pump failure, and rule 1 will then tell us that the pressure is low. Rule 2 will also give a (useless) recommendation to check the oil level.
Rules can also be used in the opposite direction. Suppose you are told that the pressure is low; then rule 1 states that this can be due to a pump failure, while rule 3 states that a pump failure can be caused by a power failure. You should also be able to use rule 2 to recommend checking the oil level, but it is extremely difficult to control such a mixture of inference back and forth in the same session.
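The forward direction of this chain reaction can be sketched in a few lines of code. The rule set below is a reconstruction of the pump example from the text; the fact names and the representation (premise set, conclusion) are illustrative choices, not a real expert-system API.

```python
# Rules reconstructed from the pump example: (set of premises, conclusion).
RULES = [
    ({"pump failure"}, "low pressure"),      # rule 1
    ({"pump failure"}, "check oil level"),   # rule 2 (recommendation)
    ({"power failure"}, "pump failure"),     # rule 3
]

def forward_chain(facts):
    """Repeatedly fire rules whose premises hold until nothing new is derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(sorted(forward_chain({"power failure"})))
# → ['check oil level', 'low pressure', 'power failure', 'pump failure']
```

Starting from "power failure", rule 3 fires first, which in turn enables rules 1 and 2 — exactly the chain reaction described above. Running the rules in the opposite direction (from "low pressure" back to its possible causes) would require a separate backward-chaining search, which is where the control problems mentioned in the text arise.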
Uncertainty
Often the connections reflected by the rules are not absolutely certain, and the gathered information is likewise often subject to uncertainty. In such cases, a certainty measure is added to the premises as well as the conclusions in the rules of the system. A rule then gives a function that describes how much a change in the certainty of the premise will change the certainty of the conclusion. In its simplest form, this looks like:
4) If A (with certainty x) then B (with certainty f(x))
There are many schemes for treating uncertainty in rule-based systems. The most common are fuzzy logic, certainty factors and (adaptations of) Dempster-Shafer belief functions. Common to all of these schemes is that uncertainty is treated locally; that is, the treatment is connected directly to the incoming rules and the uncertainty of their elements. Imagine, for example, that in addition to 4) we have the rule
5) If C (with certainty x) then B (with certainty g(x))
If we now get the information that A holds with certainty a and C holds with certainty c, what is the certainty of B?
There are different algebras for such a combination of uncertainty, depending on the scheme. Common to all these algebras is that in many cases they come to incorrect conclusions. This is because the combination of uncertainty is not a local phenomenon; it depends strongly on the entire situation (in principle a global matter).
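As a concrete instance of such a local algebra, the sketch below uses the classical MYCIN-style certainty-factor combination for two rules supporting the same conclusion. The attenuation functions f and g and all numeric values are illustrative assumptions, not taken from the text.

```python
def cf_combine(x, y):
    """MYCIN-style parallel combination of two positive certainty
    factors supporting the same conclusion B."""
    return x + y - x * y

# Rule 4: A supports B with certainty f(a); rule 5: C supports B with g(c).
# These attenuation functions are illustrative choices.
f = lambda a: 0.8 * a
g = lambda c: 0.6 * c

a, c = 0.5, 0.9          # certainties reported for A and C
cf_B = cf_combine(f(a), g(c))
print(round(cf_B, 3))    # → 0.724
```

Note how local the computation is: the result depends only on the two incoming certainties. If A and C were strongly correlated (or mutually exclusive) in the actual domain, this formula would still return the same number — which is exactly the source of the incorrect conclusions mentioned above.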
A neural network consists of several layers of nodes: on top a layer of input-nodes, at the bottom a layer of output-nodes, and in between these normally one or two hidden layers. All nodes in a layer are in principle connected to all nodes in the layer just below. A node together with its incoming edges is called a perceptron.
A neural network performs pattern recognition. You could for instance imagine a neural network that reads handwritten letters. By automatic tracking, a handwritten letter can be transformed into a set of findings on curves (not a job for the network). The network will have an input-node for every possible kind of finding and an output-node for each letter in the alphabet. When a set of findings is fed into the network, the system will match the pattern of the findings with equivalent patterns of the different letters.
Technically, the input-nodes are given a value (0 or 1). This value is transmitted to the nodes in the next layer. Each of these nodes computes a weighted sum of the incoming values, and if this sum exceeds a certain threshold, the node fires downward with the value 1. The values of the output-nodes determine the letter.
So, apart from the architecture of the network (the number of layers and the number of nodes in each layer), the weights and the thresholds determine the behaviour of the network. Weights and thresholds are set in order for the network to perform as well as possible. This is achieved by training: you have a large number of examples where both input and output values are known. These are fed into the training algorithm of the network, which determines weights and thresholds in such a way that the distance between the set of outputs from the network and the desired sets of outputs from the examples is as small as possible.
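One classical scheme of this kind is the perceptron learning rule, sketched below for a single node with a fixed threshold: each mistake on a training example nudges the weights in the direction that reduces the error. The task (learning the AND pattern) and all numbers are illustrative assumptions.

```python
def train(examples, weights, threshold, lr=0.1, epochs=50):
    """Perceptron learning rule: for each (inputs, target) example,
    adjust the weights by lr * error * input to shrink the
    discrepancy between actual and desired output."""
    for _ in range(epochs):
        for inputs, target in examples:
            s = sum(w * x for w, x in zip(weights, inputs))
            output = 1 if s > threshold else 0
            error = target - output
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    return weights

# Four labelled examples of the AND pattern (illustrative training set).
examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train(examples, [0.0, 0.0], threshold=0.5)
```

After training, the learned weights classify all four examples correctly. Multi-layer networks need a more elaborate algorithm (backpropagation) for the hidden layers, but the principle — minimizing the distance between actual and desired outputs — is the same.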
There is nothing to prevent the use of neural networks for domains requiring the handling of uncertainty. If relations are uncertain (for example in medical diagnosis), a neural network with the proper training will be able to give the most probable diagnosis given a set of symptoms. However, you will not be able to read the uncertainty of the conclusion from the network; you will not be able to get the next-most probable diagnosis; and - probably the most severe setback - you will not know under which assumptions about the domain the suggested diagnosis is the most probable.
A Bayesian network consists of a set of nodes and a set of directed edges between these nodes. Edges reflect cause-effect relations within the domain. These effects are normally not completely deterministic (e.g. disease -> symptom). The strength of an effect is modelled as a probability:

6) If whooping cough then fever (with probability p)
7) If tonsillitis then fever (with probability q)
If 6) and 7) are read as 'If otherwise healthy and...then...', there also needs to be a specification of how the two causes combine. That is, we need the probability of having a fever if both symptoms are present and if the patient is completely healthy. All in all you have to specify the conditional probabilities:
P(fever | whooping cough, tonsillitis)

where 'whooping cough' and 'tonsillitis' each can take the states yes and no. So, for any node you must specify the strength of all combinations of states of its possible causes.
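Such a specification is a conditional probability table with one entry per combination of parent states. The sketch below shows what the fever table could look like; all numeric values are illustrative placeholders, not medical data.

```python
# Conditional probability table P(fever = yes | whooping cough, tonsillitis).
# Keys are (whooping_cough, tonsillitis); values are illustrative numbers.
P_fever = {
    (True,  True):  0.99,   # both causes present
    (True,  False): 0.90,
    (False, True):  0.80,
    (False, False): 0.01,   # the 'otherwise healthy' case
}
```

With two binary causes the table has four entries; in general a node with n binary parents needs 2**n entries, which is why the combination of causes must be specified explicitly.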
Fundamentally, Bayesian networks are used to update probabilities when information comes in. The mathematical basis for this is Bayes' theorem:

P(A | B) = P(B | A) P(A) / P(B)
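A one-line sketch shows the theorem at work, here reusing the pump example from the beginning of the text. All three probabilities are illustrative assumptions.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Illustrative numbers: updating P(pump failure) on observing low pressure.
p_pump = 0.01             # prior P(pump failure)
p_low_given_pump = 0.95   # P(low pressure | pump failure)
p_low = 0.05              # overall P(low pressure)

print(round(posterior(p_pump, p_low_given_pump, p_low), 4))  # → 0.19
```

Observing the effect (low pressure) raises the probability of the cause (pump failure) from 1% to 19% — inference against the direction of the edge, which is exactly what the updating method of a Bayesian network does throughout the whole network at once.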
In contrast to the methods of rule-based systems, the updating method of Bayesian networks takes a global perspective, and if model and information are correct, it can be proved that the method calculates the new probabilities correctly (correctly with regard to the axioms of classical probability theory).
Any node in the network can receive information, as the method does not distinguish between inference along or against the direction of the edges. Nor does simultaneous input of information into several nodes cause problems for the updating algorithm.
An essential difference between rule-based systems and systems based on Bayesian networks is that in rule-based systems you try to model the expert's way of reasoning (hence the name expert systems), while with Bayesian networks you try to model dependencies in the domain itself. The latter systems are often called decision support systems or normative expert systems.
The meaning of a node and its probability tables can be the subject of an external discussion, regardless of their function in the network. This does not make any sense when speaking of neural networks. Perceptrons in the hidden layers only have meaning in the context of the functionality of the network.
This means that the construction of a Bayesian network requires detailed knowledge of the domain in question. If such knowledge can only be obtained through a series of examples, neural networks seem to be an easier approach. This might be true in cases such as the reading of handwritten letters, face recognition and other areas where the activity is a 'craftsmanlike' skill based solely on experience.
It is often criticized that in order to construct a Bayesian network you have to 'know' too many probabilities. However, there is no considerable difference between this number and the number of weights and thresholds that have to be 'known' in order to build a neural network, and these can only be learnt by training. It is an enormous weakness of neural networks that you are unable to utilize the knowledge you might have in advance.
Probabilities, on the other hand, can be assessed using a combination of theoretical insight, empirical studies independent of the constructed system, training, and various more or less subjective estimates.
Finally, it should be mentioned that in the construction of a neural network the route of inference is fixed: it is decided in advance which relations information is gathered about, and which relations the system is expected to calculate. Bayesian networks are much more flexible in that respect.