The Net Language

The HUGIN System uses a special-purpose language, called the net language, for describing belief networks and influence diagrams. This language allows the user to create complete descriptions of belief networks and influence diagrams, specifying the structure of the network model, the conditional probability functions and utility functions, and (for influence diagrams) the temporal ordering of decisions.

This chapter describes the second revision of the net language, which differs substantially from the first revision. The first revision used a fixed format: the meaning of each element was determined by its position within the description. As a result, the language could not be extended in such a way that descriptions written in the old language kept their meaning in the new one. The second revision has been designed so that it can be extended without changing the meaning of existing descriptions.

Nodes

The basic element of a belief network or an influence diagram model is the node. In ordinary belief networks, a node represents a random variable (discrete or continuous); in influence diagrams, a node may also represent a decision, controlled by the decision maker, or a utility function, which is used to assign preferences to different configurations of variables.

Example 1

The following node description is taken from the well-known "Chest Clinic" example of [Lauritzen&Spiegelhalter88].
    node T
    { 
        states = ("yes" "no");
        label = "Has tuberculosis?";
        position = (25 275);
    }

This describes a binary random variable named T, with states labeled "yes" and "no". The description also gives the label and position, which are used by the HUGIN Runtime system.

A node description is introduced by one of the keywords [<prefix>] node, decision, or utility, where the optional <prefix> of node is either discrete or continuous (if the prefix is omitted, discrete is assumed). The keyword is followed by a name that must be unique within the model, and then by a sequence of name/value pairs of the form

<name> = <value>;

enclosed in braces.

The example shows the field names currently defined in the net language for nodes: states, label, and position. All of these fields are optional; if any field is absent, a default value is supplied instead.

Apart from these fields, you can specify your own fields for nodes. Such fields are useful for applications that need to store extra information about the nodes.

Example 2

Here, the T node has been assigned the application-specific field MY_APPL_my_field.
    node T
    { 
        states = ("yes" "no");
        label = "Has tuberculosis?";
        position = (25 275);
        MY_APPL_my_field = "1000";
    }

The value of an application-specific field must be a text string enclosed in quote characters (") (see section "Lexical matters" for the precise definition of strings).

It is good style to begin such field names with an application-specific prefix to avoid clashes with fields used by other applications (in Example 2, the prefix is MY_APPL).
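The node syntax described so far can be illustrated with a small Python helper that writes a node description. This is a minimal sketch, not part of any HUGIN API: the names format_node and quote are hypothetical, and the sketch assumes the standard fields are written in the order states, label, position, followed by any application-specific fields, whose values are always written as strings.

```python
import re

# Names follow the C identifier rules described under "Lexical matters".
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def quote(text):
    """Enclose text in quote characters; a string may not contain a quote or a newline."""
    if '"' in text or "\n" in text:
        raise ValueError(f"not a valid net-language string: {text!r}")
    return f'"{text}"'


def format_node(name, states=None, label=None, position=None, extra=None, keyword="node"):
    """Return a node description; fields left as None are omitted (defaults then apply)."""
    if not NAME.match(name):
        raise ValueError(f"not a valid name: {name!r}")
    lines = [f"{keyword} {name}", "{"]
    if states is not None:
        lines.append(f"    states = ({' '.join(quote(s) for s in states)});")
    if label is not None:
        lines.append(f"    label = {quote(label)};")
    if position is not None:
        lines.append(f"    position = ({position[0]} {position[1]});")
    for field, value in (extra or {}).items():
        if not NAME.match(field):
            raise ValueError(f"not a valid field name: {field!r}")
        # Application-specific field values are always strings.
        lines.append(f"    {field} = {quote(value)};")
    lines.append("}")
    return "\n".join(lines)


print(format_node("T", ["yes", "no"], "Has tuberculosis?", (25, 275),
                  extra={"MY_APPL_my_field": "1000"}))
```

Run as shown, it prints the node description of Example 2 (with shallower indentation). Passing, say, keyword="decision" or keyword="continuous node" selects one of the other introducing keywords.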

Example 3

HUGIN Runtime uses some extra fields to save descriptions of nodes and their states. These fields are prefixed with HR.
    node T
    { 
        states = ("yes" "no");
        label = "Has tuberculosis?";
        position = (25 275);
        HR_State_0 = "Yes, the patient HAS tuberculosis.";
        HR_State_1 = "No, the patient has NOT tuberculosis.";
        HR_Desc = "Represents the fact that the patient has\
    tuberculosis or not.";
    }

The structure of the model

The structure (i.e., the edges of the underlying graph) is specified indirectly, through the potential-specifications. There are two kinds of edges: directed and undirected.

Example 4

This is a typical specification of directed edges:
    potential ( A | B C ) { }

This specifies that node A has two parents: B and C. That is, there is a directed edge from B to A, and there is a directed edge from C to A.

The model may also contain undirected edges. Such a model is called a chain graph model.

Example 5

    potential ( A B | C D ) { }

This specifies that there is an undirected edge between A and B. Moreover, as usual, it specifies that both A and B have C and D as parents.

If there are no parents, the vertical bar may be omitted.

A maximal set of nodes, connected by undirected edges, is called a chain graph component.
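Chain graph components can be computed with a union-find structure over the undirected edges. The following is a minimal Python sketch; the function name chain_components is hypothetical, and the undirected edges are assumed to be given as explicit node pairs (for Example 5, the single pair A, B).

```python
def chain_components(nodes, undirected_edges):
    """Group nodes into chain graph components using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the representative, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in undirected_edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    # Sort members and components so the result does not depend on union order.
    return sorted(sorted(g) for g in groups.values())


# Example 5: potential ( A B | C D ) contributes the undirected edge between A and B.
print(chain_components(["A", "B", "C", "D"], [("A", "B")]))
```

For Example 5 this prints [['A', 'B'], ['C'], ['D']]: A and B form one component, and C and D are components on their own.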

Not all graphs are permitted. The following restrictions are imposed on the structure of the network.

The graph may not contain any (directed) cycles.

Example 6

The following specification is not allowed, because of the cycle A -> B -> C -> A.
    potential ( B | A ) { }
    potential ( C | B ) { }
    potential ( A | C ) { }

However, the following specification is legal.

    potential ( B | A ) { }
    potential ( C | B ) { }
    potential ( C | A ) { }
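For networks with only directed edges, the restriction can be checked with an ordinary depth-first search. In this minimal Python sketch, a potential header "( C1 C2 | P1 P2 )" is modeled as the pair (["C1", "C2"], ["P1", "P2"]); this representation and the function names are assumptions made for illustration.

```python
def directed_edges(potentials):
    """Each potential is (children, parents); every parent points to every child."""
    return {(p, c) for children, parents in potentials for c in children for p in parents}


def has_directed_cycle(edges):
    """Depth-first search; reaching a node still on the current path means a cycle."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
        succ.setdefault(b, set())
    ON_PATH, DONE = 1, 2
    state = {}

    def visit(n):
        state[n] = ON_PATH
        for m in succ[n]:
            if state.get(m) == ON_PATH:
                return True
            if m not in state and visit(m):
                return True
        state[n] = DONE
        return False

    return any(n not in state and visit(n) for n in succ)


illegal = [(["B"], ["A"]), (["C"], ["B"]), (["A"], ["C"])]
legal = [(["B"], ["A"]), (["C"], ["B"]), (["C"], ["A"])]
print(has_directed_cycle(directed_edges(illegal)))  # True
print(has_directed_cycle(directed_edges(legal)))    # False
```

The two lists encode the two specifications of Example 6.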

Example 7

The following specification is not allowed either, since there is a cycle A -> B -> C ~ A, where ~ denotes the undirected edge between A and C (an undirected edge counts as "bidirectional").
    potential ( B | A ) { }
    potential ( C | B ) { }
    potential ( A C ) { }

However, the following specification is legal.

    potential ( A | B ) { }
    potential ( C | B ) { }
    potential ( A C ) { }
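With undirected edges present, one way to test this restriction is the usual chain graph check: collapse each chain graph component to a single vertex and require the resulting directed graph to be acyclic, rejecting at once any directed edge that joins two nodes of the same component. The following Python sketch assumes potentials are given as (children, parents) pairs and that all nodes on the left side of one potential are pairwise joined by undirected edges; the function name is hypothetical.

```python
from itertools import combinations


def chain_graph_is_acyclic(potentials):
    """True if no cycle runs through the directed edges of the chain graph."""
    nodes = {n for children, parents in potentials for n in (*children, *parents)}
    comp = {n: n for n in nodes}

    def find(n):
        while comp[n] != n:
            comp[n] = comp[comp[n]]
            n = comp[n]
        return n

    # Undirected edges: every pair of nodes left of the vertical bar.
    for children, _ in potentials:
        for a, b in combinations(children, 2):
            comp[find(a)] = find(b)

    # Directed edges between components (duplicates counted once).
    succ = {find(n): set() for n in nodes}
    indeg = {c: 0 for c in succ}
    for children, parents in potentials:
        for p in parents:
            for c in children:
                a, b = find(p), find(c)
                if a == b:
                    return False  # directed edge inside a component closes a cycle
                if b not in succ[a]:
                    succ[a].add(b)
                    indeg[b] += 1

    # Kahn's algorithm: acyclic iff every component can be removed.
    ready = [c for c, d in indeg.items() if d == 0]
    removed = 0
    while ready:
        c = ready.pop()
        removed += 1
        for d in succ[c]:
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)
    return removed == len(succ)


print(chain_graph_is_acyclic([(["B"], ["A"]), (["C"], ["B"]), (["A", "C"], [])]))  # False
print(chain_graph_is_acyclic([(["A"], ["B"]), (["C"], ["B"]), (["A", "C"], [])]))  # True
```

The two calls encode the two specifications of Example 7; for networks without undirected edges, the check reduces to an ordinary acyclicity test.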

Continuous chance nodes are not allowed in influence diagrams, i.e., there cannot be continuous nodes in a net also containing utility or decision nodes.

Utility nodes may not have any children in the graph. This implies that utility nodes may only appear to the left of the vertical bar (never to the right).

Undirected edges can only appear between discrete chance nodes.

Continuous nodes can only have continuous nodes as children.
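The node-type restrictions above can be collected into one validation pass. In this minimal Python sketch, node types are given with the hypothetical labels "discrete", "continuous", "decision", and "utility", potentials are (children, parents) pairs, and nodes on the left side of a potential are assumed to be joined by undirected edges.

```python
from itertools import combinations


def type_violations(kinds, potentials):
    """Return readable violations of the node-type rules (an empty list if none)."""
    problems = []
    present = set(kinds.values())
    if "continuous" in present and present & {"decision", "utility"}:
        problems.append("continuous nodes are not allowed in influence diagrams")
    for children, parents in potentials:
        for p in parents:
            if kinds[p] == "utility":
                problems.append(f"utility node {p} may not have children")
            elif kinds[p] == "continuous":
                for c in children:
                    if kinds[c] != "continuous":
                        problems.append(f"continuous node {p} has non-continuous child {c}")
        # Nodes sharing the left side of a potential are joined by undirected edges.
        for a, b in combinations(children, 2):
            if kinds[a] != "discrete" or kinds[b] != "discrete":
                problems.append(f"undirected edge {a} ~ {b} must join discrete chance nodes")
    return problems


kinds = {"A": "continuous", "B": "discrete", "C": "continuous"}
print(type_violations(kinds, [(["A"], ["B", "C"])]))  # []
print(type_violations(kinds, [(["B"], ["A"])]))
```

The second call reports that the continuous node A has the non-continuous child B.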

If a decision node appears to the left of the vertical bar, it must appear alone. In this case, so-called informational links are specified; such links specify which variables are known when the decision is to be made. There must be a total ordering of all decisions in the influence diagram, and this ordering must follow from the network structure, i.e., there must be a directed path containing all decisions.

Example 8

Assume we want to specify an influence diagram with two decisions, D1 and D2, and with three discrete chance variables, A, B, and C. First, A is observed; then, decision D1 is made; then, B is observed; finally, decision D2 is made. This sequence of events can be specified as follows:
    potential ( D1 | A) { }
    potential ( D2 | D1 B ) { }
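Whether one directed path passes through all decisions can be checked pairwise: in an acyclic graph, such a path exists exactly when, for every pair of decisions, one is reachable from the other, because the paths between consecutive decisions can then be chained together. The following Python sketch returns the decisions in that order, or None if no such path exists; the function name and the (children, parents) encoding of potentials are assumptions.

```python
def decision_order(decisions, potentials):
    """Order the decisions along a common directed path, or return None."""
    succ = {}
    for children, parents in potentials:
        for p in parents:
            succ.setdefault(p, set()).update(children)

    def reaches(a, b):
        # Iterative depth-first search from a.
        stack, seen = [a], set()
        while stack:
            n = stack.pop()
            for m in succ.get(n, ()):
                if m == b:
                    return True
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return False

    for a in decisions:
        for b in decisions:
            if a != b and not (reaches(a, b) or reaches(b, a)):
                return None  # neither decision precedes the other
    # The first decision reaches all others, the second all but one, and so on.
    return sorted(decisions, key=lambda d: -sum(reaches(d, e) for e in decisions if e != d))


# Example 8: A is observed before D1; D1 and B are known when D2 is made.
print(decision_order(["D2", "D1"], [(["D1"], ["A"]), (["D2"], ["D1", "B"])]))
```

For Example 8 this prints ['D1', 'D2']; dropping D1 from the parents of D2 would leave the decisions unordered, and the function would return None.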

Finally, no node may be referenced in any potential-specification before it has been declared by a node-, decision-, or a utility-specification.

Potentials

We also need to specify the quantitative part of the model. This part consists of conditional probability functions for random variables and the values a utility function may assume. Thus, we distinguish between discrete probability, continuous probability, and utility potentials.

The three types of potentials differ in the numerical specification given between the braces of the potential-specification.

Example 9

The following description is taken from the "Chest Clinic" example and specifies the conditional probability table of the discrete variable T.
    potential ( T | A )
    {
        data = (( 0.05 0.95 )          %  A=yes
                ( 0.01 0.99 ));        %  A=no
    }

This specifies that the probability of tuberculosis given a trip to Asia is 5 %, whereas it is only 1 % if the subject has not been to Asia. The data field may also be specified as an unstructured list of numbers.

    potential ( T | A )
    {
        data = ( 0.05 0.95           %  A=yes
                 0.01 0.99 );        %  A=no
    }

As the example shows, the numerical data is specified through the data field of a potential-specification. The data is a list of real numbers. The list must either be structured as a multi-dimensional table whose dimension list consists of the parent nodes followed by the child nodes, or be a flat list with no structure at all. In both cases, the "layout" of the data list is row-major (see section "Row-major Representation").

Example 10

    potential ( D E F | A B C ) { }

The data field of this potential-specification corresponds to a multi-dimensional table with dimension list <A, B, C, D, E, F>.
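The two accepted shapes of a data field can be checked and normalized to one flat, row-major list. In this minimal Python sketch, nested Python lists stand in for the parenthesized lists of the net language, and the input is the number of states of each node in dimension-list order (parents first, then children). All names are hypothetical, and the sketch only accepts fully nested or completely flat lists.

```python
from math import prod


def matches_shape(data, counts):
    """True if data is nested exactly like a table with these state counts."""
    if not counts:
        return not isinstance(data, list)
    return (isinstance(data, list) and len(data) == counts[0]
            and all(matches_shape(item, counts[1:]) for item in data))


def flatten(data):
    """Concatenate nested lists in order, which yields the row-major layout."""
    if isinstance(data, list):
        return [x for item in data for x in flatten(item)]
    return [data]


def check_data(data, counts):
    """Accept a correctly nested table or a flat list of the right length."""
    flat = all(not isinstance(x, list) for x in data)
    if flat and len(data) == prod(counts):
        return list(data)
    if matches_shape(data, counts):
        return flatten(data)
    raise ValueError(f"data does not fit a table with state counts {counts}")


# Example 9: dimension list <A, T>, both binary; nested and flat forms agree.
print(check_data([[0.05, 0.95], [0.01, 0.99]], [2, 2]))
print(check_data([0.05, 0.95, 0.01, 0.99], [2, 2]))
```

Both calls return [0.05, 0.95, 0.01, 0.99], reflecting that the structured and unstructured forms of Example 9 describe the same table.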

The data field of a utility potential has only the dimensions of the nodes on the right side of the vertical bar; the utility node itself is not a dimension.

Example 11

The following description is taken from the "Oil Wildcatter" example and shows a utility potential. Drillpay is a utility node while Oil is a discrete chance node with three states and Drill is a decision node with two states.
    potential (Drillpay | Oil Drill)
    {
        data = (( -70 0 )         %  dr
                ( 50 0 )          %  wt
                ( 200 0 ));       %  sk
    }

The data field of this potential-specification corresponds to a multi-dimensional table with dimension list <Oil, Drill>.

The table in the data field of a continuous probability potential has the dimensions of the discrete chance nodes to the right of the vertical bar. All the discrete chance nodes must be listed first on the right side of the vertical bar, followed by the continuous nodes. However, the entries of the multi-dimensional table are no longer numbers but continuous distribution functions. Currently, only Gaussian (normal) distributions can be used. A normal distribution is specified by its mean and variance. The following example describes a continuous probability potential.

Example 12

Suppose A is a continuous node with parents B and C which are both discrete. Also, both B and C have two states: B has states b1 and b2 while C has states c1 and c2.
    potential (A | B C)
    {
        data = (( normal ( 0, 2 )       %  b1  c1
                  normal ( 3, 2 ) )     %  b1  c2
                ( normal ( 1, 2 )       %  b2  c1
                  normal ( 2, 2 ) ));   %  b2  c2
    }

The data field of this potential-specification is a table with the dimension list <B, C>. Each entry contains a probability distribution for the continuous node A.

All entries in the above example contain a Gaussian (normal) distribution, the only continuous distribution currently available. A normal distribution is specified with the keyword normal followed by a list of two parameters: the first is the mean and the second is the variance of the distribution.

Example 13

In this example, suppose A is a continuous node with one discrete parent B and one continuous parent C. B has two states b1 and b2 and C has a normal distribution.
    potential (A | B C)
    {
        data = ( normal ( 1 + C, 2 )            %  b1
                 normal ( 1 + 2 * C, 2 ) );     %  b2
    }

The data field of this potential-specification is a table with the dimension list <B> (B is the only discrete parent and is therefore listed first on the right side of the vertical bar). Each entry again contains a continuous distribution function for A. The influence of C on A now comes from the use of C in the expression specifying the mean parameter of each normal distribution.
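The rules for which nodes span the data table of each kind of potential can be summarized in a short function. This Python sketch is illustrative only: the kind labels "discrete", "continuous", and "utility", and the node-type map, are hypothetical names chosen for this example.

```python
def table_dimensions(kind, children, parents, node_types):
    """Return the dimension list of the data table of one potential.

    kind: "discrete", "continuous" or "utility" (hypothetical labels).
    node_types: maps node names to "discrete", "continuous", "decision" or "utility".
    """
    if kind == "discrete":
        return list(parents) + list(children)  # parents first, then children
    if kind == "utility":
        return list(parents)                   # only the nodes right of the bar
    if kind == "continuous":
        discrete = [p for p in parents if node_types[p] != "continuous"]
        # Discrete parents must be listed before the continuous ones.
        if list(parents[:len(discrete)]) != discrete:
            raise ValueError("discrete parents must be listed before continuous ones")
        return discrete
    raise ValueError(f"unknown potential kind: {kind}")


types = {"Oil": "discrete", "Drill": "decision", "B": "discrete", "C": "continuous"}
print(table_dimensions("utility", ["Drillpay"], ["Oil", "Drill"], types))  # ['Oil', 'Drill']
print(table_dimensions("continuous", ["A"], ["B", "C"], types))            # ['B']
```

The two calls reproduce the dimension lists of Examples 11 and 13.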

Only the mean parameter of a normal distribution can be specified as an expression; the variance parameter must be a numeric constant. The operators allowed in the expression are +, -, and * (addition, subtraction, and multiplication).
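To make the role of the mean expression concrete, here is a small recursive-descent evaluator for expressions built from numbers, node names, and the three operators. This is a sketch under stated assumptions: the text only lists the operators, so the grammar (expr := term (("+" | "-") term)*, term := factor ("*" factor)*, factor := ["-"] (number | name)), and in particular the precedence of * over + and -, is assumed here. The values of the continuous parents are passed in a dictionary, and all function names are hypothetical.

```python
import re

# Assumed token forms: unsigned numbers, C-style names, and the operators + - *.
_TOKEN = re.compile(r"\s*(\d+(?:\.\d*)?(?:[eE][+-]?\d+)?|[A-Za-z_][A-Za-z0-9_]*|[-+*])")


def tokenize(text):
    tokens, pos = [], 0
    while text[pos:].strip():
        m = _TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate_mean(text, values):
    """Evaluate a mean expression such as "1 + 2 * C" for given parent values."""
    tokens = tokenize(text)
    i = 0

    def peek():
        return tokens[i] if i < len(tokens) else None

    def take():
        nonlocal i
        i += 1
        return tokens[i - 1]

    def factor():
        tok = take() if peek() is not None else None
        if tok == "-":
            return -factor()
        if tok is None or tok in ("+", "*"):
            raise SyntaxError("expected a number or a name")
        if tok[0].isdigit():
            return float(tok)
        return float(values[tok])  # KeyError if a parent value is missing

    def term():
        result = factor()
        while peek() == "*":
            take()
            result *= factor()
        return result

    def expr():
        result = term()
        while peek() in ("+", "-"):
            op = take()
            result = result + term() if op == "+" else result - term()
        return result

    result = expr()
    if i != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[i]!r}")
    return result


# Example 13 with C = 3: the mean of A is 4 in state b1 and 7 in state b2.
print(evaluate_mean("1 + C", {"C": 3}), evaluate_mean("1 + 2 * C", {"C": 3}))
```

This prints 4.0 7.0.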

Since no function is assigned to a decision node, its potential-specification cannot have a data field. Thus, a decision potential-specification does not really specify a potential; it is merely a device for specifying informational links.

If the data field is omitted from a potential-specification, a list of ones is supplied for discrete probability potentials, whereas a list of zeros is supplied for utility potentials. For a continuous probability potential, a list of normal distributions with both mean and variance set to 0 is supplied.

The data field of a discrete probability potential may only contain non-negative numbers. Similarly, in the specification of a normal distribution for a continuous probability potential, the variance parameter must be non-negative. There is no such restriction on the values of utility potentials or on the mean parameter of a normal distribution.
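The default data and the sign rules above can be expressed as two small Python helpers. This is a minimal sketch: the kind labels are hypothetical, size is the number of table entries (the product of the state counts in the dimension list), and a normal distribution is represented as a (mean, variance) pair whose mean may be any value, including an expression string.

```python
def default_data(kind, size):
    """Data supplied when the data field is omitted from a potential-specification."""
    if kind == "discrete":
        return [1.0] * size
    if kind == "utility":
        return [0.0] * size
    if kind == "continuous":
        return [(0.0, 0.0)] * size  # normal ( 0, 0 ) in every entry
    raise ValueError(f"unknown potential kind: {kind}")


def negative_entries(kind, data):
    """Values that break the sign rules: probabilities and variances must be >= 0."""
    if kind == "discrete":
        return [x for x in data if x < 0]
    if kind == "continuous":
        return [var for _mean, var in data if var < 0]
    return []  # utilities may be negative


print(negative_entries("utility", [-70, 0, 50, 0, 200, 0]))      # []
print(negative_entries("continuous", [("1 + C", 2), (0, -1)]))   # [-1]
```

The utility table of Example 11 passes even though it contains -70, while the negative variance in the second call is reported.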

Global information

Information pertaining to the belief network or influence diagram model as a whole can be specified at the beginning of the description, initiated by the keyword net.

Example 14

    net
    {
        node_size = (100 40);
    }

This specifies that nodes are drawn on the display with width 100 and height 40. This information is used by HUGIN Runtime.

Currently, only the node_size field has been defined for net-specifications. However, as with nodes, you can add any additional fields you want.

Example 15

    net
    {
        node_size = (100 40);
        MY_APPL_my_field = "1000";
    }

This specification has an application specific field named MY_APPL_my_field.

Example 16

The newest version of HUGIN Runtime uses a number of application-specific fields. Some of them are shown here:
    net
    {
        node_size = (80 40);
        HR_Grid_X = "10";
        HR_Grid_Y = "10";
        HR_Grid_GridSnap = "1";
        HR_Grid_GridShow = "0";
        HR_Font_Name = "Arial";
        HR_Font_Size = "-12";
        HR_Font_Weight = "400";
        HR_Font_Italic = "0";
        HR_Propagate_Auto = "0";
    }

HUGIN Runtime uses the prefix HR on all fields.

Lexical matters

A name has the same structure as an identifier in the C programming language. This means that a name is a non-empty sequence of letters and digits, beginning with a letter. In this context, the underscore character ( _ ) is considered a letter. The case of letters is significant. The sequence of letters and digits forming a name extends as far as possible; it is terminated by the first non-letter/digit character (for example, braces or whitespace).

A string is a sequence of characters that contains neither a quote character ( " ) nor a newline character; it begins and ends with a quote character.

A number is comprised of an optional sign, followed by a sequence of digits, possibly containing a decimal point character, and an optional exponent field containing an 'E' or 'e' followed by a possibly signed integer.

Comments can be placed anywhere in a net description, except within a name, a number, or other multi-character lexical elements. A comment is introduced by a percent character ( % ) and extends to the end of the line; it is treated as whitespace.
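These lexical rules map naturally onto a regular-expression tokenizer. The following Python sketch uses the standard re module; the token kind names and the set of punctuation characters are assumptions made for illustration, and, following the definition of numbers above, a sign directly followed by a digit is read as part of the number.

```python
import re

TOKEN_SPEC = [
    ("COMMENT", r"%[^\n]*"),                                     # to end of line
    ("SPACE", r"\s+"),
    ("STRING", r'"[^"\n]*"'),                                    # no quote or newline inside
    ("NUMBER", r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),                         # C-style identifier
    ("PUNCT", r"[(){}=;|,*+-]"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{rx})" for kind, rx in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, text) pairs; comments and whitespace are skipped."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup not in ("COMMENT", "SPACE"):
            yield m.lastgroup, m.group()
        pos = m.end()


for token in tokenize('label = "Has tuberculosis?";  % a comment'):
    print(token)
```

This prints the NAME, PUNCT, STRING, and PUNCT tokens of the label field, while the trailing comment is discarded like whitespace.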

Row-major Representation

This section describes the row-major "layout" of a table. Readers not interested in this detail can skip it.

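To find the value corresponding to a specific configuration in the row-major representation of a table, we index the values from 0 to s-1, where s is the number of values in the list. Suppose that the corresponding list of nodes is (A1, A2, ..., An) and that node Ai has si states, indexed from 0 to si-1. Then

$$ s = \prod_{i=1}^{n} s_i . $$

We want the index x of a configuration (a1, a2, ..., an). Suppose that the state index of ai is ji, where $0 \le j_i \le s_i - 1$. Then x can be calculated as

$$ x = \sum_{i=1}^{n} j_i z_i , $$

where

$$ z_i = \prod_{k=i+1}^{n} s_k \qquad (\text{so that } z_n = 1). $$

Thus the state of the last node in the list varies fastest; in Example 9, for instance, the two entries for A=yes come first.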
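The index x = j1*z1 + ... + jn*zn, where zi is the product of the state counts of the nodes after Ai, can be computed without forming the products explicitly, using a Horner-style loop. This Python sketch is illustrative; the function name is hypothetical, and state indices and state counts are given in dimension-list order.

```python
def row_major_index(state_indices, state_counts):
    """Index of a configuration in a row-major table (last node varies fastest)."""
    if len(state_indices) != len(state_counts):
        raise ValueError("one state index per node is required")
    x = 0
    for j, s in zip(state_indices, state_counts):
        if not 0 <= j < s:
            raise ValueError(f"state index {j} out of range 0..{s - 1}")
        # Horner form: ((j1*s2 + j2)*s3 + j3)... equals the sum of j_i * z_i.
        x = x * s + j
    return x


# Example 11: dimension list <Oil, Drill>; Oil = sk (index 2), Drill in state 0.
drillpay = [-70, 0, 50, 0, 200, 0]
print(drillpay[row_major_index([2, 0], [3, 2])])  # 200
```

The Horner form multiplies by each remaining state count as it goes, which gives the same result as summing ji*zi term by term.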

