Априорное знание

Arguments constraints and skipping useless hypotheses

Background knowledge fulfills a variety of functions in the data mining process. One of the most important is reducing the number of hypotheses to be tested to speed up learning and make this process tractable. There are several approaches to reduce the size of the hypothesis space. Below two of them are presented. They use constraints on arguments of predicates from background knowledge B. The difference is that the first approach uses constraints on a single argument and the second one uses constraints on several arguments of a predicate defined in B.

The first approach called typing approach is based on information about individual data types of arguments of a predicate. For instance, suppose only an integer can be the first argument of predicate P and only the date (M/D/Y) can be the second argument of this predicate. It would be wasteful to test hypotheses with the following typing P(date, integer), P(integer, integer) and P(date, date). The only one correct type here is P(integer, date).

The second approach is called inter-argument constraints approach. For example, predicate Equal(x,x) is always true if both arguments are the same. Similarly, it is possible that for some predicate P for all x P(x,x)=0. Therefore, testing hypotheses extended by adding Equal(x,x) or P(x,x) should be avoided and the size of the hypothesis space explored can be reduced.

The value of inter-argument constraints is illustrated by the experimental fact that the FOCL algorithm, using typing and inter-argument constraints, was able to test 2.3 times less literals and examples than using only typing. [Pazzani, Kibler, 1992]. Table 4.12 summarizes properties of the two discussed approaches for reducing the number of hypotheses.

Approaches for reducing hypothesis space

Initial rules and improving search of hypotheses

This section considers another useful sort of background knowledge - a (possibly incorrect) partial initial rule that approximates the concept (rule) to be learned. There are two basic forms of this initial rule:

- extensional form;
- intensional form.

If a predicate is defined by other predicates, we say the definition is intensional. Otherwise, a predicate given by example is called extensional. It is also possible that background knowledge B contains a predicate in a mixed way partially by examples and partially by other predicates. In general, background knowledge presented in a mixed way reduces the search. [Passani, Kibler, 1992].

Learning using initial extensional rule. An expert or another learning system can provide an initial extensional rule [Widmer,1990]. Then this rule (initial concept) is refined by adding clauses [Passani,Kibler, 1992]:

1. An algorithm computes the criterion of optimality (usually information gain) of each clause in the initial concept;
2. The literal (or conjunction of literals) with the maximum gain is added to the end of the current clause (start clause can be null);
3. If the current clause covers some negative tuples (examples), additional literals are added to rule out the negative tuples.

Learning using initial intensional rules. Next, consider domain knowledge defined in terms of extensional and intensional initial predicates. Systems such as CIGOL [Muggleton & Buntine, 1988] make use of (or invent) background knowledge of this form. For example, if an extensional definition of the predicate GrowingStock(x,y,z) is not given, it could be defined in terms of the intensional predicate GreaterPrice by:

GrowingStock(x,y,z) <- GreaterPrice(x,y), GreaterPrice(y,z),

where x, y, and z are prices of the stock for days t, t+1, and t+2, respectively.

It is possible that the intensional predicate GrowingStock(x,y,z) added to the hypothesis improves it, but each of predicates GreaterPrice(x,y) and GreaterPrice(y,z) does not improve the hypothesis. Therefore, common search methods may not discover a valuable stock regularity.

Pazzani and Kibler [1992] suggested that if the literal with the maximum of the optimality criterion (gain) is intensional, then the literal is made extensional and the extensional definition is added to the clause under construction. Table 4.13 shows this idea more specifically. The process is called operationalization

Note that computation of the optimality criterion, which guides the search, is different for extensional and intensional predicates. For intensional predicates it is usually involves a Prolog proof.

Table Operationalization [Pazzani, Kibler, 1992]

Potentially operationalization can generate very long rules, similarly to large decision trees discussed in Chapter 3, when this extensional approach was illustrated versus short intensional formulas.

Table Partial background knowledge for stock market

Learning using initial intensional and extensional rules. The previous consideration has shown that adding background knowledge can increase the ability of algorithms to find solutions. Table 4.14 shows an example of a partial background knowledge for a stock market forecast. It consists of:

- a definition of the target predicate UP(x,y,w,z) with four arguments to be learned;
- typing information about x,y,w and z;
- intensional predicate Q(x,y,w) with three arguments to be used for discovering predicate Up(x,y,w,z), and
- extensional predicates Monday(t) and Tuesday(t) to be used for discovering Up(x,y,w,z).

In addition, following table provides an initial intensional rule for the target concept Up(x,y,w,z)

Table Intensional initial rule for the target concept

This rule assumes that if growth was accelerated from date t to t+2 then the stock will grow further on date t+3.

Background knowledge is called extended background knowledge if it includes:

- extensional knowledge (training examples and extensional predicates);
- initial rules;
- intensional target concept definition.

Pazzani and Kibler [1992] found in experiments that extended background knowledge with a correct intensional target definition avoids exhaustive testing every variable of every predicate and increases the speed of the search. In their experiments, a correct extensional definition of the target concept was found by testing only 2.35% of literals needed for rule discovery if the target concept is not provided. However, the same research has shown that extended background knowledge

- can increase the search space;
- can decrease the accuracy of the resulting hypothesis, if the background knowledge is partially irrelevant to the task;
- can increase the number of training examples required to achieve a given accuracy.

These observations show the need for balancing initial intensional and extensional predicates in background knowledge. One of them can be more accurate and can speed up the search for regularity more than other. Therefore, the following procedure will be more efficient:

1) Compare accuracy of intensional and extensional knowledge;
2) Include a more accurate one in the background knowledge;
3) Discover regularities using the most accurate background knowledge from 2);
4) Discover regularities using all background knowledge.

The modification of this mechanism includes use of probabilities assigned to all types of background knowledge.
There are several ways to combine extensional and intensional knowledge in discovering regularities. One of them is converting initial rules (predicates) into extensional form (operationalize a clause) if it has positive information gain.

The extensional predicates are compared to the induced literal with the maximum information gain. This approach is used in the FOIL algorithm

In an explanation -based learning approach, the target concept is assumed a correct, intensional definition of the concept to be learned and the domain knowledge is assumed correct as well. An approach that is more realistic is implemented in algorithms such as FOCL and MMDR. These methods relax the assumption that the target concept and the domain knowledge are correct.