11.15. twol.twrule module

This module forms the rule FSTs out of the collection of component FSTs which have been compiled by twol.twparse out of the center parts, left and right contexts of the rules.

11.15.1. Compiling the two-level rules

The actual compilation starts from the tuples which contain the components as FSTs. Each rule is parsed and compiled separately using the alphabet extracted from the FST containing the examples. The method for the compilation is described in [ylijyrä2006]. The main result of the compilation is the rule FST but two other FSTs are compiled in addition:

  • rule_fst which accepts any examples which conform to the rule. The rule_fst accepts strings of symbol pairs. A rule FST ought to accept all symbol pair strings where the X part occurs in one of the contexts in the rule and/or reject thse strings where this is not the case.

  • selector_fst produces the relevant subset of the examples when intersected with examples_fst. The selector is needed only for testing that the rule accepts all relevant positive examples. The point is that not all examples are relevant for testing. If an example does not contain the center (or the input part of the center) of a rule, the rule accepts such an example anyway. The function selector_from_x(X) builds this simply by concatenating PI* X PI*.

  • scrambler_fsa which is an encoded FSA which transforms a positive example into a negative example which ought to be rejected by the rule (and is also needed only for the testing of the rule). The purpose of scrambling is to alter the output symbol in order to produce occurrences in wrong places for right-arrow rules or wrong correspondences in a context for left-arrow rules.

11.15.2. Compilation of the rule FST

There is a separate function for each type of rules: rightarrow() for => rules, output_coercion() for <= rules, input_coercion() for the new <--- type of rules, doublearrow() for <=> rules, and center_exclusion() for /<= rules. The rule FSTs are compiled using the generalized restriction method in [ylijyrä2006].

The rule formalism has a special symbol .#. for the beginning of the left context or the end of the right context. The compilation makes some effort to implement this literally so that the FSTs do not have transitions for stuff beyond the boundary symbol. The parser inserts BEGIN for a left context boundary and END symbol in the right context. Function begin_end() takes care of the cleaning the FSTs.

11.15.3. Constructing the scrambler_fsa

Let us suppose we have a rule: {ao}:o => _ {ij}: ; and an example:

k a l {ao}:a s s {}:a

In order to produce negative examples for this rules, we have to change occurrences of {ao}:o into {ao}:a at least in one place of an example, e.g.:

k a l {ao}:o s s {}:a

In order to make such changes, the original examples are encoded as FSAs where the original input and output symbols are separated by a ‘^’:

k^k a^a l^l {ao}^a s^s s^s {}^a

Such sequences can be converted with appropriate FSTs if its input symbols are pairs (such as k^k or {ao}^a). In order not to lose information, the output symbols have to be similar pairs.

11.15.4. Functions of the module

Module for compiling two-level rules out of FSTs for the components.

twol.twrule.begin_end(expr_fst)[source]

Removes everything before BEGIN and after END from expr_fst

expr_fst – an FST representing a set of pair strings str

Returns an FST which represents pair strings which are maximal substrings of the above which do not contain any BEGIN or END symbols.

twol.twrule.center_exclusion(name, x_fst, *contexts)[source]

Compiles rules like X /<= [LC1,RC1),…(LCk,RCk)]

name – name to be given to the rule FST

x_fst – the center (X) of the rule

*contents – list of contexts, i.e. pairs of left and right context

Returns a triple:

rule_fst – the compiled rule

selector_fst – FST which selects examples which are relevant for this rule

scrambler_fst – empty_fst (negative examples not relevant for these rules)

twol.twrule.context_to_condition(left_context_fst, right_context_fst)[source]

Convert one context into a condition (for the generalized restriction)

left_context_fst – the left context as an FST

right_context_fst – the right context as an FST

Returns [PI* LC ¤ PI* ¤ RC PI*] as an FST

twol.twrule.contexts_to_condition(*contexts)[source]

A list of contexsts is converted into a condition.

Each context in the list is converted separately and the result is the union of these and is returned as an FST.

twol.twrule.correct_to_incorrect(x_fst, side)[source]

used for creating negative examples for <= rules

In order to make negative examples for <= rules we need to transform the examples so that some correct input:output pairs are changed so that the output part becomes different. The computed encoded FST maps correct inputs to any possible outputs (correct or incorrect).

x_fst – the FST for the X part of the rule

side – either “input” or “output”

returns: an fst (encoded as a fsa) which maps correct examples into incorrect exs

twol.twrule.doublearrow(name, x_fst, *contexts)[source]

Compiles rules like X <=> [LC1,RC1),…(LCk,RCk)]

name – name to be given to the rule FST

x_fst – the center (X) of the rule

*contexts – list of contexts, i.e. pairs of left and right context

Returns a triple:

rule_fst – the compiled rule

selector_fst – FST which selects examples which are relevant for this rule

scrambler_fst – an encoded FST which produces negative examples

twol.twrule.e(str)[source]

Convert a two-level component expression into a FST.

str – a string containing a (two-level) regular expression Returns an FST which performs the mapping represented by str corresponding to the expression.

twol.twrule.generalized_restriction(precondition_fst, postcondition_fst)[source]

Combines the precondition FST and the postcondition FST into a rule FST.

Conditions are formed out of the center/contexts of a rule. Each condition contains exactly two diamond symbols.

Returns the generalized restriction as an FST.

twol.twrule.incorrect_to_correct(x_fst)[source]

Compute a transformation for right-arrow (=>) rules

In order to make negative examples for the => rules we need to modify the examples so that some correct occurrences of X are modified so that the output part of X becomes something else, i.e. incorrect because it is in an unexpected context.

x_fst – FST for the center part (X) of a rule

Returns: scrambler_fst – an encoded FST which maps encoded instances of X into all possible correct and incorrect pairs (where the input symbol is the same but the output symbol perhaps different).

twol.twrule.init()[source]

Initializes the module by computing several common FSTs

Assumes that twexamp.read_fst() has read in cfg.examples_fst and initialized sone symbol sets.

twol.twrule.input_coercion(name, x_fst, *contexts)[source]

Compiles rules like X <– LC1 _ RC1, …, LCk _ RCk

name – name to be given to the rule FST

x_fst – the center (X) of the rule

*contexts – list of contexts, i.e. pairs of left and right context

Returns a triple: rule_fst – the compiled rule

selector_fst – FST which selects examples which are relevant for this rule

scrambler_fst – an encoded FST which produces negative examples

twol.twrule.mix_input(x_fst)[source]

Computes an FSA that is used when creating negative examples

First, it computes an expression Y which represent all possible (correct and incorrect) inputs for the output side of X. Then, Y is transformed into an encoded FSA which can be a component of the transformation of correct examples into incorrect ones.

x_fst – the center FST (X part) of a rule Returns [PI* .o. X.l] encoded as an FSA (i.e. maps pairs to themselves)

twol.twrule.mix_output(x_fst)[source]

Computes an FSA that is used when creating negative examples

First, it computes an expression Y which represent all possible (correct and incorrect) realizations of the input side of X. Then, Y is transformed into an encoded FSA which can be a component of the transformation of correct examples into incorrect ones.

x_fst – the center FST (X part) of a rule Returns [X.u .o. PI*] encoded as an FSA (i.e. maps pairs to themselves)

twol.twrule.output_coercion(name, x_fst, *contexts)[source]

Compiles rules like X <= LC1 _ RC1, …, LCk _ RCk

name – name to be given to the rule FST

x_fst – the center (X) of the rule

*contexts – list of contexts, i.e. pairs of left and right context

Returns a triple: rule_fst – the compiled rule

selector_fst – FST which selects examples which are relevant for this rule

scrambler_fst – an encoded FST which produces negative examples

twol.twrule.quote(str)[source]

Protect ‘{’ and ‘}’ with a % in xerox regular expressions.

>>> quote("a {ij}:j [a:c | b]")
"a %{ij%}:j [a:c | b]"
twol.twrule.rightarrow(name, x_fst, *contexts)[source]

Compiles rules like X => [LC1,RC1),…(LCk,RCk)]

name – name to be given to the rule FST

x_fst – the center (X) of the rule

*contexts – list of contexts, i.e. pairs of left and right context

Returns a triple: rule_fst – the compiled rule

selector_fst – FST which selects examples which are relevant for this rule

scrambler_fst – an encoded FST which produces negative examples

twol.twrule.selector_from_x(x_fst)[source]

Compute and return [PI* X PI*]

twol.twrule.x_to_condition(x_fst)[source]

Computes and returns a condition FST out of the center FST of a rule.

x_fst – the center or X-part of a rule (located in front of the operator)

returns [PI* ¤ X ¤ PI*] where ¤ is the diamond (not in PI) as an FST