11.12. twol.twexamp module

A module for reading two-level examples

The examples are assumed to be in a space-separated one-level representation, and they are compiled into a single automaton. At the same time, the alphabet used in the examples is collected in several forms:

cfg.examples_fst – the transducer which accepts exactly the examples

cfg.symbol_pair_set – a tuple of string pairs suitable for e.g. hfst.rules.restriction

twol.twexamp.main()

The twexamp.py module can also be used as a standalone script or command to convert examples in pair string format into a finite-state transducer (FST). Examples in pair string format are plain, human-readable text files with one example per line, where each example is given as a space-separated sequence of pair symbols, e.g.:

k a u p {}:Ø {ao}:a s s {}:a
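To make the pair string format concrete, the following minimal sketch splits one example line into (input, output) symbol pairs. It is not the module's own code: the function name parse_pair_string is hypothetical, and the sketch assumes that a token without a colon, such as k, abbreviates the identical pair k:k.

```python
def parse_pair_string(line):
    """Split one example line into a list of (input, output) symbol pairs.

    Hypothetical helper for illustration; assumes a token without ':'
    stands for an identical pair (e.g. 'k' means 'k:k').
    """
    pairs = []
    for token in line.split():
        if ":" in token:
            # Explicit pair symbol such as '{ao}:a'
            insym, outsym = token.split(":", 1)
        else:
            # Assumed abbreviation for an identical pair
            insym = outsym = token
        pairs.append((insym, outsym))
    return pairs


example = "k a u p {}:Ø {ao}:a s s {}:a"
for insym, outsym in parse_pair_string(example):
    print(insym, outsym)
```

The sketch does not handle a literal colon used as a symbol; how the real module treats such a case is not covered here.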

The invocation of the program could be e.g.:

$ twol-examp examples.pstr examples.fst

twol.twexamp.pairs_to_fst(pair_set)

Converts a sequence of symbol pairs into an FST that accepts any one of them.

twol.twexamp.read_examples(filename_lst=['test.pstr'], build_fsts=True)

Reads the examples from the files whose names are listed in filename_lst.

Each file must contain one example per line, and each line consists of a space-separated sequence of pair symbols. The examples are compiled into an FST which is the union of all examples.
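The reading step can be sketched in plain Python, without building any FST. This is only an illustration under stated assumptions, not the implementation of read_examples: the names to_pair and collect_examples are hypothetical, blank lines are assumed to be skipped, and a token without a colon is assumed to stand for an identical pair. The set of examples plays the role of the union, and the set of symbol pairs corresponds to the collected alphabet.

```python
def to_pair(token):
    """Turn a pair symbol such as '{ao}:a' or 'k' into an (input, output) tuple."""
    insym, sep, outsym = token.partition(":")
    # Assumption: a token without ':' abbreviates an identical pair.
    return (insym, outsym) if sep else (token, token)


def collect_examples(filename_lst):
    """Read pair string files and return (examples, symbol_pairs) as sets.

    Hypothetical helper: each example is stored as a tuple of symbol pairs,
    so the set of examples is the plain-Python analogue of a union of examples.
    """
    examples = set()
    symbol_pairs = set()
    for filename in filename_lst:
        with open(filename, encoding="utf-8") as infile:
            for line in infile:
                tokens = line.split()
                if not tokens:
                    continue  # assumption: blank lines are ignored
                pairs = tuple(to_pair(token) for token in tokens)
                examples.add(pairs)
                symbol_pairs.update(pairs)
    return examples, symbol_pairs
```

Calling collect_examples(["examples.pstr"]) on a file in the format described above would return every distinct example together with every distinct symbol pair that occurs in the file.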

twol.twexamp.read_fst(filename='examples.fst')

Reads in a previously stored example FST file.