11.6. twol.multialign module

multialign.py

Aligns two or more words (or morphs) by adding zero symbols so that phonemes in corresponding positions are optimally similar.

Copyright 2020, Kimmo Koskenniemi

This is free software under the GNU GPL 3 license.

twol.multialign.accum_input_labels(fst, separator='')[source]

Encode, weight and prune a transducer

fst – transducer to be processed, input labels are strings of alphabet symbols and output labels are single alphabet symbols

separator – null string or a symbol not part of the alphabet

Returns a transducer where the input labels of transitions are concatenations of the input label and the output label of the original transition, the weights follow the weights of the resulting morphophonemes, and all transitions with invalid morphophoneme labels are discarded.

twol.multialign.adjustment(mphon_lst)[source]

Computes a context-based adjustment of a result

mphon_lst – a list of morphophonemes from the result of align_words

Returns a number to be added to the weight.

twol.multialign.align_words(word_lst, zero='Ø', extra_zeros=0, best_count=10)[source]

Aligns a list of words

word_lst – the list of words to be aligned

zero – the symbol inserted as a mark for deletion or epenthesis

extra_zeros – the maximum number of zeros to be added to the longest words (the shorter ones may have more)

best_count – the maximum number of results to be returned (possibly fewer if no feasible results are found)

Returns a list of tuples where each tuple consists of a weight and a list of morphophonemes.

twol.multialign.init(alphabet_file_name, all_zero_weight=1)[source]

Initializes multialign by initializing the alphabet module

alphabet_file_name – an alphabet definition as described in https://pytwolc.readthedocs.io/en/latest/alignment.html#alphabet

all_zero_weight – penalty for an intermediate morphophoneme of only zeros (which will get a non-zero component in the final morphophoneme)

twol.multialign.list_of_aligned_words(mphon_lst)[source]

Converts a list of morphophonemes into a list of aligned words

mphon_lst – list of same-length morphophonemes, e.g. [“lll”, “ooo”, “vvv”, “ieØ”]

Returns a list of words constructed out of the 1st, 2nd … alphabetic symbols of the morphophonemes, e.g. [“lll”, “ooo”, “vvv”, “ieØ”] –> [“lovi”, “love”, “lovØ”]
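The conversion described above amounts to transposing the morphophonemes. The following is a minimal sketch of that idea, not the module's own code: the function name is hypothetical, and it assumes every alphabet symbol is a single character.

```python
def aligned_words_from_morphophonemes(mphon_lst):
    """Hypothetical re-implementation: transpose morphophonemes into words.

    Assumes each alphabet symbol is exactly one character long.
    """
    # zip(*...) groups the 1st symbols, then the 2nd symbols, and so on.
    return ["".join(symbols) for symbols in zip(*mphon_lst)]


words = aligned_words_from_morphophonemes(["lll", "ooo", "vvv", "ieØ"])
print(words)  # ['lovi', 'love', 'lovØ']
```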

twol.multialign.main()[source]
twol.multialign.multialign(word_lst, zero='Ø', max_zeros=1, best_count=1)[source]

Aligns a list of words according to similarity of their phonemes

word_lst – a list of words (or morphs) to be aligned

zero – the symbol that will mark deletions or epentheses

max_zeros – maximum number of zeros to be inserted into the longest word

best_count – number of results to be returned

Returns a list of results each of which is a tuple of a weight and a list of words aligned by inserting zeros in an optimal way.

twol.multialign.prefer_final_zeros(raw_paths)[source]

Select the symbol pair sequence where the zeros are near the end

raw_paths – a list of results, each consisting of a list of symbols (already selected according to other criteria)

Returns a sequence of (single) symbols where the zeros occur near the end. This normalizes gemination and lengthening so that the latter component is the one which alternates with a zero.
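One way to picture this selection is to score each candidate by how late its zeros occur and keep the highest-scoring one. The sketch below does exactly that; the function name and the scoring rule (sum of the positions of zero-containing symbols) are assumptions made for illustration, and the module's actual criterion may differ.

```python
def pick_final_zeros(candidates, zero="Ø"):
    """Hypothetical selector: prefer the candidate whose zeros come latest."""
    def zero_score(symbols):
        # Later positions contribute more, so late zeros win.
        return sum(i for i, sym in enumerate(symbols) if zero in sym)

    return max(candidates, key=zero_score)


# Two equally plausible alignments of a geminate against a single consonant.
candidates = [
    ["kk", "aa", "tØ", "tt", "oo"],  # zero in the first "t"
    ["kk", "aa", "tt", "tØ", "oo"],  # zero in the second "t"
]
print(pick_final_zeros(candidates))  # ['kk', 'aa', 'tt', 'tØ', 'oo']
```

With this rule the second component of the geminate is the one that alternates with a zero, which matches the normalization described above.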

twol.multialign.print_best(fst, num)[source]
twol.multialign.print_result(aligned_result, comments, weights, layout='horizontal')[source]

Prints the result of the alignment in one of three formats

aligned_result – tuple of the weight and a list of aligned words where each aligned word is a list of

comments – possible comments which will be passed over

weights – whether to print also the overall weight of this alignment

layout – one of “horizontal” (a sequence of morphophonemes on a single line), “vertical” (each zero-filled word on a line of its own) or “list” (all zero-filled words on a single line)
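The three layouts can be illustrated with a small stand-alone formatter. This is a sketch under assumptions rather than the module's implementation: the function name is hypothetical, it returns a string instead of printing, it leaves out the comments and weights parameters, it assumes the aligned words are plain strings of single-character symbols, and the exact spacing of the real output may differ.

```python
def format_alignment(aligned_result, layout="horizontal"):
    """Hypothetical formatter; returns the text instead of printing it."""
    weight, aligned_words = aligned_result  # weight is unused in this sketch
    if layout == "horizontal":
        # Transpose the words into morphophonemes, all on one line.
        mphons = ["".join(column) for column in zip(*aligned_words)]
        return " ".join(mphons)
    if layout == "vertical":
        # Each zero-filled word on a line of its own.
        return "\n".join(aligned_words)
    if layout == "list":
        # All zero-filled words on a single line.
        return " ".join(aligned_words)
    raise ValueError(f"unknown layout: {layout!r}")


result = (3.0, ["lovi", "love", "lovØ"])  # the weight here is made up
for layout in ("horizontal", "vertical", "list"):
    print(format_alignment(result, layout))
```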

twol.multialign.word_to_fsa_with_zeros(word, target_len, zero='Ø')[source]

Insert zeros freely to turn the word into an FSA that accepts strings of length target_len

word – a string

target_len – length of the strings accepted by the result fsa

Returns an FSA that accepts all strings of length target_len that can be formed by adding zeros to the word.
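To make the accepted language concrete, the following sketch enumerates the strings directly instead of building an automaton. The helper name is hypothetical, and it assumes single-character symbols; the real function returns an FSA, not a list.

```python
from itertools import combinations


def strings_with_zeros(word, target_len, zero="Ø"):
    """Hypothetical helper: list every string the described FSA would accept."""
    n_zeros = target_len - len(word)
    if n_zeros < 0:
        return []  # the word is already longer than target_len
    results = []
    # Choose which of the target_len positions hold a zero.
    for zero_positions in combinations(range(target_len), n_zeros):
        zero_set = set(zero_positions)
        letters = iter(word)
        # Fill the remaining positions with the word's symbols, in order.
        results.append("".join(
            zero if i in zero_set else next(letters)
            for i in range(target_len)
        ))
    return results


print(strings_with_zeros("ab", 3))  # ['Øab', 'aØb', 'abØ']
```

The number of such strings is the binomial coefficient of target_len over the number of inserted zeros, which is why keeping the set as an automaton is more economical than listing it for longer words.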