11.6. twol.multialign module
multialign.py
Aligns two or more words (or morphs) by adding zero symbols so that the phonemes in corresponding positions are as similar as possible.
Copyright 2020, Kimmo Koskenniemi
This is free software under the GNU GPL 3 license.
- twol.multialign.accum_input_labels(fst, separator='')
Encode, weight and prune a transducer
fst – transducer to be processed; its input labels are strings of alphabet symbols and its output labels are single alphabet symbols
separator – null string or a symbol that is not part of the alphabet
Returns a transducer in which the input label of each transition is the concatenation of the input label and the output label of the original transition. The weights follow the weights of the resulting morphophonemes, and all transitions with invalid morphophoneme labels are discarded.
- twol.multialign.adjustment(mphon_lst)
Computes a context-based adjustment of a result
mphon_lst – a list of morphophonemes from the result of align_words
Returns a number to be added to the weight.
- twol.multialign.align_words(word_lst, zero='Ø', extra_zeros=0, best_count=10)
Aligns a list of words
word_lst – the list of words to be aligned
zero – the symbol inserted as a mark for deletion or epenthesis
extra_zeros – the maximum number of zeros to be added to the longest words (shorter words may get more)
best_count – the maximum number of results to be returned (possibly fewer if fewer feasible results are found)
Returns a list of tuples where each tuple consists of a weight and a list of morphophonemes.
- twol.multialign.init(alphabet_file_name, all_zero_weight=1)
Initializes multialign by initializing the alphabet module
alphabet_file_name – name of a file containing an alphabet definition as described in https://pytwolc.readthedocs.io/en/latest/alignment.html#alphabet
all_zero_weight – penalty for an intermediate morphophoneme of only zeros (which will get a non-zero component in the final morphophoneme)
- twol.multialign.list_of_aligned_words(mphon_lst)
Converts a list of morphophonemes into a list of aligned words
mphon_lst – list of morphophonemes of the same length, e.g. ["lll", "ooo", "vvv", "ieØ"]
Returns a list of words constructed out of the 1st, 2nd … alphabetic symbols of the morphophonemes, e.g. ["lll", "ooo", "vvv", "ieØ"] --> ["lovi", "love", "lovØ"]
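The conversion described above amounts to transposing the list of morphophonemes. A minimal sketch in plain Python, independent of the twol package: the function name aligned_words_sketch is hypothetical, this is not the library's implementation, and it assumes that every alphabet symbol is a single character.

```python
def aligned_words_sketch(mphon_lst):
    """Transpose equal-length morphophonemes into zero-filled words."""
    if not mphon_lst:
        return []
    # zip(*...) groups the 1st symbols, the 2nd symbols, and so on.
    return ["".join(symbols) for symbols in zip(*mphon_lst)]


words = aligned_words_sketch(["lll", "ooo", "vvv", "ieØ"])
print(words)  # ['lovi', 'love', 'lovØ']
```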
- twol.multialign.multialign(word_lst, zero='Ø', max_zeros=1, best_count=1)
Aligns a list of words according to similarity of their phonemes
word_lst – a list of words (or morphs) to be aligned
zero – the symbol that will mark deletions or epentheses
max_zeros – maximum number of zeros to be inserted into the longest word
best_count – number of results to be returned
Returns a list of results, each of which is a tuple of a weight and a list of words aligned by inserting zeros in an optimal way.
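To illustrate the shape of the result, the following sketch aligns words by brute force in plain Python. It is not how the module works: the module builds weighted transducers and scores phoneme similarity from the alphabet definition, whereas this sketch simply enumerates every zero-filled variant and uses a toy cost (the number of extra distinct symbols per position). The names zero_filled and brute_force_align are hypothetical, and each symbol is assumed to be a single character.

```python
from itertools import combinations, product


def zero_filled(word, target_len, zero="Ø"):
    """Yield every way of padding word to target_len with zeros."""
    n_zeros = target_len - len(word)
    for zero_positions in combinations(range(target_len), n_zeros):
        symbols = iter(word)
        yield "".join(zero if i in zero_positions else next(symbols)
                      for i in range(target_len))


def brute_force_align(word_lst, zero="Ø", max_zeros=1, best_count=1):
    """Return up to best_count (cost, aligned_words) tuples, cheapest first."""
    longest = max(len(w) for w in word_lst)
    results = []
    for target_len in range(longest, longest + max_zeros + 1):
        variants = [list(zero_filled(w, target_len, zero)) for w in word_lst]
        for candidate in product(*variants):
            columns = list(zip(*candidate))
            # Skip alignments with a position that contains only zeros.
            if any(set(col) == {zero} for col in columns):
                continue
            # Toy cost: each extra distinct symbol in a position costs 1.
            cost = sum(len(set(col)) - 1 for col in columns)
            results.append((cost, list(candidate)))
    results.sort(key=lambda r: r[0])
    return results[:best_count]


print(brute_force_align(["love", "lov"]))  # [(1, ['love', 'lovØ'])]
```

The enumeration grows combinatorially with word length and with max_zeros, so this sketch is only practical for short examples.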
- twol.multialign.prefer_final_zeros(raw_paths)
Select the symbol pair sequence where the zeros are near the end
raw_paths (called sym_lst_lst in the docstring) – a list of results, each consisting of a list of symbols (already selected according to other criteria)
Returns a sequence of (single) symbols where the zeros occur near the end. This normalizes gemination and lengthening so that the latter component is the one which alternates with a zero.
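One way to express "zeros near the end" is to compare the positions of the zeros and keep the candidate whose zeros lie furthest to the right. The sketch below does that in plain Python. The ranking rule, the function name prefer_final_zeros_sketch and the example strings are assumptions made for illustration; the module's actual selection criterion may differ.

```python
def prefer_final_zeros_sketch(candidates, zero="Ø"):
    """Pick the candidate whose zeros sit furthest to the right."""
    def zero_positions(symbols):
        # Listing the positions from the last zero backwards makes the
        # rightmost zero decide first when two candidates are compared.
        return sorted((i for i, s in enumerate(symbols) if s == zero),
                      reverse=True)
    return max(candidates, key=zero_positions)


# Two equally good ways to align a short "t" with a geminate "tt":
best = prefer_final_zeros_sketch(["kaØto", "katØo"])
print(best)  # katØo
```

With this choice the second component of the geminate is the one that alternates with the zero, which matches the normalization described above.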
- twol.multialign.print_result(aligned_result, comments, weights, layout='horizontal')
Prints the result of the alignment in one of the three formats
aligned_result – tuple of the weight and a list of aligned words where each aligned word is a list of
comments – possible comments which will be passed over
weights – whether to also print the overall weight of this alignment
layout – one of “horizontal” (a sequence of morphophonemes on a single line), “vertical” (each zero-filled word on a line of its own) or “list” (all zero-filled words on a single line)
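The three layouts can be pictured with a small formatting function. This is only a sketch under assumptions: the name format_alignment is hypothetical, the separators (spaces and newlines) are guesses rather than the module's actual output format, and the weight and comments are left out.

```python
def format_alignment(aligned_words, layout="horizontal"):
    """Return the alignment as text in one of three layouts."""
    if layout == "horizontal":
        # One morphophoneme per position: the i-th symbols of all words.
        mphons = ["".join(col) for col in zip(*aligned_words)]
        return " ".join(mphons)
    if layout == "vertical":
        # Each zero-filled word on a line of its own.
        return "\n".join(aligned_words)
    if layout == "list":
        # All zero-filled words on a single line.
        return " ".join(aligned_words)
    raise ValueError(f"unknown layout: {layout!r}")


words = ["lovi", "love", "lovØ"]
print(format_alignment(words, "horizontal"))  # lll ooo vvv ieØ
print(format_alignment(words, "list"))        # lovi love lovØ
```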
- twol.multialign.word_to_fsa_with_zeros(word, target_len, zero='Ø')
Insert zeros freely to make the word into an fsa accepting strings of length target_len
word – a string
target_len – length of the strings accepted by the result fsa
Returns an fsa that accepts all strings of length target_len that are obtained from the word by adding zeros.
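The set of strings such an fsa accepts can be listed explicitly for small inputs. The following sketch enumerates them in plain Python instead of building an automaton; the name zero_filled_strings is hypothetical and each symbol is assumed to be a single character.

```python
from itertools import combinations


def zero_filled_strings(word, target_len, zero="Ø"):
    """List every string of length target_len made by adding zeros to word."""
    n_zeros = target_len - len(word)
    if n_zeros < 0:
        return []  # the word is already longer than target_len
    strings = []
    # Choose which positions of the result hold zeros; the symbols of
    # the word fill the remaining positions in their original order.
    for zero_positions in combinations(range(target_len), n_zeros):
        symbols = iter(word)
        strings.append("".join(zero if i in zero_positions else next(symbols)
                               for i in range(target_len)))
    return strings


print(zero_filled_strings("ab", 3))  # ['Øab', 'aØb', 'abØ']
```

The number of such strings is the binomial coefficient of target_len over the number of added zeros, which is one reason a compact automaton is a better representation than an explicit list for longer words.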