python - Calculate PMI values using a given context window
Given the following basis:
basis = "each word of the text is converted as follows: move any consonant (or consonant cluster) that appears at the start of the word to the end, then append ay."
and the following words:
words = "word, text, bank, tree"
How can I calculate the PMI values of each word in "words" compared to each word in "basis", where I can use a context window of size 5 (that is, 2 positions before and 2 after the target word)?
I know how to calculate the PMI, but I don't know how to handle the context window.
I calculate the 'normal' PMI values as follows:

    from math import log

    def pmi(contingencytable):
        (a, b, c, d, n) = contingencytable
        # add 1 to every cell to avoid log(0)
        a += 1
        b += 1
        c += 1
        d += 1
        n += 4
        r_1 = a + b
        c_1 = a + c
        return log(float(a) / (float(r_1) * float(c_1)) * float(n), 2)
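As a quick sanity check, the function above works out to log2(a * n / ((a + b) * (a + c))) once 1 has been added to every cell. The sketch below restates that arithmetic under a hypothetical name, pmi_smoothed, and assumes a table layout the question does not spell out: a is the joint count, a + b the row total for the target word, a + c the column total for the context word, and n = a + b + c + d.

```python
from math import log

def pmi_smoothed(a, b, c, d, n):
    # Same quantity as pmi() above: add 1 to every cell, so n grows by 4.
    a, b, c, d, n = a + 1, b + 1, c + 1, d + 1, n + 4
    row_total = a + b   # assumed: all counts involving the target word
    col_total = a + c   # assumed: all counts involving the context word
    return log(a * n / (row_total * col_total), 2)

# After smoothing the table is (2, 2, 2, 10, 16), so the ratio is
# 2 * 16 / (4 * 4) = 2 and the PMI is log2(2) = 1.
value = pmi_smoothed(1, 1, 1, 9, 12)
print(value)
```

A positive value means the pair co-occurs more often than independence would predict; a value of 0 means the smoothed counts look independent.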
I did a little searching on PMI; it looks like there are heavy-duty packages out there, with "windowing" included.
In PMI, the "mutual" seems to refer to the joint probability of two different words, so you need to firm up that idea with respect to the problem statement.
I took on the smaller problem of just generating the short windowed lists in your problem statement, mostly as my own exercise:
    def wndw(wrd_l, m_l, pre, post):
        """
        returns a list of lists of sequential words in input wrd_l
        that are within range -pre, +post of any word in wrd_l that matches
        a word in m_l

        wrd_l = list of words
        m_l = list of words to match on
        pre, post = ints giving the range of indices to include in the window
        """
        wndw_l = list()
        for i, w in enumerate(wrd_l):
            if w in m_l:
                wndw_l.append([wrd_l[i + k] for k in range(-pre, post + 1)
                               if 0 <= (i + k) < len(wrd_l)])
        return wndw_l

    basis = """each word of the text is converted as follows: move any
    consonant (or consonant cluster) that appears at the start of the word
    to the end, then append ay."""
    words = "word, text, bank, tree"

    print(*wndw(basis.split(), [x.strip() for x in words.split(',')], 2, 2), sep="\n")

    ['each', 'word', 'of', 'the']
    ['of', 'the', 'text', 'is', 'converted']
    ['of', 'the', 'word', 'to', 'the']
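To get from windows to PMI values, one possible approach is to count (target, context) pairs inside each window and turn those counts into the (a, b, c, d, n) table that the question's pmi function expects. This is only a sketch under stated assumptions: the names cooccurrence_counts, contingency and pmi_from_table are hypothetical, the sentence is written out with its function words so that it matches the windows printed above, the tokenizer simply lowercases and keeps alphabetic runs, and treating each in-window pair as one event is just one of several reasonable ways to define the probabilities.

```python
import re
from collections import Counter
from math import log2

def cooccurrence_counts(tokens, targets, pre=2, post=2):
    """Count how often each token appears within the window of each target."""
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        if w not in targets:
            continue
        lo, hi = max(0, i - pre), min(len(tokens), i + post + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target position itself
                pair_counts[(w, tokens[j])] += 1
    return pair_counts

def contingency(pair_counts, target, context):
    """Build (a, b, c, d, n) where the events are in-window pairs (assumed layout)."""
    n = sum(pair_counts.values())
    a = pair_counts[(target, context)]
    b = sum(v for (t, _), v in pair_counts.items() if t == target) - a
    c = sum(v for (_, ctx), v in pair_counts.items() if ctx == context) - a
    d = n - a - b - c
    return a, b, c, d, n

def pmi_from_table(table):
    """Add-one smoothed PMI, the same arithmetic as the question's pmi()."""
    a, b, c, d, n = (x + 1 for x in table[:4]), None, None, None, table[4] + 4
    a, b, c, d = a  # unpack the four smoothed cells
    return log2(a * n / ((a + b) * (a + c)))

basis = ("each word of the text is converted as follows: move any consonant "
         "(or consonant cluster) that appears at the start of the word "
         "to the end, then append ay.")
targets = {"word", "text", "bank", "tree"}

tokens = re.findall(r"[a-z]+", basis.lower())   # assumed tokenization
pair_counts = cooccurrence_counts(tokens, targets, pre=2, post=2)

table = contingency(pair_counts, "word", "the")
print(table)                    # (3, 4, 1, 3, 11)
print(pmi_from_table(table))
```

Targets such as "bank" and "tree" never occur in the sentence, so their joint counts are all 0; the add-one smoothing keeps the logarithm defined in that case. With so little text the resulting values are not very meaningful, and on a real corpus a library with built-in windowing may be the more practical route.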