python - Calculate PMI values using a given context window


Given the following basis:

basis = "each word of the text is converted as follows: move any consonant (or consonant cluster) that appears at the start of the word to the end, then append ay."

and the following words:

words = "word, text, bank, tree" 

How can I calculate the PMI values of each word in "words" compared to each word in "basis", using a context window of size 5 (that is, 2 positions before and 2 after the target word)?

I know how to calculate PMI, but I don't know how to handle the context window.

I calculate the 'normal' PMI values as follows:

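    from math import log

    def pmi(contingencytable):
        (a, b, c, d, n) = contingencytable
        # avoid log(0) by adding one to every cell (and four to the total)
        a += 1
        b += 1
        c += 1
        d += 1
        n += 4

        r_1 = a + b  # row total
        c_1 = a + c  # column total

        return log(float(a) / (float(r_1) * float(c_1)) * float(n), 2)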
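For reference, this is how the function above lines up with the usual definition of PMI. It assumes that (a, b, c, d) are the four cells of a 2x2 contingency table, with a the joint count of the two words and n the total number of observations; the question does not spell that out, so treat this reading as an assumption. The add-one smoothing is left out for clarity.

```latex
\[
\mathrm{PMI}(x, y)
  = \log_2 \frac{p(x, y)}{p(x)\, p(y)}
  = \log_2 \frac{a / n}{(r_1 / n)(c_1 / n)}
  = \log_2 \frac{a \, n}{r_1 \, c_1},
\qquad r_1 = a + b, \quad c_1 = a + c .
\]
```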

I did a little searching on PMI; it looks like there are heavy-duty packages out there, with "windowing" included.

In PMI, the "mutual" seems to refer to the joint probability of two different words, so you need to firm up that idea with respect to the problem statement.

I took on the smaller problem of just generating the short windowed lists in your problem statement, mostly as my own exercise:

    def wndw(wrd_l, m_l, pre, post):
        """
        returns a list of lists of sequential words in input wrd_l
        that are within range -pre and +post of any word in wrd_l that matches
        a word in m_l

        wrd_l      = list of words
        m_l        = list of words to match on
        pre, post  = ints giving the range of indices to include in the window
        """
        wndw_l = list()
        for i, w in enumerate(wrd_l):
            if w in m_l:
                # keep only the indices that fall inside the word list
                wndw_l.append([wrd_l[i + k] for k in range(-pre, post + 1)
                               if 0 <= (i + k) < len(wrd_l)])
        return wndw_l

    basis = """each word of the text is converted as follows: move any
                 consonant (or consonant cluster) that appears at the start
                 of the word to the end, then append ay."""

    words = "word, text, bank, tree"

    print(*wndw(basis.split(), [x.strip() for x in words.split(',')], 2, 2),
          sep="\n")

    ['each', 'word', 'of', 'the']
    ['of', 'the', 'text', 'is', 'converted']
    ['of', 'the', 'word', 'to', 'the']
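One possible way to go from windows to PMI values, offered as a sketch rather than a definitive method: treat every (center word, neighbour) pair inside a window as one observation, and fill the contingency table from those pair counts. The helper names `window_pairs` and `windowed_pmi` are hypothetical, the choice of pairs as the unit of counting is an assumption, and the basis string is taken in the form implied by the window output above. Tokens come from a plain `split()`, so punctuation stays attached to words such as "follows:".

```python
from collections import Counter
from math import log


def pmi(contingencytable):
    # Same add-one smoothed PMI as in the question.
    a, b, c, d, n = contingencytable
    a, b, c, d, n = a + 1, b + 1, c + 1, d + 1, n + 4
    r_1 = a + b
    c_1 = a + c
    return log(a / (r_1 * c_1) * n, 2)


def window_pairs(tokens, pre, post):
    """Yield (center, neighbour) for each token and every other word in its window."""
    for i, center in enumerate(tokens):
        for k in range(-pre, post + 1):
            if k != 0 and 0 <= i + k < len(tokens):
                yield center, tokens[i + k]


def windowed_pmi(tokens, targets, pre=2, post=2):
    """Return {(target, context): PMI} using window pairs as observations."""
    pair_counts = Counter(window_pairs(tokens, pre, post))
    n = sum(pair_counts.values())

    # Marginal counts: how often a word is a center, and how often a neighbour.
    center_counts = Counter()
    neighbour_counts = Counter()
    for (center, neighbour), count in pair_counts.items():
        center_counts[center] += count
        neighbour_counts[neighbour] += count

    scores = {}
    for target in targets:
        for context in sorted(set(tokens)):
            a = pair_counts[(target, context)]     # target with context in window
            b = center_counts[target] - a          # target with other neighbours
            c = neighbour_counts[context] - a      # context next to other centers
            d = n - a - b - c                      # everything else
            scores[(target, context)] = pmi((a, b, c, d, n))
    return scores


basis = ("each word of the text is converted as follows: move any "
         "consonant (or consonant cluster) that appears at the start "
         "of the word to the end, then append ay.")
words = "word, text, bank, tree"

tokens = basis.split()
targets = [x.strip() for x in words.split(",")]
scores = windowed_pmi(tokens, targets, pre=2, post=2)

# Show the three highest-scoring context words for the target "word".
ranked = sorted(set(tokens), key=lambda ctx: scores[("word", ctx)], reverse=True)
for context in ranked[:3]:
    print("word", context, round(scores[("word", context)], 3))
```

Because of the add-one smoothing, targets that never occur in the basis ("bank", "tree") still get finite scores, and pairs that never co-occur can come out positive, so the values are mainly useful for ranking contexts against each other.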
