Machine learning - Is this Tomek link implementation faulty?
The definition of Tomek links states: "Suppose {e1,…,en} ⊂ R^k is a dataset, each ei having one of two labels, + or −. A pair (ei,ej) is called a Tomek link if ei and ej have different labels, and there is no el such that d(ei,el) < d(ei,ej) or d(ej,el) < d(ei,ej)", where d(x,y) is the distance between x and y.
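I created a "toy" data set to understand Tomek links better (code attached). I used the package "unbalanced" and its function ubTomek. The function's implementation (it's on GitHub) looks at the nearest neighbor of each minority class point, and if that neighbor belongs to the majority class, the couple is declared a Tomek link. I think it is missing something, because it only checks d(ei,ej) from the minority point's side, and it should also check d(ej,ei) (that is, whether ei is in turn the nearest neighbor of ej).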
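To make the concern concrete, here is a minimal Python sketch (not the package's R code, and not a claim about what ubTomek actually does). It compares a direct reading of the definition above with the one-sided check as I described it. The function names, the toy points and the choice of "-" as the minority label are all made up for illustration, and I assume el ranges over points other than ei and ej.

```python
import math

def tomek_links_by_definition(points, labels):
    """Pairs (i, j), i < j, that satisfy the quoted definition directly."""
    n = len(points)
    links = []
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                continue  # a Tomek link needs two different labels
            d_ij = math.dist(points[i], points[j])
            # Look for a third point that is closer to either end of the pair.
            closer_exists = any(
                math.dist(points[i], points[l]) < d_ij
                or math.dist(points[j], points[l]) < d_ij
                for l in range(n)
                if l != i and l != j
            )
            if not closer_exists:
                links.append((i, j))
    return links

def tomek_links_one_sided(points, labels, minority):
    """One-sided check: a minority point whose nearest neighbor is a majority point."""
    n = len(points)
    links = set()
    for i in range(n):
        if labels[i] != minority:
            continue
        # Nearest neighbor of the minority point only; the other side is never checked.
        j = min((k for k in range(n) if k != i),
                key=lambda k: math.dist(points[i], points[k]))
        if labels[j] != minority:
            links.add((min(i, j), max(i, j)))
    return sorted(links)

# Hypothetical toy data: "-" is the minority class, "+" the majority class.
points = [(0.0, 0.0), (1.0, 0.0), (1.5, 0.0), (5.0, 0.0), (5.2, 0.0)]
labels = ["-", "+", "+", "-", "+"]

print(tomek_links_by_definition(points, labels))   # [(3, 4)]
print(tomek_links_one_sided(points, labels, "-"))  # [(0, 1), (3, 4)]
```

In this toy set, point 1 is the nearest neighbor of point 0, but point 2 is closer to point 1 than point 0 is, so (0, 1) is not a Tomek link under the definition even though the one-sided check reports it.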
Any opinions on this? If I'm right, I'll drop the developers a message about the bug; if I'm wrong, I'll understand Tomek links better.