python - Using fillna() selectively in pandas -

I would like to fill N/A values in a DataFrame in a selective manner. In particular, if there is a sequence of consecutive nans within a column, I want them to be filled with the preceding non-nan value, but only if the length of the nan sequence is at or below a specified threshold. For example, if the threshold is 3, then a within-column sequence of 3 or fewer nans will be filled with the preceding non-nan value, whereas a sequence of 4 or more nans will be left as is.

That is, if the input DataFrame is

    2   5   4
    nan nan nan
    nan nan nan
    5   nan nan
    9   3   nan
    7   9   1

I want the output to be:

    2   5   4
    2   5   nan
    2   5   nan
    5   5   nan
    9   3   nan
    7   9   1

The fillna function, when applied to a DataFrame, has the method and limit options, but these are unfortunately not sufficient to achieve the task. I tried specifying method='ffill' and limit=3, but that fills in the first 3 nans of any sequence, not selectively as described above.
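To make the limitation concrete, here is a small sketch that rebuilds the example frame and applies a limited forward fill. It uses `df.ffill(limit=3)`, which is assumed here to behave the same as `fillna(method='ffill', limit=3)`; the name `limited` is only illustrative.

```python
import numpy as np
import pandas as pd

# The example frame from the question (integer column labels 0, 1, 2).
df = pd.DataFrame([
    [2, 5, 4],
    [np.nan, np.nan, np.nan],
    [np.nan, np.nan, np.nan],
    [5, np.nan, np.nan],
    [9, 3, np.nan],
    [7, 9, 1],
])

# A limited forward fill stops after 3 rows, but it still starts filling
# the 4-long run in column 2 instead of leaving that run untouched.
limited = df.ffill(limit=3)
print(limited)
```

In column 2 the first three nans of the four-long run are filled and only the fourth is left, which is not the all-or-nothing behaviour the question asks for.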

I suppose this could be coded by going column by column with conditional statements, but I suspect there must be something more Pythonic. Any suggestions on an efficient way to achieve this?

Working with contiguous groups is still a little awkward in pandas, or at least I don't know of a slick way to do this, which isn't at all the same thing. :-)

One way to get what you want would be to use the compare-cumsum-groupby pattern:

    In [68]: nulls = df.isnull()
        ...: groups = (nulls != nulls.shift()).cumsum()
        ...: to_fill = groups.apply(lambda x: x.groupby(x).transform(len) <= 3)
        ...: df.where(~to_fill, df.ffill())
        ...:
    Out[68]:
         0    1    2
    0  2.0  5.0  4.0
    1  2.0  5.0  NaN
    2  2.0  5.0  NaN
    3  5.0  5.0  NaN
    4  9.0  3.0  NaN
    5  7.0  9.0  1.0
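For anyone who wants to run this outside an IPython session, the same pattern can be sketched as a self-contained function. The function name `fill_short_gaps` and its `threshold` parameter are hypothetical, introduced only for this illustration.

```python
import numpy as np
import pandas as pd

def fill_short_gaps(df, threshold=3):
    """Forward-fill NaN runs of at most `threshold` rows, column by column."""
    nulls = df.isnull()
    # Compare: a new run starts wherever the null flag differs from the row above.
    # Cumsum: the running total of those change points labels each run.
    run_id = (nulls != nulls.shift()).cumsum()
    # Groupby: length of the run each cell belongs to, computed per column.
    run_len = run_id.apply(lambda col: col.groupby(col).transform("size"))
    # Only NaN cells that sit in short runs are replaced; the rest is kept.
    fillable = nulls & (run_len <= threshold)
    return df.where(~fillable, df.ffill())

# The example frame from the question.
df = pd.DataFrame([
    [2, 5, 4],
    [np.nan, np.nan, np.nan],
    [np.nan, np.nan, np.nan],
    [5, np.nan, np.nan],
    [9, 3, np.nan],
    [7, 9, 1],
])
result = fill_short_gaps(df, threshold=3)
print(result)
```

Restricting the mask to null cells is a small design choice in this sketch: short runs of non-null values would be unchanged by the forward fill anyway, so the result is the same, but the mask then says exactly which cells were filled.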

Okay, another alternative, which I don't like because it's too tricky:

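    import numpy as np

    def method_2(df):
        nulls = df.isnull()
        filled = df.ffill(limit=3)
        # NaNs that survive a limited ffill sit in runs longer than 3
        unfilled = nulls & (~filled.notnull())
        nf = nulls.replace({False: 2.0, True: np.nan})
        # bfill carries the "run too long" flag back over the run's first 3 NaNs
        do_not_fill = nf.combine_first(unfilled.replace(False, np.nan)).bfill() == 1
        return df.where(do_not_fill, df.ffill())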

This doesn't use any groupby tools and so should be faster. Note that a different approach would be to manually (using shifts) determine which elements are to be filled because they're in a group of length 1, 2, or 3.
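A rough sketch of what such a shift-based version might look like follows. It is one possible reading of the idea, not code from the answer, and the names `fill_short_gaps_shift` and `threshold` are hypothetical. It marks every window of `threshold + 1` consecutive nans and leaves any cell covered by such a window unfilled.

```python
import numpy as np
import pandas as pd

def fill_short_gaps_shift(df, threshold=3):
    """Forward-fill NaN runs of at most `threshold` rows using only shifts."""
    nulls = df.isnull()
    # window_end is True at row i when rows i-threshold .. i are all NaN,
    # i.e. row i closes a window of threshold + 1 consecutive NaNs.
    window_end = nulls.copy()
    for k in range(1, threshold + 1):
        window_end = window_end & nulls.shift(k, fill_value=False).astype(bool)
    # A cell belongs to a too-long run if any such window covers it,
    # so spread each window-end flag back over the rows it spans.
    in_long_run = window_end.copy()
    for k in range(1, threshold + 1):
        in_long_run = in_long_run | window_end.shift(-k, fill_value=False).astype(bool)
    # Keep the original values inside long runs, forward-fill everywhere else.
    return df.where(in_long_run, df.ffill())

# The example frame from the question.
df = pd.DataFrame([
    [2, 5, 4],
    [np.nan, np.nan, np.nan],
    [np.nan, np.nan, np.nan],
    [5, np.nan, np.nan],
    [9, 3, np.nan],
    [7, 9, 1],
])
result = fill_short_gaps_shift(df, threshold=3)
print(result)
```

The loops run `threshold` times each, so this stays cheap for small thresholds; for large thresholds the groupby version is likely the simpler choice.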
