python - Using fillna() selectively in pandas
I would like to fill N/A values in a DataFrame in a selective manner. In particular, if there is a sequence of consecutive NaNs within a column, I want them to be filled with the preceding non-NaN value, but only if the length of the NaN sequence does not exceed a specified threshold. For example, if the threshold is 3, then a within-column sequence of 3 or fewer NaNs will be filled with the preceding non-NaN value, whereas a sequence of 4 or more NaNs will be left as is.
That is, if the input DataFrame is

    2    5    4
    NaN  NaN  NaN
    NaN  NaN  NaN
    5    NaN  NaN
    9    3    NaN
    7    9    1

I want the output to be:

    2    5    4
    2    5    NaN
    2    5    NaN
    5    5    NaN
    9    3    NaN
    7    9    1
The fillna function, when applied to a DataFrame, has the method and limit options, but these are unfortunately not sufficient to achieve the task. I tried specifying method='ffill', limit=3, but that fills in the first 3 NaNs of any sequence, not selectively as described above.
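To make the limitation concrete, here is a small sketch. It assumes the example frame from the question with default integer column labels, and it uses `DataFrame.ffill(limit=3)`, which does the same forward fill as the `method='ffill'` option mentioned above.

```python
import numpy as np
import pandas as pd

# Example frame from the question (column labels 0, 1, 2 are assumed defaults).
df = pd.DataFrame([
    [2, 5, 4],
    [np.nan, np.nan, np.nan],
    [np.nan, np.nan, np.nan],
    [5, np.nan, np.nan],
    [9, 3, np.nan],
    [7, 9, 1],
])

# limit=3 caps how many consecutive NaNs get filled; it does not skip long runs.
limited = df.ffill(limit=3)

# Column 2 holds a run of four NaNs (rows 1 to 4). Rows 1 to 3 are filled
# with 4.0 and only row 4 stays NaN, instead of the whole run being left alone.
print(limited)
```

So `limit` truncates the fill inside a long run rather than deciding, per run, whether to fill it at all.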
I suppose this can be coded by going column by column with conditional statements, but I suspect there must be something more Pythonic. Any suggestions on an efficient way to achieve this?
Working with contiguous groups is still a little awkward in pandas.. or at least I don't know of a slick way to do this, which isn't at all the same thing. :-)
One way to get what you want would be to use the compare-cumsum-groupby pattern:
    In [68]: nulls = df.isnull()
        ...: groups = (nulls != nulls.shift()).cumsum()
        ...: to_fill = groups.apply(lambda x: x.groupby(x).transform(len) <= 3)
        ...: df.where(~to_fill, df.ffill())
        ...:
    Out[68]:
         0    1    2
    0  2.0  5.0  4.0
    1  2.0  5.0  NaN
    2  2.0  5.0  NaN
    3  5.0  5.0  NaN
    4  9.0  3.0  NaN
    5  7.0  9.0  1.0
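The same pattern can be wrapped in a function with the threshold as a parameter. This is only a sketch: the name `fill_short_gaps` and the `max_gap` argument are made up for illustration, and the mask is restricted to NaN cells explicitly, which should not change the result but makes the intent easier to read.

```python
import numpy as np
import pandas as pd

def fill_short_gaps(df, max_gap=3):
    """Forward-fill only NaN runs of at most `max_gap` rows (hypothetical helper)."""
    nulls = df.isnull()
    # A new run starts wherever the null flag differs from the row above;
    # cumsum turns those change points into per-column run ids.
    run_ids = (nulls != nulls.shift()).cumsum()
    # Length of the run each cell belongs to, computed column by column.
    run_len = run_ids.apply(lambda col: col.groupby(col).transform("size"))
    # Only NaN cells that sit in a short run are replaced.
    short_gap = nulls & (run_len <= max_gap)
    return df.where(~short_gap, df.ffill())

# Example frame from the question (default integer column labels assumed).
df = pd.DataFrame([
    [2, 5, 4],
    [np.nan, np.nan, np.nan],
    [np.nan, np.nan, np.nan],
    [5, np.nan, np.nan],
    [9, 3, np.nan],
    [7, 9, 1],
])

result = fill_short_gaps(df, max_gap=3)
print(result)
```

With `max_gap=3` the runs of length 2 and 3 in columns 0 and 1 are filled, while the run of length 4 in column 2 is left untouched.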
Okay, here's an alternative which I don't like because it's too tricky:
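    import numpy as np

    def method_2(df):
        nulls = df.isnull()
        # Fill at most 3 NaNs per run; longer runs still have NaNs left over.
        filled = df.ffill(limit=3)
        # Cells that were NaN and are still NaN after the limited fill.
        unfilled = nulls & (~filled.notnull())
        # 2.0 marks original values; NaN marks cells that were null.
        nf = nulls.replace({False: 2.0, True: np.nan})
        # Back-fill so every null cell learns whether its run hit the limit (True == 1).
        do_not_fill = nf.combine_first(unfilled.replace(False, np.nan)).bfill() == 1
        return df.where(do_not_fill, df.ffill())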
This doesn't use any groupby tools and so should be faster. Note that a different approach would be to manually (using shifts) determine which elements are to be filled because they're in a group of length 1, 2, or 3.
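One possible reading of the shift-based idea is sketched below. It is not code from the answer: the helper name `fill_short_gaps_shift` and the `max_gap` parameter are hypothetical, and it detects runs that are too long (with a rolling window plus shifts) instead of detecting the short ones directly.

```python
import numpy as np
import pandas as pd

def fill_short_gaps_shift(df, max_gap=3):
    """Shift-based variant (hypothetical helper, not from the answer)."""
    nulls = df.isnull()
    width = max_gap + 1
    # True at row j when rows j-max_gap .. j are all NaN, i.e. a window of
    # width max_gap + 1 ending at j lies entirely inside a NaN run.
    window_end = nulls.astype(int).rolling(width).sum() == width
    # A cell belongs to a too-long run if any such window covers it, so look
    # at the windows ending 0 .. max_gap rows further down.
    in_long_run = window_end
    for k in range(1, width):
        in_long_run = in_long_run | window_end.shift(-k, fill_value=False)
    # Keep cells in long runs as they are; forward-fill everything else.
    return df.where(in_long_run, df.ffill())

# Example frame from the question (default integer column labels assumed).
df = pd.DataFrame([
    [2, 5, 4],
    [np.nan, np.nan, np.nan],
    [np.nan, np.nan, np.nan],
    [5, np.nan, np.nan],
    [9, 3, np.nan],
    [7, 9, 1],
])

shifted_result = fill_short_gaps_shift(df, max_gap=3)
print(shifted_result)
```

The loop runs `max_gap` times regardless of the number of rows, so this avoids groupby at the cost of a few extra boolean frames.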