python 3.x - Conditional column selection in pandas


I want to select columns from a DataFrame according to a particular condition. I know this can be done with a loop, but the DataFrame is large and efficiency is crucial. The condition for selecting a column is that it contains either all non-NaN entries, or a sequence of NaNs followed by a sequence of non-NaN entries.

Here is an example. Consider the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan, 2, np.nan],
                   [2, np.nan, 5, np.nan],
                   [4, 8, np.nan, 1],
                   [3, 2, np.nan, 2],
                   [3, 2, 5, np.nan]])
df

   0    1    2    3
0  1  NaN  2.0  NaN
1  2  NaN  5.0  NaN
2  4  8.0  NaN  1.0
3  3  2.0  NaN  2.0
4  3  2.0  5.0  NaN

From it, I want to select columns 0 and 1. Any advice on how to do this efficiently without looping?
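For reference, the loop version I want to avoid might look roughly like the sketch below (select_columns_loop is just an illustrative name, not from the question); it checks each column for the NaNs-then-valid pattern, but it scales poorly on a wide DataFrame:

# Minimal loop-based sketch of the selection condition (illustration only):
# keep a column if its NaNs, if any, all sit above its first valid value.
def select_columns_loop(df):
    keep = []
    for col in df.columns:
        mask = df[col].isnull().values      # True where the entry is NaN
        first_valid = mask.argmin()         # position of the first non-NaN value
        if mask.sum() == first_valid:       # all NaNs are at the top (or there are none)
            keep.append(col)
    return df[keep]

select_columns_loop(df)   # -> columns 0 and 1 of the example DataFrame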

Logic

  • Count the nulls in each column. If the nulls are all at the beginning, the number of nulls in the column should equal the position of the first valid index.
  • Get the first valid index of each column.
  • Slice the index with the null counts and compare against the first valid indices. Where they are equal, those are the columns to keep.

cnull = df.isnull().sum()
fvald = df.apply(pd.Series.first_valid_index)
cols = df.index[cnull] == fvald
df.loc[:, cols]

   0    1
0  1  NaN
1  2  NaN
2  4  8.0
3  3  2.0
4  3  2.0
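To make the comparison concrete, here is roughly what the intermediate values are for the example frame (the value comments are added for illustration):

cnull = df.isnull().sum()                        # NaNs per column:          [0, 2, 2, 3]
fvald = df.apply(pd.Series.first_valid_index)    # first valid row position: [0, 2, 0, 2]
# With the default RangeIndex, df.index[cnull] maps the counts straight back to
# index labels, so cols is True exactly where the NaN count equals the first
# valid position, i.e. columns 0 and 1.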


Edited for speed improvements

Old answer

def pir1(df):
    cnull = df.isnull().sum()
    fvald = df.apply(pd.Series.first_valid_index)
    cols = df.index[cnull] == fvald
    return df.loc[:, cols]

A much faster answer using the same logic:

def pir2(df):
    nulls = np.isnan(df.values)                 # boolean NaN mask (assumes numeric data)
    null_count = nulls.sum(0)                   # NaNs per column
    first_valid = nulls.argmin(0)               # position of first non-NaN per column
    null_on_top = null_count == first_valid     # True where all NaNs precede the first valid value
    filtered_data = df.values[:, null_on_top]
    filtered_columns = df.columns.values[null_on_top]
    return pd.DataFrame(filtered_data, df.index, filtered_columns)

[Timing plot comparing pir1 and pir2; pir2 is faster.]
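To reproduce a comparison like the one in the plot, a rough benchmark sketch using the standard timeit module is below; the frame size, column count, and NaN placement are arbitrary choices for illustration, not taken from the answer. Note that pir2 relies on np.isnan, so it assumes a fully numeric DataFrame.

import timeit

import numpy as np
import pandas as pd

# Build a wide numeric frame where some columns start with a run of NaNs
# (all sizes here are arbitrary, chosen only for this sketch).
rng = np.random.RandomState(0)
big = pd.DataFrame(rng.rand(10000, 200))
for col in rng.choice(200, size=100, replace=False):
    big.iloc[:rng.randint(1, 100), col] = np.nan   # NaNs only at the top of the column

assert pir1(big).equals(pir2(big))                 # both versions select the same columns

print(timeit.timeit(lambda: pir1(big), number=10))
print(timeit.timeit(lambda: pir2(big), number=10))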

