pandas - Using scalar values in series as variables in user defined function


I want to define a function that is applied element-wise to each row of a DataFrame, comparing each element to a scalar value held in a separate series. I started with the function below.

    def greater_than(array, value):
        g = array[array >= value].count(axis=1)
        return g

But it applies the mask along axis 0, and I need it applied along axis 1. What can I do?

e.g.

    In [3]: df = pd.DataFrame(np.arange(16).reshape(4,4))

    In [4]: df
    Out[4]:
        0   1   2   3
    0   0   1   2   3
    1   4   5   6   7
    2   8   9  10  11
    3  12  13  14  15

    In [26]: s
    Out[26]: array([   1, 1000, 1000, 1000])

    In [25]: greater_than(df,s)
    Out[25]:
    0    0
    1    1
    2    1
    3    1
    dtype: int64

    In [27]: g = df[df >= s]

    In [28]: g
    Out[28]:
          0   1   2   3
    0   NaN NaN NaN NaN
    1   4.0 NaN NaN NaN
    2   8.0 NaN NaN NaN
    3  12.0 NaN NaN NaN

The result should look like:

    In [29]: greater_than(df,s)
    Out[29]:
    0    3
    1    0
    2    0
    3    0
    dtype: int64

This is because 1, 2, and 3 are all >= 1, while none of the remaining values are greater than or equal to 1000.

Your best bet may be to do some transposes (no copies are made, if that's a concern):

    In [164]: df = pd.DataFrame(np.arange(16).reshape(4,4))

    In [165]: s = np.array([   1, 1000, 1000, 1000])

    In [171]: df.T[(df.T>=s)].T
    Out[171]:
         0    1    2    3
    0  NaN  1.0  2.0  3.0
    1  NaN  NaN  NaN  NaN
    2  NaN  NaN  NaN  NaN
    3  NaN  NaN  NaN  NaN

    In [172]: df.T[(df.T>=s)].T.count(axis=1)
    Out[172]:
    0    3
    1    0
    2    0
    3    0
    dtype: int64
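The transpose helps because comparing a DataFrame with a 1-D array lines the array up with the columns, so `s[j]` ends up compared against column `j`. As a rough sketch of the same idea in plain NumPy, `s` can instead be reshaped into a column so that `s[i]` is compared against row `i`. The names `values`, `mask`, and `counts` are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4))
s = np.array([1, 1000, 1000, 1000])

# Work on the underlying 2-D array (assumes all columns share one dtype).
values = df.to_numpy()

# s[:, None] has shape (4, 1), so it broadcasts across each row:
# row i of `values` is compared with s[i].
mask = values >= s[:, None]

# Count the True entries per row and restore the original row labels.
counts = pd.Series(mask.sum(axis=1), index=df.index)
print(counts)
```

This avoids the transposes but drops down to NumPy, so it relies on `s` having one entry per row in row order.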

You can sum the mask directly, if the count is all you're after.

    In [173]: (df.T>=s).sum(axis=0)
    Out[173]:
    0    3
    1    0
    2    0
    3    0
    dtype: int64
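As an alternative that is not from the answer above, the comparison method `DataFrame.ge` takes an `axis` argument, which may be enough to avoid the transposes altogether. The sketch below rewrites the `greater_than` function from the question on that assumption; the parameter names `frame` and `values` are illustrative.

```python
import numpy as np
import pandas as pd


def greater_than(frame, values):
    """Count, per row, the elements that are >= that row's threshold."""
    # axis=0 matches `values` against the row index rather than the columns,
    # so values[i] is the threshold for row i.
    return frame.ge(values, axis=0).sum(axis=1)


df = pd.DataFrame(np.arange(16).reshape(4, 4))
s = np.array([1, 1000, 1000, 1000])

result = greater_than(df, s)
print(result)
```

Here `values` needs one entry per row; if it is a Series rather than an array, pandas aligns it by index label rather than by position.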
