Problem solving gets easier the more situations you expose yourself to, such as finding the column name that holds the maximum value in each row, and the more you practice the same strategies. With time, this becomes second nature and shapes how you approach problems in general. Big or small, always start with a plan and work through it until you are confident and ready to code the solution.
In this post, I give an overview of how to find the name of the column that holds the maximum value in each row of a pandas DataFrame.
I have a DataFrame like this one:
In [7]:
frame.head()
Out[7]:
Communications and Search Business General Lifestyle
0 0.745763 0.050847 0.118644 0.084746
0 0.333333 0.000000 0.583333 0.083333
0 0.617021 0.042553 0.297872 0.042553
0 0.435897 0.000000 0.410256 0.153846
0 0.358974 0.076923 0.410256 0.153846
How can I get the name of the column that holds the maximum value in each row? The desired output looks like this:
In [7]:
frame.head()
Out[7]:
Communications and Search Business General Lifestyle Max
0 0.745763 0.050847 0.118644 0.084746 Communications
0 0.333333 0.000000 0.583333 0.083333 Business
0 0.617021 0.042553 0.297872 0.042553 Communications
0 0.435897 0.000000 0.410256 0.153846 Communications
0 0.358974 0.076923 0.410256 0.153846 Business
Answer #1:
You can use idxmax with axis=1 to find the column with the greatest value in each row:
>>> df.idxmax(axis=1)
0 Communications
1 Business
2 Communications
3 Communications
4 Business
dtype: object
To create the new column 'Max', use df['Max'] = df.idxmax(axis=1).
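A self-contained sketch of this answer, using the five rows from the question. The column labels Communications, Search, Business and Lifestyle are an assumption (the header in the question is ambiguous); they were chosen so the result matches the desired output shown in the question.

```python
import pandas as pd

# Values copied from the question's five rows. The column labels are an
# assumption: the scraped header is ambiguous, so these were picked to match
# the desired output shown in the question.
df = pd.DataFrame(
    {
        "Communications": [0.745763, 0.333333, 0.617021, 0.435897, 0.358974],
        "Search": [0.050847, 0.000000, 0.042553, 0.000000, 0.076923],
        "Business": [0.118644, 0.583333, 0.297872, 0.410256, 0.410256],
        "Lifestyle": [0.084746, 0.083333, 0.042553, 0.153846, 0.153846],
    }
)

# axis=1 compares values across each row and returns the label of the column
# holding the largest one (the first such column if there is a tie).
df["Max"] = df.idxmax(axis=1)
print(df["Max"].tolist())
# ['Communications', 'Business', 'Communications', 'Communications', 'Business']
```

The new column is appended on the right, so the frame now has the same shape as the desired output in the question.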
To find the row index at which the maximum value occurs in each column, use df.idxmax() (or equivalently df.idxmax(axis=0)).
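A short sketch of the column-wise direction, again with the question's values and assumed column labels. The Lifestyle column has its maximum (0.153846) in both row 3 and row 4, which shows that idxmax reports the first occurrence.

```python
import pandas as pd

# Same values as the question; the column labels are assumed.
df = pd.DataFrame(
    {
        "Communications": [0.745763, 0.333333, 0.617021, 0.435897, 0.358974],
        "Search": [0.050847, 0.000000, 0.042553, 0.000000, 0.076923],
        "Business": [0.118644, 0.583333, 0.297872, 0.410256, 0.410256],
        "Lifestyle": [0.084746, 0.083333, 0.042553, 0.153846, 0.153846],
    }
)

# axis=0 (the default) compares values down each column and returns the row
# label where that column's maximum sits; ties resolve to the first row.
row_of_max = df.idxmax()
print(row_of_max.tolist())  # [0, 4, 1, 3]
```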
Answer #2:
If you want to produce a column containing the name of the column with the maximum value, but considering only a subset of columns, you can use a variation of @ajcr's answer:
df['Max'] = df[['Communications','Business']].idxmax(axis=1)
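A sketch of the subset variant with the same assumed sample frame. The subset Search and Lifestyle is a hypothetical choice that makes the effect visible: the larger Communications and Business values are ignored, and in row 2, where both subset values are 0.042553, the tie goes to the first column listed.

```python
import pandas as pd

# Same values as the question; the column labels are assumed.
df = pd.DataFrame(
    {
        "Communications": [0.745763, 0.333333, 0.617021, 0.435897, 0.358974],
        "Search": [0.050847, 0.000000, 0.042553, 0.000000, 0.076923],
        "Business": [0.118644, 0.583333, 0.297872, 0.410256, 0.410256],
        "Lifestyle": [0.084746, 0.083333, 0.042553, 0.153846, 0.153846],
    }
)

# Hypothetical subset: only these columns take part in the comparison.
subset = ["Search", "Lifestyle"]
df["Max"] = df[subset].idxmax(axis=1)
print(df["Max"].tolist())
# ['Lifestyle', 'Lifestyle', 'Search', 'Lifestyle', 'Lifestyle']
```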
Answer #3:
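You could use apply on the DataFrame with axis=1 and take the label of each row's maximum. Note that since pandas 1.0, Series.argmax returns an integer position rather than a label, so call idxmax() on each row to get the column name:
In [144]: df.apply(lambda x: x.idxmax(), axis=1)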
Out[144]:
0 Communications
1 Business
2 Communications
3 Communications
4 Business
dtype: object
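The sketch below, using the question's values with assumed column labels, contrasts the row-wise apply with what argmax returns in current pandas. The last line is a swapped-in NumPy technique, not part of the original answer: it takes the positional argmax of the underlying array and maps the positions back through df.columns.

```python
import numpy as np
import pandas as pd

# Same values as the question; the column labels are assumed.
df = pd.DataFrame(
    {
        "Communications": [0.745763, 0.333333, 0.617021, 0.435897, 0.358974],
        "Search": [0.050847, 0.000000, 0.042553, 0.000000, 0.076923],
        "Business": [0.118644, 0.583333, 0.297872, 0.410256, 0.410256],
        "Lifestyle": [0.084746, 0.083333, 0.042553, 0.153846, 0.153846],
    }
)

# Each row reaches the lambda as a Series indexed by column name,
# so idxmax() on it returns a column label.
via_apply = df.apply(lambda row: row.idxmax(), axis=1)

# In pandas >= 1.0, Series.argmax() returns an integer position instead.
positions = df.apply(lambda row: row.argmax(), axis=1)
print(positions.tolist())  # [0, 2, 0, 0, 2]

# NumPy alternative: positional argmax over the whole array, then map the
# positions to names through df.columns (no per-row Python call).
via_numpy = pd.Series(df.columns[np.argmax(df.to_numpy(), axis=1)], index=df.index)
```

One difference to keep in mind: idxmax skips NaN by default (skipna=True), while np.argmax does not, so the NumPy shortcut is only a drop-in replacement when the data has no missing values.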
Here is a benchmark comparing the apply method with idxmax() for len(df) ~ 20K; idxmax() is about 10 times faster:
In [146]: %timeit df.apply(lambda x: x.argmax(), axis=1)
1 loops, best of 3: 479 ms per loop
In [147]: %timeit df.idxmax(axis=1)
10 loops, best of 3: 47.3 ms per loop
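To reproduce a comparison like this outside IPython, here is a sketch using the standard timeit module. The data is hypothetical (20,000 rows of random floats under the same assumed column labels), not the original frame.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 20,000 rows of random floats in four columns,
# sized like the benchmark above but not the original data.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.random((20_000, 4)),
    columns=["Communications", "Search", "Business", "Lifestyle"],
)

# Check that both approaches agree before timing them.
by_apply = df.apply(lambda row: row.idxmax(), axis=1)
by_idxmax = df.idxmax(axis=1)
assert by_apply.tolist() == by_idxmax.tolist()

for name, func in [
    ("apply", lambda: df.apply(lambda row: row.idxmax(), axis=1)),
    ("idxmax", lambda: df.idxmax(axis=1)),
]:
    # Best of 3 single runs, similar to the old IPython "best of 3" report.
    best = min(timeit.repeat(func, number=1, repeat=3))
    print(f"{name}: {best * 1000:.1f} ms")
```

Absolute timings depend on the machine and the pandas version, so expect numbers different from the ones above. The gap comes from apply calling a Python function once per row, while idxmax(axis=1) works on the whole array at once.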