Question:
I’m trying to select a subset of a subset of a dataframe, selecting only some columns, and filtering on the rows.
df.loc[df.a.isin(['Apple', 'Pear', 'Mango']), ['a', 'b', 'f', 'g']]
However, I’m getting the error:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
What's the correct way to slice and filter now?
Answer #1:
TL;DR: There is likely a typo or spelling error in the column header names.
This is a change introduced in v0.21.1, and has been explained in the docs at length:
Previously, selecting with a list of labels, where one or more labels were missing would always succeed, returning NaN for missing labels. This will now show a FutureWarning. In the future this will raise a KeyError (GH15747). This warning will trigger on a DataFrame or a Series for using .loc[] or [[]] when passing a list-of-labels with at least 1 missing label.
For example,
df
     A    B  C
0  7.0  NaN  8
1  3.0  3.0  5
2  8.0  1.0  7
3  NaN  0.0  3
4  8.0  2.0  7
Try the same kind of slicing you're doing:
df.loc[df.A.gt(6), ['A', 'C']]
     A  C
0  7.0  8
2  8.0  7
4  8.0  7
No problem. Now, try replacing C with a non-existent column label:
df.loc[df.A.gt(6), ['A', 'D']]
FutureWarning: Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
     A   D
0  7.0 NaN
2  8.0 NaN
4  8.0 NaN
So, in your case, the warning is raised because at least one of the column labels you pass to loc does not exist in the DataFrame. Take another look at them.
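To find the offending label, it can help to compare the requested labels against df.columns before indexing. The following is a minimal sketch, assuming the example frame shown above; the names wanted, missing, present, subset and filled are hypothetical and only used for illustration. It also shows the .reindex() alternative that the warning message mentions, for the case where a NaN-filled column for the missing label is actually wanted.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from this answer.
df = pd.DataFrame({
    "A": [7.0, 3.0, 8.0, np.nan, 8.0],
    "B": [np.nan, 3.0, 1.0, 0.0, 2.0],
    "C": [8, 5, 7, 3, 7],
})

wanted = ["A", "D"]  # "D" is deliberately not a column of df

# Requested labels that are not actual columns (typos show up here).
missing = [col for col in wanted if col not in df.columns]
print("missing labels:", missing)

# Option 1: select only the labels that really exist.
present = [col for col in wanted if col in df.columns]
subset = df.loc[df["A"].gt(6), present]

# Option 2: filter the rows first, then reindex the columns.
# reindex fills labels that do not exist with NaN instead of complaining.
filled = df.loc[df["A"].gt(6)].reindex(columns=wanted)
print(filled)
```

If missing comes back non-empty and you did not expect it to, the likely cause is a typo, stray whitespace, or a case mismatch in the column names.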
Answer #2:
This warning also occurs with an .append call when the list contains new columns. To avoid this, use:
df=df.append(pd.Series({'A':i,'M':j}), ignore_index=True)
instead of:
df=df.append([{'A':i,'M':j}], ignore_index=True)
Full error message:
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py:1472:
FutureWarning: Passing list-likes to .loc or [] with any missing label
will raise KeyError in the future, you can use .reindex() as an
alternative.
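Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so neither form above runs on current versions. A minimal sketch of the same operation with pd.concat follows; the starting frame and the values of i and j are placeholders invented for the example, not taken from the answer.

```python
import pandas as pd

# Hypothetical starting frame and values, for illustration only.
df = pd.DataFrame({"A": [1, 2]})
i, j = 3, 4

# Wrap the new row in a one-row DataFrame and concatenate it.
# The new column "M" is added; existing rows get NaN in it.
new_row = pd.DataFrame([{"A": i, "M": j}])
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```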
Answer #3:
If you want to retain the index, you can pass a list comprehension instead of a column list:
loan_data_inputs_train.loc[:,[i for i in List_col_without_reference_cat]]
Answer #4:
I'm not sure I understood the question correctly, but the following approach may work for you:
df[df['a'].isin(['Apple', 'Pear', 'Mango'])][['a', 'b', 'f', 'g']]
Snippet description:
df['a'].isin(['Apple', 'Pear', 'Mango']) # row filter: True for rows whose value in column a is in the list
df[['a', 'b', 'f', 'g']] # column filter: selects the given set of columns
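As a quick check, the chained form above and the single .loc call from the question can be compared side by side. This is only a sketch: the column names come from the question, but the data is made up for the example. Both forms still require every listed column to exist, so a misspelled label causes trouble either way.

```python
import pandas as pd

# Made-up data; only the column names come from the question.
df = pd.DataFrame({
    "a": ["Apple", "Kiwi", "Pear", "Mango"],
    "b": [1, 2, 3, 4],
    "f": [0.1, 0.2, 0.3, 0.4],
    "g": ["x", "y", "z", "w"],
})

fruits = ["Apple", "Pear", "Mango"]
cols = ["a", "b", "f", "g"]

mask = df["a"].isin(fruits)      # boolean row filter
chained = df[mask][cols]         # filter rows, then pick columns
single = df.loc[mask, cols]      # same selection in one .loc call

# Compare the two results.
print(chained.equals(single))
```

The single .loc call is usually preferable when you later assign into the result, since chained indexing can operate on a temporary copy.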