Getting good at problem solving means exposing yourself to as many situations as possible, such as deleting a column from a pandas DataFrame, and practicing the same strategies over and over. With time this becomes second nature. For any problem, big or small, start with a plan and work through the strategies until you are confident enough to code the solution.
This post gives an overview of how to delete a column from a pandas DataFrame that you can refer back to at any time.
When deleting a column in a DataFrame I use:
del df['column_name']
And this works great. Why can't I use the following?
del df.column_name
Since it is possible to access the column/Series as df.column_name, I expected this to work.
As you've guessed, the right syntax is
del df['column_name']
It's difficult to make del df.column_name work simply as the result of syntactic limitations in Python. del df[name] gets translated to df.__delitem__(name) under the covers by Python.
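To make that translation concrete, here is a minimal sketch using a toy class. `Table` is a hypothetical stand-in invented for illustration and is not how pandas is implemented; it only records which magic method each form of `del` reaches.

```python
class Table:
    """Hypothetical stand-in for a DataFrame (illustration only)."""

    def __init__(self, **columns):
        self._data = dict(columns)  # column name -> values
        self.calls = []             # log of magic methods that were invoked

    def __delitem__(self, name):
        # Reached by: del table[name]
        self.calls.append(("__delitem__", name))
        del self._data[name]

    def __delattr__(self, name):
        # Reached by: del table.name
        self.calls.append(("__delattr__", name))
        if name in self._data:
            del self._data[name]
        else:
            object.__delattr__(self, name)


table = Table(a=[1, 2], b=[3, 4])
del table["a"]  # item syntax
del table.b     # attribute syntax
print(table.calls)
```

The item form and the attribute form go through two separate hooks, so a class has to implement each one deliberately.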
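The best way to do this in pandas is to use drop:
df = df.drop('column_name', axis=1)
Here axis=1 selects columns (0 is for rows, 1 is for columns). Pass axis by keyword: pandas 2.0 and later no longer accept it as a positional argument.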
To delete the column without having to reassign df you can do:
df.drop('column_name', axis=1, inplace=True)
Finally, to drop by column number instead of by column label, try this to delete, e.g. the 1st, 2nd and 4th columns:
df = df.drop(df.columns[[0, 1, 3]], axis=1) # df.columns is zero-based pd.Index
Also working with “text” syntax for the columns:
df.drop(['column_nameA', 'column_nameB'], axis=1, inplace=True)
Note: Introduced in v0.21.0 (October 27, 2017), the drop() method accepts index/columns keywords as an alternative to specifying the axis.
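So we can now just do:
df = df.drop(columns=['column_nameA', 'column_nameB'])
Another way to drop several columns at once is to collect the labels in a list first:
columns = ['Col1', 'Col2', ...]
df.drop(columns, inplace=True, axis=1)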
This will delete one or more columns in-place. Note that inplace=True was added in pandas v0.13 and won't work on older versions. You'd have to assign the result back in that case:
df = df.drop(columns, axis=1)
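As a small sketch of the two spellings discussed above, the following compares the axis form with the columns keyword. The frame and its column names are made up for illustration, and the example assumes pandas 0.21 or later.

```python
import pandas as pd

# Illustrative data; the column names are placeholders.
df = pd.DataFrame({"Col1": [1, 2], "Col2": [3, 4], "Col3": [5, 6]})

# Both calls return a new DataFrame and leave df untouched.
by_axis = df.drop(["Col1", "Col2"], axis=1)
by_keyword = df.drop(columns=["Col1", "Col2"])

print(list(by_axis.columns))
print(by_axis.equals(by_keyword))
print(list(df.columns))
```

The columns keyword avoids having to remember which axis number means what.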
Drop by index
Delete first, second and fourth columns:
df.drop(df.columns[[0,1,3]], axis=1, inplace=True)
Delete first column:
df.drop(df.columns[[0]], axis=1, inplace=True)
There is an optional parameter inplace so that the original data can be modified without creating a copy.
df = DataFrame.from_dict({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, orient='index', columns=['one', 'two', 'three'])
   one  two  three
A    1    2      3
B    4    5      6
C    7    8      9
Drop the first column:
df.drop(df.columns[[0]], axis=1, inplace=True)
   two  three
A    2      3
B    5      6
C    8      9
Pop a column, which removes it from df and returns it:
three = df.pop('three')
   two
A    2
B    5
C    8
The actual question posed, missed by most answers here, is:
Why can't I use del df.column_name?
First we need to understand the problem, which requires us to dive into Python magic methods.
As Wes points out in his answer, del df['column'] maps to the Python magic method df.__delitem__('column'), which is implemented in pandas to drop the column.
However, as pointed out in the link above about Python magic methods:
__del__ should almost never be used because of the precarious circumstances under which it is called; use it with caution!
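You could argue, by loose analogy (the quote is about the finalizer __del__, which is a different method from __delitem__), that del df['column_name'] should not be used or encouraged, and thereby del df.column_name should not even be considered.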
However, in theory, del df.column_name could be implemented to work in pandas using the magic method __delattr__. This does however introduce certain problems, problems which the del df['column_name'] implementation already has, but to a lesser degree.
What if I define a column in a dataframe called "dtypes" or "columns"?
Then assume I want to delete these columns.
del df.dtypes would leave the __delattr__ method unable to tell whether it should delete the "dtypes" attribute or the "dtypes" column.
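The name clash can be seen in a short sketch. The frame below is hypothetical, built only so that one column label collides with an existing DataFrame attribute.

```python
import pandas as pd

# Hypothetical frame whose first column label is also a DataFrame attribute name.
df = pd.DataFrame({"dtypes": [1, 2], "x": [3.0, 4.0]})

attr = df.dtypes    # attribute access: the per-column dtypes, not the column
col = df["dtypes"]  # item access: the column labelled "dtypes"

print(list(attr.index))  # the index of df.dtypes is the column labels
print(col.tolist())

del df["dtypes"]    # item syntax is unambiguous about what gets removed
print(list(df.columns))
```

Attribute access resolves to the built-in attribute here, so only the bracket form reliably reaches the column.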
Architectural questions behind this problem
- Is a dataframe a collection of columns?
- Is a dataframe a collection of rows?
- Is a column an attribute of a dataframe?
The answers, in order:
- Yes, in all ways.
- No, but if you want it to be, you can use the .loc or .iloc indexers.
- Maybe. Do you want to read data? Then yes, unless the name of the attribute is already taken by another attribute belonging to the dataframe. Do you want to modify data? Then no.
You cannot do del df.column_name because pandas has a quite wildly grown architecture that needs to be reconsidered in order for this kind of cognitive dissonance not to occur to its users.
Don't use df.column_name. It may be pretty, but it causes cognitive dissonance.
Zen of Python quotes that fit in here:
There are multiple ways of deleting a column.
There should be one– and preferably only one –obvious way to do it.
Columns are sometimes attributes but sometimes not.
Special cases aren’t special enough to break the rules.
Does del df.dtypes delete the dtypes attribute or the dtypes column?
In the face of ambiguity, refuse the temptation to guess.
A nice addition is the ability to drop columns only if they exist. This way you can cover more use cases, and it will only drop the existing columns from the labels passed to it:
Simply add errors='ignore', for example:
df.drop(['col_name_1', 'col_name_2', ..., 'col_name_N'], inplace=True, axis=1, errors='ignore')
- This is available from pandas 0.16.1 onward.
The same option works for a single column:
df.drop(['column_name'], axis=1, inplace=True, errors='ignore')
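A brief sketch of the difference errors='ignore' makes. The label "missing" is a made-up name that is deliberately absent from the frame, and the example assumes pandas 0.16.1 or later.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Without errors='ignore', a label that is not in the frame raises KeyError.
try:
    df.drop(["b", "missing"], axis=1)
    raised = False
except KeyError:
    raised = True

# With errors='ignore', only the labels that exist are dropped.
safe = df.drop(["b", "missing"], axis=1, errors="ignore")

print(raised)
print(list(safe.columns))
```

This is handy when the same cleanup code runs over frames that do not all share the same columns.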
It's good practice to always use the [] notation. One reason is that attribute notation (df.column_name) does not work for numbered indices:
In [1]: df = DataFrame([[1, 2, 3], [4, 5, 6]])
In [2]: df[1]
Out[2]:
0    2
1    5
Name: 1
In [3]: df.1
  File "<ipython-input-3-e4803c0d1066>", line 1
    df.1
       ^
SyntaxError: invalid syntax