Problem solving is about exposing yourself to as many situations as possible, like counting the frequency that a value occurs in a DataFrame column, and practicing these strategies over and over. With time, it becomes second nature and a natural way to approach any problem. Big or small, always start with a plan, and use the other strategies mentioned here until you are confident and ready to code the solution.
In this post, my aim is to give an overview of how to count the frequency that a value occurs in a DataFrame column, which you can follow at any time. The discussion below should be easy to follow.
I have a dataset
category
cat a
cat b
cat a
I’d like to be able to return something like (showing unique values and frequency)
category freq
cat a 2
cat b 1
Answer #1:
Use groupby
and count
:
In [37]:
import pandas as pd

df = pd.DataFrame({'a':list('abssbab')})
df.groupby('a').count()
Out[37]:
   a
a
a  2
b  3
s  2
[3 rows x 1 columns]
See the online docs: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html
Also value_counts(), as @DSM has commented; there are many ways to skin a cat here:
In [38]:
df['a'].value_counts()
Out[38]:
b 3
a 2
s 2
dtype: int64
If you wanted to add the frequency back to the original DataFrame, use transform to return an aligned index:
In [41]:
df['freq'] = df.groupby('a')['a'].transform('count')
df
Out[41]:
a freq
0 a 2
1 b 3
2 s 2
3 s 2
4 b 3
5 a 2
6 b 3
[7 rows x 2 columns]
Answer #2:
If you want to apply this to all columns, you can use:
df.apply(pd.value_counts)
This will apply a column-based aggregation function (in this case value_counts) to each of the columns.
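A quick sketch of what that looks like on a small two-column frame (the frame and its column names 'a' and 'b' are made up for illustration):

import pandas as pd

df = pd.DataFrame({'a': list('abssbab'), 'b': list('xxyxxyx')})
# each result column holds that column's value counts;
# values that never occur in a column show up as NaN
df.apply(pd.value_counts)
# roughly (row order may vary):
#      a    b
# a  2.0  NaN
# b  3.0  NaN
# s  2.0  NaN
# x  NaN  5.0
# y  NaN  2.0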
Answer #3:
df.category.value_counts()
This short little line of code will give you the output you want.
If your column name has spaces, you can use
df['category'].value_counts()
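For example, on the question's data this gives exactly the counts asked for (a minimal sketch, assuming the data is already in a DataFrame called df):

import pandas as pd

df = pd.DataFrame({'category': ['cat a', 'cat b', 'cat a']})
df['category'].value_counts()
# cat a    2
# cat b    1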
Answer #4:
df.apply(pd.value_counts).fillna(0)
value_counts – returns an object containing counts of unique values
apply – counts the frequency in every column. If you set axis=1, you get the frequency in every row
fillna(0) – makes the output tidier by replacing NaN with 0
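As a rough illustration of the axis=1 remark (the two-column frame here is invented): the default counts down each column, while axis=1 counts across each row; fillna(0) then replaces the NaNs left wherever a value is missing:

import pandas as pd

df = pd.DataFrame({'a': list('abssbab'), 'b': list('xxyxxyx')})
df.apply(pd.value_counts).fillna(0)           # counts per column, NaN replaced by 0
df.apply(pd.value_counts, axis=1).fillna(0)   # counts per row instead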
Answer #5:
In pandas 0.18.1, groupby together with count does not give the frequency of unique values:
>>> df
a
0 a
1 b
2 s
3 s
4 b
5 a
6 b
>>> df.groupby('a').count()
Empty DataFrame
Columns: []
Index: [a, b, s]
However, the unique values and their frequencies are easily determined using size
:
>>> df.groupby('a').size()
a
a 2
b 3
s 2
With df.a.value_counts(), the values are returned sorted in descending order (largest count first) by default.
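To get the exact category/freq table from the question, the groupby/size result can be pushed back into a DataFrame; a minimal sketch (the column name 'freq' is simply a choice here):

import pandas as pd

df = pd.DataFrame({'category': ['cat a', 'cat b', 'cat a']})
df.groupby('category').size().reset_index(name='freq')
#   category  freq
# 0    cat a     2
# 1    cat b     1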
Answer #6:
If your DataFrame contains values of a single type, you can also set return_counts=True
in numpy.unique().
import numpy as np

index, counts = np.unique(df.values, return_counts=True)
np.bincount() could be faster if your values are integers.
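A short sketch of both NumPy routes (the integer array for bincount is made up, since the question's data is strings):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': list('abssbab')})
index, counts = np.unique(df.values, return_counts=True)
# index  -> ['a' 'b' 's'], counts -> [2 3 2]

ints = np.array([1, 1, 1, 1, 2, 3, 4, 4])
np.bincount(ints)   # counts per integer from 0 to max: [0 4 1 1 2]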
Answer #7:
Using a list comprehension and value_counts for multiple columns in a DataFrame (here restricted to the object-dtype columns):
[my_series[c].value_counts() for c in list(my_series.select_dtypes(include=['O']).columns)]
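A self-contained sketch of the same idea (despite the name, my_series is a DataFrame; the columns here are invented, and select_dtypes keeps only the object-dtype ones):

import pandas as pd

my_series = pd.DataFrame({'category': ['cat a', 'cat b', 'cat a'],
                          'other': ['x', 'x', 'y'],
                          'num': [1, 2, 3]})
# one value_counts Series per object-dtype column; 'num' is skipped
[my_series[c].value_counts() for c in my_series.select_dtypes(include=['O']).columns]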
Answer #8:
Without any libraries, you could do this instead:
def to_frequency_table(data):
    frequencytable = {}
    for key in data:
        if key in frequencytable:
            frequencytable[key] += 1
        else:
            frequencytable[key] = 1
    return frequencytable
Example:
to_frequency_table([1,1,1,1,2,3,4,4])
>>> {1: 4, 2: 1, 3: 1, 4: 2}
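The same function works on the question's string categories (a quick usage sketch):
to_frequency_table(['cat a', 'cat b', 'cat a'])
>>> {'cat a': 2, 'cat b': 1}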