# Count the frequency that a value occurs in a dataframe column

Counting how often each value occurs in a DataFrame column is a common task, and pandas offers several ways to do it. In this post, I give an overview of the main approaches, starting from a simple question and working through the answers.

Count the frequency that a value occurs in a dataframe column

I have a dataset

```
category
cat a
cat b
cat a
```

I’d like to be able to return something like (showing unique values and frequency)

```
category   freq
cat a       2
cat b       1
```

Use `groupby` and `count`:

```
In [37]:
df = pd.DataFrame({'a': list('abssbab')})
df.groupby('a').count()
Out[37]:
   a
a
a  2
b  3
s  2

[3 rows x 1 columns]
```

See the online docs: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html

Alternatively, use `value_counts()`, as @DSM commented; there are many ways to skin a cat here:

```
In [38]:
df['a'].value_counts()
Out[38]:
b    3
a    2
s    2
dtype: int64
```

If you wanted to add frequency back to the original dataframe use `transform` to return an aligned index:

```
In [41]:
df['freq'] = df.groupby('a')['a'].transform('count')
df
Out[41]:
   a  freq
0  a     2
1  b     3
2  s     2
3  s     2
4  b     3
5  a     2
6  b     3

[7 rows x 2 columns]
```

If you want to apply to all columns you can use:

```
df.apply(pd.value_counts)
```

This will apply a column based aggregation function (in this case value_counts) to each of the columns.
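As a concrete sketch (the two-column frame here is a hypothetical example; the lambda form is used because the top-level `pd.value_counts` is deprecated in recent pandas versions, and it is equivalent):

```python
import pandas as pd

# Hypothetical two-column example frame
df2 = pd.DataFrame({'a': list('abssbab'), 'b': list('xxyyxyx')})

# Apply value_counts column-wise; each column is counted independently,
# and the results are aligned on the union of all observed values
counts = df2.apply(lambda col: col.value_counts())
print(counts)
```

Values that never occur in a given column show up as NaN in that column of the result.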

```
df.category.value_counts()
```

This short little line of code will give you the output you want.

If your column name has spaces, you can use

```
df['category'].value_counts()
```

To do this for every column at once:

```
df.apply(pd.value_counts).fillna(0)
```

`value_counts` – returns an object containing counts of unique values

`apply` – counts the frequency in every column (with `axis=1`, in every row)

`fillna(0)` – makes the output cleaner by replacing NaN with 0
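Putting the three pieces together on the running `'abssbab'` example (a sketch; the second column `'b'` is assumed here so the NaN-filling is visible, and the lambda stands in for the deprecated top-level `pd.value_counts`):

```python
import pandas as pd

df = pd.DataFrame({'a': list('abssbab'), 'b': list('aabbbbb')})

# Count every column, then replace the NaNs left where a value
# never occurs in a given column with 0
freq = df.apply(lambda col: col.value_counts()).fillna(0)
print(freq)
```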

In pandas 0.18.1, `groupby` together with `count` does not give the frequency of unique values:

```
>>> df
   a
0  a
1  b
2  s
3  s
4  b
5  a
6  b
>>> df.groupby('a').count()
Empty DataFrame
Columns: []
Index: [a, b, s]
```

However, the unique values and their frequencies are easily determined using `size`:

```
>>> df.groupby('a').size()
a
a    2
b    3
s    2
dtype: int64
```

With `df.a.value_counts()` sorted values (in descending order, i.e. largest value first) are returned by default.
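For example (a small sketch; `sort_index()` reorders the result by value instead of by count):

```python
import pandas as pd

s = pd.Series(list('abssbab'))

# Default: counts sorted largest-first, so 'b' (3 occurrences) comes first
vc = s.value_counts()
print(vc)

# Sort by the values themselves instead of by frequency
print(vc.sort_index())
```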

If your DataFrame has values of the same type, you can also set `return_counts=True` in `numpy.unique()`.

```
index, counts = np.unique(df.values, return_counts=True)
```

np.bincount() could be faster if your values are integers.
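A small sketch of the integer case (this assumes the values are non-negative integers; `np.bincount` returns one count for every integer from 0 up to the maximum value):

```python
import numpy as np

values = np.array([1, 1, 1, 1, 2, 3, 4, 4])

# counts[i] is the number of times i occurs; index 0 is simply 0 here
counts = np.bincount(values)
print(counts)  # [0 4 1 1 2]
```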

Using a list comprehension and `value_counts` for multiple columns in a df:

```
[my_series[c].value_counts() for c in list(my_series.select_dtypes(include=['O']).columns)]
```

https://stackoverflow.com/a/28192263/786326

Without any libraries, you could do this instead:

```
def to_frequency_table(data):
    frequencytable = {}
    for key in data:
        if key in frequencytable:
            frequencytable[key] += 1
        else:
            frequencytable[key] = 1
    return frequencytable
```

Example:

```
>>> to_frequency_table([1,1,1,1,2,3,4,4])
{1: 4, 2: 1, 3: 1, 4: 2}
```
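For completeness, the standard library's `collections.Counter` does the same job as the hand-rolled function above:

```python
from collections import Counter

# Counter builds the same value -> count mapping in one call
freq = Counter([1, 1, 1, 1, 2, 3, 4, 4])
print(dict(freq))  # {1: 4, 2: 1, 3: 1, 4: 2}
```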