# Count the frequency that a value occurs in a dataframe column

Counting how often each value occurs in a DataFrame column is a common task, and pandas offers several ways to do it. In this post, I give an overview of the main approaches, starting from a simple question and working through the answers.

Count the frequency that a value occurs in a dataframe column

I have a dataset

```
category
cat a
cat b
cat a
```

I’d like to be able to return something like (showing unique values and frequency)

```
category   freq
cat a       2
cat b       1
```

Use `groupby` and `count`:

```
In [37]:
df = pd.DataFrame({'a': list('abssbab')})
df.groupby('a').count()
Out[37]:
   a
a
a  2
b  3
s  2

[3 rows x 1 columns]
```

See the online docs: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html

Alternatively, use `value_counts()`, as @DSM commented; there are many ways to skin a cat here:

```
In [38]:
df['a'].value_counts()
Out[38]:
b    3
a    2
s    2
dtype: int64
```

If you wanted to add frequency back to the original dataframe use `transform` to return an aligned index:

```
In [41]:
df['freq'] = df.groupby('a')['a'].transform('count')
df
Out[41]:
   a  freq
0  a     2
1  b     3
2  s     2
3  s     2
4  b     3
5  a     2
6  b     3

[7 rows x 2 columns]
```

If you want to apply to all columns you can use:

```
df.apply(pd.value_counts)
```

This will apply a column based aggregation function (in this case value_counts) to each of the columns.
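As a concrete sketch (the two-column frame here is a hypothetical example; the lambda form is used because the top-level `pd.value_counts` is deprecated in recent pandas versions, and it is equivalent):

```python
import pandas as pd

# Hypothetical two-column example frame
df2 = pd.DataFrame({'a': list('abssbab'), 'b': list('xxyyxyx')})

# Apply value_counts column-wise; each column is counted independently,
# and the results are aligned on the union of all observed values
counts = df2.apply(lambda col: col.value_counts())
print(counts)
```

Values that never occur in a given column show up as NaN in that column of the result.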

```
df.category.value_counts()
```

This short little line of code will give you the output you want.

If your column name has spaces, you can use

```
df['category'].value_counts()
```

To do this for every column at once:

```
df.apply(pd.value_counts).fillna(0)
```

`value_counts` – returns an object containing counts of unique values

`apply` – counts the frequency in every column (with `axis=1`, in every row)

`fillna(0)` – makes the output cleaner by replacing NaN with 0
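Putting the three pieces together on the running `'abssbab'` example (a sketch; the second column `'b'` is assumed here so the NaN-filling is visible, and the lambda stands in for the deprecated top-level `pd.value_counts`):

```python
import pandas as pd

df = pd.DataFrame({'a': list('abssbab'), 'b': list('aabbbbb')})

# Count every column, then replace the NaNs left where a value
# never occurs in a given column with 0
freq = df.apply(lambda col: col.value_counts()).fillna(0)
print(freq)
```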

In pandas 0.18.1, `groupby` together with `count` does not give the frequency of unique values:

```
>>> df
   a
0  a
1  b
2  s
3  s
4  b
5  a
6  b
>>> df.groupby('a').count()
Empty DataFrame
Columns: []
Index: [a, b, s]
```

However, the unique values and their frequencies are easily determined using `size`:

```
>>> df.groupby('a').size()
a
a    2
b    3
s    2
dtype: int64
```

With `df.a.value_counts()` sorted values (in descending order, i.e. largest value first) are returned by default.
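For example (a small sketch; `sort_index()` reorders the result by value instead of by count):

```python
import pandas as pd

s = pd.Series(list('abssbab'))

# Default: counts sorted largest-first, so 'b' (3 occurrences) comes first
vc = s.value_counts()
print(vc)

# Sort by the values themselves instead of by frequency
print(vc.sort_index())
```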

If your DataFrame has values of the same type, you can also set `return_counts=True` in `numpy.unique()`.

```
index, counts = np.unique(df.values, return_counts=True)
```

np.bincount() could be faster if your values are integers.
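A small sketch of the integer case (this assumes the values are non-negative integers; `np.bincount` returns one count for every integer from 0 up to the maximum value):

```python
import numpy as np

values = np.array([1, 1, 1, 1, 2, 3, 4, 4])

# counts[i] is the number of times i occurs; index 0 is simply 0 here
counts = np.bincount(values)
print(counts)  # [0 4 1 1 2]
```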

Using a list comprehension and `value_counts` for multiple columns in a df:

```
[my_series[c].value_counts() for c in list(my_series.select_dtypes(include=['O']).columns)]
```

https://stackoverflow.com/a/28192263/786326

Without any libraries, you could do this instead:

```
def to_frequency_table(data):
    frequencytable = {}
    for key in data:
        if key in frequencytable:
            frequencytable[key] += 1
        else:
            frequencytable[key] = 1
    return frequencytable
```

Example:

```
>>> to_frequency_table([1,1,1,1,2,3,4,4])
{1: 4, 2: 1, 3: 1, 4: 2}
```
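For completeness, the standard library's `collections.Counter` does the same job as the hand-rolled function above:

```python
from collections import Counter

# Counter builds the same value -> count mapping in one call
freq = Counter([1, 1, 1, 1, 2, 3, 4, 4])
print(dict(freq))  # {1: 4, 2: 1, 3: 1, 4: 2}
```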