Color by Column Values in Matplotlib

Question:

One of my favorite aspects of using the ggplot2 library in R is the ability to easily specify aesthetics. I can quickly make a scatterplot and color the points by a specific column, and I would love to be able to do this with python/pandas/matplotlib. Are there any convenience functions that people use to map colors to values using pandas dataframes and Matplotlib?

##ggplot scatterplot example with R dataframe, `df`, colored by col3
ggplot(data = df, aes(x=col1, y=col2, color=col3)) + geom_point()

##ideal situation with pandas dataframe, 'df', where colors are chosen by col3
df.plot(x=col1,y=col2,color=col3)

EDIT:
Thank you for your responses but I want to include a sample dataframe to clarify what I am asking. Two columns contain numerical data and the third is a categorical variable. The script I am thinking of will assign colors based on this value.

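import numpy as np
import pandas as pd

# Ten random values per numeric column, matching the ten Gender labels
df = pd.DataFrame({'Height': np.random.normal(size=10),
                   'Weight': np.random.normal(size=10),
                   'Gender': ["Male","Male","Male","Male","Male",
                              "Female","Female","Female","Female","Female"]})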
Asked By: zach

Answer #1:

Update October 2015

Seaborn handles this use-case splendidly:

import numpy
import pandas
from matplotlib import pyplot
import seaborn
seaborn.set(style='ticks')

numpy.random.seed(0)
N = 37
_genders = ['Female', 'Male', 'Non-binary', 'No Response']
df = pandas.DataFrame({
    'Height (cm)': numpy.random.uniform(low=130, high=200, size=N),
    'Weight (kg)': numpy.random.uniform(low=30, high=100, size=N),
    'Gender': numpy.random.choice(_genders, size=N)
})

fg = seaborn.FacetGrid(data=df, hue='Gender', hue_order=_genders, aspect=1.61)
fg.map(pyplot.scatter, 'Weight (kg)', 'Height (cm)').add_legend()

This immediately produces a scatter plot with one color per gender and a legend.

Old Answer

In this case, I would use matplotlib directly.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

def dfScatter(df, xcol='Height', ycol='Weight', catcol='Gender'):
    fig, ax = plt.subplots()
    # Give each category an evenly spaced number in [0, 1];
    # scatter() maps these numbers through the default colormap.
    categories = np.unique(df[catcol])
    colors = np.linspace(0, 1, len(categories))
    colordict = dict(zip(categories, colors))

    # Note: this adds a "Color" column to the caller's dataframe.
    df["Color"] = df[catcol].map(colordict)
    ax.scatter(df[xcol], df[ycol], c=df["Color"])
    return fig

if __name__ == '__main__':
    df = pd.DataFrame({'Height': np.random.normal(size=10),
                       'Weight': np.random.normal(size=10),
                       'Gender': ["Male","Male","Unknown","Male","Male",
                                  "Female","Did not respond","Unknown","Female","Female"]})
    fig = dfScatter(df)
    fig.savefig('fig1.png')

And that gives me:

Scatter plot with categorized colors

As far as I know, that color column can hold any matplotlib-compatible color (RGBA tuples, HTML names, hex values, etc.).

I’m having trouble getting anything but numerical values to work with the colormaps.
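
One possible way to get non-numeric categories working with a colormap is to draw one scatter series per category, so that each series gets a single color and its own legend entry. The following is only a minimal sketch, assuming a Height/Weight/Gender dataframe like the one in the question; the helper name scatter_by_category and the choice of the tab10 colormap are illustrative assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_by_category(df, xcol, ycol, catcol, cmap_name="tab10"):
    """Hypothetical helper: one scatter series (and legend entry) per category."""
    fig, ax = plt.subplots()
    cmap = plt.get_cmap(cmap_name)
    # groupby iterates over the categories in sorted order
    for i, (name, group) in enumerate(df.groupby(catcol)):
        # cmap(i) with an integer picks the i-th color of a listed colormap
        ax.scatter(group[xcol], group[ycol], color=cmap(i % cmap.N), label=name)
    ax.set_xlabel(xcol)
    ax.set_ylabel(ycol)
    ax.legend(title=catcol)
    return fig, ax


rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Height": rng.normal(size=10),
    "Weight": rng.normal(size=10),
    "Gender": ["Male", "Male", "Unknown", "Male", "Male",
               "Female", "Female", "Unknown", "Female", "Female"],
})
fig, ax = scatter_by_category(df, "Height", "Weight", "Gender")
fig.savefig("fig_by_category.png")
```

Because each category is plotted separately, the legend comes for free and the original dataframe is left unmodified.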

Answered By: Paul H

Answer #2:

Actually, you could use ggplot for Python:

from ggplot import *
import numpy as np
import pandas as pd

df = pd.DataFrame({'Height':np.random.randn(10),
                   'Weight':np.random.randn(10),
                   'Gender': ["Male","Male","Male","Male","Male",
                              "Female","Female","Female","Female","Female"]})


ggplot(aes(x='Height', y='Weight', color='Gender'), data=df)  + geom_point()

ggplot in python

Answered By: Anton Protopopov

Answer #3:

You can use the color parameter to the plot method to define the colors you want for each column. For example:

from pandas import DataFrame
data = DataFrame({'a':range(5),'b':range(1,6),'c':range(2,7)})
colors = ['yellowgreen','cyan','magenta']
data.plot(color=colors)

Three lines with custom colors

You can use color names or hex color codes such as ‘#000000’ for black. You can find the defined color names in matplotlib’s colors.py file. Below is the link to colors.py in matplotlib’s GitHub repo.

https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/colors.py
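
To color individual points by a categorical column rather than whole columns, one option is to map each category to a named color and pass the resulting sequence to the scatter plot. This is a sketch under assumptions: the palette dictionary and the sample dataframe below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Height": rng.normal(size=10),
    "Weight": rng.normal(size=10),
    "Gender": ["Male"] * 5 + ["Female"] * 5,
})

# Hypothetical palette: one matplotlib color name per category
palette = {"Male": "yellowgreen", "Female": "magenta"}

# Look up the color for every row; the result is one color string per point
point_colors = df["Gender"].map(palette)

# `c` accepts a sequence of color strings, one per point
ax = df.plot.scatter(x="Height", y="Weight", c=point_colors.tolist())
ax.figure.savefig("fig_named_colors.png")
```

A category missing from the palette would map to NaN, so in practice the dictionary should cover every value in the column.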

Answered By: tarotcard

Answer #4:

https://seaborn.pydata.org/generated/seaborn.scatterplot.html

import numpy 
import pandas
import seaborn as sns

numpy.random.seed(0)
N = 37
_genders= ['Female', 'Male', 'Non-binary', 'No Response']
df = pandas.DataFrame({
    'Height (cm)': numpy.random.uniform(low=130, high=200, size=N),
    'Weight (kg)': numpy.random.uniform(low=30, high=100, size=N),
    'Gender': numpy.random.choice(_genders, size=N)
})

sns.scatterplot(data=df, x='Height (cm)', y='Weight (kg)', hue='Gender')

Answered By: Egor Ignatenkov
