
Seaborn Pairplot Error When Dataset Has Nan Values

I have a pandas DataFrame with several numeric columns; the first column of each row holds categorical data. There are NaN values (and zeros) in multiple rows.

Solution 1:

Seaborn's PairGrid class lets you create the plot you want. PairGrid is much more flexible than sns.pairplot. Every PairGrid has three sections: the upper triangle, the lower triangle and the diagonal.

For each part, you can define a customized plotting function. The upper and lower triangle sections can take any plotting function that accepts two arrays of features (such as plt.scatter) as well as any associated keywords (e.g. marker). The diagonal section accepts a plotting function that has a single feature array as input (such as plt.hist) in addition to the relevant keywords.

For your purpose, you can filter out the NaNs in your customized function(s):

from sklearn import datasets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

data = datasets.load_iris()
iris = pd.DataFrame(data.data, columns=data.feature_names)

# break iris dataset to create NaNs
iris.iat[1, 0] = np.nan
iris.iat[4, 0] = np.nan
iris.iat[4, 2] = np.nan
iris.iat[5, 2] = np.nan

# customized scatterplot that first filters out NaNs in the feature pair
def scatterFilter(x, y, **kwargs):
    # keep only the rows where both features are present
    interimDf = pd.concat([x, y], axis=1)
    interimDf.columns = ['x', 'y']
    interimDf = interimDf.dropna()

    # PairGrid makes the target panel the current axes before calling this function
    ax = plt.gca()
    ax.plot(interimDf.x.values, interimDf.y.values, 'o', **kwargs)

# Create an instance of the PairGrid class.
# Note: the `size` argument was renamed to `height` in seaborn 0.9.
grid = sns.PairGrid(data=iris, vars=list(iris.columns), height=4)

# Map a scatter plot to the upper triangle
grid = grid.map_upper(scatterFilter, color='darkred')

# Map a histogram to the diagonal
grid = grid.map_diag(plt.hist, bins=10, edgecolor='k', color='darkred')

# Map the same NaN-filtering scatter plot to the lower triangle
grid = grid.map_lower(scatterFilter, color='darkred')

This yields the following plot:

[Figure: Iris Seaborn PairPlot]
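One reason to filter inside the plotting function, rather than dropping incomplete rows from the whole DataFrame first, is that each panel then keeps every row that is complete for its own pair of features. The following is a minimal sketch of that difference without any plotting; the frame `df`, its column names and the helper `pair_without_nan` are hypothetical and only serve as illustration:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with NaNs scattered across columns
# (the column names a, b, c are illustrative, not taken from iris).
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0, 5.0],
    "b": [1.0, 2.0, np.nan, 4.0, 5.0],
    "c": [1.0, 2.0, 3.0, np.nan, 5.0],
})


def pair_without_nan(x, y):
    """Return x and y restricted to the rows where both are present."""
    keep = x.notna() & y.notna()
    return x[keep], y[keep]


# Dropping every row that has a NaN in any column leaves only rows 0 and 4.
rows_after_global_drop = len(df.dropna())

# Filtering per pair keeps rows 0, 3 and 4 for the (a, b) panel.
xa, yb = pair_without_nan(df["a"], df["b"])
rows_for_pair_ab = len(xa)

print(rows_after_global_drop, rows_for_pair_ab)  # prints: 2 3
```

A function like scatterFilter applies the same per-pair mask, so a NaN in one column only removes points from the panels that involve that column.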

PairGrid also lets you draw contour plots, annotate the panels with descriptive statistics, and so on. For more detail, see here.

