Seaborn Pairplot Error When Dataset Has NaN Values
Solution 1:
Seaborn's PairGrid class lets you create the plot you want. PairGrid is much more flexible than sns.pairplot. Every PairGrid has three sections: the upper triangle, the lower triangle and the diagonal.
For each section, you can supply a customized plotting function. The upper and lower triangles can take any plotting function that accepts two feature arrays (such as plt.scatter) along with any associated keywords (e.g. marker). The diagonal accepts a plotting function that takes a single feature array as input (such as plt.hist), in addition to the relevant keywords.
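As a rough illustration of the two call signatures described above, here is a minimal sketch. The names pair_panel, diag_panel and build_grid, as well as the random demo data, are made up for this example and are not part of seaborn.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def pair_panel(x, y, **kwargs):
    # Off-diagonal hook: two feature arrays plus any keywords given to map_upper/map_lower.
    plt.scatter(x, y, **kwargs)


def diag_panel(x, **kwargs):
    # Diagonal hook: a single feature array plus any keywords given to map_diag.
    plt.hist(x, **kwargs)


def build_grid(frame):
    # Each section of the grid gets its own function and keywords.
    grid = sns.PairGrid(frame)
    grid.map_upper(pair_panel, marker="x", color="darkred")
    grid.map_lower(pair_panel, marker="o", color="steelblue")
    grid.map_diag(diag_panel, bins=5, color="gray")
    return grid


# Hypothetical demo data: 30 rows, 3 numeric features.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(30, 3)), columns=["a", "b", "c"])
grid = build_grid(demo)
```

Call plt.show() afterwards to display the figure. Seaborn makes the relevant panel the current axes before calling each function, which is why plain pyplot calls end up in the right place.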
For your purpose, you can filter out the NaNs in your customized function(s):
from sklearn import datasets
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

data = datasets.load_iris()
iris = pd.DataFrame(data.data, columns=data.feature_names)

# break iris dataset to create NaNs
iris.iat[1, 0] = np.nan
iris.iat[4, 0] = np.nan
iris.iat[4, 2] = np.nan
iris.iat[5, 2] = np.nan

# create customized scatterplot that first filters out NaNs in feature pair
def scatterFilter(x, y, **kwargs):
    interimDf = pd.concat([x, y], axis=1)
    interimDf.columns = ['x', 'y']
    # keep only the rows where both features are present
    interimDf = interimDf[interimDf.x.notnull() & interimDf.y.notnull()]
    ax = plt.gca()
    ax.plot(interimDf.x.values, interimDf.y.values, 'o', **kwargs)

# Create an instance of the PairGrid class.
# Note: `height` replaces the older `size` argument (renamed in seaborn 0.9).
grid = sns.PairGrid(data=iris, vars=list(iris.columns), height=4)

# Map the NaN-filtering scatter plot to the upper triangle
grid = grid.map_upper(scatterFilter, color='darkred')

# Map a histogram to the diagonal
grid = grid.map_diag(plt.hist, bins=10, edgecolor='k', color='darkred')

# Map the same NaN-filtering scatter plot to the lower triangle
grid = grid.map_lower(scatterFilter, color='darkred')

plt.show()
This yields the following plot (figure: Iris Seaborn PairPlot).
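One reason to filter inside the plotting function, rather than dropping every incomplete row up front, is that each panel then keeps all rows that are complete for its own pair of features. The sketch below compares the two approaches on a small made-up frame; the helper name pairwise_counts and the data are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "a" is missing in rows 1 and 4, "c" in rows 4 and 5.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0],
    "b": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "c": [1.0, 2.0, 3.0, 4.0, np.nan, np.nan],
})


def pairwise_counts(frame):
    # Number of rows in which both features of each pair are present.
    cols = list(frame.columns)
    counts = {}
    for i, left in enumerate(cols):
        for right in cols[i + 1:]:
            counts[(left, right)] = len(frame[[left, right]].dropna())
    return counts


# Rows that survive when any missing value discards the whole row.
rowwise = len(df.dropna())
counts = pairwise_counts(df)

print("rows left after df.dropna():", rowwise)
for pair, n in counts.items():
    print(pair, "rows usable for this panel:", n)
```

If losing a few extra rows is acceptable, calling dropna() on the whole frame before plotting is the simpler route.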
PairGrid also lets you draw contour plots, annotate the panels with descriptive statistics, and so on. For more detail, see here.
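As one possible way to annotate panels with a descriptive statistic, the sketch below writes the Pearson correlation into each upper-triangle panel, computed only on rows where both features are present. The functions pairwise_corr, annotate_corr and hist_no_nan and the demo data are hypothetical names chosen for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def pairwise_corr(x, y):
    # Pearson correlation over rows where both values are present, plus the row count.
    pair = pd.DataFrame({
        "x": np.asarray(x, dtype=float),
        "y": np.asarray(y, dtype=float),
    }).dropna()
    return pair["x"].corr(pair["y"]), len(pair)


def annotate_corr(x, y, **kwargs):
    # Keywords such as color are accepted but not needed for the text label.
    r, n = pairwise_corr(x, y)
    plt.gca().annotate(f"r = {r:.2f} (n = {n})",
                       xy=(0.05, 0.9), xycoords="axes fraction")


def hist_no_nan(x, **kwargs):
    # Drop missing values before drawing the histogram.
    plt.hist(pd.Series(x).dropna(), **kwargs)


# Hypothetical demo data with a few missing values.
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, np.nan, 5.0, 6.0],
    "b": [2.0, 4.0, 6.0, 8.0, np.nan, 12.0],
    "c": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
})

grid = sns.PairGrid(demo)
grid.map_upper(annotate_corr)
grid.map_lower(plt.scatter)
grid.map_diag(hist_no_nan, bins=5)
```

Showing the number of rows next to the coefficient makes it visible that different panels may be based on different subsets of the data.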