How To Reverse Label Encoder From Sklearn For Multiple Columns?
Solution 1:
In order to inverse transform the data, you need to remember the encoders that were used to transform every column. A possible way to do this is to save the LabelEncoders in a dict inside your object. The way it would work:
- when you call fit, the encoders for every column are fit and saved
- when you call transform, they get used to transform the data
- when you call inverse_transform, they get used to do the inverse transformation
Example code:
from sklearn.preprocessing import LabelEncoder

class MultiColumnLabelEncoder:
    def __init__(self, columns=None):
        self.columns = columns  # array of column names to encode

    def fit(self, X, y=None):
        # fit one LabelEncoder per column and keep it for later
        self.encoders = {}
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            self.encoders[col] = LabelEncoder().fit(X[col])
        return self

    def transform(self, X):
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].transform(X[col])
        return output

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def inverse_transform(self, X):
        # reuse the saved encoders to map the integer codes back to labels
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].inverse_transform(X[col])
        return output
You can then use it like this:
import pandas as pd

multi = MultiColumnLabelEncoder(columns=['city', 'size'])
df = pd.DataFrame({'city': ['London', 'Paris', 'Moscow'],
                   'size': ['M', 'M', 'L'],
                   'quantity': [12, 1, 4]})
X = multi.fit_transform(df)
print(X)
#    city  size  quantity
# 0     0     1        12
# 1     2     1         1
# 2     1     0         4
inv = multi.inverse_transform(X)
print(inv)
#      city size  quantity
# 0  London    M        12
# 1   Paris    M         1
# 2  Moscow    L         4
There could be a separate implementation of fit_transform that would call the same method of the LabelEncoders. Just make sure to keep the encoders around for when you need the inverse transformation.
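A separate fit_transform along those lines could look roughly like the sketch below. It is written as two standalone functions rather than as a method of the class, and the names fit_transform_columns and inverse_transform_columns are made up for this illustration. The point is only that LabelEncoder.fit_transform is called once per column and the fitted encoders are returned so they stay available for the inverse step.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def fit_transform_columns(X, columns):
    """Hypothetical helper: encode the given columns in one pass.

    Returns the encoded copy of X and a dict of fitted encoders.
    """
    encoders = {}
    output = X.copy()
    for col in columns:
        encoders[col] = LabelEncoder()
        # LabelEncoder.fit_transform fits and encodes the column together
        output[col] = encoders[col].fit_transform(X[col])
    return output, encoders


def inverse_transform_columns(X, encoders):
    """Hypothetical helper: map the codes back using the saved encoders."""
    output = X.copy()
    for col, encoder in encoders.items():
        output[col] = encoder.inverse_transform(X[col])
    return output


df = pd.DataFrame({'city': ['London', 'Paris', 'Moscow'],
                   'size': ['M', 'M', 'L'],
                   'quantity': [12, 1, 4]})
encoded, encoders = fit_transform_columns(df, ['city', 'size'])
restored = inverse_transform_columns(encoded, encoders)
```

Columns that are not listed (quantity here) are copied through untouched, the same as in the class above.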
Solution 2:
You do not need to modify it this way. It's already implemented as the method inverse_transform.
Example:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
df = ["paris", "paris", "tokyo", "amsterdam"]
le_fitted = le.fit_transform(df)
inverted = le.inverse_transform(le_fitted)
print(inverted)
# ['paris' 'paris' 'tokyo' 'amsterdam']
Solution 3:
LabelEncoder() should only be used to encode the target. That's why you can't use it on multiple columns at the same time, the way you can with other transformers. The alternative is the OrdinalEncoder, which does the same job as LabelEncoder but can be used on all categorical columns at the same time, just like OneHotEncoder:
from sklearn.preprocessing import OrdinalEncoder
oe = OrdinalEncoder()
X = oe.fit_transform(X)
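Since the question is about reversing the encoding, here is a minimal sketch of the round trip with OrdinalEncoder. It assumes a small made-up DataFrame and a hypothetical cat_cols list naming the categorical columns. Note that OrdinalEncoder returns floats by default, so the encoded columns hold 0.0, 1.0, ... rather than integers.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Illustrative data; cat_cols is an assumed list of categorical columns
df = pd.DataFrame({'city': ['London', 'Paris', 'Moscow'],
                   'size': ['M', 'M', 'L'],
                   'quantity': [12, 1, 4]})
cat_cols = ['city', 'size']

oe = OrdinalEncoder()

# Encode all categorical columns with a single fitted encoder
encoded = df.copy()
encoded[cat_cols] = oe.fit_transform(df[cat_cols])

# The same encoder object reverses every column at once
decoded = encoded.copy()
decoded[cat_cols] = oe.inverse_transform(encoded[cat_cols])
```

Here only one object (oe) has to be kept around, instead of one encoder per column.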