"valueerror: Codes Need To Be Between -1 And Len(categories)-1" When Extracting Null Values In Rpy2
Solution 1:
Why is rpy2 not handling this gracefully?
This seems to be a bug triggered during the conversion of the R factor to pandas in rpy2 versions 2.9.x (the default dev branch, the future 3.0.x, does not have this issue). Specifically, it happens when doing:
res = pandas.Categorical.from_codes(numpy.asarray(obj) - 1,
                                    categories=obj.do_slot('levels'),
                                    ordered='ordered' in obj.rclass)
R "factor" objects are vectors of integers, where each integer is an index into an associated vector of "levels". The converter simply subtracts one because R arrays are one-indexed and Python arrays are zero-indexed, but this breaks whenever there are missing values (NAs): R uses a specific integer (an extreme value) to encode a missing integer, and Python, numpy, and pandas have no equivalent for it. pandas instead expects the code -1 for a missing entry, so the shifted value falls outside the allowed range, which is what the error message reports.
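To make the off-by-one problem concrete, here is a minimal pure-Python sketch of the code shift, with and without special handling of NA. It is not the rpy2 converter itself: the helper names `factor_codes_to_pandas_codes` and `in_valid_range` and the sample levels are hypothetical, and the only fact relied on is that R stores an integer NA as the smallest 32-bit signed integer.

```python
# R encodes NA_integer_ as the smallest 32-bit signed integer.
R_NA_INTEGER = -2**31


def factor_codes_to_pandas_codes(r_codes):
    """Map 1-based R factor codes to 0-based codes, sending NA to -1.

    Hypothetical helper for illustration; -1 is the code pandas uses
    for a missing entry in a Categorical.
    """
    return [-1 if code == R_NA_INTEGER else code - 1 for code in r_codes]


def in_valid_range(codes, n_levels):
    """Check the constraint quoted in the error message."""
    return all(-1 <= code <= n_levels - 1 for code in codes)


# Made-up sample data: three levels, one missing observation.
levels = ["low", "medium", "high"]
r_codes = [1, 3, R_NA_INTEGER, 2]

# Naive shift: the NA sentinel is shifted too and ends up out of range.
naive = [code - 1 for code in r_codes]
# Shift with NA mapped to -1 first.
fixed = factor_codes_to_pandas_codes(r_codes)

print(in_valid_range(naive, len(levels)))  # False
print(in_valid_range(fixed, len(levels)))  # True
print(fixed)                               # [0, 2, -1, 1]
```

Codes produced this way satisfy the range that `pandas.Categorical.from_codes` checks, which is the idea behind the third workaround mentioned in this answer (modifying the converter).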
I opened an issue to track this. In the meantime, workarounds are to replace the NAs on the R side with a level (called, say, "missing" or "NA"), to change the factors to arrays of strings, or to modify the pandas converter for R factors. For example:
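import rpy2.robjects as robjects

# Convert every factor column to character on the R side, before the
# conversion to pandas. Assumes SD2011 is already loaded in the R session.
robjects.r("""
library(dplyr)  # makes %>% available
SD2011_nofactor <- SD2011 %>%
    dplyr::mutate_if(is.factor, as.character)
""")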
(Or use rpy2's Pythonic interface to dplyr)
Note:
A few things happen in succession when doing:
robjects.r('SD2011[3, 27]')
- the R code SD2011[3, 27] is evaluated
- the result of that evaluation goes through the robjects-level conversion
- the object resulting from that conversion is shown in your notebook
If unsure, finding which of the Python statements below is the first to fail will tell you which step is responsible:
- Evaluate the R code (the added TRUE is to prevent the evaluation from returning x).
robjects.r('x <- SD2011[3, 27]; TRUE')
- Fetch the object x obtained from the evaluation above and bind it to a Python symbol (the conversion will be applied).
x = robjects.r('x')
- Show a text representation of the converted object.
repr(x)
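The "first statement to fail" check can also be automated. The following is only a sketch: `first_failing_step` is a hypothetical helper, and the three stand-in functions mimic the stages above so that the example runs without R or rpy2 installed.

```python
def first_failing_step(steps):
    """Run (label, callable) pairs in order.

    Returns (label, exception) for the first callable that raises,
    or None if every step succeeds.
    """
    for label, func in steps:
        try:
            func()
        except Exception as exc:
            return label, exc
    return None


# Stand-ins for the three stages; the second one fails like the converter does.
def evaluate():
    return True


def convert():
    raise ValueError("codes need to be between -1 and len(categories)-1")


def show():
    return "text representation"


result = first_failing_step([
    ("evaluate", evaluate),
    ("convert", convert),
    ("show", show),
])
print(result[0])  # convert
```

With rpy2, the stand-ins would be replaced by callables wrapping the three statements listed above, for example lambda: robjects.r('x <- SD2011[3, 27]; TRUE') for the first step.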