Understanding Format Of Data In Scikit-learn

I am trying to do multi-label text classification with scikit-learn in Python 3.x. I have data in libsvm format, which I am loading with the load_svmlight_file function. The data

Solution 1:

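This has nothing to do with multilabel classification per se. The feature matrix X that you get from load_svmlight_file is a SciPy CSR (Compressed Sparse Row) matrix, as explained in the docs. Printing such a matrix does not show a grid of numbers; it lists one "(row, column) value" line per stored nonzero entry, which is a rather unfortunate format: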

>>> from scipy.sparse import csr_matrix
>>> X = csr_matrix([[0, 0, 1], [2, 3, 0]])
>>> X
<2x3 sparse matrix of type '<class 'numpy.int64'>'
    with 3 stored elements in Compressed Sparse Row format>
>>> X.toarray()
array([[0, 0, 1],
       [2, 3, 0]])
>>> print(X)
  (0, 2)    1
  (1, 0)    2
  (1, 1)    3
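
To make the printed format easier to read, here is a minimal sketch that rebuilds the same toy matrix and pulls out the (row, column, value) triplets that print(X) lists, along with the three arrays CSR stores internally. It uses only SciPy; the variable names (coo, triplets, dense) are illustrative and not part of the original answer.

```python
from scipy.sparse import csr_matrix

# Same toy matrix as in the session above.
X = csr_matrix([[0, 0, 1], [2, 3, 0]])

# Each line of print(X) is one stored entry: (row, column) value.
# Converting to COO format exposes those triplets as three parallel arrays.
coo = X.tocoo()
triplets = [(int(r), int(c), int(v)) for r, c, v in zip(coo.row, coo.col, coo.data)]
print(triplets)

# CSR itself keeps the values, their column indices, and row pointers:
# row i owns the slice data[indptr[i]:indptr[i + 1]].
print(X.data.tolist())     # stored values
print(X.indices.tolist())  # column index of each stored value
print(X.indptr.tolist())   # where each row starts in data/indices

# For small matrices, a dense copy is the easiest thing to inspect.
dense = X.toarray()
print(dense.shape)
```

The same inspection should work on the X returned by load_svmlight_file, since that is also a CSR matrix. Call toarray() only when the matrix is small enough to fit in memory as a dense array.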
