Using XGBoost to predict on large sparse data

To run predictions with XGBoost, I wrote code like this:

import xgboost as xgb
from scipy.sparse import load_npz, csr_matrix

# Load the sparse test set and make sure it is float32
test = load_npz('test.npz')
test = csr_matrix(test, dtype='float32')

# Load the trained model and run prediction
xgb_model = xgb.Booster({'n_jobs': -1})
xgb_model.load_model(MODEL_PATH)
result = xgb_model.predict(test)

But it reported an error:

File "/usr/lib/python3.7/site-packages/scipy/sparse/base.py", line 689, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: feature_names not found

Looks like SciPy's csr_matrix isn't accepted by XGBoost's predict() directly; the traceback shows it trying to read feature_names from the input, which only a DMatrix provides. Maybe I need to convert the sparse data to dense:

result = xgb_model.predict(test.todense())

But that just produced another error:

File "/usr/lib/python3.7/site-packages/scipy/sparse/base.py", line 1187, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError
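
A quick back-of-the-envelope calculation shows why; the shape below is purely hypothetical, chosen only to illustrate the scale:

# Hypothetical matrix shape, just for illustration
rows, cols = 1_000_000, 100_000

# A dense float32 array stores 4 bytes per cell, zeros included
dense_bytes = rows * cols * 4
print(dense_bytes / 1024**3)   # ~372.5 GiB, far beyond typical RAM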

The ‘test’ data is so big that it can’t even be converted to a dense array! XGBoost’s predict() won’t take the raw sparse matrix, and the sparse data can’t be densified. So what should I do?

Actually, the solution is incredibly simple: just wrap the data in XGBoost’s DMatrix!

result = xgb_model.predict(xgb.DMatrix(test))
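
Putting it all together, here is a minimal sketch of the whole prediction path. xgb.DMatrix accepts SciPy sparse matrices directly and stores the data sparsely, so no dense copy is ever materialized (MODEL_PATH below is a hypothetical placeholder, as in the snippet above):

import xgboost as xgb
from scipy.sparse import load_npz, csr_matrix

MODEL_PATH = 'xgb.model'   # hypothetical path; point this at your saved model

# Load the sparse test set; it stays in CSR format throughout
test = load_npz('test.npz')
test = csr_matrix(test, dtype='float32')

# Load the trained booster
xgb_model = xgb.Booster({'n_jobs': -1})
xgb_model.load_model(MODEL_PATH)

# DMatrix wraps the CSR matrix directly, so memory use stays
# proportional to the number of non-zero entries
result = xgb_model.predict(xgb.DMatrix(test))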

