The first time I noticed PyCaret was on Google Chrome's recommendation page. Recently I found some time to test it.
My first test program failed because one column contained numbers like 1000000000123456789, which caused this error:
... MemoryError: Unable to allocate 711. PiB for an array with shape (1000000000123456789,) and data type int64
Seems PyCaret isn't robust enough, right? My guess is that some preprocessing step treated the huge IDs as array indices or sizes, so NumPy tried to allocate an array with roughly 10^18 elements.
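A minimal sketch of that guess (np.bincount stands in for whatever index-style operation actually ran; it is not PyCaret's real code path):

import numpy as np

# A single huge ID like the ones in my column.
ids = np.array([1000000000123456789], dtype=np.int64)

# bincount needs an array of length max(ids) + 1, about 10^18 int64
# slots, so it dies with a PiB-scale MemoryError.
np.bincount(ids)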
After I removed that column, I still got an error about display_id. That one was my fault: I was running the script from a plain console instead of a Jupyter Notebook, and PyCaret's progress display relies on IPython.
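A workaround sketch, assuming PyCaret 2.x: setup() takes an html flag, and the docs say it must be set to False when the environment does not support IPython, which covers plain-console runs like mine (df stands for the training frame):

from pycaret.classification import setup

# html=False turns off the IPython-based progress display so setup()
# works from a regular console or script.
clf = setup(data=df, target="target", session_id=1023, html=False)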
Once everything was set up:

import pandas as pd
from pycaret.classification import setup, compare_models

# Read a 10,000-row sample and drop the ID column that broke the first run.
df = pd.read_csv(TRAIN_CSV_FILE, nrows=10000).drop(columns=["custom_id"])
clf = setup(data=df, target="target", session_id=1023)
compare_models(verbose=False)
I got this result:

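By the way, when running from a console, the same comparison grid can be captured as a DataFrame; a small sketch, assuming PyCaret's pull() helper (present in 2.x and later):

from pycaret.classification import compare_models, pull

best = compare_models(verbose=False)  # returns the best model by default
results = pull()  # the last displayed scoring grid as a pandas DataFrame
print(results.head())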
Because I had already gotten 0.88 AUC from LightGBM, the 0.88 AUC of GradientBoostingClassifier was no big surprise. Since LightGBM and GradientBoostingClassifier are both tree-based boosters, ensembling them probably would not add much diversity. But while looking through the results, an idea jumped out at me: why not try AdaBoostClassifier with LogisticRegression as the base estimator?
I will give it a try later.
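For reference, a minimal sketch of that idea in plain scikit-learn (X and y are placeholders for the real features and target; on scikit-learn older than 1.2 the keyword is base_estimator instead of estimator):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for the real features and target.
X, y = make_classification(n_samples=1000, n_features=20, random_state=1023)

# AdaBoost reweights samples each round, so the base estimator must
# support sample_weight; LogisticRegression does.
ada_lr = AdaBoostClassifier(
    estimator=LogisticRegression(max_iter=1000),
    n_estimators=50,
    random_state=1023,
)

print(cross_val_score(ada_lr, X, y, scoring="roc_auc", cv=5).mean())

If I read the docs right, PyCaret's create_model also accepts a custom scikit-learn estimator object, so the same combination could be dropped straight into the setup pipeline above.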