[autoML][python][mljar] automated machine learning

1. previous posting (install)

2. mode (built-in, custom)

3. model save, load, predict, predict probability

4. features importance

1. previous posting (install)

For installation and a short introduction, see the previous post.

2021.05.26 - [Data Insider] - [python][mljar] automated machine learning - Part2 : mljar

datainsider.tistory.com

It is reportedly also usable from R.

https://github.com/mljar/mljar-api-R

2. mode (built-in, custom)

built-in : four modes are provided by default; picking the one whose characteristics fit the task keeps the code short.

Mode	Weight	Purpose	Algorithms used
Explain	light	Data exploration and a quick search over simple ML models	Baseline, Linear, Decision Tree, Random Forest, Xgboost, Neural Network
Perform	medium	ML pipeline with 5-fold cross-validation	Linear, Random Forest, LightGBM, Xgboost, CatBoost, Neural Network
Compete	heavy	Competitions: deep ML pipeline with advanced feature engineering	Linear, Decision Tree, Random Forest, Extra Trees, Xgboost, LightGBM, CatBoost, Neural Network, Ensemble, Stacked Ensemble
Optuna	super-heavy	When there is no time constraint; 10-fold cross-validation	Random Forest, Extra Trees, LightGBM, Xgboost, CatBoost

from supervised.automl import AutoML

# Pick one of the four built-in modes.
automl = AutoML(mode="Explain")
automl = AutoML(mode="Perform")
automl = AutoML(mode="Compete")
automl = AutoML(mode="Optuna", optuna_time_budget=3600)  # budget in seconds
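
As a rough illustration of choosing among the modes in the table above, here is a minimal sketch that maps a goal to the keyword arguments for AutoML. The goal names and the helper `automl_kwargs` are hypothetical and not part of mljar-supervised; only `mode` and `optuna_time_budget` come from the lines above.

```python
# Hypothetical goal names mapped to the built-in modes from the table above.
MODE_BY_GOAL = {
    "explore": "Explain",
    "production": "Perform",
    "competition": "Compete",
    "no_time_limit": "Optuna",
}


def automl_kwargs(goal, optuna_time_budget=3600):
    """Return keyword arguments for AutoML(...) for a given goal."""
    if goal not in MODE_BY_GOAL:
        raise ValueError(
            f"unknown goal {goal!r}; expected one of {sorted(MODE_BY_GOAL)}"
        )
    kwargs = {"mode": MODE_BY_GOAL[goal]}
    if kwargs["mode"] == "Optuna":
        # Passed through unchanged, as in the Optuna line above (seconds).
        kwargs["optuna_time_budget"] = optuna_time_budget
    return kwargs


print(automl_kwargs("explore"))
print(automl_kwargs("no_time_limit"))

# With mljar-supervised installed, the result would be used like this:
# from supervised.automl import AutoML
# automl = AutoML(**automl_kwargs("production"))
```

Keeping the choice in a plain dictionary makes it easy to switch modes from a config file or command-line flag without editing the training script.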

custom : configure everything to your own preference

automl = AutoML(
    algorithms=["CatBoost", "Xgboost", "LightGBM"],
    model_time_limit=30*60,
    start_random_models=10,
    hill_climbing_steps=3,
    top_models_to_improve=3,
    golden_features=True,
    features_selection=False,
    stack_models=True,
    train_ensemble=True,
    
    explain_level=0,
    # 0 = no explanations
    # 1 = learning curves, importance plot (permutation method),
    #     tree plots for decision trees, saved coefficients for linear models
    # 2 = everything in 1, plus SHAP explanations
    
    validation_strategy={
        "validation_type": "kfold",
        "k_folds": 4,
        "shuffle": False,
        "stratify": True,
    }
)
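
To make the `validation_strategy` above more concrete, here is a minimal sketch of what a stratified 4-fold split without shuffling means. It is an illustration in plain Python only, not the code mljar-supervised or scikit-learn runs internally (their fold assignment may differ), and `stratified_kfold_indices` is a hypothetical helper name.

```python
from collections import defaultdict


def stratified_kfold_indices(labels, k_folds=4):
    """Split row indices into k folds with similar class proportions.

    No shuffling: rows of each class are dealt to folds in original order.
    Returns a list of (train_indices, validation_indices) pairs.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k_folds)]
    for indices in by_class.values():
        # Deal this class round-robin so each fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % k_folds].append(idx)

    splits = []
    for fold_id in range(k_folds):
        validation = sorted(folds[fold_id])
        train = sorted(
            idx
            for other in range(k_folds)
            if other != fold_id
            for idx in folds[other]
        )
        splits.append((train, validation))
    return splits


# Toy target: 8 rows of class 0 followed by 4 rows of class 1.
labels = [0] * 8 + [1] * 4
for train, validation in stratified_kfold_indices(labels, k_folds=4):
    print(len(train), validation)
```

Each validation fold here holds two rows of class 0 and one of class 1, the same 2:1 ratio as the full target, which is the point of `"stratify": True`.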

https://supervised.mljar.com/features/modes/

3. model save, load, predict, predict probability

model save

Once automl.fit() has run and produced its output, there is no need to save the model separately.
The output directory can be set with the results_path argument of AutoML.

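from supervised.automl import AutoML

# SEED, X_train, y_train and weights are assumed to be defined earlier.
automl = AutoML(
    mode="Compete",
    ml_task="binary_classification",
    eval_metric="auc",
    train_ensemble=False,
    stack_models=False,
    random_state=SEED,
    results_path="test1",  # trained models and reports are written here
)
automl.fit(X_train, y_train, sample_weight=weights)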
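
Since everything from fit() ends up under results_path, it may be convenient to give each experiment its own directory. A minimal sketch using only the standard library; `fresh_results_path` is a hypothetical helper and not part of mljar-supervised.

```python
import os
import tempfile


def fresh_results_path(base):
    """Return `base` if unused, otherwise the first free base_1, base_2, ..."""
    if not os.path.exists(base):
        return base
    suffix = 1
    while os.path.exists(f"{base}_{suffix}"):
        suffix += 1
    return f"{base}_{suffix}"


# Demonstration inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as workdir:
    base = os.path.join(workdir, "test1")
    first = fresh_results_path(base)
    os.makedirs(first)
    second = fresh_results_path(base)
    print(os.path.basename(first), os.path.basename(second))
```

The returned string would then be passed as `results_path=...` when constructing AutoML.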

model load