
[autoML][python][mljar] automated machine learning - Part2 : mljar.version2

by Chandler.j 2022. 4. 5.

fig1. title

1. previous posting (install)

2. mode (built-in, custom)

3. model save, load, predict, predict probability

4. features importance


1. previous posting (install)

For installation and a brief introduction, see the previous post.

2021.05.26 - [Data Insider] - [python][mljar] automated machine learning - Part2 : mljar

 


Reportedly it can also be used from R, through an R wrapper for the MLJAR API.

https://github.com/mljar/mljar-api-R

 



2. mode (built-in, custom)

fig2. types of mode

 

built-in : four modes are provided by default; choosing the one that matches your goal keeps the code short.

| Mode | Weight | Purpose | Algorithms |
|---|---|---|---|
| Explain | light | Data exploration and a quick search for simple ML models | Baseline, Linear, Decision Tree, Random Forest, XGBoost, Neural Network |
| Perform | medium | ML pipeline; 5-fold cross-validation | Linear, Random Forest, LightGBM, XGBoost, CatBoost, Neural Network |
| Compete | heavy | Competitions; deep ML pipeline; advanced feature engineering | Linear, Decision Tree, Random Forest, Extra Trees, XGBoost, LightGBM, CatBoost, Neural Network, Ensemble, Stacked Ensemble |
| Optuna | super-heavy | When there is no time constraint; 10-fold cross-validation | Random Forest, Extra Trees, LightGBM, XGBoost, CatBoost |

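from supervised.automl import AutoML

# Pick one of the four built-in modes
automl = AutoML(mode="Explain")
automl = AutoML(mode="Perform")
automl = AutoML(mode="Compete")
# optuna_time_budget: seconds Optuna may spend tuning each algorithm
automl = AutoML(mode="Optuna", optuna_time_budget=3600)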
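
To compare the built-in modes on the same data, it helps to give each run its own output directory. The sketch below is one way to set that up. The helper names `build_mode_configs` and `fit_all_modes` and the `automl_runs` directory are my own, not part of mljar; only `mode`, `results_path` and `optuna_time_budget` are taken from the AutoML constructor.

```python
def build_mode_configs(base_dir="automl_runs", optuna_time_budget=3600):
    """Return one AutoML keyword-argument dict per built-in mode (hypothetical helper)."""
    configs = {}
    for mode in ("Explain", "Perform", "Compete", "Optuna"):
        # Each mode writes to its own directory so the reports do not collide.
        kwargs = {"mode": mode, "results_path": f"{base_dir}/{mode.lower()}"}
        if mode == "Optuna":
            # Only the Optuna mode gets a tuning budget (seconds per algorithm).
            kwargs["optuna_time_budget"] = optuna_time_budget
        configs[mode] = kwargs
    return configs


def fit_all_modes(X, y, configs):
    """Train one AutoML per config; requires mljar-supervised to be installed."""
    from supervised.automl import AutoML  # imported lazily on purpose

    fitted = {}
    for mode, kwargs in configs.items():
        automl = AutoML(**kwargs)
        automl.fit(X, y)
        fitted[mode] = automl
    return fitted
```

Calling `fit_all_modes(X_train, y_train, build_mode_configs())` would train all four modes in turn, which can take a long time; in practice it is probably better to drop the heavier modes from the dict first.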

 

custom : configure the search to suit your own needs.

automl = AutoML(
    algorithms=["CatBoost", "Xgboost", "LightGBM"],
    model_time_limit=30*60,
    start_random_models=10,
    hill_climbing_steps=3,
    top_models_to_improve=3,
    golden_features=True,
    features_selection=False,
    stack_models=True,
    train_ensemble=True,
    
    # explain_level controls how much explanation output is produced:
    #   0 = none
    #   1 = learning curves, importance plots (permutation method),
    #       tree plots for decision trees, saved coefficients for linear models
    #   2 = everything in level 1 plus SHAP
    explain_level=0,
    
    validation_strategy={
        "validation_type": "kfold",
        "k_folds": 4,
        "shuffle": False,
        "stratify": True,
    }
)
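
The `validation_strategy` above asks for 4 folds with `"stratify": True` and `"shuffle": False`. As a rough illustration of what stratifying means, here is a pure-Python sketch that deals the samples of each class out to the folds in turn, so every fold ends up with a similar class mix. This is a generic illustration with a hypothetical function name, not mljar's internal code, and the exact fold assignment in a real run may differ.

```python
from collections import defaultdict


def stratified_fold_ids(labels, k_folds=4):
    """Assign a fold id to every sample so each fold gets a similar class mix."""
    fold_ids = []
    seen_per_class = defaultdict(int)
    for label in labels:
        # The n-th sample of a class goes to fold n % k_folds (no shuffling).
        fold_ids.append(seen_per_class[label] % k_folds)
        seen_per_class[label] += 1
    return fold_ids


# Toy example: 8 negatives and 4 positives spread over 4 folds.
labels = [0] * 8 + [1] * 4
fold_ids = stratified_fold_ids(labels, k_folds=4)
for fold in range(4):
    members = [labels[i] for i, f in enumerate(fold_ids) if f == fold]
    print(fold, members)
```

Because nothing is shuffled, the assignment is fully determined by the row order, which mirrors the `"shuffle": False` setting.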

https://supervised.mljar.com/features/modes/

 



3. model save, load, predict, predict probability

 

model save

  • Once automl.fit() has run and written its output, there is no need to save the model separately
  • The output directory can be set with results_path in the AutoML config
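from supervised.automl import AutoML

SEED = 42  # assumed value; the original does not show how SEED is defined
automl = AutoML(
    mode="Compete",
    ml_task="binary_classification",
    eval_metric="auc",
    train_ensemble=False,
    stack_models=False,
    random_state=SEED,
    results_path="test1",  # models and reports are written to this directory
)
# X_train, y_train and weights (per-row sample weights) are assumed to be defined
automl.fit(X_train, y_train, sample_weight=weights)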

model load

  • Just point AutoML at the same results_path
automl = AutoML(results_path="test1")

predict

  • Predictions can be made on new data, not only on the training data
automl.predict(X_test)

output of predict()

predict probability

  • Class probabilities can be computed as well
automl.predict_proba(X_test)

output of predict_proba()
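
Putting the four steps of this section together, a minimal sketch could look like the following. The function name `fit_reload_predict` is hypothetical, the "Explain" mode is chosen only to keep training short, and the data arguments are assumed to be pandas or numpy objects you already have.

```python
def fit_reload_predict(X_train, y_train, X_test, results_path="test1"):
    """Train once, reload from disk, then predict with the reloaded object."""
    # Imported lazily so the sketch can be read without mljar-supervised installed.
    from supervised.automl import AutoML

    # save: fit() writes every trained model under results_path
    trainer = AutoML(mode="Explain", results_path=results_path)
    trainer.fit(X_train, y_train)

    # load: a fresh AutoML pointed at the same directory reuses the saved models
    loaded = AutoML(results_path=results_path)

    # predict / predict probability on data the models have not seen
    labels = loaded.predict(X_test)
    probabilities = loaded.predict_proba(X_test)
    return labels, probabilities
```

Reloading into a separate object, as done here, is a quick way to check that the saved directory alone is enough to reproduce the predictions.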


4. features importance

  • In the Explain and Perform modes, feature importance is also available
  • See each model's README in the results path, or look for the image files inside the model directories
  • This can be customized with the explain_level parameter (explain_level=2 adds SHAP)
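
To locate those files without clicking through every model directory, a small helper like the one below may be enough. It is a sketch under an assumption: that the importance outputs have "importance" somewhere in their file names. Check the names in your own results_path, since they can vary between versions. The function name is hypothetical.

```python
from pathlib import Path


def find_importance_files(results_path):
    """Map each model directory name to its files whose names mention 'importance'."""
    found = {}
    root = Path(results_path)
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Assumption: importance plots and tables carry "importance" in the file name.
        matches = sorted(
            f.name
            for f in model_dir.iterdir()
            if f.is_file() and "importance" in f.name.lower()
        )
        if matches:
            found[model_dir.name] = matches
    return found
```

For example, `find_importance_files("test1")` would return a dict keyed by model directory, skipping any model that produced no importance output.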

 

Summary : mljar turns out to be a far more useful tool than I expected.

