Contents
1. mljar : automated machine learning
2. install - pip, conda
3. run code
4. report
1. mljar : automated machine learning
- A framework for automating machine learning (see the figure on the reference page)
Reference: https://mljar.com/automated-machine-learning/
2. install - pip, conda
Installation was originally possible only with pip; a conda-forge package is now available as well.
pip install mljar-supervised
Reference: https://pypi.org/project/mljar-supervised/
conda install -c conda-forge mljar-supervised
Reference: https://anaconda.org/conda-forge/mljar-supervised
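After installing with either command, it can help to confirm that the package is visible to the Python interpreter you plan to use. The sketch below relies only on the standard library; `installed_version` is a helper name made up for this post, not part of mljar-supervised.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for this post; not part of mljar-supervised.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "mljar-supervised" is the distribution name used by the pip command above.
version = installed_version("mljar-supervised")
print("mljar-supervised:", version if version else "not installed")
```

If this prints "not installed", the install most likely went into a different environment than the one running the script.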
3. run code
Source code: https://github.com/mljar/mljar-supervised
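Three modes are commonly used:
1. Optuna, 2. Explain, 3. Perform. Pick the one that fits your purpose; in practice, just use whichever gives the best performance.
As far as I know, the modes differ in how many hyperparameter configurations they tune.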
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML
from sklearn.utils.class_weight import compute_sample_weight

# SEED is used as random_state below but was never defined; 42 is an arbitrary choice
SEED = 42

# load the data and split it into features X and target y
df = pd.read_csv("data/60model_train_share.csv")
X_train = df.drop('eGFR_ab', axis=1)
y_train = df['eGFR_ab'].astype("int64")
# per-sample weights that balance the classes
weights = compute_sample_weight(class_weight="balanced", y=y_train)

# 1. Optuna
automl = AutoML(mode="Optuna", ml_task="binary_classification",
                algorithms=["CatBoost"], eval_metric='auc',
                optuna_time_budget=10*60,
                total_time_limit=24*3600,
                golden_features=False,
                features_selection=False,
                train_ensemble=True,
                stack_models='auto',
                random_state=SEED, results_path="optuna")
automl.fit(X_train, y_train, sample_weight=weights)

# 2. Explain
automl = AutoML(mode="Explain", ml_task="binary_classification",
                algorithms=["Baseline", "CatBoost", "Xgboost", "Random Forest",
                            "Extra Trees", "LightGBM", "Neural Network"],
                eval_metric='auc',
                train_ensemble=False,
                random_state=SEED,
                results_path="explain-wt")
automl.fit(X_train, y_train, sample_weight=weights)

# 3. Perform
automl = AutoML(mode="Perform", ml_task="binary_classification",
                algorithms=["CatBoost", "Xgboost"],
                eval_metric='auc',
                golden_features=False,
                features_selection=False,
                train_ensemble=False,
                stack_models=False,
                random_state=SEED,
                results_path="perform")
automl.fit(X_train, y_train, sample_weight=weights)
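
The code above passes weights from `compute_sample_weight(class_weight="balanced", ...)` to every `fit` call. To show what those weights look like, here is a pure-Python sketch of the balanced formula, n_samples / (n_classes * class count). The function name and the toy labels are made up for illustration; the real run uses scikit-learn's function on the `eGFR_ab` column.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Weight each sample by n_samples / (n_classes * count of its class).

    Illustrative re-implementation of the "balanced" formula, not the
    scikit-learn function itself.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[label]) for label in labels]


# Toy labels (made up): 8 negatives and 2 positives.
toy_labels = [0] * 8 + [1] * 2
toy_weights = balanced_sample_weights(toy_labels)
print(toy_weights[0], toy_weights[-1])  # prints: 0.625 2.5
```

Each minority-class sample counts four times as much as a majority-class sample here, so both classes contribute the same total weight (5.0 each).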
4. report
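It also runs in Jupyter and other environments, but Visual Studio Code works best for viewing the results.
- The report comes out in a very easy-to-read form
- You can pull out the best-performing model
- A good model can be obtained in a relatively short time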
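
One way to find the best-performing model after a run is to read the leaderboard file from the results folder. The sketch below assumes the folder given as `results_path` (for example `perform`) contains a `leaderboard.csv` with `name` and `metric_value` columns; that layout is an assumption about mljar-supervised's output, so check it against your own results folder. `best_model_row` is a helper name made up for this post.

```python
import csv


def best_model_row(leaderboard_csv, higher_is_better=True):
    """Return the leaderboard row with the best metric_value.

    Assumes a CSV with at least "name" and "metric_value" columns
    (an assumption about the results folder, not a documented guarantee).
    Use higher_is_better=True for metrics such as AUC and False for
    metrics such as logloss.
    """
    with open(leaderboard_csv, newline="") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError("leaderboard is empty")
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: float(row["metric_value"]))


# Example call, assuming the "perform" run above has finished:
# best = best_model_row("perform/leaderboard.csv")
# print(best["name"], best["metric_value"])
```

mljar-supervised should also be able to reload a finished run by constructing `AutoML(results_path=...)` with the same folder and calling `predict`, but verify this against the version you have installed.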
Additional note
There are also reports that MLJAR showed strong performance compared with other AutoML frameworks.