[python][mljar] automated machine learning

Contents

1. mljar : automated machine learning

2. install - pip, conda

3. run code

4. report

1. mljar : automated machine learning

- A framework that automates the machine learning workflow (see figure)

Reference: https://mljar.com/automated-machine-learning/

2. install - pip, conda

Installation works with pip. conda is now supported as well, through the conda-forge channel.

pip install mljar-supervised

Reference: https://pypi.org/project/mljar-supervised/

conda install -c conda-forge mljar-supervised

Reference: https://anaconda.org/conda-forge/mljar-supervised
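
After installing with either pip or conda, it can help to confirm that the package is visible from the Python environment you plan to use. A minimal sketch using only the standard library; the helper name `installed_version` is made up for this example and is not part of mljar.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="mljar-supervised"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
print("mljar-supervised:", found if found else "not installed")
```

If this prints "not installed", the install most likely went into a different environment than the one running the script.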

3. run code

Source code: https://github.com/mljar/mljar-supervised

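Three modes are commonly used:

1. Optuna, 2. Explain, 3. Perform. ~~Pick the one that suits your purpose.~~ In practice, just use whichever performs best.

As far as I know, the modes differ in how much hyperparameter tuning they do.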

import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from supervised.automl import AutoML
from sklearn.utils.class_weight import compute_sample_weight

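# Load the training data and split it into features X and target y
df = pd.read_csv("data/60model_train_share.csv")

X_train = df.drop('eGFR_ab', axis=1)
y_train = df['eGFR_ab'].astype("int64")

# Per-sample weights that compensate for class imbalance in the target
weights = compute_sample_weight(class_weight="balanced", y=y_train)

# Seed passed as random_state to every AutoML run below
# (42 is an arbitrary choice; the value used originally is not given)
SEED = 42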
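
The `class_weight="balanced"` option above gives every sample the weight of its class, computed as n_samples / (n_classes * class_count), so rare classes count for more. A minimal pure-Python sketch of that formula, for illustration only; `balanced_sample_weights` is a made-up name, and the real work in this post is done by scikit-learn.

```python
from collections import Counter


def balanced_sample_weights(y):
    """Weight each sample by n_samples / (n_classes * count of its class)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[label]) for label in y]


# Toy target with three negatives and one positive
example_labels = [0, 0, 0, 1]
example_weights = balanced_sample_weights(example_labels)
print(example_weights)
```

In this toy example the single positive sample gets weight 2.0 and each negative gets 2/3, so both classes contribute the same total weight.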

# 1. Optuna

automl = AutoML(
    mode="Optuna",
    ml_task="binary_classification",
    algorithms=["CatBoost"],
    eval_metric="auc",
    optuna_time_budget=10 * 60,   # seconds of Optuna tuning per algorithm
    total_time_limit=24 * 3600,   # seconds
    golden_features=False,
    features_selection=False,
    train_ensemble=True,
    stack_models="auto",
    random_state=SEED,
    results_path="optuna",
)

automl.fit(X_train, y_train, sample_weight=weights)

# 2. Explain

automl = AutoML(
    mode="Explain",
    ml_task="binary_classification",
    algorithms=["Baseline", "CatBoost", "Xgboost", "Random Forest",
                "Extra Trees", "LightGBM", "Neural Network"],
    eval_metric="auc",
    train_ensemble=False,
    random_state=SEED,
    results_path="explain-wt",
)

automl.fit(X_train, y_train, sample_weight=weights)

# 3. Perform

automl = AutoML(
    mode="Perform",
    ml_task="binary_classification",
    algorithms=["CatBoost", "Xgboost"],
    eval_metric="auc",
    golden_features=False,
    features_selection=False,
    train_ensemble=False,
    stack_models=False,
    random_state=SEED,
    results_path="perform",
)

automl.fit(X_train, y_train, sample_weight=weights)
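
To act on the "use whichever performs best" advice, one option is to compare the best score saved in each results folder. The sketch below assumes each run writes a `leaderboard.csv` with `name` and `metric_value` columns into its `results_path`, and that a higher value is better for AUC. Both are assumptions to check against the files your mljar version actually produces. `best_model` is a made-up helper name.

```python
import csv
from pathlib import Path


def best_model(results_path, higher_is_better=True):
    """Return (name, score) of the best row in leaderboard.csv, or None if unavailable."""
    leaderboard = Path(results_path) / "leaderboard.csv"  # assumed file name
    if not leaderboard.is_file():
        return None
    with leaderboard.open(newline="") as handle:
        rows = [
            (row["name"], float(row["metric_value"]))  # assumed column names
            for row in csv.DictReader(handle)
        ]
    if not rows:
        return None
    pick = max if higher_is_better else min
    return pick(rows, key=lambda item: item[1])


# Folder names match the results_path values used in the three runs
for folder in ("optuna", "explain-wt", "perform"):
    print(folder, best_model(folder))
```

Folders that do not exist yet simply print `None`, so the loop can run before all three modes have finished.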