
[python] Oversampling, SMOTE - ADASYN

by Chandler.j 2020. 12. 9.

fig1. title

 

There are broadly two ways to handle an imbalanced dataset in machine learning modeling:

1. Weighting the classes (scale weight) during hyperparameter tuning

2. Oversampling
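For the first option, a minimal sketch of how such a weight is commonly derived. It assumes "scale weight" means something like XGBoost's `scale_pos_weight`, which is usually set to the ratio of negative to positive samples; the helper name and the toy labels are hypothetical.

```python
from collections import Counter


def scale_pos_weight(labels, positive=1):
    """Ratio of negative to positive samples (hypothetical helper)."""
    counts = Counter(labels)
    n_pos = counts[positive]
    n_neg = sum(n for cls, n in counts.items() if cls != positive)
    if n_pos == 0:
        raise ValueError("no positive samples in labels")
    return n_neg / n_pos


# toy labels (assumption): 980 negatives, 20 positives
y_toy = [0] * 980 + [1] * 20
weight = scale_pos_weight(y_toy)
print(weight)  # 49.0
```

The resulting value is then treated as one more hyperparameter, so no rows are added to the training set.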


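SMOTE (Synthetic Minority Oversampling Technique) is the most commonly used oversampling technique. It creates synthetic minority samples by interpolating between a minority sample and its nearest minority-class neighbors.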
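The interpolation idea behind SMOTE can be sketched in pure Python. This is a simplified illustration, not imbalanced-learn's implementation; the function name, the toy points, and the choice of `k=2` are assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=42):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbors (simplified sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # other minority points, nearest first
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        lam = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + lam * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# toy minority class (assumption): four corners of the unit square
minority_toy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority_toy, n_new=6, k=2)
print(len(new_points))  # 6
```

Every synthetic point lies on a segment between two existing minority points, so the new samples stay inside the region the minority class already occupies.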

As one variant of it, this post oversamples with ADASYN (Adaptive Synthetic Sampling Approach for Imbalanced Learning).
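What makes ADASYN "adaptive" is how it decides where to generate samples: a minority point with more majority-class points among its k nearest neighbors gets a larger share of the synthetic samples. A simplified sketch of that allocation step follows; the function name and toy data are hypothetical, and the rounding is a simplification.

```python
import math


def adasyn_allocation(X, y, minority=1, k=5, n_to_generate=10):
    """For each minority sample, return how many synthetic samples it
    should seed, proportional to the share of majority-class points
    among its k nearest neighbors (simplified sketch)."""
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    ratios = []
    for i in minority_idx:
        # k nearest neighbors among all other samples, any class
        neighbors = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )[:k]
        ratios.append(sum(y[j] != minority for j in neighbors) / k)
    total = sum(ratios)
    if total == 0:
        raise ValueError("no minority sample has majority neighbors")
    return {i: round(r / total * n_to_generate) for i, r in zip(minority_idx, ratios)}


# toy 1-D data (assumption): one minority point sits inside the majority cluster
X_toy = [(0.0,), (1.0,), (2.0,), (3.0,), (2.5,), (10.0,), (10.5,), (11.0,)]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
allocation = adasyn_allocation(X_toy, y_toy, k=2, n_to_generate=10)
print(allocation)  # {4: 10, 5: 0, 6: 0, 7: 0}
```

In this toy case all ten synthetic samples are seeded from the point at 2.5, the only minority point surrounded by majority points. Plain SMOTE would spread them across all four minority points.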


#1 Data preparation

X_train_features_imputed.info()

fig2. output #1


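#2 Oversampling with ADASYN

- sampling_strategy: when given as a float, the desired ratio of minority to majority samples after resampling

- n_neighbors (default=5): the number of nearest neighbors, i.e. k

- random_state: seed for reproducible results

- These three settings are the only ones that need attention; see the URL for the rest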
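A small arithmetic sketch of what a float `sampling_strategy` asks for. The class counts below are hypothetical, and the helper is not part of imbalanced-learn; it only mirrors the ratio definition (minority count after resampling divided by majority count).

```python
def planned_synthetic_count(n_majority, n_minority, sampling_strategy):
    """Return (target minority size, synthetic samples requested)
    for a float sampling_strategy (hypothetical helper)."""
    target = int(n_majority * sampling_strategy)
    n_new = target - n_minority
    if n_new <= 0:
        # the requested ratio is not above the current one
        raise ValueError("sampling_strategy must exceed the current class ratio")
    return target, n_new


# hypothetical counts: 10,000 majority rows and 50 minority rows
target, n_new = planned_synthetic_count(10_000, 50, 0.02)
print(target, n_new)  # 200 150
```

Two points to keep in mind: the float form applies to binary targets, and ADASYN rounds per-sample allocations, so the resampled minority count tends to land near the target rather than exactly on it.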

from collections import Counter
from imblearn.over_sampling import ADASYN

SEED = 42  # fixed seed for reproducible resampling

# class counts before resampling
print('Original dataset shape %s' % Counter(y_train))

# float sampling_strategy: target minority/majority ratio after resampling
ada = ADASYN(sampling_strategy=0.02, random_state=SEED)
X_res, y_res = ada.fit_resample(X_train_features_imputed, y_train)

# class counts after resampling
print('Resampled dataset shape %s' % Counter(y_res))

fig2. output #2

adasyn docs : imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.over_sampling.ADASYN.html#imblearn.over_sampling.ADASYN


#3 Result visualization

# visualize the effect of oversampling
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# fit PCA on the original data only, then project the resampled data with
# the same components so that both panels share one coordinate system
pca = PCA(n_components=2)
X_vis = pca.fit_transform(X_train_features_imputed)
X_res_vis = pca.transform(X_res)

# two side-by-side subplots; figsize has to be set when the figure is created
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 10))

# left panel: original data
c0 = ax1.scatter(X_vis[y_train == 0, 0], X_vis[y_train == 0, 1], label="Class #0",
                 alpha=0.5)
c1 = ax1.scatter(X_vis[y_train == 1, 0], X_vis[y_train == 1, 1], label="Class #1",
                 alpha=0.5)
ax1.set_title('Original set')

# right panel: data after ADASYN
ax2.scatter(X_res_vis[y_res == 0, 0], X_res_vis[y_res == 0, 1],
            label="Class #0", alpha=.5)
ax2.scatter(X_res_vis[y_res == 1, 0], X_res_vis[y_res == 1, 1],
            label="Class #1", alpha=.5)
ax2.set_title('ADASYN')

# tidy up the axes
for ax in (ax1, ax2):
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.get_xaxis().tick_bottom()
    ax.get_yaxis().tick_left()
    ax.spines['left'].set_position(('outward', 10))
    ax.spines['bottom'].set_position(('outward', 10))
    # fixed limits; adjust them to the range of your own data
    ax.set_xlim([-6, 8])
    ax.set_ylim([-6, 6])

plt.figlegend((c0, c1), ('Class #0', 'Class #1'), loc='lower center',
              ncol=2, labelspacing=0.)
plt.tight_layout(pad=3)
plt.show()

fig3. output #3

 

