
[Imputation][python] Missing value Imputation, simple and multivariate

by Chandler.j 2022. 3. 15.

fig1. title

 

Almost every dataset probably contains missing values.

 

Broadly, there are two ways to handle missing values:

  1. Remove them: deletion
  2. Fill them in: imputation
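
To make the difference concrete, here is a minimal sketch of both options using pandas. The DataFrame and its column names are hypothetical toy data, and filling with the column mean is just one possible imputation choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing value.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "height": [170.0, 165.0, np.nan, 180.0],
})

# 1. Deletion: drop every row that has at least one missing value.
deleted = df.dropna()

# 2. Imputation: fill each missing value, here with its column mean.
imputed = df.fillna(df.mean())

print(deleted)
print(imputed)
```

Deletion keeps only the fully observed rows, so it throws away the observed values in the dropped rows as well. Imputation keeps every row at the cost of inserting estimated values.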

 

Imputation methods fall into two broad groups:

  1. simple
  2. multivariate

 

simple imputation

sklearn's SimpleImputer fills in missing values according to its strategy parameter:

  • If “mean”, then replace missing values using the mean along each column. Can only be used with numeric data.
  • If “median”, then replace missing values using the median along each column. Can only be used with numeric data.
  • If “most_frequent”, then replace missing values using the most frequent value along each column. Can be used with strings or numeric data. If there is more than one such value, only the smallest is returned.
  • If “constant”, then replace missing values with fill_value. Can be used with strings or numeric data.

ref: https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer
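
The four strategies above can be tried side by side with a short sketch like the following. The array X is hypothetical toy data, and fill_value=0.0 is an arbitrary choice for the "constant" case.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data; np.nan marks a missing value.
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [7.0, np.nan],
    [4.0, 3.0],
])

results = {}
for strategy in ("mean", "median", "most_frequent"):
    imputer = SimpleImputer(strategy=strategy)
    results[strategy] = imputer.fit_transform(X)

# "constant" takes the replacement value from fill_value (0.0 is arbitrary here).
constant_imputer = SimpleImputer(strategy="constant", fill_value=0.0)
results["constant"] = constant_imputer.fit_transform(X)

for strategy, X_filled in results.items():
    print(strategy)
    print(X_filled)
```

Each column is filled independently of the others, which is why this family is called simple (univariate) imputation.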

 

multivariate imputation (MICE, Multiple Imputation by Chained Equations)

A more sophisticated approach than simple imputation.

It is available through sklearn's IterativeImputer, which can be combined with four different algorithms (estimators).

ref: https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html

ref: https://scikit-learn.org/stable/auto_examples/impute/plot_iterative_imputer_variants_comparison.html#sphx-glr-auto-examples-impute-plot-iterative-imputer-variants-comparison-py
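
As a rough sketch of how IterativeImputer can be paired with different estimators: the example below uses synthetic data and two estimators, BayesianRidge and KNeighborsRegressor, picked only for illustration. They are not necessarily the four algorithms this post refers to. Note that IterativeImputer is still experimental in scikit-learn, so the enable_iterative_imputer import is required before importing it.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data (an assumption for this sketch): the third column
# depends on the first two, so it can be predicted from them.
rng = np.random.RandomState(0)
X_full = rng.normal(size=(100, 3))
X_full[:, 2] = X_full[:, 0] + 2 * X_full[:, 1] + rng.normal(scale=0.1, size=100)

# Hide roughly 20% of the third column.
mask = rng.rand(100) < 0.2
X = X_full.copy()
X[mask, 2] = np.nan

# Two estimators chosen for illustration only.
estimators = {
    "BayesianRidge": BayesianRidge(),
    "KNeighborsRegressor": KNeighborsRegressor(n_neighbors=5),
}

imputed = {}
rmse = {}
for name, estimator in estimators.items():
    imputer = IterativeImputer(estimator=estimator, max_iter=10, random_state=0)
    imputed[name] = imputer.fit_transform(X)
    # Compare the imputed entries with the values that were hidden.
    diff = imputed[name][mask, 2] - X_full[mask, 2]
    rmse[name] = float(np.sqrt(np.mean(diff ** 2)))
    print(name, rmse[name])
```

Unlike SimpleImputer, each missing entry is predicted from the other columns, so the quality of the result depends on which estimator is plugged in and on how strongly the features are related.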

