Calibrating the predicted probabilities of a classification model trained on an imbalanced dataset may improve its performance. Reference: machinelearningmastery.com/probability-calibration-for-imbalanced-classification/ (How to Calibrate Probabilities for Imbalanced Classification)
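A minimal sketch of the idea. It assumes scikit-learn's CalibratedClassifierCV and a synthetic imbalanced dataset from make_classification with about 5% positives. The SVC base model, method='isotonic' and cv=3 are illustrative choices and are not necessarily what the referenced article uses. The Brier score is printed so you can compare calibrated and uncalibrated versions on your own data.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced data (~95% class 0, ~5% class 1); replace with your own
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95],
                           flip_y=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=2)

# SVC outputs uncalibrated decision scores; the wrapper maps them to probabilities
base = SVC(gamma='scale')
calibrated = CalibratedClassifierCV(base, method='isotonic', cv=3)
calibrated.fit(X_train, y_train)

proba = calibrated.predict_proba(X_test)[:, 1]  # P(class 1)
score = brier_score_loss(y_test, proba)          # lower is better
print(f'Brier score: {score:.4f}')
```

method='sigmoid' (Platt scaling) is often the safer choice when the calibration folds contain few minority samples, because isotonic regression can overfit small data.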
Adding a new column based on a condition in Python. In R this is simple with mutate + ifelse. In Python, the most convenient method seems to depend on how many conditions there are. #1. One condition: np.where
df2['eGFR_ab90'] = np.where(df2['eGFR_ckd']
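Below is a small runnable version covering both cases. The data are made up. The eGFR < 90 cutoff for eGFR_ab90 is an assumption inferred from the column name, because the original condition is cut off. The np.select cutoffs and the eGFR_gp column are hypothetical. np.select is one common way to handle several conditions in pandas, and it plays a role similar to chained ifelse or case_when in R.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names come from the post
df2 = pd.DataFrame({'eGFR_ckd': [45.0, 72.5, 95.0, 110.2]})

# One condition: np.where(condition, value_if_true, value_if_false)
# Assumption: "abnormal" means eGFR < 90 (inferred from the column name)
df2['eGFR_ab90'] = np.where(df2['eGFR_ckd'] < 90, 1, 0)

# Several conditions: np.select returns the choice of the FIRST matching
# condition, so order them from most to least specific.
# Cutoffs and labels below are hypothetical, for illustration only.
conditions = [df2['eGFR_ckd'] < 60, df2['eGFR_ckd'] < 90]
choices = ['low', 'mild']
df2['eGFR_gp'] = np.select(conditions, choices, default='normal')

print(df2)
```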
The simplest down-sampling technique for imbalanced data in machine learning is random sampling. Always set random_state so the result is reproducible. #1. DataFrame.sample: to draw a fixed number of rows, use n=<count>. To draw a fraction of the DataFrame's length, use frac=<ratio between 0 and 1>. replace=True is only needed when frac > 1 (up-sampling). For down-sampling, keep the default replace=False.
df=pd.read_csv("C:/Users/comcom/knhanes_eGFR/ua_full.v1.csv")
abnormal = df.query('eGFR_ab==1')
normal_sample = df.query('eGFR_ab==0').sampl..
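Here is a self-contained sketch of random down-sampling with DataFrame.sample. A toy DataFrame stands in for the CSV in the post. The x column and the class counts are made up, and only the eGFR_ab label column comes from the original. The majority class is sampled down to the size of the minority class, and the two parts are then concatenated and shuffled.

```python
import pandas as pd

# Hypothetical imbalanced data: 10 abnormal rows, 90 normal rows
df = pd.DataFrame({'x': range(100), 'eGFR_ab': [1] * 10 + [0] * 90})

abnormal = df.query('eGFR_ab == 1')

# Down-sample the majority class to the minority-class size.
# random_state makes the draw reproducible; replace=False (default) avoids duplicate rows.
normal_sample = df.query('eGFR_ab == 0').sample(n=len(abnormal), random_state=42)

# Alternative: keep a fraction of the majority class (20% of 90 rows -> 18 rows)
normal_frac = df.query('eGFR_ab == 0').sample(frac=0.2, random_state=42)

# Combine and shuffle (frac=1 returns all rows in random order)
balanced = pd.concat([abnormal, normal_sample]).sample(frac=1, random_state=42)
print(balanced['eGFR_ab'].value_counts())
```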
Let's find the best threshold for a binary classifier and mark it on the ROC curve. The best threshold is chosen with Youden's J statistic (also called Youden's index), J = sensitivity + specificity − 1. On the ROC curve this equals TPR − FPR, and the threshold that maximizes J is taken as the best one. Reference: en.wikipedia.org/wiki/Youden%27s_J_statistic
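A minimal sketch of the procedure. It assumes a logistic regression fitted on synthetic data from make_classification, and both the model and the data are placeholders for your own. roc_curve returns FPR, TPR and the matching thresholds. J = TPR − FPR is computed for every threshold, and the point with the largest J is marked in red on the ROC curve.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Placeholder data and model
X, y = make_classification(n_samples=1000, weights=[0.8], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_score = model.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, y_score)
j_scores = tpr - fpr              # Youden's J = sensitivity + specificity - 1 = TPR - FPR
best_idx = np.argmax(j_scores)
best_threshold = thresholds[best_idx]

fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f'ROC (AUC = {roc_auc_score(y_test, y_score):.3f})')
ax.plot([0, 1], [0, 1], linestyle='--', color='grey')   # chance line
ax.scatter(fpr[best_idx], tpr[best_idx], color='red', zorder=3,
           label=f'Best threshold = {best_threshold:.3f}')
ax.set_xlabel('False positive rate (1 - specificity)')
ax.set_ylabel('True positive rate (sensitivity)')
ax.legend()
plt.show()
```

Recent scikit-learn versions set thresholds[0] to inf, while older ones used the maximum score + 1. That entry is the (0, 0) point where J = 0, so it is not selected as long as the classifier beats chance somewhere.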
A Bland–Altman plot can be used to check the performance of a regression model. Treat the predictions and the observed values as two measurements and plot their difference against their mean. A Bland–Altman plot (difference plot) is a method of data plotting used in analytical chemistry and biomedicine to analyze the agreement between two different assays. It is identical to a Tukey mean-difference plot. Reference: en.wikipedia.org/wiki/Bland%E2%80%93Altman_plot
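A minimal Bland–Altman sketch for regression output, assuming the predictions and true values are 1-D arrays. It follows the common convention of drawing the mean difference (bias) plus limits of agreement at mean ± 1.96 SD of the differences. The difference is taken as prediction − true value, which matches the pred-true naming used elsewhere in these notes. The bland_altman helper and the synthetic data are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

def bland_altman(y_true, y_pred, ax=None):
    """Hypothetical helper: draw a Bland-Altman plot and return (bias, limits of agreement)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean = (y_true + y_pred) / 2          # x-axis: average of the two "measurements"
    diff = y_pred - y_true                # y-axis: prediction - true value
    md = diff.mean()                      # bias
    sd = diff.std(ddof=1)
    loa = (md - 1.96 * sd, md + 1.96 * sd)  # 95% limits of agreement
    if ax is None:
        ax = plt.gca()
    ax.scatter(mean, diff, s=10, alpha=0.6)
    ax.axhline(md, color='red', label=f'Mean diff = {md:.2f}')
    for limit in loa:
        ax.axhline(limit, color='grey', linestyle='--')
    ax.set_xlabel('Mean of prediction and true value')
    ax.set_ylabel('Prediction - true value')
    ax.legend()
    return md, loa

# Synthetic example: predictions with a small positive bias
rng = np.random.default_rng(0)
y_true = rng.normal(90, 20, size=300)
y_pred = y_true + rng.normal(2, 8, size=300)
md, (lower, upper) = bland_altman(y_true, y_pred)
plt.show()
```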
In sklearn, the classification_report function shows the main evaluation metrics (precision, recall, f1-score, support) all at once.
#1. classification_report
from sklearn.metrics import classification_report
y_pred = model.predict(X_test_features)
print(classification_report(y_test, y_pred, target_names=['normal', 'abnormal']))
Reference: scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
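This is a self-contained version of the classification_report snippet. Synthetic data and a logistic regression stand in for the original model and X_test_features, both of which are hypothetical here. Passing output_dict=True returns the same metrics as a nested dict. That form is convenient when the numbers need to be used in code rather than only printed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Placeholder data and model
X, y = make_classification(n_samples=1000, weights=[0.85], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Text table: precision, recall, f1-score, support per class plus averages
print(classification_report(y_test, y_pred, target_names=['normal', 'abnormal']))

# Same numbers as a nested dict, e.g. for logging or comparing models
report = classification_report(y_test, y_pred, target_names=['normal', 'abnormal'],
                               output_dict=True)
print(report['abnormal']['recall'])
```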
Visualizing the prediction error (pred-true) by age group (age_gp) as a grouped boxplot with seaborn.
#1. Source data
df_age_error.info()  # info() prints the summary itself and returns None, so print() is not needed
#2. seaborn
import seaborn as sns
sns.boxplot(y='pred-true', x='age_gp', data=df_age_error)
Reference: https://cmdlinetips.com/2019/03/how-to-make-grouped-boxplots-in-python-with-seaborn/ (How To Make Grouped Boxplots in Python with Seaborn?)
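df_age_error itself is not shown, so this sketch builds a hypothetical version of it. It uses made-up true values, predictions and ages, and creates age_gp by binning age into decades with pd.cut. The column names pred-true and age_gp follow the original, and everything else is an assumption. A dashed line at 0 makes systematic over- or under-prediction in each age group easy to spot.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: the error spread grows with age
rng = np.random.default_rng(0)
n = 500
age = rng.integers(20, 80, size=n)            # ages 20..79
y_true = rng.normal(90, 15, size=n)
y_pred = y_true + rng.normal(0, 5 + 0.1 * (age - 20), size=n)

df_age_error = pd.DataFrame({
    'age': age,
    'pred-true': y_pred - y_true,
    # Decade bins; right=False gives [20, 30), [30, 40), ..., [70, 80)
    'age_gp': pd.cut(age, bins=[20, 30, 40, 50, 60, 70, 80], right=False,
                     labels=['20s', '30s', '40s', '50s', '60s', '70s']),
})

df_age_error.info()                          # prints column summary, returns None
ax = sns.boxplot(y='pred-true', x='age_gp', data=df_age_error)
ax.axhline(0, color='grey', linestyle='--')  # zero-error reference line
plt.show()
```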