learning_python::Logistic Regression

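Logistic regression is a classification algorithm: it predicts which class a sample belongs to (for example, pass or fail) rather than a continuous value.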

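from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(x_train, y_train)               # learn coef_ and intercept_ from the training data
y_pred = classifier.predict(x_test)            # predicted class labels
y_pred_prob = classifier.predict_proba(x_test) # per-class probabilities, columns ordered as classifier.classes_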
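
The snippet above assumes that x_train, x_test, y_train and y_test already exist. As a minimal sketch of how they might be produced, the example below builds a small made-up dataset (hours studied vs. pass/fail; the values and the names hours and passed are illustrative assumptions, not data from these notes) and splits it with train_test_split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: hours studied (single feature) and pass (1) / fail (0).
hours = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0,
                  5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
                   1, 0, 1, 1, 1, 1, 1, 1, 1, 1])

# Hold out 20% of the samples for testing; stratify keeps both classes in each split.
x_train, x_test, y_train, y_test = train_test_split(
    hours, passed, test_size=0.2, random_state=0, stratify=passed)

classifier = LogisticRegression()
classifier.fit(x_train, y_train)

y_pred = classifier.predict(x_test)             # class labels, shape (4,)
y_pred_prob = classifier.predict_proba(x_test)  # probabilities, shape (4, 2)
```

Each row of y_pred_prob sums to 1, and the predicted label is the class with the larger probability.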

Sigmoid function p

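import numpy as np

x_range = np.arange(np.min(x), np.max(x), 0.1)  # x values from the smallest to the largest observation, step 0.1
# p = 1 / (1 + e^(-y)), where y = m*x + b = classifier.coef_ * x + classifier.intercept_
p = 1 / (1 + np.exp(-(classifier.coef_ * x_range + classifier.intercept_)))
p = p.reshape(-1)  # convert to a 1-D array (same form as y)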
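
One way to check the formula above is to compare the hand-computed sigmoid with what predict_proba returns. The sketch below does this on a small made-up dataset (the data and the helper name sigmoid are assumptions for illustration) and relies on the default LogisticRegression settings for a two-class problem.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical single-feature data with binary labels.
x = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

classifier = LogisticRegression().fit(x, y)

def sigmoid(z):
    """Map any real value to the interval (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Linear part y = m*x + b, flattened to 1-D as in the notes.
z = (classifier.coef_ * x.reshape(-1) + classifier.intercept_).reshape(-1)
p_manual = sigmoid(z)

# Column 1 of predict_proba is the probability of the positive class (label 1).
p_sklearn = classifier.predict_proba(x)[:, 1]

matches = np.allclose(p_manual, p_sklearn)
```

If matches is True, the curve drawn from coef_ and intercept_ is the same probability the classifier uses internally.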

Visualization

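import matplotlib.pyplot as plt

plt.scatter(x_train, y_train)                                 # training data
plt.plot(x_range, p, 'orange')                                # fitted sigmoid curve
plt.scatter(x_test, y_test, color='green')                    # actual test labels
plt.scatter(x_test, classifier.predict(x_test), color='red')  # predicted test labels
# reference line at probability 0.5
plt.plot(x_range, np.full(len(x_range), 0.5), color='grey')
plt.show()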

Confusion Matrix

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, classifier.predict(x_test))

True Negative (TN) – predicted fail & actually fail	False Positive (FP) – predicted pass & actually fail
False Negative (FN) – predicted fail & actually pass	True Positive (TP) – predicted pass & actually pass
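
The four cells can be pulled out of the matrix by flattening it, since for binary labels confusion_matrix puts actual classes in rows and predicted classes in columns, in the same layout as the table above. The sketch below uses made-up label arrays (y_true and y_hat are illustrative names, not variables from these notes) and derives a few common metrics from the counts.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 0 = fail, 1 = pass.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])  # actual outcomes
y_hat = np.array([0, 1, 0, 1, 0, 1, 1, 0])   # predicted outcomes

cm = confusion_matrix(y_true, y_hat)

# Row-major flattening yields the cells in the order TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()  # share of all predictions that are correct
precision = tp / (tp + fp)       # share of predicted passes that actually passed
recall = tp / (tp + fn)          # share of actual passes that were predicted
```

With these labels the counts are TN = 3, FP = 1, FN = 1, TP = 3, so accuracy, precision and recall are all 0.75.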