1 Introduction to the logistic regression model
Logistic regression (LR), also known as logistic regression analysis, is a machine learning algorithm in the family of classification and prediction methods, used mainly to solve binary classification problems. It predicts the probability of a future outcome from patterns in historical data. For example, we can take the probability of a purchase as the dependent variable and user attributes such as gender, age, and registration date as the independent variables, and then predict the purchase probability from those attributes.
Logistic regression predicts the probability that an input sample belongs to a particular class. Its core idea is to model this probability with the sigmoid function (also called the logistic function): sigmoid(z) = 1 / (1 + e^(-z)), which maps any real number z to a value strictly between 0 and 1.
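To make that squashing behavior concrete, here is a minimal sketch in PyTorch; the input values are arbitrary examples:

import torch

z = torch.tensor([-5.0, 0.0, 5.0])   # arbitrary real-valued inputs
p = torch.sigmoid(z)                 # computes 1 / (1 + exp(-z)) elementwise
print(p)                             # approximately tensor([0.0067, 0.5000, 0.9933])

However large or small z becomes, the output stays inside (0, 1), which is what allows it to be interpreted as a probability.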
2 Application scenarios for logistic regression
Logistic regression is a simple yet efficient machine learning algorithm with several strengths. First, the model is easy to understand and implement, and it is computationally cheap, which makes it particularly well suited to large datasets. Second, it is interpretable: the model's coefficients reveal how strongly each feature influences the classification result. It also scales to high-dimensional data with many features (though, being a linear model, it captures interactions between features only when they are added as engineered terms). In addition, logistic regression outputs probability estimates rather than bare class labels, which is useful for tasks that require probability estimates or uncertainty analysis. Finally, it is reasonably robust to noise and missing values, so it copes with the imperfect data found in the real world. In short, logistic regression is a powerful and practical classification algorithm. Common application scenarios include:
- Finance: credit scoring, fraud detection, customer churn prediction, and other financial risk-management tasks.
- Medicine: disease diagnosis, patient prognosis assessment, drug-response prediction, and other clinical decision-support tasks.
- Marketing: customer segmentation, user-behavior analysis, ad click-through-rate prediction, and similar tasks.
- Natural language processing: text classification, sentiment analysis, spam filtering, and other NLP tasks.
- Image recognition: binary classification problems within image classification and object detection.
The simplicity and interpretability of logistic regression have earned it wide use in practice. For complex nonlinear problems, however, it may be inadequate; in those cases, consider a more expressive model or feature-engineering techniques to improve performance.
3 Binary classification of bank fraudsters with PyTorch
(1) Dataset
The task uses a bank-fraud dataset: from 15 features we produce a binary decision on whether a customer is a fraudulent defaulter. The model is still a linear model; its output is passed through a sigmoid, which turns it into a probability between 0 and 1. By convention, a probability above 0.5 is read as class 1 and below 0.5 as class 0, as the short sketch below illustrates. A few sample rows of the dataset follow it.
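A minimal sketch of that decision rule; the raw scores below are made-up values, not outputs of the trained model:

import torch

logits = torch.tensor([-2.0, 0.3, 1.7])   # raw linear-model scores (made up)
probs = torch.sigmoid(logits)             # squash into (0, 1)
labels = (probs > 0.5).int()              # threshold at 0.5
print(probs)    # roughly tensor([0.1192, 0.5744, 0.8455])
print(labels)   # tensor([0, 1, 1], dtype=torch.int32)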
Sample rows from credit.csv (15 feature columns followed by the label):

0,56.75,12.25,0,0,6,0,1.25,0,0,4,0,0,200,0,-1
0,31.67,16.165,0,0,1,0,3,0,0,9,1,0,250,730,-1
1,23.42,0.79,1,1,8,0,1.5,0,0,2,0,0,80,400,-1
1,20.42,0.835,0,0,8,0,1.585,0,0,1,1,0,0,0,-1
0,26.67,4.25,0,0,2,0,4.29,0,0,1,0,0,120,0,-1
0,34.17,1.54,0,0,2,0,1.54,0,0,1,0,0,520,50000,-1
1,36,1,0,0,0,0,2,0,0,11,1,0,0,456,-1
0,25.5,0.375,0,0,6,0,0.25,0,0,3,1,0,260,15108,-1
0,19.42,6.5,0,0,9,1,1.46,0,0,7,1,0,80,2954,-1
0,35.17,25.125,0,0,10,1,1.625,0,0,1,0,0,515,500,-1
0,32.33,7.5,0,0,11,2,1.585,0,1,0,0,2,420,0,1
1,38.58,5,0,0,2,0,13.5,0,1,0,0,0,980,0,1
In the last column, -1 marks a fraudulent defaulter and 1 marks a non-fraudulent customer. (The code below remaps -1 to 0 because nn.BCELoss expects labels in {0, 1}.)
(2) Complete PyTorch code
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from torch import nn
from torch.utils.data import TensorDataset, DataLoader
from sklearn.model_selection import train_test_split

def accuracy(y_pred, y_true):
    # Threshold the predicted probabilities at 0.5 and compare with the labels
    y_pred = (y_pred > 0.5).type(torch.int32)
    acc = (y_pred == y_true).float().mean()
    return acc

loss_fn = nn.BCELoss()   # binary cross-entropy on probabilities
epochs = 1000
batch = 16
lr = 0.0001

# Load the data: 15 feature columns, last column is the label
data = pd.read_csv("credit.csv", header=None)
X = data.iloc[:, :-1]
Y = data.iloc[:, -1].replace(-1, 0)   # BCELoss expects labels in {0, 1}
X = torch.from_numpy(X.values).type(torch.float32)
Y = torch.from_numpy(Y.values).type(torch.float32)

train_x, test_x, train_y, test_y = train_test_split(X, Y)
train_ds = TensorDataset(train_x, train_y)
train_dl = DataLoader(train_ds, batch_size=batch, shuffle=True)
test_ds = TensorDataset(test_x, test_y)
test_dl = DataLoader(test_ds, batch_size=batch)

# Logistic regression: one linear layer followed by a sigmoid
model = nn.Sequential(
    nn.Linear(15, 1),
    nn.Sigmoid()
)
optim = torch.optim.Adam(model.parameters(), lr=lr)

accuracy_rate = []
for epoch in range(epochs):
    for x, y in train_dl:
        y_pred = model(x)
        y_pred = y_pred.squeeze()
        loss = loss_fn(y_pred, y)
        optim.zero_grad()
        loss.backward()
        optim.step()
    with torch.no_grad():
        # Accuracy and loss on the training set
        y_pred = model(train_x).squeeze()
        epoch_accuracy = accuracy(y_pred, train_y)
        epoch_loss = loss_fn(y_pred, train_y)
        # Store as a plain float so the list plots cleanly later
        accuracy_rate.append(epoch_accuracy.item() * 100)
        # Accuracy and loss on the test set
        y_pred = model(test_x).squeeze()
        epoch_test_accuracy = accuracy(y_pred, test_y)
        epoch_test_loss = loss_fn(y_pred, test_y)
        print('epoch:', epoch,
              'train_loss:', round(epoch_loss.item(), 3),
              "train_accuracy", round(epoch_accuracy.item(), 3),
              'test_loss:', round(epoch_test_loss.item(), 3),
              "test_accuracy", round(epoch_test_accuracy.item(), 3)
              )

# Plot the training accuracy over the epochs
accuracy_rate = np.array(accuracy_rate)
times = np.linspace(1, epochs, epochs)
plt.xlabel('epochs')
plt.ylabel('accuracy rate')
plt.plot(times, accuracy_rate)
plt.show()
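Because the classifier is a single nn.Linear(15, 1) layer, the learned coefficients can be read off directly; this is the interpretability advantage mentioned in section 2. A minimal sketch, assuming the trained `model` from the code above is in scope (the dataset has no feature names, so column indices are used):

with torch.no_grad():
    weights = model[0].weight.squeeze()   # one coefficient per input feature
    bias = model[0].bias.item()
for i, w in enumerate(weights.tolist()):
    print(f"feature {i}: coefficient {w:+.4f}")
print(f"bias: {bias:+.4f}")

Keep in mind that coefficient magnitudes are only comparable across features when the features share a similar scale; columns such as the two large-valued ones in the sample rows above would need standardization before such a comparison is meaningful.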
(3) Output (final epochs)
epoch: 951 train_loss: 0.334 train_accuracy 0.869 test_loss: 0.346 test_accuracy 0.866
epoch: 952 train_loss: 0.334 train_accuracy 0.863 test_loss: 0.348 test_accuracy 0.866
epoch: 953 train_loss: 0.337 train_accuracy 0.867 test_loss: 0.358 test_accuracy 0.86
epoch: 954 train_loss: 0.334 train_accuracy 0.867 test_loss: 0.35 test_accuracy 0.866
epoch: 955 train_loss: 0.334 train_accuracy 0.871 test_loss: 0.346 test_accuracy 0.866
epoch: 956 train_loss: 0.333 train_accuracy 0.865 test_loss: 0.348 test_accuracy 0.872
epoch: 957 train_loss: 0.333 train_accuracy 0.871 test_loss: 0.349 test_accuracy 0.866
epoch: 958 train_loss: 0.333 train_accuracy 0.867 test_loss: 0.347 test_accuracy 0.866
epoch: 959 train_loss: 0.334 train_accuracy 0.863 test_loss: 0.352 test_accuracy 0.866
epoch: 960 train_loss: 0.333 train_accuracy 0.867 test_loss: 0.35 test_accuracy 0.878
epoch: 961 train_loss: 0.334 train_accuracy 0.873 test_loss: 0.346 test_accuracy 0.866
epoch: 962 train_loss: 0.334 train_accuracy 0.865 test_loss: 0.353 test_accuracy 0.866
epoch: 963 train_loss: 0.333 train_accuracy 0.873 test_loss: 0.35 test_accuracy 0.866
epoch: 964 train_loss: 0.334 train_accuracy 0.863 test_loss: 0.345 test_accuracy 0.872
epoch: 965 train_loss: 0.333 train_accuracy 0.861 test_loss: 0.351 test_accuracy 0.866
epoch: 966 train_loss: 0.333 train_accuracy 0.873 test_loss: 0.348 test_accuracy 0.866
epoch: 967 train_loss: 0.333 train_accuracy 0.863 test_loss: 0.348 test_accuracy 0.866
epoch: 968 train_loss: 0.333 train_accuracy 0.867 test_loss: 0.351 test_accuracy 0.866
epoch: 969 train_loss: 0.334 train_accuracy 0.869 test_loss: 0.345 test_accuracy 0.878
epoch: 970 train_loss: 0.333 train_accuracy 0.869 test_loss: 0.348 test_accuracy 0.872
epoch: 971 train_loss: 0.335 train_accuracy 0.865 test_loss: 0.344 test_accuracy 0.86
epoch: 972 train_loss: 0.333 train_accuracy 0.867 test_loss: 0.35 test_accuracy 0.86
epoch: 973 train_loss: 0.334 train_accuracy 0.871 test_loss: 0.345 test_accuracy 0.872
epoch: 974 train_loss: 0.333 train_accuracy 0.865 test_loss: 0.351 test_accuracy 0.866
epoch: 975 train_loss: 0.333 train_accuracy 0.873 test_loss: 0.351 test_accuracy 0.86
epoch: 976 train_loss: 0.333 train_accuracy 0.869 test_loss: 0.346 test_accuracy 0.878
epoch: 977 train_loss: 0.333 train_accuracy 0.863 test_loss: 0.351 test_accuracy 0.866
epoch: 978 train_loss: 0.332 train_accuracy 0.865 test_loss: 0.351 test_accuracy 0.866
epoch: 979 train_loss: 0.332 train_accuracy 0.871 test_loss: 0.349 test_accuracy 0.866
epoch: 980 train_loss: 0.333 train_accuracy 0.865 test_loss: 0.345 test_accuracy 0.872
epoch: 981 train_loss: 0.332 train_accuracy 0.867 test_loss: 0.348 test_accuracy 0.872
epoch: 982 train_loss: 0.332 train_accuracy 0.863 test_loss: 0.349 test_accuracy 0.872
epoch: 983 train_loss: 0.333 train_accuracy 0.865 test_loss: 0.353 test_accuracy 0.866
epoch: 984 train_loss: 0.332 train_accuracy 0.865 test_loss: 0.35 test_accuracy 0.872
epoch: 985 train_loss: 0.333 train_accuracy 0.867 test_loss: 0.353 test_accuracy 0.86
epoch: 986 train_loss: 0.333 train_accuracy 0.871 test_loss: 0.345 test_accuracy 0.866
epoch: 987 train_loss: 0.331 train_accuracy 0.865 test_loss: 0.349 test_accuracy 0.872
epoch: 988 train_loss: 0.332 train_accuracy 0.869 test_loss: 0.345 test_accuracy 0.872
epoch: 989 train_loss: 0.332 train_accuracy 0.865 test_loss: 0.353 test_accuracy 0.866
epoch: 990 train_loss: 0.331 train_accuracy 0.865 test_loss: 0.348 test_accuracy 0.872
epoch: 991 train_loss: 0.333 train_accuracy 0.875 test_loss: 0.344 test_accuracy 0.86
epoch: 992 train_loss: 0.332 train_accuracy 0.865 test_loss: 0.351 test_accuracy 0.866
epoch: 993 train_loss: 0.331 train_accuracy 0.869 test_loss: 0.348 test_accuracy 0.872
epoch: 994 train_loss: 0.331 train_accuracy 0.871 test_loss: 0.348 test_accuracy 0.872
epoch: 995 train_loss: 0.331 train_accuracy 0.865 test_loss: 0.347 test_accuracy 0.872
epoch: 996 train_loss: 0.331 train_accuracy 0.865 test_loss: 0.347 test_accuracy 0.872
epoch: 997 train_loss: 0.331 train_accuracy 0.867 test_loss: 0.35 test_accuracy 0.872
epoch: 998 train_loss: 0.331 train_accuracy 0.867 test_loss: 0.349 test_accuracy 0.872
epoch: 999 train_loss: 0.331 train_accuracy 0.865 test_loss: 0.348 test_accuracy 0.872
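Once trained, the model can score a new record directly; its output is the estimated probability of label 1 (a non-fraudulent customer, given the remapping of -1 to 0). A minimal sketch, where the 15 feature values are made up for illustration and are not a real record:

# A hypothetical record with 15 feature values (illustrative only)
new_x = torch.tensor([[0, 30.0, 2.5, 0, 0, 5, 0, 1.2, 0, 0, 2, 0, 0, 150, 100]],
                     dtype=torch.float32)
with torch.no_grad():
    prob = model(new_x).item()
print(f"P(label = 1) = {prob:.3f} -> predicted class {int(prob > 0.5)}")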