Pearson correlation analysis & plotting (correlation coefficient bar chart, non-null count bar chart)
1. Pearson correlation analysis
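- Pearson correlation analysis is a statistical method for measuring the strength of the linear relationship between two variables. The coefficient lies between -1 and 1: a value of 1 indicates a perfect positive correlation, -1 a perfect negative correlation, and 0 no linear relationship. The usual significance test for the coefficient assumes that the data are approximately normally distributed, and the coefficient itself is sensitive to outliers.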
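To make the definition concrete, here is a minimal sketch that computes the coefficient directly from its formula (the sum of products of deviations divided by the square root of the product of the summed squared deviations). The function name `pearson_r` and the sample lists are illustrative and do not come from the original article.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator terms: sums of squared deviations
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # points on an increasing line
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # points on a decreasing line
r_zero = pearson_r(x, [2, 1, 0, 1, 2])   # symmetric V shape, no linear trend
print(r_pos, r_neg, r_zero)
```

The V-shaped case shows why a coefficient near 0 only rules out a linear relationship: the two variables are clearly related, just not along a straight line.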
2. Pearson correlation analysis example
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Name of the target-variable column; change it to match your spreadsheet
TARGET_COL = "目标变量"


# Compute the Pearson correlation coefficient of each feature with the target
def calculate_pearsonr(df):
    head = df.columns.values
    GDM = df[TARGET_COL].tolist()
    coefficient_of_association = {}
    significance_level = {}
    feature_cnt = {}
    for feature in head:
        if feature != TARGET_COL:
            ftc = 0
            feature_values = df[feature].tolist()
            GDM_temp, feature_temp, tag = [], [], 0
            for v in feature_values:
                # Keep only the rows where this feature is not missing
                if not pd.isna(v):
                    ftc += 1
                    GDM_temp.append(GDM[tag])
                    feature_temp.append(v)
                tag += 1  # row index, advanced for every row
            feature_cnt[feature] = ftc
            if len(feature_temp) > 1:
                pc = pearsonr(np.array(feature_temp), np.array(GDM_temp))
                if not np.isnan(pc[0]):
                    ca = pc[0]
                    # Store the absolute value; coefficients with |r| <= 0.0001 are dropped
                    if ca < -0.0001:
                        ca = ca * -1
                        coefficient_of_association[feature] = ca
                        significance_level[feature] = pc[1]
                    elif ca > 0.0001:
                        coefficient_of_association[feature] = ca
                        significance_level[feature] = pc[1]
    # Sort the features by coefficient, largest first
    dp_ca = sorted(
        coefficient_of_association.items(),
        key=lambda x: x[1],
        reverse=True)
    print("Pearson correlation coefficients:", dp_ca)
    dp_ca_Nempty = [(i[0], feature_cnt[i[0]]) for i in dp_ca]
    print("Non-null counts:", dp_ca_Nempty)
    return dp_ca
import matplotlib.pyplot as plt


def plot1(dp_ca):
    # Convert the list of tuples to a dict
    dp_ca_dict = dict(dp_ca)
    # Create the figure and a single subplot
    fig = plt.figure(figsize=(16, 10))
    ax = fig.add_subplot(1, 1, 1)
    # Draw the correlation coefficient bar chart
    ax.bar(list(dp_ca_dict.keys()), list(dp_ca_dict.values()))
    ax.set_title('Correlation between features and the target variable')
    ax.set_xlabel('Features')
    ax.set_ylabel('Correlation Coefficient')
    # Rotate the x-axis labels by 45 degrees, anchored at their right end
    plt.xticks(rotation=45, ha='right')
    # Set the x-axis tick label font size to 10
    ax.tick_params(axis='x', labelsize=10)
    # Adjust the layout, save the figure, and show it
    plt.tight_layout()
    plt.savefig("./Pearson.jpeg")
    plt.show()
if __name__ == '__main__':
    file = pd.read_excel("./filename.xlsx")
    dp_ca = calculate_pearsonr(file)
    plot1(dp_ca)
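As a possible alternative to the manual loop, pandas can do the missing-value handling itself. The sketch below is not the author's method: the function name `pearson_summary`, the `demo` frame, and its column names are hypothetical. It relies on `DataFrame.corrwith` and `DataFrame.count`, and it does not return p-values. Note one behavioral difference: `corrwith` ignores rows where either the feature or the target is missing, whereas the loop above only filters on the feature.

```python
import numpy as np
import pandas as pd


def pearson_summary(df, target):
    """Per-feature |r| with the target plus non-null counts, sorted by |r|."""
    # Only numeric feature columns can be correlated
    features = df.drop(columns=[target]).select_dtypes(include="number")
    # Correlate every feature column with the target, skipping missing pairs
    r = features.corrwith(df[target])
    out = pd.DataFrame({"abs_r": r.abs(), "non_null": features.count()})
    # Drop features whose coefficient is undefined (e.g. constant columns)
    return out.dropna(subset=["abs_r"]).sort_values("abs_r", ascending=False)


# Hypothetical demo data with a few missing values
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, np.nan],
    "b": [4.0, 3.0, np.nan, 1.0, 0.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],
})
summary = pearson_summary(demo, "y")
print(summary)
```

The resulting frame can be turned into the list of tuples that `plot1` expects with `list(summary["abs_r"].items())`.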
Source: http://www.zghlxwxcb.cn/news/detail-707803.html
3. Plotting (correlation coefficient bar chart, non-null count bar chart)
import matplotlib.pyplot as plt

# Sample data
dp_ca = [('feature1', 0.8), ('feature2', 0.6), ('feature3', 0.4), ('feature4', 0.77), ('feature5', 0.2), ('feature6', 0.4)]
dp_ca_Nempty = [('feature1', 100), ('feature3', 50), ('feature2', 20), ('feature4', 70), ('feature5', 10), ('feature6', 26)]
# Convert the lists of tuples to dicts
dp_ca_dict = dict(dp_ca)
dp_ca_Nempty_dict = dict(dp_ca_Nempty)
# Create two side-by-side subplots
fig, axs = plt.subplots(1, 2, figsize=(10, 5))
# Draw the correlation coefficient bar chart
axs[0].bar(list(dp_ca_dict.keys()), list(dp_ca_dict.values()))
axs[0].set_title('Pearson correlation coefficients')
axs[0].set_xlabel('Features')
axs[0].set_ylabel('Correlation coefficient')
# Draw the non-null count bar chart
axs[1].bar(list(dp_ca_Nempty_dict.keys()), list(dp_ca_Nempty_dict.values()))
axs[1].set_title('Number of non-empty values')
axs[1].set_xlabel('Features')
axs[1].set_ylabel('Count')
for ax in axs:
    # Rotate the x-axis labels by 45 degrees, anchored at their right end.
    # plt.xticks would only affect the last subplot, so set each axis explicitly.
    plt.setp(ax.get_xticklabels(), rotation=45, ha='right')
    # Set the x-axis tick label font size to 10
    ax.tick_params(axis='x', labelsize=10)
# Adjust the layout and show the figure
plt.tight_layout()
plt.show()
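In the sample data above, the two lists name the features in different orders (the counts list has feature3 before feature2), so the bars in the two panels do not line up feature by feature. A minimal sketch of one way to reorder the counts to follow the correlation list before plotting is shown below. The helper name `align_to` is hypothetical.

```python
def align_to(reference, other):
    """Reorder the (name, value) pairs in `other` to follow the order of `reference`."""
    lookup = dict(other)
    # Names missing from `other` are skipped rather than raising an error
    return [(name, lookup[name]) for name, _ in reference if name in lookup]


dp_ca = [('feature1', 0.8), ('feature2', 0.6), ('feature3', 0.4),
         ('feature4', 0.77), ('feature5', 0.2), ('feature6', 0.4)]
dp_ca_Nempty = [('feature1', 100), ('feature3', 50), ('feature2', 20),
                ('feature4', 70), ('feature5', 10), ('feature6', 26)]

aligned_counts = align_to(dp_ca, dp_ca_Nempty)
print(aligned_counts)
```

Passing `aligned_counts` instead of `dp_ca_Nempty` to the second bar chart keeps both panels in the same feature order.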