Python Data Analysis Study Notes: matplotlib, numpy, pandas

Posted 2 years ago. Author: cx-young. Category: Toy Blog


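As preparation for studying machine learning, I am first learning the data-analysis libraries matplotlib, numpy, and pandas. These notes are mainly a record of my own study that I can look things up in whenever I forget something. I hope we can learn together and improve together, and that in the end all of us can say: no effort is wasted, and hardship shapes us into something better. Even when things get difficult, don't lose heart. Say it out loud: I'm not afraid, I am quick and eager to learn!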

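Data analysis

Collecting, tallying, and organizing large amounts of data in order to draw conclusions that support later decisions

matplotlib

1. What matplotlib is
2. matplotlib basics
3. Scatter plots, histograms, and bar charts in matplotlib
4. More plotting tools

Why learn matplotlib

It visualizes data, so the data can be presented more intuitively
It makes data more objective and more convincing

What is matplotlib

The most popular low-level plotting library for Python, used mainly for data-visualization charts. Its name comes from MATLAB, and it was modeled on MATLAB
matplotlib can draw line charts, scatter plots, bar charts, histograms, box plots, pie charts, and more.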

Line chart

A statistical chart that shows increases and decreases in a quantity through the rise and fall of a line
Characteristic: shows the trend of the data and reflects how things change (change)

Histogram

A series of vertical bars or line segments of varying height that represents how data is distributed; the horizontal axis usually shows the data range and the vertical axis shows the distribution
Characteristic: plots continuous data and shows the distribution of one or more groups of data (statistics)

Bar chart

Data arranged in the columns or rows of a worksheet can be plotted as a bar chart
Characteristic: plots discrete data; the size of each value can be seen at a glance, which makes differences between values easy to compare (statistics)

Scatter plot

Uses two sets of data to form a number of coordinate points; by looking at how the points are distributed, you can judge whether the two variables are related or summarize the distribution pattern of the points
Characteristic: shows whether there is a quantitative trend between variables, and reveals outliers (distribution pattern)

matplotlib basics

[Figure: a line chart whose five data points are marked in red]
So what is each red point in the figure?
Each red point is a coordinate; connecting the coordinates of the 5 points with a line gives a line chart.

A simple matplotlib demo

Suppose the temperature (℃) measured every two hours over one day (range(2,26,2)) is
[15,13,14.5,17,20,25,26,26,27,22,18,15]

'''
Suppose the temperature (℃) measured every two hours over one day (range(2,26,2)) is
[15,13,14.5,17,20,25,26,26,27,22,18,15]
'''
from matplotlib import pyplot as plt  # import pyplot
# positions of the data on the x axis; any iterable
x = range(2,26,2)
# positions of the data on the y axis; any iterable
y = [15,13,14.5,17,20,25,26,26,27,22,18,15]
'''Together, the x and y data make up all the coordinates to plot:
(2,15),(4,13),(6,14.5),(8,17)......'''
# pass in x and y and draw the line chart with plot
plt.plot(x,y)
plt.show()  # display the figure when the program runs


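The example still leaves several things to address

1. Set the figure size (we want a large, high-resolution image)
2. Save it to disk
3. Add descriptive information, such as what the x and y axes represent and what the chart shows
4. Adjust the spacing of the ticks on x or y
5. Line style (color, transparency, and so on)
6. Mark special points (for example, show where the highest and lowest points are)
7. Add a watermark to the image (to prevent forgery and unauthorized use)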

'''
Suppose the temperature (℃) measured every two hours over one day (range(2,26,2)) is
[15,13,14.5,17,20,25,26,26,27,22,18,15]
'''
from matplotlib import pyplot as plt  # import pyplot
### set the figure size
'''Set the figure size
"figure" here means the chart we are drawing
Instantiating a figure and passing it parameters makes later plotting calls use that figure instance automatically
If the image looks blurry, pass the dpi parameter to make it sharper
'''
fig = plt.figure(figsize=(20,8),dpi=80)
# positions of the data on the x axis; any iterable
x = range(2,26,2)
# positions of the data on the y axis; any iterable
y = [15,13,14.5,17,20,25,26,26,27,22,18,15]
'''
Together, the x and y data make up all the coordinates to plot:
(2,15),(4,13),(6,14.5),(8,17)......
'''
# pass in x and y and draw the line chart with plot
plt.plot(x,y)
### save the image; a vector format such as svg can also be used and shows no jagged edges when enlarged
# plt.savefig('./t1.png')

### set the ticks on the x or y axis
# plt.xticks(x)
_xtick_labels = [i/2 for i in range(4,49)]  # 2.0, 2.5, ..., 24.0
# plt.xticks(_xtick_labels)
# when the ticks are too dense, take every n-th item with a list step; matplotlib matches them to the data automatically
plt.xticks(_xtick_labels[::3])  # set the x ticks within the data range
plt.yticks(range(min(y),max(y)+1))
plt.show()  # display the figure when the program runs
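
Items 6 and 7 in the list (marking special points and adding a watermark) are not covered by the code above. The following is a minimal sketch of one way to do both, not the author's own solution. It assumes the `Agg` backend so that it runs without a display; the watermark text `cx-young`, the label offsets, and the output file name `t1_annotated.png` are illustrative choices.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
from matplotlib import pyplot as plt

x = list(range(2, 26, 2))
y = [15, 13, 14.5, 17, 20, 25, 26, 26, 27, 22, 18, 15]

fig, ax = plt.subplots(figsize=(20, 8), dpi=80)
ax.plot(x, y, color="orange", linestyle="--", linewidth=2, alpha=0.8)

# item 6: label the highest and lowest points
i_max = y.index(max(y))
i_min = y.index(min(y))
ax.scatter([x[i_max], x[i_min]], [y[i_max], y[i_min]], color="red")
ax.annotate("max {}".format(y[i_max]), xy=(x[i_max], y[i_max]),
            xytext=(x[i_max], y[i_max] + 0.8), ha="center")
ax.annotate("min {}".format(y[i_min]), xy=(x[i_min], y[i_min]),
            xytext=(x[i_min], y[i_min] - 1.2), ha="center")

# item 7: semi-transparent text watermark across the middle of the figure
fig.text(0.5, 0.5, "cx-young", fontsize=60, color="gray",
         alpha=0.2, ha="center", va="center", rotation=30)

# item 2: save to disk (hypothetical file name in the temp directory)
out_path = os.path.join(tempfile.gettempdir(), "t1_annotated.png")
fig.savefig(out_path)
```

`fig.text` places text in figure coordinates (0 to 1), so the watermark stays centered whatever the data range is.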

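That raises a question:
If list a holds the temperature for every minute from 10:00 to 12:00, how do we draw a line chart to see how the temperature changes minute by minute?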

from matplotlib import pyplot as plt
import random

x = range(0,120)
y = [random.randint(20,35) for i in range(120)]
plt.figure(figsize=(20,8),dpi=80)
plt.plot(x,y)

plt.show()


Plotting a line chart of the per-minute temperature

from matplotlib import pyplot as plt, font_manager
import random
import matplotlib
# font settings for Windows and Linux (needed when labels contain Chinese text)
my_font = {'family' : 'FangSong',
          'weight' : 'bold',
          'size'   : '16'}
# plt.rc( 'font' , ** font)        # step 1 (set further font properties)
# plt.rc( 'axes' , unicode_minus = False ) # step 2 (fixes the display of minus signs on the axes)
matplotlib.rc('font',** my_font)

x = range(0,120)
y = [random.randint(20,35) for i in range(120)]
plt.figure(figsize=(20,8),dpi=80)
plt.plot(x,y)

# adjust the x ticks
_xtick_labels = ["10 h {} min".format(i) for i in range(60)]
_xtick_labels += ["11 h {} min".format(i) for i in range(60)]
# take every 3rd item; positions and labels correspond one to one and must have the same length; rotation is the angle in degrees
plt.xticks(list(x)[::3],_xtick_labels[::3],rotation=45)
plt.show()


Adding descriptive information to the previous example

from matplotlib import pyplot as plt, font_manager
import random
import matplotlib
# font settings for Windows and Linux (needed when labels contain Chinese text)
my_font = {'family' : 'FangSong',
          'weight' : 'bold',
          'size'   : '16'}
# plt.rc( 'font' , ** font)        # step 1 (set further font properties)
# plt.rc( 'axes' , unicode_minus = False ) # step 2 (fixes the display of minus signs on the axes)
matplotlib.rc('font',** my_font)

x = range(0,120)
y = [random.randint(20,35) for i in range(120)]
plt.figure(figsize=(20,8),dpi=80)
plt.plot(x,y)

# adjust the x ticks
_xtick_labels = ["10:0{}".format(i) for i in range(10)]
_xtick_labels += ["10:{}".format(i) for i in range(10,60)]
_xtick_labels += ["11:0{}".format(i) for i in range(10)]
_xtick_labels += ["11:{}".format(i) for i in range(10,60)]
# take every 3rd item; positions and labels correspond one to one and must have the same length; rotation is the angle in degrees
plt.xticks(list(x)[::3],_xtick_labels[::3],rotation=45)

# add descriptive information
plt.xlabel("Time")
plt.ylabel("Temperature (℃)")
plt.title("Temperature per minute from 10:00 to 12:00")

plt.show()


Example: line chart of books read from age 11 to 30

from matplotlib import pyplot as plt
import matplotlib
'''
Suppose that at age 30 Xiao Ming tallied, from his own records,
the number of books he read each year from age 11 to 30. Draw a line chart
to analyze the trend in the number of books he read per year
x axis: age
y axis: number of books
'''
# font settings for Windows and Linux (needed when labels contain Chinese text)
my_font = {'family' : 'FangSong',
          'weight' : 'bold',
          'size'   : '16'}
# plt.rc( 'font' , ** font)        # step 1 (set further font properties)
# plt.rc( 'axes' , unicode_minus = False ) # step 2 (fixes the display of minus signs on the axes)
matplotlib.rc('font',** my_font)

y = [1,0,1,1,2,4,3,2,3,4,4,5,6,5,4,3,3,1,1,1]
x = range(11,31)

# set the figure size
plt.figure(figsize=(20,8),dpi=80)

plt.plot(x,y)

# set the x-axis ticks
_xtick_labels = ["age {}".format(i) for i in x]
plt.xticks(x,_xtick_labels)
plt.yticks(range(0,9))

# draw the grid
plt.grid(alpha=0.4)

# add descriptive information
plt.xlabel("Age")
plt.ylabel("Number of books")
plt.title("Books read per year")

# show the figure
plt.show()


Example: line chart comparing books read by you and your deskmate

from matplotlib import pyplot as plt
import matplotlib
'''
Using yearly book counts for ages 11 to 30, plot your own numbers (y_1)
and your deskmate's numbers (y_2) in the same figure,
so that the two trends can be compared
x axis: age
y axis: number of books
'''
# font settings for Windows and Linux (needed when labels contain Chinese text)
my_font = {'family' : 'FangSong',
          'weight' : 'bold',
          'size'   : '16'}
# plt.rc( 'font' , ** font)        # step 1 (set further font properties)
# plt.rc( 'axes' , unicode_minus = False ) # step 2 (fixes the display of minus signs on the axes)
matplotlib.rc('font',** my_font)

y_1 = [1,0,1,1,2,4,3,2,3,4,4,5,6,5,4,3,3,1,1,1]
y_2 = [1,0,3,1,2,2,3,3,2,1,2,1,1,1,1,1,1,1,1,1]
x = range(11,31)

# set the figure size
plt.figure(figsize=(20,8),dpi=80)
# label sets the text shown in the legend
plt.plot(x,y_1,label="Me",color='orange',linestyle=':')
plt.plot(x,y_2,label="Deskmate",color='cyan',linestyle='--')

# set the x-axis ticks
_xtick_labels = ["age {}".format(i) for i in x]
plt.xticks(x,_xtick_labels)
plt.yticks(range(0,9))

# draw the grid
plt.grid(alpha=0.4)

# add the legend
# prop sets the legend font
# loc sets the legend position; the default is 'best'
plt.legend(prop=my_font,loc='upper left')
# add descriptive information
plt.xlabel("Age")
plt.ylabel("Number of books")
plt.title("Books read per year")
# show the figure
plt.show()


Customizing the plot style

plt.plot(
x,  # x
y,  # y
# these can simply be specified when plotting
color='r',  # line color: r red, g green, b blue, w white, y yellow
linestyle='--',  # line style: - solid, -- dashed, -. dash-dot, : dotted
linewidth=5,  # line width
alpha=0.5  # transparency
)

Drawing a scatter plot with matplotlib

Suppose you used a web crawler to collect the daily daytime high temperature in some city for March and October. Draw a scatter plot of the data
y_3 = [11,17,16,11,12,11,12,6,6,7,8,9,12,15,14,17,18,21,16,17,20,14,15,15,15,19,21,22,22,22,23]
y_10 = [26,26,28,19,21,17,16,19,18,20,20,19,22,23,17,20,21,20,22,15,11,15,5,13,17,10,11,13,12,13,6]

# draw a scatter plot
from matplotlib import pyplot as plt
import matplotlib
# font settings for Windows and Linux (needed when labels contain Chinese text)
my_font = {'family' : 'FangSong',
          'weight' : 'bold',
          'size'   : '16'}
# plt.rc( 'font' , ** font)        # step 1 (set further font properties)
# plt.rc( 'axes' , unicode_minus = False ) # step 2 (fixes the display of minus signs on the axes)
matplotlib.rc('font',** my_font)

y_3 = [11,17,16,11,12,11,12,6,6,7,8,9,12,15,14,17,18,21,16,17,20,14,15,15,15,19,21,22,22,22,23]
y_10 = [26,26,28,19,21,17,16,19,18,20,20,19,22,23,17,20,21,20,22,15,11,15,5,13,17,10,11,13,12,13,6]

x_3 = range(1,32)
x_10 = range(51,82)  # offset by 50 so the October points sit to the right of the March points

# set the figure size
plt.figure(figsize=(20,8),dpi=80)
# use scatter to draw the scatter plot; this is the only difference from drawing a line chart
plt.scatter(x_3,y_3,label='March')
plt.scatter(x_10,y_10,label='October')

# adjust the x-axis ticks
_x = list(x_3) + list(x_10)
_xtick_labels = ['Mar {}'.format(i) for i in x_3]
_xtick_labels += ['Oct {}'.format(i-50) for i in x_10]
plt.xticks(_x[::3],_xtick_labels[::3],rotation=45)

# add the legend
plt.legend(loc = 'upper left')

# add descriptive information
plt.xlabel('Time')
plt.ylabel('Temperature')
plt.show()


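When to use a scatter plot

To look for an intrinsic relationship between different conditions (dimensions)
To observe how scattered or clustered the data is

Drawing bar charts

Suppose you have a list of movies and their box-office takings.

Drawing a vertical bar chart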

# draw a bar chart
from matplotlib import pyplot as plt
import matplotlib

# font settings for Windows and Linux (needed when labels contain Chinese text)
my_font = {'family' : 'FangSong',
          'weight' : 'bold',
          'size'   : '16'}
# plt.rc( 'font' , ** font)        # step 1 (set further font properties)
# plt.rc( 'axes' , unicode_minus = False ) # step 2 (fixes the display of minus signs on the axes)
matplotlib.rc('font',** my_font)
a = ['Movie 1','Movie 2','Movie 3','Movie 4','Movie 5','Movie 6','Movie 7','Movie 8','Movie 9']
b = [56,26,17,16,12,11,10,9,8]
# set the figure size
plt.figure(figsize=(20,15),dpi=80)
# draw the bar chart (vertical bars)
plt.bar(range(len(a)),b,width=0.3)
# put the strings on the x axis
plt.xticks(range(len(a)),a,rotation=90)
plt.xlabel('Movie title')
plt.ylabel('Box office')
plt.show()


Drawing a horizontal bar chart

# draw a bar chart
from matplotlib import pyplot as plt
import matplotlib

# font settings for Windows and Linux (needed when labels contain Chinese text)
my_font = {'family' : 'FangSong',
          'weight' : 'bold',
          'size'   : '16'}
# plt.rc( 'font' , ** font)        # step 1 (set further font properties)
# plt.rc( 'axes' , unicode_minus = False ) # step 2 (fixes the display of minus signs on the axes)
matplotlib.rc('font',** my_font)
a = ['Movie 1','Movie 2','Movie 3','Movie 4','Movie 5','Movie 6','Movie 7','Movie 8','Movie 9']
b = [56,26,17,16,12,11,10,9,8]
# set the figure size
plt.figure(figsize=(20,15),dpi=80)
# draw the bar chart (horizontal bars); barh takes height instead of width
plt.barh(range(len(a)),b,height=0.3,color='orange')
# put the strings on the y axis
plt.yticks(range(len(a)),a)
plt.grid(alpha=0.3)
plt.ylabel('Movie title')
plt.xlabel('Box office')
plt.show()


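Bar chart of three days of data

Suppose you know the box-office takings of the movies in list a on 2017-9-14 (b_14), 2017-9-15 (b_15), and 2017-9-16 (b_16). To show each movie's own takings and how they compare with the other movies, how can the data be presented more intuitively?
a=["Rise of the Planet of the Apes","Dunkirk","Spider-Man","Wolf Warrior 2"]
b_16 = [15746,312,4497,319]
b_15=[12357,156,2045,168]
b_14=[2358,399,2358,362]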
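
The notes pose this question without showing an answer. One common approach is a grouped bar chart: draw one `plt.bar` call per day and shift each day's bars sideways by one bar width. The sketch below is an illustration under that assumption, not the author's solution; `bar_width` and the English movie titles are choices made for the example, and the `Agg` backend is assumed so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
from matplotlib import pyplot as plt

a = ["Rise of the Planet of the Apes", "Dunkirk", "Spider-Man", "Wolf Warrior 2"]
b_16 = [15746, 312, 4497, 319]
b_15 = [12357, 156, 2045, 168]
b_14 = [2358, 399, 2358, 362]

bar_width = 0.2  # illustrative; three bars per movie fit within one unit on the x axis
x_14 = list(range(len(a)))
x_15 = [i + bar_width for i in x_14]
x_16 = [i + 2 * bar_width for i in x_14]

plt.figure(figsize=(20, 8), dpi=80)
plt.bar(x_14, b_14, width=bar_width, label="2017-9-14")
plt.bar(x_15, b_15, width=bar_width, label="2017-9-15")
plt.bar(x_16, b_16, width=bar_width, label="2017-9-16")

# put the movie titles under the middle bar of each group
plt.xticks(x_15, a)
plt.ylabel("Box office")
plt.legend()
n_bars = len(plt.gca().patches)  # one rectangle per bar
```

Calling `plt.show()` or `plt.savefig(...)` afterwards displays or saves the chart.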

When to use a bar chart

Counting quantities
Frequency statistics (market saturation)

Drawing a histogram

import matplotlib.pyplot as plt
import numpy as np

lst=[]
for _ in range(250):
    a = np.random.randint(80,160)
    lst.append(a)  # generate the data

# compute the number of bins
d = 5  # bin width
num_bins = (max(lst) - min(lst))//d
print(lst)
plt.hist(lst,num_bins)
# set the x-axis ticks
plt.xticks(range(min(lst),max(lst)+d,d))
plt.grid()
plt.show()
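
When `max(lst) - min(lst)` is not a multiple of `d`, equal-count bins and ticks spaced by `d` no longer line up. A hedged alternative is to pass explicit bin edges instead of a bin count. The sketch below uses `np.histogram` to show the counts without drawing; the seed and the variable names `edges` and `counts` are assumptions made for the example. `plt.hist(data, bins=edges)` accepts the same edges.

```python
import numpy as np

rng = np.random.default_rng(0)            # seeded only to make the example repeatable
data = rng.integers(80, 160, size=250)    # integers in [80, 160)

d = 5                                     # bin width
edges = np.arange(80, 160 + d, d)         # 80, 85, ..., 160
counts, returned_edges = np.histogram(data, bins=edges)

print(len(counts))    # number of bins
print(counts.sum())   # every value falls into exactly one bin
```

With explicit edges, the same `edges` array can be handed to `plt.xticks` so that every tick sits on a bin boundary.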


When to use a histogram

Age distribution of users
Distribution of user click counts over a period of time
Distribution of users' active hours

A brief matplotlib summary

matplotlib
	Import pyplot to draw line charts	from matplotlib import pyplot as plt
	Set the figure size and resolution	plt.figure(figsize=(20,8),dpi=80)
	Plot	plt.plot(x,y)	x (y): the x (y) values of all the coordinates
	Adjust the ticks on the x (y) axis	plt.xticks()
		Adjust the spacing:	pass one argument (an iterable of numbers) with a suitable step
		Put strings on the x (y) axis:	pass two arguments, both iterables; numbers and strings are matched one to one and only the strings are shown
	Show	plt.show()
	Save the image	plt.savefig("file_path")
	Display Chinese	matplotlib.rc	my_font = {'family' : 'FangSong', 'weight' : 'bold', 'size' : '16'} matplotlib.rc('font',** my_font)
	font_manager	from matplotlib import font_manager
		my_font=font_manager.FontProperties(fname="")
	Several lines in one figure	call plt.plot() several times	plt.plot(x,y_1,label="Me",color='orange',linestyle=':')
			plt.plot(x,y_2,label="Deskmate",color='cyan',linestyle='--')
	Legend	shows which line is which
		1.plt.plot(label="Me")
		2.plt.legend(loc,prop)	loc is the position of the legend
	Line style	color	linestyle,linewidth
	Descriptive text	plt.xlabel("description")
		plt.ylabel("description")
		plt.title("description")
	Grid	plt.grid(alpha=0.4,linestyle=)

numpy

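1. What numpy is
2. numpy basics
3. Common numpy methods
4. Common numpy statistical methods

Why learn numpy

1. Fast
2. Convenient
3. The foundation library for scientific computing

Advantages of NumPy over plain Python code for the same numerical task
More concise code: NumPy computes on whole arrays and matrices and supports a large number of mathematical functions, whereas plain Python has to implement the same thing from scratch with for loops
Better performance: NumPy arrays are stored more efficiently, and their input, output, and computation are much faster than Python lists or nested lists
Note: NumPy stores data differently from a native Python list
Note: most of NumPy is implemented in C, which is why it is faster than pure Python code
NumPy is the foundation of Python's data-science libraries
For example: SciPy, scikit-learn, TensorFlow, pandas

What is numpy

A foundation library for scientific computing in Python that focuses on numerical computation. It is also the base library for most Python scientific-computing libraries, and is mostly used for numerical operations on large, multi-dimensional arrays

Numerical Python
An open-source scientific-computing library for Python
With NumPy, arrays and matrices are easy to compute with
It includes a large number of functions for linear algebra, Fourier transforms, random number generation, and more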

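Downloading and installing NumPy

There are two common ways to install NumPy on Windows
1. Use pip, the Python package manager. This is the simplest and most lightweight method; just run the following command

		pip install numpy

2. Use Anaconda (download from the official site: https://www.anaconda.com/), a widely used open-source Python distribution.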

The numpy ndarray object

numpy defines an n-dimensional array object, ndarray for short: a collection of elements that all have the same type. Every element in the array occupies a memory block of the same size.
An ndarray uses array indexing to map each element to its memory block, and the blocks are laid out in a fixed order (row-major or column-major)
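
The memory layout described above can be inspected directly. The following sketch (the array contents and the explicit `int64` dtype are just for illustration) prints the per-element size, the total size, and the strides, which are the number of bytes to step in memory to move one position along each axis.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)   # row-major (C order) by default

print(a.dtype, a.itemsize)        # every element occupies the same number of bytes
print(a.nbytes)                   # itemsize * number of elements
print(a.strides)                  # bytes to step along axis 0 and axis 1
print(a.flags["C_CONTIGUOUS"])    # True for a row-major layout

f = np.asfortranarray(a)          # same values, column-major (Fortran order) layout
print(f.strides)
```

In row-major order, stepping to the next row skips a whole row of 4 elements (32 bytes); in column-major order, stepping to the next column skips a whole column of 3 elements (24 bytes).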

Creating arrays (matrices) with numpy

numpy.array(object,dtype = None,copy = True,order = 'K',subok = False,ndmin = 0)

Parameters

No.	Parameter	Description
1	object	An array or array-like sequence
2	dtype	Optional; the desired data type of the array
3	copy	Optional; whether the data is copied when the source is already an ndarray. Default True
4	order	Optional; the memory layout of the new array: C (row-major), F (column-major), A, or K (default)
5	ndmin	Optional; the minimum number of dimensions of the resulting array
6	subok	Optional bool, default False. True: sub-classes of ndarray are passed through; False: the result is forced to be a base-class ndarray
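
A short sketch of how three of these parameters behave in practice (the variable names are illustrative):

```python
import numpy as np

src = np.array([1, 2, 3])

as_float = np.array(src, dtype=float)   # dtype: convert the values to float64
two_d = np.array(src, ndmin=2)          # ndmin: pad the shape to at least 2 dimensions
copied = np.array(src)                  # copy=True by default: independent data

copied[0] = 99                          # changing the copy leaves src untouched
print(as_float.dtype, two_d.shape, src[0], copied[0])
```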

import random
import numpy as np

# create arrays with numpy; the result has type ndarray
t1 = np.array([1, 2, 3])
print(t1, type(t1))

t2 = np.array(range(10))
print(t2, type(t2))

t3 = np.arange(4, 10, 2)
print(t3, type(t3), t3.dtype)

# change the data type
t4 = t3.astype(int)
print(t4, t4.dtype)

# floating-point numbers in numpy
t5 = np.array([random.random() for i in range(10)])
print(t5, t5.dtype)
print('------------')
# round to two decimal places
t8 = np.round(t5, 2)
print(t8)

Output:

[1 2 3] <class 'numpy.ndarray'>
[0 1 2 3 4 5 6 7 8 9] <class 'numpy.ndarray'>
[4 6 8] <class 'numpy.ndarray'> int32
[4 6 8] int32
[0.15005218 0.04573021 0.16078498 0.81148836 0.69045563 0.50318601
 0.04133977 0.04835085 0.04299551 0.79446533] float64
------------
[0.15 0.05 0.16 0.81 0.69 0.5  0.04 0.05 0.04 0.79]

Array shape and reshaping

# array shape
import numpy as np

t1 = np.arange(12)
# inspect the shape with x.shape
print(t1, 't1.shape',t1.shape)

print('*' * 15)
t2 = np.array([[1, 2, 3], [4, 5, 6]])
print(t2,'t2.shape',t2.shape)
print()
# change the shape with x.reshape
t1 = t1.reshape(3, 4)  # reshape returns a result and leaves the shape of the original t1 unchanged, so assign it back
print('t1.reshape(3, 4)',t1)  # by contrast, a method that returns None works in place and modifies the data itself

# convert to a one-dimensional array
t1 = t1.flatten()
print(t1)

Output:

[ 0  1  2  3  4  5  6  7  8  9 10 11] t1.shape (12,)
***************
[[1 2 3]
 [4 5 6]] t2.shape (2, 3)

t1.reshape(3, 4) [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[ 0  1  2  3  4  5  6  7  8  9 10 11]

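Axis

In numpy an axis can be thought of as a direction, numbered 0, 1, 2, ... A one-dimensional array has only axis 0; a two-dimensional array (shape (2,2)) has axes 0 and 1; a three-dimensional array (shape (2,2,3)) has axes 0, 1, and 2

With the concept of axes, calculations become more convenient. For example, to compute the mean of a 2-D array along one direction, you have to say which axis to average over

Where did axes appear in the earlier material?
Recall np.arange(0,10).reshape(2,5): the 2 in reshape is the length of axis 0 (the number of records) and 5 is the length of axis 1, giving 2x5 = 10 values in total.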
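
To make the direction of each axis concrete, the sketch below sums and averages the same 2x5 array along axis 0 and along axis 1. Reducing along an axis removes that axis from the result's shape.

```python
import numpy as np

t = np.arange(0, 10).reshape(2, 5)
# [[0 1 2 3 4]
#  [5 6 7 8 9]]

col_sums = t.sum(axis=0)     # collapse axis 0 (the 2 rows): one value per column
row_sums = t.sum(axis=1)     # collapse axis 1 (the 5 columns): one value per row
row_means = t.mean(axis=1)

print(col_sums)    # [ 5  7  9 11 13]
print(row_sums)    # [10 35]
print(row_means)   # [2. 7.]
```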

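Reading data with numpy

CSV: Comma-Separated Values file
Displayed: as a table
Source file: formatted text in which line breaks separate rows and commas separate columns; each line is one record

Because CSV is easy to display, read, and write, it is widely used to store and transfer small and medium-sized data sets.

np.loadtxt(fname,dtype=float,delimiter=',',skiprows=0,usecols=None,unpack=False)

Parameter	Explanation
fname	File, string, or generator; may be a .gz or .bz2 compressed file
dtype	Data type, optional; the type the CSV strings are converted to when read into the array. Default float
delimiter	Separator string; the default is any whitespace, so set it to a comma for CSV
skiprows	Skip the first x rows, typically the header row
usecols	Which columns to read, given as indexes in a tuple
unpack	If True, each column is returned as a separate array, which is equivalent to transposing; if False, the data is returned as a single array. Default False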
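
The sketch below exercises `skiprows`, `usecols`, and `unpack` together. To keep it self-contained it reads from an in-memory `io.StringIO` object instead of a file on disk; the header row `a,b,c,d` is made up for the example.

```python
import io
import numpy as np

csv_text = "a,b,c,d\n143,456,789,100\n1,2,3,5\n4,111,124,556\n"

# skip the header row and keep only columns 0 and 2
t = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int,
               skiprows=1, usecols=(0, 2))
print(t)
# [[143 789]
#  [  1   3]
#  [  4 124]]

# unpack=True returns one array per column
col0, col2 = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int,
                        skiprows=1, usecols=(0, 2), unpack=True)
print(col0, col2)   # [143   1   4] [789   3 124]
```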

Code demo:

# a .csv file I made up
143,456,789,100
1,2,3,5
4,111,124,556

Code

import numpy as np
us_file_path ="file.csv"

t1 = np.loadtxt(us_file_path,delimiter=',',dtype='int',unpack=True)
t2 = np.loadtxt(us_file_path,delimiter=',',dtype='int')

print(t1)
print('*'*18)
print(t2)

Output:

[[143   1   4]
 [456   2 111]
 [789   3 124]
 [100   5 556]]
******************
[[143 456 789 100]
 [  1   2   3   5]
 [  4 111 124 556]]

Transposing in numpy

Transposing is a transformation: for a numpy array it swaps the data across the diagonal, which often makes the data easier to work with. The three methods below all transpose a 2-D array; for a 2-D array, transposing and swapping the two axes give the same result.
Code:

# transposing in numpy
import numpy as np

t1 = np.arange(8).reshape(2,4)
print('Before transposing:\n', t1)

t2 = t1.transpose()
print('After transposing, method 1:\n',t2)

t3 = t1.swapaxes(1,0)
print('After transposing, method 2:\n',t3)

t4 = t1.T
print('After transposing, method 3:\n',t4)

Output:

Before transposing:
 [[0 1 2 3]
 [4 5 6 7]]
After transposing, method 1:
 [[0 4]
 [1 5]
 [2 6]
 [3 7]]
After transposing, method 2:
 [[0 4]
 [1 5]
 [2 6]
 [3 7]]
After transposing, method 3:
 [[0 4]
 [1 5]
 [2 6]
 [3 7]]

numpy indexing and slicing

How do you select just one row or one column from the data loaded above?
It works much like indexing a Python list
The code below shows the details:

import numpy as np

t1 = np.arange(20).reshape(4, 5)
print('Original t1\n', t1)

print('One row:\n', t1[2])

print('Several consecutive rows:\n', t1[1:])

print('Several non-consecutive rows:\n', t1[[0, 2]])
print()
# before the comma: rows; after the comma: columns
print('One column:\n', t1[:, 0])

print('Several consecutive columns:\n', t1[:, 2:])

print('Several non-consecutive columns:\n', t1[:, [0, 2, 4]])
print('Several rows and columns: rows 2 to 4, columns 2 to 4')
print('The result is the intersection')
print(t1[1:4,1:4])

print('Several non-adjacent points')
# selects the elements at (0,1) and (2,3)
print(t1[[0,2],[1,3]])


# rows 2 and 4
print(t1[[1,3],:])
# columns 1 and 4
print(t1[:,[0,3]])

Output:

Original t1
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
One row:
 [10 11 12 13 14]
Several consecutive rows:
 [[ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
Several non-consecutive rows:
 [[ 0  1  2  3  4]
 [10 11 12 13 14]]

One column:
 [ 0  5 10 15]
Several consecutive columns:
 [[ 2  3  4]
 [ 7  8  9]
 [12 13 14]
 [17 18 19]]
Several non-consecutive columns:
 [[ 0  2  4]
 [ 5  7  9]
 [10 12 14]
 [15 17 19]]
Several rows and columns: rows 2 to 4, columns 2 to 4
The result is the intersection
[[ 6  7  8]
 [11 12 13]
 [16 17 18]]
Several non-adjacent points
[ 1 13]
[[ 5  6  7  8  9]
 [15 16 17 18 19]]
[[ 0  3]
 [ 5  8]
 [10 13]
 [15 18]]

Modifying values in numpy

Changing the values of a row or column is easy. But what if you want to replace every number in the array that is less than 10 with 3?

import numpy as np

t1 = np.arange(20).reshape(4, 5)
print('Original t1\n', t1)
# print the boolean mask of elements < 10
print('Boolean mask for t1<10\n',t1<10)
# replace the values < 10 with 3
t1[t1<10]=3
print('After replacing values <10 with 3\n',t1)
# look at the values > 18
print('Values >18\n',t1[t1>18])
# replace the values > 18 with 100
t1[t1>18]=100
print('After replacing values >18 with 100\n',t1)

# set columns 2 and 3 of every row to 0
t1[:,2:4]=0
print(t1)

Output:

Original t1
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
Boolean mask for t1<10
 [[ True  True  True  True  True]
 [ True  True  True  True  True]
 [False False False False False]
 [False False False False False]]
After replacing values <10 with 3
 [[ 3  3  3  3  3]
 [ 3  3  3  3  3]
 [10 11 12 13 14]
 [15 16 17 18 19]]
Values >18
 [19]
After replacing values >18 with 100
 [[  3   3   3   3   3]
 [  3   3   3   3   3]
 [ 10  11  12  13  14]
 [ 15  16  17  18 100]]
[[  3   3   0   0   3]
 [  3   3   0   0   3]
 [ 10  11   0   0  14]
 [ 15  16   0   0 100]]

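Boolean indexing in numpy

What if you want to replace the values below a threshold with one number and the values above it with another? clip limits the values to a range, and np.where performs a two-way replacement.

import numpy as np

t1 = np.arange(20).reshape(4, 5)
print(t1)
print()
# values less than 10 become 10, values greater than 15 become 15
t1 = t1.clip(10, 15)
print(t1)
print()
# on the clipped array: values less than 11 (that is, the 10s) become 100, all others become 300
t1 = np.where(t1 < 11, 100, 300)
print(t1)

Output: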

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]

[[10 10 10 10 10]
 [10 10 10 10 10]
 [10 11 12 13 14]
 [15 15 15 15 15]]

[[100 100 100 100 100]
 [100 100 100 100 100]
 [100 300 300 300 300]
 [300 300 300 300 300]]

nan in numpy and common methods

Two nan values are not equal to each other

	np.nan == np.nan
	# the result is False

	np.nan != np.nan  # True

Using this property, count the nan values in an array

	np.count_nonzero(t != t)

np.isnan(a) tests whether a value is nan and returns a bool (element-wise for an array). For example, to replace nan with 0

	np.isnan(t)
	t[np.isnan(t)] = 0

Any calculation involving nan gives nan
A code example follows

	import numpy as np

	t = np.array([1., 2., 3.])
	t[0] = np.nan
	print(t)
	print('Number of nan values in the array:',np.count_nonzero(t != t))
	print('Which elements are nan:',np.isnan(t))
	print('Using the boolean mask to replace nan with 0')
	t[np.isnan(t)] = 0
	print(t)

Output:

	[nan  2.  3.]
	Number of nan values in the array: 1
	Which elements are nan: [ True False False]
	Using the boolean mask to replace nan with 0
	[0. 2. 3.]

Example: replacing nan with the column mean

# replace each nan in the array with the mean of its column
import numpy as np

def fill_ndarray(t1):
    for i in range(t1.shape[1]):  # iterate over the columns
        temp_col = t1[:, i]  # the current column (a view into t1)
        # count the nan values in this column (nan != nan is True)
        nan_num = np.count_nonzero(temp_col != temp_col)
        if nan_num != 0:  # non-zero means this column contains nan
            temp_not_nan_col = temp_col[temp_col == temp_col]  # the values that are not nan
            # select the nan positions with np.isnan and assign the mean of the non-nan values;
            # writing through the view updates t1 in place
            temp_col[np.isnan(temp_col)] = temp_not_nan_col.mean()
    return t1

if __name__ == '__main__':
    t1 = np.arange(12).reshape(3, 4).astype('float')
    t1[1, 2:] = np.nan
    print(t1)
    print()
    t1 = fill_ndarray(t1)
    print(t1)

Output:

[[ 0.  1.  2.  3.]
 [ 4.  5. nan nan]
 [ 8.  9. 10. 11.]]

[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
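
The same result can be obtained without an explicit loop. The sketch below is an alternative to the loop-based function, not a replacement the notes themselves propose: `np.nanmean` computes each column mean while ignoring nan, and `np.where` broadcasts those means into the nan positions.

```python
import numpy as np

t = np.arange(12).reshape(3, 4).astype(float)
t[1, 2:] = np.nan

col_means = np.nanmean(t, axis=0)              # per-column mean, nan ignored
filled = np.where(np.isnan(t), col_means, t)   # col_means broadcasts across the rows

print(col_means)   # [4. 5. 6. 7.]
print(filled)
```

Unlike the loop version, this leaves `t` unchanged and returns a new array.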

Common statistical functions in numpy

Common statistical functions
Sum	t.sum(axis=None)
Mean	t.mean(axis=None) strongly affected by outliers
Median	np.median(t,axis=None)
Maximum	t.max(axis=None)
Minimum	t.min(axis=None)
Range	np.ptp(t,axis=None) the difference between the maximum and the minimum
Standard deviation	t.std(axis=None)
	By default each function returns one result for the whole multi-dimensional array; if axis is given, it returns the results along that axis
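
A quick sketch of these functions on a small array, with and without `axis` (the array is chosen only so the results are easy to check by hand):

```python
import numpy as np

t = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

total = t.sum()                  # 66: one result for the whole array
col_sums = t.sum(axis=0)         # [12 15 18 21]
row_means = t.mean(axis=1)       # [1.5 5.5 9.5]
median = np.median(t)            # 5.5
row_ranges = np.ptp(t, axis=1)   # [3 3 3]: max minus min within each row
col_max = t.max(axis=0)          # [ 8  9 10 11]
spread = t.std()                 # standard deviation of all 12 values

print(total, col_sums, row_means, median, row_ranges, col_max, round(float(spread), 3))
```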

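A short numpy summary

Slicing and indexing
Select rows	t[2]
	t[3:,:]
Select columns	t[:,4:]
Select rows and columns	consecutive rows and columns: t[2:,:3]
	non-consecutive: t[[1,3],[2,4]] selects the values at positions (1,2) and (3,4)
Index	t[2,3]
Assign	t[2:,3]=3
Boolean indexing	t[t>10]=10
Ternary operator	np.where(t>10,20,0)
	replaces the values in t greater than 10 with 20 and all others with 0
Clip	t.clip(10,20)
	replaces values less than 10 with 10 and values greater than 20 with 20
Transpose	t.T
	t.transpose()
	t.swapaxes(1,0)
Read a local file	np.loadtxt(file_path,delimiter,dtype)
nan and inf
inf	infinity
nan	not a number
	np.nan != np.nan
	np.count_nonzero(t != t)
	np.isnan(t) gives the same result as t != t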

Joining arrays

import numpy as np

t1 = np.arange(0, 12).reshape(2, 6)
t2 = np.arange(12, 24).reshape(2, 6)
# stack vertically
t = np.vstack((t1, t2))
print('Vertical stacking\n',t)
# stack horizontally
t=np.hstack((t1,t2))
print('Horizontal stacking\n',t)

Output:

Vertical stacking
 [[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
Horizontal stacking
 [[ 0  1  2  3  4  5 12 13 14 15 16 17]
 [ 6  7  8  9 10 11 18 19 20 21 22 23]]

Swapping rows and columns of an array

import numpy as np

# swapping rows and columns of an array
t = np.arange(12, 24).reshape(3, 4)
print(t)
print('Swap rows')
t[[1, 2], :] = t[[2, 1], :]
print(t)

print('Swap columns')
t[:, [0, 2]] = t[:, [2, 0]]
print(t)

Output:

[[12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
Swap rows
[[12 13 14 15]
 [20 21 22 23]
 [16 17 18 19]]
Swap columns
[[14 13 12 15]
 [22 21 20 23]
 [18 17 16 19]]

Some handy numpy methods

Get the positions of the maximum and minimum values

	np.argmax(t,axis=0)
	np.argmin(t,axis=1)

Create an array of all zeros

	np.zeros((3,4))

Create an array of all ones

	np.ones((3,4))

Create a square array (matrix) with ones on the diagonal

	np.eye(3)
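
A small sketch showing what these calls return (the values in `t` are arbitrary, picked so that the positions are easy to verify):

```python
import numpy as np

t = np.array([[3, 9, 1],
              [7, 2, 8]])

max_pos = np.argmax(t, axis=0)   # row index of the largest value in each column
min_pos = np.argmin(t, axis=1)   # column index of the smallest value in each row
print(max_pos)   # [1 0 1]
print(min_pos)   # [2 1]

zeros = np.zeros((3, 4))         # float64 by default
ones = np.ones((3, 4))
identity = np.eye(3)
print(zeros.shape, ones.sum(), identity.trace())
```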

Generating random numbers with numpy

Function (in np.random)	Explanation
.rand(d0,d1,…,dn)	Creates an array of shape (d0, ..., dn) of uniformly distributed random floats in the range 0 to 1
.randn(d0,d1,…,dn)	Creates an array of shape (d0, ..., dn) of random floats from the standard normal distribution (mean 0, standard deviation 1)
.randint(low,high,(shape))	Random integers from low (inclusive) to high (exclusive), with the given shape
.uniform(low,high,(size))	An array of uniformly distributed values; low is the start, high is the end, size is the shape
.normal(loc,scale,(size))	Random samples from the given normal distribution; loc is the center (mean) of the distribution, scale is the standard deviation, size is the shape
.seed(s)	Random seed; s is the seed value. Because the numbers are pseudo-random, setting the same seed produces the same random numbers every time
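
The effect of the seed is easy to check: reseeding with the same value reproduces the same draw. The seed value 10 and the shapes below are arbitrary choices for the example.

```python
import numpy as np

np.random.seed(10)
first = np.random.randint(10, 20, (2, 3))    # integers in [10, 20)

np.random.seed(10)                           # same seed again
second = np.random.randint(10, 20, (2, 3))

print(np.array_equal(first, second))         # True: identical draws

u = np.random.uniform(0, 1, (2, 2))          # uniform floats in [0, 1)
n = np.random.normal(0, 1, (1000,))          # normal samples, mean 0, std 1
print(u.shape, n.shape)
```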

pandas

Why learn pandas

numpy can already process data, and together with matplotlib it can solve data-analysis problems, so why learn pandas?
numpy helps with numerical data, but data often also contains strings, time series, and other non-numerical values.

numpy handles numbers; pandas, which is built on numpy, handles numbers and other types of data as well

Common pandas data types

Series: one-dimensional, a labeled array
DataFrame: two-dimensional, a container of Series

Creating a Series in pandas

Code demo

import pandas as pd

# create a Series from a list or another iterable
t = pd.Series([1, 23, 22, 2, 0], index=list('abcde'))
print(t)
# create a Series from a dict; the dict keys become the index
print('\nCreated from a dict:')
temp_dict = {'name': 'Zhang San', 'gender': 'male', 'age': 15}
t3 = pd.Series(temp_dict)
print(t3)

print('Series slicing and indexing')
# slicing: pass start, end, and optionally a step
# indexing: for one item pass a label, or a position through .iloc; for several pass a list of labels or positions
print("t3['name']:",t3['name'])
print("t3['gender']:",t3['gender'])
print("t3['age']: ",t3['age'])
# positional access goes through .iloc (plain t3[0] on a labeled Series is deprecated in recent pandas)
print("t3.iloc[0]: ",t3.iloc[0])
print("t3.iloc[1]: ",t3.iloc[1])
print("t3.iloc[2]: ",t3.iloc[2])
print('First two rows\n',t3.iloc[:2])
print('Non-consecutive items\n',t3.iloc[[1,2]])
print('Non-consecutive items\n',t3[['gender','age']])
# a Series is essentially made of two arrays:
# one holds the keys (index) and the other holds the values; key -> value
print(t3.index,'---',type(t3.index))
print(t3.values,'---',type(t3.values))
# many ndarray methods also work on a Series, for example argmax and clip
# Series has a where method too, but its result differs from np.where

Output:

a     1
b    23
c    22
d     2
e     0
dtype: int64

Created from a dict:
name      Zhang San
gender         male
age              15
dtype: object
Series slicing and indexing
t3['name']: Zhang San
t3['gender']: male
t3['age']:  15
t3.iloc[0]:  Zhang San
t3.iloc[1]:  male
t3.iloc[2]:  15
First two rows
 name      Zhang San
gender         male
dtype: object
Non-consecutive items
 gender    male
age         15
dtype: object
Non-consecutive items
 gender    male
age         15
dtype: object
Index(['name', 'gender', 'age'], dtype='object') --- <class 'pandas.core.indexes.base.Index'>
['Zhang San' 'male' 15] --- <class 'numpy.ndarray'>
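
The last comment in the code says that `Series.where` behaves differently from `np.where`. A minimal sketch of the difference: `Series.where(cond)` keeps the values where the condition is True and replaces the rest (with NaN unless a second argument is given), while `np.where(cond, a, b)` picks `a` where the condition is True and `b` elsewhere, and returns a plain ndarray.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 23, 22, 2, 0], index=list("abcde"))

kept = s.where(s > 2)           # values failing the condition become NaN
replaced = s.where(s > 2, 0)    # ... or the given replacement value
arr = np.where(s > 2, 100, 0)   # ndarray: 100 where True, 0 where False

print(kept)
print(replaced.tolist())   # [0, 23, 22, 0, 0]
print(arr)                 # [  0 100 100   0   0]
```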

Reading external data with pandas

If the data is stored in a csv file, use pd.read_csv directly
pd.read_sql(sql_sentence,connection) reads data from a database
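
Both calls can be tried without any external files. The sketch below is illustrative only: it reads CSV text from an in-memory `io.StringIO` object and runs `pd.read_sql` against an in-memory SQLite database from the standard library. The table name `dogs` and the column names `name` and `times_used` are made up for the example.

```python
import io
import sqlite3

import pandas as pd

# read_csv accepts a path or any file-like object
csv_text = "name,times_used\nBella,1195\nMax,1153\nLola,800\n"
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)

# read_sql takes a query string and a database connection
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dogs (name TEXT, times_used INTEGER)")
con.executemany("INSERT INTO dogs VALUES (?, ?)",
                [("Bella", 1195), ("Max", 1153)])
df_sql = pd.read_sql("SELECT name, times_used FROM dogs", con)
con.close()
print(df_sql)
```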

DataFrame in pandas

A DataFrame has both a row index and a column index
Row index: identifies the different rows; the horizontal index, called index; axis 0 (axis=0)
Column index: identifies the different columns; the vertical index, called columns; axis 1 (axis=1)
Code demo:

import pandas as pd
import numpy as np

t = pd.DataFrame(np.arange(12).reshape(3,4))
print(t)
print('-'*30)
t1 = pd.DataFrame(np.arange(12).reshape(3,4),index=list('abc'),columns=list("WXYZ"))
print(t1)

Output:

   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
------------------------------
   W  X   Y   Z
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11

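Basic DataFrame attributes

df.shape	Number of rows and columns
df.dtypes	Data type of each column
df.ndim	Number of dimensions
df.index	Row index
df.columns	Column index
df.values	The values as a two-dimensional ndarray

Inspecting a DataFrame as a whole

df.head(3)	Show the first few rows; 5 by default
df.tail(3)	Show the last few rows; 5 by default
df.info()	Overview: number of rows, number of columns, column index, non-null count per column, column types, memory usage
df.describe()	Quick summary statistics: count, mean, standard deviation, maximum, quartiles, minimum
df.sort_values(by='XX',ascending=False)	Sort by column 'XX' in descending order

DataFrame indexing

Points to note when selecting rows and columns in pandas
A numeric slice in square brackets selects rows and operates on rows: df[:20]
A string selects by column label, picking one column to operate on: df['column label']
To select rows and columns together: df[:100]['column label']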

loc and iloc in pandas

df.loc selects data by label
df.iloc selects data by position
See the code demo:

import numpy as np
import pandas as pd

t3=pd.DataFrame(np.arange(12).reshape(3,4),
                index=list("abc"),columns=list("WXYZ"))
print('t3:')
print(t3)

# before the comma: rows; after the comma: columns
print('1.',)
# row a, column Z
print(t3.loc['a','Z'])
# check the type
print(type(t3.loc['a','Z']),end='\n\n')
# row a; t3.loc['a'] is equivalent to t3.loc['a',:]
print('2.')
t = t3.loc['a']
print(t)
print()
print("Type of t3.loc['a']:",type(t),end='\n\n')
# column Y
t = t3.loc[:,"Y"]
print('Y:')
print(t)
# several rows, e.g. rows a and c; t3.loc[['a','c']] is equivalent to t3.loc[['a','c'],:]
t = t3.loc[['a','c']]
print('Rows a and c')
print(t)
# several columns: W and Z
t=t3.loc[:,['W','Z']]
print('Columns W and Z')
print(t)
# several rows and several non-adjacent columns
t=t3.loc[['a','b'],['W','Z']]
print('Several rows and non-adjacent columns')
print(t)
# a slice in loc is a closed interval:
# the label after the colon is included
t=t3.loc['a':'c',['W','Z']]
print('Selecting several rows with a slice')
print(t)
# select a row by position; equivalent to .iloc[1,:]
t=t3.iloc[1]
print('Second row')
print(t)

t=t3.iloc[:,2]
print('Third column')
print(t)
# several columns
t=t3.iloc[:,[2,1]]
print('Several columns')
print(t)
t=t3.iloc[[0,2],[2,1]]
print(t)
print('Consecutive rows')
t=t3.iloc[1:,:2]
print(t)
print('Changing data by assignment')
t3.iloc[1:,:2]=30
print(t3)
print('Assigning nan')
t3.iloc[1:,:2]=np.nan
print(t3)

Output:

t3:
   W  X   Y   Z
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11
1.
3
<class 'numpy.int32'>

2.
W    0
X    1
Y    2
Z    3
Name: a, dtype: int32

Type of t3.loc['a']: <class 'pandas.core.series.Series'>

Y:
a     2
b     6
c    10
Name: Y, dtype: int32
Rows a and c
   W  X   Y   Z
a  0  1   2   3
c  8  9  10  11
Columns W and Z
   W   Z
a  0   3
b  4   7
c  8  11
Several rows and non-adjacent columns
   W  Z
a  0  3
b  4  7
Selecting several rows with a slice
   W   Z
a  0   3
b  4   7
c  8  11
Second row
W    4
X    5
Y    6
Z    7
Name: b, dtype: int32
Third column
a     2
b     6
c    10
Name: Y, dtype: int32
Several columns
    Y  X
a   2  1
b   6  5
c  10  9
    Y  X
a   2  1
c  10  9
Consecutive rows
   W  X
b  4  5
c  8  9
Changing data by assignment
    W   X   Y   Z
a   0   1   2   3
b  30  30   6   7
c  30  30  10  11
Assigning nan
     W    X   Y   Z
a  0.0  1.0   2   3
b  NaN  NaN   6   7
c  NaN  NaN  10  11

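Boolean indexing in pandas

Suppose one column holds dog names and another holds how many times each name is used; select the dog names that are used more than 800 times
df=pd.read_csv("file_path.csv")

df[df["count_column"]>800]

Suppose you want the dog names that are used more than 700 times and are longer than 4 characters. How would you write that?

df[(df["name_column"].str.len()>4) &  (df["count_column"]>700)]

Each condition must be wrapped in parentheses

& and
| or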
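
A runnable sketch of both filters on a small hand-made table. The column names `Row_Labels` and `Count_AnimalName` and the values are invented for the example; any DataFrame with a name column and a count column works the same way.

```python
import pandas as pd

df = pd.DataFrame({
    "Row_Labels": ["BELLA", "MAX", "CHARLIE", "LOLA", "ROCKY"],
    "Count_AnimalName": [1195, 1153, 856, 795, 823],
})

# one condition: names used more than 800 times
popular = df[df["Count_AnimalName"] > 800]

# two conditions joined with &; each one needs its own parentheses
long_and_popular = df[(df["Row_Labels"].str.len() > 4) & (df["Count_AnimalName"] > 700)]

print(popular["Row_Labels"].tolist())            # ['BELLA', 'MAX', 'CHARLIE', 'ROCKY']
print(long_and_popular["Row_Labels"].tolist())   # ['BELLA', 'CHARLIE', 'ROCKY']
```

The parentheses are required because `&` binds more tightly than the comparison operators in Python.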

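String methods in pandas

Method	Description
cat	Element-wise string concatenation, with an optional separator
contains	Returns a boolean array showing whether each string contains the given pattern
count	Number of occurrences of the pattern
endswith, startswith	Equivalent to calling x.endswith(pattern) or x.startswith(pattern) on each element
findall	Returns a list of all matches of the pattern for each string
get	Gets the i-th character of each element
join	Joins the strings in each element of the Series with the given separator
len	Computes the length of each string
lower, upper	Converts case; equivalent to calling x.lower() or x.upper() on each element
match	Runs re.match on each element with the given regular expression
pad	Adds whitespace to the left, right, or both sides of the strings
center	Equivalent to pad(side='both')
repeat	Repeats values. e.g. s.str.repeat(3) is equivalent to x*3 for each string
replace	Replaces occurrences of the pattern with the given string
slice	Takes a substring of each string in the Series
split	Splits strings on a separator or regular expression. e.g. `df["column name"].str.split("/").tolist()`
strip, rstrip, lstrip	Removes whitespace, including newlines; equivalent to calling x.strip(), x.rstrip(), x.lstrip() on each element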
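
All of these methods are reached through the `.str` accessor of a Series. A short sketch with a made-up Series of genre strings:

```python
import pandas as pd

s = pd.Series(["Action/Drama", "comedy", " Sci-Fi/Thriller "])

cleaned = s.str.strip().str.lower()       # chain methods through .str
has_slash = s.str.contains("/")           # boolean Series
parts = s.str.strip().str.split("/").tolist()
lengths = s.str.len()                     # whitespace counts toward the length

print(cleaned.tolist())     # ['action/drama', 'comedy', 'sci-fi/thriller']
print(has_slash.tolist())   # [True, False, True]
print(parts)                # [['Action', 'Drama'], ['comedy'], ['Sci-Fi', 'Thriller']]
print(lengths.tolist())     # [12, 6, 17]
```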

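Handling missing data

There are usually two kinds of missing data

One is an empty value such as None, which pandas represents as NaN (the same as np.nan)
The other is a value that has been recorded as 0

Handling NaN in pandas is very easy

Check whether data is NaN	pd.isnull(t) returns True where a value is NaN; pd.notnull(t) returns True where it is not
How to handle it	Drop the rows or columns that contain NaN: `dropna(axis=0,how='any',inplace=False)`
	Fill in values: `t.fillna(t.mean())`, `t.fillna(t.median())`, `t.fillna(0)`
Handling data recorded as 0	t[t==0]=np.nan
	Not every 0 needs to be handled
	When computing a mean and similar statistics, nan is excluded from the calculation but 0 is included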
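
The sketch below walks through these operations on a small made-up DataFrame, and ends with the point about means: a NaN is skipped, whereas a 0 pulls the mean down.

```python
import numpy as np
import pandas as pd

t = pd.DataFrame({"W": [0.0, np.nan, 8.0],
                  "X": [1.0, 5.0, np.nan],
                  "Y": [2, 6, 10]})

n_missing = int(pd.isnull(t).sum().sum())   # count of NaN cells
dropped = t.dropna(axis=0, how="any")       # keep only rows without any NaN
filled = t.fillna(t.mean())                 # fill each column with its own mean

print(n_missing)   # 2
print(dropped)
print(filled)

# NaN is ignored by mean(), 0 is not
mean_with_zero = pd.Series([0, 4, 8]).mean()       # 4.0
mean_with_nan = pd.Series([np.nan, 4, 8]).mean()   # 6.0
print(mean_with_zero, mean_with_nan)
```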

數(shù)據(jù)的合并和分組聚合

python數(shù)據(jù)分析學(xué)習(xí)筆記之matplotlib、numpy、pandas

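Merging data with join

By default, join combines the rows of two DataFrames that share the same row index. The calling DataFrame's index decides which rows are kept, so df1.join(df2) and df2.join(df1) can give different results.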

Code demo:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 4)), index=['A', 'B'], columns=list("abcd"))
print('df1')
print(df1)
df2 = pd.DataFrame(np.zeros((3, 3)), index=['A', 'B', 'C'], columns=list('xyz'))
print('df2')
print(df2)
print('df1 join df2')
print(df1.join(df2))
print('df2 join df1')
print(df2.join(df1))

Output:

df1
     a    b    c    d
A  1.0  1.0  1.0  1.0
B  1.0  1.0  1.0  1.0
df2
     x    y    z
A  0.0  0.0  0.0
B  0.0  0.0  0.0
C  0.0  0.0  0.0
df1 join df2
     a    b    c    d    x    y    z
A  1.0  1.0  1.0  1.0  0.0  0.0  0.0
B  1.0  1.0  1.0  1.0  0.0  0.0  0.0
df2 join df1
     x    y    z    a    b    c    d
A  0.0  0.0  0.0  1.0  1.0  1.0  1.0
B  0.0  0.0  0.0  1.0  1.0  1.0  1.0
C  0.0  0.0  0.0  NaN  NaN  NaN  NaN

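Merging data with merge

merge combines data on the specified column(s) using the chosen join method
The default method is inner: the intersection of the keys
merge with how="outer": the union of the keys, with NaN filling the gaps
merge with how="left": every key from the left DataFrame is kept, with NaN filling the gaps
merge with how="right": every key from the right DataFrame is kept, with NaN filling the gaps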
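
The four join methods can be compared on two small tables. This is a minimal sketch; the DataFrames and the column names `key`, `left_val`, and `right_val` are invented for illustration.

```python
import pandas as pd

# Hypothetical tables that share the keys "b" and "c".
left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

inner = left.merge(right, on="key")                    # default how="inner"
outer = left.merge(right, on="key", how="outer")       # union of keys
left_join = left.merge(right, on="key", how="left")    # all keys from left
right_join = left.merge(right, on="key", how="right")  # all keys from right

print(inner)
print(outer)
print(left_join)
print(right_join)
```

In the left join, the row for key "a" has NaN in `right_val` because that key does not exist in the right table.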

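Grouping and aggregation

Grouping operations in pandas look like this:

# grouped is a DataFrameGroupBy object and is iterable
# each element of grouped is a tuple
# the tuple holds (group key (the value grouped on), the DataFrame for that group)
grouped = df.groupby(by="columns_name")
grouped.count()
grouped["columns_name"].count()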

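DataFrameGroupBy objects have many optimized methods:

Function	Description
count	Number of non-NA values in the group
sum	Sum of the non-NA values
mean	Mean of the non-NA values
median	Arithmetic median of the non-NA values
std, var	Unbiased (denominator n-1) standard deviation and variance
min, max	Minimum and maximum of the non-NA values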
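
As a rough illustration of iterating over a grouped object and calling the aggregation methods in the table, here is a sketch on invented data; the column names `country` and `stores` are assumptions.

```python
import pandas as pd

# Hypothetical data; one "stores" value is missing on purpose.
df = pd.DataFrame({
    "country": ["CN", "CN", "US", "US", "US"],
    "stores": [10, 20, 5, None, 15],
})

grouped = df.groupby(by="country")

# Iterating yields (group key, DataFrame for that group) tuples.
group_sizes = {key: len(sub_df) for key, sub_df in grouped}

# The aggregation methods skip NA values.
counts = grouped["stores"].count()
sums = grouped["stores"].sum()
means = grouped["stores"].mean()

print(group_sizes)
print(counts)
print(means)
```

The US group has three rows but a count of 2, because count only looks at non-NA values.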
Suppose you want to group by two columns, country and state/province, and compute statistics for each group:

grouped = df.groupby(by=[df["country"], df["state/province"]])

Suppose you only want part of the data after grouping, or only want to group certain columns:

# select part of the data after grouping
df.groupby(by=["country", "state/province"])["country"].count()
# group only certain columns
df["country"].groupby(by=[df["country"], df["state/province"]]).count()
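
The two forms above can be compared on a small table. This sketch uses invented data; the `brand` column is an assumption added so that the counted column differs from the grouping keys.

```python
import pandas as pd

# Hypothetical data with two grouping columns and one value column.
df = pd.DataFrame({
    "country": ["CN", "CN", "CN", "US", "US"],
    "state/province": ["Beijing", "Beijing", "Shanghai", "CA", "NY"],
    "brand": ["A", "B", "C", "D", "E"],
})

# Group the whole DataFrame by column names, then pick one column.
by_names = df.groupby(by=["country", "state/province"])["brand"].count()

# Pick the column first, then group it by the key Series.
by_series = df["brand"].groupby(by=[df["country"], df["state/province"]]).count()

# Double brackets keep the result as a DataFrame instead of a Series.
as_frame = df.groupby(by=["country", "state/province"])[["brand"]].count()

print(by_names)
print(as_frame)
```

Both Series results carry a two-level index made of country and state/province.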

The material above comes from Bilibili.
