01 Pandas at a glance
Python Data Analysis Technology Stack, Chapter 6, Preparing Data with Pandas: 01 Pandas at a glance
Pandas overview
Wes McKinney began developing the Pandas library in 2008. The name comes from the term “panel data”, used in econometrics for datasets observed over time. Pandas has many features, listed below, that make it a popular tool for data wrangling and analysis.
Indexing: Pandas lets you label rows and columns with an index, which makes selecting data by label straightforward and speeds up retrieval.
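As a minimal sketch of label-based retrieval, the snippet below builds a small Series and DataFrame and looks values up by label with `.loc`. The country names and counts are made-up illustration values, not taken from the dataset used later.

```python
import pandas as pd

# Hypothetical case counts, labeled by country name instead of by position.
cases = pd.Series([120, 45, 300], index=["India", "Japan", "Brazil"])

# Label-based lookup on a Series.
japan_cases = cases.loc["Japan"]

# A DataFrame carries labels on both rows (index) and columns.
df = pd.DataFrame(
    {"cases": [120, 45, 300], "deaths": [3, 1, 9]},
    index=["India", "Japan", "Brazil"],
)

# Select a single cell by row label and column label.
brazil_deaths = df.loc["Brazil", "deaths"]

print(japan_cases, brazil_deaths)
```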
Input and output support: Pandas can read data from file formats such as JSON (JavaScript Object Notation), CSV (Comma-Separated Values), Excel, and HDF5 (Hierarchical Data Format Version 5). It can also be used to write data to databases, web services, and so on.
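The following sketch illustrates reading and writing with two of these formats. To stay self-contained it round-trips a small, made-up DataFrame through in-memory CSV and JSON text rather than files on disk. The same `read_csv` and `read_json` functions also accept file paths.

```python
import io

import pandas as pd

# Hypothetical data used only for this illustration.
df = pd.DataFrame({"country": ["India", "Japan"], "cases": [120, 45]})

# Write to CSV text, then read it back.
csv_text = df.to_csv(index=False)
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# Write to JSON text (one object per row), then read it back.
json_text = df.to_json(orient="records")
df_from_json = pd.read_json(io.StringIO(json_text))

print(df_from_csv)
print(df_from_json)
```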
Combining data: The data needed for an analysis rarely comes from a single source, so we often have to combine datasets to consolidate what we need. Pandas provides tailor-made functions for this, such as merge, join, and concat.
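A short sketch of two common ways to combine data, using made-up tables: `pd.merge` joins two DataFrames on a shared key column, and `pd.concat` stacks rows from one DataFrame under another. The column names here are hypothetical.

```python
import pandas as pd

# Two hypothetical sources that share a "country" key.
cases = pd.DataFrame(
    {"country": ["India", "Japan", "Brazil"], "cases": [120, 45, 300]}
)
population = pd.DataFrame(
    {"country": ["India", "Japan"], "population_millions": [1380, 126]}
)

# Left join: keep every row of `cases`; countries without a match get NaN.
merged = pd.merge(cases, population, on="country", how="left")

# Append rows from another source with the same columns.
more_cases = pd.DataFrame({"country": ["Kenya"], "cases": [15]})
stacked = pd.concat([cases, more_cases], ignore_index=True)

print(merged)
print(stacked)
```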
Speed and enhanced performance: Pandas is built on top of NumPy, and many of its performance-critical routines are written in Cython, which combines the convenience and ease of use of Python with the speed of the C language. This helps to optimize performance and reduce overhead.
Data visualization: Viewing data visually is crucial for deriving insights from it and presenting it to an audience. Pandas provides built-in plotting tools that use Matplotlib as the base library.
Support for other libraries: Pandas integrates smoothly with other libraries such as NumPy, Matplotlib, SciPy, and Scikit-learn. We can therefore perform tasks like numerical computation, visualization, statistical analysis, and machine learning in conjunction with data manipulation.
Grouping: Pandas supports the split-apply-combine methodology, whereby we split the data into groups, apply a function to each group, and combine the results.
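Split-apply-combine can be sketched with `groupby` as follows. The data is made up for illustration: rows are split by continent, an aggregation is applied to each group, and the results are combined into one object.

```python
import pandas as pd

# Hypothetical data: two countries per continent.
df = pd.DataFrame(
    {
        "continent": ["Asia", "Asia", "America", "America"],
        "country": ["India", "Japan", "Brazil", "Chile"],
        "cases": [120, 45, 300, 35],
    }
)

# Split by continent, apply sum to each group, combine into a Series.
total_by_continent = df.groupby("continent")["cases"].sum()

# Several aggregations at once produce one column per function.
summary = df.groupby("continent")["cases"].agg(["sum", "mean", "max"])

print(total_by_continent)
print(summary)
```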
Handling missing data, duplicates, and filler characters: Data often contains missing values, duplicates, blank spaces, special characters (such as $ and &), and so on that may need to be removed or replaced. With the functions provided in Pandas, you can handle such anomalies with ease.
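One possible cleaning sequence is sketched below on a deliberately messy, made-up table. The column names and the choice to fill missing values with 0 are assumptions for illustration; the right treatment depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with stray spaces, a duplicate row,
# currency symbols, thousands separators, and a missing value.
raw = pd.DataFrame(
    {
        "country": [" India", "Japan ", "Japan ", "Brazil"],
        "revenue": ["$1,200", "$450", "$450", np.nan],
    }
)

clean = raw.copy()

# Remove leading and trailing blank spaces.
clean["country"] = clean["country"].str.strip()

# Drop rows that are exact duplicates of an earlier row.
clean = clean.drop_duplicates()

# Strip the special characters, then convert the text to numbers.
clean["revenue"] = (
    clean["revenue"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)

# Replace the remaining missing value (0.0 is an arbitrary choice here).
clean["revenue"] = clean["revenue"].fillna(0.0)

print(clean)
```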
Mathematical operations: Many numerical operations and computations can be performed in Pandas, which uses NumPy at the back end for this purpose.
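A small sketch of column-wise arithmetic, with made-up numbers: operations apply to whole columns at once, and NumPy functions such as `np.log10` can be called directly on a Pandas column.

```python
import numpy as np
import pandas as pd

# Hypothetical counts used only for this illustration.
df = pd.DataFrame({"cases": [100, 50, 400], "deaths": [2, 1, 10]})

# Element-wise arithmetic between columns.
df["fatality_pct"] = df["deaths"] / df["cases"] * 100

# A NumPy function applied to a whole column.
df["log_cases"] = np.log10(df["cases"])

# A built-in aggregate.
mean_cases = df["cases"].mean()

print(df)
print(mean_cases)
```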
Environment setup
If you have not already installed Pandas, open the Anaconda Prompt and enter the following command.
pip install pandas
Once the Pandas library is installed, you need to import it before using its functions. In your Jupyter notebook, type the following to import the library.
import pandas as pd
Here, pd is the standard shorthand name, or alias, for Pandas.
Some of the examples also use functions from the NumPy library. Ensure that both the Pandas and NumPy libraries are installed and imported.
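A quick way to confirm that both libraries are available is to import them together and print their versions, as in this sketch. The alias np for NumPy is the usual convention, and the version numbers printed will depend on your installation.

```python
import numpy as np
import pandas as pd

# Print the installed versions to confirm both imports succeeded.
print("pandas", pd.__version__)
print("numpy", np.__version__)

# A tiny check that the two libraries work together.
s = pd.Series(np.arange(3))
print(s)
```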
You need to download a dataset, “subset-covid-data.csv”, that contains the number of cases and deaths related to the COVID-19 pandemic for various countries on a particular date. Download the dataset from the following link: https://github.com/DataRepo2019/Data-files/blob/master/subset-covid-data.csv
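Once the file is downloaded, it can be loaded along the lines of the sketch below. The helper name `load_covid_data` is hypothetical, and the code assumes the CSV was saved to the current working directory; adjust the path if you saved it elsewhere. Note that the link above opens a GitHub page rather than the raw file, so download the file first instead of passing that link to `read_csv`.

```python
from pathlib import Path

import pandas as pd


def load_covid_data(path="subset-covid-data.csv"):
    """Read the downloaded CSV file into a DataFrame (hypothetical helper)."""
    return pd.read_csv(path)


# Assumed location: the current working directory.
csv_path = Path("subset-covid-data.csv")

if csv_path.exists():
    covid = load_covid_data(csv_path)
    print(covid.shape)   # number of rows and columns
    print(covid.head())  # first five rows
else:
    print(f"{csv_path} not found; download it first.")
```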