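Abstract
Emtronix (英創) embedded boards such as the ESM7000 and ESM8000 series can all be configured with a standard PCIe x1 high-speed interface. Typical uses of this interface include attaching an NVMe module for fast, high-capacity data storage and attaching a multi-channel high-speed network interface module. For high-speed data acquisition in industrial control, a PCIe endpoint (EP) can also be implemented with the PCIe IP core of an FPGA, which together with an Emtronix embedded board forms an efficient, low-cost solution. This article briefly describes the hardware configuration of that solution and the implementation of the PCIe driver on Linux.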
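Hardware design notes
Xilinx offers several PCIe endpoint IP cores for its FPGAs. For the application discussed here we chose the DMA/Bridge Subsystem for PCI Express v4.1 (PCIE/XDMA for short). In hardware, PCIE/XDMA converts the PCIe interface into a high-speed parallel AXI-Stream interface (AXIS for short), so the industrial front-end logic only has to present the acquired data to the AXIS channel in AXI-Stream format. The IP core uses the DMA mechanism of the PCIe bus to move the AXIS channel data, block by block, directly into Linux memory, where a Linux application can process the acquired data directly. The low-cost Xilinx Artix-7 devices XC7A35T and XC7A50T both have room for the PCIE/XDMA IP core, which keeps the cost of the solution within a reasonable range.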

[Figure 1: block design containing the xdma_0 and Dtaker1_5_0 instances]
The instance xdma_0 in Figure 1 is the Xilinx PCIE/XDMA IP module, which acts as the PCIe endpoint device. Dtaker1_5_0 is the application-specific front-end logic. The main PCIe settings are shown in the figure below:

[Figure 2: main PCIe configuration of the IP core]
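With this configuration the AXIS bus has a 64-bit data width and a bus clock (ACLK) of 62.5 MHz.
Figure 3 shows the typical handshake timing of the AXIS bus. One data transfer cycle takes at least 3 ACLK periods, and the data is latched on the rising edge of T3: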

[Figure 3: typical AXIS handshake timing]
If the front-end logic produces one 64-bit data word (8 bytes) every 4 ACLK periods, the resulting data rate is 125 MB/s.
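The 125 MB/s figure follows from the bus width, the clock, and the number of clock periods per word. The short sketch below only reproduces that arithmetic; the function name `axis_bytes_per_second` is made up for illustration and is not part of any Xilinx or Linux API.

```c
#include <stdint.h>

/*
 * Sustained AXI-Stream payload rate in bytes per second.
 *   bus_bits        - AXIS data width in bits (64 in this article)
 *   aclk_hz         - bus clock in Hz (62.5 MHz in this article)
 *   cycles_per_beat - ACLK periods the front end needs per transferred word
 * Returns 0 when cycles_per_beat is 0 to avoid a division by zero.
 */
uint64_t axis_bytes_per_second(uint32_t bus_bits, uint64_t aclk_hz,
                               uint32_t cycles_per_beat)
{
    if (cycles_per_beat == 0)
        return 0;
    return (uint64_t)(bus_bits / 8) * aclk_hz / cycles_per_beat;
}
```

Calling `axis_bytes_per_second(64, 62500000, 4)` gives 125000000, i.e. 8 bytes x 62.5 MHz / 4. With the 3-ACLK minimum cycle mentioned above, the same formula gives an upper bound of roughly 166 MB/s.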

[Figure 4]
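The PCIE/XDMA IP on the XC7A50T supports up to 4 DMA channels: 2 transmit channels (H2C) and 2 receive channels (C2H). Together with the interrupt of the user front-end logic, this makes at least 5 interrupt sources. The PCIe MSI mechanism is the best way to handle multiple interrupt sources, so 8 interrupt vectors are configured, of which 5 are actually used.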
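One likely reason for configuring 8 vectors when only 5 are used is that plain MSI lets a function request vectors only in power-of-two counts (1, 2, 4, 8, 16 or 32). The sketch below shows that rounding; `msi_vectors_needed` is a hypothetical helper written for this note, not a kernel function.

```c
/*
 * Smallest MSI vector count (1, 2, 4, 8, 16 or 32) that covers `sources`
 * interrupt sources. Returns 0 when more than 32 sources are requested,
 * which plain MSI cannot provide.
 */
unsigned int msi_vectors_needed(unsigned int sources)
{
    unsigned int n = 1;

    if (sources > 32)
        return 0;
    while (n < sources)
        n <<= 1;    /* MSI vector counts are powers of two */
    return n;
}
```

For the 5 interrupt sources in this design, `msi_vectors_needed(5)` returns 8, matching the configuration described above.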
DMA Engine driver
At present Xilinx supplies only an x86-oriented driver for its DMA/Bridge Subsystem for PCI Express v4.1 IP core and has done no work on the Linux DMA Engine framework. The DMA Engine framework, however, has become the de facto standard for DMA on ARM embedded Linux platforms. This solution therefore starts by building a standard DMA Engine driver stack, consisting of a generic DMA Controller driver and an application-oriented DMA Client driver. The application operates the DMA Client driver through a standard character device node to perform the required data acquisition. Figure 5 shows the overall functional block diagram from the software developer's point of view.

[Figure 5: overall functional block diagram from the software point of view]
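The DMA Engine framework provides different API functions for different DMA modes. The two most important modes are single-shot DMA and cyclic DMA, with the following API functions: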
struct dma_async_tx_descriptor *dmaengine_prep_slave_sg(
        struct dma_chan *chan, struct scatterlist *sgl,
        unsigned int sg_len, enum dma_transfer_direction dir,
        unsigned long flags);
struct dma_async_tx_descriptor *dmaengine_prep_dma_cyclic(
        struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
        size_t period_len, enum dma_transfer_direction dir,
        unsigned long flags);
The DMA Controller driver requires the DMA to support scatter-gather, that is, non-contiguous data buffers. In this solution, however, a single buffer is by far the most common case for single-shot DMA, and for that case DMA Engine offers a simplified function:
struct dma_async_tx_descriptor *dmaengine_prep_slave_single(
        struct dma_chan *chan, dma_addr_t buf, size_t len,
        enum dma_transfer_direction dir, unsigned long flags);
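In cyclic DMA mode, several DMA buffers are linked into a ring through their table of DMA descriptors. When the DMA transfer into one buffer finishes, the interrupt thread of the driver automatically starts the DMA for the buffer of the next descriptor. The logical flow described by the DMA descriptor table is shown in the figure below: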

[Figure: logical flow described by the DMA descriptor table]
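The ring structure can be modelled in a few lines of plain C. This is a simplified software model under stated assumptions, not the descriptor layout used by the XDMA hardware or by the driver in this article; `struct ring_desc` and `ring_build` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one descriptor (not the XDMA layout). */
struct ring_desc {
    uint64_t buf_addr;       /* start address of this period's buffer */
    uint32_t len;            /* number of bytes in this period */
    struct ring_desc *next;  /* descriptor started when this one completes */
};

/*
 * Split [base, base + total_len) into periods of period_len bytes and link
 * the descriptors into a ring. Returns the number of descriptors linked,
 * or 0 if the arguments do not describe a whole number of periods that
 * fits into the descs array.
 */
size_t ring_build(struct ring_desc *descs, size_t max_descs,
                  uint64_t base, uint32_t total_len, uint32_t period_len)
{
    size_t n, i;

    if (period_len == 0 || total_len % period_len != 0)
        return 0;
    n = total_len / period_len;
    if (n == 0 || n > max_descs)
        return 0;

    for (i = 0; i < n; i++) {
        descs[i].buf_addr = base + (uint64_t)i * period_len;
        descs[i].len = period_len;
        descs[i].next = &descs[(i + 1) % n];  /* last one points to first */
    }
    return n;
}
```

With `total_len` equal to twice `period_len`, the ring has exactly two descriptors that point at each other, which is the ping-pong arrangement used later by the DMA Client.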
The DMA Controller driver in this solution implements both transfer modes, single-shot DMA and cyclic DMA. The DMA Controller driver is in essence a generic DMA server. How its transfer functions are used to carry out a concrete data transfer task is decided by the DMA Client. Linux separates the DMA service from the specific application, which lets the same DMA Controller driver serve different application scenarios.
DMA Client driver
The DMA Client driver is application-oriented. As Figure 5 shows, it works together with the upper-layer application in user space to acquire and process the data.
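On the user-space side, an application would typically open the character device node and read whole blocks from it. The sketch below assumes the DMA Client node supports plain `read()`; the article does not specify the node name or its interface, so both are assumptions, and `read_block` is a hypothetical helper.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly len bytes from fd, retrying on short reads and on EINTR.
 * Returns the number of bytes read (less than len only at end of file),
 * or -1 on error with errno set by read().
 */
ssize_t read_block(int fd, void *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, try again */
            return -1;
        }
        if (n == 0)
            break;          /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

An application would call `open()` on the device node created by the DMA Client driver and then call `read_block(fd, buf, total_len)` once per acquisition.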
A single-shot DMA operation looks like this:
/* prepare a single buffer dma */
desc = dmaengine_prep_slave_single(dchan, edev->dma_phys,
                                   edev->total_len, DMA_DEV_TO_MEM,
                                   DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
if (!desc) {
    dev_err(edev->dev, "dmaengine_prep_slave_single(..) failed\n");
    ret = -ENODEV;
    goto error_out;
}

/* setup dtaker hardware */
eta750_dtaker_setup(edev);

/* put callback, and submit dma */
desc->callback = dma_callback;
desc->callback_param = edev;
edev->cookie = dmaengine_submit(desc);
ret = dma_submit_error(edev->cookie);
if (ret) {
    dev_err(edev->dev, "DMA submit failed %d\n", ret);
    goto error_submit;
}

/* init complete, and fire */
reinit_completion(&edev->xdma_chan_complete);
dma_async_issue_pending(dchan);

/* simulate input data */
eta750_dtaker_run(edev);

/* wait dma complete */
count = wait_for_completion_timeout(&edev->xdma_chan_complete,
                                    msecs_to_jiffies(DMA_TIMEOUT));
if (count == 0) {
    dev_err(edev->dev, "wait_for_completion_timeout timeout\n");
    ret = -ETIMEDOUT;
    eta750_dtaker_end(edev);
    goto error_submit;
}

/* error processing */
eta750_dtaker_error_pro(edev);

/* stop front-end daq unit */
count = eta750_dtaker_end(edev);

/* dump data */
eta750_dtaker_dump_data(edev);

return edev->total_len;

error_submit:
    dmaengine_terminate_all(dchan);
error_out:
    return ret;
Continuous data acquisition is possible only in cyclic DMA mode. The DMA Client implements it with a ping-pong structure of two DMA buffers: while the application processes the data in buffer #0, the DMA transfers data into buffer #1. When that transfer ends the roles are swapped, so the application processes buffer #1 while the DMA writes new data into buffer #0. Cyclic DMA needs the length of each buffer, period_len, and the total length of the ping-pong buffer formed by the two buffers, total_len. The DMA flow is as follows:
/* prepare cyclic buffer dma */
desc = dmaengine_prep_dma_cyclic(dchan, edev->dma_phys, edev->total_len,
                                 edev->period_len, DMA_DEV_TO_MEM,
                                 DMA_PREP_INTERRUPT);
if (!desc) {
    dev_err(edev->dev, "%s: prep dma cyclic failed!\n", __func__);
    ret = -EINVAL;
    goto error_out;
}

/* in cyclic mode */
edev->cyclic = true;

/* setup dtaker hardware */
eta750_dtaker_setup(edev);

/* put callback, and submit dma */
desc->callback = dma_callback;
desc->callback_param = edev;
edev->cookie = dmaengine_submit(desc);
ret = dma_submit_error(edev->cookie);
if (ret) {
    dev_err(edev->dev, "cyclic dma submit failed %d\n", ret);
    goto error_submit;
}

/* init complete, and fire */
reinit_completion(&edev->xdma_chan_complete);
dma_async_issue_pending(dchan);
edev->running = true;

/* simulate input data */
eta750_dtaker_run(edev);
edev->data_seed += ((edev->period_len / sizeof(u16)) * edev->data_incr);

while (!kthread_should_stop()) {
    /* wait dma complete */
    count = wait_for_completion_timeout(&edev->xdma_chan_complete,
                                        msecs_to_jiffies(DMA_TIMEOUT));
    if (count == 0) {
        dev_err(edev->dev, "wait_for_completion timeout, transfer %d\n",
                edev->transfer_count);
        ret = -ETIMEDOUT;
        break;
    }

    /* data processing */
    eta750_dtaker_error_pro(edev);
    edev->transfer_count++;
    reinit_completion(&edev->xdma_chan_complete);

    /* fill more data */
    eta750_dtaker_run(edev);
    edev->data_seed += ((edev->period_len / sizeof(u16)) * edev->data_incr);
}

/* stop front-end daq unit */
count = eta750_dtaker_end(edev);
edev->running = false;

error_submit:
    dmaengine_terminate_all(dchan);
    edev->cyclic = false;
    dev_info(edev->dev, "%s: dma stopped, cyclic %d, running %d\n",
             __func__, edev->cyclic, edev->running);
error_out:
    return ret;
As the code above shows, the transfer runs as a loop that does not end on its own. The DMA Controller driver switches between the ping-pong buffers automatically and uses the callback function to signal, for the benefit of the upper-layer application, that new data is ready. The application can stop the acquisition by issuing a command.
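Which half of the ping-pong buffer holds fresh data can be derived from the number of periods completed so far. The sketch below is an illustration under the assumption that the two halves sit back to back in one buffer of `2 * period_len` bytes and that periods are counted from 0; the function names are hypothetical and do not come from the driver.

```c
#include <stddef.h>

/*
 * Byte offset of the half that the DMA has just finished filling.
 * period_index is the 0-based index of the period that just completed,
 * so index 0 means buffer #0 is ready, index 1 means buffer #1, and so on.
 */
size_t pingpong_ready_offset(unsigned long period_index, size_t period_len)
{
    return (size_t)(period_index % 2) * period_len;
}

/* Byte offset of the half the DMA is writing into in the meantime. */
size_t pingpong_filling_offset(unsigned long period_index, size_t period_len)
{
    return (size_t)((period_index + 1) % 2) * period_len;
}
```

In the cyclic loop above, `edev->transfer_count` before its increment would play the role of `period_index`, so the data to process starts at `pingpong_ready_offset(edev->transfer_count, edev->period_len)` from the start of the DMA buffer.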