
A Deep Dive into the ARM IOMMU (SMMU) in Linux, Part I


An Introduction to the SMMU on Linux Systems

In a computer system architecture, the IOMMU (Input/Output Memory Management Unit) plays a role similar to that of the MMU, which manages CPU accesses to memory: before DMA requests from system I/O devices are passed on to the system interconnect, the IOMMU translates the requested addresses and manages and restricts the memory access transactions of those devices. The IOMMU maps device-visible virtual addresses (IOVAs) to physical memory addresses. Different hardware architectures have different IOMMU implementations; on the ARM platform, the IOMMU is the SMMU (System Memory Management Unit).

The SMMU provides translation services only for memory access transactions originating from system I/O devices, not for transactions directed to system I/O devices. Transactions from the system or CPU to system I/O devices are managed by other means, such as the MMU. The figure below shows the SMMU's role in the system.

Memory access transactions from system I/O devices are the reads and writes of memory issued by the devices themselves; transactions to system I/O devices usually refer to CPU accesses to device-internal memory or registers mapped into the physical address space. For a more detailed introduction to the SMMU, see the article "IOMMU和Arm SMMU介紹" and the SMMU Software Guide. For a detailed description of the SMMU's registers, data structures, and behavior, see the Arm System Memory Management Unit Architecture Specification version 3. For concrete implementations, see the documentation of the implementation in question, such as the Arm CoreLink MMU-600 System Memory Management Unit Technical Reference Manual and the Arm CoreLink MMU-700 System Memory Management Unit Technical Reference Manual.

The SMMU distinguishes different system I/O devices by StreamID and related identifiers; when a system I/O device accesses memory through the SMMU, it must carry its StreamID and related information to the SMMU. From the point of view of a system I/O device, a more detailed structure of a system containing an SMMU is shown in the figure below:

System I/O devices access memory via DMA. After a DMA request is issued, and before it reaches the SMMU and the system interconnect, it first passes through a device called the DAA (other implementations may use a different device). The DAA performs a first address translation, then forwards the memory access request information, including the configured StreamID, to the SMMU for further processing.

In a Linux system, enabling the SMMU for a system I/O device generally involves the following steps:

  1. Initialization of the SMMU driver. This mainly includes reading the SMMU device node from the dts file, probing the SMMU's hardware features, initializing global resources and data structures such as the command queue, event queue, interrupts, and stream table, and registering the SMMU device with the Linux kernel's IOMMU subsystem.
  2. Binding the device to the IOMMU while the system I/O device is probed, discovered, and bound to its driver for initialization. For devices that access memory via DMA, this is usually done by calling of_dma_configure()/of_dma_configure_id(). This process reads the IOMMU-related fields of the device node definition in the device tree dts file, for example in arch/arm64/boot/dts/renesas/r8a77961.dtsi:
			iommus = <&ipmmu_vc0 19>;
  3. IOMMU-related configuration in the system I/O device driver. This part usually varies with the concrete hardware implementation. It mainly includes calling dma_coerce_mask_and_coherent()/dma_set_mask_and_coherent() to set the DMA mask and the coherent DMA mask to the same value, and configuring devices such as the DAA mentioned above.
  4. Memory allocation in the system I/O device driver. The driver allocates memory through interfaces such as dma_alloc_coherent(). Besides allocating memory, this process also invokes the SMMU driver's callbacks to build the translation tables and to set up SMMU data structures such as the CD. Different kernel subsystems call different DMA allocation methods, but they ultimately all go through dma_alloc_coherent(); only memory allocated this way is translated by the SMMU when accessed via DMA.
  5. Accessing the allocated memory. The address of memory allocated with dma_alloc_coherent() can be handed to the DMA configuration logic of the system I/O device; subsequent DMA accesses by the device then go through the SMMU for address translation.

The SMMU performs address translation with the help of several data structures: the stream table and its stream table entries (STEs), the context descriptor table and its entries (CDs), and the translation tables and their entries. An STE stores the context information of a stream; each STE is 64 bytes. A CD stores all the settings related to stage 1 translation; each CD is also 64 bytes. The translation tables describe the mapping between virtual addresses and physical memory addresses. A stream table can be organized either as a linear stream table or as a 2-level stream table. The structure of a linear stream table is shown in the figure below:

An example 2-level stream table is shown in the figure below:

A context descriptor table can be organized in one of three ways: a single CD, a single-level CD table, or a 2-level CD table. An example of a single CD is shown in the figure below:

An example single-level CD table is shown in the figure below:

An example 2-level CD table is shown in the figure below:

When translating an address, the SMMU locates the stream table from its stream table base register and uses the StreamID to find the STE. Based on the STE's configuration and the SubstreamID/PASID, it then finds the context descriptor table and the corresponding CD. Finally, it uses the information in the CD to find the translation table, which performs the final address translation.

The source of the Linux kernel IOMMU subsystem lives under drivers/iommu, and the ARM SMMUv3 driver under drivers/iommu/arm/arm-smmu-v3. In the kernel's SMMU driver, the data structures used for address translation are created in the different steps mentioned above:

  • The stream table is created during SMMU driver initialization. If it is a linear stream table, every STE in it is configured to bypass the SMMU, i.e. the corresponding stream undergoes no SMMU address translation; if it is a 2-level stream table, the table is filled with invalid L1 stream table descriptors.
  • The context descriptor table is created when the device is bound to the IOMMU, during the process in which the system I/O device is discovered, probed, and bound to its driver for initialization. With a 2-level stream table, the level-2 stream table is created first, with all of its STEs configured to bypass the SMMU. Creating the context descriptor table likewise depends on whether a 2-level CD table is needed. Once the context descriptor table is created, its address is written into the STE.
  • The translation tables are created while the system I/O device driver allocates memory. During this process, the SMMU driver's callbacks are invoked to put the translation table address into the CD.

SMMU Data Structures in the Linux Kernel

The Linux kernel IOMMU subsystem represents an IOMMU hardware instance with struct iommu_device and describes the operations and capabilities the instance supports with struct iommu_ops. These two structures are defined (in include/linux/iommu.h) as follows:

/**
 * struct iommu_ops - iommu ops and capabilities
 * @capable: check capability
 * @domain_alloc: allocate iommu domain
 * @domain_free: free iommu domain
 * @attach_dev: attach device to an iommu domain
 * @detach_dev: detach device from an iommu domain
 * @map: map a physically contiguous memory region to an iommu domain
 * @unmap: unmap a physically contiguous memory region from an iommu domain
 * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
 * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
 * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
 *            queue
 * @iova_to_phys: translate iova to physical address
 * @probe_device: Add device to iommu driver handling
 * @release_device: Remove device from iommu driver handling
 * @probe_finalize: Do final setup work after the device is added to an IOMMU
 *                  group and attached to the groups domain
 * @device_group: find iommu group for a particular device
 * @domain_get_attr: Query domain attributes
 * @domain_set_attr: Change domain attributes
 * @support_dirty_log: Check whether domain supports dirty log tracking
 * @switch_dirty_log: Perform actions to start|stop dirty log tracking
 * @sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
 * @clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap
 * @get_resv_regions: Request list of reserved regions for a device
 * @put_resv_regions: Free list of reserved regions for a device
 * @apply_resv_region: Temporary helper call-back for iova reserved ranges
 * @domain_window_enable: Configure and enable a particular window for a domain
 * @domain_window_disable: Disable a particular window for a domain
 * @of_xlate: add OF master IDs to iommu grouping
 * @is_attach_deferred: Check if domain attach should be deferred from iommu
 *                      driver init to device driver init (default no)
 * @dev_has/enable/disable_feat: per device entries to check/enable/disable
 *                               iommu specific features.
 * @dev_feat_enabled: check enabled feature
 * @aux_attach/detach_dev: aux-domain specific attach/detach entries.
 * @aux_get_pasid: get the pasid given an aux-domain
 * @sva_bind: Bind process address space to device
 * @sva_unbind: Unbind process address space from device
 * @sva_get_pasid: Get PASID associated to a SVA handle
 * @page_response: handle page request response
 * @cache_invalidate: invalidate translation caches
 * @sva_bind_gpasid: bind guest pasid and mm
 * @sva_unbind_gpasid: unbind guest pasid and mm
 * @def_domain_type: device default domain type, return value:
 *		- IOMMU_DOMAIN_IDENTITY: must use an identity domain
 *		- IOMMU_DOMAIN_DMA: must use a dma domain
 *		- 0: use the default setting
 * @attach_pasid_table: attach a pasid table
 * @detach_pasid_table: detach the pasid table
 * @pgsize_bitmap: bitmap of all possible supported page sizes
 * @owner: Driver module providing these ops
 */
struct iommu_ops {
	bool (*capable)(enum iommu_cap);

	/* Domain allocation and freeing by the iommu driver */
	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
	void (*domain_free)(struct iommu_domain *);

	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
	int (*map)(struct iommu_domain *domain, unsigned long iova,
		   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
		     size_t size, struct iommu_iotlb_gather *iotlb_gather);
	void (*flush_iotlb_all)(struct iommu_domain *domain);
	void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
			       size_t size);
	void (*iotlb_sync)(struct iommu_domain *domain,
			   struct iommu_iotlb_gather *iotlb_gather);
	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
	struct iommu_device *(*probe_device)(struct device *dev);
	void (*release_device)(struct device *dev);
	void (*probe_finalize)(struct device *dev);
	struct iommu_group *(*device_group)(struct device *dev);
	int (*domain_get_attr)(struct iommu_domain *domain,
			       enum iommu_attr attr, void *data);
	int (*domain_set_attr)(struct iommu_domain *domain,
			       enum iommu_attr attr, void *data);

	/*
	 * Track dirty log. Note: Don't concurrently call these interfaces with
	 * other ops that access underlying page table.
	 */
	bool (*support_dirty_log)(struct iommu_domain *domain);
	int (*switch_dirty_log)(struct iommu_domain *domain, bool enable,
				unsigned long iova, size_t size, int prot);
	int (*sync_dirty_log)(struct iommu_domain *domain,
			      unsigned long iova, size_t size,
			      unsigned long *bitmap, unsigned long base_iova,
			      unsigned long bitmap_pgshift);
	int (*clear_dirty_log)(struct iommu_domain *domain,
			       unsigned long iova, size_t size,
			       unsigned long *bitmap, unsigned long base_iova,
			       unsigned long bitmap_pgshift);

	/* Request/Free a list of reserved regions for a device */
	void (*get_resv_regions)(struct device *dev, struct list_head *list);
	void (*put_resv_regions)(struct device *dev, struct list_head *list);
	void (*apply_resv_region)(struct device *dev,
				  struct iommu_domain *domain,
				  struct iommu_resv_region *region);

	/* Window handling functions */
	int (*domain_window_enable)(struct iommu_domain *domain, u32 wnd_nr,
				    phys_addr_t paddr, u64 size, int prot);
	void (*domain_window_disable)(struct iommu_domain *domain, u32 wnd_nr);

	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
	bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev);

	/* Per device IOMMU features */
	bool (*dev_has_feat)(struct device *dev, enum iommu_dev_features f);
	bool (*dev_feat_enabled)(struct device *dev, enum iommu_dev_features f);
	int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f);
	int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);

	/* Aux-domain specific attach/detach entries */
	int (*aux_attach_dev)(struct iommu_domain *domain, struct device *dev);
	void (*aux_detach_dev)(struct iommu_domain *domain, struct device *dev);
	int (*aux_get_pasid)(struct iommu_domain *domain, struct device *dev);

	struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
				      void *drvdata);
	void (*sva_unbind)(struct iommu_sva *handle);
	u32 (*sva_get_pasid)(struct iommu_sva *handle);

	int (*page_response)(struct device *dev,
			     struct iommu_fault_event *evt,
			     struct iommu_page_response *msg);
	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
				struct iommu_cache_invalidate_info *inv_info);
	int (*sva_bind_gpasid)(struct iommu_domain *domain,
			struct device *dev, struct iommu_gpasid_bind_data *data);

	int (*sva_unbind_gpasid)(struct device *dev, u32 pasid);
	int (*attach_pasid_table)(struct iommu_domain *domain,
				  struct iommu_pasid_table_config *cfg);
	void (*detach_pasid_table)(struct iommu_domain *domain);

	int (*def_domain_type)(struct device *dev);
	int (*dev_get_config)(struct device *dev, int type, void *data);
	int (*dev_set_config)(struct device *dev, int type, void *data);

	unsigned long pgsize_bitmap;
	struct module *owner;
};

/**
 * struct iommu_device - IOMMU core representation of one IOMMU hardware
 *			 instance
 * @list: Used by the iommu-core to keep a list of registered iommus
 * @ops: iommu-ops for talking to this iommu
 * @dev: struct device for sysfs handling
 */
struct iommu_device {
	struct list_head list;
	const struct iommu_ops *ops;
	struct fwnode_handle *fwnode;
	struct device *dev;
};

The SMMU driver creates instances of struct iommu_device and struct iommu_ops and registers them with the IOMMU subsystem.

The Linux kernel IOMMU subsystem uses struct dev_iommu to represent the per-device IOMMU data of a system I/O device connected to an IOMMU, and struct iommu_fwspec to describe the IOMMU instance the device is connected to. These structures are defined as follows (struct dev_iommu and struct iommu_fwspec in include/linux/iommu.h; struct fwnode_handle itself comes from include/linux/fwnode.h):

struct fwnode_handle {
	struct fwnode_handle *secondary;
	const struct fwnode_operations *ops;
	struct device *dev;
};
 . . . . . .
/**
 * struct dev_iommu - Collection of per-device IOMMU data
 *
 * @fault_param: IOMMU detected device fault reporting data
 * @iopf_param:	 I/O Page Fault queue and data
 * @fwspec:	 IOMMU fwspec data
 * @iommu_dev:	 IOMMU device this device is linked to
 * @priv:	 IOMMU Driver private data
 *
 * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
 *	struct iommu_group	*iommu_group;
 */
struct dev_iommu {
	struct mutex lock;
	struct iommu_fault_param	*fault_param;
	struct iopf_device_param	*iopf_param;
	struct iommu_fwspec		*fwspec;
	struct iommu_device		*iommu_dev;
	void				*priv;
};
 . . . . . .
/**
 * struct iommu_fwspec - per-device IOMMU instance data
 * @ops: ops for this device's IOMMU
 * @iommu_fwnode: firmware handle for this device's IOMMU
 * @iommu_priv: IOMMU driver private data for this device
 * @num_ids: number of associated device IDs
 * @ids: IDs which this device may present to the IOMMU
 */
struct iommu_fwspec {
	const struct iommu_ops	*ops;
	struct fwnode_handle	*iommu_fwnode;
	u32			flags;
	unsigned int		num_ids;
	u32			ids[];
};

In the IOMMU, each domain represents one IOMMU-mapped address space, i.e. one page table. A group is logically bound to a domain: all devices in a group live in the same domain. In the Linux kernel's IOMMU subsystem, a domain is represented by struct iommu_domain, defined (in include/linux/iommu.h) as follows:

struct iommu_domain {
	unsigned type;
	const struct iommu_ops *ops;
	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
	iommu_fault_handler_t handler;
	void *handler_token;
	struct iommu_domain_geometry geometry;
	void *iova_cookie;
	struct mutex switch_log_lock;
};

The IOMMU subsystem represents a group of devices that share a domain with struct iommu_group, and a single device within a group with struct group_device. These two structures are defined (in drivers/iommu/iommu.c) as follows:

struct iommu_group {
	struct kobject kobj;
	struct kobject *devices_kobj;
	struct list_head devices;
	struct mutex mutex;
	struct blocking_notifier_head notifier;
	void *iommu_data;
	void (*iommu_data_release)(void *iommu_data);
	char *name;
	int id;
	struct iommu_domain *default_domain;
	struct iommu_domain *domain;
	struct list_head entry;
};

struct group_device {
	struct list_head list;
	struct device *dev;
	char *name;
};

From an object-oriented point of view, struct iommu_device and struct iommu_domain have driver-specific implementations in the ARM SMMUv3 driver: struct arm_smmu_device and struct arm_smmu_domain can be seen as inheriting from struct iommu_device and struct iommu_domain. These two structures are defined (in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h) as follows:

/* An SMMUv3 instance */
struct arm_smmu_device {
	struct device			*dev;
	void __iomem			*base;
	void __iomem			*page1;

#define ARM_SMMU_FEAT_2_LVL_STRTAB	(1 << 0)
#define ARM_SMMU_FEAT_2_LVL_CDTAB	(1 << 1)
#define ARM_SMMU_FEAT_TT_LE		(1 << 2)
#define ARM_SMMU_FEAT_TT_BE		(1 << 3)
#define ARM_SMMU_FEAT_PRI		(1 << 4)
#define ARM_SMMU_FEAT_ATS		(1 << 5)
#define ARM_SMMU_FEAT_SEV		(1 << 6)
#define ARM_SMMU_FEAT_MSI		(1 << 7)
#define ARM_SMMU_FEAT_COHERENCY		(1 << 8)
#define ARM_SMMU_FEAT_TRANS_S1		(1 << 9)
#define ARM_SMMU_FEAT_TRANS_S2		(1 << 10)
#define ARM_SMMU_FEAT_STALLS		(1 << 11)
#define ARM_SMMU_FEAT_HYP		(1 << 12)
#define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
#define ARM_SMMU_FEAT_VAX		(1 << 14)
#define ARM_SMMU_FEAT_RANGE_INV		(1 << 15)
#define ARM_SMMU_FEAT_BTM		(1 << 16)
#define ARM_SMMU_FEAT_SVA		(1 << 17)
#define ARM_SMMU_FEAT_E2H		(1 << 18)
#define ARM_SMMU_FEAT_HA		(1 << 19)
#define ARM_SMMU_FEAT_HD		(1 << 20)
#define ARM_SMMU_FEAT_BBML1		(1 << 21)
#define ARM_SMMU_FEAT_BBML2		(1 << 22)
#define ARM_SMMU_FEAT_ECMDQ		(1 << 23)
#define ARM_SMMU_FEAT_MPAM		(1 << 24)
	u32				features;

#define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
#define ARM_SMMU_OPT_PAGE0_REGS_ONLY	(1 << 1)
#define ARM_SMMU_OPT_MSIPOLL		(1 << 2)
	u32				options;

	union {
		u32			nr_ecmdq;
		u32			ecmdq_enabled;
	};
	struct arm_smmu_ecmdq *__percpu	*ecmdq;

	struct arm_smmu_cmdq		cmdq;
	struct arm_smmu_evtq		evtq;
	struct arm_smmu_priq		priq;

	int				gerr_irq;
	int				combined_irq;

	unsigned long			ias; /* IPA */
	unsigned long			oas; /* PA */
	unsigned long			pgsize_bitmap;

#define ARM_SMMU_MAX_ASIDS		(1 << 16)
	unsigned int			asid_bits;

#define ARM_SMMU_MAX_VMIDS		(1 << 16)
	unsigned int			vmid_bits;
	DECLARE_BITMAP(vmid_map, ARM_SMMU_MAX_VMIDS);

	unsigned int			ssid_bits;
	unsigned int			sid_bits;

	struct arm_smmu_strtab_cfg	strtab_cfg;

	/* IOMMU core code handle */
	struct iommu_device		iommu;

	struct rb_root			streams;
	struct mutex			streams_mutex;

	unsigned int			mpam_partid_max;
	unsigned int			mpam_pmg_max;

	bool				bypass;
};
 . . . . . .
struct arm_smmu_domain {
	struct arm_smmu_device		*smmu;
	struct mutex			init_mutex; /* Protects smmu pointer */

	struct io_pgtable_ops		*pgtbl_ops;
	bool				stall_enabled;
	bool				non_strict;
	atomic_t			nr_ats_masters;

	enum arm_smmu_domain_stage	stage;
	union {
		struct arm_smmu_s1_cfg	s1_cfg;
		struct arm_smmu_s2_cfg	s2_cfg;
	};

	struct iommu_domain		domain;

	/* Unused in aux domains */
	struct list_head		devices;
	spinlock_t			devices_lock;

	struct list_head		mmu_notifiers;

	/* Auxiliary domain stuff */
	struct arm_smmu_domain		*parent;
	ioasid_t			ssid;
	unsigned long			aux_nr_devs;
};

The ARM SMMUv3 driver describes the SMMU-private data of a system I/O device connected to the SMMU with struct arm_smmu_master, defined (in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h) as follows:

struct arm_smmu_stream {
	u32				id;
	struct arm_smmu_master		*master;
	struct rb_node			node;
};

/* SMMU private data for each master */
struct arm_smmu_master {
	struct arm_smmu_device		*smmu;
	struct device			*dev;
	struct arm_smmu_domain		*domain;
	struct list_head		domain_head;
	struct arm_smmu_stream		*streams;
	unsigned int			num_streams;
	bool				ats_enabled;
	bool				stall_enabled;
	bool				pri_supported;
	bool				prg_resp_needs_ssid;
	bool				sva_enabled;
	bool				iopf_enabled;
	bool				auxd_enabled;
	struct list_head		bonds;
	unsigned int			ssid_bits;
};

From an object-oriented point of view, struct arm_smmu_master can be seen as extending struct dev_iommu.

The SMMU-related data structures in the Linux kernel are roughly related as follows:

Almost all of the data structures above contain a pointer to a struct device object, and struct device in turn contains pointers to several key IOMMU objects. The struct device object acts as a mediator: the various subsystems involved mostly find the operations or data they need through it. The IOMMU-related fields of struct device are mainly the following:

struct device {
#ifdef CONFIG_DMA_OPS
	const struct dma_map_ops *dma_ops;
#endif
 . . . . . .
#ifdef CONFIG_DMA_DECLARE_COHERENT
	struct dma_coherent_mem	*dma_mem; /* internal for coherent mem
					     override */
#endif
 . . . . . .
	struct iommu_group	*iommu_group;
	struct dev_iommu	*iommu;
 . . . . . .
#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
	bool			dma_coherent:1;
#endif
#ifdef CONFIG_DMA_OPS_BYPASS
	bool			dma_ops_bypass : 1;
#endif
};

Besides these IOMMU subsystem data structures, the lower-level SMMU driver implementation defines many hardware-specific data structures, such as:

  • the command queue entry struct arm_smmu_cmdq_ent,
  • the command queue struct arm_smmu_cmdq,
  • the extended command queue struct arm_smmu_ecmdq,
  • the event queue struct arm_smmu_evtq,
  • the PRI queue struct arm_smmu_priq,
  • the L1 stream table descriptor of a 2-level stream table, struct arm_smmu_strtab_l1_desc,
  • the context descriptor struct arm_smmu_ctx_desc,
  • the L1 table descriptor of a 2-level context descriptor table, struct arm_smmu_l1_ctx_desc,
  • the context descriptor configuration struct arm_smmu_ctx_desc_cfg,
  • the stage 1 translation configuration struct arm_smmu_s1_cfg,
  • the stage 2 translation configuration struct arm_smmu_s2_cfg,
  • the stream table configuration struct arm_smmu_strtab_cfg.

These hardware-specific data structures correspond almost one-to-one to the data structures described in ARM's official documentation, the SMMU Software Guide and the Arm System Memory Management Unit Architecture Specification version 3.

SMMU-related operations and processes, and accesses to the SMMU, are implemented on top of these data structures. The layered structure of this implementation looks roughly as follows:

The device discovery, probe, and driver-binding process and the system I/O device drivers themselves usually call interfaces provided by the platform device subsystem and the DMA subsystem, such as of_dma_configure()/of_dma_configure_id() from the former and dma_alloc_coherent() from the latter; these functions are in turn implemented with the help of lower-level modules.

Initialization of the SMMUv3 Device Driver

Early during Linux kernel boot, the IOMMU is initialized; this mainly means running iommu_init(), which creates and adds the iommu_groups kset. The function is defined (in drivers/iommu/iommu.c) as follows:

static int __init iommu_init(void)
{
	iommu_group_kset = kset_create_and_add("iommu_groups",
					       NULL, kernel_kobj);
	BUG_ON(!iommu_group_kset);

	iommu_debugfs_setup();

	return 0;
}
core_initcall(iommu_init);

The kernel accepts several command line parameters that configure the IOMMU at boot: iommu.passthrough, which configures the default domain type; iommu.strict, which configures the DMA invalidation behavior; and iommu.prq_timeout, which configures the timeout for the page response of a pending page request. Early during boot, the IOMMU subsystem is initialized; if the IOMMU was not configured on the kernel command line, a default domain type is chosen. The relevant code (in drivers/iommu/iommu.c) is as follows:

static unsigned int iommu_def_domain_type __read_mostly;
static bool iommu_dma_strict __read_mostly;
static u32 iommu_cmd_line __read_mostly;

/*
 * Timeout to wait for page response of a pending page request. This is
 * intended as a basic safety net in case a pending page request is not
 * responded for an exceptionally long time. Device may also implement
 * its own protection mechanism against this exception.
 * Units are in jiffies with a range between 1 - 100 seconds equivalent.
 * Default to 10 seconds.
 * Setting 0 means no timeout tracking.
 */
#define IOMMU_PAGE_RESPONSE_MAX_TIMEOUT (HZ * 100)
#define IOMMU_PAGE_RESPONSE_DEF_TIMEOUT (HZ * 10)
static unsigned long prq_timeout = IOMMU_PAGE_RESPONSE_DEF_TIMEOUT;
 . . . . . .
#define IOMMU_CMD_LINE_DMA_API		BIT(0)

static void iommu_set_cmd_line_dma_api(void)
{
	iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
}

static bool iommu_cmd_line_dma_api(void)
{
	return !!(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API);
}
 . . . . . .
/*
 * Use a function instead of an array here because the domain-type is a
 * bit-field, so an array would waste memory.
 */
static const char *iommu_domain_type_str(unsigned int t)
{
	switch (t) {
	case IOMMU_DOMAIN_BLOCKED:
		return "Blocked";
	case IOMMU_DOMAIN_IDENTITY:
		return "Passthrough";
	case IOMMU_DOMAIN_UNMANAGED:
		return "Unmanaged";
	case IOMMU_DOMAIN_DMA:
		return "Translated";
	default:
		return "Unknown";
	}
}

static int __init iommu_subsys_init(void)
{
	bool cmd_line = iommu_cmd_line_dma_api();

	if (!cmd_line) {
		if (IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH))
			iommu_set_default_passthrough(false);
		else
			iommu_set_default_translated(false);

		if (iommu_default_passthrough() && mem_encrypt_active()) {
			pr_info("Memory encryption detected - Disabling default IOMMU Passthrough\n");
			iommu_set_default_translated(false);
		}
	}

	pr_info("Default domain type: %s %s\n",
		iommu_domain_type_str(iommu_def_domain_type),
		cmd_line ? "(set via kernel command line)" : "");

	return 0;
}
subsys_initcall(iommu_subsys_init);
 . . . . . .
static int __init iommu_set_def_domain_type(char *str)
{
	bool pt;
	int ret;

	ret = kstrtobool(str, &pt);
	if (ret)
		return ret;

	if (pt)
		iommu_set_default_passthrough(true);
	else
		iommu_set_default_translated(true);

	return 0;
}
early_param("iommu.passthrough", iommu_set_def_domain_type);

static int __init iommu_dma_setup(char *str)
{
	return kstrtobool(str, &iommu_dma_strict);
}
early_param("iommu.strict", iommu_dma_setup);

static int __init iommu_set_prq_timeout(char *str)
{
	int ret;
	unsigned long timeout;

	if (!str)
		return -EINVAL;

	ret = kstrtoul(str, 10, &timeout);
	if (ret)
		return ret;
	timeout = timeout * HZ;
	if (timeout > IOMMU_PAGE_RESPONSE_MAX_TIMEOUT)
		return -EINVAL;
	prq_timeout = timeout;

	return 0;
}
early_param("iommu.prq_timeout", iommu_set_prq_timeout);
 . . . . . .
void iommu_set_default_passthrough(bool cmd_line)
{
	if (cmd_line)
		iommu_set_cmd_line_dma_api();

	iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
}

void iommu_set_default_translated(bool cmd_line)
{
	if (cmd_line)
		iommu_set_cmd_line_dma_api();

	iommu_def_domain_type = IOMMU_DOMAIN_DMA;
}

Functions registered with core_initcall() run earlier than those registered with subsys_initcall().

After the IOMMU subsystem is initialized, it is the SMMU device driver's turn. The SMMUv3 is itself a platform device; its hardware description, including the register mapping range, interrupt numbers, and other resources, lives in device tree dts/dtsi files. An example SMMUv3 device node (from arch/arm64/boot/dts/arm/fvp-base-revc.dts) looks like this:

	smmu: iommu@2b400000 {
		compatible = "arm,smmu-v3";
		reg = <0x0 0x2b400000 0x0 0x100000>;
		interrupts = <GIC_SPI 74 IRQ_TYPE_EDGE_RISING>,
			     <GIC_SPI 79 IRQ_TYPE_EDGE_RISING>,
			     <GIC_SPI 75 IRQ_TYPE_EDGE_RISING>,
			     <GIC_SPI 77 IRQ_TYPE_EDGE_RISING>;
		interrupt-names = "eventq", "gerror", "priq", "cmdq-sync";
		dma-coherent;
		#iommu-cells = <1>;
		msi-parent = <&its 0x10000>;
	};

The entry point for loading the SMMUv3 device driver is arm_smmu_device_probe(), defined (in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c) as follows:

static struct arm_smmu_option_prop arm_smmu_options[] = {
	{ ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" },
	{ ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"},
	{ 0, NULL},
};

static void parse_driver_options(struct arm_smmu_device *smmu)
{
	int i = 0;

	do {
		if (of_property_read_bool(smmu->dev->of_node,
						arm_smmu_options[i].prop)) {
			smmu->options |= arm_smmu_options[i].opt;
			dev_notice(smmu->dev, "option %s\n",
				arm_smmu_options[i].prop);
		}
	} while (arm_smmu_options[++i].opt);
}
 . . . . . .
static int arm_smmu_device_dt_probe(struct platform_device *pdev,
				    struct arm_smmu_device *smmu)
{
	struct device *dev = &pdev->dev;
	u32 cells;
	int ret = -EINVAL;

	if (of_property_read_u32(dev->of_node, "#iommu-cells", &cells))
		dev_err(dev, "missing #iommu-cells property\n");
	else if (cells != 1)
		dev_err(dev, "invalid #iommu-cells value (%d)\n", cells);
	else
		ret = 0;

	parse_driver_options(smmu);

	if (of_dma_is_coherent(dev->of_node))
		smmu->features |= ARM_SMMU_FEAT_COHERENCY;

	return ret;
}

static unsigned long arm_smmu_resource_size(struct arm_smmu_device *smmu)
{
	if (smmu->options & ARM_SMMU_OPT_PAGE0_REGS_ONLY)
		return SZ_64K;
	else
		return SZ_128K;
}
 . . . . . .
static void __iomem *arm_smmu_ioremap(struct device *dev, resource_size_t start,
				      resource_size_t size)
{
	struct resource res = DEFINE_RES_MEM(start, size);

	return devm_ioremap_resource(dev, &res);
}
 . . . . . .
static int arm_smmu_device_probe(struct platform_device *pdev)
{
	int irq, ret;
	struct resource *res;
	resource_size_t ioaddr;
	struct arm_smmu_device *smmu;
	struct device *dev = &pdev->dev;

	smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
	if (!smmu) {
		dev_err(dev, "failed to allocate arm_smmu_device\n");
		return -ENOMEM;
	}
	smmu->dev = dev;

	if (dev->of_node) {
		ret = arm_smmu_device_dt_probe(pdev, smmu);
	} else {
		ret = arm_smmu_device_acpi_probe(pdev, smmu);
		if (ret == -ENODEV)
			return ret;
	}

	/* Set bypass mode according to firmware probing result */
	smmu->bypass = !!ret;

	/* Base address */
	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	if (!res)
		return -EINVAL;
	if (resource_size(res) < arm_smmu_resource_size(smmu)) {
		dev_err(dev, "MMIO region too small (%pr)\n", res);
		return -EINVAL;
	}
	ioaddr = res->start;

	/*
	 * Don't map the IMPLEMENTATION DEFINED regions, since they may contain
	 * the PMCG registers which are reserved by the PMU driver.
	 */
	smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
	if (IS_ERR(smmu->base))
		return PTR_ERR(smmu->base);

	if (arm_smmu_resource_size(smmu) > SZ_64K) {
		smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
					       ARM_SMMU_REG_SZ);
		if (IS_ERR(smmu->page1))
			return PTR_ERR(smmu->page1);
	} else {
		smmu->page1 = smmu->base;
	}

	/* Interrupt lines */

	irq = platform_get_irq_byname_optional(pdev, "combined");
	if (irq > 0)
		smmu->combined_irq = irq;
	else {
		irq = platform_get_irq_byname_optional(pdev, "eventq");
		if (irq > 0)
			smmu->evtq.q.irq = irq;

		irq = platform_get_irq_byname_optional(pdev, "priq");
		if (irq > 0)
			smmu->priq.q.irq = irq;

		irq = platform_get_irq_byname_optional(pdev, "gerror");
		if (irq > 0)
			smmu->gerr_irq = irq;
	}
	/* Probe the h/w */
	ret = arm_smmu_device_hw_probe(smmu);
	if (ret)
		return ret;

	/* Initialise in-memory data structures */
	ret = arm_smmu_init_structures(smmu);
	if (ret)
		return ret;

	/* Record our private device structure */
	platform_set_drvdata(pdev, smmu);

	/* Reset the device */
	ret = arm_smmu_device_reset(smmu, false);
	if (ret)
		return ret;

	/* And we're up. Go go go! */
	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
				     "smmu3.%pa", &ioaddr);
	if (ret)
		return ret;

	iommu_device_set_ops(&smmu->iommu, &arm_smmu_ops);
	iommu_device_set_fwnode(&smmu->iommu, dev->fwnode);

	ret = iommu_device_register(&smmu->iommu);
	if (ret) {
		dev_err(dev, "Failed to register iommu\n");
		return ret;
	}

	return arm_smmu_set_bus_ops(&arm_smmu_ops);
}

arm_smmu_device_probe() mainly does the following:

  1. Allocate a struct arm_smmu_device object, which describes the SMMUv3 device within the IOMMU subsystem.
  2. Obtain the information contained in, and the resources referenced by, the SMMUv3 device node in the dts/dtsi device tree files, mainly:
    • information about the SMMUv3 device, such as #iommu-cells, whose value must be 1; options, e.g. whether only register page 0 exists; and whether the SMMU is coherent, which the dma-coherent property of the device node indicates;
    • the SMMUv3 device's register mapping; arm_smmu_device_probe() checks, based on the options, that the size of the register region matches expectations, and remaps the SMMUv3 registers;
    • the interrupt resources referenced by the SMMUv3 device, including those for the command queue, the event queue, and global errors.
  3. Probe the hardware features of the SMMUv3 device. This mainly means reading the fields of the SMMU_IDR0, SMMU_IDR1, SMMU_IDR3, and SMMU_IDR5 registers defined by the Arm System Memory Management Unit Architecture Specification version 3 to determine which features the actual hardware supports. (The other read-only identification registers say little about hardware features: SMMU_IDR2 describes features of the non-secure programming interface, SMMU_IDR4 is implementation defined, SMMU_IIDR identifies the implementation and implementer along with the supported architecture revisions, and SMMU_AIDR reports the SMMU architecture version the implementation conforms to.) This is done by calling arm_smmu_device_hw_probe().
  4. Initialize the in-memory data structures, mainly the queues and the stream table; the queues are the command queue, the event queue, and the PRI queue. Stream table initialization has two cases: for a linear stream table, every STE is configured to bypass the SMMU; for a 2-level stream table, the table is filled with invalid L1 stream table descriptors. This is done by calling arm_smmu_init_structures().
  5. Record the struct arm_smmu_device object in the private field of the struct platform_device object.
  6. Reset the SMMUv3 device. This mainly includes resetting the hardware via registers such as SMMU_CR0 and programming the stream table base registers, as well as setting up interrupts, i.e. requesting them from the system and registering interrupt handlers. Initializing the data structures builds them in memory; resetting the device then writes their base addresses and the various configurations into the corresponding device registers. This is done by calling arm_smmu_device_reset().
  7. Register the SMMUv3 device with the IOMMU subsystem. This includes setting the struct iommu_ops and struct fwnode_handle on the struct iommu_device and registering the struct iommu_device object with the subsystem; the struct fwnode_handle is used to match system I/O devices to the SMMUv3 device. This is done by calling iommu_device_register().
  8. Set the struct iommu_ops for each bus type. The load order of the SMMUv3 driver and of the system I/O devices that use the IOMMU is not guaranteed; normally the SMMUv3 driver loads first and the I/O devices afterwards, but this step handles the case where an I/O device loads before the SMMUv3 driver. This is done by calling arm_smmu_set_bus_ops().

Probing the SMMUv3 hardware features

The arm_smmu_device_probe() function calls arm_smmu_device_hw_probe() to probe the hardware features of the SMMUv3 device. The latter is defined (in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c) as follows:

static int arm_smmu_ecmdq_probe(struct arm_smmu_device *smmu)
{
	int ret, cpu;
	u32 i, nump, numq, gap;
	u32 reg, shift_increment;
	u64 addr, smmu_dma_base;
	void __iomem *cp_regs, *cp_base;

	/* IDR6 */
	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR6);

	smmu_reg_dump(smmu);

	nump = 1 << FIELD_GET(IDR6_LOG2NUMP, reg);
	numq = 1 << FIELD_GET(IDR6_LOG2NUMQ, reg);
	smmu->nr_ecmdq = nump * numq;
	gap = ECMDQ_CP_RRESET_SIZE >> FIELD_GET(IDR6_LOG2NUMQ, reg);

	smmu_dma_base = (vmalloc_to_pfn(smmu->base) << PAGE_SHIFT);
	cp_regs = ioremap(smmu_dma_base + ARM_SMMU_ECMDQ_CP_BASE, PAGE_SIZE);
	if (!cp_regs)
		return -ENOMEM;

	for (i = 0; i < nump; i++) {
		u64 val, pre_addr;

		val = readq_relaxed(cp_regs + 32 * i);
		if (!(val & ECMDQ_CP_PRESET)) {
			iounmap(cp_regs);
			dev_err(smmu->dev, "ecmdq control page %u is memory mode\n", i);
			return -EFAULT;
		}

		if (i && ((val & ECMDQ_CP_ADDR) != (pre_addr + ECMDQ_CP_RRESET_SIZE))) {
			iounmap(cp_regs);
			dev_err(smmu->dev, "ecmdq_cp memory region is not contiguous\n");
			return -EFAULT;
		}

		pre_addr = val & ECMDQ_CP_ADDR;
	}

	addr = readl_relaxed(cp_regs) & ECMDQ_CP_ADDR;
	iounmap(cp_regs);

	cp_base = devm_ioremap(smmu->dev, smmu_dma_base + addr, ECMDQ_CP_RRESET_SIZE * nump);
	if (!cp_base)
		return -ENOMEM;

	smmu->ecmdq = devm_alloc_percpu(smmu->dev, struct arm_smmu_ecmdq *);
	if (!smmu->ecmdq)
		return -ENOMEM;

	ret = arm_smmu_ecmdq_layout(smmu);
	if (ret)
		return ret;

	shift_increment = order_base_2(num_possible_cpus() / smmu->nr_ecmdq);

	addr = 0;
	for_each_possible_cpu(cpu) {
		struct arm_smmu_ecmdq *ecmdq;
		struct arm_smmu_queue *q;

		ecmdq = *per_cpu_ptr(smmu->ecmdq, cpu);
		q = &ecmdq->cmdq.q;

		/*
		 * The boot option "maxcpus=" can limit the number of online
		 * CPUs. The CPUs that are not selected are not showed in
		 * cpumask_of_node(node), their 'ecmdq' may be NULL.
		 *
		 * (q->ecmdq_prod & ECMDQ_PROD_EN) indicates that the ECMDQ is
		 * shared by multiple cores and has been initialized.
		 */
		if (!ecmdq || (q->ecmdq_prod & ECMDQ_PROD_EN))
			continue;
		ecmdq->base = cp_base + addr;

		q->llq.max_n_shift = ECMDQ_MAX_SZ_SHIFT + shift_increment;
		ret = arm_smmu_init_one_queue(smmu, q, ecmdq->base, ARM_SMMU_ECMDQ_PROD,
				ARM_SMMU_ECMDQ_CONS, CMDQ_ENT_DWORDS, "ecmdq");
		if (ret)
			return ret;

		q->ecmdq_prod = ECMDQ_PROD_EN;
		rwlock_init(&q->ecmdq_lock);

		ret = arm_smmu_ecmdq_init(&ecmdq->cmdq);
		if (ret) {
			dev_err(smmu->dev, "ecmdq[%d] init failed\n", i);
			return ret;
		}

		addr += gap;
	}

	return 0;
}

static void arm_smmu_get_httu(struct arm_smmu_device *smmu, u32 reg)
{
	u32 fw_features = smmu->features & (ARM_SMMU_FEAT_HA | ARM_SMMU_FEAT_HD);
	u32 features = 0;

	switch (FIELD_GET(IDR0_HTTU, reg)) {
	case IDR0_HTTU_ACCESS_DIRTY:
		features |= ARM_SMMU_FEAT_HD;
		fallthrough;
	case IDR0_HTTU_ACCESS:
		features |= ARM_SMMU_FEAT_HA;
	}

	if (smmu->dev->of_node)
		smmu->features |= features;
	else if (features != fw_features)
		/* ACPI IORT sets the HTTU bits */
		dev_warn(smmu->dev,
			 "IDR0.HTTU overridden by FW configuration (0x%x)\n",
			 fw_features);
}

static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
{
	u32 reg;
	bool coherent = smmu->features & ARM_SMMU_FEAT_COHERENCY;
	bool vhe = cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN);

	/* IDR0 */
	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR0);

	/* 2-level structures */
	if (FIELD_GET(IDR0_ST_LVL, reg) == IDR0_ST_LVL_2LVL)
		smmu->features |= ARM_SMMU_FEAT_2_LVL_STRTAB;

	if (reg & IDR0_CD2L)
		smmu->features |= ARM_SMMU_FEAT_2_LVL_CDTAB;

	/*
	 * Translation table endianness.
	 * We currently require the same endianness as the CPU, but this
	 * could be changed later by adding a new IO_PGTABLE_QUIRK.
	 */
	switch (FIELD_GET(IDR0_TTENDIAN, reg)) {
	case IDR0_TTENDIAN_MIXED:
		smmu->features |= ARM_SMMU_FEAT_TT_LE | ARM_SMMU_FEAT_TT_BE;
		break;
#ifdef __BIG_ENDIAN
	case IDR0_TTENDIAN_BE:
		smmu->features |= ARM_SMMU_FEAT_TT_BE;
		break;
#else
	case IDR0_TTENDIAN_LE:
		smmu->features |= ARM_SMMU_FEAT_TT_LE;
		break;
#endif
	default:
		dev_err(smmu->dev, "unknown/unsupported TT endianness!\n");
		return -ENXIO;
	}

	/* Boolean feature flags */
	if (IS_ENABLED(CONFIG_PCI_PRI) && reg & IDR0_PRI)
		smmu->features |= ARM_SMMU_FEAT_PRI;

	if (IS_ENABLED(CONFIG_PCI_ATS) && reg & IDR0_ATS)
		smmu->features |= ARM_SMMU_FEAT_ATS;

	if (reg & IDR0_SEV)
		smmu->features |= ARM_SMMU_FEAT_SEV;

	if (reg & IDR0_MSI) {
		smmu->features |= ARM_SMMU_FEAT_MSI;
		if (coherent && !disable_msipolling)
			smmu->options |= ARM_SMMU_OPT_MSIPOLL;
	}

	if (reg & IDR0_HYP) {
		smmu->features |= ARM_SMMU_FEAT_HYP;
		if (vhe)
			smmu->features |= ARM_SMMU_FEAT_E2H;
	}

	arm_smmu_get_httu(smmu, reg);

	/*
	 * If the CPU is using VHE, but the SMMU doesn't support it, the SMMU
	 * will create TLB entries for NH-EL1 world and will miss the
	 * broadcasted TLB invalidations that target EL2-E2H world. Don't enable
	 * BTM in that case.
	 */
	if (reg & IDR0_BTM && (!vhe || reg & IDR0_HYP))
		smmu->features |= ARM_SMMU_FEAT_BTM;

	/*
	 * The coherency feature as set by FW is used in preference to the ID
	 * register, but warn on mismatch.
	 */
	if (!!(reg & IDR0_COHACC) != coherent)
		dev_warn(smmu->dev, "IDR0.COHACC overridden by FW configuration (%s)\n",
			 coherent ? "true" : "false");

	switch (FIELD_GET(IDR0_STALL_MODEL, reg)) {
	case IDR0_STALL_MODEL_FORCE:
		smmu->features |= ARM_SMMU_FEAT_STALL_FORCE;
		fallthrough;
	case IDR0_STALL_MODEL_STALL:
		smmu->features |= ARM_SMMU_FEAT_STALLS;
	}

	if (reg & IDR0_S1P)
		smmu->features |= ARM_SMMU_FEAT_TRANS_S1;

	if (reg & IDR0_S2P)
		smmu->features |= ARM_SMMU_FEAT_TRANS_S2;

	if (!(reg & (IDR0_S1P | IDR0_S2P))) {
		dev_err(smmu->dev, "no translation support!\n");
		return -ENXIO;
	}

	/* We only support the AArch64 table format at present */
	switch (FIELD_GET(IDR0_TTF, reg)) {
	case IDR0_TTF_AARCH32_64:
		smmu->ias = 40;
		fallthrough;
	case IDR0_TTF_AARCH64:
		break;
	default:
		dev_err(smmu->dev, "AArch64 table format not supported!\n");
		return -ENXIO;
	}

	/* ASID/VMID sizes */
	smmu->asid_bits = reg & IDR0_ASID16 ? 16 : 8;
	smmu->vmid_bits = reg & IDR0_VMID16 ? 16 : 8;

	/* IDR1 */
	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR1);
	if (reg & (IDR1_TABLES_PRESET | IDR1_QUEUES_PRESET | IDR1_REL)) {
		dev_err(smmu->dev, "embedded implementation not supported\n");
		return -ENXIO;
	}

	if (reg & IDR1_ECMDQ)
		smmu->features |= ARM_SMMU_FEAT_ECMDQ;

	/* Queue sizes, capped to ensure natural alignment */
	smmu->cmdq.q.llq.max_n_shift = min_t(u32, CMDQ_MAX_SZ_SHIFT,
					     FIELD_GET(IDR1_CMDQS, reg));
	if (smmu->cmdq.q.llq.max_n_shift <= ilog2(CMDQ_BATCH_ENTRIES)) {
		/*
		 * We don't support splitting up batches, so one batch of
		 * commands plus an extra sync needs to fit inside the command
		 * queue. There's also no way we can handle the weird alignment
		 * restrictions on the base pointer for a unit-length queue.
		 */
		dev_err(smmu->dev, "command queue size <= %d entries not supported\n",
			CMDQ_BATCH_ENTRIES);
		return -ENXIO;
	}

	smmu->evtq.q.llq.max_n_shift = min_t(u32, EVTQ_MAX_SZ_SHIFT,
					     FIELD_GET(IDR1_EVTQS, reg));
	smmu->priq.q.llq.max_n_shift = min_t(u32, PRIQ_MAX_SZ_SHIFT,
					     FIELD_GET(IDR1_PRIQS, reg));

	/* SID/SSID sizes */
	smmu->ssid_bits = FIELD_GET(IDR1_SSIDSIZE, reg);
	smmu->sid_bits = FIELD_GET(IDR1_SIDSIZE, reg);

	/*
	 * If the SMMU supports fewer bits than would fill a single L2 stream
	 * table, use a linear table instead.
	 */
	if (smmu->sid_bits <= STRTAB_SPLIT)
		smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB;

	/* IDR3 */
	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
	switch (FIELD_GET(IDR3_BBML, reg)) {
	case IDR3_BBML0:
		break;
	case IDR3_BBML1:
		smmu->features |= ARM_SMMU_FEAT_BBML1;
		break;
	case IDR3_BBML2:
		smmu->features |= ARM_SMMU_FEAT_BBML2;
		break;
	default:
		dev_err(smmu->dev, "unknown/unsupported BBM behavior level\n");
		return -ENXIO;
	}

	if (FIELD_GET(IDR3_RIL, reg))
		smmu->features |= ARM_SMMU_FEAT_RANGE_INV;

	if (reg & IDR3_MPAM) {
		reg = readl_relaxed(smmu->base + ARM_SMMU_MPAMIDR);
		smmu->mpam_partid_max = FIELD_GET(MPAMIDR_PARTID_MAX, reg);
		smmu->mpam_pmg_max = FIELD_GET(MPAMIDR_PMG_MAX, reg);
		if (smmu->mpam_partid_max || smmu->mpam_pmg_max)
			smmu->features |= ARM_SMMU_FEAT_MPAM;
	}

	/* IDR5 */
	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR5);

	/* Maximum number of outstanding stalls */
	smmu->evtq.max_stalls = FIELD_GET(IDR5_STALL_MAX, reg);

	/* Page sizes */
	if (reg & IDR5_GRAN64K)
		smmu->pgsize_bitmap |= SZ_64K | SZ_512M;
	if (reg & IDR5_GRAN16K)
		smmu->pgsize_bitmap |= SZ_16K | SZ_32M;
	if (reg & IDR5_GRAN4K)
		smmu->pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G;

	/* Input address size */
	if (FIELD_GET(IDR5_VAX, reg) == IDR5_VAX_52_BIT)
		smmu->features |= ARM_SMMU_FEAT_VAX;

	/* Output address size */
	switch (FIELD_GET(IDR5_OAS, reg)) {
	case IDR5_OAS_32_BIT:
		smmu->oas = 32;
		break;
	case IDR5_OAS_36_BIT:
		smmu->oas = 36;
		break;
	case IDR5_OAS_40_BIT:
		smmu->oas = 40;
		break;
	case IDR5_OAS_42_BIT:
		smmu->oas = 42;
		break;
	case IDR5_OAS_44_BIT:
		smmu->oas = 44;
		break;
	case IDR5_OAS_52_BIT:
		smmu->oas = 52;
		smmu->pgsize_bitmap |= 1ULL << 42; /* 4TB */
		break;
	default:
		dev_info(smmu->dev,
			"unknown output address size. Truncating to 48-bit\n");
		fallthrough;
	case IDR5_OAS_48_BIT:
		smmu->oas = 48;
	}

	if (arm_smmu_ops.pgsize_bitmap == -1UL)
		arm_smmu_ops.pgsize_bitmap = smmu->pgsize_bitmap;
	else
		arm_smmu_ops.pgsize_bitmap |= smmu->pgsize_bitmap;

	/* Set the DMA mask for our table walker */
	if (dma_set_mask_and_coherent(smmu->dev, DMA_BIT_MASK(smmu->oas)))
		dev_warn(smmu->dev,
			 "failed to set DMA mask for table walker\n");

	smmu->ias = max(smmu->ias, smmu->oas);

	if (arm_smmu_sva_supported(smmu))
		smmu->features |= ARM_SMMU_FEAT_SVA;

	dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
		 smmu->ias, smmu->oas, smmu->features);

	if (smmu->features & ARM_SMMU_FEAT_ECMDQ) {
		int err;

		err = arm_smmu_ecmdq_probe(smmu);
		if (err) {
			dev_err(smmu->dev, "suppress ecmdq feature, errno=%d\n", err);
			smmu->ecmdq_enabled = 0;
		}
	}
	return 0;
}

In the struct arm_smmu_device structure, the SMMUv3 driver describes the supported hardware features with a 32-bit value in which each feature occupies one bit. The arm_smmu_device_hw_probe() function obtains the hardware features by reading the SMMU's registers.

From the SMMU_IDR0 register:

  • Whether a 2-level stream table is supported
  • Whether 2-level context descriptor (CD) tables are supported
  • The supported translation table endianness
  • Whether PRI is supported
  • Whether ATS is supported
  • Whether SEV is supported
  • Whether MSI is supported
  • Whether HYP is supported
  • The HTTU features
  • Whether BTM is supported
  • Whether COHACC is supported
  • Whether stalling (STALL) is supported
  • Whether stage 1 translation is supported
  • Whether stage 2 translation is supported
  • The IAS (input address size)
  • The number of ASID bits
  • The number of VMID bits

From the SMMU_IDR1 register (some fields, such as ATTR_TYPE_OVR and ATTR_PERMS_OVR, are ignored):

  • Whether the stream table base address and stream table configuration are fixed
  • Whether the base addresses of the command, event, and PRI queues are fixed
  • Whether, when fixed, the base registers hold absolute or relative addresses; the SMMUv3 driver requires that neither the stream table base and configuration nor the command, event, and PRI queue bases be fixed
  • Whether the extended command queues are supported
  • The sizes of the command, event, and PRI queues
  • The StreamID (SID) size
  • The SubstreamID (SSID) size

From the SMMU_IDR3 register (some fields are ignored):

  • The supported BBML level
  • Whether RIL is supported
  • Whether MPAM is supported; when it is, the MPAM registers are also read for more information

From the SMMU_IDR5 register:

  • The maximum number of outstanding stalled transactions supported by the SMMU and the system
  • The supported page sizes
  • The virtual address extension (VAX), i.e. the supported virtual address size
  • The output address size (OAS)

In addition, arm_smmu_device_hw_probe() probes whether SVA is supported, and, when the extended command queue feature was detected earlier, reads the ARM_SMMU_IDR6 register to probe the ECMDQ characteristics.

arm_smmu_device_hw_probe() follows the meaning of each register field as defined in the ARM System Memory Management Unit Architecture Specification version 3.

Initializing the data structures

The data structures are initialized mainly by calling arm_smmu_init_structures(), which is defined (in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c) as follows:

/* Stream table manipulation functions */
static void
arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
{
	u64 val = 0;

	val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span);
	val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;

	/* See comment in arm_smmu_write_ctx_desc() */
	WRITE_ONCE(*dst, cpu_to_le64(val));
}

static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid)
{
	struct arm_smmu_cmdq_ent cmd = {
		.opcode	= CMDQ_OP_CFGI_STE,
		.cfgi	= {
			.sid	= sid,
			.leaf	= true,
		},
	};

	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
}

static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
				      __le64 *dst)
{
	/*
	 * This is hideously complicated, but we only really care about
	 * three cases at the moment:
	 *
	 * 1. Invalid (all zero) -> bypass/fault (init)
	 * 2. Bypass/fault -> translation/bypass (attach)
	 * 3. Translation/bypass -> bypass/fault (detach)
	 *
	 * Given that we can't update the STE atomically and the SMMU
	 * doesn't read the thing in a defined order, that leaves us
	 * with the following maintenance requirements:
	 *
	 * 1. Update Config, return (init time STEs aren't live)
	 * 2. Write everything apart from dword 0, sync, write dword 0, sync
	 * 3. Update Config, sync
	 */
	u64 val = le64_to_cpu(dst[0]);
	bool ste_live = false;
	struct arm_smmu_device *smmu = NULL;
	struct arm_smmu_s1_cfg *s1_cfg = NULL;
	struct arm_smmu_s2_cfg *s2_cfg = NULL;
	struct arm_smmu_domain *smmu_domain = NULL;
	struct arm_smmu_cmdq_ent prefetch_cmd = {
		.opcode		= CMDQ_OP_PREFETCH_CFG,
		.prefetch	= {
			.sid	= sid,
		},
	};

	if (master) {
		smmu_domain = master->domain;
		smmu = master->smmu;
	}

	if (smmu_domain) {
		switch (smmu_domain->stage) {
		case ARM_SMMU_DOMAIN_S1:
			s1_cfg = &smmu_domain->s1_cfg;
			break;
		case ARM_SMMU_DOMAIN_S2:
		case ARM_SMMU_DOMAIN_NESTED:
			s2_cfg = &smmu_domain->s2_cfg;
			break;
		default:
			break;
		}
	}

	if (val & STRTAB_STE_0_V) {
		switch (FIELD_GET(STRTAB_STE_0_CFG, val)) {
		case STRTAB_STE_0_CFG_BYPASS:
			break;
		case STRTAB_STE_0_CFG_S1_TRANS:
		case STRTAB_STE_0_CFG_S2_TRANS:
			ste_live = true;
			break;
		case STRTAB_STE_0_CFG_ABORT:
			BUG_ON(!disable_bypass);
			break;
		default:
			BUG(); /* STE corruption */
		}
	}

	/* Nuke the existing STE_0 value, as we're going to rewrite it */
	val = STRTAB_STE_0_V;

	/* Bypass/fault */
	if (!smmu_domain || !(s1_cfg || s2_cfg)) {
		if (!smmu_domain && disable_bypass)
			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
		else
			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);

		dst[0] = cpu_to_le64(val);
		dst[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG,
						STRTAB_STE_1_SHCFG_INCOMING));
		dst[2] = 0; /* Nuke the VMID */
		/*
		 * The SMMU can perform negative caching, so we must sync
		 * the STE regardless of whether the old value was live.
		 */
		if (smmu)
			arm_smmu_sync_ste_for_sid(smmu, sid);
		return;
	}

	if (s1_cfg) {
		u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ?
			STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1;

		BUG_ON(ste_live);
		dst[1] = cpu_to_le64(
			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
			 FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
			 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
			 FIELD_PREP(STRTAB_STE_1_STRW, strw));

		if (master->prg_resp_needs_ssid)
			dst[1] |= cpu_to_le64(STRTAB_STE_1_PPAR);

		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
		    !master->stall_enabled)
			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);

		val |= (s1_cfg->cdcfg.cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
			FIELD_PREP(STRTAB_STE_0_S1CDMAX, s1_cfg->s1cdmax) |
			FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
	}

	if (s2_cfg) {
		BUG_ON(ste_live);
		dst[2] = cpu_to_le64(
			 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
			 FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
#ifdef __BIG_ENDIAN
			 STRTAB_STE_2_S2ENDI |
#endif
			 STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
			 STRTAB_STE_2_S2R);

		dst[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);

		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
	}

	if (master->ats_enabled)
		dst[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS,
						 STRTAB_STE_1_EATS_TRANS));

	pr_info("arm_smmu_write_strtab_ent[%d], val[0]=0x%llx, val[1]=0x%llx, val[2]=0x%llx, val[3]=0x%llx\n",
			sid, val, dst[1], dst[2], dst[3]);
	arm_smmu_sync_ste_for_sid(smmu, sid);
	/* See comment in arm_smmu_write_ctx_desc() */
	WRITE_ONCE(dst[0], cpu_to_le64(val));
	arm_smmu_sync_ste_for_sid(smmu, sid);

	/* It's likely that we'll want to use the new STE soon */
	if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH))
		arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd);
}

static void arm_smmu_init_bypass_stes(__le64 *strtab, unsigned int nent)
{
	unsigned int i;

	for (i = 0; i < nent; ++i) {
		arm_smmu_write_strtab_ent(NULL, -1, strtab);
		strtab += STRTAB_STE_DWORDS;
	}
}
 . . . . . .
/* Probing and initialisation functions */
static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
				   struct arm_smmu_queue *q,
				   void __iomem *page,
				   unsigned long prod_off,
				   unsigned long cons_off,
				   size_t dwords, const char *name)
{
	size_t qsz;

	do {
		qsz = ((1 << q->llq.max_n_shift) * dwords) << 3;
		q->base = dmam_alloc_coherent(smmu->dev, qsz, &q->base_dma,
					      GFP_KERNEL);
		if (q->base || qsz < PAGE_SIZE)
			break;

		q->llq.max_n_shift--;
	} while (1);

	if (!q->base) {
		dev_err(smmu->dev,
			"failed to allocate queue (0x%zx bytes) for %s\n",
			qsz, name);
		return -ENOMEM;
	}

	if (!WARN_ON(q->base_dma & (qsz - 1))) {
		dev_info(smmu->dev, "allocated %u entries for %s\n",
			 1 << q->llq.max_n_shift, name);
	}

	q->prod_reg	= page + prod_off;
	q->cons_reg	= page + cons_off;
	q->ent_dwords	= dwords;

	q->q_base  = Q_BASE_RWA;
	q->q_base |= q->base_dma & Q_BASE_ADDR_MASK;
	q->q_base |= FIELD_PREP(Q_BASE_LOG2SIZE, q->llq.max_n_shift);

	q->llq.prod = q->llq.cons = 0;
	return 0;
}

static void arm_smmu_cmdq_free_bitmap(void *data)
{
	unsigned long *bitmap = data;
	bitmap_free(bitmap);
}

static int arm_smmu_cmdq_init(struct arm_smmu_device *smmu)
{
	int ret = 0;
	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
	unsigned int nents = 1 << cmdq->q.llq.max_n_shift;
	atomic_long_t *bitmap;

	cmdq->shared = 1;
	atomic_set(&cmdq->owner_prod, 0);
	atomic_set(&cmdq->lock, 0);

	bitmap = (atomic_long_t *)bitmap_zalloc(nents, GFP_KERNEL);
	if (!bitmap) {
		dev_err(smmu->dev, "failed to allocate cmdq bitmap\n");
		ret = -ENOMEM;
	} else {
		cmdq->valid_map = bitmap;
		devm_add_action(smmu->dev, arm_smmu_cmdq_free_bitmap, bitmap);
	}

	return ret;
}

static int arm_smmu_ecmdq_init(struct arm_smmu_cmdq *cmdq)
{
	unsigned int nents = 1 << cmdq->q.llq.max_n_shift;

	atomic_set(&cmdq->owner_prod, 0);
	atomic_set(&cmdq->lock, 0);

	cmdq->valid_map = (atomic_long_t *)bitmap_zalloc(nents, GFP_KERNEL);
	if (!cmdq->valid_map)
		return -ENOMEM;

	return 0;
}

static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
{
	int ret;

	/* cmdq */
	ret = arm_smmu_init_one_queue(smmu, &smmu->cmdq.q, smmu->base,
				      ARM_SMMU_CMDQ_PROD, ARM_SMMU_CMDQ_CONS,
				      CMDQ_ENT_DWORDS, "cmdq");
	if (ret)
		return ret;

	ret = arm_smmu_cmdq_init(smmu);
	if (ret)
		return ret;

	/* evtq */
	ret = arm_smmu_init_one_queue(smmu, &smmu->evtq.q, smmu->page1,
				      ARM_SMMU_EVTQ_PROD, ARM_SMMU_EVTQ_CONS,
				      EVTQ_ENT_DWORDS, "evtq");
	if (ret)
		return ret;

	if ((smmu->features & ARM_SMMU_FEAT_SVA) &&
	    (smmu->features & ARM_SMMU_FEAT_STALLS)) {
		smmu->evtq.iopf = iopf_queue_alloc(dev_name(smmu->dev));
		if (!smmu->evtq.iopf)
			return -ENOMEM;
	}

	/* priq */
	if (!(smmu->features & ARM_SMMU_FEAT_PRI))
		return 0;

	if (smmu->features & ARM_SMMU_FEAT_SVA) {
		smmu->priq.iopf = iopf_queue_alloc(dev_name(smmu->dev));
		if (!smmu->priq.iopf)
			return -ENOMEM;
	}

	init_waitqueue_head(&smmu->priq.wq);
	smmu->priq.batch = 0;

	return arm_smmu_init_one_queue(smmu, &smmu->priq.q, smmu->page1,
				       ARM_SMMU_PRIQ_PROD, ARM_SMMU_PRIQ_CONS,
				       PRIQ_ENT_DWORDS, "priq");
}

static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
{
	unsigned int i;
	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
	size_t size = sizeof(*cfg->l1_desc) * cfg->num_l1_ents;
	void *strtab = smmu->strtab_cfg.strtab;

	cfg->l1_desc = devm_kzalloc(smmu->dev, size, GFP_KERNEL);
	if (!cfg->l1_desc) {
		dev_err(smmu->dev, "failed to allocate l1 stream table desc\n");
		return -ENOMEM;
	}

	for (i = 0; i < cfg->num_l1_ents; ++i) {
		arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]);
		strtab += STRTAB_L1_DESC_DWORDS << 3;
	}

	return 0;
}

#ifdef CONFIG_SMMU_BYPASS_DEV
static void arm_smmu_install_bypass_ste_for_dev(struct arm_smmu_device *smmu,
				    u32 sid)
{
	u64 val;
	__le64 *step = arm_smmu_get_step_for_sid(smmu, sid);

	if (!step)
		return;

	val = STRTAB_STE_0_V;
	val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
	step[0] = cpu_to_le64(val);
	step[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG,
	STRTAB_STE_1_SHCFG_INCOMING));
	step[2] = 0;
}

static int arm_smmu_prepare_init_l2_strtab(struct device *dev, void *data)
{
	u32 sid;
	int ret;
	struct pci_dev *pdev;
	struct arm_smmu_device *smmu = (struct arm_smmu_device *)data;

	if (!arm_smmu_device_domain_type(dev))
		return 0;

	pdev = to_pci_dev(dev);
	sid = PCI_DEVID(pdev->bus->number, pdev->devfn);
	if (!arm_smmu_sid_in_range(smmu, sid))
		return -ERANGE;

	ret = arm_smmu_init_l2_strtab(smmu, sid);
	if (ret)
		return ret;

	arm_smmu_install_bypass_ste_for_dev(smmu, sid);

	return 0;
}
#endif

static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
{
	void *strtab;
	u64 reg;
	u32 size, l1size;
	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
#ifdef CONFIG_SMMU_BYPASS_DEV
	int ret;
#endif

	/* Calculate the L1 size, capped to the SIDSIZE. */
	size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3);
	size = min(size, smmu->sid_bits - STRTAB_SPLIT);
	cfg->num_l1_ents = 1 << size;

	size += STRTAB_SPLIT;
	if (size < smmu->sid_bits)
		dev_warn(smmu->dev,
			 "2-level strtab only covers %u/%u bits of SID\n",
			 size, smmu->sid_bits);

	l1size = cfg->num_l1_ents * (STRTAB_L1_DESC_DWORDS << 3);
	strtab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->strtab_dma,
				     GFP_KERNEL);
	if (!strtab) {
		dev_err(smmu->dev,
			"failed to allocate l1 stream table (%u bytes)\n",
			l1size);
		return -ENOMEM;
	}
	cfg->strtab = strtab;

	/* Configure strtab_base_cfg for 2 levels */
	reg  = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL);
	reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, size);
	reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
	cfg->strtab_base_cfg = reg;
#ifdef CONFIG_SMMU_BYPASS_DEV
	ret = arm_smmu_init_l1_strtab(smmu);
	if (ret)
		return ret;

	if (smmu_bypass_devices_num) {
		ret = bus_for_each_dev(&pci_bus_type, NULL, (void *)smmu,
								arm_smmu_prepare_init_l2_strtab);
	}

	return ret;
#else
	return arm_smmu_init_l1_strtab(smmu);
#endif
}

static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
{
	void *strtab;
	u64 reg;
	u32 size;
	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;

	size = (1 << smmu->sid_bits) * (STRTAB_STE_DWORDS << 3);
	strtab = dmam_alloc_coherent(smmu->dev, size, &cfg->strtab_dma,
				     GFP_KERNEL);
	if (!strtab) {
		dev_err(smmu->dev,
			"failed to allocate linear stream table (%u bytes)\n",
			size);
		return -ENOMEM;
	}
	cfg->strtab = strtab;
	cfg->num_l1_ents = 1 << smmu->sid_bits;

	/* Configure strtab_base_cfg for a linear table covering all SIDs */
	reg  = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_LINEAR);
	reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits);
	cfg->strtab_base_cfg = reg;

	arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents);
	return 0;
}

static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
{
	u64 reg;
	int ret;

	if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
		ret = arm_smmu_init_strtab_2lvl(smmu);
	else
		ret = arm_smmu_init_strtab_linear(smmu);

	if (ret)
		return ret;

	/* Set the strtab base address */
	reg  = smmu->strtab_cfg.strtab_dma & STRTAB_BASE_ADDR_MASK;
	reg |= STRTAB_BASE_RA;
	smmu->strtab_cfg.strtab_base = reg;

	/* Allocate the first VMID for stage-2 bypass STEs */
	set_bit(0, smmu->vmid_map);
	return 0;
}

static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
{
	int ret;

	mutex_init(&smmu->streams_mutex);
	smmu->streams = RB_ROOT;

	ret = arm_smmu_init_queues(smmu);
	if (ret)
		return ret;

	return arm_smmu_init_strtab(smmu);
}

The data structures initialized by arm_smmu_init_structures() are mainly the command queue and the event queue, plus the PRI queue when the SMMUv3 hardware supports PRI, and the stream table.

Queue initialization is done mainly by the arm_smmu_init_queues()/arm_smmu_init_one_queue() functions, roughly as follows:

  1. Allocate memory for the queue. Fields in the SMMU_IDR1 register describe the maximum supported sizes of the command, event, and PRIQ queues, expressed as the base-2 logarithm of the maximum number of entries. The allocation starts from the maximum size and halves on failure, until it either succeeds or the queue size in bytes would fall below one page; the latter case means allocation has failed, and an error is returned.
  2. Initialize the addresses of the queue's producer and consumer registers, and the entry size in 64-bit words.
  3. Construct the value of the queue base register from the address and size of the allocated memory; this value is later written into the SMMU_CMDQ_BASE, SMMU_EVENTQ_BASE, and SMMU_PRIQ_BASE registers.

Stream table initialization is done mainly by arm_smmu_init_strtab() and the arm_smmu_init_strtab_2lvl()/arm_smmu_init_strtab_linear() functions it calls, roughly as follows:

  1. If the SMMUv3 hardware supports a 2-level stream table, create one:
    • Several bits in the SMMU_STRTAB_BASE_CFG register configure the StreamID split point for a multi-level stream table, i.e. how many bits index the level-1 table and how many index the level-2 tables, and several more configure the StreamID width; the maximum StreamID width supported by the hardware is obtained from the SMMU_IDR1 register;
    • The SMMUv3 driver uses STRTAB_SPLIT (8) bits to index the level-2 tables and caps the level-1 table at 1 MB of memory; from these it computes the level-1 index width, the number of level-1 entries, and the required memory in bytes;
    • Allocate memory for the level-1 stream table;
    • Initialize the stream table base address, number of entries, and configuration value in the struct arm_smmu_strtab_cfg configuration structure; the configuration value is later written into the SMMU_STRTAB_BASE_CFG register;
    • Call arm_smmu_init_l1_strtab() to initialize the level-1 stream table. The SMMUv3 driver maintains two L1 descriptor tables, one accessed mainly by the driver and one by the hardware. The former is an array of struct arm_smmu_strtab_l1_desc; arm_smmu_init_l1_strtab() allocates this array, fills it with invalid L1 stream table descriptors, and writes their contents into the level-1 stream table.
  2. If the SMMUv3 hardware does not support a 2-level stream table, create a linear one:
    • Compute the memory required for the linear stream table, in bytes, from the StreamID width obtained earlier from the SMMU_IDR1 register;
    • Allocate memory for the linear stream table;
    • Initialize the stream table base address, number of entries, and configuration value in the struct arm_smmu_strtab_cfg configuration structure; the configuration value is later written into the SMMU_STRTAB_BASE_CFG register;
    • Call arm_smmu_init_bypass_stes() to initialize all STEs of the linear stream table to bypass the SMMU.
  3. Construct the value of the stream table base register from the base address of the stream table created in memory; this value is later written into the SMMU_STRTAB_BASE register.

arm_smmu_init_structures() follows the data structures, and the relationships between them, defined in the ARM System Memory Management Unit Architecture Specification version 3.

Resetting the SMMUv3 device

The SMMUv3 device is reset mainly by calling arm_smmu_device_reset(), which is defined (in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c) as follows:

static int arm_smmu_write_reg_sync(struct arm_smmu_device *smmu, u32 val,
				   unsigned int reg_off, unsigned int ack_off)
{
	u32 reg;

	writel_relaxed(val, smmu->base + reg_off);
	return readl_relaxed_poll_timeout(smmu->base + ack_off, reg, reg == val,
					  1, ARM_SMMU_POLL_TIMEOUT_US);
}

/* GBPA is "special" */
static int arm_smmu_update_gbpa(struct arm_smmu_device *smmu, u32 set, u32 clr)
{
	int ret;
	u32 reg, __iomem *gbpa = smmu->base + ARM_SMMU_GBPA;

	ret = readl_relaxed_poll_timeout(gbpa, reg, !(reg & GBPA_UPDATE),
					 1, ARM_SMMU_POLL_TIMEOUT_US);
	if (ret)
		return ret;

	reg &= ~clr;
	reg |= set;
	writel_relaxed(reg | GBPA_UPDATE, gbpa);
	ret = readl_relaxed_poll_timeout(gbpa, reg, !(reg & GBPA_UPDATE),
					 1, ARM_SMMU_POLL_TIMEOUT_US);

	if (ret)
		dev_err(smmu->dev, "GBPA not responding to update\n");
	return ret;
}

static void arm_smmu_free_msis(void *data)
{
	struct device *dev = data;
	platform_msi_domain_free_irqs(dev);
}

static void arm_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
{
	phys_addr_t doorbell;
	struct device *dev = msi_desc_to_dev(desc);
	struct arm_smmu_device *smmu = dev_get_drvdata(dev);
	phys_addr_t *cfg = arm_smmu_msi_cfg[desc->platform.msi_index];

	doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo;
	doorbell &= MSI_CFG0_ADDR_MASK;

#ifdef CONFIG_PM_SLEEP
	/* Saves the msg (base addr of msi irq) and restores it during resume */
	desc->msg.address_lo = msg->address_lo;
	desc->msg.address_hi = msg->address_hi;
	desc->msg.data = msg->data;
#endif

	writeq_relaxed(doorbell, smmu->base + cfg[0]);
	writel_relaxed(msg->data, smmu->base + cfg[1]);
	writel_relaxed(ARM_SMMU_MEMATTR_DEVICE_nGnRE, smmu->base + cfg[2]);
}

static void arm_smmu_setup_msis(struct arm_smmu_device *smmu)
{
	struct msi_desc *desc;
	int ret, nvec = ARM_SMMU_MAX_MSIS;
	struct device *dev = smmu->dev;

	/* Clear the MSI address regs */
	writeq_relaxed(0, smmu->base + ARM_SMMU_GERROR_IRQ_CFG0);
	writeq_relaxed(0, smmu->base + ARM_SMMU_EVTQ_IRQ_CFG0);

	if (smmu->features & ARM_SMMU_FEAT_PRI)
		writeq_relaxed(0, smmu->base + ARM_SMMU_PRIQ_IRQ_CFG0);
	else
		nvec--;

	if (!(smmu->features & ARM_SMMU_FEAT_MSI))
		return;

	if (!dev->msi_domain) {
		dev_info(smmu->dev, "msi_domain absent - falling back to wired irqs\n");
		return;
	}

	/* Allocate MSIs for evtq, gerror and priq. Ignore cmdq */
	ret = platform_msi_domain_alloc_irqs(dev, nvec, arm_smmu_write_msi_msg);
	if (ret) {
		dev_warn(dev, "failed to allocate MSIs - falling back to wired irqs\n");
		return;
	}

	for_each_msi_entry(desc, dev) {
		switch (desc->platform.msi_index) {
		case EVTQ_MSI_INDEX:
			smmu->evtq.q.irq = desc->irq;
			break;
		case GERROR_MSI_INDEX:
			smmu->gerr_irq = desc->irq;
			break;
		case PRIQ_MSI_INDEX:
			smmu->priq.q.irq = desc->irq;
			break;
		default:	/* Unknown */
			continue;
		}
	}

	/* Add callback to free MSIs on teardown */
	devm_add_action(dev, arm_smmu_free_msis, dev);
}

#ifdef CONFIG_PM_SLEEP
static void arm_smmu_resume_msis(struct arm_smmu_device *smmu)
{
	struct msi_desc *desc;
	struct device *dev = smmu->dev;

	for_each_msi_entry(desc, dev) {
		switch (desc->platform.msi_index) {
		case EVTQ_MSI_INDEX:
		case GERROR_MSI_INDEX:
		case PRIQ_MSI_INDEX: {
			phys_addr_t *cfg = arm_smmu_msi_cfg[desc->platform.msi_index];
			struct msi_msg *msg = &desc->msg;
			phys_addr_t doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo;

			doorbell &= MSI_CFG0_ADDR_MASK;
			writeq_relaxed(doorbell, smmu->base + cfg[0]);
			writel_relaxed(msg->data, smmu->base + cfg[1]);
			writel_relaxed(ARM_SMMU_MEMATTR_DEVICE_nGnRE,
					smmu->base + cfg[2]);
			break;
		}
		default:
			continue;

		}
	}
}
#else
static void arm_smmu_resume_msis(struct arm_smmu_device *smmu)
{
}
#endif

static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu, bool resume)
{
	int irq, ret;

	if (!resume)
		arm_smmu_setup_msis(smmu);
	else {
		/* The irq doesn't need to be re-requested during resume */
		arm_smmu_resume_msis(smmu);
		return;
	}

	/* Request interrupt lines */
	irq = smmu->evtq.q.irq;
	if (irq) {
		ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
						arm_smmu_evtq_thread,
						IRQF_ONESHOT,
						"arm-smmu-v3-evtq", smmu);
		if (ret < 0)
			dev_warn(smmu->dev, "failed to enable evtq irq\n");
	} else {
		dev_warn(smmu->dev, "no evtq irq - events will not be reported!\n");
	}

	irq = smmu->gerr_irq;
	if (irq) {
		ret = devm_request_irq(smmu->dev, irq, arm_smmu_gerror_handler,
				       0, "arm-smmu-v3-gerror", smmu);
		if (ret < 0)
			dev_warn(smmu->dev, "failed to enable gerror irq\n");
	} else {
		dev_warn(smmu->dev, "no gerr irq - errors will not be reported!\n");
	}

	if (smmu->features & ARM_SMMU_FEAT_PRI) {
		irq = smmu->priq.q.irq;
		if (irq) {
			ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
							arm_smmu_priq_thread,
							IRQF_ONESHOT,
							"arm-smmu-v3-priq",
							smmu);
			if (ret < 0)
				dev_warn(smmu->dev,
					 "failed to enable priq irq\n");
		} else {
			dev_warn(smmu->dev, "no priq irq - PRI will be broken\n");
		}
	}
}

static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu, bool resume)
{
	int ret, irq;
	u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN;

	/* Disable IRQs first */
	ret = arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_IRQ_CTRL,
				      ARM_SMMU_IRQ_CTRLACK);
	if (ret) {
		dev_err(smmu->dev, "failed to disable irqs\n");
		return ret;
	}

	irq = smmu->combined_irq;
	if (irq) {
		/*
		 * Cavium ThunderX2 implementation doesn't support unique irq
		 * lines. Use a single irq line for all the SMMUv3 interrupts.
		 */
		ret = devm_request_threaded_irq(smmu->dev, irq,
					arm_smmu_combined_irq_handler,
					arm_smmu_combined_irq_thread,
					IRQF_ONESHOT,
					"arm-smmu-v3-combined-irq", smmu);
		if (ret < 0)
			dev_warn(smmu->dev, "failed to enable combined irq\n");
	} else
		arm_smmu_setup_unique_irqs(smmu, resume);

	if (smmu->features & ARM_SMMU_FEAT_PRI)
		irqen_flags |= IRQ_CTRL_PRIQ_IRQEN;

	/* Enable interrupt generation on the SMMU */
	ret = arm_smmu_write_reg_sync(smmu, irqen_flags,
				      ARM_SMMU_IRQ_CTRL, ARM_SMMU_IRQ_CTRLACK);
	if (ret)
		dev_warn(smmu->dev, "failed to enable irqs\n");

	return 0;
}

static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
{
	int ret;

	ret = arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_CR0, ARM_SMMU_CR0ACK);
	if (ret)
		dev_err(smmu->dev, "failed to clear cr0\n");

	return ret;
}

static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool resume)
{
	int i;
	int ret;
	u32 reg, enables;
	struct arm_smmu_cmdq_ent cmd;

	/* Clear CR0 and sync (disables SMMU and queue processing) */
	reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
	if (reg & CR0_SMMUEN) {
		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
		WARN_ON(is_kdump_kernel() && !disable_bypass);
		arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
	}

	ret = arm_smmu_device_disable(smmu);
	if (ret)
		return ret;

	/* CR1 (table and queue memory attributes) */
	reg = FIELD_PREP(CR1_TABLE_SH, ARM_SMMU_SH_ISH) |
	      FIELD_PREP(CR1_TABLE_OC, CR1_CACHE_WB) |
	      FIELD_PREP(CR1_TABLE_IC, CR1_CACHE_WB) |
	      FIELD_PREP(CR1_QUEUE_SH, ARM_SMMU_SH_ISH) |
	      FIELD_PREP(CR1_QUEUE_OC, CR1_CACHE_WB) |
	      FIELD_PREP(CR1_QUEUE_IC, CR1_CACHE_WB);
	writel_relaxed(reg, smmu->base + ARM_SMMU_CR1);

	/* CR2 (random crap) */
	reg = CR2_RECINVSID;

	if (smmu->features & ARM_SMMU_FEAT_E2H)
		reg |= CR2_E2H;

	if (!(smmu->features & ARM_SMMU_FEAT_BTM))
		reg |= CR2_PTM;

	writel_relaxed(reg, smmu->base + ARM_SMMU_CR2);

	/* Stream table */
	writeq_relaxed(smmu->strtab_cfg.strtab_base,
		       smmu->base + ARM_SMMU_STRTAB_BASE);
	writel_relaxed(smmu->strtab_cfg.strtab_base_cfg,
		       smmu->base + ARM_SMMU_STRTAB_BASE_CFG);

	/* Command queue */
	writeq_relaxed(smmu->cmdq.q.q_base, smmu->base + ARM_SMMU_CMDQ_BASE);
	writel_relaxed(smmu->cmdq.q.llq.prod, smmu->base + ARM_SMMU_CMDQ_PROD);
	writel_relaxed(smmu->cmdq.q.llq.cons, smmu->base + ARM_SMMU_CMDQ_CONS);

	for (i = 0; i < smmu->nr_ecmdq; i++) {
		struct arm_smmu_ecmdq *ecmdq;
		struct arm_smmu_queue *q;

		ecmdq = *per_cpu_ptr(smmu->ecmdq, i);
		q = &ecmdq->cmdq.q;

		if (WARN_ON(q->llq.prod != q->llq.cons)) {
			q->llq.prod = 0;
			q->llq.cons = 0;
		}
		writeq_relaxed(q->q_base, ecmdq->base + ARM_SMMU_ECMDQ_BASE);
		writel_relaxed(q->llq.prod, ecmdq->base + ARM_SMMU_ECMDQ_PROD);
		writel_relaxed(q->llq.cons, ecmdq->base + ARM_SMMU_ECMDQ_CONS);

		/* enable ecmdq */
		writel(ECMDQ_PROD_EN | q->llq.prod, q->prod_reg);
		ret = readl_relaxed_poll_timeout(q->cons_reg, reg, reg & ECMDQ_CONS_ENACK,
					  1, ARM_SMMU_POLL_TIMEOUT_US);
		if (ret) {
			dev_err(smmu->dev, "ecmdq[%d] enable failed\n", i);
			smmu->ecmdq_enabled = 0;
			break;
		}
	}

	enables = CR0_CMDQEN;
	ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
				      ARM_SMMU_CR0ACK);
	if (ret) {
		dev_err(smmu->dev, "failed to enable command queue\n");
		return ret;
	}

	/* Invalidate any cached configuration */
	cmd.opcode = CMDQ_OP_CFGI_ALL;
	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);

	/* Invalidate any stale TLB entries */
	if (smmu->features & ARM_SMMU_FEAT_HYP) {
		cmd.opcode = CMDQ_OP_TLBI_EL2_ALL;
		arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
	}

	cmd.opcode = CMDQ_OP_TLBI_NSNH_ALL;
	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);

	/* Event queue */
	writeq_relaxed(smmu->evtq.q.q_base, smmu->base + ARM_SMMU_EVTQ_BASE);
	writel_relaxed(smmu->evtq.q.llq.prod, smmu->page1 + ARM_SMMU_EVTQ_PROD);
	writel_relaxed(smmu->evtq.q.llq.cons, smmu->page1 + ARM_SMMU_EVTQ_CONS);

	enables |= CR0_EVTQEN;
	ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
				      ARM_SMMU_CR0ACK);
	if (ret) {
		dev_err(smmu->dev, "failed to enable event queue\n");
		return ret;
	}

	/* PRI queue */
	if (smmu->features & ARM_SMMU_FEAT_PRI) {
		writeq_relaxed(smmu->priq.q.q_base,
			       smmu->base + ARM_SMMU_PRIQ_BASE);
		writel_relaxed(smmu->priq.q.llq.prod,
			       smmu->page1 + ARM_SMMU_PRIQ_PROD);
		writel_relaxed(smmu->priq.q.llq.cons,
			       smmu->page1 + ARM_SMMU_PRIQ_CONS);

		enables |= CR0_PRIQEN;
		ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
					      ARM_SMMU_CR0ACK);
		if (ret) {
			dev_err(smmu->dev, "failed to enable PRI queue\n");
			return ret;
		}
	}

	if (smmu->features & ARM_SMMU_FEAT_ATS) {
		enables |= CR0_ATSCHK;
		ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
					      ARM_SMMU_CR0ACK);
		if (ret) {
			dev_err(smmu->dev, "failed to enable ATS check\n");
			return ret;
		}
	}

	ret = arm_smmu_setup_irqs(smmu, resume);
	if (ret) {
		dev_err(smmu->dev, "failed to setup irqs\n");
		return ret;
	}

	if (is_kdump_kernel())
		enables &= ~(CR0_EVTQEN | CR0_PRIQEN);

	/* Enable the SMMU interface, or ensure bypass */
	if (!smmu->bypass || disable_bypass) {
		enables |= CR0_SMMUEN;
	} else {
		ret = arm_smmu_update_gbpa(smmu, 0, GBPA_ABORT);
		if (ret)
			return ret;
	}
	ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
				      ARM_SMMU_CR0ACK);
	if (ret) {
		dev_err(smmu->dev, "failed to enable SMMU interface\n");
		return ret;
	}

	return 0;
}

Resetting the SMMUv3 device completes the enabling of the SMMUv3 hardware. The process, in broad strokes, is as follows:

  1. Check the SMMU_CR0 register. If the SMMU is already enabled, update the SMMU_GBPA register to abort all incoming transactions.
  2. Disable all of the SMMU device's functions, including the command queue, event queue, and PRI queue. Here we see a register-write pattern characteristic of the SMMU: the SMMU_CR0 register has a companion acknowledgment register, SMMU_CR0ACK. When a value written to SMMU_CR0 takes effect, the corresponding bits of SMMU_CR0ACK are updated. After writing SMMU_CR0, the driver therefore polls the corresponding bits of SMMU_CR0ACK to confirm that the write has taken effect. Several other SMMU registers are written in the same way.
  3. Write the SMMU_CR1 register to configure the memory attributes of the tables and queues, i.e. the cacheability and shareability of the stream table, command queue, event queue, and PRI queue.
  4. Write the SMMU_CR2 register to configure RECINVSID, E2H, and PTM.
  5. Write the stream table base address and configuration, created earlier during data-structure initialization, into the corresponding registers.
  6. Write the command queue base address, producer pointer, consumer pointer, and the extended command queue (ECMDQ) configuration, created earlier during data-structure initialization, into the corresponding registers, then write the SMMU_CR0 register to enable the command queue.
  7. Issue a few commands to the command queue to invalidate any cached configuration, stale TLB entries, and so on.
  8. Write the event queue base address, producer pointer, and consumer pointer, created earlier during data-structure initialization, into the corresponding registers, then write the SMMU_CR0 register to enable the event queue.
  9. If the SMMUv3 hardware supports PRI, write the PRI queue base address, producer pointer, and consumer pointer into the corresponding registers, then write the SMMU_CR0 register to enable the PRI queue.
  10. If the SMMUv3 hardware supports ATS, write the SMMU_CR0 register to enable the ATS check.
  11. Set up interrupts.
  12. If the SMMU is not configured for bypass, or bypass is disabled, write the SMMU_CR0 register to turn the SMMU on.

Interrupt setup proceeds as follows:

  1. Write the SMMU_IRQ_CTRL register to disable interrupts.
  2. If a combined interrupt is configured, request the interrupt from the system and register the interrupt handler.
  3. If no combined interrupt is used:
    • configure MSIs;
    • request an interrupt line for the event queue and register its handler;
    • request an interrupt line for global errors and register its handler;
    • if the SMMUv3 hardware supports PRI, request an interrupt line for the PRI queue and register its handler.
  4. Write the SMMU_IRQ_CTRL register to enable interrupts.

The arm_smmu_device_reset() function resets the SMMUv3 device: it programs, in one place, the registers for the stream table, the queues, and interrupts, and finally enables the SMMU hardware.

Registering the SMMUv3 device with the IOMMU subsystem

The arm_smmu_device_probe() function calls iommu_device_register() to register the SMMUv3 device with the IOMMU subsystem. The iommu_device_register() function is defined (in drivers/iommu/iommu.c) as follows:

static LIST_HEAD(iommu_device_list);
static DEFINE_SPINLOCK(iommu_device_lock);
 . . . . . .
int iommu_device_register(struct iommu_device *iommu)
{
	spin_lock(&iommu_device_lock);
	list_add_tail(&iommu->list, &iommu_device_list);
	spin_unlock(&iommu_device_lock);
	return 0;
}
EXPORT_SYMBOL_GPL(iommu_device_register);

The IOMMU subsystem maintains the IOMMU devices in the system on a linked list. Registering the SMMUv3 device with the IOMMU subsystem simply means adding the struct iommu_device object that represents it to this list.

Setting the IOMMU callbacks for each bus type

The arm_smmu_device_probe() function calls arm_smmu_set_bus_ops() to set the IOMMU callbacks for each supported bus type. The arm_smmu_set_bus_ops() function is defined (in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c) as follows:

static int arm_smmu_set_bus_ops(struct iommu_ops *ops)
{
	int err;

#ifdef CONFIG_PCI
	if (pci_bus_type.iommu_ops != ops) {
		err = bus_set_iommu(&pci_bus_type, ops);
		if (err)
			return err;
	}
#endif
#ifdef CONFIG_ARM_AMBA
	if (amba_bustype.iommu_ops != ops) {
		err = bus_set_iommu(&amba_bustype, ops);
		if (err)
			goto err_reset_pci_ops;
	}
#endif
	if (platform_bus_type.iommu_ops != ops) {
		err = bus_set_iommu(&platform_bus_type, ops);
		if (err)
			goto err_reset_amba_ops;
	}

	return 0;

err_reset_amba_ops:
#ifdef CONFIG_ARM_AMBA
	bus_set_iommu(&amba_bustype, NULL);
#endif
err_reset_pci_ops: __maybe_unused;
#ifdef CONFIG_PCI
	bus_set_iommu(&pci_bus_type, NULL);
#endif
	return err;
}

The arm_smmu_set_bus_ops() function calls bus_set_iommu() to set the IOMMU callbacks for platform_bus_type, pci_bus_type, amba_bustype, and so on. The bus_set_iommu() function is defined (in drivers/iommu/iommu.c) as follows:

static int probe_get_default_domain_type(struct device *dev, void *data)
{
	const struct iommu_ops *ops = dev->bus->iommu_ops;
	struct __group_domain_type *gtype = data;
	unsigned int type = 0;

	if (ops->def_domain_type)
		type = ops->def_domain_type(dev);

	if (type) {
		if (gtype->type && gtype->type != type) {
			dev_warn(dev, "Device needs domain type %s, but device %s in the same iommu group requires type %s - using default\n",
				 iommu_domain_type_str(type),
				 dev_name(gtype->dev),
				 iommu_domain_type_str(gtype->type));
			gtype->type = 0;
		}

		if (!gtype->dev) {
			gtype->dev  = dev;
			gtype->type = type;
		}
	}

	return 0;
}

static void probe_alloc_default_domain(struct bus_type *bus,
				       struct iommu_group *group)
{
	struct __group_domain_type gtype;

	memset(&gtype, 0, sizeof(gtype));

	/* Ask for default domain requirements of all devices in the group */
	__iommu_group_for_each_dev(group, &gtype,
				   probe_get_default_domain_type);

	if (!gtype.type)
		gtype.type = iommu_def_domain_type;

	iommu_group_alloc_default_domain(bus, group, gtype.type);

}

static int iommu_group_do_dma_attach(struct device *dev, void *data)
{
	struct iommu_domain *domain = data;
	int ret = 0;

	if (!iommu_is_attach_deferred(domain, dev))
		ret = __iommu_attach_device(domain, dev);

	return ret;
}

static int __iommu_group_dma_attach(struct iommu_group *group)
{
	return __iommu_group_for_each_dev(group, group->default_domain,
					  iommu_group_do_dma_attach);
}

static int iommu_group_do_probe_finalize(struct device *dev, void *data)
{
	struct iommu_domain *domain = data;

	if (domain->ops->probe_finalize)
		domain->ops->probe_finalize(dev);

	return 0;
}

static void __iommu_group_dma_finalize(struct iommu_group *group)
{
	__iommu_group_for_each_dev(group, group->default_domain,
				   iommu_group_do_probe_finalize);
}

static int iommu_group_create_direct_mappings(struct iommu_group *group)
{
	return __iommu_group_for_each_dev(group, group,
					  iommu_do_create_direct_mappings);
}

static int probe_iommu_group(struct device *dev, void *data)
{
	struct list_head *group_list = data;
	struct iommu_group *group;
	int ret;

	/* Device is probed already if in a group */
	group = iommu_group_get(dev);
	if (group) {
		iommu_group_put(group);
		return 0;
	}

	ret = __iommu_probe_device(dev, group_list);
	if (ret == -ENODEV)
		ret = 0;

	return ret;
}

static int remove_iommu_group(struct device *dev, void *data)
{
	iommu_release_device(dev);

	return 0;
}

static int iommu_bus_notifier(struct notifier_block *nb,
			      unsigned long action, void *data)
{
	unsigned long group_action = 0;
	struct device *dev = data;
	struct iommu_group *group;

	/*
	 * ADD/DEL call into iommu driver ops if provided, which may
	 * result in ADD/DEL notifiers to group->notifier
	 */
	if (action == BUS_NOTIFY_ADD_DEVICE) {
		int ret;

		ret = iommu_probe_device(dev);
		return (ret) ? NOTIFY_DONE : NOTIFY_OK;
	} else if (action == BUS_NOTIFY_REMOVED_DEVICE) {
		iommu_release_device(dev);
		return NOTIFY_OK;
	}

	/*
	 * Remaining BUS_NOTIFYs get filtered and republished to the
	 * group, if anyone is listening
	 */
	group = iommu_group_get(dev);
	if (!group)
		return 0;

	switch (action) {
	case BUS_NOTIFY_BIND_DRIVER:
		group_action = IOMMU_GROUP_NOTIFY_BIND_DRIVER;
		break;
	case BUS_NOTIFY_BOUND_DRIVER:
		group_action = IOMMU_GROUP_NOTIFY_BOUND_DRIVER;
		break;
	case BUS_NOTIFY_UNBIND_DRIVER:
		group_action = IOMMU_GROUP_NOTIFY_UNBIND_DRIVER;
		break;
	case BUS_NOTIFY_UNBOUND_DRIVER:
		group_action = IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER;
		break;
	}

	if (group_action)
		blocking_notifier_call_chain(&group->notifier,
					     group_action, dev);

	iommu_group_put(group);
	return 0;
}
 . . . . . .
int bus_iommu_probe(struct bus_type *bus)
{
	struct iommu_group *group, *next;
	LIST_HEAD(group_list);
	int ret;

	/*
	 * This code-path does not allocate the default domain when
	 * creating the iommu group, so do it after the groups are
	 * created.
	 */
	ret = bus_for_each_dev(bus, NULL, &group_list, probe_iommu_group);
	if (ret)
		return ret;

	list_for_each_entry_safe(group, next, &group_list, entry) {
		/* Remove item from the list */
		list_del_init(&group->entry);

		mutex_lock(&group->mutex);

		/* Try to allocate default domain */
		probe_alloc_default_domain(bus, group);

		if (!group->default_domain) {
			mutex_unlock(&group->mutex);
			continue;
		}

		iommu_group_create_direct_mappings(group);

		ret = __iommu_group_dma_attach(group);

		mutex_unlock(&group->mutex);

		if (ret)
			break;

		__iommu_group_dma_finalize(group);
	}

	return ret;
}

static int iommu_bus_init(struct bus_type *bus, const struct iommu_ops *ops)
{
	struct notifier_block *nb;
	int err;

	nb = kzalloc(sizeof(struct notifier_block), GFP_KERNEL);
	if (!nb)
		return -ENOMEM;

	nb->notifier_call = iommu_bus_notifier;

	err = bus_register_notifier(bus, nb);
	if (err)
		goto out_free;

	err = bus_iommu_probe(bus);
	if (err)
		goto out_err;


	return 0;

out_err:
	/* Clean up */
	bus_for_each_dev(bus, NULL, NULL, remove_iommu_group);
	bus_unregister_notifier(bus, nb);

out_free:
	kfree(nb);

	return err;
}

/**
 * bus_set_iommu - set iommu-callbacks for the bus
 * @bus: bus.
 * @ops: the callbacks provided by the iommu-driver
 *
 * This function is called by an iommu driver to set the iommu methods
 * used for a particular bus. Drivers for devices on that bus can use
 * the iommu-api after these ops are registered.
 * This special function is needed because IOMMUs are usually devices on
 * the bus itself, so the iommu drivers are not initialized when the bus
 * is set up. With this function the iommu-driver can set the iommu-ops
 * afterwards.
 */
int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops)
{
	int err;

	if (ops == NULL) {
		bus->iommu_ops = NULL;
		return 0;
	}

	if (bus->iommu_ops != NULL)
		return -EBUSY;

	bus->iommu_ops = ops;

	/* Do IOMMU specific setup for this bus-type */
	err = iommu_bus_init(bus, ops);
	if (err)
		bus->iommu_ops = NULL;

	return err;
}
EXPORT_SYMBOL_GPL(bus_set_iommu);

The bus_set_iommu() function executes as follows:

  1. Set the IOMMU callbacks for the bus.
  2. Perform the bus type's IOMMU-specific setup:
    • register a notifier with the bus, so the IOMMU subsystem is notified when new devices are added;
    • probe the devices already added to the bus.

Probing the devices already added to the bus works as follows:

  1. Iterate over all devices already added to the bus and probe each one. If a device is connected to the registered IOMMU device, obtain its iommu_group; these iommu_groups are collected on a list.
  2. Iterate over the iommu_groups obtained, and for each iommu_group:
    • remove the iommu_group from the list;
    • try to allocate a default domain: first determine the default domain type, then allocate the domain object. The type is obtained via the def_domain_type callback registered by the SMMU device; if that yields nothing, the IOMMU subsystem's default domain type is used;
    • create direct mappings for each device;
    • attach each device to the domain;
    • run the probe-finalize callback for each device.

Setting the IOMMU callbacks for each bus type handles the case in which the system I/O devices connected to an IOMMU are probed before the IOMMU device itself.

References

IOMMU的現(xiàn)狀和發(fā)展

IOMMU和Arm SMMU介紹

SMMU內(nèi)核驅(qū)動(dòng)分析

IOMMU/SMMUV3代碼分析(5)IO設(shè)備與SMMU的關(guān)聯(lián)2
