Title
SAM-Med3D
Introduction
Medical image analysis has become an indispensable cornerstone of modern healthcare, supporting diagnosis, treatment planning, and further medical research. One of the most important challenges in this field is the accurate segmentation of volumetric medical images. Although numerous methods have demonstrated commendable effectiveness on a range of targets, existing segmentation techniques tend to specialize in specific organs or lesions. This tendency stems from the inherent characteristics of volumetric medical images, such as the complexity of 3D anatomical structures and the limited availability of volumetric medical annotations. As a result, this specialization hinders the generalization ability of these methods and poses practical challenges for broader clinical application.
Recently, the Segment Anything Model (SAM), a vision foundation model (VFM) trained with more than 1 billion masks, has shown impressive zero-shot segmentation performance across numerous domains. The emergence of SAM introduces new possibilities for accelerating data annotation and improving the methodological generalization of volumetric medical image analysis. However, studies have pointed out that SAM's native applicability to the medical domain is limited due to a significant lack of medical imaging knowledge. A straightforward solution is to inject medical knowledge into SAM by fine-tuning it on medical images. MedSAM achieves this by fine-tuning the decoder with 1.1 million masks, enabling SAM to be applied to medical imaging. SAM-Med2D demonstrates remarkable capability in general medical image segmentation through comprehensive adaptation with adapters and roughly 20 million masks. However, these methods must process volumetric images slice by slice: decomposing the 3D data into 2D slices, processing each slice independently, and then aggregating the 2D results into a 3D prediction. As previous evaluations have shown, this slice-by-slice approach performs poorly on 3D medical images because it ignores the 3D spatial information shared across slices.
Abstract
Although the Segment Anything Model (SAM) has demonstrated impressive performance in 2D natural image segmentation, its application to 3D volumetric medical images reveals significant shortcomings, namely suboptimal performance and unstable prediction, necessitating an excessive number of prompt points to attain the desired outcomes. These issues can hardly be addressed by fine-tuning SAM on medical data because the original 2D structure of SAM neglects 3D spatial information. In this paper, we introduce SAM-Med3D, the most comprehensive study to modify SAM for 3D medical images. Our approach is characterized by its comprehensiveness in two primary aspects: firstly, by comprehensively reformulating SAM to a thorough 3D architecture trained on a comprehensively processed large-scale volumetric medical dataset; and secondly, by providing a comprehensive evaluation of its performance. Specifically, we train SAM-Med3D with over 131K 3D masks and 247 categories. Our SAM-Med3D excels at capturing 3D spatial information, exhibiting competitive performance with significantly fewer prompt points than the top-performing fine-tuned SAM in the medical domain. We then evaluate its capabilities across 15 datasets and analyze it from multiple perspectives, including anatomical structures, modalities, targets, and generalization abilities. Our approach, compared with SAM, showcases pronouncedly enhanced efficiency and broad segmentation capabilities for 3D volumetric medical images. Our code is released at https://github.com/uni-medical/SAM-Med3D.
Conclusions
In this study, we present SAM-Med3D, a holistic 3D SAM model for volumetric medical image segmentation, trained from scratch on a large-scale 3D medical image dataset. Our SAM-Med3D employs 3D positional encodings in different components to directly integrate 3D spatial information, and exhibits excellent performance on volumetric medical image segmentation. SAM-Med3D achieves a 32.90% improvement over SAM when provided with 1 point per volume, indicating its excellent usability to generate better outcomes in volumetric medical segmentation tasks with significantly fewer prompt points. Furthermore, we conduct an extensive evaluation from diverse perspectives to explore the capacities of SAM-Med3D. For various anatomical structures like bone, heart, and muscle, our SAM-Med3D outperforms other methods by a clear margin when limited prompts are provided. Our SAM-Med3D consistently excels across different modalities and various organs and lesions. Additionally, we test the transferability of SAM-Med3D. Validated on two frequently used benchmarks, SAM-Med3D has the potential to work as a powerful pre-trained model for 3D medical image transformers. Setting aside the numerical result gap between 2D SAM methods and our SAM-Med3D, a well-trained 3D SAM model should inherently exhibit superior inter-slice consistency and usability, as observed in the visual results. While 3D models enhance usability, prompts within volumetric images tend to be sparser compared to the densely annotated 2D slices used in slice-by-slice inference. This sparsity places significant demands on the 3D model's ability to capture spatial information and effectively utilize sparse prompts, thereby increasing the training complexity. In our approach, we address this issue by employing a fully learnable 3D structure to better model the spatial information in 3D space. Despite this, there remains a plethora of avenues for future exploration, such as the development of novel 3D prompt forms and training strategies that are more suited to 3D contexts.
Method
4.1 Revisit SAM
The Segment Anything Model (SAM) presents a robust architectural design for promptable image segmentation tasks, primarily tailored for 2D natural images. SAM's architecture can be divided into three core components:
Image encoder. SAM leverages an MAE pre-trained Vision Transformer (ViT) to extract representations. This component utilizes 2D patch embeddings combined with learnable positional encodings to turn the input image into image embeddings.
Prompt encoder. This module can handle both sparse (points, boxes) and dense (masks) prompts. Sparse prompts are represented using frozen 2D absolute positional encodings and then combined with learned embeddings specific to each prompt type. Dense prompts are encoded with a 2D convolution neck to generate dense prompt embeddings.
Mask decoder. A lightweight structure is adopted to efficiently map the image embedding together with a set of prompt embeddings to an output mask. Four steps are contained in each transformer layer: (1) self-attention on tokens; (2) cross-attention between tokens and the image embedding; (3) token updates using a point-wise MLP; (4) cross-attention that updates the image embedding with prompt details. After processing through the transformer layers, the feature map undergoes up-sampling and is subsequently converted into segmentation masks using an MLP. Notably, all the transformer layers capture only 2D geometric information during the forward pass.
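To make the four decoder steps above concrete, here is a minimal, runnable PyTorch sketch of one such two-way transformer layer. The dimensions, normalization placement, and class name are illustrative assumptions rather than SAM's exact implementation.

```python
import torch
import torch.nn as nn

class TwoWayBlock(nn.Module):
    """Sketch of one decoder layer following the four steps described above.
    Dimensions and layer choices are illustrative, not SAM's exact ones."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_t2i = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.cross_i2t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(4)])

    def forward(self, tokens, image_emb):
        # tokens:    (B, T, C) prompt + output tokens
        # image_emb: (B, HW, C) flattened image embedding
        # (1) self-attention on tokens
        tokens = self.norms[0](tokens + self.self_attn(tokens, tokens, tokens)[0])
        # (2) cross-attention: tokens attend to the image embedding
        tokens = self.norms[1](tokens + self.cross_t2i(tokens, image_emb, image_emb)[0])
        # (3) token update with a point-wise MLP
        tokens = self.norms[2](tokens + self.mlp(tokens))
        # (4) cross-attention: image embedding attends back to the tokens
        image_emb = self.norms[3](image_emb + self.cross_i2t(image_emb, tokens, tokens)[0])
        return tokens, image_emb

# Quick shape check
blk = TwoWayBlock()
t, im = torch.randn(1, 6, 256), torch.randn(1, 64 * 64, 256)
t, im = blk(t, im)
print(t.shape, im.shape)  # torch.Size([1, 6, 256]) torch.Size([1, 4096, 256])
```

In SAM-Med3D, the same token/image interaction would operate on embeddings produced by a volumetric encoder, so the flattened sequence spans D×H×W patches rather than H×W.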
Figures
Figure 1: Illustration of SAM [21], fine-tuned SAM (SAM-Med2D [6]), and our SAM-Med3D on 3D Volumetric Medical Images. Both SAM and SAM-Med2D take N prompt points (one for each slice), whereas SAM-Med3D uses a single prompt point for the entire 3D volume. Here, N corresponds to the number of slices containing the target object. The top-left corner provides a schematic of the Axial, Coronal, and Sagittal views. For a given 3D input, we visualize the 3D, coronal, and multiple axial views. The numbers in brackets indicate the index of each axial slice.
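As a concrete reading of the caption's N-versus-1 prompt budget, the sketch below counts how many clicks a slice-by-slice 2D model needs for a 3D mask compared with a single 3D click. The click-sampling rule (first foreground voxel) is a naive stand-in for however prompts are actually simulated.

```python
import numpy as np

def prompts_needed(mask_3d):
    """Contrast prompting cost for a binary 3D mask (D, H, W), as in Figure 1.

    Slice-by-slice 2D models need roughly one click per slice that contains
    the target (N clicks); a 3D model can take a single click for the volume.
    """
    target_slices = [z for z in range(mask_3d.shape[0]) if mask_3d[z].any()]
    n_2d_clicks = len(target_slices)  # N in the figure caption

    # One 2D click per slice containing the object.
    clicks_2d = [tuple(np.argwhere(mask_3d[z])[0]) for z in target_slices]

    # A single 3D click anywhere inside the object for the whole volume.
    click_3d = tuple(np.argwhere(mask_3d)[0])
    return n_2d_clicks, clicks_2d, click_3d

# Toy example: a small cuboid "organ" spanning 40 slices.
mask = np.zeros((100, 64, 64), dtype=bool)
mask[30:70, 20:40, 20:40] = True
n, _, point = prompts_needed(mask)
print(n, point)  # 40 slice-wise clicks vs. a single 3D point at (30, 20, 20)
```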
Figure 2: (a) The word cloud maps for all training data category statistics. There are 247 categories in our training data. (b) Comparison of counts of images and masks in the 3D medical image datasets we collected for training. Our dataset consists of 21K 3D images with corresponding 131K 3D masks, while AMOS and TotalSegmentator have fewer than 2K images, and BraTS21 has fewer than 10K masks.
Figure 3: The modified 3D architecture of our SAM-Med3D. The original 2D components are transformed into their 3D counterparts, encompassing a 3D image encoder, 3D prompt encoder, and 3D mask decoder. 3D convolution, 3D positional encoding (PE), and 3D layer norm are employed to construct the 3D model.
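A rough sketch of the 2D-to-3D conversion the caption describes is given below: a 3D convolutional patch embedding plus a learnable 3D positional encoding in place of the 2D versions. Patch size, embedding width, and input resolution are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """3D patch embedding: replaces SAM's 2D Conv patchify with a 3D convolution."""

    def __init__(self, patch=16, in_ch=1, dim=384, grid=(8, 8, 8)):
        super().__init__()
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=patch, stride=patch)
        # Learnable 3D positional encoding, one vector per 3D patch position.
        self.pos_embed = nn.Parameter(torch.zeros(1, dim, *grid))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (B, 1, D, H, W) volumetric image
        x = self.proj(x) + self.pos_embed      # (B, C, D/16, H/16, W/16)
        x = x.flatten(2).transpose(1, 2)       # (B, N_patches, C) token sequence
        return self.norm(x)

embed = PatchEmbed3D()
tokens = embed(torch.randn(1, 1, 128, 128, 128))
print(tokens.shape)  # torch.Size([1, 512, 384])
```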
Figure 4: (a-c) Performance comparison across different modalities with varying numbers of points. Notably, while SAM-Med2D was trained on the US (ultrasound) modality and SAM-Med3D was not, SAM-Med3D still exhibits competitive performance. (d) Comparison of the Dice coefficient between SAM-Med3D and the top-performing 2D fine-tuned SAM model, SAM-Med2D, across 34 major organs and 5 kinds of lesions. Separate markers indicate seen and unseen lesions.
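Figure 4(d) is reported in terms of the Dice coefficient; for reference, a standard way to compute it for binary 3D masks is sketched below. The paper's exact evaluation code is not shown here, so treat this as the textbook definition.

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    """Dice coefficient between two binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

# Toy example: two 20x20x20 cubes offset by 5 voxels along one axis.
a = np.zeros((64, 64, 64), dtype=bool); a[10:30, 10:30, 10:30] = True
b = np.zeros_like(a); b[15:35, 10:30, 10:30] = True
print(round(dice(a, b), 3))  # 0.75
```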
Figure 5: Visualization of SAM, SAM-Med2D, and our proposed SAM-Med3D across diverse anatomical structures for varying numbers of points. We present both axial slices and coronal/sagittal views to comprehensively illustrate the 3D results. Abd&Tho denotes Abdominal and Thorax.
Figure 6: Visualization of SAM, SAM-Med2D, and our proposed SAM-Med3D across various modalities for varying numbers of points. We present both axial slices and coronal/sagittal views to comprehensively illustrate the 3D results.
Tables
Table 1: Comparison of SAM models for 3D volumetric medical images. Our SAM-Med3D employs a fully learnable 3D architecture with large-scale training data, instead of frozen 2D layers with adapters. Symbols in the table denote frozen and learnable components, respectively.
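To spell out the frozen-versus-learnable contrast the caption draws, the snippet below shows the two regimes in PyTorch terms: freezing a pre-trained backbone and training only an adapter, versus keeping every parameter trainable. The modules are tiny placeholders, not SAM-Med3D's actual components.

```python
import torch.nn as nn

def build_model():
    # Placeholder stand-ins for an image-encoder backbone and a small adapter.
    backbone = nn.Sequential(nn.Linear(256, 256), nn.GELU(), nn.Linear(256, 256))
    adapter = nn.Linear(256, 256)
    return nn.ModuleDict({"backbone": backbone, "adapter": adapter})

def trainable_count(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Regime A: frozen pre-trained layers + learnable adapter (adapter-style tuning).
m_a = build_model()
for p in m_a["backbone"].parameters():
    p.requires_grad = False
print("adapter-style:", trainable_count(m_a))    # only the adapter's parameters

# Regime B: fully learnable architecture (the regime Table 1 attributes to SAM-Med3D).
m_b = build_model()
print("fully learnable:", trainable_count(m_b))  # all parameters train
```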
Table 2: Preliminary experiment investigating the reusability of 2D SAM pre-trained weights in SAM-Med3D. Experimental details are consistent with Section 5.1.
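One common way to reuse 2D pre-trained weights in a 3D network is to inflate 2D kernels along the depth axis (as popularized by I3D). Whether Table 2's preliminary experiment uses this exact scheme is not stated here, so the snippet below is only an illustrative sketch of the general idea.

```python
import torch
import torch.nn as nn

def inflate_conv2d_to_3d(conv2d: nn.Conv2d, depth: int) -> nn.Conv3d:
    """Inflate a 2D convolution into a 3D one by replicating its kernels along
    depth and rescaling so activations keep a similar magnitude."""
    conv3d = nn.Conv3d(conv2d.in_channels, conv2d.out_channels,
                       kernel_size=(depth, *conv2d.kernel_size),
                       stride=(1, *conv2d.stride),
                       padding=(depth // 2, *conv2d.padding),
                       bias=conv2d.bias is not None)
    with torch.no_grad():
        w = conv2d.weight.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth
        conv3d.weight.copy_(w)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d

c2d = nn.Conv2d(1, 8, kernel_size=3, padding=1)
c3d = inflate_conv2d_to_3d(c2d, depth=3)
print(c3d.weight.shape)  # torch.Size([8, 1, 3, 3, 3])
```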
Table 3: Quantitative comparison of different methods on our evaluation dataset, detailed in Section 3. Here, N denotes the count of slices containing the target object (10 ≤ N ≤ 200). T_inf (inference time) is calculated with N = 100, excluding the time for image processing and simulated prompt generation.
Table 4: Comparison from the perspective of anatomical structure and lesion. A&T represents Abdominal and Thorax targets. N denotes the count of slices containing the target object (10 ≤ N ≤ 200).
Table 5: Transferability evaluation for fully-supervised 3D medical image segmentation. We trained the state-of-the-art ViT-based segmentation model (i.e., UNETR), both with and without our SAM-Med3D pre-trained ViT encoder, to assess the benefits of pre-training.
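The transfer experiment in Table 5 amounts to initializing UNETR's ViT encoder from SAM-Med3D weights before fully supervised training. Below is a hedged sketch of that initialization step: MONAI's UNETR is used as a stand-in implementation, and the checkpoint filename, the assumption of a flat state dict, and the "image_encoder." key prefix are hypothetical and would need to match the real checkpoint.

```python
import torch
from monai.networks.nets import UNETR  # assumes MONAI is installed; the paper may use its own UNETR

model = UNETR(in_channels=1, out_channels=2, img_size=(96, 96, 96))

# Hypothetical checkpoint path; real SAM-Med3D checkpoints may wrap the weights
# differently and name their image-encoder parameters with another prefix.
ckpt = torch.load("sam_med3d.pth", map_location="cpu")
encoder_weights = {k.removeprefix("image_encoder."): v
                   for k, v in ckpt.items() if k.startswith("image_encoder.")}

# strict=False tolerates key mismatches between SAM-Med3D's ViT and UNETR's ViT.
missing, unexpected = model.vit.load_state_dict(encoder_weights, strict=False)
print(f"loaded {len(encoder_weights) - len(unexpected)} tensors, "
      f"{len(missing)} missing, {len(unexpected)} unexpected")
```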