Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He†, Hengshuang Zhao†
Abstract
3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance. However, its efficacy heavily relies on the quality of the initial point cloud, leading to blurring and needle-like artifacts in regions with inadequate initializing points. This issue is mainly due to the point cloud growth condition, which considers only the average gradient magnitude of points across observable views, and thereby fails to grow large Gaussians that are observable from many viewpoints but only cover boundary regions in most of them. To address this, we introduce Pixel-GS, a novel approach that takes into account the number of pixels covered by a Gaussian in each view when computing the growth condition. We regard the covered pixel counts as weights to dynamically average the gradients from different views, such that the growth of large Gaussians can be prompted. As a result, points within areas with insufficient initializing points can be grown more effectively, leading to a more accurate and detailed reconstruction. In addition, we propose a simple yet effective strategy of scaling the gradient field according to the distance to the camera, to suppress the growth of floaters near the camera. Extensive qualitative and quantitative experiments confirm that our method achieves state-of-the-art rendering quality while maintaining real-time rendering speed, outperforming prior methods on challenging datasets such as Mip-NeRF 360 and Tanks & Temples. Code and demo are available at: https://pixelgs.github.io
Keywords: View Synthesis · Point-based Radiance Field · Real-time Rendering · 3D Gaussian Splatting · Adaptive Density Control
† Corresponding author.
1 Introduction
Novel View Synthesis (NVS) is a fundamental problem in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) [21] has drawn increasing attention for its explicit point-based representation of 3D scenes and real-time rendering performance.
[Figure 1 panels: (a) Ground Truth | (b) 3DGS* (original threshold) | (c) 3DGS* (lower threshold) | (d) Pixel-GS (Ours). To convert (b) into (d), the densification condition changes from $\frac{\sum \lVert \nabla \rVert}{\sum 1} > \tau_{pos}$ to $\frac{\sum \mathrm{pixel} \cdot \lVert \nabla \rVert}{\sum \mathrm{pixel}} > \tau_{pos}$.]
Figure 1: Our Pixel-GS effectively grows points in areas with insufficient initializing points (a), leading to a more accurate and detailed reconstruction (d). In contrast, 3D Gaussian Splatting (3DGS) suffers from blurring and needle-like artifacts in these areas, even with a lower splitting and cloning threshold to encourage more grown points (c). The rendering quality (in LPIPS↓) and memory consumption are shown in the results. 3DGS* is our retrained 3DGS model with better performance.
3DGS represents the scene as a set of points associated with geometry (Gaussian scales) and appearance (opacities and colors) attributes. These attributes can be effectively learned by differentiable rendering, while the optimization of the point cloud's density is challenging. 3DGS carefully initializes the point cloud using the sparse points produced by the Structure from Motion (SfM) process and presents an adaptive density control mechanism to split or clone points during the optimization process. However, this mechanism relies heavily on the initial point cloud's quality and cannot effectively grow points in areas where the initial point cloud is sparse, resulting in blurry or needle-like artifacts in the synthesized images. In practice, the initial point cloud from SfM unavoidably suffers from insufficient points in areas with repetitive textures and few observations. As shown in the first and second columns of Figure 1, the blurry regions in the RGB images are well aligned with the areas where few points are initialized, and 3DGS fails to generate enough points in these areas.
In essence, this issue is mainly attributed to the condition of when to split or clone a point. 3DGS decides this by checking whether the average gradient magnitude of the point's Normalized Device Coordinates (NDC) is larger than a threshold. The gradient magnitude is averaged equally across different viewpoints, and the threshold is fixed. Large Gaussians are usually visible in many viewpoints, and the size of their projection area varies significantly across views, so the number of pixels involved in the gradient calculation also varies significantly. According to the mathematical form of the Gaussian distribution, the few pixels near the center of the projected Gaussian contribute much more to the gradient than the pixels far away from the center. Larger Gaussians often have many viewpoints where the area near the projected center is not within the screen space, thereby lowering the average gradient and making them difficult to split or clone. This issue cannot be solved by merely lowering the threshold, as doing so would more likely encourage growing points in areas with sufficient points, as shown in the third column of Figure 1, still leaving blurry artifacts in the areas with insufficient points.
In this paper, we propose to compute the mean gradient magnitude of points from the perspective of pixels. During the computation of the average gradient magnitude for a Gaussian, we take into account the number of pixels covered by the Gaussian in each view by replacing the plain average across views with an average weighted by the number of covered pixels. The motivation is to amplify the gradient contribution of large Gaussians while leaving the conditions for splitting or cloning small Gaussians unchanged, such that we can effectively grow points in areas with large Gaussians. Meanwhile, for small Gaussians, the weighted average only slightly impacts the final gradient, since the variation of covered pixel counts across different viewpoints is minimal. Therefore, the final number of points in areas with sufficient initial points does not change significantly, avoiding unnecessary memory consumption and processing time, but importantly, points in areas with insufficient initial points can be effectively grown to reconstruct fine-grained details. As shown in the last column of Figure 1, our method effectively grows points in areas with insufficient initial points and renders high-fidelity images, while directly lowering the threshold in 3DGS to maintain a similar number of final points fails to render blurring-free results. Besides, we observe that "floaters" tend to appear near the camera: points that are not well aligned with the scene geometry and do not contribute to the final rendering. To this end, we propose to scale the gradient field in NDC space according to the depth values of the points, thereby suppressing the growth of "floaters" near the camera.
To evaluate the effectiveness of our method, we conducted extensive experiments on the challenging Mip-NeRF 360 [3] and Tanks & Temples [22] datasets. Experimental results validate that our method consistently outperforms the original 3DGS, both quantitatively (17.8% improvement in terms of LPIPS) and qualitatively. We also demonstrate that our method is more robust to the sparsity of the initial point cloud by manually discarding a certain proportion (up to 99%) of the initial SfM point clouds. In summary, we make the following contributions:
- We analyze the reason for the blurry artifacts in 3DGS and propose to optimize the number of points from the perspective of pixels, enabling points to be grown effectively in areas with insufficient initial points.
- We present a simple yet effective gradient scaling strategy to suppress the "floater" artifacts near the camera.
- Our method achieves state-of-the-art performance on the challenging Mip-NeRF 360 and Tanks & Temples datasets and is more robust to the quality of initial points.
2 Related Work
Novel view synthesis. The task of novel view synthesis refers to the process of generating images from perspectives different from the original input viewpoints. Recently, NeRF [35] has achieved impressive results in novel view synthesis by using neural networks to approximate the radiance field and employing volumetric rendering [10, 27, 32, 33] techniques. These approaches use implicit functions (such as MLPs [35, 2, 3], feature grid-based representations [6, 13, 29, 37, 46], or feature point-based representations [21, 50]) to fit the scene's radiance field and utilize a rendering formula for rendering. Because every sampled point along a ray must be processed by an MLP to obtain its density and color during volume rendering, these works suffer from low rendering speed. Subsequent methods [15, 41, 42, 56, 58] have refined a pre-trained NeRF into a sparse representation, thus achieving real-time rendering of NeRF. Although advanced scene representations [6, 7, 13, 29, 25, 37, 46, 2, 3, 4, 16] have been proposed to improve one or more aspects of NeRF, such as training cost, rendering results, and rendering speed, 3D Gaussian Splatting (3DGS) [21] still draws increasing attention due to its explicit representation, high-fidelity results, and real-time rendering speed. Subsequent works on 3DGS have further improved it from perspectives such as anti-aliasing [59, 51], reducing memory usage [12, 39, 38, 26, 36, 30], replacing spherical harmonics functions to enhance the modeling of high-frequency signals on reflective surfaces [54], and modeling dynamic scenes [31, 55, 11, 49, 53, 20, 24, 17]. However, 3DGS still tends to exhibit blurring and needle-like artifacts in areas where the initial points are sparse. This is because 3DGS initializes the scale of each Gaussian based on the distance to neighboring Gaussians, making it challenging for the point cloud growth mechanism of 3DGS to generate sufficient points to accurately model these areas.
Point-based radiance field. Point-based representations (such as point clouds) commonly represent scenes using fixed-size, unstructured points, and are rendered by rasterization on GPUs [5, 43, 45]. Although this is a simple and convenient solution to address topological changes, it often results in holes or outliers, leading to artifacts during rendering. To mitigate issues of discontinuity, researchers have proposed differentiable point-based rendering, utilizing points to model local domains [14, 18, 28, 57, 50, 21, 48]. Among these approaches, [1, 23] employ neural networks to represent point features and utilize 2D CNNs for rendering. Point-NeRF [50] models 3D scenes using neural 3D points and presents strategies for pruning and growing points to repair common holes and outliers in point-based radiance fields. 3DGS [21] renders using a rasterization approach, which significantly speeds up the rendering process. It starts from a sparse point cloud initialized by SfM and fits each point's influence area and color features using three-dimensional Gaussian distributions and spherical harmonics functions, respectively. To enhance the representational capability of this point-based spatial function, 3DGS introduces a density control mechanism based on the gradient of each point's NDC (Normalized Device Coordinates) and its opacity, managing the growth and elimination of the point cloud. Recent work [8] on 3DGS has improved the point cloud growth process by incorporating depth and normals to enhance the fitting ability in low-texture areas. In contrast, our Pixel-GS does not require any additional priors or information resources, e.g., depths and normals, and can directly grow points in areas with insufficient initializing points, reducing blurring and needle-like artifacts.
Floater artifacts. Most radiance field scene representations encounter floater artifacts, which predominantly appear near the camera and are more severe with sparse input views. Some papers [44, 9] address floaters by introducing depth priors. NeRFshop [19] proposes an editing method to remove floaters. Mip-NeRF 360 [3] introduces a distortion loss, adding the prior that the density distribution along each ray is unimodal, effectively reducing floaters near the camera. NeRF in the Dark [34] suggests a variance loss on the weights to decrease floaters. FreeNeRF [52] introduces a penalty term on the density of points close to the camera as a loss to reduce floaters near the camera. Most of these methods suppress floaters by incorporating priors through losses or editing, while "Floaters No More" [40] attempts to explore the fundamental reason for the occurrence of floaters, pointing out that floaters primarily arise because, for two regions of the same volume and shape, the number of pixels involved in the computation is proportional to the inverse square of each region's distance from the camera. Under the same learning rate, areas close to the camera rapidly complete optimization and, once optimized, block the optimization of areas behind them, leading to an increased likelihood of floaters near the camera. Our method is inspired by this analysis and deals with floaters by a simple yet effective strategy, i.e., scaling the gradient field by the distance to the camera.
3 Method
We first review the point cloud growth condition of "Adaptive Density Control" in 3DGS. Then, we propose a method for calculating the average gradient magnitude in the point cloud growth condition from a pixel perspective, significantly enhancing the reconstruction capability in areas with insufficient initial points. Finally, we show that by scaling the spatial gradient field that controls point growth, floaters near the input cameras can be significantly suppressed.
3.1 Preliminaries
In 3D Gaussian Splatting, Gaussian $i$ under viewpoint $j$ generates a 2D covariance matrix $\Sigma^{2D}_{i,j} = \begin{pmatrix} a_{i,j} & b_{i,j} \\ b_{i,j} & c_{i,j} \end{pmatrix}$, and the corresponding influence range radius $r_{i,j}$ can be determined by:

$$r_{i,j} = 3 \times \sqrt{\frac{a_{i,j}+c_{i,j}}{2} + \sqrt{\left(\frac{a_{i,j}+c_{i,j}}{2}\right)^2 - \left(a_{i,j}\,c_{i,j} - b_{i,j}^2\right)}}, \tag{1}$$
which covers 99% of the probability mass of the Gaussian distribution. For Gaussian $i$ under viewpoint $j$, the coordinates in the camera coordinate system are $(x^{cam}_{i,j}, y^{cam}_{i,j}, z^{cam}_{i,j})$, and in the pixel coordinate system they are $(x^{pix}_{i,j}, y^{pix}_{i,j}, z^{pix}_{i,j})$. With the image width being $W$ pixels and the height $H$ pixels, Gaussian $i$ participates in the calculation for viewpoint $j$ when it simultaneously satisfies the following six conditions:

$$\begin{cases} r_{i,j} > 0, \\ z^{cam}_{i,j} > 0.2, \\ -r_{i,j} - 0.5 < x^{pix}_{i,j} < W + r_{i,j} - 0.5, \\ -r_{i,j} - 0.5 < y^{pix}_{i,j} < H + r_{i,j} - 0.5. \end{cases} \tag{2}$$
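For concreteness, the following NumPy sketch implements Eq. 1 and the visibility test of Eq. 2. The helper names (`gaussian_radius`, `participates_in_view`) are ours, not from the official codebase, and the sketch assumes the covariance entries and projected coordinates are already available:

```python
import numpy as np

def gaussian_radius(a, b, c):
    """Screen-space influence radius (Eq. 1): 3 standard deviations along
    the major axis of the 2D covariance [[a, b], [b, c]]."""
    mid = 0.5 * (a + c)
    # Largest eigenvalue of the 2x2 covariance; clamp for numerical safety.
    lambda_max = mid + np.sqrt(max(mid * mid - (a * c - b * b), 0.0))
    return 3.0 * np.sqrt(lambda_max)

def participates_in_view(r, z_cam, x_pix, y_pix, W, H):
    """The six conditions of Eq. 2: positive radius, in front of the near
    plane, and screen-space overlap with the W x H image."""
    return (r > 0
            and z_cam > 0.2
            and -r - 0.5 < x_pix < W + r - 0.5
            and -r - 0.5 < y_pix < H + r - 0.5)
```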
In 3D Gaussian Splatting, whether a point is split or cloned is determined by the average gradient magnitude of its NDC coordinates over the viewpoints in which the Gaussian participates in the calculation. Specifically, for Gaussian $i$ under viewpoint $j$, the NDC coordinates are $(x^{ndc}_{i,j}, y^{ndc}_{i,j}, z^{ndc}_{i,j})$, and the loss under viewpoint $j$ is $L_j$. During "Adaptive Density Control" every 100 iterations, Gaussian $i$ participates in the calculation for $M_i$ viewpoints. The threshold $\tau_{pos}$ is set to 0.0002 in 3D Gaussian Splatting. When the Gaussian satisfies

$$\frac{\sum_{j=1}^{M_i} \sqrt{\left(\frac{\partial L_j}{\partial x^{ndc}_{i,j}}\right)^2 + \left(\frac{\partial L_j}{\partial y^{ndc}_{i,j}}\right)^2}}{M_i} > \tau_{pos}, \tag{3}$$

it is transformed into two Gaussians.
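As a reference point for the modification in Sec. 3.2, here is a minimal sketch of this original view-averaged test (the naming is ours; the real implementation accumulates a running sum and a denominator across iterations rather than storing per-view gradients):

```python
import numpy as np

TAU_POS = 0.0002  # default threshold in 3DGS

def should_densify_3dgs(grads_xy):
    """Eq. 3: grads_xy is an (M_i, 2) array of per-view NDC gradients
    (dL_j/dx_ndc, dL_j/dy_ndc) collected over the last 100 iterations.
    Every view counts equally, regardless of how many pixels it covered."""
    grad_norms = np.linalg.norm(grads_xy, axis=1)
    return grad_norms.mean() > TAU_POS
```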
[Figure 2: the densification condition changes from $\frac{\sum \lVert \nabla_j \rVert}{\sum 1} > \tau_{pos}$ to $\frac{\sum p_j \lVert \nabla_j \rVert}{\sum p_j} > \tau_{pos}$, followed by depth-based gradient scaling.]

Figure 2: Pipeline of Pixel-GS. $p_j$ represents the number of pixels participating in the calculation for the Gaussian from viewpoint $j$, and $\nabla_j$ represents the gradient of the Gaussian's NDC coordinates. We change the condition for deciding whether a Gaussian should split or clone from the left-hand form to the right-hand form.
3.2 Pixel-aware Gradient
Although the current criterion used to decide whether a point should split or clone is sufficient for appropriately distributing Gaussians in most areas, artifacts tend to occur in regions where the initial points are sparse. In 3DGS, the lengths of the three axes of the ellipsoid corresponding to Gaussian $i$ are initialized using the value calculated by:

$$s_i = \sqrt{\frac{(d^i_1)^2 + (d^i_2)^2 + (d^i_3)^2}{3}}, \tag{4}$$
where $d^i_1$, $d^i_2$, and $d^i_3$ are the distances from Gaussian $i$ to its three nearest points, respectively. We observed that inadequately modeled areas often have very sparse initial SfM point clouds, so the Gaussians in these areas are initialized as ellipsoids with large axis lengths. As a result, they participate in the computation from too many viewpoints. These Gaussians exhibit large gradients only in viewpoints where the projected center lies within or near the pixel space, implying that from these viewpoints the large Gaussians cover a large area in pixel space after projection. Consequently, during the "Adaptive Density Control" process every 100 iterations (Eq. 3), these points have a small average gradient magnitude of their NDC coordinates, because they participate in the computation from too many viewpoints while having significant gradient magnitudes in only a few of them. It is therefore difficult for these points to split or clone, leading to poor modeling in these areas.
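A sketch of this initialization, assuming SciPy's KD-tree in place of the CUDA `simple_knn` routine used by the 3DGS codebase:

```python
import numpy as np
from scipy.spatial import cKDTree

def initial_scales(points):
    """Eq. 4: per-Gaussian initial axis length as the RMS distance to the
    three nearest neighbours; sparse regions get large initial Gaussians."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=4)  # k=4: index 0 is the point itself
    return np.sqrt((dists[:, 1:] ** 2).mean(axis=1))
```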
Below, we analyze through equations why the Gaussians in the aforementioned sparse areas obtain large NDC coordinate gradients only from viewpoints with sufficient coverage, whereas for viewpoints that only cover their edge areas, the NDC coordinate gradients are small. The contribution of the pixels under viewpoint $j$ to the NDC coordinate gradient of Gaussian $i$ can be computed as:

$$\left(\frac{\partial L_j}{\partial x^{ndc}_{i,j}},\; \frac{\partial L_j}{\partial y^{ndc}_{i,j}}\right) = \sum_{k=1}^{m_{i,j}} \sum_{c=1}^{3} \left( \frac{\partial L_j}{\partial C^k_c} \times \frac{\partial C^k_c}{\partial \alpha^k_{i,j}} \times \left( \frac{\partial \alpha^k_{i,j}}{\partial x^{ndc}_{i,j}},\; \frac{\partial \alpha^k_{i,j}}{\partial y^{ndc}_{i,j}} \right) \right), \tag{5}$$
where both $\frac{\partial \alpha^k_{i,j}}{\partial x^{ndc}_{i,j}}$ and $\frac{\partial \alpha^k_{i,j}}{\partial y^{ndc}_{i,j}}$ contain the factor $\alpha^k_{i,j}$, which can be calculated as:

$$\alpha^k_{i,j} = o_i \times \exp\left(-\frac{1}{2} \begin{pmatrix} x_k - x^{pix}_{i,j} \\ y_k - y^{pix}_{i,j} \end{pmatrix}^{\top} \left(\Sigma^{2D}_{i,j}\right)^{-1} \begin{pmatrix} x_k - x^{pix}_{i,j} \\ y_k - y^{pix}_{i,j} \end{pmatrix}\right), \tag{6}$$
where $C^k_c$ represents the color of the $c$-th channel of pixel $k$, $(x_k, y_k)$ is the center of pixel $k$, $o_i$ is the opacity of Gaussian $i$, and $m_{i,j}$ represents the number of pixels involved in the calculation for Gaussian $i$ under viewpoint $j$. As a function of the distance between the projected Gaussian center and the pixel center, $\alpha^k_{i,j}$ exhibits exponential decay as the distance increases.
This means that a few pixels close to the center of the projected Gaussian make the primary contribution to its NDC coordinate gradient. For large Gaussians, many viewpoints only cover their edge areas: these viewpoints participate in the calculation but yield very small NDC coordinate gradients. On the other hand, we observe that when a large number of pixels are involved in the calculation after projection for a given viewpoint, these points often exhibit large NDC coordinate gradients in that viewpoint. This is easy to understand: when many pixels are involved after projection, the projected center tends to lie within the pixel plane, and as derived above, the few pixels near the center are the main contributors to the gradient of the NDC coordinates.
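The decay in Eq. 6 is easy to verify numerically. Here is a small sketch with hypothetical values (an isotropic covariance with a standard deviation of 2 pixels):

```python
import numpy as np

def pixel_alpha(opacity, cov2d, mu_pix, pixel):
    """Eq. 6: opacity contribution of Gaussian i at one pixel, a 2D Gaussian
    falloff around the projected centre mu_pix scaled by the opacity o_i."""
    d = np.asarray(pixel, float) - np.asarray(mu_pix, float)
    return opacity * np.exp(-0.5 * d @ np.linalg.inv(cov2d) @ d)

cov = np.array([[4.0, 0.0], [0.0, 4.0]])      # sigma = 2 px in both axes
print(pixel_alpha(1.0, cov, (0, 0), (0, 0)))  # 1.0 at the centre
print(pixel_alpha(1.0, cov, (0, 0), (6, 0)))  # ~0.011 three sigmas away
```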
To solve this problem, we assign a weight to the NDC coordinate gradient magnitude of each Gaussian at every viewpoint, where the weight is the number of pixels involved in the computation for that Gaussian from the corresponding viewpoint. The advantage of this approach is that, for large Gaussians, the number of pixels involved in the calculation varies significantly across viewpoints. According to the previous derivation, these large Gaussians receive large gradients only in viewpoints where many pixels are involved in the calculation; weighting the gradient magnitudes by the number of participating pixels therefore promotes the splitting or cloning of these Gaussians more rationally. Additionally, for smaller Gaussians, the variation in the number of involved pixels across viewpoints is minimal, so the weighted average does not differ significantly from the original condition and does not cause excessive additional memory consumption. The modified condition deciding whether a Gaussian undergoes a split or clone is given by:
$$\frac{\sum_{j=1}^{M_i} m_{i,j} \times \sqrt{\left(\frac{\partial L_j}{\partial x^{ndc}_{i,j}}\right)^2 + \left(\frac{\partial L_j}{\partial y^{ndc}_{i,j}}\right)^2}}{\sum_{j=1}^{M_i} m_{i,j}} > \tau_{pos}, \tag{7}$$
where $M_i$ is the number of viewpoints in which Gaussian $i$ participates in the computation during the corresponding 100 iterations of "Adaptive Density Control", $m_{i,j}$ is the number of pixels Gaussian $i$ covers at viewpoint $j$, and $\frac{\partial L_j}{\partial x^{ndc}_{i,j}}$ and $\frac{\partial L_j}{\partial y^{ndc}_{i,j}}$ respectively represent the gradients of Gaussian $i$ in the $x$ and $y$ directions of NDC space at viewpoint $j$. The conditions under which a Gaussian participates in the computation for a pixel are given by:

$$\begin{cases} (x_k - x^{pix}_{i,j})^2 + (y_k - y^{pix}_{i,j})^2 < r_{i,j}^2, \\ \prod_{l=1}^{i-1} \left(1 - \alpha^k_{l,j}\right) \geq 10^{-4}, \\ \alpha^k_{i,j} \geq \frac{1}{255}, \end{cases} \tag{8}$$
while the conditions under which a Gaussian participates in the computation from a viewpoint are given by Eq. 2.
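A sketch of the pixel-weighted test, mirroring `should_densify_3dgs` above (names and array layout are ours):

```python
import numpy as np

TAU_POS = 0.0002

def should_densify_pixel_gs(grads_xy, pixel_counts):
    """Eq. 7: the per-view gradient norms are weighted by m_{i,j}, the number
    of pixels the Gaussian covered in view j, so the few views where a large
    Gaussian projects near the image centre dominate the average."""
    grad_norms = np.linalg.norm(grads_xy, axis=1)  # (M_i,)
    return (pixel_counts * grad_norms).sum() / pixel_counts.sum() > TAU_POS
```

For a small Gaussian whose pixel counts are nearly constant across views, this reduces to the plain average of Eq. 3, which is exactly the behavior argued for above.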
3.3 Scaled Gradient Field
While using the "Pixel-aware Gradient" to decide whether a point should split or clone (Eq. 7) addresses artifacts in areas with insufficient viewpoints and repetitive textures, we found that this point cloud growth condition also exacerbates the presence of floaters near the camera. This is mainly because floaters near the camera occupy a large screen space and have significant gradients in their NDC coordinates, leading to an increasing number of floaters during the point cloud growth process. To address this issue, we scale the gradient field of the NDC coordinates.
Specifically, we use a radius to determine the scale of the scene, calculated by:

$$\mathrm{radius} = 1.1 \times \max_{j} \left\{ \left\lVert \mathbf{c}_j - \frac{1}{N} \sum_{l=1}^{N} \mathbf{c}_l \right\rVert_2 \right\}. \tag{9}$$
In the training set, there are $N$ viewpoints, with $\mathbf{c}_j$ representing the coordinates of the $j$-th viewpoint's camera in the world coordinate system. We scale the gradient of the NDC coordinates for each Gaussian $i$ under the $j$-th viewpoint, with the scaling factor $f(i,j)$ calculated by:

$$f(i,j) = \mathrm{clip}\left(\left(\frac{z^{cam}_{i,j}}{\gamma_{depth} \times \mathrm{radius}}\right)^2,\; 0,\; 1\right), \tag{10}$$
where $z^{cam}_{i,j}$ is the z-coordinate of Gaussian $i$ in the camera coordinate system under the $j$-th viewpoint, indicating the depth of this Gaussian from the viewpoint, and $\gamma_{depth}$ is a manually set hyperparameter.
The primary inspiration for using the squared term as the scaling coefficient in Eq. 10 comes from "Floaters No More" [40]. This paper notes that floaters in NeRF [35] mainly arise because regions close to the camera occupy more pixels after projection and thus receive more gradient during optimization. These areas are therefore optimized first, subsequently blocking the originally correct spatial positions from being optimized. Since the number of pixels occupied is inversely proportional to the square of the distance to the camera, we scale the gradients by the squared distance.
In summary, a major issue with pixel-based optimization is the imbalance of the spatial gradient field, which leads to inconsistent optimization speeds across different areas. Adaptively scaling the gradient field in different spatial regions effectively addresses this problem. The final condition that determines whether a Gaussian undergoes a "split" or "clone" is therefore given by:
$$\frac{\sum_{j=1}^{M_i} m_{i,j} \times f(i,j) \times \sqrt{\left(\frac{\partial L_j}{\partial x^{ndc}_{i,j}}\right)^2 + \left(\frac{\partial L_j}{\partial y^{ndc}_{i,j}}\right)^2}}{\sum_{j=1}^{M_i} m_{i,j}} > \tau_{pos}. \tag{11}$$
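Putting Eqs. 9-11 together, here is a sketch of the full Pixel-GS growth decision (function names are ours; $\gamma_{depth} = 0.37$ is the value reported in Sec. 4.1):

```python
import numpy as np

def scene_radius(cam_centers):
    """Eq. 9: 1.1x the largest camera distance from the mean camera centre."""
    c = np.asarray(cam_centers, float)
    return 1.1 * np.linalg.norm(c - c.mean(axis=0), axis=1).max()

def depth_scale(z_cam, radius, gamma_depth=0.37):
    """Eq. 10: quadratically down-weight gradients of Gaussians close to the
    camera; beyond gamma_depth * radius the scale saturates at 1."""
    return np.clip((z_cam / (gamma_depth * radius)) ** 2, 0.0, 1.0)

def should_densify_final(grads_xy, pixel_counts, z_cams, radius, tau_pos=2e-4):
    """Eq. 11: pixel-weighted, depth-scaled growth condition."""
    grad_norms = np.linalg.norm(grads_xy, axis=1)
    weighted = pixel_counts * depth_scale(z_cams, radius) * grad_norms
    return weighted.sum() / pixel_counts.sum() > tau_pos
```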
4 Experiments
4.1 Experimental Setup
Datasets and benchmarks. We evaluate our method across a total of 30 real-world scenes, including all scenes from Mip-NeRF 360 (9 scenes) [3] and Tanks & Temples (21 scenes) [22], two of the most widely used datasets in the field of 3D reconstruction. They contain both bounded indoor scenes and unbounded outdoor scenes, allowing a comprehensive evaluation of our method's performance.
Evaluation metrics. We assess reconstruction quality with PSNR↑, SSIM↑ [47], and LPIPS↓ [60]. PSNR reflects pixel-wise errors but does not correspond well to human visual perception, as it treats all errors as noise without distinguishing structural from non-structural distortions. SSIM accounts for structural changes in luminance, contrast, and structure, thus more closely mirroring human perception of image quality. LPIPS uses a pre-trained deep neural network to extract features and measures high-level semantic differences between images, offering a similarity measure closer to human perceptual assessment than PSNR and SSIM.
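For reference, these metrics can be computed with standard packages. A sketch assuming images as (H, W, 3) float arrays in [0, 1]; the paper does not state which LPIPS backbone is used, so the VGG variant here is an assumption:

```python
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred, gt):
    """pred, gt: (H, W, 3) float arrays in [0, 1]."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=2, data_range=1.0)
    loss_fn = lpips.LPIPS(net='vgg')  # build once and reuse in practice
    # lpips expects NCHW torch tensors scaled to [-1, 1]
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    lp = loss_fn(to_t(pred), to_t(gt)).item()
    return psnr, ssim, lp
```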
Implementation details. Our method requires only minor modifications to the original code of 3DGS, so it is compatible with almost all follow-up works on 3DGS. We use the default parameters of 3DGS to ensure consistency with the original implementation, including the same threshold $\tau_{pos}$ for splitting and cloning points. For all scenes, we set a constant $\gamma_{depth}$ value of 0.37 in Eq. 10, obtained through experimentation. All experiments were conducted on one RTX 3090 GPU with 24 GB memory.
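To illustrate what the "minor modification" might amount to, here is a hypothetical sketch following the structure of the densification-statistics accumulator in the 3DGS codebase; the per-Gaussian `pixel_counts` tensor is an assumption, as it would have to be exposed by a modified rasterizer:

```python
import torch

def add_densification_stats(self, viewspace_points, update_filter, pixel_counts):
    # Per-view NDC gradient magnitude of each visible Gaussian.
    grad_norm = torch.norm(viewspace_points.grad[update_filter, :2],
                           dim=-1, keepdim=True)
    w = pixel_counts[update_filter].unsqueeze(-1).float()
    self.xyz_gradient_accum[update_filter] += w * grad_norm  # was: += grad_norm
    self.denom[update_filter] += w                           # was: += 1
```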
4.2 Main Results
We select several representative methods for comparison, including the NeRF methods, e.g., Plenoxels [13], INGP [37], and Mip-NeRF 360 [3], and the 3DGS method [21]. We used the official implementations for all compared methods and the same training/testing split as Mip-NeRF 360, selecting one out of every eight photos for testing.
Quantitative results. The quantitative results (PSNR, SSIM, and LPIPS) on the Mip-NeRF 360 and Tanks & Temples datasets are presented in Tables 1 and 2, respectively. We also provide the results of three challenging scenes per dataset for more detailed information. Here, we retrained 3DGS (noted as 3DGS*), as doing so yields better performance than the original 3DGS (noted as 3DGS). Our method consistently outperforms all other methods, especially in terms of the LPIPS metric, while maintaining real-time rendering speed (discussed later). Moreover, compared to 3DGS, our method shows significant improvements in the three challenging scenes of both datasets and achieves better performance over the entire datasets. This quantitatively validates the effectiveness of our method in improving reconstruction quality.
Table 1: Quantitative results on the Mip-NeRF 360 dataset. We also show the results of three challenging scenes (Flowers, Bicycle, and Stump). Cells in the original are highlighted as best, second best, and third best. 3DGS* is our retrained 3DGS model with better performance.

| Method | All scenes (PSNR↑ / SSIM↑ / LPIPS↓) | Flowers | Bicycle | Stump |
|---|---|---|---|---|
| Plenoxels [13] | 23.08 / 0.625 / 0.463 | 20.10 / 0.431 / 0.521 | 21.91 / 0.496 / 0.506 | 20.66 / 0.523 / 0.503 |
| INGP-Base [37] | 25.30 / 0.671 / 0.371 | 20.35 / 0.450 / 0.481 | 22.19 / 0.491 / 0.487 | 23.63 / 0.574 / 0.450 |
| INGP-Big [37] | 25.59 / 0.699 / 0.331 | 20.65 / 0.486 / 0.441 | 22.17 / 0.512 / 0.446 | 23.47 / 0.594 / 0.421 |
| Mip-NeRF 360 [3] | 27.69 / 0.792 / 0.237 | 21.73 / 0.583 / 0.344 | 24.37 / 0.685 / 0.301 | 26.40 / 0.744 / 0.261 |
| 3DGS [21] | 27.21 / 0.815 / 0.214 | 21.52 / 0.605 / 0.336 | 25.25 / 0.771 / 0.205 | 26.55 / 0.775 / 0.210 |
| 3DGS* [21] | 27.71 / 0.826 / 0.202 | 21.89 / 0.622 / 0.328 | 25.63 / 0.778 / 0.204 | 26.90 / 0.785 / 0.207 |
| Pixel-GS (Ours) | 27.88 / 0.834 / 0.176 | 21.94 / 0.652 / 0.251 | 25.74 / 0.793 / 0.173 | 27.11 / 0.796 / 0.181 |
Table 2: Quantitative results on the Tanks & Temples dataset. We also show the results of three challenging scenes (Train, Barn, and Caterpillar). * indicates retraining for better performance.

| Method | All scenes (PSNR↑ / SSIM↑ / LPIPS↓) | Train | Barn | Caterpillar |
|---|---|---|---|---|
| 3DGS* [21] | 24.19 / 0.844 / 0.194 | 22.02 / 0.812 / 0.209 | 28.46 / 0.869 / 0.182 | 23.79 / 0.809 / 0.211 |
| Pixel-GS (Ours) | 24.38 / 0.850 / 0.178 | 22.13 / 0.823 / 0.180 | 29.00 / 0.888 / 0.144 | 24.08 / 0.832 / 0.173 |
Qualitative results. In Figures 1 and 3, we showcase comparisons between our method and 3DGS*. Our approach significantly reduces the blurring and needle-like artifacts, e.g., in the region of the flowers in the second row and the blow-up region in the last row, compared against 3DGS*. These regions are initialized with insufficient points from SfM, and our method effectively grows points in these areas, leading to a more accurate and detailed reconstruction. Please refer to the supplemental materials for the point cloud comparison. These examples clearly validate that our method is more robust to the quality of the initialization point cloud and can reconstruct high-fidelity details.
4.3 Ablation Studies
To evaluate the effectiveness of the individual components of our method, i.e., the pixel-aware gradient and the scaled gradient field, we conducted ablation studies on the Mip-NeRF 360 and Tanks & Temples datasets. The quantitative and qualitative results are presented in Table 3 and Figure 4, respectively. Both the pixel-aware gradient and the scaled gradient field contribute to the improvement of reconstruction quality on the Mip-NeRF 360 dataset. However, the pixel-aware gradient strategy alone reduces reconstruction quality on the Tanks & Temples dataset. This is mainly due to floaters that tend to appear near the camera in some large scenes of Tanks & Temples, and the pixel-aware gradient encourages more Gaussians, as shown in column (b) of Figure 4. Notably, this phenomenon also occurs for 3DGS when the threshold $\tau_{pos}$ is lowered, which likewise promotes more Gaussians, as shown in Table 4. Importantly, the combination of both proposed strategies achieves the best performance on the Tanks & Temples dataset, as shown in Table 3, since the scaled gradient field suppresses the growth of floaters near the camera. In summary, the ablation studies demonstrate the effectiveness of each proposed component and the necessity of combining them to achieve the best performance.
[Figure 3 panels: (a) Ground Truth | (b) Pixel-GS (Ours) | (c) 3DGS* [21].]

Figure 3: Qualitative comparison between Pixel-GS (Ours) and 3DGS*. The first three scenes are from the Mip-NeRF 360 dataset (Bicycle, Flowers, and Treehill), while the last four scenes are from the Tanks & Temples dataset (Barn, Caterpillar, Playground, and Train). The blow-up regions or arrows highlight the parts with distinct differences in quality. 3DGS* is our retrained 3DGS model with better performance.
[Figure 4 panels: (a) 3DGS* | (b) Pixel-aware Gradient | (c) Scaled Gradient Field | (d) Complete Model.]

Figure 4: Qualitative results of the ablation study. The PSNR↑ results are shown on the corresponding images.

Table 3: Ablation study. The metrics are the average values across all scenes of the Mip-NeRF 360 and Tanks & Temples datasets, respectively.
| Method | Mip-NeRF 360 (PSNR↑ / SSIM↑ / LPIPS↓) | Tanks & Temples (PSNR↑ / SSIM↑ / LPIPS↓) |
|---|---|---|
| 3DGS* [21] | 27.71 / 0.826 / 0.202 | 24.23 / 0.844 / 0.194 |
| Pixel-aware Gradient | 27.74 / 0.833 / 0.176 | 21.80 / 0.791 / 0.239 |
| Scaled Gradient Field | 27.72 / 0.825 / 0.202 | 24.34 / 0.843 / 0.198 |
| Complete Model | 27.88 / 0.834 / 0.176 | 24.38 / 0.850 / 0.178 |
Table 4: Impact of lowering $\tau_{pos}$. We show the corresponding quality and efficiency metrics when lowering the point growth threshold $\tau_{pos}$ for 3DGS* and our method.
| Dataset | Strategy | PSNR↑ | SSIM↑ | LPIPS↓ | Train | FPS | Memory |
|---|---|---|---|---|---|---|---|
| Mip-NeRF 360 | 3DGS* ($\tau_{pos} = 2 \times 10^{-4}$) | 27.71 | 0.826 | 0.202 | 25m40s | 126 | 0.72 GB |
| Mip-NeRF 360 | 3DGS* ($\tau_{pos} = 1.28 \times 10^{-4}$) | 27.83 | 0.833 | 0.181 | 43m23s | 90 | 1.4 GB |
| Mip-NeRF 360 | Ours ($\tau_{pos} = 2 \times 10^{-4}$) | 27.88 | 0.834 | 0.176 | 41m25s | 89 | 1.2 GB |
| Tanks & Temples | 3DGS* ($\tau_{pos} = 2 \times 10^{-4}$) | 24.19 | 0.844 | 0.194 | 16m3s | 135 | 0.41 GB |
| Tanks & Temples | 3DGS* ($\tau_{pos} = 1 \times 10^{-4}$) | 23.86 | 0.842 | 0.187 | 27m59s | 87 | 0.94 GB |
| Tanks & Temples | Ours ($\tau_{pos} = 2 \times 10^{-4}$) | 24.38 | 0.850 | 0.178 | 26m36s | 92 | 0.84 GB |
Figure 5: Reconstruction quality (PSNR↑, SSIM↑, and LPIPS↓) vs. dropping rate of initializing points. Here, the dropping rate refers to the percentage of points dropped from the original SfM point cloud before initializing the Gaussians. The results are obtained on the Mip-NeRF 360 dataset.
4.4 Analysis
The impact of lowering the threshold $\tau_{pos}$. As the blurring and needle-like artifacts in 3DGS mainly occur in areas with insufficient initializing points, one straightforward solution would be to lower the threshold $\tau_{pos}$ to encourage the growth of more points. To verify this, we experimented on the Mip-NeRF 360 and Tanks & Temples datasets, lowering the threshold $\tau_{pos}$ from $2 \times 10^{-4}$ to $1.28 \times 10^{-4}$ for 3DGS to make the final optimized number of points comparable to ours. From Table 4, we can see that lowering $\tau_{pos}$ for 3DGS significantly increases memory consumption and decreases rendering speed, while still falling behind our method in reconstruction quality. As the qualitative comparison in Figure 1 shows, this is because the point cloud growth mechanism of 3DGS struggles to generate points in areas with insufficient initializing points and only yields unnecessary points where the initial SfM point cloud is already dense. In contrast, although our method also incurs additional memory consumption, its point cloud distribution is more uniform, effectively growing points in areas with insufficient initializing points and thereby leading to a more accurate and detailed reconstruction while still maintaining real-time rendering speed.
Robustness to the quality of initialization point clouds. SfM algorithms often fail to produce high-quality point clouds in some areas, e.g., with too few observations, repetitive textures, or low texture. The point cloud produced by SfM is usually the necessary input for both 3DGS and our method. We therefore explore the robustness of our method to the quality of the initialization point cloud by randomly dropping points from the SfM point cloud used for initialization, and compare the results with those of 3DGS. Figure 5 shows how the reconstruction quality varies with the proportion of dropped points. Our method consistently outperforms 3DGS in terms of all metrics (PSNR, SSIM, and LPIPS) and, more importantly, is less affected by the dropping rate than 3DGS. Notably, even when 99% of the initializing points have been dropped, the reconstruction quality of our method still surpasses, in terms of LPIPS, that of 3DGS initialized with the complete SfM point cloud. These results demonstrate the robustness of our method to the quality of the initialization point cloud, which is crucial for real-world applications.
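The paper does not detail the sampling scheme; a minimal sketch, assuming uniform random dropping of SfM points before initialization:

```python
import numpy as np

def drop_sfm_points(points, colors, drop_rate, seed=0):
    """Randomly discard a fraction (up to 0.99) of the SfM points used to
    initialize the Gaussians, as in the robustness study of Fig. 5."""
    rng = np.random.default_rng(seed)
    keep = rng.random(len(points)) >= drop_rate
    return points[keep], colors[keep]
```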
5 Conclusion
The blurring and needle-like artifacts in 3DGS are mainly attributed to its inability to grow points in areas with insufficient initializing points. To address this issue, we propose Pixel-GS, which considers the number of pixels covered by a Gaussian in each view to dynamically weight the gradient of each view during the computation of the growth condition. This strategy effectively grows Gaussians with large scales, which are more likely to exist in areas with insufficient initializing points, such that our method can adaptively grow points in these areas while avoiding unnecessary growth in areas with enough points. We also introduce a simple yet effective strategy to deal with floaters, i.e., scaling the gradient field by the distance to the camera. Extensive experiments demonstrate that our method significantly reduces blurring and needle-like artifacts and effectively suppresses floaters, achieving state-of-the-art rendering quality. Meanwhile, although our method consumes slightly more memory, the additional points are mainly distributed in areas with insufficient initializing points, where they are necessary for high-quality reconstruction, and our method still maintains real-time rendering speed. Finally, thanks to the effective pixel-aware gradient and scaled gradient field, our method is more robust to the number of initialization points.