Paper subscription for your specialty area
Follow {曉理紫} on WeChat (VX) for daily paper updates. If you find this useful, please forward it to anyone who might need it. Thanks for your support.
Follow 曉理紫 on WeChat and leave your email address to receive the daily paper digest for free.
Categories:
- Large language models (LLM)
- Vision models (VLM)
- Diffusion models
- Visual navigation
- Embodied AI, robotics
- Reinforcement learning
- Open vocabulary, detection & segmentation
Today's paper digest from 曉理紫
== Embodied AI, Robotics ==
Title: Augmented Reality User Interface for Command, Control, and Supervision of Large Multi-Agent Teams
Authors: Frank Regal, Chris Suarez, Fabian Parra
Abstract: Multi-agent human-robot teaming allows for the potential to gather information about various environments more efficiently by exploiting and combining the strengths of humans and robots. In industries like defense, search and rescue, first-response, and others alike, heterogeneous human-robot teams show promise to accelerate data collection and improve team safety by removing humans from unknown and potentially hazardous situations. This work builds upon AugRE, an Augmented Reality (AR) based scalable human-robot teaming framework. It enables users to localize and communicate with 50+ autonomous agents. Through our efforts, users are able to command, control, and supervise agents in large teams, both line-of-sight and non-line-of-sight, without the need to modify the environment beforehand and without requiring users to use typical hardware (i.e., joysticks, keyboards, laptops, tablets, etc.) in the field. The demonstrated work shows early indications that combining these AR-HMD-based user interaction modalities for command, control, and supervision will help improve human-robot team collaboration, robustness, and trust.
[Downlink:]http://arxiv.org/abs/2401.05665v1
[Project:]https://sites.google.com/view/xr-robotics-iros2023/home?authuser=0
Title: Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction
Authors: Shaunak A. Mehta, Dylan P. Losey
Abstract: Humans can leverage physical interaction to teach robot arms. This physical interaction takes multiple forms depending on the task, the user, and what the robot has learned so far. State-of-the-art approaches focus on learning from a single modality, or combine multiple interaction types by assuming that the robot has prior information about the human's intended task. By contrast, in this paper we introduce an algorithmic formalism that unites learning from demonstrations, corrections, and preferences. Our approach makes no assumptions about the tasks the human wants to teach the robot; instead, we learn a reward model from scratch by comparing the human's inputs to nearby alternatives. We first derive a loss function that trains an ensemble of reward models to match the human's demonstrations, corrections, and preferences. The type and order of feedback is up to the human teacher: we enable the robot to collect this feedback passively or actively. We then apply constrained optimization to convert our learned reward into a desired robot trajectory. Through simulations and a user study we demonstrate that our proposed approach more accurately learns manipulation tasks from physical human interaction than existing baselines, particularly when the robot is faced with new or unexpected objectives. Videos of our user study are available at: https://youtu.be/FSUJsTYvEKU
[Downlink:]http://arxiv.org/abs/2207.03395v2
[Project:]https://youtu.be/FSUJsTYvEKU
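The paper above learns a reward model from scratch by comparing the human's input (a demonstration, correction, or preference) against nearby alternatives. Below is a minimal, hedged PyTorch sketch of one plausible form such a comparison loss could take; the trajectory featurization, the noise scale used to generate alternatives, the ensemble size, and the network shape are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Small MLP that scores a trajectory feature vector."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, traj_feat: torch.Tensor) -> torch.Tensor:
        return self.net(traj_feat).squeeze(-1)

def contrastive_reward_loss(model: RewardNet,
                            human_traj: torch.Tensor,
                            noise_scale: float = 0.05,
                            n_alternatives: int = 16) -> torch.Tensor:
    """Treat the human's input as preferred over nearby perturbed alternatives."""
    alts = human_traj.unsqueeze(0) + noise_scale * torch.randn(
        n_alternatives, human_traj.shape[-1])
    candidates = torch.cat([human_traj.unsqueeze(0), alts], dim=0)
    scores = model(candidates)                      # (1 + n_alternatives,)
    # Cross-entropy with the human input as the "correct" choice.
    target = torch.zeros(1, dtype=torch.long)
    return nn.functional.cross_entropy(scores.unsqueeze(0), target)

# Ensemble: average the loss over several independently initialized reward models.
feat_dim = 32
ensemble = [RewardNet(feat_dim) for _ in range(5)]
human_input = torch.randn(feat_dim)                 # placeholder trajectory features
loss = torch.stack([contrastive_reward_loss(m, human_input) for m in ensemble]).mean()
loss.backward()
```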
Title: Transferability of HRI Research: Potential and Challenges
Authors: Wafa Johal
Abstract: With the advancement of robotics and artificial intelligence, applications for robotics are flourishing. Human-robot interaction (HRI) is an important area of robotics, as it allows robots to work closer to humans (with them or for them). One crucial factor for the success of HRI research is transferability, which refers to the ability of research outputs to be adopted by industry and provide benefits to society. In this paper, we explore the potential and challenges of transferability in HRI research. First, we examine the current state of HRI research and identify various types of contributions that could lead to successful outcomes. Second, we discuss the potential benefits of each type of contribution and identify factors that could facilitate industry adoption of HRI research. However, we also recognize several challenges associated with transferability, such as the diversity of well-defined job/skill sets required of HRI practitioners, the lack of industry-led research, and the lack of standardization in HRI research methods. We discuss these challenges and propose potential solutions to bridge the gap between industry expectations and academic research in HRI.
[Downlink:]http://arxiv.org/abs/2401.05802v1
Title: Theory of Mind Abilities of Large Language Models in Human-Robot Interaction: An Illusion?
Authors: Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati
Abstract: Large Language Models have shown exceptional generative abilities in various natural language and generation tasks. However, possible anthropomorphization and leniency towards failure cases have propelled discussions on the emergent abilities of Large Language Models, especially on Theory of Mind (ToM) abilities. While several false-belief tests exist to verify the ability to infer and maintain mental models of another entity, we study a special application of ToM abilities that has higher stakes and possibly irreversible consequences: Human-Robot Interaction. In this work, we explore the task of Perceived Behavior Recognition, where a robot employs a Large Language Model (LLM) to assess the robot's generated behavior in a manner similar to a human observer. We focus on four behavior types, namely explicable, legible, predictable, and obfuscatory behavior, which have been extensively used to synthesize interpretable robot behaviors. The LLM's goal is therefore to act as a human proxy to the agent and to answer how a certain agent behavior would be perceived by the human in the loop, for example, "Given a robot's behavior X, would the human observer find it explicable?". We conduct a human subject study to verify that users are able to correctly answer such a question in the curated situations (robot setting and plan) across five domains. A first analysis of the belief test yields extremely positive results, inflating one's expectations of LLMs possessing ToM abilities. We then propose and perform a suite of perturbation tests which break this illusion, i.e., Inconsistent Belief, Uninformative Context, and Conviction Test. We conclude that the high score of LLMs on vanilla prompts showcases their potential use in HRI settings; however, possessing ToM demands invariance to trivial or irrelevant perturbations in the context, which LLMs lack.
[Downlink:]http://arxiv.org/abs/2401.05302v1
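The perceived-behavior queries and the perturbation tests described in the abstract above boil down to prompt construction. The sketch below is a hedged illustration: the wording and the specific perturbation strategies are my own placeholders, not the paper's actual prompts.

```python
def perceived_behavior_prompt(domain: str, plan: str, behavior_type: str) -> str:
    """Ask an LLM to act as a human observer of a robot's behavior."""
    return (
        f"You are observing a robot in the following setting:\n{domain}\n\n"
        f"The robot executes this plan:\n{plan}\n\n"
        f"Would a human observer find this behavior {behavior_type}? "
        f"Answer 'yes' or 'no' and explain briefly."
    )

def perturbed_prompts(domain: str, plan: str, behavior_type: str) -> dict:
    """Illustrative versions of the perturbation tests named in the abstract."""
    return {
        # Inconsistent belief: a contradictory statement injected into the context.
        "inconsistent_belief": perceived_behavior_prompt(
            domain + "\nNote: the robot's goal is the opposite of what the plan suggests.",
            plan, behavior_type),
        # Uninformative context: the setting description is withheld.
        "uninformative_context": perceived_behavior_prompt(
            "(no information about the setting is available)", plan, behavior_type),
    }

print(perceived_behavior_prompt("A fetch robot in an office corridor.",
                                "Move to the kitchen, pick up the red cup, return.",
                                "explicable"))
```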
Title: Evaluating Gesture Recognition in Virtual Reality
Authors: Sandeep Reddy Sabbella, Sara Kaszuba, Francesco Leotta
Abstract: Human-Robot Interaction (HRI) has become increasingly important as robots are being integrated into various aspects of daily life. One key aspect of HRI is gesture recognition, which allows robots to interpret and respond to human gestures in real time. Gesture recognition plays an important role in non-verbal communication in HRI. To this aim, there is ongoing research on how such non-verbal communication can strengthen verbal communication and improve the system's overall efficiency, thereby enhancing the user experience with the robot. However, several challenges need to be addressed in gesture recognition systems, including data generation, transferability, scalability, generalizability, standardization, and the lack of benchmarking of gestural systems. In this preliminary paper, we want to address the challenges of data generation using virtual reality simulations and standardization issues by presenting gestures for a set of commands that could be used as a standard for ground robots.
[Downlink:]http://arxiv.org/abs/2401.04545v1
Title: Testing Human-Robot Interaction in Virtual Reality: Experience from a Study on Speech Act Classification
Authors: Sara Kaszuba, Sandeep Reddy Sabbella, Francesco Leotta
Abstract: In recent years, an increasing number of Human-Robot Interaction (HRI) approaches have been implemented and evaluated in Virtual Reality (VR), as it allows designers to speed up design iterations and makes it safer for the final user to evaluate and master the HRI primitives. However, identifying the most suitable VR experience is not straightforward. In this work, we evaluate how, in a smart agriculture scenario, immersive and non-immersive VR are perceived by users with respect to a speech act understanding task. In particular, we collect opinions and suggestions from the 81 participants involved in both experiments to highlight the strengths and weaknesses of these different experiences.
[Downlink:]http://arxiv.org/abs/2401.04534v1
== Reinforcement Learning ==
Title: Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint
Authors: Zhipeng Chen, Kun Zhou, Wayne Xin Zhao
Abstract: Reinforcement learning (RL) has been widely used in training large language models (LLMs) to prevent unexpected outputs, e.g., reducing harmfulness and errors. However, existing RL methods mostly adopt instance-level rewards, which cannot provide fine-grained supervision for complex reasoning tasks and cannot focus on the few key tokens that lead to incorrect outputs. To address this, we propose a new RL method named RLMEC that incorporates a generative model as the reward model, trained on an erroneous-solution rewriting task under a minimum editing constraint, so that it can produce token-level rewards for RL training. Based on the generative reward model, we design a token-level RL objective for training and an imitation-based regularization for stabilizing the RL process. Both objectives focus on learning the key tokens of the erroneous solution, reducing the effect of other unimportant tokens. Experimental results on mathematical tasks and question-answering tasks demonstrate the effectiveness of our approach. Our code and data are available at https://github.com/RUCAIBox/RLMEC.
[Downlink:]http://arxiv.org/abs/2401.06081v1
[GitHub:]https://github.com/RUCAIBox/RLMEC
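As a rough illustration of the token-level idea in the RLMEC abstract above, the sketch below weights a per-token policy-gradient term by token-level rewards and adds an imitation-style regularizer toward a rewritten (corrected) solution. The tensor shapes, the combination weight, and the way the rewards are produced are assumptions; the actual RLMEC objective is in the linked repository.

```python
import torch
import torch.nn.functional as F

def token_level_rl_loss(logits: torch.Tensor,        # (T, vocab) policy logits
                        sampled_ids: torch.Tensor,    # (T,) tokens sampled by the policy
                        token_rewards: torch.Tensor,  # (T,) rewards from a generative reward model
                        rewritten_ids: torch.Tensor,  # (T,) tokens of the rewritten solution
                        beta: float = 0.1) -> torch.Tensor:
    log_probs = F.log_softmax(logits, dim=-1)
    sampled_logp = log_probs.gather(1, sampled_ids.unsqueeze(1)).squeeze(1)  # (T,)

    # Policy-gradient term: emphasize tokens the reward model flags as important.
    pg_loss = -(token_rewards.detach() * sampled_logp).mean()

    # Imitation-style regularizer toward the minimally edited rewrite.
    imit_loss = F.nll_loss(log_probs, rewritten_ids)

    return pg_loss + beta * imit_loss

# Toy usage with random tensors.
T, V = 8, 100
loss = token_level_rl_loss(torch.randn(T, V, requires_grad=True),
                           torch.randint(V, (T,)),
                           torch.rand(T),
                           torch.randint(V, (T,)))
loss.backward()
```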
Title: Open-Source Reinforcement Learning Environments Implemented in MuJoCo with Franka Manipulator
Authors: Zichun Xu, Yuntao Li, Xiaohang Yang
Abstract: This paper presents three open-source reinforcement learning environments developed on the MuJoCo physics engine with the Franka Emika Panda arm from MuJoCo Menagerie. Three representative tasks, push, slide, and pick-and-place, are implemented through the Gymnasium Robotics API, which inherits from the core of Gymnasium. Both sparse binary and dense rewards are supported, and the observation space contains the keys of desired and achieved goals to follow the Multi-Goal Reinforcement Learning framework. Three different off-policy algorithms are used to validate the simulation attributes and ensure the fidelity of all tasks, and benchmark results are also given. Each environment and task is defined in a clean way, and the main parameters for modifying the environment are exposed to reflect the main differences. The repository, including all environments, is available at https://github.com/zichunxx/panda_mujoco_gym.
[Downlink:]http://arxiv.org/abs/2312.13788v2
[GitHub:]https://github.com/zichunxx/panda_mujoco_gym
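Since these environments follow the Gymnasium and Multi-Goal RL conventions described above, interacting with them should look like any goal-conditioned Gymnasium task. The sketch below assumes the package from the linked repository is installed and registers its environments on import; the environment ID "PandaPushSparse-v0" is a guess and should be checked against the repository README.

```python
import gymnasium as gym
import panda_mujoco_gym  # noqa: F401  # assumed to register the Panda environments on import

# Hypothetical environment ID; check the repository for the registered names.
env = gym.make("PandaPushSparse-v0")

obs, info = env.reset(seed=0)
# Multi-Goal RL observation layout: dict with observation / achieved_goal / desired_goal.
print(obs["observation"].shape, obs["achieved_goal"], obs["desired_goal"])

for _ in range(100):
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```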
Title: Edge Generation Scheduling for DAG Tasks Using Deep Reinforcement Learning
Authors: Binqi Sun, Mirco Theile, Ziyuan Qin
Abstract: Directed acyclic graph (DAG) tasks are currently adopted in the real-time domain to model complex applications from the automotive, avionics, and industrial domains that implement their functionalities through chains of intercommunicating tasks. This paper studies the problem of scheduling real-time DAG tasks by presenting a novel schedulability test based on the concept of trivial schedulability. Using this schedulability test, we propose a new DAG scheduling framework, edge generation scheduling (EGS), that attempts to minimize the DAG width by iteratively generating edges while guaranteeing the deadline constraint. We study how to solve the edge generation problem efficiently by developing a deep reinforcement learning algorithm combined with a graph representation neural network to learn an effective edge generation policy for EGS. We evaluate the proposed algorithm by comparing it with state-of-the-art DAG scheduling heuristics and an optimal mixed-integer linear programming baseline. Experimental results show that the proposed algorithm outperforms the state of the art by requiring fewer processors to schedule the same DAG tasks. The code is available at https://github.com/binqi-sun/egs.
[Downlink:]http://arxiv.org/abs/2308.14647v2
[GitHub:]https://github.com/binqi-sun/egs
Title: HomeRobot: Open-Vocabulary Mobile Manipulation
Authors: Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav
Abstract: HomeRobot (noun): An affordable, compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment and placing it in a commanded location. This is a foundational challenge for robots to be useful assistants in human environments, because it involves tackling sub-problems from across robotics: perception, language understanding, navigation, and manipulation are all essential to OVMM. In addition, integration of the solutions to these sub-problems poses its own substantial challenges. To drive research in this area, we introduce the HomeRobot OVMM benchmark, in which an agent navigates household environments to grasp novel objects and place them on target receptacles. HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch to encourage replication of real-world experiments across labs. We implement both reinforcement learning and heuristic (model-based) baselines and show evidence of sim-to-real transfer. Our baselines achieve a 20% success rate in the real world; our experiments identify ways future research can improve performance. See videos on our website: https://ovmm.github.io/
[Downlink:]http://arxiv.org/abs/2306.11565v2
[Project:]https://ovmm.github.io/
Title: Yes, this is what I was looking for! Towards Multi-modal Medical Consultation Concern Summary Generation
Authors: Abhisek Tiwari, Shreyangshu Bera, Sriparna Saha
Abstract: Over the past few years, the use of the Internet for healthcare-related tasks has grown by leaps and bounds, posing a challenge in effectively managing and processing information to ensure its efficient utilization. During moments of emotional turmoil and psychological challenges, we frequently turn to the Internet as our initial source of support, choosing it over discussing our feelings with others due to the associated social stigma. In this paper, we propose a new task of multi-modal medical concern summary (MMCS) generation, which provides a short and precise summary of the major concerns patients bring up during a consultation. Nonverbal cues, such as patients' gestures and facial expressions, aid in accurately identifying patients' concerns. Doctors also consider patients' personal information, such as age and gender, in order to describe the medical condition appropriately. Motivated by the potential efficacy of patients' personal context and visual gestures, we propose a transformer-based multi-task, multi-modal intent-recognition and medical concern summary generation (IR-MMCSG) system. Furthermore, we propose a multitasking framework for intent recognition and medical concern summary generation for doctor-patient consultations. We construct the first multi-modal medical concern summary generation (MM-MediConSummation) corpus, which includes patient-doctor consultations annotated with medical concern summaries, intents, patient personal information, doctor's recommendations, and keywords. Our experiments and analysis demonstrate (a) the significant role of patients' expressions/gestures and their personal information in intent identification and medical concern summary generation, and (b) the strong correlation between intent recognition and patients' medical concern summary generation. The dataset and source code are available at https://github.com/NLP-RL/MMCSG.
[Downlink:]http://arxiv.org/abs/2401.05134v1
[GitHub:]https://github.com/NLP-RL/MMCSG
Title: Human as AI Mentor: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving
Authors: Zilin Huang, Zihao Sheng, Chengyuan Ma
Abstract: Despite significant progress in autonomous vehicles (AVs), the development of driving policies that ensure both the safety of AVs and traffic flow efficiency has not yet been fully explored. In this paper, we propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework, which facilitates safe and efficient autonomous driving in mixed-traffic platoons. Drawing inspiration from the human learning process, we first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM). In this paradigm, the human expert serves as a mentor to the AI agent. While allowing the agent to sufficiently explore uncertain environments, the human expert can take control in dangerous situations and demonstrate correct actions to avoid potential accidents. On the other hand, the agent can be guided to minimize traffic flow disturbance, thereby optimizing traffic flow efficiency. In detail, HAIM-DRL leverages data collected from free exploration and partial human demonstrations as its two training sources. Remarkably, we circumvent the intricate process of manually designing reward functions; instead, we directly derive proxy state-action values from partial human demonstrations to guide the agent's policy learning. Additionally, we employ a minimal intervention technique to reduce the human mentor's cognitive load. Comparative results show that HAIM-DRL outperforms traditional methods in driving safety, sampling efficiency, mitigation of traffic flow disturbance, and generalizability to unseen traffic scenarios. The code and demo videos for this paper can be accessed at: https://zilin-huang.github.io/HAIM-DRL-website/
[Downlink:]http://arxiv.org/abs/2401.03160v2
[Project:]https://zilin-huang.github.io/HAIM-DRL-website/
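The core loop described in the HAIM-DRL abstract above, letting the agent explore freely while a human mentor takes over in dangerous states and logging those takeovers as demonstrations, can be sketched roughly as below. The environment, the mentor interface, and the replay-buffer tagging scheme are assumed interfaces for illustration only; they are not taken from the HAIM-DRL code.

```python
def collect_episode(env, agent, human_mentor, replay_buffer, max_steps=1000):
    """One data-collection episode mixing free exploration and human takeovers."""
    obs, _ = env.reset()
    for _ in range(max_steps):
        agent_action = agent.act(obs)

        # The human mentor monitors the state and may override dangerous actions.
        if human_mentor.wants_to_intervene(obs, agent_action):
            action = human_mentor.act(obs)
            source = "human_demo"     # later used to derive proxy state-action values
        else:
            action = agent_action
            source = "exploration"

        next_obs, _, terminated, truncated, _ = env.step(action)
        # No hand-designed reward is stored; the learner derives proxy values
        # from the partial human demonstrations instead.
        replay_buffer.add(obs, action, next_obs, source)
        obs = next_obs
        if terminated or truncated:
            break
```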
== Open-Vocabulary Detection and Segmentation ==
Title: CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians
Authors: Bin Dou, Tianyu Zhang, Yongjia Ma
Abstract: We propose Compact and Swift Segmenting 3D Gaussians (CoSSegGaussians), a method for compact, 3D-consistent scene segmentation at fast rendering speed with only RGB images as input. Previous NeRF-based 3D segmentation methods have relied on implicit or voxel neural scene representations and ray-marching volume rendering, which are time-consuming. Recent 3D Gaussian Splatting significantly improves rendering speed; however, existing Gaussian-based segmentation methods (e.g., Gaussian Grouping) fail to provide compact segmentation masks, especially in zero-shot segmentation. This is mainly caused by the lack of robustness and compactness when learnable parameters are straightforwardly assigned to each Gaussian under inconsistent 2D machine-generated labels. Our method aims to achieve compact and reliable zero-shot scene segmentation swiftly by mapping fused spatial and semantically meaningful features for each Gaussian point with a shallow decoding network. Specifically, our method first optimizes the Gaussian points' position, covariance, and color attributes under the supervision of RGB images. After Gaussian locating, we distill multi-scale DINO features extracted from images to each Gaussian through unprojection; these are then fused with spatial features from a fast point-feature processing network, i.e., RandLA-Net. A shallow decoding MLP is then applied to the multi-scale fused features to obtain compact segmentation. Experimental results show that our model performs high-quality zero-shot scene segmentation, outperforming other segmentation methods on both semantic and panoptic segmentation tasks while consuming only about 10% of the segmentation time of NeRF-based segmentation. Code and more results will be available at https://David-Dou.github.io/CoSSegGaussians
[Downlink:]http://arxiv.org/abs/2401.05925v1
[Project:]https://David-Dou.github.io/CoSSegGaussians
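The decoding step described above, concatenating unprojected multi-scale DINO features with spatial features for each Gaussian and passing them through a shallow MLP, can be sketched as follows. The feature dimensions and number of classes are placeholders, and the RandLA-Net spatial branch is replaced by a stand-in tensor, so this is an interpretation rather than the authors' code.

```python
import torch
import torch.nn as nn

class ShallowSegDecoder(nn.Module):
    """Shallow MLP mapping fused per-Gaussian features to segmentation logits."""
    def __init__(self, dino_dim: int = 384, spatial_dim: int = 32,
                 hidden: int = 128, num_classes: int = 20):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dino_dim + spatial_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, dino_feat: torch.Tensor, spatial_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([dino_feat, spatial_feat], dim=-1)  # per-Gaussian fused features
        return self.mlp(fused)                                 # (N, num_classes) logits

n_gaussians = 10_000
dino_feat = torch.randn(n_gaussians, 384)     # unprojected multi-scale DINO features (placeholder)
spatial_feat = torch.randn(n_gaussians, 32)   # stand-in for RandLA-Net spatial features
logits = ShallowSegDecoder()(dino_feat, spatial_feat)
print(logits.shape)  # torch.Size([10000, 20])
```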
Title: IODeep: an IOD for the introduction of deep learning in the DICOM standard
Authors: Salvatore Contino, Luca Cruciata, Orazio Gambino
Abstract: Background and Objective: In recent years, Artificial Intelligence (AI), and in particular Deep Neural Networks (DNNs), became a relevant research topic in biomedical image segmentation due to the availability of more and more data sets along with the establishment of well-known competitions. Despite the popularity of DNN-based segmentation on the research side, these techniques are almost unused in daily clinical practice, even though they could effectively support the physician during the diagnostic process. Apart from the issues related to the explainability of the predictions of a neural model, such systems are not integrated in the diagnostic workflow, and a standardization of their use is needed to achieve this goal. Methods: This paper presents IODeep, a new DICOM Information Object Definition (IOD) aimed at storing both the weights and the architecture of a DNN already trained on a particular image dataset, labeled with respect to the acquisition modality, the anatomical region, and the disease under investigation. Results: The IOD architecture is presented along with a DNN selection algorithm for the PACS server based on the labels outlined above, and a simple PACS viewer purposely designed to demonstrate the effectiveness of the DICOM integration, while no modifications are required on the PACS server side. A service-based architecture supporting the entire workflow has also been implemented. Conclusion: IODeep ensures full integration of a trained AI model in a DICOM infrastructure, and it also enables a scenario in which a trained model can be either fine-tuned with hospital data or trained in a federated learning scheme shared by different hospitals. In this way, AI models can be tailored to the real data produced by a radiology ward, thus improving the physician's decision-making process. Source code is freely available at https://github.com/CHILab1/IODeep.git
[Downlink:]http://arxiv.org/abs/2311.16163v3
[GitHub:]https://github.com/CHILab1/IODeep.git
Title: LKCA: Large Kernel Convolutional Attention
Authors: Chenghao Li, Boheng Zeng, Yi Lu
Abstract: We revisit the relationship between attention mechanisms and large-kernel ConvNets in vision transformers and propose a new spatial attention named Large Kernel Convolutional Attention (LKCA). It simplifies the attention operation by replacing it with a single large-kernel convolution. LKCA combines the advantages of convolutional neural networks and vision transformers, possessing a large receptive field, locality, and parameter sharing. We explain the superiority of LKCA from both the convolution and attention perspectives, providing equivalent code implementations for each view. Experiments confirm that LKCA implemented from the convolutional and attention perspectives exhibits equivalent performance. We extensively experiment with the LKCA variant of ViT on both classification and segmentation tasks. The experiments demonstrate that LKCA exhibits competitive performance in visual tasks. Our code will be made publicly available at https://github.com/CatworldLee/LKCA.
[Downlink:]http://arxiv.org/abs/2401.05738v1
[GitHub:]https://github.com/CatworldLee/LKCA
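One common way to read "replace attention with a single large-kernel convolution" is a large depthwise convolution applied to the token grid and used as a spatial gate. The sketch below is such an interpretation, not the authors' exact module; the kernel size and the sigmoid gating are assumptions.

```python
import torch
import torch.nn as nn

class LargeKernelConvAttention(nn.Module):
    """Spatial attention approximated by one large depthwise convolution."""
    def __init__(self, dim: int, kernel_size: int = 13):
        super().__init__()
        # Depthwise conv: large receptive field, locality, and parameter sharing.
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)
        self.pointwise = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map of patch tokens reshaped to a grid.
        attn = self.pointwise(self.dwconv(x))
        return x * torch.sigmoid(attn)  # gate the input with the conv "attention"

x = torch.randn(2, 64, 14, 14)
y = LargeKernelConvAttention(64)(x)
print(y.shape)  # torch.Size([2, 64, 14, 14])
```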
Title: Recurrent Generic Contour-based Instance Segmentation with Progressive Learning
Authors: Hao Feng, Keyi Zhou, Wengang Zhou
Abstract: Contour-based instance segmentation has been actively studied, thanks to its flexibility and elegance in processing visual objects within complex backgrounds. In this work, we propose a novel deep network architecture, i.e., PolySnake, for generic contour-based instance segmentation. Motivated by the classic Snake algorithm, the proposed PolySnake achieves superior and robust segmentation performance with an iterative and progressive contour refinement strategy. Technically, PolySnake introduces a recurrent update operator to estimate the object contour iteratively. It maintains a single estimate of the contour that is progressively deformed toward the object boundary. At each iteration, PolySnake builds a semantically rich representation of the current contour and feeds it to the recurrent operator for further contour adjustment. Through the iterative refinements, the contour progressively converges to a stable status that tightly encloses the object instance. Beyond general instance segmentation, extensive experiments are conducted to validate the effectiveness and generalizability of PolySnake in two additional task scenarios, including scene text detection and lane detection. The results demonstrate that the proposed PolySnake outperforms existing advanced methods on multiple prevalent benchmarks across the three tasks. The code and pre-trained models are available at https://github.com/fh2019ustc/PolySnake
[Downlink:]http://arxiv.org/abs/2301.08898v2
[GitHub:]https://github.com/fh2019ustc/PolySnake
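The iterative refinement described above amounts to maintaining a set of contour vertices and repeatedly predicting per-vertex offsets from vertex-wise features. A simplified, hedged sketch follows; the feature extractor and the offset head here are placeholders for the paper's components, not its actual recurrent operator.

```python
import torch
import torch.nn as nn

class ContourRefiner(nn.Module):
    """Predicts per-vertex offsets from per-vertex features (stand-in for the recurrent operator)."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.offset_head = nn.Sequential(
            nn.Linear(feat_dim + 2, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, contour: torch.Tensor, vert_feat: torch.Tensor) -> torch.Tensor:
        # contour: (N, 2) vertex coordinates; vert_feat: (N, feat_dim) features sampled at vertices.
        offsets = self.offset_head(torch.cat([contour, vert_feat], dim=-1))
        return contour + offsets  # deformed contour estimate

refiner = ContourRefiner()
contour = torch.rand(128, 2)              # initial contour (e.g., around a detected box)
for _ in range(6):                        # iterative, progressive refinement
    vert_feat = torch.randn(128, 64)      # placeholder for features sampled at the vertices
    contour = refiner(contour, vert_feat)
print(contour.shape)  # torch.Size([128, 2])
```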
Title: LinK3D: Linear Keypoints Representation for 3D LiDAR Point Cloud
Authors: Yunge Cui, Yinlong Zhang, Jiahua Dong
Abstract: Feature extraction and matching are basic parts of many robotic vision tasks, such as 2D or 3D object detection, recognition, and registration. As is known, 2D feature extraction and matching have already achieved great success. Unfortunately, in the field of 3D, current methods may fail to support the extensive application of 3D LiDAR sensors in robotic vision tasks due to their poor descriptiveness and inefficiency. To address this limitation, we propose a novel 3D feature representation method: a Linear Keypoints representation for 3D LiDAR point clouds, called LinK3D. The novelty of LinK3D lies in the fact that it fully considers the characteristics (such as the sparsity and complexity) of LiDAR point clouds and represents each keypoint with its robust neighbor keypoints, which provide strong constraints in the description of the keypoint. The proposed LinK3D has been evaluated on three public datasets, and the experimental results show that our method achieves great matching performance. More importantly, LinK3D also shows excellent real-time performance, faster than the 10 Hz sensor frame rate of a typical rotating LiDAR sensor. LinK3D takes only about 30 milliseconds on average to extract features from a point cloud collected by a 64-beam LiDAR, and merely about 20 milliseconds to match two LiDAR scans when executed on a computer with an Intel Core i7 processor. Moreover, our method can be extended to the LiDAR odometry task and shows good scalability. We release the implementation of our method at https://github.com/YungeCui/LinK3D.
[Downlink:]http://arxiv.org/abs/2206.05927v3
[GitHub:]https://github.com/YungeCui/LinK3D
Title: DC-Net: Divide-and-Conquer for Salient Object Detection
Authors: Jiayi Zhu, Xuebin Qin, Abdulmotaleb Elsaddik
Abstract: In this paper, we introduce Divide-and-Conquer into the salient object detection (SOD) task to enable the model to learn prior knowledge for predicting the saliency map. We design a novel network, the Divide-and-Conquer Network (DC-Net), which uses two encoders to solve different subtasks that are conducive to predicting the final saliency map, namely predicting edge maps of width 4 and location maps of salient objects, and then aggregates the feature maps with different semantic information into the decoder to predict the final saliency map. The decoder of DC-Net consists of our newly designed two-level Residual nested-ASPP (ResASPP^2) modules, which can capture a large number of features at different scales with a small number of convolution operations and have the advantages of maintaining high resolution throughout while obtaining a large and compact effective receptive field (ERF). Based on the advantage of Divide-and-Conquer's parallel computing, we use Parallel Acceleration to speed up DC-Net, allowing it to achieve competitive performance on six LR-SOD and five HR-SOD datasets with high efficiency (60 FPS and 55 FPS). Code and results are available at: https://github.com/PiggyJerry/DC-Net.
[Downlink:]http://arxiv.org/abs/2305.14955v3
[GitHub:]https://github.com/PiggyJerry/DC-Net