Method Overview
We investigate leveraging large language models (LLMs) within a multi-agent framework to enhance the decision-making capability of autonomous driving systems. This study develops a multi-agent autonomous driving decision-making framework based on LLMs that simulates the human decision-making process and exploits the LLM's own reasoning ability to form knowledge-driven driving, thereby improving the safety and efficiency of LLM agents in relatively complex environments. The framework consists of five modules: an environment module, a multi-agent interaction module, a multi-step planning module, a shared-memory module, and a ranking-based reflection module. The specific research contents are as follows:

(1) Environment module: Create a realistic autonomous driving simulation scenario for highway ramp merging, and convert the simulation images and data into a textual description of the scene that serves as part of the LLM's input (a minimal sketch follows this list).

(2) Multi-agent interaction module: By analyzing the historical behavior and real-time state information of surrounding vehicles to infer their potential intentions, each agent formulates a series of subsequent action plans, achieving implicit interaction similar to that of human drivers (see the intention-inference sketch below).

(3) Multi-step planning module: Build a three-layer progressive chain of thought, goal - plan - action, so that the LLM reasons step by step and layer by layer about complex scenes and arrives at a final action decision (see the prompting sketch below).

(4) Shared-memory module: Maintain a unified vector database accessible to all LLM agents, ensuring consistency of experience and performance across agents; this is analogous to the parameter-sharing mechanism among agents in multi-agent reinforcement learning and achieves similar benefits (see the shared-store sketch below).

(5) Ranking-based reflection module: Employ specific metrics to quantify the safety and efficiency of the vehicle's state after each action decision is executed; after the scenario concludes, a reflection agent revises low-scoring, erroneous decisions, and the revised decisions, together with the high-scoring ones, are stored in the shared-memory module for collective learning and improvement (see the reflection sketch below).
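To make the environment module concrete, the sketch below shows one way to render numeric simulator state as text for the LLM prompt. The VehicleState fields and the describe_scene wording are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass

@dataclass
class VehicleState:
    vid: int          # vehicle id (0 = ego)
    lane: int         # lane index
    position: float   # longitudinal position in meters
    speed: float      # speed in m/s

def describe_scene(ego: VehicleState, others: list[VehicleState]) -> str:
    """Convert numeric simulator state into a textual scene description."""
    lines = [f"Ego vehicle: lane {ego.lane}, position {ego.position:.1f} m, "
             f"speed {ego.speed:.1f} m/s."]
    # Describe surrounding vehicles, nearest first, relative to the ego vehicle.
    for v in sorted(others, key=lambda v: abs(v.position - ego.position)):
        gap = v.position - ego.position
        rel = "ahead of" if gap >= 0 else "behind"
        lines.append(f"Vehicle {v.vid}: lane {v.lane}, {abs(gap):.1f} m {rel} ego, "
                     f"speed {v.speed:.1f} m/s.")
    return "\n".join(lines)

if __name__ == "__main__":
    ego = VehicleState(0, lane=1, position=150.0, speed=25.0)
    others = [VehicleState(1, 0, 170.0, 22.0), VehicleState(2, 1, 120.0, 27.0)]
    print(describe_scene(ego, others))
```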
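For the multi-agent interaction module, intention inference could itself be delegated to the LLM; the rule-based stand-in below (reusing VehicleState from the previous sketch) only shows the input/output contract: a short state history in, a textual intention out. The thresholds are arbitrary placeholders.

```python
def infer_intention(history: list[VehicleState]) -> str:
    """Heuristically infer a vehicle's likely intention from its recent
    state history; the framework feeds such inferences, as text, into the
    ego agent's prompt."""
    if len(history) < 2:
        return "intention unknown (insufficient history)"
    prev, recent = history[0], history[-1]
    if recent.lane != prev.lane:
        return "changing lanes"
    dv = recent.speed - prev.speed
    if dv > 1.0:               # placeholder threshold in m/s
        return "accelerating, unlikely to yield"
    if dv < -1.0:
        return "decelerating, possibly yielding to merging traffic"
    return "keeping lane at a steady speed"
```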
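The multi-step planning module's goal - plan - action chain can be realized as three successive LLM calls, each conditioned on the previous layer's output, as in the sketch below. The prompt wording, the `llm` text-completion callable, and the five-action set (in the style of common highway simulators) are assumptions for illustration.

```python
from typing import Callable

def decide(scene: str, memories: str, llm: Callable[[str], str]) -> str:
    """Three-layer goal -> plan -> action reasoning chain.
    `llm` is any text-in/text-out completion function."""
    goal = llm(
        f"Scene:\n{scene}\n\nRelevant past experience:\n{memories}\n\n"
        "Step 1 (Goal): state the ego vehicle's immediate driving goal, "
        "e.g. merge onto the highway safely before the ramp ends."
    )
    plan = llm(
        f"Scene:\n{scene}\nGoal: {goal}\n\n"
        "Step 2 (Plan): outline a short sequence of maneuvers that achieves the goal."
    )
    action = llm(
        f"Scene:\n{scene}\nGoal: {goal}\nPlan: {plan}\n\n"
        "Step 3 (Action): choose exactly one action for this timestep from "
        "[LANE_LEFT, IDLE, LANE_RIGHT, FASTER, SLOWER]."
    )
    return action
```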
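A minimal sketch of the shared-memory module follows, assuming an `embed` function that maps text to a fixed-size numpy vector (any sentence-embedding model would do). Because every agent holds the same SharedMemory instance, an experience written by one agent is immediately retrievable by all others, which is what makes the design analogous to parameter sharing in multi-agent reinforcement learning.

```python
import numpy as np

class SharedMemory:
    """A unified vector store shared by all LLM agents."""

    def __init__(self, embed):
        self.embed = embed                    # text -> np.ndarray embedding function
        self.keys: list[np.ndarray] = []
        self.entries: list[str] = []

    def add(self, scene: str, decision: str) -> None:
        """Store a scene/decision pair keyed by the scene's embedding."""
        self.keys.append(self.embed(scene))
        self.entries.append(f"Scene: {scene}\nDecision: {decision}")

    def retrieve(self, scene: str, k: int = 3) -> list[str]:
        """Return the k stored experiences most similar to the query scene."""
        if not self.keys:
            return []
        q = self.embed(scene)
        sims = [float(q @ key / (np.linalg.norm(q) * np.linalg.norm(key) + 1e-8))
                for key in self.keys]
        top = np.argsort(sims)[::-1][:k]      # indices of highest cosine similarity
        return [self.entries[i] for i in top]
```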
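Finally, a sketch of the ranking-based reflection loop. The safety and efficiency metrics, their equal weighting, and the 0.5 threshold are illustrative assumptions; the paper's actual scoring may differ. Low-scoring decisions are revised by the reflection LLM before being stored, while high-scoring ones are stored as-is, so the shared memory accumulates both corrected and successful experiences.

```python
from typing import Callable

def reflect_and_store(episode: list[tuple[str, str, dict]],
                      memory: "SharedMemory",
                      llm: Callable[[str], str],
                      threshold: float = 0.5) -> None:
    """After an episode, score each (scene, decision, outcome) triple,
    revise the low-scoring decisions via the reflection LLM, and store
    everything in the shared memory."""
    for scene, decision, outcome in episode:
        # Illustrative metrics: gap to the nearest vehicle (safety) and
        # speed relative to the limit (efficiency), each normalized to [0, 1].
        safety = min(outcome["min_gap"] / 30.0, 1.0)
        efficiency = min(outcome["speed"] / outcome["speed_limit"], 1.0)
        score = 0.5 * safety + 0.5 * efficiency
        if score < threshold:
            # Ask the reflection LLM to correct the erroneous decision.
            decision = llm(
                f"Scene:\n{scene}\nOriginal decision: {decision}\n"
                f"Outcome: min gap {outcome['min_gap']:.1f} m, "
                f"speed {outcome['speed']:.1f} m/s.\n"
                "This decision scored poorly on safety/efficiency. "
                "Give a corrected decision and a one-line lesson."
            )
        memory.add(scene, decision)
```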