Notes on reading "The Rise and Potential of Large Language Model Based Agents: A Survey"

These are my notes on the September 2023 paper "The Rise and Potential of Large Language Model Based Agents: A Survey".

GitHub: https://github.com/WooooDyy/LLM-Agent-Paper-List

1 Introduction

The paper defines an AI agent as follows.

Typically in AI, an agent refers to an artificial entity capable of perceiving its surroundings using sensors, making decisions, and then taking actions in response using actuators

In short, going by a ChatGPT (GPT-3.5) translation: in AI, an agent typically means an artificial entity that senses its environment with sensors, makes decisions, and acts in response through actuators.

The concept of agents originated in Philosophy

The concept of an agent is said to have originated in philosophy.

2 Background

2.1 Origin of AI Agent

In a general sense, an “agent” is an entity with the capacity to act

In general, an agent is an entity that has the capacity to act.

While in a narrow sense, “agency” is usually used to refer to the performance of intentional actions; and correspondingly, the term “agent” denotes entities that possess desires, beliefs, intentions, and the ability to act

In the narrow sense, an agent is an entity that has desires, beliefs, and intentions, along with the ability to act.

Importantly, the concept of an agent involves individual autonomy, granting them the ability to exercise volition, make choices, and take actions, rather than passively reacting to external stimuli.

The key point is that each agent is individually autonomous, choosing and acting of its own volition.

2.2 Technological Trends in Agent Research

The paper describes how AI agents have developed through several approaches, listed below.

Symbolic Agents
Reactive agents
Reinforcement learning-based agents
Agents with transfer learning and meta learning
Large language model-based agents

2.3 Why is LLM suitable as the primary component of an Agent’s brain?

The paper explains why an LLM is suitable as an agent's brain from the following perspectives.

Autonomy
Reactivity
Pro-activeness
Social ability

3 The Birth of An Agent: Construction of LLM-based Agents

Inspired by this, we present a general conceptual framework of an LLM-based agent composed of three key parts: brain, perception, and action (see Figure 2).

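The paper proposes a framework for LLM-based agents made up of three parts: brain, perception, and action.

This is quite close to what I had in mind as the basic structure of an LLM-based agent.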
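To make the brain / perception / action split concrete, here is a minimal sketch in Python. It is not code from the paper: the class names, the `step` method, the `memory` list inside the brain, and the `echo_llm` function (a toy stand-in for a real model call) are all my own assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Perception:
    """Turns raw input from the environment into text the brain can read."""

    def perceive(self, raw_input: str) -> str:
        return raw_input.strip()


@dataclass
class Brain:
    """Decides what to do next. `llm` is a hypothetical stand-in for a model call."""

    llm: Callable[[str], str]
    memory: List[str] = field(default_factory=list)  # assumption: simple history

    def decide(self, observation: str) -> str:
        self.memory.append(observation)
        prompt = "\n".join(self.memory)
        return self.llm(prompt)


@dataclass
class Action:
    """Carries out the brain's decision. Here it only records the decision."""

    log: List[str] = field(default_factory=list)

    def act(self, decision: str) -> str:
        self.log.append(decision)
        return decision


@dataclass
class Agent:
    perception: Perception
    brain: Brain
    action: Action

    def step(self, raw_input: str) -> str:
        # One pass through the loop: perceive, decide, then act.
        observation = self.perception.perceive(raw_input)
        decision = self.brain.decide(observation)
        return self.action.act(decision)


def echo_llm(prompt: str) -> str:
    """Toy replacement for an LLM: replies to the last line of the prompt."""
    lines = prompt.splitlines()
    last = lines[-1] if lines else ""
    return "reply to: " + last


agent = Agent(Perception(), Brain(llm=echo_llm), Action())
print(agent.step("  hello  "))  # prints "reply to: hello"
```

Swapping `echo_llm` for a function that calls an actual model would leave the rest of the structure unchanged, which is roughly what makes the three-part split appealing to me.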

3.1 Brain

3.1.2 Knowledge

The paper organizes the required knowledge into the following three types.

Linguistic knowledge
Commonsense knowledge
Professional domain knowledge

4 Agents in Practice: Harnessing AI for Good

Assist users in breaking free from daily tasks and repetitive labor, thereby Alleviating human work pressure and enhancing task-solving efficiency.

No longer necessitates users to provide explicit low-level instructions. Instead, the agent can independently analyze, plan, and solve problems.

After freeing users’ hands, the agent also liberates their minds to engage in exploratory and innovative work, realizing their full potential in cutting-edge scientific fields.

The paper gives these three directions as the goals AI agents should achieve.

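Going by a ChatGPT (GPT-3.5) translation, they amount to the following.

Help users get free of daily tasks and repetitive labor, easing the pressure of human work and improving the efficiency of task solving.

Users no longer need to give explicit low-level instructions. Instead, the agent can analyze, plan, and solve problems on its own.

Having freed users' hands, the agent also frees their minds, so that they can take on exploratory and innovative work and realize their full potential in cutting-edge scientific fields.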

4.2 Coordinating Potential of Multiple Agents

4.2.1 Cooperative Interaction for Complementarity

However, during MetaGPT’s practical exploration, a potential threat to multi-agent cooperation has been identified. Without setting corresponding rules, frequent interactions among multiple agents can amplify minor hallucinations indefinitely.

Apparently there were cases where multiple agents interacting with each other amplified small hallucinations.

Closing thoughts

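I read the paper through to the end. What I found most valuable were the early parts: the history and definition of AI agents, and the framework for their basic structure.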
