- Voyager is the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world.
- Three main components:
- Automatic Curriculum → maximizes exploration
- ever-growing skill library → storing and retrieving complex behaviors → how is it able to handle long-context memory? → how is the information retrieved, where are the skills getting stored? how effective is the search?
- iterative prompting mechanism → self-verification, execution errors, environment feedback
- interacts with GPT-4 via blackbox queries → bypasses the need for model parameter fine-tuning → how?
- Strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft
- Faster than prior SOTA → how?
- Able to learn in a new Minecraft world from scratch

- Objective
- Propose suitable tasks based on its current skill level and world state
- Refine skills based on feedback
- Commit mastered skills to memory for future reuse in similar situations
- Continually explore the world and seek tasks in a self-driven manner
- Tasks are proposed by the automatic curriculum
- Voyager keeps checking whether the action code it produces is correct; when the error logs show errors, it corrects itself, then the self-verification module confirms task completion and the task is added to the skill library
- Voyager demonstrates strong in-context lifelong learning capabilities
Automatic Curriculum
- The input prompt to GPT-4 contains:
- (1) Directives encouraging diverse behaviors and imposing constraints, such as "My ultimate goal is to discover as many diverse things as possible ... The next task should not be too hard since I may not have the necessary resources or have learned enough skills to complete it yet.";
- (2) The agent’s current state, including inventory, equipment, nearby blocks and entities, biome, time, health and hunger bars, and position;
- (3) Previously completed and failed tasks, reflecting the agent’s current exploration progress and capabilities frontier;
- (4) Additional context: We also leverage GPT-3.5 to self-ask questions based on the agent’s current state and exploration progress and self-answer questions. We opt to use GPT-3.5 instead of GPT-4 for standard NLP tasks due to budgetary considerations.
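Assembling these four prompt components can be sketched as plain string construction; the function name, field keys, and wording below are illustrative assumptions, not the paper's actual code.

```python
def build_curriculum_prompt(state, completed, failed, qa_context):
    """Assemble the automatic-curriculum prompt from the four components
    listed above. All field names and wording are illustrative."""
    directives = (
        "My ultimate goal is to discover as many diverse things as possible... "
        "The next task should not be too hard since I may not have the "
        "necessary resources or have learned enough skills to complete it yet."
    )
    agent_state = (
        f"Inventory: {state['inventory']}; Equipment: {state['equipment']}; "
        f"Biome: {state['biome']}; Position: {state['position']}"
    )
    progress = (
        f"Completed tasks: {', '.join(completed) or 'none'}; "
        f"Failed tasks: {', '.join(failed) or 'none'}"
    )
    # qa_context: self-asked/self-answered questions from a cheaper model (GPT-3.5)
    return "\n\n".join([directives, agent_state, progress, qa_context])
```

GPT-4 would then be asked to propose the next task given this prompt; the cheaper GPT-3.5 only fills in the question-answer context.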
Skill Library
- The input prompt to GPT-4 consists of the following components:
- (1) Guidelines for code generation, such as “Your function will be reused
for building more complex functions. Therefore, you should make
it generic and reusable.”;
- (2) Control primitive APIs, and relevant skills retrieved from the skill library, which are crucial for in-context learning [36–38] to work well;
- (3) The generated code from the last round, environment feedback, execution errors, and critique, based on which GPT-4 can self-improve (Sec. 2.3);
- (4) The agent’s current state, including inventory, equipment, nearby blocks and entities, biome, time, health and hunger bars, and position;
- Chain-of-thought prompting
- The program is iteratively refined through a novel iterative prompting mechanism; once it passes self-verification, we incorporate it into the skill library as a new skill and index it by the embedding of its description. For skill retrieval, we query the skill library with the embedding of self-generated task plans and environment feedback
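A minimal sketch of this index-and-retrieve scheme, using a toy bag-of-words vector and cosine similarity in place of a real embedding model (the class and function names are hypothetical, not the paper's implementation):

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SkillLibrary:
    """Index each skill's code by the embedding of its description;
    retrieve the top-k skills most similar to a query built from the
    self-generated task plan and environment feedback."""
    def __init__(self):
        self.skills = []  # list of (description, embedding, code)

    def add(self, description, code):
        self.skills.append((description, embed(description), code))

    def retrieve(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.skills, key=lambda s: cosine(q, s[1]), reverse=True)
        return [(d, c) for d, _, c in ranked[:k]]
```

For example, querying with "craft a stone pickaxe" would rank a stored "craft a wooden pickaxe" skill above an unrelated combat skill, since retrieval is purely by description similarity.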
- (1) Environment feedback, which illustrates the intermediate progress of program execution (Fig. 5, left). For example, "I cannot make an iron chestplate because I need: 7 more iron ingots" highlights the cause of failure in crafting an iron chestplate. We use bot.chat() inside control primitive APIs to generate environment feedback and prompt GPT-4 to use this function as well during code generation;
- (2) Execution errors from the program interpreter that reveal any invalid operations or syntax
errors in programs, which are valuable for bug fixing (Fig. 5, right);
- (3) Self-verification for checking task success. Instead of manually coding success checkers for each new task proposed by the automatic curriculum, we instantiate another GPT-4 agent for self-verification. By providing VOYAGER's current state and the task to GPT-4, we ask it to act as a critic [47–49] and inform us whether the program achieves the task. In addition, if the task fails, it provides a critique by suggesting how to complete the task (Fig. 6). Hence, our self-verification is more comprehensive than self-reflection [30] by both checking success and reflecting on mistakes → isn't adding so many GPT agents cost-ineffective? how are we tackling the cost here?
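The refinement loop described above (generate code → execute → collect environment feedback, execution errors, and critique → regenerate) can be sketched as follows; the callable parameters stand in for GPT-4 queries and the Minecraft environment, and all names are illustrative assumptions:

```python
def iterative_prompting(task, generate_code, execute, self_verify, max_rounds=4):
    """Refine a program until the critic confirms success or the round
    budget runs out. Returns (code, success)."""
    feedback, code = None, None
    for _ in range(max_rounds):
        code = generate_code(task, feedback)   # GPT-4 code generation
        env_feedback, errors = execute(code)   # run the program in Minecraft
        success, critique = self_verify(task)  # second GPT-4 agent acts as critic
        if success:
            return code, True                  # caller commits it to the skill library
        # Feed environment feedback, execution errors, and critique back in.
        feedback = (env_feedback, errors, critique)
    return code, False
```

On success the refined program would be added to the skill library; on repeated failure the curriculum can move on and revisit the task later.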
Experiments
- Prompting algorithms followed
- ReAct [29] uses chain-of-thought prompting [46] by generating both reasoning traces and action plans with LLMs. We provide it with our environment feedback and the agent states as observations.
- Reflexion [30] is built on top of ReAct [29] with self-reflection to infer more intuitive future actions. We provide it with execution errors and our self-verification module.
- AutoGPT [28] is a popular software tool that automates NLP tasks by decomposing a high-level goal into multiple subgoals and executing them in a ReAct-style loop. We re-implement AutoGPT by using GPT-4 to do task decomposition and provide it with the agent states, environment feedback,
and execution errors as observations for subgoal execution. Compared with VOYAGER, AutoGPT lacks the skill library for accumulating knowledge, self-verification for assessing task success, and automatic curriculum for open-ended exploration.
- Voyager performs better with zero-shot generalization to unseen tasks with the help of the skill library and GPT-4
Ablation Studies
- There are six design choices: automatic curriculum, skill library, environment feedback, execution errors, self-verification, and GPT-4 for code generation