AI Agents
An AI agent is an autonomous entity that perceives its environment and takes actions to achieve specific goals. Agents can be software-based or physical robots, and they use artificial intelligence techniques to make decisions and learn.
Here are some key characteristics of AI agents:
Autonomy: They can operate independently without constant human intervention.
Perception: They can gather information from the environment through sensors (like cameras, microphones, or data feeds).
Action: They can take actions in the environment through actuators (like wheels, arms, or software outputs).
Goals: They have specific objectives they aim to achieve.
Learning: Some agents can learn and adapt their behavior based on experience.
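The characteristics above can be made concrete with a minimal sketch. The thermostat agent, its goal temperature, and the environment dict below are illustrative inventions, not a real API:

```python
# A toy perceive-decide-act agent. All names here are hypothetical,
# chosen only to illustrate the characteristics listed above.

class ThermostatAgent:
    def __init__(self, goal_temp: float):
        self.goal_temp = goal_temp  # Goals: the objective the agent pursues

    def perceive(self, environment: dict) -> float:
        # Perception: read a sensor value from the environment
        return environment["temperature"]

    def act(self, temperature: float) -> str:
        # Action + Autonomy: decide without human intervention
        if temperature < self.goal_temp - 0.5:
            return "heat_on"
        if temperature > self.goal_temp + 0.5:
            return "heat_off"
        return "idle"

agent = ThermostatAgent(goal_temp=21.0)
print(agent.act(agent.perceive({"temperature": 18.2})))  # heat_on
```

A learning agent would additionally adjust its decision rule (here, the thresholds) based on feedback over time.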
When are AI Agents Used?
AI agents are used in a wide range of applications, from simple tasks like sorting emails to complex tasks like managing self-driving cars. Here are some common examples:
Customer service: Chatbots and virtual assistants answer customer inquiries, provide support, and complete transactions.
Recommendation systems: Recommend products, movies, or music based on your preferences.
Fraud detection: Analyze financial data to identify suspicious activity.
Personal assistants: Schedule appointments, manage calendars, and remind you of tasks.
Game playing: Play games against humans or other AI agents.
Robotics: Control robots for tasks like manufacturing, logistics, and healthcare.
Scientific research: Analyze data and make predictions in various fields.
Examples of AI Agents
Alexa: A virtual assistant that can answer questions, play music, control smart home devices, and more.
Roomba: A robotic vacuum cleaner that uses AI to navigate your home and avoid obstacles.
DeepMind AlphaFold: An AI program that predicts the 3D structure of proteins from their amino acid sequence.
SIMA: A generalist AI agent from Google DeepMind, announced on 13 March 2024, that follows natural-language instructions across 3D virtual environments.
Tesla Autopilot: An advanced driver-assistance system that uses AI to steer, accelerate, and brake the car.
Duolingo: A language learning app that uses AI to personalize your learning experience.
AI Agent System overview (by Lilian Weng)
In an LLM-powered autonomous agent system, the LLM functions as the agent's brain, complemented by several key components:
Planning
Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The agent can engage in self-criticism and self-reflection over past actions, learn from mistakes, and refine its approach for future steps, thereby improving the quality of the final results.
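The planning loop above can be sketched as follows. The LLM calls are stubbed with plain functions, and all names (`decompose`, `execute`, `run_with_reflection`) are illustrative assumptions, not part of any real framework:

```python
# Hypothetical sketch of planning: decompose a task into subgoals,
# execute each, and retry with reflection on failure.

def decompose(task: str) -> list[str]:
    # In a real system, an LLM would produce these subgoals.
    return [f"{task}: step {i}" for i in (1, 2, 3)]

def execute(subgoal: str) -> bool:
    # Stub executor: pretend every step succeeds.
    return True

def run_with_reflection(task: str, max_retries: int = 2) -> list[str]:
    completed = []
    for subgoal in decompose(task):
        for attempt in range(max_retries + 1):
            if execute(subgoal):
                completed.append(subgoal)
                break
            # Reflection: on failure, a real agent would critique the
            # transcript and refine the subgoal before retrying.
    return completed

print(run_with_reflection("write report"))  # three completed subgoals
```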
Memory
Short-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.
Long-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.
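A minimal sketch of vector-store-backed long-term memory: past observations are embedded, stored, and recalled by similarity to a query. A toy bag-of-words embedding stands in for a real embedding model and vector database; all names here are assumptions:

```python
# Toy long-term memory: store texts as word-count vectors,
# recall the one most similar to a query by cosine similarity.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class LongTermMemory:
    def __init__(self):
        self.store = []  # list of (vector, original text)

    def remember(self, text: str):
        self.store.append((embed(text), text))

    def recall(self, query: str) -> str:
        qv = embed(query)
        return max(self.store, key=lambda item: cosine(qv, item[0]))[1]

memory = LongTermMemory()
memory.remember("user prefers metric units")
memory.remember("meeting scheduled for friday")
print(memory.recall("which units does the user prefer"))
```

A production system would replace `embed` with a learned embedding model and `store` with an indexed vector database for fast retrieval at scale.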
Tool use
The agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.
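Tool use can be sketched as a dispatch step: the model's (stubbed) output names a tool that supplies information missing from its weights. The tool names and dispatch table below are illustrative assumptions:

```python
# Hypothetical tool-use dispatch: the model emits a structured request,
# the agent runtime routes it to the matching tool.
from datetime import date

def current_date(_: str) -> str:
    return date.today().isoformat()   # current information

def run_python(code: str) -> str:
    return str(eval(code))            # code execution (toy; unsafe for untrusted input)

TOOLS = {"current_date": current_date, "run_python": run_python}

def agent_step(model_output: dict) -> str:
    # model_output mimics what an LLM might emit, e.g.
    # {"tool": "run_python", "input": "2 + 2"}
    tool = TOOLS[model_output["tool"]]
    return tool(model_output["input"])

print(agent_step({"tool": "run_python", "input": "2 + 2"}))  # 4
```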
The role of Functions (Google Whitepaper AI Agents, 2024)
Functions, in the context of AI agents, are self-contained modules of code that accomplish a specific task and can be reused as needed.
Key aspects of functions:
Similarity to software engineering: Functions in AI agents operate similarly to functions in software engineering, but the model replaces the software developer in deciding when to use each function.
Model's role: A model can take a set of known functions and decide when to use each function and what arguments the function needs based on its specification [2].
Distinction from Extensions:
A model outputs a Function and its arguments, but doesn’t make a live API call.
Functions are executed on the client-side, while Extensions are executed on the agent-side.
Client-side execution: With Functions, the logic and execution of calling the actual API endpoint is offloaded away from the agent and back to the client-side application.
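The distinction above can be sketched as follows: the model only *emits* a function name and arguments as structured data, and the client application performs the actual call. The model stub, `get_weather`, and the JSON shape below are illustrative assumptions, not the whitepaper's actual API:

```python
# Hypothetical Function pattern: the model returns a function call as
# JSON; the client-side application executes it.
import json

def fake_model(prompt: str) -> str:
    # Stand-in for an LLM given a function specification. It emits a
    # call description rather than making a live API request.
    return json.dumps({"name": "get_weather", "args": {"city": "Berlin"}})

def get_weather(city: str) -> str:
    # Client-side: a real HTTP request to a weather API would happen
    # here, under the application's own auth and timing rules.
    return f"Forecast for {city}: sunny"

CLIENT_FUNCTIONS = {"get_weather": get_weather}

call = json.loads(fake_model("What's the weather in Berlin?"))
result = CLIENT_FUNCTIONS[call["name"]](**call["args"])
print(result)  # Forecast for Berlin: sunny
```

Because the call happens in the client application rather than inside the agent, the developer decides where, when, and under which credentials the API is actually invoked.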
Developer control: Functions offer developers more granular control over the flow of data in the application.
Reasons to use Functions: Developers might choose to use Functions over Extensions for several reasons:
API calls need to be made at another layer of the application stack, outside of the direct agent architecture flow.
Security or Authentication restrictions prevent the agent from calling an API directly.
Timing or order-of-operations constraints prevent the agent from making API calls in real-time.
Additional data transformation logic needs to be applied to the API Response that the agent cannot perform.
The developer wants to iterate on agent development without deploying additional infrastructure for the API endpoints.
Use case example: A model can be used to invoke functions in order to handle complex, client-side execution flows for the end user, where the agent developer might not want the language model to manage the API execution (as is the case with Extensions).
Fine-grained control: Functions offer a framework that empowers application developers with fine-grained control over data flow and system execution, while effectively leveraging the agent/model for critical input generation.
Selective data handling: Developers can choose whether to keep the agent “in the loop” by returning external data to it, or to bypass the agent entirely, depending on the application's architecture requirements.
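The selective-data-handling choice above can be sketched as a single branch: after executing the function client-side, the developer either feeds the result back to the model or returns it straight to the user. The helper names below are illustrative assumptions:

```python
# Hypothetical selective data handling: decide whether the agent
# sees the API result ("in the loop") or the data bypasses it.

def summarize_with_model(data: str) -> str:
    # Stand-in for a second model call that rephrases raw API data.
    return f"Summary: {data}"

def handle_result(raw_api_data: str, keep_agent_in_loop: bool) -> str:
    if keep_agent_in_loop:
        return summarize_with_model(raw_api_data)  # agent sees the data
    return raw_api_data  # data bypasses the model entirely

print(handle_result("temp=21C", keep_agent_in_loop=False))  # temp=21C
```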
References / Sources:
Google Whitepaper, 'Agents' - September 2024
LLM Powered Autonomous Agents - Lil'Log, Lilian Weng, June 2023