The 🤗 AI Agents Course: A review
As a data scientist or LLM enthusiast, you hear and read about agents everywhere now. Unfortunately, not everyone has the same idea when they talk about agents… When concepts are unclear, building a deep understanding by taking a course is often the best thing to do — to deepen your own knowledge of and thoughts on the topic. Are you planning to take the 🤗 AI Agents Course and wondering what to expect? Or did you take it and are you wondering what other participants got out of it? In that case I hope you enjoy this review of unit 1. Looking forward to your comments, experiences and reflections!
I ❤️🤗
As a data science & AI practitioner, I love Hugging Face for what they have contributed to the field over the past few years. Their role in the speed at which the community is able to learn and build on each other’s work cannot be overstated. And they keep on adding!
Therefore, as a huge 🤗 fan, I also immediately joined the waiting list for the AI Agents Course the minute it was introduced. I’ve been building pipelines with LLMs for some time now, and I run into more and more examples where making things more ‘agentic’ is the logical next step. There are dozens of examples out there to build upon, but first gathering a strong knowledge base is, in my experience, always a good idea. By understanding the language and concepts, you can better communicate with peers and ask informed questions, to them or to Copilot. Aside from that, I really enjoyed the ride of most of the Coursera (early days), Udemy and deeplearning.ai courses I took over the years. So bring it on!
In the next sections I will go through the contents of unit 1, in which the most important concepts around agents are introduced, and briefly discuss my key takeaways and experiences.
What is an Agent?
One of the main reasons for participating in this course is to get a firm conceptual understanding of what defines an Agent and what it is not. As the (gen)AI field is developing so fast and so many people have joined the data science field in recent years, some common ground on concepts is essential to understand what pioneers and peers mean when they talk about AI agents. Everything is called AI nowadays and I (regretfully) sense the same is happening to Agents… I kept on asking myself: when does a solution or pipeline that includes an LLM become an actual agent? And when is it not really agentic, so better not to call it that? The definition used in the course helps to make a better distinction.
An agent is an AI model capable of reasoning, planning, and interacting with its environment.
This simple but complete definition tells the story of AI agents. The emphasis is on the so-called agency of it: the ability of the agent to interact with its environment. That ability is reflected in the execution of actions the agent can perform, often via external tools. The analogy of The Brain — to think and plan — and The Body — to (inter)act — helps in understanding the essence of AI Agents. In short, Large Language Models (LLMs) are mainly used as the AI model that represents the brain — although alternatives like Vision Language Models (VLMs) are available. And depending on the tools we equip our agent with — the body — the sky is the limit in how our agent can interact with the environment.
An Agent can perform any task we implement via Tools to complete Actions.
For those who have been using LLMs for some time now, this is the essence of what agents bring to the table: the ability to interact with the environment via Tools. An isolated LLM can reason and can answer based on information in its training data, but it is limited to that knowledge and has no means to interact with the environment — not without an Agent (a body).
The next section of unit 1 is on LLMs. It’s a nice, dense intro or refresher — the encoder/decoder models, tokenization, the attention mechanism… Some clear visualisations and links to other 🤗 resources for more details. Towards building an agent, it is important to understand how LLMs are instructed via different types of messages, like system messages and user messages, and how conversation history is treated.
Details on messages are covered in the next section of unit 1, on messages and special tokens, which are used to help the LLM distinguish between text that comes from the user and text that contains generic instructions or history. The essence is that each time ‘the Brain’ is used (the LLM gets an input to complete), it should receive input that contains all the information needed to complete the text and respond: the brain has no memory! Put differently, the LLM doesn’t keep track of what it was asked before; the new input text to complete therefore has to contain all the information needed to answer correctly. Special tokens and chat templates are crucial in helping the LLM distinguish between the generic task it has, the conversation history to build upon and the user input provided.
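To make the ‘no memory’ point concrete, here is a minimal sketch of how a full conversation — system message included — is flattened into a single prompt string on every call, using the apply_chat_template method from 🤗 transformers (the model name is just an example; any instruct model with a chat template works similarly):

from transformers import AutoTokenizer

# Example model; its chat template defines the special tokens used below.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is an AI agent?"},
    {"role": "assistant", "content": "An AI model that can reason, plan and act."},
    {"role": "user", "content": "And what makes it 'agentic'?"},
]

# The chat template inserts the model-specific special tokens and turns the
# whole history into one prompt string — the LLM itself remembers nothing.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)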
And specifically for Agents, the information the LLM receives about the Tools the agent has available to use is crucial. This information needs to be presented to the LLM via the system message. How to do so is the topic of the next section of unit 1, on Tools. In short:
A Tool is a function given to the LLM
Tools come in many forms. A valuable tool does something the LLM could not do on its own: it complements the LLM — using information the LLM could not have been trained on (up-to-date data, internal sources) or performing tasks an LLM is not the best at (calculations, queries, generating images, …). It is vital that the LLM receives clear textual information about the tool: a generic description, something to call (like a function), arguments, and the types of its inputs and outputs. As the LLM can only generate text as output, it needs to understand when to use which of the available tools, how to prepare the input for the tool and what to do with the output.
Although so far in the course there has been some example Python code, the next part was for me the first piece of code I carefully inspected:
@tool
def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

print(calculator.to_string())
# Tool Name: calculator, Description: Multiply two integers.,
# Arguments: a: int, b: int, Outputs: int
What are we looking at? With the @tool decorator, a Tool is defined that the LLM can use. The .to_string() method shows what this decorator enables: translating a (well-defined) function into a string that tells the LLM what the function does, how to call it and what to expect to come out of it. A more advanced way to achieve this — by defining a Tool class — is also discussed. It gives more flexibility but does the same.
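As a rough, simplified sketch of the idea behind such a class (names and details here are my own, not the course’s exact code), it could look like this:

import inspect

class Tool:
    def __init__(self, name: str, description: str, func):
        self.name = name
        self.description = description
        self.func = func

    def __call__(self, *args, **kwargs):
        # The agent executes the wrapped function through the Tool.
        return self.func(*args, **kwargs)

    def to_string(self) -> str:
        # Render the tool as text the LLM can read in its system message.
        return (f"Tool Name: {self.name}, Description: {self.description}, "
                f"Arguments: {inspect.signature(self.func)}")

def multiply(a: int, b: int) -> int:
    return a * b

calculator = Tool("calculator", "Multiply two integers.", multiply)
print(calculator.to_string())
# Tool Name: calculator, Description: Multiply two integers., Arguments: (a: int, b: int) -> int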
Thought: ReAct vs Thought-Action-Observation
Now that we know how to instruct the Agent (LLM) to use external Tools, the next section of unit 1 elaborates on the continuous Thought-Action-Observation cycle agents operate in. Thought is about deciding on the next steps, Action concerns the Tool actions to perform, and Observation regards processing the outcome of those actions.
What was somewhat confusing to me at first was the introduction of the ReAct (Reasoning-Acting) approach in the discussion of the Thought-Action-Observation cycle: first we look at the agent operating in Think-Act-Observe cycles, and now, within that, a Reasoning-Acting approach is introduced. I’m probably not the only one who got a bit confused by the apparent similarities (Thought-Action-Observation <> Reasoning-Acting). At second glance it becomes clearer: the ReAct approach is relevant within the Thought step. It encourages the Brain (LLM) not to jump to conclusions but to think carefully and, if possible, break the task down into steps. This approach helps overcome the problem of ‘greedy’ LLMs that jump to conclusions too fast, by creating time for the LLM to think more carefully. As such, ReAct boosts the Thought step to come up with a better plan to turn into Actions and Observations. Recent models like DeepSeek-R1 and OpenAI’s o1 have similar reasoning strategies incorporated in the model to boost performance.
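As a toy illustration of the cycle itself (this is not the course’s code — fake_llm simply scripts two turns so the loop can actually run), the whole thing boils down to something like this:

def fake_llm(history: str) -> str:
    # Stand-in for a real LLM call; scripted so the example is runnable.
    if "Observation:" not in history:
        return "Thought: I should multiply. Action: calculator 6 7"
    return "Thought: I have the result. Final Answer: 42"

def run_agent(task: str, tools: dict, max_steps: int = 5) -> str:
    history = f"Task: {task}"
    for _ in range(max_steps):
        # Thought: the LLM reasons about the next step (this is where ReAct lives)
        response = fake_llm(history)
        if "Final Answer:" in response:
            return response.split("Final Answer:")[-1].strip()
        # Action: execute the tool call the LLM asked for
        name, a, b = response.split("Action: ")[-1].split()
        observation = tools[name](int(a), int(b))
        # Observation: append the result so the next thought can build on it
        history += f"\n{response}\nObservation: {observation}"
    return "No answer within the step budget."

print(run_agent("What is 6 times 7?", {"calculator": lambda a, b: a * b}))
# -> 42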
Actions and three (or two?) types of Agents
The next cycle step concerns the essence of agents — Actions:
Actions are the concrete steps an AI agent takes to interact with its environment.
Immediately a distinction is made between different types of agent actions — JSON agents, code agents and function-calling agents — which makes things a bit more complex. It’s debatable whether this distinction really helps — conceptually it suffices to conclude that there are numerous ways agents can interact with the environment: depending on how they are set up, actions are triggered by providing tools with formatted JSON input, by letting the LLM generate code that then needs to be executed, or with function input generated by the LLM.
If this distinction is not immediately clear to you: it also wasn’t for me… In the rest of the section it became a bit clearer. More generally speaking, there are two types of agents: those that generate input for an external tool (JSON formatted or otherwise suitable for the receiving function) and those that generate ‘an executable code block — typically in a high-level language like Python.’ The latter type is referred to as Code Agents, the former as JSON Agents / Function-calling Agents. In short: ‘Actions bridge an agent’s internal reasoning and its real-world interactions by executing clear, structured tasks — whether through JSON, code, or function calls.’
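To illustrate the difference with hypothetical LLM outputs (the exact format depends on the framework and the system prompt, not on any fixed standard):

# A JSON / function-calling agent emits structured arguments for a named tool,
# which the framework parses and then executes:
json_action = '{"tool": "calculator", "arguments": {"a": 6, "b": 7}}'

# A Code Agent instead emits an executable code block that the framework runs:
code_action = """
result = calculator(a=6, b=7)
print(result)
"""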
Observations — as actions have consequences
The last step — and the last section of unit 1 before an example is shown — is about the third Thought-Action-Observation step: Observations. ‘Observations are how an Agent perceives the consequences of its actions.’ In this step, the Agent (1) collects the output of the action, (2) adds the results to the existing information and (3) adapts the strategy for subsequent thoughts and actions. The Agent decides whether additional information — and therefore a new cycle run — is needed, or if it’s ready to provide a final answer.
The first unit continues with a tedious example — as the authors named it themselves — just to show that building agents is complex without using one of the many frameworks that make it much more feasible to build one. After that a first real example is shown, using the smolagents library to build your own agent. It’s a nice way to see a first simple Agent in action, to get familiar with the smolagents library and some of its predefined tools, and to experience how to add a custom tool. All of this is a first exploration, getting you ready to go into much more detail of designing your own Agents in the units to follow.
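For reference, that first smolagents example boils down to a few lines like these (treat this as a sketch — the model class name may differ between smolagents versions):

from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# A Code Agent with one predefined tool: web search via DuckDuckGo.
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")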
Unit 1 is concluded with a quiz that checks your understanding. Don’t worry, it’s not that difficult. If you value the certificate, make sure you download and store it at the end — I don’t think you can download it later without retaking the quiz.
Unit 1: Wrapping it up
In total, this first unit contains many valuable concepts that are important to understand when working with agents. For several reasons, it is advisable to go through this unit at your own pace, and to revisit the material once more after you have reached the end of it. First of all, many concepts are discussed and many come with interesting links to other courses or sources. Secondly, at least for me, the relationship between some of the concepts discussed was not immediately clear. Cycles have nested approaches, action types and agent types are intertwined, and the distinction between JSON Agents, Code Agents and Function-calling Agents was not very intuitive at first; the same goes for the @tool decorator versus the Tool class. For me, things fell into place much better the second time I read through unit 1 — it prepared me much better for the units to come.