Reinforcement Learning
Reinforcement learning (RL) is a type of machine learning in which an agent learns to make decisions by interacting with an environment. The goal is for the agent to take actions that maximize reward over time, commonly measured as the cumulative sum of rewards.
In RL, the agent doesn’t have a direct dataset to learn from; instead, it explores, makes decisions, receives feedback, and learns from the consequences of those actions. Let’s learn more, drawing on robotics!
How It Works
- Agent – The learner or decision maker.
- Environment – Everything the agent interacts with.
- Actions – The choices or decisions the agent makes.
- States – The situation of the agent within the environment.
- Rewards – The feedback the agent receives for its actions; the agent aims to maximize long-term reward.
- Policy – The agent’s strategy for action selection based on state.
- Value Function – A prediction of the future reward from a given state.
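To make these pieces concrete, here is a minimal Python sketch of the agent-environment loop. The ToyEnv class and random_policy function below are made up purely for illustration (they are not from any real library); the point is just to show states, actions, rewards, a policy, and the cumulative reward all in one place.

```python
import random

# A toy environment sketch (not a real simulator): the "robot" starts at
# position 0 and only gets a reward when it reaches position 3.
class ToyEnv:
    def reset(self):
        self.position = 0          # the state
        return self.position

    def step(self, action):        # action: 0 = stay, 1 = move forward
        self.position = min(3, self.position + action)
        reward = 1.0 if self.position == 3 else 0.0
        done = self.position == 3
        return self.position, reward, done

def random_policy(state):
    """A placeholder policy: the agent's rule for picking an action in a state."""
    return random.choice([0, 1])

env = ToyEnv()
for episode in range(3):
    state, done, total_reward = env.reset(), False, 0.0
    while not done:
        action = random_policy(state)            # the agent chooses an action
        state, reward, done = env.step(action)   # the environment responds
        total_reward += reward                   # cumulative reward over time
    print(f"episode {episode}: cumulative reward = {total_reward}")
```

A learning agent would replace random_policy with a strategy that improves as rewards come in; everything else in the loop stays the same.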
The Markov Decision Process (MDP) framework is used to model decision-making in environments where outcomes are partly random and partly influenced by the agent’s actions. To understand, let’s consider the example of Ren – a data dog!
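Here is a tiny sketch of what that looks like in code, using Ren as the running example. The states, actions, probabilities, and rewards below are invented for illustration; the key idea is that each (state, action) pair maps to a probability distribution over next states and rewards.

```python
import random

# A two-state MDP sketched for illustration. Outcomes are partly random:
# "fetch" from "idle" usually succeeds, but sometimes Ren gets distracted
# and stays put. Each entry maps (state, action) to a list of
# (probability, next_state, reward) outcomes.
TRANSITIONS = {
    ("idle", "fetch"):    [(0.8, "carrying", 1.0), (0.2, "idle", 0.0)],
    ("idle", "rest"):     [(1.0, "idle", 0.0)],
    ("carrying", "drop"): [(1.0, "idle", 2.0)],
}

def sample_transition(state, action):
    """Draw the next state and reward according to the MDP's probabilities."""
    outcomes = TRANSITIONS[(state, action)]
    probs = [p for p, _, _ in outcomes]
    p, next_state, reward = random.choices(outcomes, weights=probs)[0]
    return next_state, reward

print(sample_transition("idle", "fetch"))  # usually ('carrying', 1.0)
```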

In the Real World
1. Robotics (e.g., Boston Dynamics) showcases the application of RL in training robots for navigation and locomotion.
A notable example is Boston Dynamics’ use of RL to control the movement of robots like Atlas. This allows the robot to walk, jump, run, and even perform acrobatic feats. Through repeated attempts at tasks, the robot learns optimal movements by adjusting its behavior based on the success or failure of each action.
2. Locomotion & Navigation: RL can enable robots – such as quadrupeds and drones – to walk, run, or fly efficiently.
Since modeling physical dynamics precisely is challenging, RL learns adaptive policies that respond to real-world physics. For example, RL can be applied to autonomous drone navigation in cluttered environments.
3. Human-Robot Interaction (HRI): Where a dog can’t go, a robot may be able to.
Robots are being designed to adapt their behavior based on human feedback or actions, a key factor for personalizing responses and improving cooperation in environments such as homes or hospitals.
Other Areas:
The MDP framework serves as the foundation for many reinforcement learning algorithms, including the value-based methods (i.e., methods that estimate the value of states or state-action pairs) listed below. This is all important for decision making!
1. Q-learning is a model-free reinforcement learning algorithm that estimates the expected rewards of actions in specific states using the Q-function, progressively refining its decisions to follow the optimal policy (a tabular sketch follows this list).

2. Policy Gradient methods optimize a parameterized policy by directly maximizing the expected reward through gradient ascent. Contrast this with Q-learning, which learns a value function and derives the policy from it (a minimal sketch of this also follows the list).
3. A Deep Q Network (DQN) is an extension of Q-learning that leverages deep learning to approximate Q-values in large state-action spaces, where maintaining a Q-table is impractical. So, essentially, it replaces the bulky Q-table with a smarter function (see the sketch after this list)!
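As promised above, here is a minimal tabular Q-learning sketch on a toy "chain" environment: five states in a row, with a reward for reaching the rightmost one. The environment and hyperparameters are invented for illustration; the update itself is the usual Q-learning rule, nudging Q(s, a) toward r + gamma * max Q(s', ·).

```python
import numpy as np

# Toy chain MDP for illustration: states 0..4, start at 0, goal at 4.
# Action 0 = move left, action 1 = move right; reaching the goal pays +1.
N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng(0)

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection: mostly exploit, sometimes explore.
        if rng.random() < EPSILON:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
        target = reward + GAMMA * np.max(Q[next_state]) * (not done)
        Q[state, action] += ALPHA * (target - Q[state, action])
        state = next_state

print(np.round(Q, 2))  # the learned values favour moving right in every state
```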
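And here is a minimal policy gradient (REINFORCE-style) sketch on a made-up three-armed bandit. The policy is a softmax over learned preferences, and the parameters are pushed directly in the direction that makes rewarding actions more likely, with no value function involved.

```python
import numpy as np

# Three-armed bandit used purely for illustration: each arm pays a noisy reward.
TRUE_MEANS = np.array([0.2, 0.5, 0.8])
ALPHA = 0.05
rng = np.random.default_rng(0)

theta = np.zeros(3)  # policy parameters: one preference per arm

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

for t in range(2000):
    probs = softmax(theta)
    action = rng.choice(3, p=probs)
    reward = rng.normal(TRUE_MEANS[action], 0.1)
    # REINFORCE update: the gradient of log pi(action) for a softmax policy
    # is (one-hot(action) - probs); scale it by the received reward.
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    theta += ALPHA * reward * grad_log_pi

print(np.round(softmax(theta), 2))  # probability mass shifts toward the best arm
```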
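Finally, a rough sketch of the core DQN update using PyTorch: a small neural network stands in for the Q-table, and one gradient step moves its predictions toward the Bellman target. The network sizes, dimensions, and dummy batch here are assumptions for illustration; a real DQN would also need a replay buffer and an exploration strategy.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions, purely for illustration.
STATE_DIM, N_ACTIONS = 4, 2
GAMMA = 0.99

# Q-network: maps a state vector to one Q-value per action,
# replacing a row lookup in a tabular Q-table.
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(states, actions, rewards, next_states, dones):
    """One gradient step toward the Bellman target on a batch of transitions."""
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + GAMMA * next_q * (1 - dones)
    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example call on a dummy batch of 8 transitions.
s  = torch.randn(8, STATE_DIM)
a  = torch.randint(0, N_ACTIONS, (8,))
r  = torch.randn(8)
s2 = torch.randn(8, STATE_DIM)
d  = torch.zeros(8)
print(dqn_update(s, a, r, s2, d))
```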

Get training @Yun.Bun I.O!