A Markov Decision Process (MDP) is a mathematical framework for modeling decision-making in reinforcement learning (RL). It formally describes the environment: the agent's states, actions, rewards, and transitions between states. An MDP is defined as a tuple (S, A, T, R, γ) of five components, sketched in code after this list:
- States (S): The possible situations or configurations the agent can find itself in.
- Actions (A): The set of actions the agent can take in each state.
- Transition function (T): The probability T(s' | s, a) of moving to state s' after taking action a in state s.
- Reward function (R): The immediate reward R(s, a) received after performing action a in state s.
- Discount factor (γ): A factor in [0, 1] that models the agent's preference for receiving rewards sooner rather than later.
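To make the five components concrete, here is a minimal sketch of an MDP as plain Python data structures. The two states, the actions, and all probabilities and rewards are hypothetical values chosen only for illustration:

```python
# A minimal sketch of an MDP as plain Python data structures.
# All names and numbers below are hypothetical, chosen only to
# illustrate the five components (S, A, T, R, γ).

states = ["s0", "s1"]        # S: the set of states
actions = ["stay", "move"]   # A: the set of actions
gamma = 0.9                  # γ: discount factor

# T[(s, a)] maps each next state s' to P(s' | s, a); each row sums to 1.
T = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.1, "s1": 0.9},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}

# R[(s, a)]: immediate reward for taking action a in state s.
R = {
    ("s0", "stay"): 0.0,
    ("s0", "move"): 1.0,
    ("s1", "stay"): 2.0,
    ("s1", "move"): 0.0,
}
```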
MDPs are essential in RL because they provide the structure for modeling problems where decisions are made sequentially over time, and future states depend only on the current state and action, not on past events (the Markov property).
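Because of the Markov property, optimal values can be computed by dynamic programming over states alone, with no reference to history. As one illustration, the sketch below runs standard value iteration on the hypothetical MDP defined above, repeatedly applying the Bellman optimality backup V(s) ← max_a [R(s, a) + γ Σ_{s'} T(s' | s, a) V(s')] until the values stop changing (the convergence tolerance is an arbitrary choice):

```python
def value_iteration(states, actions, T, R, gamma, tol=1e-6):
    """Compute optimal state values V*(s) by repeated Bellman backups."""
    V = {s: 0.0 for s in states}   # start with all values at zero
    while True:
        delta = 0.0
        for s in states:
            # One-step lookahead: expected return of each action; keep the best.
            best = max(
                R[(s, a)] + gamma * sum(p * V[s2] for s2, p in T[(s, a)].items())
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # values have (numerically) converged
            return V

# Reusing the hypothetical states/actions/T/R/gamma from the sketch above:
print(value_iteration(states, actions, T, R, gamma))
```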