The Bellman optimality equation is a central equation in reinforcement learning that characterizes the value of a state under an optimal policy. It defines that value recursively: the optimal value of a state is the maximum, over all actions available in that state, of the expected immediate reward plus the discounted optimal value of the state that follows.
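To make the "best action, considering future states" idea concrete, here is a minimal sketch that applies this backup repeatedly (value iteration) until the values stop changing. The two-state MDP, the names `bellman_backup` and `value_iteration`, and the layouts `R[s][a]` and `P[s][a][s_next]` are all hypothetical choices made for illustration, not part of any particular library.

```python
def bellman_backup(V, s, R, P, gamma):
    """One Bellman optimality backup for state s.

    Returns the best achievable value over actions: immediate reward
    plus the discounted expected value of the next state.
    """
    return max(
        R[s][a] + gamma * sum(p * V[s_next] for s_next, p in enumerate(P[s][a]))
        for a in range(len(R[s]))
    )


def value_iteration(R, P, gamma, tol=1e-10, max_iters=10_000):
    """Apply the backup to every state until the largest change is below tol."""
    V = [0.0] * len(R)
    for _ in range(max_iters):
        new_V = [bellman_backup(V, s, R, P, gamma) for s in range(len(R))]
        delta = max(abs(new - old) for new, old in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    return V


# Hypothetical toy MDP (assumed for illustration): 2 states, 2 actions.
# Action 0 = "stay", action 1 = "move"; transitions are deterministic.
R = [
    [0.0, 1.0],  # rewards in state 0 for stay, move
    [2.0, 0.0],  # rewards in state 1 for stay, move
]
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: stay -> 0, move -> 1
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1: stay -> 1, move -> 0
]

V_star = value_iteration(R, P, gamma=0.5)
print(V_star)
```

For this toy problem the values converge to approximately 3 for state 0 and 4 for state 1. This can be checked by hand: staying in state 1 forever yields 2 / (1 - 0.5) = 4, and moving from state 0 yields 1 + 0.5 · 4 = 3, which beats staying in state 0.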
The equation is written as: ( V^*(s) = \max_a \left( R(s, a) + \gamma \sum_{s'} P(s'