Monte Carlo (MC) learning in reinforcement learning is a method for estimating the value of a policy by averaging the returns (the total accumulated rewards) observed over complete episodes of interaction with the environment. In MC learning, the agent interacts with the environment, records the sequence of states, actions, and rewards, and then updates its value estimates based on the actual returns computed from the episode.
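As a minimal sketch of the return computation, assuming a discount factor gamma and an episode stored as a plain list of rewards (the function name and representation here are illustrative, not from the original text):

```python
def compute_returns(rewards, gamma=1.0):
    """Compute the return G_t for every step t of one episode,
    where G_t = r_{t+1} + gamma * r_{t+2} + gamma^2 * r_{t+3} + ..."""
    returns = [0.0] * len(rewards)
    g = 0.0
    # Work backwards so each return reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

# Example: rewards collected over a 4-step episode.
print(compute_returns([0, 0, 1, 5], gamma=0.9))
```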
Monte Carlo methods are particularly useful when the dynamics of the environment are unknown or difficult to model, because the agent relies only on complete episodes of experience to make its updates rather than on a transition model. Learning is performed by averaging the returns received after visiting a state or taking an action. This makes it a model-free method: it does not require any model of the environment's transitions or rewards.
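A minimal sketch of first-visit MC value estimation is shown below, assuming each episode is stored as a list of (state, reward) pairs where the reward is the one received after leaving that state; the names and data layout are assumptions for illustration, not part of the original text:

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) by averaging the return observed after the first visit
    to each state, over many complete episodes."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    value = defaultdict(float)

    for episode in episodes:
        # Record the index of the first visit to each state in this episode.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            if state not in first_visit:
                first_visit[state] = t

        # Compute the return at every time step by working backwards.
        returns = [0.0] * len(episode)
        g = 0.0
        for t in reversed(range(len(episode))):
            g = episode[t][1] + gamma * g
            returns[t] = g

        # Average the first-visit returns into the value estimates.
        for state, t in first_visit.items():
            returns_sum[state] += returns[t]
            returns_count[state] += 1
            value[state] = returns_sum[state] / returns_count[state]
    return value

# Example: two short episodes from some hypothetical task.
eps = [[("A", 0), ("B", 1)], [("A", 1), ("B", 0)]]
print(first_visit_mc(eps, gamma=1.0))  # values: A -> 1.0, B -> 0.5
```

Averaging only the first visit per episode is one common variant; every-visit MC, which averages the return after every occurrence of a state, converges to the same values in the limit.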
For example, in a board game, after a game (one episode) finishes, MC learning computes the total reward achieved and adjusts the value estimates of the states visited during the game based on that outcome, without needing to know the exact dynamics of the game.