Monte Carlo methods in reinforcement learning estimate the value of states or state-action pairs from sample returns collected over complete episodes. They work by averaging the returns observed after visiting a state (or taking an action in that state) and then following the policy until the episode terminates.
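As a minimal sketch of this averaging idea, the following first-visit Monte Carlo prediction routine estimates state values under a fixed policy. The helper `generate_episode(policy)` is an assumed function that returns one complete episode as a list of `(state, action, reward)` tuples; its name and the parameter names are illustrative, not part of any particular library.

```python
from collections import defaultdict

def mc_prediction(policy, generate_episode, num_episodes, gamma=1.0):
    """First-visit Monte Carlo policy evaluation: V(s) = average of sampled returns."""
    returns_sum = defaultdict(float)   # cumulative return observed from each state
    returns_count = defaultdict(int)   # number of first visits to each state
    V = defaultdict(float)             # state-value estimates

    for _ in range(num_episodes):
        # One complete episode: [(s0, a0, r1), (s1, a1, r2), ...]
        episode = generate_episode(policy)
        G = 0.0
        # Work backwards so G accumulates the discounted return from each step.
        for t in reversed(range(len(episode))):
            state, _, reward = episode[t]
            G = gamma * G + reward
            # First-visit check: only record the return from the earliest visit of `state`.
            if not any(s == state for s, _, _ in episode[:t]):
                returns_sum[state] += G
                returns_count[state] += 1
                V[state] = returns_sum[state] / returns_count[state]
    return V
```

An every-visit variant simply drops the first-visit check and averages the return from every occurrence of a state; both converge to the true value function as the number of episodes grows.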
Monte Carlo methods are particularly suited to episodic tasks, where interaction naturally breaks into sequences of actions that end in a terminal state. A key advantage is that they do not bootstrap the way TD methods do: each estimate is built from actual sampled returns rather than from other learned value estimates, so the estimates are unbiased and can be computed without a model of the environment's dynamics.
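To make the contrast concrete, the standard incremental update forms are shown below, where $G_t$ is the full sampled return from time $t$ and $\alpha$ is a step size. The Monte Carlo target uses only observed rewards, whereas the TD(0) target bootstraps from the current estimate $V(S_{t+1})$:

$$
\text{MC:}\quad V(S_t) \leftarrow V(S_t) + \alpha\bigl[G_t - V(S_t)\bigr],
\qquad
\text{TD(0):}\quad V(S_t) \leftarrow V(S_t) + \alpha\bigl[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)\bigr].
$$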
The main limitation of Monte Carlo methods is that they can only update after an episode completes, which is inefficient when episodes are long and inapplicable when the task has no well-defined terminal state. Nonetheless, they are foundational in RL for tasks such as policy evaluation and policy improvement.
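As a sketch of how evaluation and improvement combine, the on-policy Monte Carlo control loop below alternates between generating an episode with an epsilon-greedy policy and updating running averages of action values. It assumes an environment following the Gymnasium `reset`/`step` API with a discrete action space and hashable observations; `num_actions` and the internal helper name are illustrative assumptions.

```python
import random
from collections import defaultdict

def mc_control(env, num_actions, num_episodes, gamma=1.0, epsilon=0.1):
    """Every-visit on-policy Monte Carlo control with an epsilon-greedy policy."""
    Q = defaultdict(lambda: [0.0] * num_actions)   # action-value estimates
    counts = defaultdict(lambda: [0] * num_actions)  # visit counts per (state, action)

    def behave(state):
        # Epsilon-greedy over the current action-value estimates.
        if random.random() < epsilon:
            return random.randrange(num_actions)
        values = Q[state]
        return values.index(max(values))

    for _ in range(num_episodes):
        # Generate one complete episode with the current policy.
        episode, state, done = [], env.reset()[0], False
        while not done:
            action = behave(state)
            next_state, reward, terminated, truncated, _ = env.step(action)
            episode.append((state, action, reward))
            state, done = next_state, terminated or truncated

        # Update Q by averaging the observed returns for each (state, action) pair.
        G = 0.0
        for state, action, reward in reversed(episode):
            G = gamma * G + reward
            counts[state][action] += 1
            Q[state][action] += (G - Q[state][action]) / counts[state][action]
    return Q
```

Because the episode must finish before any update happens, the waiting cost described above is visible directly in the structure of the loop: all learning takes place in the second pass over the completed episode.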