The Epsilon Greedy Algorithm - a Performance Review
(Volume 6, Issue 9, September 2020) | Open Access
Author(s):
Riti Agarwal

Keywords:
exploration, exploitation, regret, reward function, local maxima

Abstract:
Multi-Armed Bandit (MAB) is a class of reinforcement learning algorithms. In a multi-armed bandit implementation, an agent (learner) chooses between k different uncertain actions and receives a reward based on the chosen action. This paper focuses mainly on the Epsilon Greedy Algorithm in comparison to Thompson Sampling and UCB-1 (Upper Confidence Bound). It discusses the benefits of using bandit algorithms over A/B testing and evaluates the effectiveness of the three main solutions. It experimentally identifies the best use cases for the Epsilon Greedy Algorithm: when the experimentation period is longer than that of an A/B test and the goal is to exploit the best-performing variant. It also discusses when the algorithm does not provide statistically correct results: when the sample size on each arm of the experiment is very small.
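The epsilon-greedy rule the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's implementation: the arm count, Bernoulli reward model, epsilon value, and function names are all assumptions chosen for the example.

```python
import random

def choose_arm(values, epsilon=0.1):
    """With probability epsilon explore a random arm;
    otherwise exploit the arm with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(values))            # explore
    return max(range(len(values)), key=values.__getitem__)  # exploit

def update(counts, values, arm, reward):
    """Incrementally update the running mean reward of the chosen arm."""
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

# Illustrative setup: 3 arms with unknown Bernoulli reward probabilities.
true_probs = [0.2, 0.5, 0.7]
counts = [0, 0, 0]
values = [0.0, 0.0, 0.0]
random.seed(0)
for _ in range(5000):
    arm = choose_arm(values, epsilon=0.1)
    reward = 1.0 if random.random() < true_probs[arm] else 0.0
    update(counts, values, arm, reward)
```

With enough rounds, the estimated values converge toward the true reward probabilities and the best arm is pulled most often, which illustrates the exploitation behavior the paper evaluates; with very few rounds per arm, the estimates are noisy, which mirrors the small-sample failure mode the abstract notes.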
