
International Journal of New Technology and Research


The Epsilon Greedy Algorithm - A Performance Review

(Volume 6, Issue 9, September 2020) OPEN ACCESS

Riti Agarwal


Keywords: exploration, exploitation, regret, reward function, local maxima.


Multi-Armed Bandit (MAB) is a class of reinforcement learning algorithms. A multi-armed bandit implementation has an agent (learner) that chooses among k different uncertain actions and receives a reward based on the action chosen. This paper focuses mainly on the Epsilon Greedy Algorithm in comparison with Thompson Sampling and UCB-1 (Upper Confidence Bound). It discusses the benefits of using bandit algorithms over A/B testing and evaluates the effectiveness of the three solutions. It shows experimentally the best use cases for the Epsilon Greedy Algorithm: when the experimentation period is longer than that of A/B testing and the goal is to exploit the best-performing variant. It also identifies when the algorithm does not provide statistically sound results: when the sample size on each path of the experiment is very small.
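The epsilon-greedy rule summarized above can be sketched as follows. This is a minimal illustration, not the paper's experimental code; the arm reward probabilities, epsilon value, and step count are illustrative assumptions. With probability epsilon the agent explores a random arm; otherwise it exploits the arm with the highest estimated mean reward.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, steps=10_000, seed=0):
    """Run an epsilon-greedy agent on a k-armed Bernoulli bandit.

    true_means: assumed success probability of each arm (illustrative).
    Returns the per-arm estimated means and pull counts.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental mean update: Q <- Q + (r - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts
```

With a long enough run, most pulls concentrate on the best arm, which is the exploitation behavior the abstract refers to; with very few pulls per arm, the estimates are noisy, matching the paper's caveat about small sample sizes.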

Page No: 01-03
