DNN Tree Search for Bayesian Reinforcement Learning to Machine Intelligence
Anil Kumar Yadav1, Ajay Kumar Sachan2
1Dr. Ajay Kumar Sachan, had completed his Ph.D in Department of (Computer Science & Engineering) from Rajeev Gandhi Technical University, Bhopal, India.
2Anil Kumar Yadav, received the B.Tech in Department of (CSE), UPTU and M.Tech (I.T), SATI, RGPV, Bhopal, India.
Manuscript received on November 02, 2014. | Revised Manuscript received on November 04, 2014. | Manuscript published on November 05, 2014. | PP: 1-3 | Volume-4 Issue-5, November 2014. | Retrieval Number: D2321094414/2014©BEIESP
Open Access | Ethics and Policies | Cite
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Bayesian model-based reinforcement learning can be formulated as a partially observable Markova decision process (POMDP) to provide a principled framework for optimally balancing exploitation and exploration. Then, a POMDP solver can be used to solve the problem. If the prior distribution over the environment’s dynamics is a product of dirichlet distributions, the POMDP’s optimal value function can be represented using a set of multivariate polynomials. Unfortunately, the size of the polynomials grows exponentially with the problem horizon . During machine learning agent required lots of training inputs of execution cycle. Due to this situation look up table contain huge amount of data base. In this paper, we observe the use of dynamic neural network tree search (DNNTS) algorithm for large POMDPs, to solve the Bayesian reinforcement learning problem. The keen idea of DNN tree search is to train agent and act as a NN classifier to help agent for taking self decision without prior knowledge of the system during data learning .We will show that such an algorithm successfully searches for a near-optimal policy and achieve goal. Experiments show that the used DNN methods improve performance of Bayesian reinforcement learning in the context of training episodes, reward and discount rate.
Keywords: Bayesian reinforcement learning, machine learning, DNN tree search, POMDP