A new Q-learning algorithm based on the Metropolis criterion

MZ Guo, Y Liu, Jacek Malec

    Research output: Contribution to journal, Article in scientific journal, Peer-reviewed

    Abstract

    The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm, even though it may accelerate the learning process and help avoid locally optimal policies. In this paper, finding the optimum policy in Q-learning is described as a search for the optimum solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced in order to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.
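
    The abstract does not spell out the selection rule, so the following is only a minimal sketch of how a Metropolis-style acceptance test could drive action selection in tabular Q-learning. It assumes a random candidate action is compared against the greedy action and accepted with probability exp((Q_candidate - Q_greedy) / T). The names `metropolis_action` and `q_update`, the list-of-lists Q-table, and the default `alpha` and `gamma` values are hypothetical choices for illustration, not taken from the paper.

```python
import math
import random


def metropolis_action(q_values, temperature, rng=random):
    """Pick an action index with a Metropolis-style acceptance test.

    Assumption (not from the abstract): a uniformly random candidate is
    compared against the greedy action for the current state.
    """
    n_actions = len(q_values)
    greedy = max(range(n_actions), key=lambda a: q_values[a])
    candidate = rng.randrange(n_actions)
    # The candidate is never better than the greedy action, so delta <= 0
    # and the acceptance probability lies in [0, 1].
    delta = q_values[candidate] - q_values[greedy]
    accept_prob = math.exp(delta / temperature)  # temperature must be > 0
    return candidate if rng.random() < accept_prob else greedy


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step tabular Q-learning update, applied in place."""
    best_next = max(q_table[next_state])
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += alpha * td_error
```

    Under these assumptions, a high temperature makes almost every candidate acceptable (near-uniform exploration), while a temperature close to zero makes the rule effectively greedy. Lowering the temperature over time, as in simulated annealing, would shift the agent from exploration toward exploitation; the cooling schedule itself is left open here.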
    Original language: English
    Pages (from-to): 2140-2143
    Journal: IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
    Volume: 34
    Issue: 5
    DOI
    Status: Published - 2004

    Subject classification (UKÄ)

    • Computer Science
