
Class EDU.gatech.cc.is.learning.i_QLearner_id

java.lang.Object
   |
   +----EDU.gatech.cc.is.learning.i_ReinforcementLearner_id
           |
           +----EDU.gatech.cc.is.learning.i_QLearner_id

public class i_QLearner_id
extends i_ReinforcementLearner_id
implements Cloneable, Serializable
An object that learns to select from several actions based on a reward. Uses the Q-learning method as defined by Watkins.

The module learns to select a discrete output based on the state and a continuous reinforcement input. The "i"s at the start and end of the class name indicate that the class takes integers as input and produces integers as output. The "d" indicates that the reinforcement input is a double (i.e., a continuous value).
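
As a rough illustration of the Q-learning method named above, the following minimal sketch applies the standard one-step update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max Q(s',.) - Q(s,a)) to a small table. The class QUpdateSketch and all of its members are hypothetical and are not part of this package; the internals of i_QLearner_id may differ.

```java
/** Hypothetical stand-alone sketch of a tabular Q-learning update; not the library's code. */
class QUpdateSketch {
    final double[][] q;   // q[state][action], all entries start at 0
    final double alpha;   // learning rate, assumed 0 < alpha < 1
    final double gamma;   // discount rate, assumed 0 < gamma < 1

    QUpdateSketch(int numStates, int numActions, double alpha, double gamma) {
        this.q = new double[numStates][numActions];
        this.alpha = alpha;
        this.gamma = gamma;
    }

    /** Largest Q-value available in the given state. */
    double maxQ(int state) {
        double best = q[state][0];
        for (double v : q[state]) {
            best = Math.max(best, v);
        }
        return best;
    }

    /** One-step update after taking action in state, receiving reward, and reaching nextState. */
    void update(int state, int action, double reward, int nextState) {
        double target = reward + gamma * maxQ(nextState);
        q[state][action] += alpha * (target - q[state][action]);
    }

    public static void main(String[] args) {
        QUpdateSketch learner = new QUpdateSketch(2, 2, 0.5, 0.8);
        learner.update(0, 1, 1.0, 1);          // reward 1.0 for action 1 in state 0
        System.out.println(learner.q[0][1]);   // prints 0.5
    }
}
```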

Copyright (c)1997 Georgia Tech Research Corporation

Version:
$Revision: 1.5 $
Author:
Tucker Balch (tucker@cc.gatech.edu)

Variable Index

 o AVERAGE
Used to indicate the learner uses average rewards.
 o DISCOUNTED
Used to indicate the learner uses discounted rewards.

Constructor Index

 o i_QLearner_id(int, int)
Instantiate a Q learner using default parameters.
 o i_QLearner_id(int, int, int)
Instantiate a Q learner using default parameters.
 o i_QLearner_id(int, int, int, long)
Instantiate a Q learner using default parameters.

Method Index

 o endTrial(double, double)
Called when the current trial ends.
 o getAvgReward()
Report the average reward per step in the trial.
 o getPolicyChanges()
Report the number of policy changes in the trial.
 o getQueries()
Report the number of queries in the trial.
 o initTrial(int)
Called to initialize for a new trial.
 o query(int, double)
Select an output based on the state and reward.
 o readPolicy()
Read the policy from a file.
 o savePolicy()
Write the policy to a file.
 o saveProfile(String)
Write the policy profile to a file.
 o setAlpha(double)
Set alpha for the Q-learner.
 o setGamma(double)
Set gamma for the Q-learner.
 o setRandomRate(double)
Set the random rate for the Q-learner.
 o setRandomRateDecay(double)
Set the random rate decay for the Q-learner.
 o toString()
Generate a String that describes the current state of the learner.

Variables

 o AVERAGE
 public static final int AVERAGE
Used to indicate the learner uses average rewards.

 o DISCOUNTED
 public static final int DISCOUNTED
Used to indicate the learner uses discounted rewards.
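
The two constants select how rewards are scored. As a hedged illustration of the usual meaning of these terms in reinforcement learning, the sketch below computes a discounted return (a reward k steps ahead is weighted by gamma^k) and an average reward per step for the same reward sequence. The class RewardCriteriaSketch and its methods are hypothetical; this documentation does not state how i_QLearner_id implements either criterion, nor the numeric values of the constants.

```java
/** Hypothetical illustration of the two reward criteria; not part of this package. */
class RewardCriteriaSketch {
    /** Discounted return: sum over k of gamma^k * rewards[k]. */
    static double discountedReturn(double[] rewards, double gamma) {
        double total = 0.0;
        double weight = 1.0;
        for (double r : rewards) {
            total += weight * r;
            weight *= gamma;
        }
        return total;
    }

    /** Average reward per step: arithmetic mean of the rewards (0 for an empty sequence). */
    static double averageReward(double[] rewards) {
        if (rewards.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double r : rewards) {
            sum += r;
        }
        return sum / rewards.length;
    }

    public static void main(String[] args) {
        double[] rewards = {1.0, 1.0, 1.0};
        System.out.println(discountedReturn(rewards, 0.5)); // 1 + 0.5 + 0.25, prints 1.75
        System.out.println(averageReward(rewards));         // prints 1.0
    }
}
```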

Constructors

 o i_QLearner_id
 public i_QLearner_id(int numstatesin,
                      int numactionsin,
                      int criteriain,
                      long seedin)
Instantiate a Q learner using default parameters. Parameters may be adjusted using the set methods (setAlpha, setGamma, setRandomRate, setRandomRateDecay).

Parameters:
numstatesin - int, the number of states the system could be in.
numactionsin - int, the number of actions or outputs to select from.
criteriain - int, should be DISCOUNTED or AVERAGE.
seedin - long, the random number seed.
 o i_QLearner_id
 public i_QLearner_id(int numstatesin,
                      int numactionsin,
                      int criteriain)
Instantiate a Q learner using default parameters. This version uses a seed of 0. Parameters may be adjusted using the set methods.

Parameters:
numstatesin - int, the number of states the system could be in.
numactionsin - int, the number of actions or outputs to select from.
criteriain - int, should be DISCOUNTED or AVERAGE.
 o i_QLearner_id
 public i_QLearner_id(int numstatesin,
                      int numactionsin)
Instantiate a Q learner using default parameters. This version uses discounted rewards. Parameters may be adjusted using the set methods.

Parameters:
numstatesin - int, the number of states the system could be in.
numactionsin - int, the number of actions or outputs to select from.
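
The defaults described for the three constructors (a seed of 0, discounted rewards) can be pictured as constructor delegation. The stand-in class below is a sketch only: LearnerConfigSketch, its fields, the constant values 0 and 1, and the argument check are all assumptions made for illustration and are not taken from i_QLearner_id.

```java
/** Hypothetical stand-in showing how three constructors could share defaults; not the library's code. */
class LearnerConfigSketch {
    // The real values of these constants are not given in this documentation.
    static final int DISCOUNTED = 0;
    static final int AVERAGE = 1;

    final int numStates;
    final int numActions;
    final int criteria;
    final long seed;

    /** Full form: every setting is supplied by the caller. */
    LearnerConfigSketch(int numStates, int numActions, int criteria, long seed) {
        if (criteria != DISCOUNTED && criteria != AVERAGE) {
            throw new IllegalArgumentException("criteria should be DISCOUNTED or AVERAGE");
        }
        this.numStates = numStates;
        this.numActions = numActions;
        this.criteria = criteria;
        this.seed = seed;
    }

    /** Three-argument form: the seed defaults to 0. */
    LearnerConfigSketch(int numStates, int numActions, int criteria) {
        this(numStates, numActions, criteria, 0L);
    }

    /** Two-argument form: discounted rewards, and therefore also a seed of 0. */
    LearnerConfigSketch(int numStates, int numActions) {
        this(numStates, numActions, DISCOUNTED);
    }

    public static void main(String[] args) {
        LearnerConfigSketch c = new LearnerConfigSketch(10, 4);
        System.out.println(c.criteria == DISCOUNTED); // prints true
        System.out.println(c.seed);                   // prints 0
    }
}
```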

Methods

 o setGamma
 public void setGamma(double g)
Set gamma for the Q-learner. This is the discount rate; 0.8 is a typical value. It should be between 0 and 1.

Parameters:
g - double, the new value for gamma (0 < g < 1).
 o setAlpha
 public void setAlpha(double a)
Set alpha for the Q-learner. This reflects how quickly it should learn. Alpha should be between 0 and 1.

Parameters:
a - double, the new value for alpha (0 < a < 1).
 o setRandomRate
 public void setRandomRate(double r)
Set the random rate for the Q-learner. This reflects how frequently it picks a random action. It should be between 0 and 1.

Parameters:
r - double, the new value for random rate (0 < r < 1).
 o setRandomRateDecay
 public void setRandomRateDecay(double r)
Set the random rate decay for the Q-learner. This reflects how quickly the rate of choosing random actions decays. A value of 1 means the rate never decays; 0 causes the learner to stop choosing random actions immediately. It should be between 0 and 1.

Parameters:
r - double, the new value for the random rate decay (0 < r < 1).
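
One way to read the random rate and its decay together is as a multiplicative schedule: after each choice the rate is multiplied by the decay, so a decay of 1 leaves the rate unchanged and a decay of 0 ends random choices after the first one. The sketch below assumes exactly that schedule; RandomRateSketch and its members are hypothetical, and i_QLearner_id may apply the decay differently.

```java
/** Hypothetical sketch of random-action selection with a decaying rate; not the library's code. */
class RandomRateSketch {
    double randomRate;             // probability of choosing a random action
    final double randomRateDecay;  // multiplier applied to randomRate after each choice
    final java.util.Random rng;

    RandomRateSketch(double randomRate, double randomRateDecay, long seed) {
        this.randomRate = randomRate;
        this.randomRateDecay = randomRateDecay;
        this.rng = new java.util.Random(seed);
    }

    /** Pick a random action with probability randomRate, otherwise the highest-valued one; then decay. */
    int choose(double[] qRow) {
        int action;
        if (rng.nextDouble() < randomRate) {
            action = rng.nextInt(qRow.length);
        } else {
            action = 0;
            for (int a = 1; a < qRow.length; a++) {
                if (qRow[a] > qRow[action]) {
                    action = a;
                }
            }
        }
        randomRate *= randomRateDecay;
        return action;
    }

    public static void main(String[] args) {
        RandomRateSketch s = new RandomRateSketch(0.8, 0.5, 0L);
        double[] qRow = {0.1, 0.9, 0.3};
        for (int i = 0; i < 5; i++) {
            int a = s.choose(qRow);
            System.out.println("action " + a + ", random rate now " + s.randomRate);
        }
    }
}
```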
 o toString
 public String toString()
Generate a String that describes the current state of the learner.

Returns:
a String describing the learner.
Overrides:
toString in class i_ReinforcementLearner_id
 o query
 public int query(int yn,
                  double rn)
Select an output based on the state and reward.

Parameters:
yn - int, the current state.
rn - double, the reward for the last output; positive numbers are "good."
Overrides:
query in class i_ReinforcementLearner_id
 o endTrial
 public void endTrial(double Vn,
                      double rn)
Called when the current trial ends.

Parameters:
Vn - double, the value of the absorbing state.
rn - double, the reward for the last output.
Overrides:
endTrial in class i_ReinforcementLearner_id
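
A plausible reading of the two endTrial arguments, offered only as an assumption, is that the final update of a trial uses the supplied value of the absorbing state in place of the usual maximum over next-state Q-values. The helper below sketches that arithmetic; TerminalUpdateSketch and terminalUpdate are hypothetical names, and the update actually performed by i_QLearner_id is not described in this documentation.

```java
/** Hypothetical sketch of a final-step update at the end of a trial; not the library's code. */
class TerminalUpdateSketch {
    /**
     * Returns the updated Q-value for the last state-action pair of a trial,
     * given the value vn of the absorbing state and the final reward rn.
     */
    static double terminalUpdate(double oldQ, double alpha, double gamma, double vn, double rn) {
        double target = rn + gamma * vn;   // vn stands in for max Q of the next state
        return oldQ + alpha * (target - oldQ);
    }

    public static void main(String[] args) {
        // oldQ = 0, alpha = 0.5, gamma = 0.5, vn = 2, rn = 1: target is 2.0
        System.out.println(terminalUpdate(0.0, 0.5, 0.5, 2.0, 1.0)); // prints 1.0
    }
}
```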
 o initTrial
 public int initTrial(int s)
Called to initialize for a new trial.

Overrides:
initTrial in class i_ReinforcementLearner_id
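
The methods initTrial, query and endTrial suggest a per-trial calling sequence. The sketch below shows one such sequence against a stand-in learner so that it runs without this package. The nested interface, the CountingLearner stub, and the toy chain environment are hypothetical, and the sketch assumes that the int returned by initTrial is the first selected action, which this documentation does not state.

```java
/** Hypothetical sketch of a trial loop; the types here only mirror the documented signatures. */
class TrialLoopSketch {
    interface Learner {
        int initTrial(int s);
        int query(int yn, double rn);
        void endTrial(double Vn, double rn);
    }

    /** Stand-in learner that always selects action 0 and counts calls; it does no learning. */
    static class CountingLearner implements Learner {
        int queries = 0;
        boolean ended = false;

        public int initTrial(int s) { queries = 0; ended = false; return 0; }
        public int query(int yn, double rn) { queries++; return 0; }
        public void endTrial(double Vn, double rn) { ended = true; }
    }

    /** Runs one trial on a toy chain of states; the last state is absorbing. Returns the step count. */
    static int runTrial(Learner learner, int numStates) {
        int last = numStates - 1;
        int state = 0;
        int action = learner.initTrial(state);   // assumed to return the first action
        int steps = 0;
        while (true) {
            state = Math.min(state + 1 + action, last);  // toy dynamics: action a advances a + 1 states
            steps++;
            double reward = (state == last) ? 1.0 : 0.0;
            if (state == last) {
                learner.endTrial(0.0, reward);   // absorbing state assumed to have value 0
                return steps;
            }
            action = learner.query(state, reward);
        }
    }

    public static void main(String[] args) {
        CountingLearner learner = new CountingLearner();
        int steps = runTrial(learner, 4);
        System.out.println(steps + " steps, " + learner.queries + " queries"); // prints 3 steps, 2 queries
    }
}
```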
 o getAvgReward
 public double getAvgReward()
Report the average reward per step in the trial.

Returns:
the average.
Overrides:
getAvgReward in class i_ReinforcementLearner_id
 o getQueries
 public int getQueries()
Report the number of queries in the trial.

Returns:
the total.
Overrides:
getQueries in class i_ReinforcementLearner_id
 o getPolicyChanges
 public int getPolicyChanges()
Report the number of policy changes in the trial.

Returns:
the total.
Overrides:
getPolicyChanges in class i_ReinforcementLearner_id
 o readPolicy
 public void readPolicy() throws IOException
Read the policy from a file.

Overrides:
readPolicy in class i_ReinforcementLearner_id
 o savePolicy
 public void savePolicy() throws IOException
Write the policy to a file.

Overrides:
savePolicy in class i_ReinforcementLearner_id
 o saveProfile
 public void saveProfile(String profile_filename) throws IOException
Write the policy profile to a file.

Parameters:
profile_filename - String, the name of the file to write to.
