Class EDU.gatech.cc.is.learning.i_QLearner_id
java.lang.Object
   |
   +----EDU.gatech.cc.is.learning.i_ReinforcementLearner_id
           |
           +----EDU.gatech.cc.is.learning.i_QLearner_id
- public class i_QLearner_id
- extends i_ReinforcementLearner_id
- implements Cloneable, Serializable
An object that learns to select from several actions based on
a reward. Uses the Q-learning method as defined by Watkins.
The module learns to select a discrete output based on the
current state and a continuous reinforcement input. The "i"s at
the start and end of the class name indicate that the class takes
integers as input and output. The "d" indicates that the
reinforcement input is a double (i.e. a continuous value).
Copyright
(c)1997 Georgia Tech Research Corporation
- Version:
- $Revision: 1.5 $
- Author:
- Tucker Balch (tucker@cc.gatech.edu)
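The class description above refers to Watkins' Q-learning. As a rough illustration only, the textbook tabular update for the discounted case can be sketched as follows. QTableSketch and all of its members are hypothetical names, not part of this package, and the sketch does not claim to reproduce the internals of i_QLearner_id or its AVERAGE reward criterion.

```java
// Textbook tabular Q-learning update (discounted rewards).
// QTableSketch is a hypothetical stand-in, not the i_QLearner_id source.
class QTableSketch {
    final double[][] q;   // q[state][action], initialised to 0
    final double alpha;   // learning rate
    final double gamma;   // discount rate

    QTableSketch(int numStates, int numActions, double alpha, double gamma) {
        this.q = new double[numStates][numActions];
        this.alpha = alpha;
        this.gamma = gamma;
    }

    // Largest Q-value available in the given state.
    double maxQ(int state) {
        double best = q[state][0];
        for (int a = 1; a < q[state].length; a++) {
            best = Math.max(best, q[state][a]);
        }
        return best;
    }

    // Greedy action: index of the largest Q-value (ties go to the lowest index).
    int greedyAction(int state) {
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) {
                best = a;
            }
        }
        return best;
    }

    // Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    void update(int s, int a, double reward, int sNext) {
        double target = reward + gamma * maxQ(sNext);
        q[s][a] += alpha * (target - q[s][a]);
    }
}
```

With alpha = 0.5, each call to update moves the stored estimate halfway toward the new target, which is one way to read "how quickly it should learn" in the setAlpha description.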
-
AVERAGE
- Used to indicate the learner uses average rewards.
-
DISCOUNTED
- Used to indicate the learner uses discounted rewards.
-
i_QLearner_id(int, int)
- Instantiate a Q learner using default parameters and discounted rewards.
-
i_QLearner_id(int, int, int)
- Instantiate a Q learner using default parameters and a seed of 0.
-
i_QLearner_id(int, int, int, long)
- Instantiate a Q learner using default parameters.
-
endTrial(double, double)
- Called when the current trial ends.
-
getAvgReward()
- Report the average reward per step in the trial.
-
getPolicyChanges()
- Report the number of policy changes in the trial.
-
getQueries()
- Report the number of queries in the trial.
-
initTrial(int)
- Called to initialize for a new trial.
-
query(int, double)
- Select an output based on the state and reward.
-
readPolicy()
- Read the policy from a file.
-
savePolicy()
- Write the policy to a file.
-
saveProfile(String)
- Write the policy profile to a file.
-
setAlpha(double)
- Set alpha for the Q-learner.
-
setGamma(double)
- Set gamma for the Q-learner.
-
setRandomRate(double)
- Set the random rate for the Q-learner.
-
setRandomRateDecay(double)
- Set the random rate decay for the Q-learner.
-
toString()
- Generate a String that describes the current state of the
learner.
AVERAGE
public static final int AVERAGE
- Used to indicate the learner uses average rewards.
DISCOUNTED
public static final int DISCOUNTED
- Used to indicate the learner uses discounted rewards.
i_QLearner_id
public i_QLearner_id(int numstatesin,
int numactionsin,
int criteriain,
long seedin)
- Instantiate a Q learner using default parameters.
Parameters may be adjusted using accessor methods.
- Parameters:
- numstatesin - int, the number of states the system could be in.
- numactionsin - int, the number of actions or outputs to
select from.
- criteriain - int, should be DISCOUNTED or AVERAGE.
- seedin - long, the seed for the random number generator.
i_QLearner_id
public i_QLearner_id(int numstatesin,
int numactionsin,
int criteriain)
- Instantiate a Q learner using default parameters.
This version assumes you will use a seed of 0.
Parameters may be adjusted using accessor methods.
- Parameters:
- numstatesin - int, the number of states the system could be in.
- numactionsin - int, the number of actions or outputs to
select from.
- criteriain - int, should be DISCOUNTED or AVERAGE.
i_QLearner_id
public i_QLearner_id(int numstatesin,
int numactionsin)
- Instantiate a Q learner using default parameters.
This version assumes you will use discounted rewards.
Parameters may be adjusted using accessor methods.
- Parameters:
- numstatesin - int, the number of states the system could be in.
- numactionsin - int, the number of actions or outputs to
select from.
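The three constructors differ only in which arguments they default. A minimal sketch of that relationship, using a hypothetical stand-in class (QLearnerCtorSketch is not part of this package): the numeric values of AVERAGE and DISCOUNTED are placeholders, since this page does not state them, and a seed of 0 for the two-argument form is an assumption.

```java
// Hypothetical stand-in that mirrors the three documented constructors.
class QLearnerCtorSketch {
    // Placeholder values: the real constants' values are not given on this page.
    static final int AVERAGE = 0;
    static final int DISCOUNTED = 1;

    final int numStates;
    final int numActions;
    final int criteria;
    final long seed;

    // Four-argument form: everything is supplied by the caller.
    QLearnerCtorSketch(int numStates, int numActions, int criteria, long seed) {
        this.numStates = numStates;
        this.numActions = numActions;
        this.criteria = criteria;
        this.seed = seed;
    }

    // Three-argument form: the seed defaults to 0, as documented.
    QLearnerCtorSketch(int numStates, int numActions, int criteria) {
        this(numStates, numActions, criteria, 0L);
    }

    // Two-argument form: discounted rewards, as documented
    // (the seed of 0 here is an assumption).
    QLearnerCtorSketch(int numStates, int numActions) {
        this(numStates, numActions, DISCOUNTED);
    }
}
```

Under these assumptions, new QLearnerCtorSketch(10, 4) and new QLearnerCtorSketch(10, 4, QLearnerCtorSketch.DISCOUNTED, 0L) describe the same configuration.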
setGamma
public void setGamma(double g)
- Set gamma for the Q-learner.
This is the discount rate; 0.8 is a typical value.
It should be between 0 and 1.
- Parameters:
- g - double, the new value for gamma (0 < g < 1).
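To give a feel for what the discount rate does, the sketch below computes a discounted return, the sum of gamma^t times the reward at step t. DiscountSketch and discountedReturn are hypothetical names used only for illustration; they are not part of this class.

```java
// Illustrative helper, not part of i_QLearner_id.
class DiscountSketch {
    // Sum of gamma^t * rewards[t] for t = 0 .. rewards.length - 1.
    static double discountedReturn(double[] rewards, double gamma) {
        double total = 0.0;
        double weight = 1.0;   // gamma^t, starting at gamma^0 = 1
        for (double r : rewards) {
            total += weight * r;
            weight *= gamma;
        }
        return total;
    }
}
```

For three rewards of 1.0 and gamma = 0.8, the result is 1 + 0.8 + 0.64 = 2.44 (up to floating-point rounding). Smaller values of gamma make the learner weigh immediate rewards more heavily.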
setAlpha
public void setAlpha(double a)
- Set alpha for the Q-learner.
This reflects how quickly it should learn.
Alpha should be between 0 and 1.
- Parameters:
- a - double, the new value for alpha (0 < a < 1).
setRandomRate
public void setRandomRate(double r)
- Set the random rate for the Q-learner.
This reflects how frequently it picks a random action.
Should be between 0 and 1.
- Parameters:
- r - double, the new value for random rate (0 < r < 1).
setRandomRateDecay
public void setRandomRateDecay(double r)
- Set the random rate decay for the Q-learner.
This reflects how quickly the rate of choosing random actions
decays. A value of 1 would never decay; 0 would cause it to immediately
quit choosing random actions.
Should be between 0 and 1.
- Parameters:
- r - double, the new value for the random rate decay (0 < r < 1).
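The random rate and its decay together describe an exploration schedule. One plausible reading, sketched below, is a multiplicative decay applied after every choice; this is an assumption consistent with the description (1 never decays, 0 stops random choices at once), not a statement of how i_QLearner_id is implemented. ExplorationSketch and its members are hypothetical names.

```java
// Hypothetical exploration schedule: random action with probability
// randomRate, then randomRate is multiplied by decay.
class ExplorationSketch {
    double randomRate;              // probability of picking a random action
    final double decay;             // multiplier applied after every choice
    final java.util.Random rng;

    ExplorationSketch(double randomRate, double decay, long seed) {
        this.randomRate = randomRate;
        this.decay = decay;
        this.rng = new java.util.Random(seed);
    }

    // Returns either a uniformly random action or the supplied greedy action.
    int choose(int greedyAction, int numActions) {
        int action = (rng.nextDouble() < randomRate)
                ? rng.nextInt(numActions)
                : greedyAction;
        randomRate *= decay;        // decay == 1 leaves the rate unchanged
        return action;
    }
}
```

With a decay of 0.5 and an initial rate of 0.8, the rate after three choices is 0.1 under this scheme.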
toString
public String toString()
- Generate a String that describes the current state of the
learner.
- Returns:
- a String describing the learner.
- Overrides:
- toString in class i_ReinforcementLearner_id
query
public int query(int yn,
double rn)
- Select an output based on the state and reward.
- Parameters:
- yn - int, the current state.
- rn - double, the reward for the last output; positive
numbers are "good."
- Returns:
- the selected output.
- Overrides:
- query in class i_ReinforcementLearner_id
endTrial
public void endTrial(double Vn,
double rn)
- Called when the current trial ends.
- Parameters:
- Vn - double, the value of the absorbing state.
- rn - double, the reward for the last output.
- Overrides:
- endTrial in class i_ReinforcementLearner_id
initTrial
public int initTrial(int s)
- Called to initialize for a new trial.
- Overrides:
- initTrial in class i_ReinforcementLearner_id
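The three trial methods (initTrial, query, endTrial) suggest a calling order of one initTrial, repeated query calls, and a final endTrial. The sketch below illustrates that order against a hypothetical stand-in interface, since the real class is not reproduced here. TrialLearnerSketch, AlwaysRightLearner, TrialLoopSketch and the corridor environment are all invented for illustration, and treating the int returned by initTrial as the first action is an assumption.

```java
// Hypothetical stand-in for the three documented trial methods.
interface TrialLearnerSketch {
    int initTrial(int state);             // assumed to return the first action
    int query(int state, double reward);  // next action, given state and last reward
    void endTrial(double absorbingValue, double reward);
}

// Trivial learner that always moves right and records what it was told.
class AlwaysRightLearner implements TrialLearnerSketch {
    int queries = 0;
    double totalReward = 0.0;
    boolean ended = false;

    public int initTrial(int state) {
        queries = 0;
        totalReward = 0.0;
        ended = false;
        return 1;
    }

    public int query(int state, double reward) {
        queries++;
        totalReward += reward;
        return 1;
    }

    public void endTrial(double absorbingValue, double reward) {
        totalReward += reward;
        ended = true;
    }
}

class TrialLoopSketch {
    // Corridor of numStates cells; action 1 moves right, any other action moves left.
    // Each step costs -1; reaching the last cell ends the trial with reward +10.
    // Returns the number of steps taken.
    static int runTrial(TrialLearnerSketch learner, int numStates, int maxSteps) {
        if (maxSteps < 1) {
            throw new IllegalArgumentException("maxSteps must be at least 1");
        }
        int state = 0;
        int action = learner.initTrial(state);
        for (int step = 1; step < maxSteps; step++) {
            state = move(state, action, numStates);
            if (state == numStates - 1) {
                learner.endTrial(0.0, 10.0);
                return step;
            }
            action = learner.query(state, -1.0);
        }
        // Final permitted step: the trial ends whether or not the goal is reached.
        state = move(state, action, numStates);
        learner.endTrial(0.0, state == numStates - 1 ? 10.0 : -1.0);
        return maxSteps;
    }

    // Apply an action and clamp the result to the corridor.
    static int move(int state, int action, int numStates) {
        int next = state + (action == 1 ? 1 : -1);
        return Math.max(0, Math.min(numStates - 1, next));
    }
}
```

In a four-cell corridor, the always-right learner reaches the goal in three steps after two query calls.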
getAvgReward
public double getAvgReward()
- Report the average reward per step in the trial.
- Returns:
- the average.
- Overrides:
- getAvgReward in class i_ReinforcementLearner_id
getQueries
public int getQueries()
- Report the number of queries in the trial.
- Returns:
- the total.
- Overrides:
- getQueries in class i_ReinforcementLearner_id
getPolicyChanges
public int getPolicyChanges()
- Report the number of policy changes in the trial.
- Returns:
- the total.
- Overrides:
- getPolicyChanges in class i_ReinforcementLearner_id
readPolicy
public void readPolicy() throws IOException
- Read the policy from a file.
- Overrides:
- readPolicy in class i_ReinforcementLearner_id
savePolicy
public void savePolicy() throws IOException
- Write the policy to a file.
- Overrides:
- savePolicy in class i_ReinforcementLearner_id
saveProfile
public void saveProfile(String profile_filename) throws IOException
- Write the policy profile to a file.
- Parameters:
- profile_filename - String, the name of the file to write to.