queens.stochastic_optimizers package#

Stochastic optimizers.

Modules containing stochastic optimizers.

Submodules#

queens.stochastic_optimizers.adam module#

Adam optimizer.

class Adam[source]#

Bases: StochasticOptimizer

Adam stochastic optimizer [1].

References

[1] Kingma and Ba. “Adam: A Method for Stochastic Optimization”. ICLR, 2015.

beta_1#

\(\beta_1\) parameter as described in [1].

Type:

float

beta_2#

\(\beta_2\) parameter as described in [1].

Type:

float

m#

Exponential average of the gradient.

Type:

ExponentialAveragingObject

v#

Exponential average of the gradient momentum.

Type:

ExponentialAveragingObject

eps#

Nugget term to avoid a division by values close to zero.

Type:

float

__init__(learning_rate, optimization_type, rel_l1_change_threshold, rel_l2_change_threshold, clip_by_l2_norm_threshold=inf, clip_by_value_threshold=inf, max_iteration=1000000.0, beta_1=0.9, beta_2=0.999, eps=1e-08, learning_rate_decay=None)[source]#

Initialize optimizer.

Parameters:
  • learning_rate (float) – Learning rate for the optimizer

  • optimization_type (str) – “max” in case of maximization and “min” for minimization

  • rel_l1_change_threshold (float) – If the relative L1 change of the parameters falls below this value, this convergence criterion is triggered.

  • rel_l2_change_threshold (float) – If the relative L2 change of the parameters falls below this value, this convergence criterion is triggered.

  • clip_by_l2_norm_threshold (float) – Threshold to clip the gradient by L2-norm

  • clip_by_value_threshold (float) – Threshold to clip the gradient components

  • max_iteration (int) – Maximum number of iterations

  • beta_1 (float) – \(\beta_1\) parameter as described in [1]

  • beta_2 (float) – \(\beta_2\) parameter as described in [1]

  • eps (float) – Nugget term to avoid a division by values close to zero

  • learning_rate_decay (LearningRateDecay) – Object to schedule learning rate decay

scheme_specific_gradient(gradient)[source]#

Adam gradient computation.

Parameters:

gradient (np.array) – Gradient

Returns:

gradient (np.array) – Adam gradient
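
A minimal usage sketch based only on the constructor signature and the scheme_specific_gradient method documented above; the import path mirrors the module layout, and the numerical values are placeholders:

```python
import numpy as np

from queens.stochastic_optimizers.adam import Adam

# Construct the optimizer; beta_1, beta_2 and eps keep their documented defaults.
optimizer = Adam(
    learning_rate=0.01,
    optimization_type="min",
    rel_l1_change_threshold=1e-6,
    rel_l2_change_threshold=1e-6,
)

# Placeholder gradient; in practice it comes from the model being optimized.
raw_gradient = np.array([0.3, -1.2, 0.05])

# Apply the Adam moment estimates to the raw gradient.
adam_gradient = optimizer.scheme_specific_gradient(raw_gradient)
```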

queens.stochastic_optimizers.adamax module#

Adamax optimizer.

class Adamax[source]#

Bases: StochasticOptimizer

Adamax stochastic optimizer [1]. The eps term is added to avoid division by zero.

References

[1] Kingma and Ba. “Adam: A Method for Stochastic Optimization”. ICLR, 2015.

beta_1#

\(\beta_1\) parameter as described in [1].

Type:

float

beta_2#

\(\beta_2\) parameter as described in [1].

Type:

float

m#

Exponential average of the gradient.

Type:

ExponentialAveragingObject

u#

Maximum gradient momentum.

Type:

np.array

eps#

Nugget term to avoid a division by values close to zero.

Type:

float

__init__(learning_rate, optimization_type, rel_l1_change_threshold, rel_l2_change_threshold, clip_by_l2_norm_threshold=inf, clip_by_value_threshold=inf, max_iteration=1000000.0, beta_1=0.9, beta_2=0.999, eps=1e-08, learning_rate_decay=None)[source]#

Initialize optimizer.

Parameters:
  • learning_rate (float) – Learning rate for the optimizer

  • optimization_type (str) – “max” in case of maximization and “min” for minimization

  • rel_l1_change_threshold (float) – If the relative L1 change of the parameters falls below this value, this convergence criterion is triggered.

  • rel_l2_change_threshold (float) – If the relative L2 change of the parameters falls below this value, this convergence criterion is triggered.

  • clip_by_l2_norm_threshold (float) – Threshold to clip the gradient by L2-norm

  • clip_by_value_threshold (float) – Threshold to clip the gradient components

  • max_iteration (int) – Maximum number of iterations

  • beta_1 (float) – \(\beta_1\) parameter as described in [1]

  • beta_2 (float) – \(\beta_2\) parameter as described in [1]

  • eps (float) – Nugget term to avoid a division by values close to zero

  • learning_rate_decay (LearningRateDecay) – Object to schedule learning rate decay

scheme_specific_gradient(gradient)[source]#

Adamax gradient computation.

Parameters:

gradient (np.array) – Gradient

Returns:

gradient (np.array) – Adamax gradient
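
As with Adam, a minimal sketch using only the documented constructor arguments and scheme_specific_gradient; the import path and numerical values are assumptions for illustration:

```python
import numpy as np

from queens.stochastic_optimizers.adamax import Adamax

# Set the documented gradient-clipping thresholds alongside the convergence criteria.
optimizer = Adamax(
    learning_rate=0.005,
    optimization_type="min",
    rel_l1_change_threshold=1e-6,
    rel_l2_change_threshold=1e-6,
    clip_by_l2_norm_threshold=10.0,
    clip_by_value_threshold=1.0,
)

# Placeholder gradient; the Adamax update uses the maximum gradient momentum u.
adamax_gradient = optimizer.scheme_specific_gradient(np.array([0.1, -0.4]))
```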

queens.stochastic_optimizers.learning_rate_decay module#

Learning rate decay for stochastic optimization.

class DynamicLearningRateDecay[source]#

Bases: LearningRateDecay

Dynamic learning rate decay.

alpha#

Decay factor

Type:

float

rho_min#

Threshold for signal-to-noise ratio

Type:

float

k_min#

Minimum number of iterations before learning rate is decreased

Type:

int

k#

Iteration number

Type:

int

a#

Sum of parameters

Type:

np.array

b#

Sum of squared parameters

Type:

np.array

c#

Sum of parameters times iteration number

Type:

np.array

__init__(alpha=0.1, rho_min=1.0)[source]#

Initialize DynamicLearningRateDecay.

Parameters:
  • alpha (float) – Decay factor

  • rho_min (float) – Threshold for signal-to-noise ratio

class LearningRateDecay[source]#

Bases: object

Base class for learning rate decay.

class LogLinearLearningRateDecay[source]#

Bases: LearningRateDecay

Log-linear learning rate decay.

slope#

Logarithmic slope

Type:

float

iteration#

Current iteration

Type:

int

__init__(slope)[source]#

Initialize LogLinearLearningRateDecay.

Parameters:

slope (float) – Logarithmic slope

class StepwiseLearningRateDecay[source]#

Bases: LearningRateDecay

Step-wise learning rate decay.

decay_factor#

Decay factor

Type:

float

decay_interval#

Decay interval

Type:

int

iteration#

Iteration number

Type:

int

__init__(decay_factor, decay_interval)[source]#

Initialize StepwiseLearningRateDecay.

Parameters:
  • decay_factor (float) – Decay factor

  • decay_interval (int) – Decay interval
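
A sketch of how the decay classes documented above can be constructed and handed to an optimizer through its learning_rate_decay argument; the concrete values are illustrative only:

```python
from queens.stochastic_optimizers.adam import Adam
from queens.stochastic_optimizers.learning_rate_decay import (
    DynamicLearningRateDecay,
    LogLinearLearningRateDecay,
    StepwiseLearningRateDecay,
)

# Halve the learning rate every 100 iterations.
stepwise = StepwiseLearningRateDecay(decay_factor=0.5, decay_interval=100)

# Decay following a log-linear schedule with the given logarithmic slope.
log_linear = LogLinearLearningRateDecay(slope=0.5)

# Decrease the learning rate adaptively based on the signal-to-noise ratio.
dynamic = DynamicLearningRateDecay(alpha=0.1, rho_min=1.0)

# Any of the schedules can be passed to an optimizer via learning_rate_decay.
optimizer = Adam(
    learning_rate=0.01,
    optimization_type="min",
    rel_l1_change_threshold=1e-6,
    rel_l2_change_threshold=1e-6,
    learning_rate_decay=stepwise,
)
```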

queens.stochastic_optimizers.rms_prop module#

RMSprop optimizer.

class RMSprop[source]#

Bases: StochasticOptimizer

RMSprop stochastic optimizer [1].

References

[1] Tieleman and Hinton. “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude”. Coursera. 2012.

beta#

\(\beta\) parameter as described in [1].

Type:

float

v#

Exponential average of the gradient momentum.

Type:

ExponentialAveragingObject

eps#

Nugget term to avoid a division by values close to zero.

Type:

float

__init__(learning_rate, optimization_type, rel_l1_change_threshold, rel_l2_change_threshold, clip_by_l2_norm_threshold=inf, clip_by_value_threshold=inf, max_iteration=1000000.0, beta=0.999, eps=1e-08, learning_rate_decay=None)[source]#

Initialize optimizer.

Parameters:
  • learning_rate (float) – Learning rate for the optimizer

  • optimization_type (str) – “max” in case of maximization and “min” for minimization

  • rel_l1_change_threshold (float) – If the relative L1 change of the parameters falls below this value, this convergence criterion is triggered.

  • rel_l2_change_threshold (float) – If the relative L2 change of the parameters falls below this value, this convergence criterion is triggered.

  • clip_by_l2_norm_threshold (float) – Threshold to clip the gradient by L2-norm

  • clip_by_value_threshold (float) – Threshold to clip the gradient components

  • max_iteration (int) – Maximum number of iterations

  • beta (float) – \(\beta\) parameter as described in [1]

  • eps (float) – Nugget term to avoid a division by values close to zero

  • learning_rate_decay (LearningRateDecay) – Object to schedule learning rate decay

scheme_specific_gradient(gradient)[source]#

RMSprop gradient computation.

Parameters:

gradient (np.array) – Gradient

Returns:

gradient (np.array) – RMSprop gradient
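
A minimal sketch mirroring the Adam example, assuming the same import convention; beta and eps keep their documented defaults:

```python
import numpy as np

from queens.stochastic_optimizers.rms_prop import RMSprop

optimizer = RMSprop(
    learning_rate=0.01,
    optimization_type="min",
    rel_l1_change_threshold=1e-6,
    rel_l2_change_threshold=1e-6,
)

# Scale the raw gradient by the running average of its recent magnitude.
rmsprop_gradient = optimizer.scheme_specific_gradient(np.array([0.2, -0.7]))
```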

queens.stochastic_optimizers.sgd module#

SGD optimizer.

class SGD[source]#

Bases: StochasticOptimizer

Stochastic gradient descent optimizer.

__init__(learning_rate, optimization_type, rel_l1_change_threshold, rel_l2_change_threshold, clip_by_l2_norm_threshold=inf, clip_by_value_threshold=inf, max_iteration=1000000.0, learning_rate_decay=None)[source]#

Initialize optimizer.

Parameters:
  • learning_rate (float) – Learning rate for the optimizer

  • optimization_type (str) – “max” in case of maximization and “min” for minimization

  • rel_l1_change_threshold (float) – If the relative L1 change of the parameters falls below this value, this convergence criterion is triggered.

  • rel_l2_change_threshold (float) – If the relative L2 change of the parameters falls below this value, this convergence criterion is triggered.

  • clip_by_l2_norm_threshold (float) – Threshold to clip the gradient by L2-norm

  • clip_by_value_threshold (float) – Threshold to clip the gradient components

  • max_iteration (int) – Maximum number of iterations

  • learning_rate_decay (LearningRateDecay) – Object to schedule learning rate decay

scheme_specific_gradient(gradient)[source]#

SGD gradient computation.

Parameters:

gradient (np.array) – Gradient

Returns:

gradient (np.array) – SGD gradient
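
A sketch of the plain SGD variant, again using only the documented constructor arguments; the values are illustrative:

```python
import numpy as np

from queens.stochastic_optimizers.sgd import SGD

# "max" is documented for maximization problems; "min" for minimization.
optimizer = SGD(
    learning_rate=0.1,
    optimization_type="max",
    rel_l1_change_threshold=1e-6,
    rel_l2_change_threshold=1e-6,
)

# For plain SGD the scheme-specific gradient is expected to be the raw gradient itself.
sgd_gradient = optimizer.scheme_specific_gradient(np.array([1.0, -2.0]))
```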