queens.stochastic_optimizers package#

Stochastic optimizers.

Modules containing stochastic optimizers.

Submodules#

queens.stochastic_optimizers.adam module#

Adam optimizer.

class Adam(learning_rate, optimization_type, rel_l1_change_threshold, rel_l2_change_threshold, clip_by_l2_norm_threshold=inf, clip_by_value_threshold=inf, max_iteration=1000000.0, beta_1=0.9, beta_2=0.999, eps=1e-08, learning_rate_decay=None)[source]#

Bases: StochasticOptimizer

Adam stochastic optimizer [1].

References

[1] Kingma and Ba. “Adam: A Method for Stochastic Optimization”. ICLR 2015. 2015.

beta_1#

\(\beta_1\) parameter as described in [1].

Type:

float

beta_2#

\(\beta_2\) parameter as described in [1].

Type:

float

m#

Exponential average of the gradient.

Type:

ExponentialAveragingObject

v#

Exponential average of the squared gradient (second moment).

Type:

ExponentialAveragingObject

eps#

Nugget term to avoid division by values close to zero.

Type:

float

scheme_specific_gradient(gradient)[source]#

Adam gradient computation.

Parameters:

gradient (np.array) – Gradient

Returns:

gradient (np.array) – Adam gradient
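For orientation, a minimal NumPy sketch of the Adam transformation from [1]. The function, its explicit statefulness, and the bias-correction indexing are illustrative assumptions; the QUEENS implementation keeps m and v as ExponentialAveragingObject instances instead.

    import numpy as np

    def adam_gradient_sketch(gradient, m, v, iteration, beta_1=0.9, beta_2=0.999, eps=1e-8):
        # Hypothetical helper, not the QUEENS API.
        m = beta_1 * m + (1 - beta_1) * gradient      # exponential average of the gradient
        v = beta_2 * v + (1 - beta_2) * gradient**2   # exponential average of the squared gradient
        m_hat = m / (1 - beta_1 ** (iteration + 1))   # bias correction, first moment
        v_hat = v / (1 - beta_2 ** (iteration + 1))   # bias correction, second moment
        return m_hat / (np.sqrt(v_hat) + eps), m, v   # nugget eps avoids division by values close to zero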

queens.stochastic_optimizers.adamax module#

Adamax optimizer.

class Adamax(learning_rate, optimization_type, rel_l1_change_threshold, rel_l2_change_threshold, clip_by_l2_norm_threshold=inf, clip_by_value_threshold=inf, max_iteration=1000000.0, beta_1=0.9, beta_2=0.999, eps=1e-08, learning_rate_decay=None)[source]#

Bases: StochasticOptimizer

Adamax stochastic optimizer [1]. The eps term is added to avoid division by zero.

References

[1] Kingma and Ba. “Adam: A Method for Stochastic Optimization”. ICLR 2015. 2015.

beta_1#

\(\beta_1\) parameter as described in [1].

Type:

float

beta_2#

\(\beta_2\) parameter as described in [1].

Type:

float

m#

Exponential average of the gradient.

Type:

ExponentialAveragingObject

u#

Exponentially weighted running maximum of the absolute gradient (infinity norm).

Type:

np.array

eps#

Nugget term to avoid division by values close to zero.

Type:

float

scheme_specific_gradient(gradient)[source]#

Adamax gradient computation.

Parameters:

gradient (np.array) – Gradient

Returns:

gradient (np.array) – Adamax gradient
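For comparison with Adam, a minimal NumPy sketch of the Adamax transformation from [1]. The function name and its explicit statefulness are illustrative assumptions, not the QUEENS API.

    import numpy as np

    def adamax_gradient_sketch(gradient, m, u, iteration, beta_1=0.9, beta_2=0.999, eps=1e-8):
        # Hypothetical helper, not the QUEENS API.
        m = beta_1 * m + (1 - beta_1) * gradient      # exponential average of the gradient
        u = np.maximum(beta_2 * u, np.abs(gradient))  # running maximum of the absolute gradient
        m_hat = m / (1 - beta_1 ** (iteration + 1))   # bias correction, first moment
        return m_hat / (u + eps), m, u                # eps guards against division by zero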

queens.stochastic_optimizers.learning_rate_decay module#

Learning rate decay for stochastic optimization.

class DynamicLearningRateDecay(alpha=0.1, rho_min=1.0)[source]#

Bases: LearningRateDecay

Dynamic learning rate decay.

alpha#

Decay factor

Type:

float

rho_min#

Threshold for signal-to-noise ratio

Type:

float

k_min#

Minimum number of iterations before learning rate is decreased

Type:

int

k#

Iteration number

Type:

int

a#

Sum of parameters

Type:

np.array

b#

Sum of squared parameters

Type:

np.array

c#

Sum of parameters times iteration number

Type:

np.array

class LearningRateDecay[source]#

Bases: object

Base class for learning rate decay.

class LogLinearLearningRateDecay(slope)[source]#

Bases: LearningRateDecay

Log linear learning rate decay.

slope#

Logarithmic slope

Type:

float

iteration#

Current iteration

Type:

int
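The attribute names suggest a schedule that is linear in log-log space, i.e. the learning rate decays polynomially in the iteration count. The following one-liner is one plausible reading, stated as an assumption rather than taken from the QUEENS source:

    def log_linear_decay_sketch(base_learning_rate, iteration, slope):
        # Assumed schedule: log(lr) falls linearly in log(iteration),
        # i.e. lr_k = lr_0 * k**(-slope); iteration 0 is left undecayed.
        return base_learning_rate * max(iteration, 1) ** (-slope)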

class StepwiseLearningRateDecay(decay_factor, decay_interval)[source]#

Bases: LearningRateDecay

Step-wise learning rate decay.

decay_factor#

Decay factor

Type:

float

decay_interval#

Decay interval

Type:

int

iteration#

Iteration number

Type:

int
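The step-wise schedule follows directly from the two attributes: after every decay_interval iterations the learning rate is multiplied by decay_factor. A minimal sketch (the function name is hypothetical):

    def stepwise_decay_sketch(base_learning_rate, iteration, decay_factor, decay_interval):
        # One multiplication by decay_factor per completed decay interval.
        num_decays = iteration // decay_interval
        return base_learning_rate * decay_factor**num_decays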

queens.stochastic_optimizers.rms_prop module#

RMSprop optimizer.

class RMSprop(learning_rate, optimization_type, rel_l1_change_threshold, rel_l2_change_threshold, clip_by_l2_norm_threshold=inf, clip_by_value_threshold=inf, max_iteration=1000000.0, beta=0.999, eps=1e-08, learning_rate_decay=None)[source]#

Bases: StochasticOptimizer

RMSprop stochastic optimizer [1].

References

[1] Tieleman and Hinton. “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude”. Coursera. 2012.

beta#

\(\beta\) parameter as described in [1].

Type:

float

v#

Exponential average of the squared gradient.

Type:

ExponentialAveragingObject

eps#

Nugget term to avoid division by values close to zero.

Type:

float

scheme_specific_gradient(gradient)[source]#

RMSprop gradient computation.

Parameters:

gradient (np.array) – Gradient

Returns:

gradient (np.array) – RMSprop gradient
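A minimal NumPy sketch of the RMSprop transformation in the spirit of [1]. As with the Adam sketch above, the function and its explicit statefulness are illustrative assumptions; whether QUEENS applies a bias correction to v is not stated on this page, so the classic uncorrected form is shown:

    import numpy as np

    def rmsprop_gradient_sketch(gradient, v, beta=0.999, eps=1e-8):
        # Hypothetical helper, not the QUEENS API.
        v = beta * v + (1 - beta) * gradient**2   # running average of the squared gradient
        return gradient / (np.sqrt(v) + eps), v   # divide the gradient by its recent magnitude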

queens.stochastic_optimizers.sgd module#

SGD optimizer.

class SGD(learning_rate, optimization_type, rel_l1_change_threshold, rel_l2_change_threshold, clip_by_l2_norm_threshold=inf, clip_by_value_threshold=inf, max_iteration=1000000.0, learning_rate_decay=None)[source]#

Bases: StochasticOptimizer

Stochastic gradient descent optimizer.

scheme_specific_gradient(gradient)[source]#

SGD gradient computation.

Parameters:

gradient (np.array) – Gradient

Returns:

gradient (np.array) – SGD gradient

queens.stochastic_optimizers.stochastic_optimizer module#

Stochastic optimizer.

class StochasticOptimizer(learning_rate, optimization_type, rel_l1_change_threshold, rel_l2_change_threshold, clip_by_l2_norm_threshold=inf, clip_by_value_threshold=inf, max_iteration=1000000.0, learning_rate_decay=None)[source]#

Bases: object

Base class for stochastic optimizers.

The optimizers are implemented as generators. This increases the modularity of this class, since an object can be used in different settings. Some examples:

  • Example 1: Simple optimization run (does not strongly benefit from its generator nature; a runnable sketch of this workflow follows this list):
    1. Define a gradient function gradient()

    2. Create an optimizer object optimizer with the gradient function gradient

    3. Run the optimization by optimizer.run_optimization() in your script

  • Example 2: Adding additional functionality during the optimization:
    1. Define an optimizer object using a gradient function.

    2. Example code snippet:

      for parameters in optimizer:
          rel_l2_change = optimizer.rel_l2_change
          iteration = optimizer.iteration
          # Verbose output
          print(f"Iter {iteration}, parameters {parameters}, rel L2 change "
                f"{rel_l2_change:.2f}")
          # Some additional condition to stop the optimization
          if self.number_of_simulations >= 1000:
              break

  • Example 3: Running multiple optimizers sequentially:
    1. Define optimizer1 and optimizer2 with different gradient functions

    2. Example code:

      done_bool = False
      while not done_bool:
          if not optimizer1.done:
              self.parameters1 = next(optimizer1)
          if not optimizer2.done:
              self.parameters2 = next(optimizer2)
          # Example of how to reduce the learning rate for optimizer2
          if optimizer2.iteration % 1000 == 0:
              optimizer2.learning_rate *= 0.5
          done_bool = optimizer1.done and optimizer2.done
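A runnable sketch of the Example 1 workflow, minimizing \(f(p)=\frac{1}{2}\lVert p\rVert^2\) with SGD. The import path, the value of optimization_type, and the initialization of current_variational_parameters are assumptions based on this page, not verified against the QUEENS source:

    import numpy as np

    from queens.stochastic_optimizers.sgd import SGD  # import path assumed

    # 1. Define a gradient function (here: the gradient of f(p) = ||p||^2 / 2).
    def gradient(parameters):
        return parameters

    # 2. Create an optimizer object with the gradient function.
    optimizer = SGD(
        learning_rate=0.1,
        optimization_type="min",  # assumed spelling of the minimization flag
        rel_l1_change_threshold=1e-8,
        rel_l2_change_threshold=1e-8,
    )
    optimizer.set_gradient_function(gradient)
    optimizer.current_variational_parameters = np.ones(3)  # assumed initialization

    # 3. Run the optimization.
    final_parameters = optimizer.run_optimization()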

learning_rate#

Learning rate for the optimizer.

Type:

float

clip_by_l2_norm_threshold#

Threshold to clip the gradient by L2-norm.

Type:

float

clip_by_value_threshold#

Threshold to clip the gradient components.

Type:

float

max_iteration#

Maximum number of iterations.

Type:

int

precoefficient#

Is +1 in case of maximization and -1 in case of minimization.

Type:

int

rel_l1_change_threshold#

If the relative L1 change of the parameters falls below this value, this convergence criterion is triggered.

Type:

float

rel_l2_change_threshold#

If the relative L2 change of the parameters falls below this value, this convergence criterion is triggered.

Type:

float

iteration#

Number of iterations done in the optimization so far.

Type:

int

done#

True if the optimization is done.

Type:

bool

rel_l2_change#

Relative change in L2-norm of the variational parameters w.r.t. the previous iteration.

Type:

float

rel_l1_change#

Relative change in L1-norm of the variational parameters w.r.t. the previous iteration.

Type:

float

current_variational_parameters#

Variational parameters.

Type:

np.array

current_gradient_value#

Current gradient vector w.r.t. the variational parameters.

Type:

np.array

gradient#

Function to compute the gradient.

Type:

function

learning_rate_decay#

Object to schedule learning rate decay.

Type:

LearningRateDecay

clip_gradient(gradient)[source]#

Clip the gradient by value and then by norm.

Parameters:

gradient (np.array) – Current gradient

Returns:

gradient (np.array) – The clipped gradient

do_single_iteration(gradient)[source]#

Single iteration for a given gradient.

Iteration step for a given gradient \(g\):

\(p^{(i+1)}=p^{(i)}+\beta \alpha g\)

where \(\beta=-1\) for minimization, \(\beta=+1\) for maximization, and \(\alpha\) is the learning rate.

Parameters:

gradient (np.array) – Current gradient
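In code, the step amounts to a single line; a sketch using the attribute names documented on this page, not the actual implementation:

    # p^(i+1) = p^(i) + beta * alpha * g, with beta stored as precoefficient
    current_variational_parameters = current_variational_parameters + precoefficient * learning_rate * gradient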

run_optimization()[source]#

Run the optimization.

Returns:

np.array – The final variational parameters

abstract scheme_specific_gradient(gradient)[source]#

Scheme-specific gradient computation.

Here the gradient is transformed according to the desired stochastic optimization approach.

Parameters:

gradient (np.array) – Current gradient

set_gradient_function(gradient_function)[source]#

Set the gradient function.

The gradient_function has to be a function of the parameters that returns the gradient value.

Parameters:

gradient_function (function) – Gradient function.

clip_by_l2_norm(gradient, l2_norm_threshold=1000000.0)[source]#

Clip gradients by L2-norm.

Parameters:
  • gradient (np.array) – Gradient

  • l2_norm_threshold (float) – Clipping threshold

Returns:

gradient (np.array) – Clipped gradients

clip_by_value(gradient, threshold=1000000.0)[source]#

Clip gradients by value.

Clips a component if its absolute value is larger than the threshold.

Parameters:
  • gradient (np.array) – Gradient

  • threshold (float) – Threshold to clip

Returns:

gradient (np.array) – Clipped gradients
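Both clipping utilities can be summarized in a few lines of NumPy. A sketch under the assumption that clipping by norm rescales the whole vector while clipping by value acts componentwise, as the docstrings above describe; the function names are hypothetical:

    import numpy as np

    def clip_by_l2_norm_sketch(gradient, l2_norm_threshold=1e6):
        # Rescale the whole gradient if its L2-norm exceeds the threshold.
        norm = np.linalg.norm(gradient)
        if norm > l2_norm_threshold:
            gradient = gradient * (l2_norm_threshold / norm)
        return gradient

    def clip_by_value_sketch(gradient, threshold=1e6):
        # Clip each component whose absolute value exceeds the threshold.
        return np.clip(gradient, -threshold, threshold)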