# weight decay

In plain (stochastic) gradient descent, the weights $\mathbf{x}$ are updated as
$$\mathbf{x}_{t+1}=\mathbf{x}_{t}-\alpha \nabla f_{t}(\mathbf{x}_t)$$

## weight decay

In the weight decay described by Hanson & Pratt (1988),
the weights $\mathbf{x}$ decay exponentially as
$$\mathbf{x}_{t+1}=(1-w) \mathbf{x}_{t}-\alpha \nabla f_{t}(\mathbf{x}_t)$$

where $w$ defines the rate of the weight decay per step and
$\nabla f_{t}(\mathbf{x}_t)$ is the $t$-th batch gradient to be multiplied by a learning rate $\alpha$.
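A minimal numpy sketch of this update rule; the learning rate and decay values below are illustrative, not from the source:

```python
import numpy as np

def weight_decay_step(x, grad, lr=0.1, w=0.01):
    # Hanson & Pratt (1988) style update:
    # x_{t+1} = (1 - w) * x_t - lr * grad_t
    return (1.0 - w) * x - lr * grad

x = np.array([1.0, -2.0])
grad = np.array([0.5, 0.5])
x_next = weight_decay_step(x, grad)
```

Each step first shrinks every weight by the factor $(1-w)$, then applies the usual gradient update, so in the absence of gradients the weights decay exponentially toward zero.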

## L2 regularization vs. weight decay

Commonly, L2 regularization is implemented by adding a penalty term to the loss:

$$f_{t}^{reg}(\mathbf{x}_{t}) = f_{t}(\mathbf{x}_t)+\frac{w}{2} \left\| \mathbf{x}_t \right\|_{2}^{2}$$

Taking one SGD step on $f_{t}^{reg}$ gives

$$\mathbf{x}_{t+1}=\mathbf{x}_{t}-\alpha \nabla f_{t}(\mathbf{x}_t)-\alpha w \mathbf{x}_{t}=(1-\alpha w)\mathbf{x}_{t}-\alpha \nabla f_{t}(\mathbf{x}_t)$$

so for plain SGD, L2 regularization is equivalent to weight decay with a per-step decay rate of $\alpha w$.
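A quick numerical check that, for plain SGD, one step on the L2-regularized loss matches a weight-decay step with rate $\alpha w$ (values below are illustrative):

```python
import numpy as np

lr, w = 0.1, 0.01
x = np.array([1.0, -2.0])
grad = np.array([0.5, 0.5])  # batch gradient of the unregularized loss

# SGD on the L2-regularized loss: gradient of f^reg is grad + w * x
x_l2 = x - lr * (grad + w * x)

# Weight decay applied directly, with per-step rate lr * w
x_wd = (1.0 - lr * w) * x - lr * grad

same = np.allclose(x_l2, x_wd)
```

Note that this equivalence is specific to plain SGD; optimizers that rescale the gradient (e.g. adaptive methods) also rescale the $w\,\mathbf{x}_t$ term, so the two formulations diverge there.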