Chapter 5: Ridge regularization

Ridge regularization is an alternative to LASSO. It penalizes the squared L2 norm of the weights instead of their L1 norm, which means that the weights are kept small but are not necessarily driven to zero. This offers a smoother alternative to LASSO, as the modelled function might still need to keep some of its complexity in certain regions.
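To make the difference with LASSO concrete, here is a minimal sketch under simplifying assumptions: a least-squares loss of the form 0.5 * ||y - X theta||^2 and an orthonormal design (X.T @ X = I), in which case both penalties have a closed-form effect on the unregularized weights. The weight values and the name `theta_ols` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical unregularized weights (assumption: orthonormal design, X.T @ X = I).
theta_ols = np.array([3.0, -1.5, 0.4, -0.2])
_lambda = 0.5

# Ridge penalty 0.5 * _lambda * sum(theta**2):
# every weight is scaled by the same factor, so none becomes exactly zero.
theta_ridge = theta_ols / (1.0 + _lambda)

# LASSO penalty _lambda * sum(abs(theta)):
# soft-thresholding, so weights smaller than _lambda in magnitude become zero.
theta_lasso = np.sign(theta_ols) * np.maximum(np.abs(theta_ols) - _lambda, 0.0)

print("ridge:", theta_ridge)
print("lasso:", theta_lasso)
```

In this setting Ridge shrinks all weights proportionally, while LASSO removes the two smallest ones entirely.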

Ridge regularization and its gradient are defined as follows:

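import numpy as np


class Ridge:
    """Ridge (L2) penalty: 0.5 * lambda * sum(theta**2)."""

    def __init__(self, _lambda):
        # Regularization strength; 0 disables the penalty
        self._lambda = _lambda

    def __call__(self, theta):
        # Value of the penalty term that is added to the loss
        return self._lambda * 0.5 * np.sum(theta**2)

    def gradient(self, theta):
        # Derivative of the penalty with respect to theta
        return self._lambda * theta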
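As a quick sanity check, the analytic gradient can be compared against a finite-difference approximation. The sketch below repeats a compact copy of the class so that it runs on its own; the helper `numerical_gradient` and the example values are assumptions made for illustration, not part of the original code.

```python
import numpy as np


class Ridge:
    """Compact copy of the class above so this snippet runs on its own."""

    def __init__(self, _lambda):
        self._lambda = _lambda

    def __call__(self, theta):
        return self._lambda * 0.5 * np.sum(theta**2)

    def gradient(self, theta):
        return self._lambda * theta


def numerical_gradient(f, theta, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at theta."""
    grad = np.zeros_like(theta, dtype=float)
    for i in range(theta.size):
        step = np.zeros_like(theta, dtype=float)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad


ridge = Ridge(_lambda=0.1)
theta = np.array([1.0, -2.0, 3.0])

penalty = ridge(theta)                       # 0.1 * 0.5 * (1 + 4 + 9)
analytic = ridge.gradient(theta)             # 0.1 * theta
numeric = numerical_gradient(ridge, theta)   # should agree with the analytic gradient

print(penalty, analytic, numeric)
```

The factor 0.5 in the penalty is what makes the gradient simply `_lambda * theta`, without an extra factor of 2.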

The regularization is controlled by a factor typically referred to as Lambda. If Lambda equals 0, there is no regularization; otherwise, the larger the value of Lambda, the more the regularization term impacts the loss function.
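The effect of Lambda can be illustrated with a small experiment. The following is a minimal sketch, assuming a mean-squared-error data term minimized by plain gradient descent; the synthetic data, the `fit` helper, the learning rate, and the step count are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 50 samples, 5 features.
X = rng.normal(size=(50, 5))
true_theta = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y = X @ true_theta + 0.1 * rng.normal(size=50)


def fit(X, y, _lambda, lr=0.05, n_steps=2000):
    """Gradient descent on 0.5 * MSE + 0.5 * _lambda * sum(theta**2)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        data_grad = X.T @ (X @ theta - y) / len(y)  # gradient of the data term
        reg_grad = _lambda * theta                  # gradient of the Ridge term
        theta -= lr * (data_grad + reg_grad)
    return theta


lambdas = [0.0, 0.1, 1.0, 10.0]
norms = [np.linalg.norm(fit(X, y, lam)) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:5.1f}  ||theta||={norm:.3f}")
```

With Lambda set to 0 the fit is unregularized, and the norm of the learned weights decreases as Lambda grows, although the weights are not forced to exactly zero.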

As can be seen in the animation above, Ridge regularization is “smoother”: it retains some of the complexity of the function while toning it down.