
Regularisation

Regularisation is a common way to reduce model variance by penalising large weights. EloGrad supports L1 and L2 regularisation.

NOTE: EloGrad does not support regularisation of the entity weights, only of the additional regressors.

L1

L1 regularisation adds a penalty on the absolute values of the weights to the cost function \(L\) being optimised:

\[ \begin{equation} L_{\text{L1}} = L + \lambda \sum_i \left|w_i\right|, \end{equation} \]

where \(\lambda\) is the regularisation parameter and \(w_i\) are the model weights.
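A minimal sketch of the penalised cost, assuming the weights are stored in a plain dictionary keyed by name. The helper `l1_penalised_cost` is hypothetical (not part of EloGrad): it adds \(\lambda \sum_i |w_i|\) to a precomputed base cost \(L\), summing only over the names passed as `regressors`, so entity weights stay unpenalised as the note above describes. Computing \(L\) itself (e.g. the log loss for the logistic model) is outside this sketch, and the weight names and values are made up for illustration.

```python
def l1_penalised_cost(base_cost: float, weights: dict[str, float],
                      regressors: set[str], lam: float) -> float:
    """Return L + lam * sum(|w_i|), summing only over regressor weights.

    Hypothetical helper for illustration; not EloGrad's implementation.
    Weights whose names are not in `regressors` (e.g. entity ratings)
    contribute nothing to the penalty.
    """
    penalty = sum(abs(w) for name, w in weights.items() if name in regressors)
    return base_cost + lam * penalty


# Hypothetical weights: two entity ratings and one home-advantage regressor.
weights = {"player_a": 1250.0, "player_b": 1180.0, "home": 35.0}
print(l1_penalised_cost(0.6, weights, regressors={"home"}, lam=0.1))  # ≈ 4.1 (0.6 + 0.1 * |35.0|)
```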

For both the logistic regression-based and Poisson regression-based Elo rating systems, it is straightforward to show that a gradient-descent step of size \(k\) (the k-factor) on the L1-regularised cost gives the update rule

\[ \begin{equation} \hat{r}^\prime_k=\hat{r}_k + k \left(y_{ij} - \mathbb{E}[y_{ij}|r_1,\cdots,r_n;\hat{r}_k]\right)x_k - k\lambda\,\text{sign}(\hat{r}_k), \end{equation} \]

where \(\text{sign}(x)\) is the sign function.
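One way to fill in the omitted step, assuming the unregularised Elo update is a gradient-descent step of size \(k\) on \(L\), i.e. \(\partial L/\partial \hat{r}_k = -\left(y_{ij} - \mathbb{E}[y_{ij}|r_1,\cdots,r_n;\hat{r}_k]\right)x_k\) with any constant scale factor absorbed into \(k\), and using \(\partial \left|w\right|/\partial w = \text{sign}(w)\) for \(w \neq 0\):

\[ \begin{aligned}
\hat{r}^\prime_k &= \hat{r}_k - k\,\frac{\partial L_{\text{L1}}}{\partial \hat{r}_k} \\
&= \hat{r}_k - k\left(\frac{\partial L}{\partial \hat{r}_k} + \lambda\,\text{sign}(\hat{r}_k)\right) \\
&= \hat{r}_k + k\left(y_{ij} - \mathbb{E}[y_{ij}|r_1,\cdots,r_n;\hat{r}_k]\right)x_k - k\lambda\,\text{sign}(\hat{r}_k).
\end{aligned} \]

At \(\hat{r}_k = 0\) the absolute value is not differentiable; taking \(\text{sign}(0) = 0\), which is a valid subgradient, simply skips the penalty for that step. Replacing \(\lambda\left|\hat{r}_k\right|\) with \(\lambda\hat{r}_k^2\), whose derivative is \(2\lambda\hat{r}_k\), gives the L2 rule in the same way.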

L2

L2 regularisation instead adds a penalty on the squared weights to the cost function \(L\):

\[ \begin{equation} L_{\text{L2}} = L + \lambda \sum_i w_i^2, \end{equation} \]

where \(\lambda\) and \(w_i\) are defined as for L1.

For both the logistic regression-based and Poisson regression-based Elo rating systems, it is straightforward to show that a gradient-descent step of size \(k\) (the k-factor) on the L2-regularised cost gives the update rule

\[ \begin{equation} \hat{r}^\prime_k=\hat{r}_k + k \left(y_{ij} - \mathbb{E}[y_{ij}|r_1,\cdots,r_n;\hat{r}_k]\right)x_k - 2k\lambda\hat{r}_k. \end{equation} \]
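Both update rules can be written as a single step function. The sketch below is a hypothetical helper, not EloGrad's implementation: the argument names `penalty` and `lam` are chosen for illustration, and the expected score \(\mathbb{E}[y_{ij}|r_1,\cdots,r_n;\hat{r}_k]\) is passed in precomputed rather than derived from any particular Elo curve.

```python
def sign(x: float) -> float:
    """Sign function with sign(0) = 0."""
    return float((x > 0) - (x < 0))


def regularised_update(r_hat: float, y: float, expected: float, x: float,
                       k: float, lam: float = 0.0, penalty: str = "none") -> float:
    """One update of a single regressor weight r_hat (hypothetical helper).

    l1:   r' = r + k (y - E[y]) x - k * lam * sign(r)
    l2:   r' = r + k (y - E[y]) x - 2 * k * lam * r
    none: r' = r + k (y - E[y]) x
    """
    step = r_hat + k * (y - expected) * x  # unregularised Elo step
    if penalty == "l1":
        return step - k * lam * sign(r_hat)
    if penalty == "l2":
        return step - 2 * k * lam * r_hat
    if penalty == "none":
        return step
    raise ValueError(f"unknown penalty: {penalty!r}")


# A home win (y = 1, x = 1) when the model expected 0.6, starting from r_hat = 10.
print(regularised_update(10.0, 1.0, 0.6, 1.0, k=1.0, lam=0.1, penalty="l1"))  # ≈ 10.3
print(regularised_update(10.0, 1.0, 0.6, 1.0, k=1.0, lam=0.1, penalty="l2"))  # ≈ 8.4
```

The L1 term subtracts the same amount \(k\lambda\) whenever the weight is non-zero, so it pulls small weights towards zero as hard as large ones; the L2 term subtracts an amount proportional to the weight, so large weights are shrunk hardest and small ones barely at all.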

Example

```python
from elo_grad import EloEstimator, Regressor

# Input DataFrame with a sorted index of Unix timestamps
# and columns player_1 | player_2 | result | home,
# where result = 1 if player_1 won and result = 0 if
# player_2 won. In all games, player_1 has home
# advantage, so home = 1 for all rows.
home_col = "home"
df = ...
estimator = EloEstimator(
    k_factor=20,
    default_init_rating=1200,
    entity_cols=("player_1", "player_2"),
    score_col="result",
    # Set the initial rating for home advantage to 0
    init_ratings={home_col: (None, 0)},
    # Set the k-factor/step size to 1 for the home advantage regressor
    # and apply an L1 penalty with lambda = 0.1
    additional_regressors=[Regressor(name=home_col, k_factor=1, penalty="l1", lambda_reg=0.1)],
)
# Get expected scores
expected_scores = estimator.predict_proba(df)
# Get final ratings (of the form (Unix timestamp, rating)) for home advantage
ratings = estimator.model.ratings[home_col]
```
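To see the penalty act over a sequence of games, the sketch below replays toy results for a single home-advantage weight. It is a hypothetical pure-Python illustration, not what `EloEstimator` does internally, and it rests on stated assumptions: the expected home score follows a base-10 logistic curve with a 400-point scale, the home weight (times `home = 1`) is added to the rating difference, and both players' ratings are held fixed and equal so that only the home weight moves. The values `k = 1` and `lambda = 0.1` match the example above; the function names are made up.

```python
def sign(x: float) -> float:
    """Sign function with sign(0) = 0."""
    return float((x > 0) - (x < 0))


def expected_home_score(home_weight: float, rating_diff: float = 0.0) -> float:
    """Assumed base-10 logistic Elo curve with a 400-point scale.

    The home weight (times home = 1) is added to the rating difference.
    """
    return 1.0 / (1.0 + 10.0 ** (-(rating_diff + home_weight) / 400.0))


def fit_home_weight(results: list[float], k: float, lam: float, penalty: str) -> float:
    """Update one home-advantage weight game by game (hypothetical sketch).

    Entity ratings are held fixed and equal (rating_diff = 0), unlike a real
    Elo fit; this isolates the effect of the penalty on the home weight.
    """
    w = 0.0  # initial home rating of 0, matching init_ratings in the example above
    for y in results:
        step = w + k * (y - expected_home_score(w)) * 1.0  # x = home = 1
        if penalty == "l1":
            step -= k * lam * sign(w)
        elif penalty == "l2":
            step -= 2 * k * lam * w
        elif penalty != "none":
            raise ValueError(f"unknown penalty: {penalty!r}")
        w = step
    return w


# Toy data: the home side wins three games out of every four.
results = [1.0, 1.0, 1.0, 0.0] * 10
for penalty, lam in [("none", 0.0), ("l1", 0.1), ("l2", 0.1)]:
    print(penalty, round(fit_home_weight(results, k=1.0, lam=lam, penalty=penalty), 2))
```

With these toy results the unpenalised weight keeps climbing, the L1 run ends lower because of the constant \(k\lambda\) pull each game, and the L2 run stays lowest because its pull grows with the weight.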