Advanced example

This example shows how to specify different aspects of the problem formulation and the model selection strategy in classo, using synthetic data.

Import the package

import sys, os
from os.path import join, dirname, abspath

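# "__file__" is passed as a string literal, so this resolves to the parent of
# the current working directory, which is then added to the import path.
classo_dir = dirname(dirname(abspath("__file__")))
sys.path.append(classo_dir)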

from classo import classo_problem, random_data
import numpy as np

Generate the data

This code snippet generates a problem instance with a sparse β in dimension d=200 (sparsity d_nonzero=5). The design matrix X comprises m=100 samples drawn i.i.d. from a standard normal distribution. The constraint matrix C has dimension k x d. The noise level is σ=0.5. The input zerosum=True implies that C is the all-ones vector, so that Cβ=0 forces the coefficients to sum to zero. The outcome vector y (of length m) and the regression vector β are then generated to satisfy the given constraints. The returned sol holds the true β, so one can see which variables should be selected.

m, d, d_nonzero, k, sigma = 100, 200, 5, 1, 0.5
(X, C, y), sol = random_data(
    m, d, d_nonzero, k, sigma, zerosum=True, seed=1, intercept=1.0
)
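To make the setup above concrete, here is a minimal NumPy sketch of how such a zero-sum instance could be built by hand. It is not the implementation of random_data: the function name make_zero_sum_data, the way the nonzero values are drawn, and the centering step are all assumptions made for illustration.

```python
import numpy as np

def make_zero_sum_data(m, d, d_nonzero, sigma, intercept=0.0, seed=0):
    """Hypothetical stand-in for a synthetic-data generator (not classo's random_data)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, d))              # i.i.d. standard normal design
    C = np.ones((1, d))                          # zero-sum constraint: C @ beta = 0
    beta = np.zeros(d)
    support = rng.choice(d, size=d_nonzero, replace=False)
    values = rng.uniform(1.0, 3.0, size=d_nonzero) * rng.choice([-1.0, 1.0], size=d_nonzero)
    values -= values.mean()                      # center so the nonzero entries sum to zero
    beta[support] = values
    y = X @ beta + intercept + sigma * rng.standard_normal(m)
    return (X, C, y), beta

(X_demo, C_demo, y_demo), beta_demo = make_zero_sum_data(100, 200, 5, 0.5, intercept=1.0, seed=1)
print(X_demo.shape, C_demo.shape, y_demo.shape)
print(np.allclose(C_demo @ beta_demo, 0.0))      # the constraint holds up to rounding
```

Centering the nonzero entries is one simple way to satisfy the constraint; any other projection onto the null space of C would serve equally well.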

Create labels

This code snippet creates labels that indicate where the solution β should be nonzero.

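# Note: dtype=str allocates one-character strings, so each label below is
# stored as just "n" or "y" (as seen in the printed results further down).
labels = np.empty(d, dtype=str)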
for i in range(d):
    if sol[i] == 0.0:
        labels[i] = "no_" + str(i)
    else:
        labels[i] = "yes_" + str(i)
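If the full label names are wanted, one possible alternative is to build the array from a list, so that NumPy sizes the string dtype to the longest label. The sketch below uses a small made-up sol vector rather than the one generated above, and contrasts the result with a preallocated dtype=str array.

```python
import numpy as np

sol = np.array([0.0, 1.5, 0.0, -1.5])   # toy stand-in for the true coefficient vector

# Building from a list lets NumPy pick a string width that fits every label.
labels = np.array(
    ["no_" + str(i) if value == 0.0 else "yes_" + str(i) for i, value in enumerate(sol)]
)
print(labels)

# For comparison: an array preallocated with dtype=str keeps one character per entry.
short = np.empty(len(sol), dtype=str)
short[1] = "yes_1"
print(short[1])
```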

Define the classo instance

Next we can define a default c-lasso problem instance with the generated data:

problem = classo_problem(X, y, C)

Check parameters

You can look at the generated problem instance by typing:

print(problem)

Out:

FORMULATION: R2

MODEL SELECTION COMPUTED:
     Cross Validation
     Stability selection
     Lambda fixed

LAMBDA FIXED PARAMETERS:
     numerical_method = not specified
     rescaled lam : True
     threshold : average of the absolute value of beta
     lam = 0.1

CROSS VALIDATION PARAMETERS:
     numerical_method : not specified
     one-SE method : True
     Nsubset = 5
     lamin = 0.001
     Nlam = 80
     with log-scale

STABILITY SELECTION PARAMETERS:
     numerical_method : not specified
     method : max
     B = 50
     q = 10
     percent_nS = 0.5
     threshold = 0.7
     lamin = 0.01
     Nlam = 50
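The cross-validation parameters printed above mention a log-scale grid (lamin, Nlam) and a one-SE rule. The following sketch illustrates what these settings usually mean in general; it is not taken from classo. It assumes that the grid values are fractions of the largest useful λ, and it uses a made-up error curve in place of real cross-validation errors.

```python
import numpy as np

lamin, Nlam = 0.001, 80
# Log-spaced grid from 1 down to lamin (assumed to be fractions of the largest lambda).
lambdas = np.geomspace(1.0, lamin, Nlam)

def one_se_index(mean_err, se_err):
    """Index of the largest lambda whose mean error is within one SE of the minimum.

    Assumes the errors are ordered like a decreasing lambda grid.
    """
    best = int(np.argmin(mean_err))
    bound = mean_err[best] + se_err[best]
    return int(np.flatnonzero(mean_err <= bound)[0])   # first index = largest lambda

# Synthetic, made-up error curve with its minimum near lambda = 0.05.
mean_err = (np.log10(lambdas) - np.log10(0.05)) ** 2 + 1.0
se_err = np.full(Nlam, 0.1)

i_min = int(np.argmin(mean_err))
i_1se = one_se_index(mean_err, se_err)
print(lambdas[i_min], lambdas[i_1se])
```

The one-SE choice trades a slightly higher estimated error for a larger λ, which typically yields a sparser model.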

Solve optimization problems

Stability selection is the default model selection strategy.

Solving the problem also makes it possible to inspect the computed stability profile for all variables at the theoretical λ. Further model selections are computed in this example: the solution for a fixed λ; a path computation followed by the Approximation of the Leave-One-Out error (ALO); and a k-fold cross-validation.

problem.solve()
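The idea behind stability selection can be sketched in a few lines of NumPy: apply a sparse selector to many random subsamples and keep the variables that are chosen in at least a fraction threshold of them. The sketch below reuses the printed parameter names B, q, percent_nS and threshold, reading q as the number of variables kept per subsample. The base selector is an assumption made for brevity: it ranks columns by absolute inner product with the centered response, whereas classo solves a constrained lasso problem on each subsample.

```python
import numpy as np

def stability_selection(X, y, B=50, q=10, percent_nS=0.5, threshold=0.7, seed=0):
    """Selection frequencies over B random subsamples.

    Stand-in base selector (an assumption, not what classo does): keep the q
    columns with the largest absolute inner product with the centered response.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_sub = int(percent_nS * n)
    counts = np.zeros(d)
    for _ in range(B):
        rows = rng.choice(n, size=n_sub, replace=False)   # subsample without replacement
        Xs, ys = X[rows], y[rows] - y[rows].mean()
        score = np.abs(Xs.T @ ys)
        counts[np.argsort(score)[-q:]] += 1               # the q highest-scoring variables
    freq = counts / B
    return freq, np.flatnonzero(freq >= threshold)

rng = np.random.default_rng(1)
X_toy = rng.standard_normal((100, 200))
beta_toy = np.zeros(200)
beta_toy[[3, 44]] = [4.0, -4.0]                           # toy zero-sum signal
y_toy = X_toy @ beta_toy + 0.5 * rng.standard_normal(100)

freq, selected = stability_selection(X_toy, y_toy)
print(selected)
```

Variables that only enter by chance on a few subsamples end up with a low frequency and fall below the threshold, which is what makes the final selection more stable than a single fit.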

Visualisation

After completion, the results of the optimization and model selection routines can be visualized using

print(problem.solution)

The call produces the following plots:

  • Coefficients at λ = 0.1
  • k-fold cross-validation profile
  • Refitted coefficients after CV model selection
  • Stability selection profile of type max using R2
  • Refitted coefficients after stability selection

Out:

LAMBDA FIXED :
  Selected variables :  intercept    12    157    181
  Running time :  0.038s

CROSS VALIDATION :
   Intercept : 1.0068530209340394
  Selected variables :  12    136    150    157    181
  Running time :  1.054s

STABILITY SELECTION :
  Selected variables :  intercept    12    157    181
  Running time :  4.23s

R1 formulation with ALO

Plots produced for this formulation:

  • Coefficients across the λ-path using R1
  • ALO profile
  • Refitted coefficients after ALO model selection
  • Stability selection profile of type max using R1
  • Refitted coefficients after stability selection

Out:

FORMULATION: R1

MODEL SELECTION COMPUTED:
     ALO
     Stability selection

ALO PARAMETERS:
     numerical_method : Path-Alg
     lamin = 0.001
     Nlam = 80


STABILITY SELECTION PARAMETERS:
     numerical_method : Path-Alg
     method : max
     B = 50
     q = 10
     percent_nS = 0.5
     threshold = 0.7
     lamin = 0.01
     Nlam = 50


 ALO COMPUTATION :
   Selected variables :  n    n    n    y    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    n    y    n    n    n    n    n    n    y    n    n    n
   Running time :  0.23s

 STABILITY SELECTION :
   Selected variables :  y    n    y    y
   Running time :  2.981s

Total running time of the script: ( 0 minutes 10.409 seconds)
