Advanced example
This example shows how to specify different aspects of the problem formulation and the model selection strategy in c-lasso, using synthetic data.
Import the package
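import sys
from os.path import dirname, abspath
# "__file__" is a plain string here, so abspath resolves it against the current
# working directory; classo_dir is therefore the parent of that directory.
classo_dir = dirname(dirname(abspath("__file__")))
# Make a local checkout of classo importable.
sys.path.append(classo_dir)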
from classo import classo_problem, random_data
import numpy as np
Generate the data
This code snippet generates a problem instance with a sparse β in dimension d=100 (sparsity d_nonzero=5). The design matrix X comprises n=100 samples drawn i.i.d. from a standard normal distribution. The constraint matrix C has dimension k × d, so that the product Cβ is well defined. The noise level is σ=0.5. The input zerosum=True implies that C is the all-ones vector, so the constraint Cβ=0 forces the entries of β to sum to zero. The n-dimensional outcome vector y and the regression vector β are then generated to satisfy the given constraints. One can then see which variables should be selected.
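To illustrate what such an instance looks like, here is a minimal sketch in plain Python. It is not the library's `random_data` routine: the helper name `make_zero_sum_instance` and the way the nonzero coefficients are drawn are assumptions made for this example only.

```python
import random

def make_zero_sum_instance(n, d, d_nonzero, sigma, seed=0):
    """Hypothetical helper: sparse beta with sum(beta) == 0 and y = X beta + noise."""
    rng = random.Random(seed)
    support = sorted(rng.sample(range(d), d_nonzero))
    # Draw all but one nonzero coefficient, then choose the last one so the sum is zero.
    values = [rng.choice([-1, 1]) * rng.uniform(3, 6) for _ in range(d_nonzero - 1)]
    values.append(-sum(values))
    beta = [0.0] * d
    for j, v in zip(support, values):
        beta[j] = v
    # Design matrix with i.i.d. standard normal entries.
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    # Outcome: linear signal plus Gaussian noise of standard deviation sigma.
    y = [sum(x * b for x, b in zip(row, beta)) + sigma * rng.gauss(0.0, 1.0) for row in X]
    return X, y, beta, support

X_demo, y_demo, beta_demo, support_demo = make_zero_sum_instance(
    n=100, d=100, d_nonzero=5, sigma=0.5, seed=1
)
print("relevant variables:", support_demo)
print("sum of beta:", sum(beta_demo))
```

Fixing the last coefficient from the others is only one simple way to satisfy the zero-sum constraint; other constructions work equally well.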
Create labels
This code snippet creates labels that indicate where the solution β should be nonzero.
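A minimal sketch of such a labelling step is given below. It assumes the "y"/"n" markers that appear in the label output later in this example; the helper `make_labels` and the short coefficient vector are hypothetical.

```python
def make_labels(beta):
    """Return "y" for nonzero entries of beta and "n" for zero entries."""
    return ["y" if b != 0 else "n" for b in beta]

# Hypothetical coefficient vector with two nonzero entries.
beta_example = [0.0, 0.0, 0.0, 4.2, 0.0, -4.2]
labels_example = make_labels(beta_example)
print(" ".join(labels_example))
```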
Define the classo instance
Next we can define a default c-lasso problem instance with the generated data:
Change the parameters
The following settings change several formulation and model selection parameters:
problem.formulation.huber = True
problem.formulation.concomitant = False
problem.formulation.intercept = True
problem.model_selection.CV = True
problem.model_selection.LAMfixed = True
problem.model_selection.StabSelparameters.method = "max"
problem.model_selection.CVparameters.seed = 1
problem.model_selection.LAMfixedparameters.rescaled_lam = True
problem.model_selection.LAMfixedparameters.lam = 0.1
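The `huber` flag set above replaces the least-squares loss with a Huber loss, which is quadratic for small residuals and linear for large ones, making the fit less sensitive to outliers. The sketch below shows one common parameterization; the function name `huber_loss` and the threshold `rho = 1.345` are assumptions for illustration, and the scaling used inside c-lasso may differ.

```python
def huber_loss(r, rho=1.345):
    """Huber loss: r**2 when |r| <= rho, and 2*rho*|r| - rho**2 otherwise."""
    if abs(r) <= rho:
        return r * r
    return 2.0 * rho * abs(r) - rho * rho

# Large residuals are penalized linearly rather than quadratically.
residuals = [-5.0, -1.0, 0.0, 0.5, 3.0]
print([round(huber_loss(r), 3) for r in residuals])
```

The two branches agree at |r| = rho, so the loss is continuous.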
Check parameters
You can look at the generated problem instance by typing:
print(problem)
Out:
FORMULATION: R2
MODEL SELECTION COMPUTED:
Cross Validation
Stability selection
Lambda fixed
LAMBDA FIXED PARAMETERS:
numerical_method = not specified
rescaled lam : True
threshold : average of the absolute value of beta
lam = 0.1
CROSS VALIDATION PARAMETERS:
numerical_method : not specified
one-SE method : True
Nsubset = 5
lamin = 0.001
Nlam = 80
with log-scale
STABILITY SELECTION PARAMETERS:
numerical_method : not specified
method : max
B = 50
q = 10
percent_nS = 0.5
threshold = 0.7
lamin = 0.01
Nlam = 50
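The cross-validation parameters printed above report `lamin = 0.001`, `Nlam = 80` and a log scale. One plausible reading, offered here only as an assumption, is a grid of `Nlam` relative λ values spaced geometrically from 1 down to `lamin`. The helper `log_grid` below is hypothetical and not part of c-lasso.

```python
def log_grid(lamin, n_lam):
    """Geometric grid of n_lam points from 1.0 down to lamin (both included)."""
    ratio = lamin ** (1.0 / (n_lam - 1))
    return [ratio ** i for i in range(n_lam)]

grid = log_grid(lamin=0.001, n_lam=80)
print(len(grid), grid[0], grid[-1])
```

A geometric grid places more points near small λ, where the set of selected variables tends to change most quickly.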
Solve optimization problems
Stability selection is the default model selection strategy.
Solving the problem also computes the stability profile of all variables at the theoretical λ, which can be inspected afterwards. Two other model selection strategies are computed here: the solution for a fixed λ, and k-fold cross-validation. A path computation followed by the Approximation of the Leave-one Out error (ALO) is used in the R1 example further below.
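To make the idea of a stability profile concrete, here is a minimal sketch of stability selection. It repeatedly subsamples the data, keeps the `q` highest-scoring variables on each subsample, and reports the variables chosen in at least a `threshold` fraction of the `B` runs. A simple marginal-correlation score stands in for the lasso solver, so this illustrates the resampling logic only and is not c-lasso's implementation; all function names are hypothetical.

```python
import random

def top_q_by_correlation(X, y, q):
    """Toy selector (stand-in for the lasso): q columns with largest |<x_j, y>|."""
    d = len(X[0])
    scores = [abs(sum(row[j] * yi for row, yi in zip(X, y))) for j in range(d)]
    return sorted(range(d), key=lambda j: scores[j], reverse=True)[:q]

def stability_selection(X, y, B=50, q=10, percent_nS=0.5, threshold=0.7, seed=0):
    """Return (selected indices, selection frequency of every variable)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    n_sub = int(percent_nS * n)
    counts = [0] * d
    for _ in range(B):
        idx = rng.sample(range(n), n_sub)  # subsample without replacement
        chosen = top_q_by_correlation([X[i] for i in idx], [y[i] for i in idx], q)
        for j in chosen:
            counts[j] += 1
    freq = [c / B for c in counts]
    selected = [j for j in range(d) if freq[j] >= threshold]
    return selected, freq

# Small synthetic example in which only variables 0 and 1 influence y.
rng = random.Random(1)
X_toy = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(60)]
y_toy = [4 * row[0] - 4 * row[1] + 0.5 * rng.gauss(0, 1) for row in X_toy]
selected, freq = stability_selection(X_toy, y_toy, B=50, q=5)
print("selected:", selected)
```

The default arguments mirror the values of `B`, `q`, `percent_nS` and `threshold` printed in the parameter summary above.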
problem.solve()
Visualisation
After completion, the results of the optimization and model selection routines can be displayed using:
print(problem.solution)
Out:
LAMBDA FIXED :
Selected variables : intercept 12 157 181
Running time : 0.038s
CROSS VALIDATION :
Intercept : 1.0068530209340394
Selected variables : 12 136 150 157 181
Running time : 1.054s
STABILITY SELECTION :
Selected variables : intercept 12 157 181
Running time : 4.23s
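The cross-validation parameters printed earlier list `one-SE method : True`. Under the usual one-standard-error rule, one picks the largest λ whose mean cross-validation error lies within one standard error of the minimum, which favors sparser models. The sketch below illustrates that rule on made-up numbers; the helper `one_se_index` and the error values are hypothetical and do not come from c-lasso.

```python
def one_se_index(lambdas, mean_err, se_err):
    """Index of the largest lambda whose mean error is within one SE of the best."""
    best = min(range(len(mean_err)), key=lambda i: mean_err[i])
    cutoff = mean_err[best] + se_err[best]
    candidates = [i for i in range(len(lambdas)) if mean_err[i] <= cutoff]
    return max(candidates, key=lambda i: lambdas[i])

# Hypothetical cross-validation results for five values of lambda.
lambdas = [1.0, 0.5, 0.1, 0.05, 0.01]
mean_err = [2.0, 1.25, 1.0, 0.9, 0.95]
se_err = [0.2, 0.15, 0.15, 0.12, 0.12]
i = one_se_index(lambdas, mean_err, se_err)
print("lambda chosen by the one-SE rule:", lambdas[i])
```

Here the minimum error is reached at λ = 0.05, but λ = 0.1 is within one standard error of it and is therefore preferred.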
R1 formulation with ALO
problem.data.label = labels
problem.formulation.intercept = False
problem.formulation.huber = False
problem.model_selection.ALO = True
problem.model_selection.CV = False
problem.model_selection.LAMfixed = False
problem.solve()
print(problem)
print(problem.solution)
Out:
FORMULATION: R1
MODEL SELECTION COMPUTED:
ALO
Stability selection
ALO PARAMETERS:
numerical_method : Path-Alg
lamin = 0.001
Nlam = 80
STABILITY SELECTION PARAMETERS:
numerical_method : Path-Alg
method : max
B = 50
q = 10
percent_nS = 0.5
threshold = 0.7
lamin = 0.01
Nlam = 50
ALO COMPUTATION :
Selected variables : n n n y n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n y n n n n n n y n n n
Running time : 0.23s
STABILITY SELECTION :
Selected variables : y n y y
Running time : 2.981s
Total running time of the script: ( 0 minutes 10.409 seconds)