Basic example

This example shows what c-lasso does with its default parameters on synthetic data.

Import the package

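import sys
from os.path import dirname, abspath

# Make the repository root importable when running this script from the
# examples folder. abspath("__file__") resolves the literal name "__file__"
# in the current working directory, so classo_dir is the parent of the
# working directory. These two lines are unnecessary if c-lasso is installed
# (pip install c-lasso).
classo_dir = dirname(dirname(abspath("__file__")))
sys.path.append(classo_dir)
from classo import classo_problem, random_data
import numpy as np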

Generate the data

This code snippet generates a problem instance with a sparse β in dimension d=200 (sparsity d_nonzero=5). The design matrix X comprises m=100 samples drawn i.i.d. from a standard normal distribution. The constraint matrix C has dimension k × d, here with k=1. The noise level is σ=0.5. The input zerosum=True implies that C is the all-ones vector, so the constraint Cβ=0 forces the coefficients of β to sum to zero. The m-dimensional outcome vector y and the regression vector β are then generated to satisfy the given constraints.

m, d, d_nonzero, k, sigma = 100, 200, 5, 1, 0.5
(X, C, y), sol = random_data(m, d, d_nonzero, k, sigma, zerosum=True, seed=1)
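The generation step can be mimicked with plain NumPy. The sketch below is an assumption about what a zero-sum instance looks like, not the actual implementation of random_data: it draws X with i.i.d. standard normal entries, places d_nonzero nonzero coefficients at random positions, subtracts their mean so that they sum to zero (C is a single all-ones row, so Cβ=0), and adds Gaussian noise of level σ. The helper name make_zerosum_data is hypothetical.

```python
import numpy as np

def make_zerosum_data(m, d, d_nonzero, sigma, seed=None):
    """Hypothetical helper: sparse zero-sum regression data (not classo's random_data)."""
    rng = np.random.default_rng(seed)
    # Design matrix with i.i.d. standard normal entries
    X = rng.standard_normal((m, d))
    # Choose the support and draw the nonzero coefficients
    support = np.sort(rng.choice(d, size=d_nonzero, replace=False))
    values = rng.standard_normal(d_nonzero)
    values -= values.mean()  # enforce sum(beta) == 0 on the support
    beta = np.zeros(d)
    beta[support] = values
    # Zero-sum constraint matrix: a single all-ones row (k = 1)
    C = np.ones((1, d))
    # Outcome with additive Gaussian noise of level sigma
    y = X @ beta + sigma * rng.standard_normal(m)
    return (X, C, y), beta

(X, C, y), beta = make_zerosum_data(100, 200, 5, 0.5, seed=1)
print(X.shape, C.shape, y.shape)                     # (100, 200) (1, 200) (100,)
print(np.count_nonzero(beta), np.allclose(C @ beta, 0))  # 5 True
```

Centering the nonzero values is one simple way to satisfy the zero-sum constraint exactly; other generators may instead project onto the null space of C.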

Remark: the indices of the nonzero coefficients of the true β, i.e. the variables that should be selected, can be printed with:

print(np.nonzero(sol))

Out:

(array([ 12, 157, 178, 181, 185]),)

Define the classo instance

Next we can define a default c-lasso problem instance with the generated data:

problem = classo_problem(X, y, C)

Check parameters

You can look at the generated problem instance by typing:

print(problem)

Out:

FORMULATION: R3

MODEL SELECTION COMPUTED:
     Stability selection

STABILITY SELECTION PARAMETERS:
     numerical_method : not specified
     method : first
     B = 50
     q = 10
     percent_nS = 0.5
     threshold = 0.7
     lamin = 0.01
     Nlam = 50
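FORMULATION: R3 refers, as we understand the c-lasso documentation, to the constrained scaled (concomitant) Lasso, which jointly estimates β and the noise scale σ by minimizing ||y − Xβ||²/σ + (n/2)σ + λ||β||₁ subject to Cβ = 0 and σ > 0. The sketch below only evaluates this objective, plus the σ that minimizes it for a fixed β (setting the derivative in σ to zero gives σ = ||y − Xβ||₂·√(2/n)). It is an illustration, not the solver classo uses, and the names r3_objective and optimal_sigma are hypothetical.

```python
import numpy as np

def r3_objective(X, y, C, beta, sigma, lam):
    """Hypothetical helper: objective of the constrained scaled Lasso (R3)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not np.allclose(C @ beta, 0):
        raise ValueError("beta violates the constraint C @ beta = 0")
    n = X.shape[0]
    resid = y - X @ beta
    return resid @ resid / sigma + n / 2 * sigma + lam * np.abs(beta).sum()

def optimal_sigma(X, y, beta):
    """Minimizer in sigma of ||y - X beta||^2 / sigma + (n/2) sigma for a fixed beta."""
    n = X.shape[0]
    return np.linalg.norm(y - X @ beta) * np.sqrt(2.0 / n)

# Tiny example: n = 2 samples, d = 2 variables, zero-sum beta
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([2.0, -2.0])
C = np.ones((1, 2))
beta = np.array([1.0, -1.0])
s = optimal_sigma(X, y, beta)  # residual (1, -1), so s = sqrt(2)
print(round(s, 4), round(r3_objective(X, y, C, beta, s, lam=0.5), 4))  # 1.4142 3.8284
```

For fixed β the objective is convex in σ, so plugging in optimal_sigma gives a value no larger than any other positive σ (here 1 + 2√2, versus 4 at σ = 1 or σ = 2).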

Solve optimization problems

By default, stability selection is the only model selection strategy that is computed. The command also lets you inspect the stability profile of all variables at the theoretical λ.

problem.solve()
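The parameters printed above (B = 50, q = 10, percent_nS = 0.5, threshold = 0.7) suggest the following scheme, sketched here under assumptions: on each of B random subsamples containing half of the rows, record the first q variables selected, then keep the variables whose selection frequency reaches the threshold. To stay short and self-contained, the sketch swaps the constrained Lasso path used by classo for a simple ranking by absolute correlation with y; that base selector is a stand-in, and stability_selection is a hypothetical helper, not the classo API.

```python
import numpy as np

def top_q_by_correlation(X, y, q):
    """Stand-in base selector: indices of the q columns most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(scores)[::-1][:q]

def stability_selection(X, y, B=50, q=10, percent_nS=0.5, threshold=0.7, seed=0):
    """Hypothetical helper: selection frequencies over B subsamples of the rows."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_sub = int(percent_nS * n)
    counts = np.zeros(d)
    for _ in range(B):
        rows = rng.choice(n, size=n_sub, replace=False)  # subsample without replacement
        counts[top_q_by_correlation(X[rows], y[rows], q)] += 1
    freq = counts / B
    return np.flatnonzero(freq >= threshold), freq

# Toy data: 4 informative variables (coefficients summing to zero) out of 50
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
beta = np.zeros(50)
beta[[3, 17, 25, 40]] = [3.0, -3.0, 3.0, -3.0]
y = X @ beta + 0.5 * rng.standard_normal(200)
selected, freq = stability_selection(X, y)
print(selected)  # expected to include the informative columns 3, 17, 25 and 40
```

The threshold trades false positives for missed variables: a variable must be picked in at least 70% of the subsamples to be reported.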

Visualisation

After completion, the results of the optimization and model selection routines can be visualized using

print(problem.solution)
  • Figure: stability selection profile of type first using R3
  • Figure: refitted coefficients after stability selection
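Refitted coefficients re-estimate β on the selected variables only, without the ℓ1 penalty. A minimal sketch, assuming the refit is a least-squares fit that keeps the linear constraint (classo's exact refitting procedure may differ), solves the equality-constrained problem through its KKT system; refit_zero_sum is a hypothetical helper.

```python
import numpy as np

def refit_zero_sum(X, y, selected, C=None):
    """Hypothetical helper: least squares on the selected columns subject to C_S b = 0."""
    XS = X[:, selected]
    s = XS.shape[1]
    # Default constraint: coefficients on the support sum to zero
    CS = np.ones((1, s)) if C is None else C[:, selected]
    k = CS.shape[0]
    # KKT system of  min ||y - XS b||^2  s.t.  CS b = 0
    kkt = np.block([[XS.T @ XS, CS.T], [CS, np.zeros((k, k))]])
    rhs = np.concatenate([XS.T @ y, np.zeros(k)])
    sol = np.linalg.solve(kkt, rhs)
    beta = np.zeros(X.shape[1])
    beta[selected] = sol[:s]  # drop the Lagrange multipliers
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[[2, 5, 11]] = [1.5, -0.5, -1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(100)
beta_hat = refit_zero_sum(X, y, [2, 5, 11])
print(np.round(beta_hat[[2, 5, 11]], 2))
```

Solving the KKT system directly is fine for a handful of selected variables; it requires the selected columns to be linearly independent.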

Out:

STABILITY SELECTION :
  Selected variables :  12    157    181
  Running time :  0.717s
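Comparing this output with the true support printed earlier shows that every selected variable is a true one, while two true variables (178 and 185) were missed. The arithmetic, using only the indices printed in this example:

```python
# Indices printed earlier in this example
true_support = {12, 157, 178, 181, 185}   # np.nonzero(sol)
selected = {12, 157, 181}                 # stability selection output

false_positives = selected - true_support
missed = true_support - selected
precision = len(selected & true_support) / len(selected)
recall = len(selected & true_support) / len(true_support)
print(sorted(false_positives), sorted(missed), precision, recall)  # [] [178, 185] 1.0 0.6
```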

Total running time of the script: ( 0 minutes 1.478 seconds)
