Structure of problem instance¶

The package is organized as follows:

The main class, classo_problem, contains all the information about the problem; once the problem is solved, it also contains the solution.

Here is the global structure of the problem instance:

A classo_problem instance contains a Data instance, a Formulation instance, a Model_selection instance and a Solution instance.

A Model_selection instance contains the instances : PATHparameters, CVparameters, StabSelparameters, LAMfixedparameters.

A Solution instance, once computed, contains the instances : solution_PATH, solution_CV, solution_StabSel, solution_LAMfixed.

Classes

`classo_problem`(X, y[, C, Tree, label])	Class that contains all the information about the problem.
`classo_problem.solve`()	Method that solves every model required in the attributes of the problem instance and updates the attribute `solution` with the characteristics of the solution.
`Data`(X, y, C[, Tree, label])	Class that contains the data of the problem, i.e. where matrices and labels are stored.
`Formulation`()	Class that contains the information about the formulation of the problem, namely the type of formulation (R1, R2, R3, R4, C1, C2) and its parameters such as rho, the weights and the presence of an intercept.
`Model_selection`([method])	Class that contains information about the model selections to perform.
`PATHparameters`([method])	Class that contains the parameters to compute the lasso-path.
`CVparameters`([method])	Class that contains the parameters to compute the cross-validation.
`StabSelparameters`([method])	Class that contains the parameters to compute the stability selection.
`LAMfixedparameters`([method])	Class that contains the parameters to compute the lasso for a fixed lambda.
`Solution`()	Class that contains characteristics of the solution of the model_selections that are computed. Before using the method `solve()`, its components are empty/null.
`solution_PATH`(matrices, param, formulation, …)	Class that contains characteristics of the computed lasso-path, and a representation method that plots the graphic of this lasso-path.
`solution_ALO`(matrices, param, formulation, …)	Class that contains characteristics of the computed lasso-path, and a representation method that plots the graphic of this lasso-path.
`solution_CV`(matrices, param, formulation, …)	Class that contains characteristics of the computed cross-validation, and a representation method that plots the selected parameters and the solution of the non-sparse problem on the set of selected variables.
`solution_CV.graphic`([se_max, save, …])	Method to plot the graphic showing the mean squared error along the lambda path once cross-validation is computed.
`solution_StabSel`(matrices, param, …)	Class that contains characteristics of the computed stability selection, and a representation method that plots the selected parameters, the solution of the non-sparse problem on the set of selected variables, and the stability plot.
`solution_LAMfixed`(matrices, param, …)	Class that contains characteristics of the computed lasso, and a representation method that plots this solution.

Class classo_problem¶

class classo.solver.classo_problem(X, y, C=None, Tree=None, label=None)¶

Class that contains all the information about the problem. It also has a representation method so one can print it.

Parameters

X (ndarray) – Matrix representing the data of the problem.
y (ndarray) – Vector representing the output of the problem.
C (str or ndarray, optional) – Matrix of constraints of the problem. If it is ‘zero-sum’, the corresponding attribute will be the all-ones matrix. Default value : ‘zero-sum’
label (list, optional) – list of the labels of each variable. If None, then the labels are just the indices. Default value : None

data¶

object containing the data (matrices) of the problem. Namely : X, y, C and the labels.

Type: Data

formulation¶

object containing the info about the formulation of the minimization problem we solve.

Type: Formulation

model_selection¶

object containing the parameters we need to do variable selection.

Type: Model_selection

solution¶

object giving characteristics of the solution of the model_selection that is asked. Before using the method solve(), its components are empty/null.

Type: Solution

numerical_method¶

name of the numerical method that is used. It can be ‘Path-Alg’ (path algorithm), ‘P-PDS’ (projected primal-dual splitting method), ‘PF-PDS’ (projection-free primal-dual splitting method) or ‘DR’ (Douglas-Rachford-type splitting method). Default value : ‘not specified’, which means that the function choose_numerical_method() will choose it according to the formulation.

Type: str

classo_problem.solve()¶: Method that solves every model required in the attributes of the problem instance and updates the attribute solution with the characteristics of the solution.

classo.solver.choose_numerical_method(method, model, formulation, StabSelmethod=None, lam=None)¶

Helper function that chooses a suitable numerical method when the given one is invalid. In general, it chooses one of the optimization schemes available for the given formulation. When several methods are possible, the rules are as follows:

If possible, always use “Path-Alg”, except for fixed lambdas smaller than 0.05 and for R4 where Path-Alg does not compute the path (paradoxically).

Else, it uses “DR”.

Parameters

method (str) – input method that is possibly wrong and should be changed. If the method is valid for this formulation, it will not be changed.
model (str) – Computation mode. Can be “PATH”, “StabSel”, “CV” or “LAM”.
formulation (Formulation) – object containing the info about the formulation of the minimization problem we solve.
StabSelmethod (str, optional) – if model is “StabSel”, it can be “first” , “lam” or “max”.
lam (float, optional) – value of lam (fractional L1 penalty).

Returns: method that should be used. Can be “Path-Alg”, “DR”, “P-PDS” or “PF-PDS”.

Return type: str

Class Data¶

class classo.solver.Data(X, y, C, Tree=None, label=None)¶

Class that contains the data of the problem, i.e. where matrices and labels are stored.

Parameters

X (ndarray) – Matrix representing the data of the problem.
y (ndarray) – Vector representing the output of the problem.
C (str or array, optional) – Matrix of constraints of the problem. If it is ‘zero-sum’, the corresponding attribute will be the all-ones matrix.
label (list, optional) – list of the labels of each variable. If None, then the labels are just the indices. Default value : None
Tree (skbio.TreeNode, optional) – taxonomic tree, if not None, then the matrices X and C and the labels will be changed.

X¶

Matrix representing the data of the problem.

Type: ndarray

y¶

Vector representing the output of the problem.

Type: ndarray

C¶

Matrix of constraints of the problem. If it is ‘zero-sum’, the corresponding attribute will be the all-ones matrix.

Type: str or array, optional

label¶

list of the labels of each variable. If None, then the labels are just the indices.

Type: list

tree¶

taxonomic tree.

Type: skbio.TreeNode or None

Class Formulation¶

class classo.solver.Formulation¶

Class that contains the information about the formulation of the problem, namely the type of formulation (R1, R2, R3, R4, C1, C2) and its parameters such as rho, the weights and the presence of an intercept. The type of formulation is encoded by the booleans huber, concomitant and classification (in that order) with the rule:

False False False = R1

True False False = R2

False True False = R3

True True False = R4

False False True = C1

True False True = C2

It also has a representation method so one can print it.
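The encoding rule listed above can be written as a small lookup. This is only an illustrative sketch: the helper name `formulation_name` is hypothetical and is not part of the classo package. It assumes, as stated for the classification attribute, that a classification problem is never concomitant.

```python
def formulation_name(huber: bool, concomitant: bool, classification: bool) -> str:
    """Map the three boolean flags to the formulation label used on this page."""
    if classification:
        # Classification problems are never concomitant: only C1 / C2 exist.
        return "C2" if huber else "C1"
    if concomitant:
        return "R4" if huber else "R3"
    return "R2" if huber else "R1"


# Huber loss together with concomitant scale estimation.
print(formulation_name(huber=True, concomitant=True, classification=False))
```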

huber¶

True if the formulation of the problem should be robust. Default value : False

Type: bool

concomitant¶

True if the formulation of the problem should be with an M-estimation of sigma. Default value : True

Type: bool

classification¶

True if the formulation of the problem should be classification (if yes, then it will not be concomitant). Default value : False

Type: bool

rho¶

Value of rho for R2 and R4 formulations. Default value : 1.345

Type: float

scale_rho¶

If set to True, rho will become rho * sqrt( mean( y**2 ) ) while solving the problem, so that it lives on the scale of y. This is also useful to avoid the problem of non-strict convexity: as long as rho is higher than one, at least one sample is in the quadratic mode of the Huber loss function. Default value : True

Type: bool
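The rescaling described for scale_rho is a one-line formula. The sketch below only reproduces that formula with the standard library; the function name `scaled_rho` is an assumption made for illustration, and the package may apply the rescaling differently internally.

```python
import math


def scaled_rho(rho, y):
    """Return rho * sqrt(mean(y**2)), the rescaling described for scale_rho."""
    mean_square = sum(value * value for value in y) / len(y)
    return rho * math.sqrt(mean_square)


# With |y_i| = 2 for every sample, sqrt(mean(y**2)) = 2, so rho is doubled.
y = [2.0, -2.0, 2.0, -2.0]
print(scaled_rho(1.345, y))
```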

rho_scaled¶

Actual rho after solving. Default value : Not defined

Type: float

rho_classification¶

value of rho for the huberized hinge loss function used for classification, i.e. C2 (it has to be strictly smaller than 1). Default value : -1.

Type: float

e¶

value of e in concomitant formulation. If ‘n/2’ then it becomes n/2 during the method solve(), same for ‘n’. Default value : ‘n’ if huber formulation ; ‘n/2’ else

Type: float or string
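As a rough illustration of how the string values of e could be turned into numbers, here is a sketch under stated assumptions. Both helper names (`default_e`, `resolve_e`) are hypothetical; the actual conversion happens inside solve() and may differ in detail.

```python
def default_e(huber):
    """Default value of e as documented: 'n' for Huber formulations, 'n/2' otherwise."""
    return "n" if huber else "n/2"


def resolve_e(e, n):
    """Turn the symbolic values 'n' and 'n/2' into numbers for n samples."""
    if e == "n":
        return float(n)
    if e == "n/2":
        return n / 2
    return float(e)  # a numeric e is kept as given


print(resolve_e(default_e(huber=False), n=10))
```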

w¶

array of size d with the weights of the L1 penalization. This has to be positive. Default value : None (which makes it the 1,…,1 vector)

Type: numpy ndarray

intercept¶

set to true if we should use an intercept. Default value : False

Type: bool

Class Model_selection¶

class classo.solver.Model_selection(method='not specified')¶

Class that contains information about the model selections to perform. It contains booleans that state which ones will be computed, and objects that contain the parameters of each computation mode. It also has a representation method so one can print it.

PATH¶

True if path should be computed. Default value : False

Type: bool

PATHparameters¶

object containing parameters to compute the lasso-path.

Type: PATHparameters

ALO¶

True if ALO should be computed. Default value : False

Type: bool

ALOparameters¶

object containing parameters to compute the ALO for c-lasso.

Type: ALOparameters

CV¶

True if Cross Validation should be computed. Default value : False

Type: bool

CVparameters¶

object containing parameters to compute the cross-validation.

Type: CVparameters

StabSel¶

True if Stability Selection should be computed. Default value : True

Type: bool

StabSelparameters¶

object containing parameters to compute the stability selection.

Type: StabSelparameters

LAMfixed¶

True if solution for a fixed lambda should be computed. Default value : False

Type: bool

LAMfixedparameters¶

object containing parameters to compute the lasso for a fixed lambda.

Type: LAMfixedparameters

Classes used in Model_selection¶

class classo.solver.PATHparameters(method='not specified')¶

Class that contains the parameters to compute the lasso-path. It also has a representation method so one can print it.

numerical_method¶

name of the numerical method that is used. It can be ‘Path-Alg’ (path algorithm), ‘P-PDS’ (projected primal-dual splitting method), ‘PF-PDS’ (projection-free primal-dual splitting method) or ‘DR’ (Douglas-Rachford-type splitting method). Default value : ‘not specified’, which means that the function choose_numerical_method() will choose it according to the formulation.

Type: str

n_active¶

if it is higher than 0, then the algorithm stops computing the path when n_active variables are active. Then the solution does not change from this point. Default value : 0

Type: int

lambdas¶

list of rescaled lambdas for computing the lasso-path. Default value : None, which means Nlam points spaced between 1 and lamin, on a logarithmic scale or not depending on logscale.

Type: numpy.ndarray

Nlam¶

number of points in the lambda-path if lambdas is still None (default). Default value : 80

Type: int

lamin¶

lambda minimum if lambdas is still None (default). Default value : 1e-3

Type: float

logscale¶

when lambdas is set to None (default), this parameter tells whether it should be set with a log scale or not. Default value : True

Type: bool
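To make the interplay of lambdas, Nlam, lamin and logscale concrete, the following sketch builds a default grid of rescaled lambdas from 1 down to lamin. It is an assumption about what "spaced between 1 and lamin" means; the helper `default_lambdas` is hypothetical and the grid built by the package may differ in its details.

```python
def default_lambdas(Nlam=80, lamin=1e-3, logscale=True):
    """Build Nlam rescaled lambdas going from 1 down to lamin."""
    if Nlam < 2:
        raise ValueError("Nlam must be at least 2")
    # Fractions 0, 1/(Nlam-1), ..., 1 along the grid.
    steps = [i / (Nlam - 1) for i in range(Nlam)]
    if logscale:
        # Geometric spacing: lamin**0 = 1 at the start, lamin**1 = lamin at the end.
        return [lamin ** t for t in steps]
    # Linear spacing between 1 and lamin.
    return [1.0 + (lamin - 1.0) * t for t in steps]


grid = default_lambdas()
print(len(grid), grid[0], grid[-1])
```

On a log scale, consecutive lambdas have a constant ratio, which puts more points near lamin, where the set of active variables typically changes more often.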

plot_sigma¶

if True then the representation method of the solution will also plot the sigma-path if it is computed (formulation R3 or R4). Default value : True

Type: bool

label¶

labels on each coefficient.

Type: numpy.ndarray of str

class classo.solver.ALOparameters(method='not specified')¶

Class that contains the parameters to compute the lasso-path, then the approximation of the leave-one-out (ALO) error. It also has a representation method so one can print it.

numerical_method¶

name of the numerical method that is used. It can be ‘Path-Alg’ (path algorithm), ‘P-PDS’ (projected primal-dual splitting method), ‘PF-PDS’ (projection-free primal-dual splitting method) or ‘DR’ (Douglas-Rachford-type splitting method). Default value : ‘not specified’, which means that the function choose_numerical_method() will choose it according to the formulation.

Type: str

n_active¶

if it is higher than 0, then the algorithm stops computing the path when n_active variables are active. Then the solution does not change from this point. Default value : 0

Type: int

lambdas¶

list of rescaled lambdas for computing the lasso-path. Default value : None, which means Nlam points spaced between 1 and lamin, on a logarithmic scale or not depending on logscale.

Type: numpy.ndarray

Nlam¶

number of points in the lambda-path if lambdas is still None (default). Default value : 80

Type: int

lamin¶

lambda minimum if lambdas is still None (default). Default value : 1e-3

Type: float

logscale¶

when lambdas is set to None (default), this parameter tells whether it should be set with a log scale or not. Default value : True

Type: bool

plot_sigma¶

if True then the representation method of the solution will also plot the sigma-path if it is computed (formulation R3 or R4). Default value : True

Type: bool

label¶

labels on each coefficient.

Type: numpy.ndarray of str

class classo.solver.CVparameters(method='not specified')¶

Class that contains the parameters to compute the cross-validation. It also has a representation method so one can print it.

seed¶

Seed for random values; for an equal seed, the result will be the same. If set to False/None: pseudo-random seed. Default value : 0

Type: bool or int, optional

numerical_method¶

name of the numerical method that is used. It can be ‘Path-Alg’ (path algorithm), ‘P-PDS’ (projected primal-dual splitting method), ‘PF-PDS’ (projection-free primal-dual splitting method) or ‘DR’ (Douglas-Rachford-type splitting method). Default value : ‘not specified’, which means that the function choose_numerical_method() will choose it according to the formulation.

Type: str

lambdas¶

list of rescaled lambdas for computing the lasso-path. Default value : None, which means Nlam points spaced between 1 and lamin, on a logarithmic scale or not depending on logscale.

Type: numpy.ndarray

Nlam¶

number of points in the lambda-path if lambdas is still None (default). Default value : 80

Type: int

lamin¶

lambda minimum if lambdas is still None (default). Default value : 1e-3

Type: float

logscale¶

when lambdas is set to None (default), this parameter tells whether it should be set with a log scale or not. Default value : True

Type: bool

oneSE¶

if set to True, the selected lambda is computed with method ‘one-standard-error’. Default value : True

Type: bool

Nsubset¶

number of subsets in the cross-validation method. Default value : 5

Type: int
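The roles of seed and Nsubset can be illustrated with a seeded split of the sample indices into folds. This is a generic sketch, not the package's own splitting code: the helper `cv_folds` is hypothetical, and the assumption that False/None means "do not fix the seed" follows the description of the seed attribute.

```python
import random


def cv_folds(n, Nsubset=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into Nsubset folds."""
    # seed=None or seed=False: let Python pick a seed; any int gives a reproducible split.
    rng = random.Random(None if seed is None or seed is False else seed)
    indices = list(range(n))
    rng.shuffle(indices)
    # Fold k takes every Nsubset-th index, so fold sizes differ by at most one.
    return [indices[k::Nsubset] for k in range(Nsubset)]


folds = cv_folds(23)
print([len(fold) for fold in folds])
```

Each fold is held out once as the validation set while the model is fitted on the remaining folds.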

class classo.solver.StabSelparameters(method='not specified')¶

Class that contains the parameters to compute the stability selection. It also has a representation method so one can print it.

seed¶

Seed for random values; for an equal seed, the result will be the same. If set to False/None: pseudo-random seed. Default value : 123

Type: bool or int, optional

numerical_method¶

name of the numerical method that is used. It can be ‘Path-Alg’ (path algorithm), ‘P-PDS’ (projected primal-dual splitting method), ‘PF-PDS’ (projection-free primal-dual splitting method) or ‘DR’ (Douglas-Rachford-type splitting method). Default value : ‘not specified’, which means that the function choose_numerical_method() will choose it according to the formulation.

Type: str

lam¶

(only used if method = ‘lam’) lam for which the lasso should be computed. Default value : ‘theoretical’, which means it will be equal to theoretical_lam once it is computed.

Type: float or str

rescaled_lam¶

(only used if method = ‘lam’) False if lam = lambda, True if lam = lambda/lambdamax, which is between 0 and 1. If False and lam = ‘theoretical’, then it will take the value n*theoretical_lam. Default value : True

Type: bool

theoretical_lam¶

(only used if method = ‘lam’) Theoretical lam. Default value : 0.0 (as long as it has not been computed; it is computed by the function theoretical_lam() used in classo_problem.solve()).

Type: float

method¶

‘first’, ‘lam’ or ‘max’ depending on the type of stability selection we do. Default value : ‘first’

Type: str

B¶

number of subsamples considered. Default value : 50

Type: int

q¶

number of selected variables per subsample. Default value : 10

Type: int

percent_nS¶

size of each subsample relative to the total number of samples. Default value : 0.5

Type: float
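The parameters B, percent_nS and seed describe how the data are subsampled for stability selection. The sketch below draws such subsamples with the standard library only; the helper `draw_subsamples` is hypothetical, and the rounding of the subsample size (here truncation) is an assumption.

```python
import random


def draw_subsamples(n, B=50, percent_nS=0.5, seed=123):
    """Draw B subsamples (without replacement) of about percent_nS * n indices each."""
    rng = random.Random(None if seed is None or seed is False else seed)
    size = int(percent_nS * n)  # assumption: the size is truncated to an integer
    return [sorted(rng.sample(range(n), size)) for _ in range(B)]


subsamples = draw_subsamples(n=40)
print(len(subsamples), len(subsamples[0]))
```

The lasso is then solved on each subsample, and the stability ratio of a variable is the fraction of subsamples in which it is selected.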

lamin¶

lamin when computing the lasso-path for method ‘max’. Default value : 1e-2

Type: float

hd¶

if set to True, then the ‘max’ method will stop when it reaches n-k active variables. Default value : False

Type: bool

threshold¶

threshold for stability selection. Default value : 0.7

Type: float

threshold_label¶

threshold to know when the label should be plotted on the graph. Default value : 0.4

Type: float

class classo.solver.LAMfixedparameters(method='not specified')¶

Class that contains the parameters to compute the lasso for a fixed lambda. It also has a representation method so one can print it.

numerical_method¶

name of the numerical method that is used. It can be ‘Path-Alg’ (path algorithm), ‘P-PDS’ (projected primal-dual splitting method), ‘PF-PDS’ (projection-free primal-dual splitting method) or ‘DR’ (Douglas-Rachford-type splitting method). Default value : ‘not specified’, which means that the function choose_numerical_method() will choose it according to the formulation.

Type: str

lam¶

lam for which the lasso should be computed. Default value : ‘theoretical’, which means it will be equal to theoretical_lam once it is computed.

Type: float or str

rescaled_lam¶

False if lam = lambda, True if lam = lambda/lambdamax, which is between 0 and 1. If False and lam = ‘theoretical’, then it will take the value n*theoretical_lam. Default value : True

Type: bool
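One possible reading of how lam, rescaled_lam and theoretical_lam combine into the lambda that is actually used is sketched below. The function `actual_lambda` and its arguments are hypothetical names; in particular, the handling of lam = ‘theoretical’ when rescaled_lam is True is an assumption drawn from the attribute descriptions, not taken from the package source.

```python
def actual_lambda(lam, rescaled_lam, lambdamax, theoretical_lam, n):
    """Return the unscaled lambda implied by the LAMfixed parameters."""
    if lam == "theoretical":
        # Documented rule: n * theoretical_lam when rescaled_lam is False.
        lam = theoretical_lam if rescaled_lam else n * theoretical_lam
    # A rescaled lam is a fraction of lambdamax, between 0 and 1.
    return lam * lambdamax if rescaled_lam else lam


print(actual_lambda(0.1, rescaled_lam=True, lambdamax=20.0, theoretical_lam=0.0, n=50))
```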

theoretical_lam¶

Theoretical lam. Default value : 0.0 (as long as it has not been computed; it is computed by the function theoretical_lam() used in classo_problem.solve()).

Type: float

threshold¶

Threshold such that the selected parameters i are the ones for which the absolute value of beta[i] is greater than the threshold. If None, then it will be set to the average of the absolute values of beta. Default value : None

Type: float
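The thresholding rule described for threshold can be sketched in a few lines. The helper `select_by_threshold` is hypothetical, and the use of a strict inequality follows the wording "greater than the threshold"; the package may treat ties differently.

```python
def select_by_threshold(beta, threshold=None):
    """Return a boolean mask: True where |beta[i]| exceeds the threshold."""
    magnitudes = [abs(b) for b in beta]
    if threshold is None:
        # Documented default: the average of the absolute values of beta.
        threshold = sum(magnitudes) / len(magnitudes)
    return [m > threshold for m in magnitudes]


# Mean of |beta| is 0.775, so only the second and third coefficients are kept.
print(select_by_threshold([0.0, 2.0, -1.0, 0.1]))
```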

Class Solution¶

class classo.solver.Solution¶

Class that contains characteristics of the solution of the model_selections that are computed. Before using the method solve(), its components are empty/null. It also has a representation method so one can print it.

PATH¶

Solution components of the model PATH.

Type: solution_PATH

CV¶

Solution components of the model CV.

Type: solution_CV

StabSel¶

Solution components of the model StabSel.

Type: solution_StabSel

LAMfixed¶

Solution components of the model LAMfixed.

Type: solution_LAMfixed

Classes used in Solution¶

class classo.solver.solution_PATH(matrices, param, formulation, numerical_method, label)¶

Class that contains characteristics of the computed lasso-path, and a representation method that plots the graphic of this lasso-path.

BETAS¶

array of size Npath x d with the solution beta for each lambda on each row.

Type: numpy.ndarray

SIGMAS¶

array of size Npath with the solution sigma for each lambda when the formulation of the problem is R3 or R4.

Type: numpy.ndarray

LAMBDAS¶

array of size Npath with the lambdas (real lambdas, not divided by lambda_max) for which the solution is computed.

Type: numpy.ndarray

logscale¶

whether or not the path should be plotted with a logscale.

Type: bool

method¶

name of the numerical method that has been used. It can be ‘Path-Alg’, ‘P-PDS’ , ‘PF-PDS’ or ‘DR’.

Type: str

save¶

if it is a str, then it gives the name of the file where the graphic has been/will be saved (after using print(solution) ).

Type: bool or str

formulation¶

object containing the info about the formulation of the minimization problem we solve.

Type: Formulation

time¶

running time of this action.

Type: float

class classo.solver.solution_ALO(matrices, param, formulation, numerical_method, label)¶

Class that contains characteristics of the computed lasso-path, and a representation method that plots the graphic of this lasso-path.

BETAS¶

array of size Npath x d with the solution beta for each lambda on each row.

Type: numpy.ndarray

SIGMAS¶

array of size Npath with the solution sigma for each lambda when the formulation of the problem is R3 or R4.

Type: numpy.ndarray

LAMBDAS¶

array of size Npath with the lambdas (real lambdas, not divided by lambda_max) for which the solution is computed.

Type: numpy.ndarray

logscale¶

whether or not the path should be plotted with a logscale.

Type: bool

method¶

name of the numerical method that has been used. It can be ‘Path-Alg’, ‘P-PDS’ , ‘PF-PDS’ or ‘DR’.

Type: str

save1,save2

if a string is given, the corresponding graph will be saved with the given name of the file. save1 is for the path plot ; save2 for ALO plot ; and save3 for refit beta-solution.

Type: bool or string

formulation¶

object containing the info about the formulation of the minimization problem we solve.

Type: Formulation

time¶

running time of this action.

Type: float

class classo.solver.solution_CV(matrices, param, formulation, numerical_method, label)¶

Class that contains characteristics of the computed cross-validation, and a representation method that plots the selected parameters and the solution of the non-sparse problem on the set of selected variables.

xGraph¶

array of size Nlam of the lambdas / lambda_max.

Type: numpy.ndarray

yGraph¶

array of size Nlam of the average validation residual (over the K subsets).

Type: numpy.ndarray

standard_error¶

array of size Nlam of the standard error of the validation residual (over the K subsets).

Type: numpy.ndarray

logscale¶

whether or not the path should be plotted with a logscale.

Type: bool

index_min¶

index on xGraph of the selected lambda without 1-standard-error method.

Type: int

index_1SE¶

index on xGraph of the selected lambda with 1-standard-error method.

Type: int

lambda_min¶

selected lambda without 1-standard-error method.

Type: float

lambda_oneSE¶

selected lambda with 1-standard-error method.

Type: float
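The relation between index_min and index_1SE can be illustrated with the usual one-standard-error rule: pick the largest lambda whose validation error is within one standard error of the minimum. The sketch below implements that textbook rule; the helper `select_lambdas` is hypothetical, and the package's exact tie-breaking may differ.

```python
def select_lambdas(lambdas, mse, se):
    """Return (index_min, index_1SE) for a cross-validation curve."""
    index_min = min(range(len(mse)), key=mse.__getitem__)
    # Accept every lambda whose error is within one standard error of the minimum.
    bound = mse[index_min] + se[index_min]
    candidates = [i for i in range(len(mse)) if mse[i] <= bound]
    # Among those, prefer the largest lambda, i.e. the sparsest model.
    index_1se = max(candidates, key=lambdas.__getitem__)
    return index_min, index_1se


lambdas = [1.0, 0.5, 0.25, 0.125, 0.0625]
mse = [4.0, 2.2, 2.0, 1.9, 2.1]
se = [0.5, 0.4, 0.3, 0.35, 0.4]
print(select_lambdas(lambdas, mse, se))
```

Here the minimum error is reached at lambda = 0.125, while the one-standard-error rule selects the larger lambda = 0.5.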

beta¶

solution beta of classo at lambda_oneSE/lambda_min depending on CVparameters.oneSE.

Type: numpy.ndarray

sigma¶

solution sigma of classo at lambda_oneSE when formulation is ‘R3’ or ‘R4’.

Type: float

selected_param¶

boolean array of size d with True when the variable is selected.

Type: numpy.ndarray

refit¶

solution beta after solving the non-sparse problem over the set of selected variables.

Type: numpy.ndarray

save1,save2

if a string is given, the corresponding graph will be saved with the given name of the file. save1 is for CV curve ; and save2 for refit beta-solution.

Type: bool or string

formulation¶

object containing the info about the formulation of the minimization problem we solve.

Type: Formulation

time¶

running time of this action.

Type: float

solution_CV.graphic(se_max=None, save=None, logscale=True, errorevery=5)¶

Method to plot the graphic showing the mean squared error along the lambda path once cross-validation is computed.

Parameters

se_max (float) – the graphic will not show the lambdas for which MSE(lambda) > min(MSE) + se_max * Standard_error(lambda_min). This parameter is useful to plot a graph that zooms in on the interesting part. Default value : None
logscale (bool) – tells whether to plot the mean squared error as a function of lambda or of log10(lambda). Default value : True
errorevery (int) – parameter passed to matplotlib.pyplot.errorbar that gives the frequency of appearance of the error bars. Default value : 5
save (string) – path to the file where the figure should be saved. If None, then the figure will not be saved. Default value : None
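The effect of se_max can be reproduced without any plotting. The sketch below filters the points of a cross-validation curve with the inequality given for se_max; `visible_indices` is a hypothetical helper, not a method of solution_CV.

```python
def visible_indices(mse, se, se_max=None):
    """Indices of the lambdas that stay on the plot for a given se_max."""
    if se_max is None:
        return list(range(len(mse)))  # no zoom: every lambda is shown
    index_min = min(range(len(mse)), key=mse.__getitem__)
    cutoff = mse[index_min] + se_max * se[index_min]
    # Hide every lambda with MSE(lambda) > min(MSE) + se_max * SE(lambda_min).
    return [i for i, value in enumerate(mse) if value <= cutoff]


mse = [4.0, 2.2, 2.0, 1.9, 2.1]
se = [0.5, 0.4, 0.3, 0.35, 0.4]
print(visible_indices(mse, se, se_max=1))
```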

class classo.solver.solution_StabSel(matrices, param, formulation, numerical_method, label)¶

Class that contains characteristics of the computed stability selection, and a representation method that plots the selected parameters, the solution of the non-sparse problem on the set of selected variables, and the stability plot.

distribution¶

array of size d of stability ratios.

Type: array

lambdas_path¶

for ‘first’ method : Nlam array of the lambdas used. Other cases : ‘not used’.

Type: array or string

distribution_path¶

for ‘first’ method : Nlam x d array with stability ratios as a function of lambda.

Type: array or string

threshold¶

threshold for StabSel, i.e. for a variable i, the stability ratio that is needed to get selected.

Type: float

save1,save2

if a string is given, the corresponding graph will be saved with the given name of the file. save1 is for stability plot ; and save2 for refit beta-solution.

Type: bool or string

selected_param¶

boolean array of size d with True when the variable is selected.

Type: numpy.ndarray

to_label¶

boolean array of size d with True when the name of the variable should be seen on the graph.

Type: numpy.ndarray
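The attributes distribution, threshold, selected_param and to_label are related by simple comparisons. The sketch below shows one plausible way to derive the two boolean masks from the stability ratios; `stability_masks` is a hypothetical helper, and whether the package uses a strict or non-strict inequality is an assumption.

```python
def stability_masks(distribution, threshold=0.7, threshold_label=0.4):
    """Derive (selected_param, to_label) from a list of stability ratios."""
    # A variable is selected when its stability ratio exceeds the threshold.
    selected_param = [ratio > threshold for ratio in distribution]
    # A lower threshold decides which variable names are printed on the plot.
    to_label = [ratio > threshold_label for ratio in distribution]
    return selected_param, to_label


selected, labelled = stability_masks([0.95, 0.5, 0.1, 0.72])
print(selected, labelled)
```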

refit¶

solution beta after solving the non-sparse problem over the set of selected variables.

Type: numpy.ndarray

formulation¶

object containing the info about the formulation of the minimization problem we solve.

Type: Formulation

time¶

running time of this action.

Type: float

class classo.solver.solution_LAMfixed(matrices, param, formulation, numerical_method, label)¶

Class that contains characteristics of the computed lasso, and a representation method that plots this solution.

lambdamax¶

lambda maximum for which the solution is non-null.

Type: float

rescaled_lam¶

if True, the problem has been computed for lambda*lambdamax (so lambda should be between 0 and 1).

Type: bool

lamb¶

lambda for which the problem is solved.

Type: float

beta¶

solution beta of classo.

Type: numpy.ndarray

sigma¶

solution sigma of classo when formulation is ‘R3’ or ‘R4’.

Type: float

selected_param¶

boolean array of size d with True when the variable is selected (which is the case when the i-th component of the classo solution is non-null).

Type: numpy.ndarray

refit¶

solution beta after solving the non-sparse problem over the set of selected variables.

Type: numpy.ndarray

formulation¶

object containing the info about the formulation of the minimization problem we solve.

Type: Formulation

time¶

running time of this action.

Type: float