Structure of problem instance
- The package is organized as follows:
There is a main class called classo_problem, which contains a lot of information about the problem and, once the problem is solved, also contains the solution. Here is the global structure of the problem instance:
A classo_problem instance contains a Data instance, a Formulation instance, a Model_selection instance and a Solution instance.
A Model_selection instance contains the instances PATHparameters, CVparameters, StabSelparameters and LAMfixedparameters.
A Solution instance, once it is computed, contains the instances solution_PATH, solution_CV, solution_StabSel and solution_LAMfixed.
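The containment structure described above can be pictured with a few toy dataclasses. This is only an illustrative mirror: the names ModelSelectionSketch, SolutionSketch and ProblemSketch are hypothetical stand-ins, not the classes shipped in classo.solver, and plain dicts replace the real parameter objects.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ModelSelectionSketch:
    # One parameter holder per computation mode (dicts are placeholders).
    PATHparameters: dict = field(default_factory=dict)
    CVparameters: dict = field(default_factory=dict)
    StabSelparameters: dict = field(default_factory=dict)
    LAMfixedparameters: dict = field(default_factory=dict)


@dataclass
class SolutionSketch:
    # Every slot stays None until the corresponding model has been solved.
    solution_PATH: Optional[Any] = None
    solution_CV: Optional[Any] = None
    solution_StabSel: Optional[Any] = None
    solution_LAMfixed: Optional[Any] = None


@dataclass
class ProblemSketch:
    # Hypothetical counterpart of a problem instance: data, formulation,
    # model selection settings and (initially empty) solution.
    data: dict
    formulation: dict
    model_selection: ModelSelectionSketch = field(default_factory=ModelSelectionSketch)
    solution: SolutionSketch = field(default_factory=SolutionSketch)


problem = ProblemSketch(data={"X": [[1.0, 2.0]], "y": [0.5]}, formulation={"type": "R1"})
print(problem.model_selection)
print(problem.solution.solution_CV is None)
```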
Classes
classo_problem | Class that contains all the information about the problem. |
classo_problem.solve | Method that solves every model required in the attributes of the problem instance and updates the attribute solution with the characteristics of the solution. |
Data | Class that contains the data of the problem, i.e. where matrices and labels are stored. |
Formulation | Class that contains the information about the formulation of the problem, namely the type of formulation (R1, R2, R3, R4, C1, C2) and its parameters such as rho, the weights and the presence of an intercept. |
Model_selection | Class that contains information about the model selections to perform. |
PATHparameters | Class that contains the parameters to compute the lasso-path. |
CVparameters | Class that contains the parameters to compute the cross-validation. |
StabSelparameters | Class that contains the parameters to compute the stability selection. |
LAMfixedparameters | Class that contains the parameters to compute the lasso for a fixed lambda. |
Solution | Class that contains characteristics of the solution of the model_selections that are computed. |
solution_PATH | Class that contains characteristics of the computed lasso-path. It also has a representation method that plots the graphic of this lasso-path. |
solution_ALO | Class that contains characteristics of the computed lasso-path. It also has a representation method that plots the graphic of this lasso-path. |
solution_CV | Class that contains characteristics of the computed cross-validation. It also has a representation method that plots the selected parameters and the solution of the non-sparse problem on the set of selected variables. |
solution_CV.graphic | Method to plot the graphic showing the mean squared error along the lambda path once cross-validation is computed. |
solution_StabSel | Class that contains characteristics of the computed stability selection. It also has a representation method that plots the selected parameters, the solution of the non-sparse problem on the set of selected variables, and the stability plot. |
solution_LAMfixed | Class that contains characteristics of the computed lasso. It also has a representation method that plots this solution. |
Class classo_problem
-
class classo.solver.classo_problem(X, y, C=None, Tree=None, label=None)
Class that contains all the information about the problem. It also has a representation method so one can print it.
- Parameters
X (ndarray) – Matrix representing the data of the problem.
y (ndarray) – Vector representing the output of the problem.
C (str or ndarray, optional) – Matrix of constraints to the problem. If it is ‘zero-sum’, then the corresponding attribute will be the all-ones matrix. Default value : ‘zero-sum’
label (list, optional) – list of the labels of each variable. If None, then the labels are just the indices. Default value : None
-
data
¶ object containing the data (matrices) of the problem. Namely : X, y, C and the labels.
- Type
-
formulation
¶ object containing the info about the formulation of the minimization problem we solve.
- Type
-
model_selection
¶ object containing the parameters we need to do variable selection.
- Type
-
solution
¶ object giving characteristics of the solution of the model_selection that is asked. Before using the method solve(), its components are empty/null.
- Type
-
numerical_method
¶ name of the numerical method that is used. It can be ‘Path-Alg’ (path algorithm), ‘P-PDS’ (Projected primal-dual splitting method), ‘PF-PDS’ (Projection-free primal-dual splitting method) or ‘DR’ (Douglas-Rachford-type splitting method). Default value : ‘not specified’, which means that the function choose_numerical_method() will choose it according to the formulation.
- Type
-
classo_problem.solve()
Method that solves every model required in the attributes of the problem instance and updates the attribute solution with the characteristics of the solution.
-
classo.solver.choose_numerical_method(method, model, formulation, StabSelmethod=None, lam=None)
Auxiliary function that chooses the right numerical method if the given one is invalid. In general, it chooses one of the possible optimization schemes for the given formulation. When several computation modes are possible, the rules are as follows:
If possible, always use “Path-Alg”, except for fixed lambdas smaller than 0.05 and for R4, where Path-Alg does not compute the path (paradoxically).
Otherwise, use “DR”.
- Parameters
method (str) – input method that is possibly wrong and should be changed. If the method is valid for this formulation, it will not be changed.
model (str) – Computation mode. Can be “PATH”, “StabSel”, “CV” or “LAM”.
formulation (Formulation) – object containing the info about the formulation of the minimization problem we solve.
StabSelmethod (str, optional) – if model is “StabSel”, it can be “first” , “lam” or “max”.
lam (float, optional) – value of lam (fractional L1 penalty).
- Returns :
str : method that should be used. Can be “Path-Alg”, “DR”, “P-PDS” or “PF-PDS”
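The fallback rules above can be sketched as a small function. This is a minimal sketch, not the library's code: pick_method_sketch and its `available` argument (the set of methods valid for the formulation, supplied by the caller) are hypothetical, and reading the R4 exception as "R4 in PATH mode" is an interpretation of the sentence above.

```python
def pick_method_sketch(method, model, formulation_name, available, lam=None):
    """Illustrative fallback rule; `available` lists the methods assumed valid."""
    if method in available:
        # A method that is valid for this formulation is left unchanged.
        return method
    # Exception 1: a fixed lambda smaller than 0.05.
    small_fixed_lambda = (
        model == "LAM" and isinstance(lam, (int, float)) and lam < 0.05
    )
    # Exception 2 (interpretation): R4 when the whole path is requested.
    r4_path = formulation_name == "R4" and model == "PATH"
    if "Path-Alg" in available and not small_fixed_lambda and not r4_path:
        return "Path-Alg"
    return "DR"


valid = {"Path-Alg", "DR"}
print(pick_method_sketch("not specified", "PATH", "R1", valid))
print(pick_method_sketch("not specified", "LAM", "R1", valid, lam=0.01))
```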
Class Data
-
class classo.solver.Data(X, y, C, Tree=None, label=None)
Class that contains the data of the problem, i.e. where matrices and labels are stored.
- Parameters
X (ndarray) – Matrix representing the data of the problem.
y (ndarray) – Vector representing the output of the problem.
C (str or array, optional) – Matrix of constraints to the problem. If it is ‘zero-sum’, then the corresponding attribute will be the all-ones matrix.
label (list, optional) – list of the labels of each variable. If None, then the labels are just the indices. Default value : None
Tree (skbio.TreeNode, optional) – taxonomic tree, if not None, then the matrices X and C and the labels will be changed.
-
X
¶ Matrix representing the data of the problem.
- Type
ndarray
-
y
¶ Vector representing the output of the problem.
- Type
ndarray
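To illustrate the ‘zero-sum’ shortcut for C, here is a minimal sketch. It assumes that the all-ones matrix is a single row of d ones and that the constraint is read as C·beta = 0; neither the shape nor the helper names (resolve_constraints_sketch, satisfies_sketch) come from this page.

```python
def resolve_constraints_sketch(C, d):
    """Return the constraints as a list of rows; 'zero-sum' becomes one row of ones."""
    if isinstance(C, str) and C == "zero-sum":
        return [[1.0] * d]
    return [list(row) for row in C]


def satisfies_sketch(C, beta, tol=1e-9):
    """Check C @ beta == 0 row by row, up to a small tolerance."""
    return all(abs(sum(c * b for c, b in zip(row, beta))) <= tol for row in C)


C = resolve_constraints_sketch("zero-sum", 3)
print(C)
print(satisfies_sketch(C, [1.0, -0.25, -0.75]))
```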
Class Formulation
-
class classo.solver.Formulation
Class that contains the information about the formulation of the problem, namely the type of formulation (R1, R2, R3, R4, C1, C2) and its parameters such as rho, the weights and the presence of an intercept. The type of formulation is encoded with the booleans huber, concomitant and classification (in that order), with the rule:
False False False = R1
True False False = R2
False True False = R3
True True False = R4
False False True = C1
True False True = C2
It also has a representation method so one can print it.
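The encoding rule above amounts to a small lookup. The function below is a hypothetical helper written for illustration, not part of the library; it lets classification take precedence over concomitant, in line with the note that a classification problem is not concomitant.

```python
def formulation_name_sketch(huber, concomitant, classification):
    """Map the three booleans to the formulation name used in the table above."""
    if classification:
        # Classification formulations ignore the concomitant flag.
        return "C2" if huber else "C1"
    if concomitant:
        return "R4" if huber else "R3"
    return "R2" if huber else "R1"


for flags in [(False, False, False), (True, True, False), (True, False, True)]:
    print(flags, "->", formulation_name_sketch(*flags))
```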
-
concomitant
¶ True if the formulation of the problem should be with an M-estimation of sigma. Default value : True
- Type
-
classification
¶ True if the formulation of the problem should be classification (if yes, then it will not be concomitant). Default value : False
- Type
-
scale_rho
¶ If set to True, rho becomes rho * sqrt( mean( y**2 ) ) while solving the problem, so that it lives on the scale of y. This is also useful to avoid the problem of non-strict convexity (i.e. it ensures that at least one sample is in the quadratic regime of the Huber loss function) as long as rho is higher than one. Default value : True
- Type
-
rho_classification
¶ value of rho for the huberized hinge loss function for classification, i.e. C2 (it has to be strictly smaller than 1). Default value : -1.
- Type
-
e
¶ value of e in concomitant formulation. If ‘n/2’, then it becomes n/2 during the method solve(); same for ‘n’. Default value : ‘n’ if huber formulation ; ‘n/2’ else
- Type
float or string
-
w
¶ array of size d with the weights of the L1 penalization. This has to be positive. Default value : None (which makes it the 1,…,1 vector)
- Type
numpy ndarray
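As a worked example of the scale_rho rescaling, the sketch below computes rho * sqrt(mean(y**2)) with the standard library only. The function name effective_rho_sketch and the sample values are illustrative assumptions.

```python
import math


def effective_rho_sketch(rho, y, scale_rho=True):
    """Return rho on the scale of y when scale_rho is True, else rho unchanged."""
    if not scale_rho:
        return rho
    root_mean_square = math.sqrt(sum(v * v for v in y) / len(y))
    return rho * root_mean_square


y = [3.0, -4.0, 0.0, 0.0]  # mean(y**2) = 25 / 4, so sqrt(mean(y**2)) = 2.5
print(effective_rho_sketch(1.5, y))         # 3.75
print(effective_rho_sketch(1.5, y, False))  # 1.5
```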
-
Class Model_selection
-
class classo.solver.Model_selection(method='not specified')
Class that contains information about the model selections to perform. It contains booleans that state which ones will be computed. It also contains objects that hold the parameters of each computation mode. It also has a representation method so one can print it.
-
PATHparameters
¶ object containing parameters to compute the lasso-path.
- Type
-
ALOparameters
¶ object containing parameters to compute the ALO for c-lasso.
- Type
-
CVparameters
¶ object containing parameters to compute the cross-validation.
- Type
-
StabSel
¶ True if Stability Selection should be computed. Default value : True
- Type
boolean
-
StabSelparameters
¶ object containing parameters to compute the stability selection.
- Type
-
LAMfixed
¶ True if solution for a fixed lambda should be computed. Default value : False
- Type
boolean
-
LAMfixedparameters
¶ object containing parameters to compute the lasso for a fixed lambda.
- Type
-
Classes used in Model_selection
-
class classo.solver.PATHparameters(method='not specified')
Class that contains the parameters to compute the lasso-path. It also has a representation method so one can print it.
-
numerical_method
¶ name of the numerical method that is used. It can be ‘Path-Alg’ (path algorithm), ‘P-PDS’ (Projected primal-dual splitting method), ‘PF-PDS’ (Projection-free primal-dual splitting method) or ‘DR’ (Douglas-Rachford-type splitting method). Default value : ‘not specified’, which means that the function choose_numerical_method() will choose it according to the formulation.
- Type
-
n_active
¶ if it is higher than 0, then the algorithm stops computing the path when n_active variables are active. Then the solution does not change from this point. Default value : 0
- Type
-
lambdas
¶ list of rescaled lambdas for computing the lasso-path. Default value : None, which means Nlam points spaced between 1 and lamin, on a logarithmic scale or not depending on logscale.
- Type
-
Nlam
¶ number of points in the lambda-path if lambdas is still None (default). Default value : 80
- Type
-
logscale
¶ when lambdas is set to None (default), this parameter tells whether it should be built on a log scale or not. Default value : True
- Type
-
plot_sigma
¶ if True then the representation method of the solution will also plot the sigma-path if it is computed (formulation R3 or R4). Default value : True
- Type
-
label
¶ labels on each coefficient.
- Type
numpy.ndarray of str
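The default grid of rescaled lambdas (Nlam points between 1 and lamin, on a log scale or not) can be sketched as follows. This is only an illustration under assumptions: the helper name, the decreasing order from 1 down to lamin, and the value lamin = 1e-3 are not taken from this page.

```python
def default_lambda_grid_sketch(lamin, Nlam, logscale=True):
    """Build Nlam rescaled lambdas going from 1 down to lamin."""
    if Nlam < 2:
        raise ValueError("Nlam must be at least 2")
    steps = [i / (Nlam - 1) for i in range(Nlam)]  # 0, ..., 1
    if logscale:
        # Geometric spacing: constant ratio between consecutive points.
        return [lamin ** t for t in steps]
    # Linear spacing: constant difference between consecutive points.
    return [1.0 + t * (lamin - 1.0) for t in steps]


print(default_lambda_grid_sketch(1e-3, 4))
print(default_lambda_grid_sketch(0.1, 3, logscale=False))
```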
-
-
class classo.solver.ALOparameters(method='not specified')
Class that contains the parameters to compute the lasso-path, then the approximate leave-one-out (ALO) error. It also has a representation method so one can print it.
-
numerical_method
¶ name of the numerical method that is used. It can be ‘Path-Alg’ (path algorithm), ‘P-PDS’ (Projected primal-dual splitting method), ‘PF-PDS’ (Projection-free primal-dual splitting method) or ‘DR’ (Douglas-Rachford-type splitting method). Default value : ‘not specified’, which means that the function choose_numerical_method() will choose it according to the formulation.
- Type
-
n_active
¶ if it is higher than 0, then the algorithm stops computing the path when n_active variables are active. Then the solution does not change from this point. Default value : 0
- Type
-
lambdas
¶ list of rescaled lambdas for computing the lasso-path. Default value : None, which means Nlam points spaced between 1 and lamin, on a logarithmic scale or not depending on logscale.
- Type
-
Nlam
¶ number of points in the lambda-path if lambdas is still None (default). Default value : 80
- Type
-
logscale
¶ when lambdas is set to None (default), this parameter tells whether it should be built on a log scale or not. Default value : True
- Type
-
plot_sigma
¶ if True then the representation method of the solution will also plot the sigma-path if it is computed (formulation R3 or R4). Default value : True
- Type
-
label
¶ labels on each coefficient.
- Type
numpy.ndarray of str
-
-
class classo.solver.CVparameters(method='not specified')
Class that contains the parameters to compute the cross-validation. It also has a representation method so one can print it.
-
seed
¶ Seed for random values; for an equal seed, the result will be the same. If set to False/None, a pseudo-random seed is used. Default value : 0
-
numerical_method
¶ name of the numerical method that is used. It can be ‘Path-Alg’ (path algorithm), ‘P-PDS’ (Projected primal-dual splitting method), ‘PF-PDS’ (Projection-free primal-dual splitting method) or ‘DR’ (Douglas-Rachford-type splitting method). Default value : ‘not specified’, which means that the function choose_numerical_method() will choose it according to the formulation.
- Type
-
lambdas
¶ list of rescaled lambdas for computing the lasso-path. Default value : None, which means Nlam points spaced between 1 and lamin, on a logarithmic scale or not depending on logscale.
- Type
-
Nlam
¶ number of points in the lambda-path if lambdas is still None (default). Default value : 80
- Type
-
logscale
¶ when lambdas is set to None (default), this parameter tells whether it should be built on a log scale or not. Default value : True
- Type
-
oneSE
¶ if set to True, the selected lambda is computed with method ‘one-standard-error’. Default value : True
- Type
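For readers unfamiliar with the ‘one-standard-error’ rule mentioned for oneSE, here is a sketch of its usual textbook form: take the largest lambda whose mean validation error is within one standard error of the minimum. The function name and the sample numbers are hypothetical, and the library's exact implementation is not described on this page.

```python
def select_lambda_sketch(lambdas, mean_errors, standard_errors, one_se=True):
    """Pick lambda_min, or the largest lambda within one SE of the minimum error."""
    i_min = min(range(len(lambdas)), key=lambda i: mean_errors[i])
    if not one_se:
        return lambdas[i_min]
    threshold = mean_errors[i_min] + standard_errors[i_min]
    candidates = [lam for lam, err in zip(lambdas, mean_errors) if err <= threshold]
    return max(candidates)


lambdas = [1.0, 0.5, 0.25, 0.1]
mean_errors = [2.0, 1.2, 1.0, 1.1]
standard_errors = [0.3, 0.3, 0.25, 0.3]
print(select_lambda_sketch(lambdas, mean_errors, standard_errors))         # 0.5
print(select_lambda_sketch(lambdas, mean_errors, standard_errors, False))  # 0.25
```

The one-standard-error choice favours a sparser model than lambda_min at the cost of a slightly higher validation error.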
-
-
class classo.solver.StabSelparameters(method='not specified')
Class that contains the parameters to compute the stability selection. It also has a representation method so one can print it.
-
seed
¶ Seed for random values; for an equal seed, the result will be the same. If set to False/None, a pseudo-random seed is used. Default value : 123
-
numerical_method
¶ name of the numerical method that is used. It can be ‘Path-Alg’ (path algorithm), ‘P-PDS’ (Projected primal-dual splitting method), ‘PF-PDS’ (Projection-free primal-dual splitting method) or ‘DR’ (Douglas-Rachford-type splitting method). Default value : ‘not specified’, which means that the function choose_numerical_method() will choose it according to the formulation.
- Type
-
lam
¶ (only used if method = ‘lam’) lam for which the lasso should be computed. Default value : ‘theoretical’, which means it will be equal to theoretical_lam once it is computed.
-
rescaled_lam
¶ (only used if method = ‘lam’) False if lam = lambda, True if lam = lambda/lambdamax, which is between 0 and 1. If False and lam = ‘theoretical’, then it will take the value n*theoretical_lam. Default value : True
- Type
-
theoretical_lam
¶ (only used if method = ‘lam’) Theoretical lam. Default value : 0.0 (the value it keeps as long as it is not computed; it is then computed by the function theoretical_lam() used in classo_problem.solve()).
- Type
-
method
¶ ‘first’, ‘lam’ or ‘max’ depending on the type of stability selection we do. Default value : ‘first’
- Type
-
percent_nS
¶ size of the subsample relative to the total number of samples. Default value : 0.5
- Type
-
hd
¶ if set to True, then the ‘max’ method will stop when it reaches n-k active variables. Default value : False
- Type
-
-
class classo.solver.LAMfixedparameters(method='not specified')
Class that contains the parameters to compute the lasso for a fixed lambda. It also has a representation method so one can print it.
-
numerical_method
¶ name of the numerical method that is used. It can be ‘Path-Alg’ (path algorithm), ‘P-PDS’ (Projected primal-dual splitting method), ‘PF-PDS’ (Projection-free primal-dual splitting method) or ‘DR’ (Douglas-Rachford-type splitting method). Default value : ‘not specified’, which means that the function choose_numerical_method() will choose it according to the formulation.
- Type
-
lam
¶ lam for which the lasso should be computed. Default value : ‘theoretical’, which means it will be equal to theoretical_lam once it is computed.
-
rescaled_lam
¶ False if lam = lambda, True if lam = lambda/lambdamax, which is between 0 and 1. If False and lam = ‘theoretical’, then it will take the value n*theoretical_lam. Default value : True
- Type
-
theoretical_lam
¶ Theoretical lam. Default value : 0.0 (the value it keeps as long as it is not computed; it is then computed by the function theoretical_lam() used in classo_problem.solve()).
- Type
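The meaning of rescaled_lam can be illustrated with a small conversion helper. This is a sketch under assumptions: the name effective_lambda_sketch is hypothetical and lambda_max is taken as already known (how the library obtains it is not covered on this page).

```python
def effective_lambda_sketch(lam, lambda_max, rescaled_lam=True):
    """Turn the user-facing lam into the lambda actually used in the problem."""
    if rescaled_lam:
        # lam is a fraction of lambda_max, so it must lie in [0, 1].
        if not 0.0 <= lam <= 1.0:
            raise ValueError("a rescaled lam must be between 0 and 1")
        return lam * lambda_max
    return lam


print(effective_lambda_sketch(0.5, 8.0))                      # 4.0
print(effective_lambda_sketch(0.5, 8.0, rescaled_lam=False))  # 0.5
```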
-
Class Solution
-
class classo.solver.Solution
Class that contains characteristics of the solution of the model selections that are computed. Before using the method solve(), its components are empty/null. It also has a representation method so one can print it.
-
PATH
¶ Solution components of the model PATH.
- Type
-
CV
¶ Solution components of the model CV.
- Type
-
StabSel
¶ Solution components of the model StabSel.
- Type
-
LAMfixed
¶ Solution components of the model LAMfixed.
- Type
-
Classes used in Solution
-
class classo.solver.solution_PATH(matrices, param, formulation, numerical_method, label)
Class that contains characteristics of the computed lasso-path. It also has a representation method that plots the graphic of this lasso-path.
-
BETAS
¶ array of size Npath x d with the solution beta for each lambda on each row.
- Type
-
SIGMAS
¶ array of size Npath with the solution sigma for each lambda when the formulation of the problem is R3 or R4.
- Type
-
LAMBDAS
¶ array of size Npath with the lambdas (real lambdas, not divided by lambda_max) for which the solution is computed.
- Type
-
method
¶ name of the numerical method that has been used. It can be ‘Path-Alg’, ‘P-PDS’ , ‘PF-PDS’ or ‘DR’.
- Type
-
save
¶ if it is a str, then it gives the name of the file where the graphics has been/will be saved (after using print(solution) ).
-
formulation
¶ object containing the info about the formulation of the minimization problem we solve.
- Type
-
-
class classo.solver.solution_ALO(matrices, param, formulation, numerical_method, label)
Class that contains characteristics of the computed lasso-path. It also has a representation method that plots the graphic of this lasso-path.
-
BETAS
¶ array of size Npath x d with the solution beta for each lambda on each row.
- Type
-
SIGMAS
¶ array of size Npath with the solution sigma for each lambda when the formulation of the problem is R3 or R4.
- Type
-
LAMBDAS
¶ array of size Npath with the lambdas (real lambdas, not divided by lambda_max) for which the solution is computed.
- Type
-
method
¶ name of the numerical method that has been used. It can be ‘Path-Alg’, ‘P-PDS’ , ‘PF-PDS’ or ‘DR’.
- Type
-
save1,save2
if a string is given, the corresponding graph will be saved with the given name of the file. save1 is for the path plot ; save2 for ALO plot ; and save3 for refit beta-solution.
- Type
bool or string
-
formulation
¶ object containing the info about the formulation of the minimization problem we solve.
- Type
-
-
class classo.solver.solution_CV(matrices, param, formulation, numerical_method, label)
Class that contains characteristics of the computed cross-validation. It also has a representation method that plots the selected parameters and the solution of the non-sparse problem on the set of selected variables.
-
xGraph
¶ array of size Nlam of the lambdas / lambda_max.
- Type
-
yGraph
¶ array of size Nlam of the average validation residual (over the K subsets).
- Type
-
standard_error
¶ array of size Nlam of the standard error of the validation residual (over the K subsets).
- Type
-
beta
¶ solution beta of classo at lambda_oneSE/lambda_min depending on CVparameters.oneSE.
- Type
-
selected_param
¶ boolean arrays of size d with True when the variable is selected.
- Type
-
refit
¶ solution beta after solving unsparse problem over the set of selected variables.
- Type
-
save1,save2
if a string is given, the corresponding graph will be saved with the given name of the file. save1 is for CV curve ; and save2 for refit beta-solution.
- Type
bool or string
-
formulation
¶ object containing the info about the formulation of the minimization problem we solve.
- Type
-
-
solution_CV.graphic(se_max=None, save=None, logscale=True, errorevery=5)
Method to plot the graphic showing the mean squared error along the lambda path once cross-validation is computed.
- Parameters
se_max (float) – the graphic will not show the lambdas for which MSE(lambda) > min(MSE) + se_max * Standard_error(lambda_min). This parameter is useful to plot a graph that zooms in on the interesting part. Default value : None
logscale (bool) – tells whether to plot the mean squared error as a function of lambda or of log10(lambda). Default value : True
errorevery (int) – parameter passed to matplotlib.pyplot.errorbar that sets the frequency of the error bars. Default value : 5
save (string) – path to the file where the figure should be saved. If None, then the figure will not be saved. Default value : None
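The effect of se_max can be sketched as a filter on the points to draw. This is a minimal illustration, assuming that every lambda whose error exceeds the cutoff is simply hidden; the helper name indices_to_plot_sketch and the sample values are hypothetical, and no plotting is done here.

```python
def indices_to_plot_sketch(mse, standard_error, se_max=None):
    """Return the indices of the lambdas kept on the zoomed CV curve."""
    if se_max is None:
        return list(range(len(mse)))  # no zoom: keep every point
    i_min = min(range(len(mse)), key=lambda i: mse[i])
    cutoff = mse[i_min] + se_max * standard_error[i_min]
    return [i for i in range(len(mse)) if mse[i] <= cutoff]


mse = [5.0, 2.0, 1.0, 1.5]
standard_error = [0.5, 0.5, 0.4, 0.5]
print(indices_to_plot_sketch(mse, standard_error))              # [0, 1, 2, 3]
print(indices_to_plot_sketch(mse, standard_error, se_max=2.0))  # [2, 3]
```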
-
class classo.solver.solution_StabSel(matrices, param, formulation, numerical_method, label)
Class that contains characteristics of the computed stability selection. It also has a representation method that plots the selected parameters, the solution of the non-sparse problem on the set of selected variables, and the stability plot.
-
distribution
¶ d array of stability ratios.
- Type
array
-
lambdas_path
¶ for ‘first’ method : Nlam array of the lambdas used. Other cases : ‘not used’.
- Type
array or string
-
distribution_path
¶ for ‘first’ method : Nlam x d array with stability ratios as a function of lambda.
- Type
array or string
-
threshold
¶ threshold for StabSel, ie for a variable i, stability ratio that is needed to get selected.
- Type
-
save1,save2
if a string is given, the corresponding graph will be saved with the given name of the file. save1 is for stability plot ; and save2 for refit beta-solution.
- Type
bool or string
-
selected_param
¶ boolean arrays of size d with True when the variable is selected.
- Type
-
to_label
¶ boolean arrays of size d with True when the name of the variable should be seen on the graph.
- Type
-
refit
¶ solution beta after solving unsparse problem over the set of selected variables.
- Type
-
formulation
¶ object containing the info about the formulation of the minimization problem we solve.
- Type
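The link between distribution, threshold and selected_param can be sketched as follows. The helpers are hypothetical: stability ratios are taken here as the fraction of subsamples in which a variable was selected, and a variable is kept when its ratio reaches the threshold (whether the library uses a strict or non-strict comparison is an assumption).

```python
def stability_ratios_sketch(selections):
    """selections: one list of d booleans per subsample; returns d ratios."""
    n_subsamples = len(selections)
    d = len(selections[0])
    return [sum(sel[j] for sel in selections) / n_subsamples for j in range(d)]


def select_stable_sketch(distribution, threshold):
    """Boolean mask of the variables whose stability ratio reaches the threshold."""
    return [ratio >= threshold for ratio in distribution]


selections = [
    [True, False, True],
    [True, False, False],
    [True, True, False],
    [True, False, True],
]
distribution = stability_ratios_sketch(selections)
print(distribution)                             # [1.0, 0.25, 0.5]
print(select_stable_sketch(distribution, 0.5))  # [True, False, True]
```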
-
-
class classo.solver.solution_LAMfixed(matrices, param, formulation, numerical_method, label)
Class that contains characteristics of the computed lasso. It also has a representation method that plots this solution.
-
rescaled_lam
¶ if True, the problem has been computed for lambda*lambdamax (so lambda should be between 0 and 1).
- Type
-
beta
¶ solution beta of classo.
- Type
-
selected_param
¶ boolean arrays of size d with True when the variable is selected (which is the case when the i-th component solution of the classo is non-null).
- Type
-
refit
¶ solution beta after solving unsparse problem over the set of selected variables.
- Type
-
formulation
¶ object containing the info about the formulation of the minimization problem we solve.
- Type
-