Miscellaneous functions

Functions

random_data(n, d, d_nonzero, k, sigma[, …])

Generation of random matrices as data such that y = X.sol + sigma*noise.

clr(array[, coef])

Centered-Log-Ratio transformation

theoretical_lam(n, d)

Theoretical lambda as a function of the dimensions of the problem

More details

classo.misc_functions.random_data(n, d, d_nonzero, k, sigma, zerosum=False, seed=False, classification=False, exp=False, A=None, lb_beta=3, ub_beta=10, intercept=None)

Generation of random matrices as data such that y = X.sol + sigma*noise.

The data matrix X is generated with normally distributed entries. The vector sol is generated with a random support of size d_nonzero; its components are random integers between -10 and 10, projected onto the kernel of C restricted to that support. The vector y is then computed as X.dot(sol) + sigma*noise, where noise is a normal vector.

Parameters
  • n (int) – Number of samples, dimension of y.

  • d (int) – Number of variables, dimension of sol.

  • d_nonzero (int) – Number of nonzero components of sol.

  • k (int) – Number of constraints, number of rows of C.

  • sigma (float) – Scale of the noise: the normal noise vector is multiplied by sigma before being added to X.dot(sol).

  • zerosum (bool, optional) – If True, then C is the all-one matrix with 1 row, independently of k.

  • seed (bool or int, optional) – Seed for the random values; the same seed always gives the same result. If set to False, no seed is fixed and the vectors are pseudo-random.

  • classification (bool, optional) – If True, sign(y) is returned instead of y.

  • A (numpy.ndarray, optional) – Matrix corresponding to a taxa tree. If it is given, the problem becomes y = X.A.g + eps subject to C.A.g = 0.

Returns

A pair made of a tuple of three ndarrays (X, C, y) that holds the data, and an ndarray sol, the true solution of the problem y = X.beta + noise subject to beta sparse and C.beta = 0.

Return type

tuple
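The generation procedure described above can be sketched in NumPy as follows. This is an illustrative re-implementation of the description, not the code of classo.misc_functions.random_data: the name random_data_sketch, the normally distributed C and the default seed are assumptions, and the options zerosum, classification, exp, A, lb_beta, ub_beta and intercept are ignored.

```python
import numpy as np


def random_data_sketch(n, d, d_nonzero, k, sigma, seed=0):
    """Hypothetical sketch of y = X.dot(sol) + sigma*noise with C.dot(sol) = 0."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    # Assumption: the documentation does not say how C is drawn.
    C = rng.standard_normal((k, d))

    # Random support of size d_nonzero with random integers in [-10, 10].
    support = rng.choice(d, size=d_nonzero, replace=False)
    raw = rng.integers(-10, 11, size=d_nonzero).astype(float)

    # Orthogonal projection of raw onto the kernel of C restricted to the support.
    C_s = C[:, support]
    projected = raw - np.linalg.pinv(C_s) @ (C_s @ raw)

    sol = np.zeros(d)
    sol[support] = projected
    y = X @ sol + sigma * rng.standard_normal(n)
    return (X, C, y), sol


(X, C, y), sol = random_data_sketch(n=50, d=20, d_nonzero=5, k=2, sigma=0.5)
print(X.shape, C.shape, y.shape, np.count_nonzero(sol))
```

Because the projection is done only on the support, sol stays sparse while still satisfying the linear constraint up to floating-point error.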

classo.misc_functions.clr(array, coef=0.5)

Centered-Log-Ratio transformation

Sets every non-positive entry to the constant coef, then computes the log of each component, then subtracts the mean of each column.

Parameters
  • array (ndarray) – matrix nxd

  • coef (float, optional) – Value that replaces the non-positive entries.

Returns

clr transformed matrix nxd

Return type

ndarray
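The three steps of the transformation can be sketched in NumPy as below. The function name clr_sketch is hypothetical and the code follows the description above rather than quoting the library source.

```python
import numpy as np


def clr_sketch(array, coef=0.5):
    """Replace non-positive entries by coef, take logs, center each column."""
    x = np.array(array, dtype=float)  # copy, so the input is left untouched
    x[x <= 0] = coef
    logs = np.log(x)
    # Subtract the mean of each column (axis 0 runs over the n rows).
    return logs - logs.mean(axis=0, keepdims=True)


counts = np.array([[1.0, 0.0],
                   [4.0, 2.0]])
transformed = clr_sketch(counts)
print(transformed)
```

After the transformation every column has mean zero, which is a quick sanity check on any input.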

classo.misc_functions.theoretical_lam(n, d)

Theoretical lambda as a function of the dimensions of the problem

This function returns (with \(\phi = erf\)) :

\(2/ \sqrt{n} \phi^{-1}(1 - 2x)\) such that \(x = 4/d ( \phi^{-1}(1-2x)^4 + \phi^{-1}(1-2x)^2 )\)

Which is the same (setting \(x = k/d\) and using the formula \(norm^{-1}(1-t) = \sqrt{2}\phi^{-1}(1-2t)\), where \(norm^{-1}\) is the quantile function of the standard normal distribution) as :

\(\sqrt{2/n} * norm^{-1}(1-k/d)\) such that \(k = norm^{-1}(1 - k/d)^4 + 2norm^{-1}(1 - k/d)^2\)

Parameters
  • n (int) – number of samples

  • d (int) – number of variables

Returns

theoretical lambda

Return type

float
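As an illustration, the second form of the formula (the one written with the normal quantile function) can be evaluated with the Python standard library alone. This is a sketch under assumptions, not the library implementation: the name theoretical_lam_sketch, the bisection solver and the tolerance are choices made here, and the number of variables is taken to be d in the fixed-point equation.

```python
from math import sqrt
from statistics import NormalDist


def theoretical_lam_sketch(n, d, tol=1e-12):
    """Hypothetical sketch: sqrt(2/n) * q with q = norm^{-1}(1 - k/d),
    where k solves k = q**4 + 2 * q**2."""
    inv_cdf = NormalDist().inv_cdf

    def gap(k):
        q = inv_cdf(1 - k / d)
        return q**4 + 2 * q**2 - k

    # On (0, d/2) the quantile is positive and gap decreases from a positive
    # value to -d/2, so bisection finds the single root in that interval.
    lo, hi = d * 1e-12, d / 2
    while hi - lo > tol * d:
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2
    return sqrt(2 / n) * inv_cdf(1 - k / d)


print(theoretical_lam_sketch(n=100, d=200))
```

For a fixed d the result scales as 1/sqrt(n), so multiplying the number of samples by four halves the value.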