Note
Click here to download the full example code
pH prediction using the 88 soils dataset¶
The next microbiome example considers a Soil dataset .
The data are generated thanks to a qiime2 workflow similar to a gneiss tutorial.
This workflow treat some files taken from gneiss GitHub.
The task is to predict pH concentration in the soil from microbial abundance data.
- A similar analysis is also done in Tree-Aggregated Predictive Modeling of Microbiome Data.
Load data¶
t = pd.read_csv("pH_data/qiime2/news/table.csv", index_col=0)
metadata = pd.read_table(
"pH_data/qiime2/originals/88soils_modified_metadata.txt", index_col=0
)
y_uncent = metadata["ph"].values
X = t.values
label = t.columns
# second option to load the data
# import scipy.io as sio
# pH = sio.loadmat("pH_data/matlab/pHData.mat")
# tax = sio.loadmat("pH_data/matlab/taxTablepHData.mat")["None"][0]
# X, y_uncent = pH["X"], pH["Y"].T[0]
# label = None
y = y_uncent - np.mean(y_uncent) # Center Y
print(X.shape)
print(y.shape)
Set up c-lassso problem¶
problem = classo_problem(X, y, label=label)
problem.model_selection.StabSelparameters.method = "lam"
problem.model_selection.PATH = True
problem.model_selection.LAMfixed = True
problem.model_selection.PATHparameters.n_active = X.shape[1] + 1
Solve for R1¶
problem.formulation.concomitant = False
problem.solve()
print(problem, problem.solution)
Solve for R2¶
problem.formulation.huber = True
problem.solve()
print(problem, problem.solution)
Solve for R3¶
problem.formulation.concomitant = True
problem.formulation.huber = False
problem.solve()
print(problem, problem.solution)
Solve for R4¶
Remark : we reset the numerical method here, because it has been automatically set to ‘¨Path-Alg’ for previous computations, but for R4, “DR” is much better as explained in the documentation, R4 “Path-Alg” is a method for fixed lambda but is (paradoxically) bad to compute the lambda-path because of the absence of possible warm-start in this method
problem.model_selection.PATHparameters.numerical_method = "DR"
problem.formulation.huber = True
problem.solve()
print(problem, problem.solution)
Total running time of the script: ( 0 minutes 0.000 seconds)