pH prediction using the 88 soils dataset

The next microbiome example considers a Soil dataset .

The data are generated thanks to a qiime2 workflow similar to a gneiss tutorial.

This workflow treat some files taken from gneiss GitHub.

The task is to predict pH concentration in the soil from microbial abundance data.

A similar analysis is also done in Tree-Aggregated Predictive Modeling of Microbiome Data.

on another dataset

import sys, os
from os.path import join, dirname, abspath

classo_dir = dirname(dirname(abspath("__file__")))
sys.path.append(classo_dir)
from classo import classo_problem
import numpy as np
import pandas as pd

Load data

t = pd.read_csv("pH_data/qiime2/news/table.csv", index_col=0)
metadata = pd.read_table(
    "pH_data/qiime2/originals/88soils_modified_metadata.txt", index_col=0
)
y_uncent = metadata["ph"].values


X = t.values
label = t.columns


# second option to load the data
# import scipy.io as sio
# pH = sio.loadmat("pH_data/matlab/pHData.mat")
# tax = sio.loadmat("pH_data/matlab/taxTablepHData.mat")["None"][0]
# X, y_uncent = pH["X"], pH["Y"].T[0]
# label = None

y = y_uncent - np.mean(y_uncent)  # Center Y
print(X.shape)
print(y.shape)

Set up c-lassso problem

problem = classo_problem(X, y, label=label)

problem.model_selection.StabSelparameters.method = "lam"
problem.model_selection.PATH = True
problem.model_selection.LAMfixed = True
problem.model_selection.PATHparameters.n_active = X.shape[1] + 1

Solve for R1

problem.formulation.concomitant = False
problem.solve()
print(problem, problem.solution)

Solve for R2

problem.formulation.huber = True
problem.solve()
print(problem, problem.solution)

Solve for R3

problem.formulation.concomitant = True
problem.formulation.huber = False
problem.solve()
print(problem, problem.solution)

Solve for R4

Remark : we reset the numerical method here, because it has been automatically set to ‘¨Path-Alg’ for previous computations, but for R4, “DR” is much better as explained in the documentation, R4 “Path-Alg” is a method for fixed lambda but is (paradoxically) bad to compute the lambda-path because of the absence of possible warm-start in this method

problem.model_selection.PATHparameters.numerical_method = "DR"
problem.formulation.huber = True
problem.solve()
print(problem, problem.solution)

Total running time of the script: ( 0 minutes 0.000 seconds)

Gallery generated by Sphinx-Gallery