Genetic Identification of Lung Cancer

Benchmark vs. Logistic Regression, Random Forest, Boosted Tree & Neural Network

Design an AI-based decision system that makes accurate, instant, and rational diagnoses of lung cancer from the genetic sequencing of lung tissue, determining whether the tumor is a malignant pleural mesothelioma or an adenocarcinoma (ADCA).

Goals & benefits

Identify the genes involved in cancer and enhance medical knowledge by helping pulmonologists and oncologists understand the causal relationships between specific genes, their combinations, and the type of cancer.

Help medical professionals make earlier and more personalized decisions through rapid, systematic, and explainable diagnoses.

Contribute to improving patient care (pain, survival, duration of treatment) and to extending access to high-level diagnoses, even in medical deserts.

  • The top-model is a decision system composed of 2 disjunctive gradual rules without chaining. Remark: even though the theoretical complexity of this problem is very high, the underlying decision process turns out to be quite simple, although non-linear.
  • Each rule uses 1 or 2 predictors from the 2 variables that XTRACTIS automatically identified as significant (out of the 12,533 gene expression levels describing each patient).
  • Only a few rules are triggered at a time to compute the decision.
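The actual XTRACTIS rules (genes, thresholds, operators) are given in the full document and are not reproduced here. As a rough illustration only, the sketch below shows how a system of 2 disjunctive gradual rules without chaining could turn 2 gene-expression levels into a class decision. It models the rules as ordinary fuzzy rules: linear "ramp" memberships, `min` for AND inside a rule, and `max` to combine (OR) the rules. Those operators, the gene names `GENE_A` and `GENE_B`, and every breakpoint are hypothetical assumptions, not the XTRACTIS top-model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Membership = Callable[[float], float]


def ramp_up(low: float, high: float) -> Membership:
    """Gradual membership: 0 at or below `low`, 1 at or above `high`, linear in between."""
    def mu(x: float) -> float:
        if x <= low:
            return 0.0
        if x >= high:
            return 1.0
        return (x - low) / (high - low)
    return mu


def ramp_down(low: float, high: float) -> Membership:
    """Complement of ramp_up: 1 at or below `low`, 0 at or above `high`."""
    up = ramp_up(low, high)
    return lambda x: 1.0 - up(x)


@dataclass
class Rule:
    premises: Dict[str, Membership]  # 1 or 2 genes -> membership function
    conclusion: str                  # class label the rule votes for

    def firing_degree(self, patient: Dict[str, float]) -> float:
        # AND of the premises, modeled with min (an assumption).
        return min(mu(patient[gene]) for gene, mu in self.premises.items())


def decide(rules: List[Rule], patient: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Return the winning label and the per-class scores for one patient."""
    scores: Dict[str, float] = {}
    for rule in rules:
        # No chaining: each rule reads only input genes, never another rule's output.
        degree = rule.firing_degree(patient)
        # Disjunctive combination: rules with the same conclusion are OR-ed via max.
        scores[rule.conclusion] = max(scores.get(rule.conclusion, 0.0), degree)
    # Highest score wins; ties go to the first class listed in `rules`.
    label = max(scores, key=scores.get)
    return label, scores


# Hypothetical 2-rule system; genes and breakpoints are invented for illustration.
RULES = [
    Rule({"GENE_A": ramp_up(200.0, 600.0)}, "mesothelioma"),
    Rule({"GENE_A": ramp_down(200.0, 600.0),
          "GENE_B": ramp_up(100.0, 400.0)}, "ADCA"),
]

if __name__ == "__main__":
    print(decide(RULES, {"GENE_A": 300.0, "GENE_B": 250.0}))
```

With these made-up values, a patient with `GENE_A = 300` and `GENE_B = 250` gets a score of 0.25 for mesothelioma and 0.5 for ADCA, so the sketch returns ADCA. For a patient with `GENE_A = 800`, only the first rule has a non-zero degree, which echoes the point that only a few rules are triggered for any given decision.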

It achieves perfect Real Performance (on unknown data).

It computes real-time predictions at up to 70,000 decisions per second, offline or online (via API).
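For scale, dividing one second by the quoted peak throughput gives the average time per decision. This is plain arithmetic on the stated figure, not a separately measured latency; a single request may take longer if the peak rate relies on batching or parallelism.

```latex
t_{\text{decision}} = \frac{1\ \text{s}}{70\,000} \approx 1.43 \times 10^{-5}\ \text{s} \approx 14.3\ \mu\text{s}
```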

UC12 scores graph

LoR=Logistic Regression
RFo=Random Forest
BT=Boosted Tree
NN=Neural Network

Detailed results and explanations are available in the full document.

Use Case 2024/03 (v2.1)

Powered by XTRACTIS® REVEAL v12.2.44169 (2022/12)


  1. Problem Definition
  2. XTRACTIS-induced Decision System
  3. XTRACTIS Process
  4. Top-Model Induction
  5. Explained Predictions for 3 unknown cases
  6. Top-Models Benchmark
  7. Appendices