TechRxiv

Analyzing Biomedical Datasets with Symbolic Tree Adaptive Resonance Theory

Version 2 2023-12-05, 04:17
Version 1 2023-11-20, 15:19
preprint
posted on 2023-12-05, 04:17 authored by Sasha Petrenko, Daniel Hier, Tayo Obafemi-Ajayi, Mary Bone, Erik Timpson, Michael Speight

Biomedical datasets distill many mechanisms of human diseases, linking diseases to genes and phenotypes (signs and symptoms of disease), genetic mutations to altered protein structures, and altered proteins to changes in molecular functions and biological processes. It is desirable to gain new insights from these data, especially by uncovering hierarchical structures that relate disease variants. However, such analysis has proven difficult because of the complexity of the connections between multicategorical symbolic data. This article proposes Symbolic Tree Adaptive Resonance Theory (START), with additional supervised, Dual-Vigilance (DV-START), and Distributed Dual-Vigilance (DDV-START) formulations, for clustering multicategorical symbolic data from biomedical datasets. It demonstrates the method's utility by clustering variants of Charcot-Marie-Tooth disease using genomic, phenotypic, and proteomic data.


AUTHORS' NOTE: This article outlines the Symbolic Tree Adaptive Resonance Theory (START) machine learning algorithm, which is unrelated to the similarly named Spectral Timing Adaptive Resonance Theory (START) explanatory neural network model.

Funding

Kansas City National Security Campus contract number DE-NA0002839

Email Address of Submitting Author

petrenkos@mst.edu

ORCID of Submitting Author

0000-0003-2442-8901

Submitting Author's Institution

Missouri University of Science and Technology

Submitting Author's Country

  • United States of America
