Leveraging hybrid database models for enhanced gene-disease association analysis
Sama Salam Samaan
Saja Dheyaa Khudhur
Omar Nowfal Mohammed Tahe
Computer Engineering Department, University of Technology
DOI: https://doi.org/10.47831/mjpas.v3i2.325
Keywords: GDA, graph database, semi-structured data, TBGA
Abstract
Many diseases are driven by genetic variations. Gene-disease association (GDA) datasets, which are naturally structured as networks, capture the relationships between genes and diseases. Such datasets typically consist of semi-structured data that does not fit a tabular format. In this work, we propose a hybrid approach for processing, storing, and analyzing TBGA, a GDA dataset comprising over 200,000 JSON instances and 100,000 gene-disease pairs. We introduce two procedures that import the TBGA dataset into a relational model and a graph model, respectively. SQL Server hosts the relational model and supports analytical and reporting tasks, while Neo4j hosts the graph model and supports visualization and graph algorithms tailored to GDA analysis. Experimental results show that the two models complement each other: SQL Server excels at analytical tasks, and Neo4j at visualization and graph analysis.
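The dual-import idea can be illustrated with a minimal sketch. It is not the authors' procedure: SQLite stands in for SQL Server so the example runs without a server, the Cypher statement is only defined as a string rather than sent to Neo4j, and the record fields (`h`, `t`, `relation`, `text`), the sample values, the table names, and the `ASSOCIATED_WITH` relationship type are all assumptions made for illustration.

```python
import json
import sqlite3

# Hypothetical JSON-lines records; field names and values are assumptions,
# not taken from the TBGA files themselves.
SAMPLE_JSONL = """
{"h": {"id": "G1", "name": "GENE_A"}, "t": {"id": "D1", "name": "DISEASE_X"}, "relation": "therapeutic", "text": "sentence 1"}
{"h": {"id": "G1", "name": "GENE_A"}, "t": {"id": "D1", "name": "DISEASE_X"}, "relation": "therapeutic", "text": "sentence 2"}
{"h": {"id": "G2", "name": "GENE_B"}, "t": {"id": "D1", "name": "DISEASE_X"}, "relation": "biomarker", "text": "sentence 3"}
"""

# Illustrative parameterized Cypher for the graph side (not executed here).
CYPHER_MERGE = (
    "MERGE (g:Gene {id: $gene_id}) "
    "MERGE (d:Disease {id: $disease_id}) "
    "MERGE (g)-[r:ASSOCIATED_WITH {relation: $relation}]->(d) "
    "SET r.count = $count"
)


def parse_instances(jsonl_text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def load_relational(instances):
    """Load instances into an in-memory SQLite database (stand-in for SQL Server)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE disease (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE association ("
        "gene_id TEXT REFERENCES gene(id), "
        "disease_id TEXT REFERENCES disease(id), "
        "relation TEXT, sentence TEXT)"
    )
    for inst in instances:
        gene, disease = inst["h"], inst["t"]
        # Genes and diseases repeat across instances, so duplicates are ignored.
        conn.execute("INSERT OR IGNORE INTO gene VALUES (?, ?)", (gene["id"], gene["name"]))
        conn.execute("INSERT OR IGNORE INTO disease VALUES (?, ?)", (disease["id"], disease["name"]))
        conn.execute(
            "INSERT INTO association VALUES (?, ?, ?, ?)",
            (gene["id"], disease["id"], inst["relation"], inst["text"]),
        )
    conn.commit()
    return conn


def to_graph_edges(instances):
    """Collapse instances into distinct (gene, disease, relation) edges with counts."""
    edges = {}
    for inst in instances:
        key = (inst["h"]["id"], inst["t"]["id"], inst["relation"])
        edges[key] = edges.get(key, 0) + 1
    return edges


def edge_parameters(edges):
    """Build one parameter dict per edge, shaped for CYPHER_MERGE."""
    return [
        {"gene_id": g, "disease_id": d, "relation": rel, "count": n}
        for (g, d, rel), n in edges.items()
    ]
```

In this sketch the relational side keeps one row per JSON instance, which suits counting and reporting queries, while the graph side collapses repeated instances into one weighted edge per gene-disease-relation triple, which suits traversal and visualization. With a real Neo4j deployment, each dictionary from `edge_parameters` would be passed as the parameters of `CYPHER_MERGE` through a driver session.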