Year | 2019 |
Subject | Computational Biology and Bioinformatics |
Country/State | NJ, United States of America |
Enabling Personalized Medicine: A Novel Deep Learning Tool for Classifying Genetic Mutations Using Text from Clinical Evidence
Understanding genetic mutations and their effects is the foundation of personalized medicine. Currently, this interpretation is time-consuming, costly, and susceptible to bias, because it requires manually reviewing thousands of scientific texts on individual mutations. To address these issues, a deep-learning natural language processing tool was developed to automatically classify genetic variants and their effects. Open-source data on genetic variants and related clinical literature were used to engineer features that represent the relationship between variations and their impacts. Text-mining algorithms such as term frequency-inverse document frequency, coupled with high-dimensional vector representations, were applied to the text corpus to embed the relationships between terms. Additionally, physicochemical properties of the substituted amino acids and their respective Grantham scores were used to map the severity of the changes and the amino acid evolutionary distances. The machine-learning system is the concatenation of a Multi-Layer Perceptron and a bidirectional Long Short-Term Memory network that incorporates dimensionality reduction to capture principal features and mitigate noise. After training, the predictor achieved an accuracy of 92.3% and an F1-score of 85.5. The tool was validated through cross-validation, based on feature prioritization and previously annotated mutations. The deep learning predictor was then applied to currently unclassified genetic variations and identified 13 as novel oncogenic mutations. Ultimately, this study not only helps solve one of precision medicine’s primary limitations, but also presents industry viability, significantly streamlining the research process and potentially leading to the development of new therapies.
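To make the text-mining step concrete, here is a minimal sketch of term frequency-inverse document frequency weighting in plain Python. It is not the project's implementation: the function name `tf_idf`, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    # Assumption: lowercase whitespace tokenization; real clinical text
    # would need a more careful tokenizer.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy sentences, not taken from the study's corpus.
corpus = [
    "mutation increases kinase activity",
    "mutation reduces protein stability",
    "variant has no effect on kinase activity",
]
scores = tf_idf(corpus)
```

In this toy corpus, a term that occurs in only one document (such as "increases") receives a higher weight than a term shared across documents (such as "mutation"), which is the property that lets rarer, more informative words stand out in the feature vector.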
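The Intel International Science and Engineering Fair (ISEF) is hosted by the Society for Science and the Public in the United States, with Intel as its title sponsor. It is the world's largest and highest-level scientific research and innovation competition for secondary school students. ISEF's disciplines cover all areas of mathematics, natural science, and engineering, as well as some social sciences. Often called the "World Cup" of youth science, ISEF aims to encourage students to collaborate in teams, to innovate, and to pursue sustained, focused, in-depth research on topics that interest them.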
Studies that primarily focus on the discipline and techniques of computer science and mathematics as they relate to biological systems. This includes the development and application of data-analytical and theoretical methods, mathematical modeling, and computational simulation techniques to the study of biological, behavioral, and social systems.
Computational Biomodeling (MOD): Studies that involve computer simulations of biological systems, most commonly with the goal of understanding how cells or organisms develop, work collectively, and survive.
Computational Epidemiology (EPD): The study of disease frequency and distribution, and risk factors and socioeconomic determinants of health within populations. Such studies may include gathering information to confirm existence of disease outbreaks, developing case definitions and analyzing epidemic data, establishing disease surveillance, and implementing methods of disease prevention and control.
Computational Evolutionary Biology (EVO): A study that applies the discipline and techniques of computer science and mathematics to explore the processes of change in populations of organisms, especially taxonomy, paleontology, ethology, population genetics and ecology.
Computational Neuroscience (NEU): A study that applies the discipline and techniques of computer science and mathematics to understand brain function in terms of the information processing properties of the structures that make up the nervous system.
Computational Pharmacology (PHA): A study that applies the discipline and techniques of computer science and mathematics to predict and analyze the responses to drugs.
Genomics (GEN): The study of the function and structure of genomes using recombinant DNA, sequencing, and bioinformatics.
Other (OTH): Studies that cannot be assigned to one of the above subcategories. If the project involves multiple subcategories, the principal subcategory should be chosen instead of Other.