Year | 2018 |
Category | Physics and Astronomy |
Country/State | United States of America |
Automated Identification and Inference of Organic Molecular Structure and Relative Concentrations from Infrared Spectral Data
The discovery of complex organic molecules in space is critical to the understanding of the reaction pathways leading to biomolecules and the origins of life. Existing techniques for the analysis of astronomical spectra require knowledgeable researchers and often struggle to identify and differentiate complex spectral signatures, such as those of polycyclic aromatic hydrocarbons (PAHs). My project applies machine learning (convolutional neural networks) to the problem of identifying complex organic molecules in IR spectroscopic data and proposes a novel method for creating synthetic training data to tune models to specific astronomical environments.
My project created: a) models to identify organic molecules from empirical IR spectroscopic data when trained on their approximate theoretical counterparts from NASA’s PAHdb v2 and v3 databases, and b) models to identify molecular compositions from spectra of random mixtures of theoretical molecules with realistic noise.
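As a rough illustration of the synthetic training data described in (b), the sketch below builds one noisy mixture from a small library of spectra. The abstract does not say how mixture weights were drawn or what noise model was used, so the normalized uniform weights, the additive Gaussian noise, the toy 5-bin spectra, and the function names `random_weights` and `mix_spectra` are all assumptions made for illustration.

```python
import random

def random_weights(n, rng):
    """Draw n non-negative weights that sum to 1 (assumed: normalized uniform draws)."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

def mix_spectra(spectra, weights, noise_sigma, rng):
    """Weighted sum of equal-length spectra plus additive Gaussian noise (assumed noise model)."""
    n_bins = len(spectra[0])
    mixed = []
    for b in range(n_bins):
        clean = sum(w * s[b] for w, s in zip(weights, spectra))
        mixed.append(clean + rng.gauss(0.0, noise_sigma))
    return mixed

rng = random.Random(0)  # fixed seed so the example is reproducible

# Three toy "theoretical spectra" on a 5-bin grid (made-up values, not PAHdb data).
library = [
    [0.0, 1.0, 0.2, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.8, 0.1],
    [0.0, 0.0, 0.9, 0.1, 0.6],
]

weights = random_weights(len(library), rng)  # training target (relative concentrations)
sample = mix_spectra(library, weights, noise_sigma=0.01, rng=rng)  # training input
```

Each (`sample`, `weights`) pair would serve as one training example; changing the weight distribution or noise level is one way such data could be tuned toward a particular astronomical environment.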
My principal findings are: a) network models trained on theoretical spectra can identify empirical molecules with ~73% accuracy, and b) models trained on random mixtures of 3,139 theoretical PAH spectra can identify molecular concentrations with weight vector correlations of ~85% and can correctly identify the largest constituent ~67% of the time. In all cases, my models (the best being ResNet5 with ~200M parameters) dramatically outperform standard linear models.
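The two mixture metrics quoted above can be sketched as follows. The abstract does not define "weight vector correlation", so this assumes it is the Pearson correlation between the true and predicted concentration vectors; the function names and the example vectors are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (undefined if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def largest_constituent_match(true_w, pred_w):
    """True when the predicted largest component is the true largest component."""
    true_top = max(range(len(true_w)), key=lambda i: true_w[i])
    pred_top = max(range(len(pred_w)), key=lambda i: pred_w[i])
    return true_top == pred_top

# Hypothetical true and predicted concentration vectors for one mixture.
true_w = [0.6, 0.3, 0.1]
pred_w = [0.5, 0.35, 0.15]

correlation = pearson(true_w, pred_w)
top_hit = largest_constituent_match(true_w, pred_w)
```

Averaging `correlation` and the rate of `top_hit` over many held-out mixtures would give figures comparable in kind to the ~85% and ~67% reported here.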
My convolutional network models can recognize complex spectral patterns and generalize across datasets with realistic noise. These models can significantly increase the scale and efficiency of analyzing astronomical IR spectral data and improve our understanding of the distribution of complex organic molecules in the universe.
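The Intel International Science and Engineering Fair (ISEF), hosted by the U.S.-based Society for Science and the Public and title-sponsored by Intel, is the world's largest and highest-level science research competition for secondary school students. ISEF's disciplines cover all fields of mathematics, natural science, and engineering, along with some of the social sciences. Often called the "World Cup" of youth science, ISEF aims to encourage students to collaborate, innovate, and pursue sustained, focused, in-depth research on topics that interest them.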
Physics is the science of matter and energy and of interactions between the two. Astronomy is the study of anything in the universe beyond the Earth.
Atomic, Molecular, and Optical Physics (AMO): The study of atoms, simple molecules, electrons, light, and their interactions. Projects studying non-solid state lasers and masers also belong in this subcategory.
Astronomy and Cosmology (AST): The study of space, the universe as a whole, including its origins and evolution, the physical properties of objects in space and computational astronomy.
Biological Physics (BIP): The study of the physics of biological processes and systems.
Condensed Matter and Materials (MAT): The study of the properties of solids and liquids. Topics such as superconductivity, semi-conductors, complex fluids, and thin films are studied.
Mechanics (MEC): Classical physics and mechanics, including the macroscopic study of forces, vibrations and flows in solid, liquid and gaseous materials. Projects studying aerodynamics or hydrodynamics also belong in this subcategory.
Nuclear and Particle Physics (NUC): The study of the physical properties of the atomic nucleus and of fundamental particles and the forces of their interaction. Projects developing particle detectors also belong in this subcategory.
Theoretical, Computational, and Quantum Physics (THE): The study of nature, phenomena and the laws of physics employing mathematical or computational methods rather than experimental processes.
Other (OTH): Studies that cannot be assigned to one of the above subcategories. If the project involves multiple subcategories, the principal subcategory should be chosen instead of Other.