Year | 2015 |
Category | Robotics and Intelligent Machines |
Country/State | United States of America |
Development of an Authorship Identification Algorithm for Twitter Using Stylometric Techniques
I developed software that uses semi-supervised learning to improve the accuracy of stylometrically attributing an unidentified tweet to the correct author from a set of known Twitter authors. Existing stylometric techniques generally perform poorly on short texts. Software written in Python streamed, preprocessed, and stored 1000 tweets each from up to 30 prolific Twitter authors. Traditional and flexible bigrams, together with their frequencies of occurrence, were extracted from each author’s known tweets to form that author’s profile, and the same features were extracted from the unknown tweet. These bigrams served as tokens for a Naive Bayes classifier, which returned the probability that each author had written the unknown tweet. The first, second, and third most likely authors determined by the classifier were written as output. After repeating this process multiple times, the percentage of trials in which the correct author was identified was calculated. The completed program identified the author of an unknown tweet with substantial accuracy. Excluding retweets and combining flexible and traditional bigrams, along with other techniques, produced the most effective algorithm for stylometrically identifying the author of a tweet. With 10 authors, the algorithm identified the correct author with 73 percent accuracy on the first guess and with 87 percent accuracy within the top three guesses, demonstrating the potential of stylometric techniques for extremely short messages. The algorithm also has significant potential for investigating anonymous cyber-crimes committed over social media.
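The pipeline described in the abstract (bigram profiles, a Naive Bayes classifier, and a top-three ranking) can be illustrated with a minimal sketch. This is not the project's actual code. It assumes word-level bigrams, reads a "flexible" bigram as two words separated by one skipped word, uses add-one smoothing with a uniform prior over authors, and omits the semi-supervised step. All function names and the toy tweets are hypothetical.

```python
import math
from collections import Counter


def bigram_features(text, use_flexible=True):
    """Return traditional word bigrams plus optional 'flexible' bigrams."""
    words = text.lower().split()
    feats = [("trad", a, b) for a, b in zip(words, words[1:])]
    if use_flexible:
        # Assumption: a flexible bigram pairs two words with one word skipped.
        feats += [("flex", a, b) for a, b in zip(words, words[2:])]
    return feats


def build_profiles(tweets_by_author):
    """Map each author to a Counter of bigram frequencies over known tweets."""
    return {
        author: Counter(f for tweet in tweets for f in bigram_features(tweet))
        for author, tweets in tweets_by_author.items()
    }


def rank_authors(profiles, unknown_tweet, top_k=3, alpha=1.0):
    """Rank authors by multinomial Naive Bayes log-likelihood (uniform prior)."""
    vocab = set().union(*profiles.values())
    feats = bigram_features(unknown_tweet)
    scores = {}
    for author, counts in profiles.items():
        # The +1 reserves probability mass for bigrams never seen in training.
        total = sum(counts.values()) + alpha * (len(vocab) + 1)
        scores[author] = sum(
            math.log((counts[f] + alpha) / total) for f in feats
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def top_k_accuracy(profiles, labelled_tweets, k):
    """Fraction of (author, tweet) pairs whose author is in the top k guesses."""
    hits = sum(
        author in rank_authors(profiles, tweet, top_k=k)
        for author, tweet in labelled_tweets
    )
    return hits / len(labelled_tweets)


# Toy data, invented purely for illustration.
toy = {
    "alice": ["good morning everyone have a great day",
              "have a great day my friends"],
    "bob": ["the game was wild last night",
            "what a game last night wow"],
}
profiles = build_profiles(toy)
ranking = rank_authors(profiles, "have a great morning everyone")
print(ranking)  # ['alice', 'bob']
```

Calling `top_k_accuracy` with `k=1` and `k=3` on held-out tweets corresponds to the first-guess and top-three figures the abstract reports, although the numbers from this toy sketch are not comparable to the project's results.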
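The Intel International Science and Engineering Fair (ISEF), organized by the Society for Science and the Public in the United States and title-sponsored by Intel, is the world's largest and highest-level science research competition for secondary school students. ISEF categories cover all areas of mathematics, natural science, and engineering, as well as some social sciences. Often called the "World Cup" of youth science research, ISEF aims to encourage students to collaborate, innovate, and pursue sustained, in-depth research on topics that interest them.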
Studies in which the use of machine intelligence is paramount to reducing the reliance on human intervention.
Biomechanics (BIE): Studies and apparatus which mimic the role of mechanics in biological systems.
Cognitive Systems (COG): Studies/apparatus that operate similarly to the ways humans think and process information. Systems that provide for increased interaction of people and machines to more naturally extend and magnify human expertise, activity, and cognition.
Control Theory (CON): Studies that explore the behavior of dynamical systems with inputs, and how their behavior is modified by feedback. This includes new theoretical results and the applications of new and established control methods, system modelling, identification and simulation, the analysis and design of control systems (including computer-aided design), and practical implementation.
Machine Learning (MAC): Construction and/or study of algorithms that can learn from data.
Robot Kinematics (KIN): The study of movement in robotic systems.
Other (OTH): Studies that cannot be assigned to one of the above subcategories. If the project involves multiple subcategories, the principal subcategory should be chosen instead of Other.