NSF CAREER Award Boosts Data Extraction Research
As current events evolve rapidly, decision makers may be inundated with messages. With information flowing in from multiple sources, data extraction is an important tool for managing that flow and identifying key patterns and events. How can data science experts help simplify complex or extensive documents and streams of information?
Dr. Xinya Du, an assistant professor of computer science in the Erik Jonsson School of Engineering and Computer Science at The University of Texas at Dallas, received a 2024 Faculty Early Career Development Program (CAREER) award from the National Science Foundation (NSF) to support his research on reducing information overload.
Du’s $561,000 CAREER grant supports his work to build data extraction methods that identify key information from large amounts of text.
The vast amounts of event-related information published daily create an overwhelming flood of data that far exceeds the cognitive capacity of any individual, Du said.
“My research focuses on developing advanced techniques to automatically extract succinct event knowledge, enabling people to quickly capture critical insights from complex documents,” Du said. “I aim to tackle the information overload problem in people’s daily lives by developing innovative natural language processing (NLP) techniques that transform unstructured text in long and complex documents into structured, comprehensive knowledge graphs.”
About NSF CAREER Awards
Faculty Early Career Development Program (CAREER) awards from the National Science Foundation are competitive awards for promising early-career faculty who have the potential to become leaders and role models both as researchers and as educators.
Since 2010, UT Dallas faculty members have received 58 NSF CAREER awards.
Du intends to build on current natural language processing methods to process rapidly evolving event information more efficiently. Most current NLP models rely on a predefined schema. Instead, Du will test a question-answer generation paradigm that allows events to be represented in a new way, drawing on clusters of documents that discuss the same events.
Major applications of the technology may include reasoning over legal documents, predicting disease outbreaks, risk prevention and biomedical document understanding, tasks that Du says currently require inefficient or costly methods. Du, who teaches natural language processing at UT Dallas, will incorporate the research into his courses.
Du joined the University in 2022 after serving as a postdoctoral research associate at the University of Illinois at Urbana-Champaign. He earned a bachelor of engineering in computer science from Shanghai Jiao Tong University and a PhD in computer science from Cornell University. In 2021, he was named a Rising Star in Data Science by The University of Chicago Data Science Institute, and he also received an Amazon Research Award in 2023 and a Cisco Faculty Research Award in 2024.