CS Department Automatic Document Summarizer Among Top Two

Because of the web's ongoing "information explosion", there have been increasing efforts to condense the vast amounts of text found on online sites through automated summarization. Computer Science Department graduate student Araly Barrera and her mentor, Professor Verma, have designed and implemented SynSem, a single-document summarizer that exploits a document's word popularity, sentence position, and semantic linkage as its three main approaches to sentence extraction. SynSem's algorithms are based on their analysis of a human summarization data set and of a previous summarizer called WN-SUM.
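To make the idea of sentence extraction concrete, here is a minimal sketch of how three such signals might be combined into a single sentence score. It is not SynSem's actual algorithm, which this article does not spell out. The function names, the stopword list, the equal weights, and the use of plain word overlap as a stand-in for semantic linkage are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not SynSem's list).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "that", "it", "for", "on", "as", "with"}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    # Lowercased alphabetic tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def summarize(text, k=2, weights=(1.0, 1.0, 1.0)):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    words = [content_words(s) for s in sentences]
    freq = Counter(w for ws in words for w in ws)
    top = max(freq.values(), default=1)
    n = len(sentences)
    w_pop, w_pos, w_link = weights

    scores = []
    for i, ws in enumerate(words):
        # Word popularity: mean normalized frequency of the sentence's words.
        popularity = sum(freq[w] for w in ws) / (top * len(ws)) if ws else 0.0
        # Sentence position: earlier sentences score higher.
        position = 1.0 - i / n
        # Linkage stand-in: share of other sentences with a word in common.
        linked = sum(1 for j, other in enumerate(words)
                     if j != i and set(ws) & set(other))
        linkage = linked / (n - 1) if n > 1 else 0.0
        scores.append(w_pop * popularity + w_pos * position + w_link * linkage)

    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

Calling summarize(article_text, k=3) would return three sentences in document order. A real system would replace the word-overlap measure with a genuine semantic resource and tune the weights on human-written summaries.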

Interest in single-document summarization has declined in recent years because researchers have had difficulty beating baseline summary quality on news articles. However, tests of SynSem on separate datasets, composed mainly of scientific and news-wire articles, produced successful evaluation results, which could help reinvigorate single-document summarization. Using an automated summary evaluator, ROUGE, Araly and Dr. Verma observed that SynSem significantly outperformed, in summary quality, some of today's most sophisticated summarizers, including MEAD, TextRank, and 14 of the 15 systems that participated in the NIST Document Understanding Conference competition of 2002, the last year in which NIST organized a single-document summarization competition. More importantly, SynSem consistently outperformed the document baseline throughout experimentation, a result that is likely to spark interest. These findings have implications for multi-document as well as journal article summarization.
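For readers unfamiliar with ROUGE, its core measure, ROUGE-N, is the fraction of the n-grams in a human reference summary that also appear in the system summary, with repeated n-grams counted at most as often as they occur in the system summary. The sketch below illustrates only that core idea. The function name and the simple tokenizer are assumptions, and it omits options of the real ROUGE toolkit such as stemming and multiple references, so it should not be read as the configuration used in the evaluation described above.

```python
import re
from collections import Counter


def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N recall against a single reference summary."""

    def ngrams(text):
        # Lowercased alphanumeric tokens grouped into overlapping n-grams.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each reference n-gram count by its count in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

A score of 1.0 means every reference n-gram was recovered by the system summary, and higher values of n reward summaries that preserve longer word sequences.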

A paper based on these findings has been accepted at the ACM SAC 2011 Conference in Taiwan.