T. R. Girill
Society for Technical Communication/Lawrence Livermore National Lab.
firstname.lastname@example.org

Technical Writing: In Science, Readability Breeds Virality

In May, 2012, LLNL hosted a poster symposium at which 14 local high-school students reported on their own year-long research projects. As more students engage in such original research, more of them face the realistic challenge of effectively sharing their methods and results with the scientific community. Of course, working scientists vary greatly in their own ability to report research; the most influential are those whose articles are downloaded, read, and cited most often. So student and professional researchers alike want to know if the text-usability features promoted in this series of notes really make any difference in communicating one's science: are they really more prevalent in highly downloaded and cited papers?

Marco Guerini, Alberto Pepe, and Bruno Lepri have found a clever way to answer this "return on investment" question with relevant empirical data ("Do linguistic style and readability of scientific abstracts affect their virality?" 19 March 2012, http://arxiv.org/abs/1203.4238, a preprint from Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media, 2012).

The Test Cases

The NASA Astrophysics Data System (adswww.harvard.edu) holds over 9 million records, mostly published articles on physics, geophysics, and astronomy (broadly construed). From this database, Guerini, Pepe, and Lepri extracted three subsets to answer their title question:

1. 3000 randomly selected articles,
2. the 3000 most frequently downloaded articles (at least 330 times each), and
3. the 3000 most frequently cited articles (at least 350 times each).
They then computed readability-formula scores and readability-feature occurrence rates for the random sample, and compared them with the corresponding values for the most downloaded and most cited papers (they actually used only abstracts, not whole papers, to save computational time). Being downloaded means that readers of an abstract felt that the paper was probably worth reading, while being cited means that people who read the paper were impressed enough by it to reference it later in their own published articles. So downloading and citing are two different measures of "virality," or influence, in science publishing.

Features That Matter

The authors found that "formula readability" (which mostly means using relatively shorter sentences and shorter words) correlates strongly with article downloads (presumably because even sophisticated professional scientists prefer to read articles that are relatively easy to understand). They also found a significantly higher occurrence rate for readability-enhancing text features in both the highly downloaded and the highly cited papers than in the random sample. They divided text features into classes or clusters using the standard Linguistic Inquiry and Word Count (LIWC) categories (www.liwc.net/descriptiontable1.php), computed the occurrence rate for each cluster in the random sample, and then used that as the baseline (rate = 1) to compute relative feature frequency in the most downloaded and most cited samples.

Several of these text features are just the ones recommended to make technical articles more usable, more helpful to those who try to understand and apply them. For example, active voice, signaled by first-person pronouns (I, we, ours), occurred 3.56 times more often in the highly downloaded articles and 1.82 times more often in the highly cited articles than in random articles.
Pronouns (this, these, who), which serve to connect related clauses and sentences, were 3.7 times more frequent in the most downloaded and 1.84 times more frequent in the most cited papers than in the random ones. Interpretive clues for readers (such as variations on the strings discuss*, argue*, and interact*) occurred 1.94 times more often in most-downloaded and 1.63 times more often in most-cited papers than in the random baseline. And finally, comparisons and contrasts (like, as) were 1.3 times more common in most-downloaded and 1.54 times more common in most-cited papers than in random ones.

Choosing To Be Readable

All of these differences in "linguistic style," that is, in the occurrence of sets of usability-related text features, were statistically significant. The authors interpret this to show that those features do indeed predict high download and citation rates in real-life science publishing. They conclude that "virality is a phenomenon with many facets" (p. 3), and that not only technical content but also writer-chosen text features strongly affect how often a science paper is read and cited.

Just as you teach your students how to perform research, you can also teach them how to communicate well about their research--by pointing out and practicing these text-usability features. Of course, writing advice is easy to give but sometimes hard to take. Guerini, Pepe, and Lepri show that they take their work seriously when they end their paper by revealing how they edited their own draft abstract to increase the rate at which it includes the very usability-boosting text features that they had studied. Your students can benefit by using this same approach.

[For specific classroom activities, see "Technical Writing in Science Class" at http://www.ebstc.org/TechLit/handbook/handbooktoc.html ].
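
Students who want to try this on their own drafts could start from something like the Python sketch below. It is only a rough illustration, not the authors' actual method: the word lists are tiny stand-ins built from the examples quoted in this note (they are not the real LIWC categories), and the names feature_rates and relative_rates are hypothetical. It computes how often each feature class occurs per word in a text, then divides a revised draft's rates by those of a baseline text, so that the baseline counts as rate = 1.

```python
import re

# Illustrative word lists taken from the examples in this note.
# ASSUMPTION: these are small stand-ins, not the real LIWC categories.
FEATURES = {
    "first_person": {"i", "we", "our", "ours"},
    "connecting_pronouns": {"this", "these", "who"},
    "comparisons": {"like", "as"},
}
# Prefixes standing in for the patterns discuss*, argue*, interact*.
INTERPRETIVE_PREFIXES = ("discuss", "argue", "interact")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def feature_rates(text):
    """Return occurrences per word for each feature class in the text."""
    words = tokenize(text)
    counts = {
        name: sum(w in vocab for w in words)
        for name, vocab in FEATURES.items()
    }
    counts["interpretive_clues"] = sum(
        w.startswith(INTERPRETIVE_PREFIXES) for w in words
    )
    total = len(words)
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count / total for name, count in counts.items()}


def relative_rates(draft, baseline):
    """Divide the draft's feature rates by the baseline's (baseline = 1).

    A class that never occurs in the baseline has no defined ratio,
    so it is reported as None.
    """
    base = feature_rates(baseline)
    new = feature_rates(draft)
    return {
        name: (new[name] / base[name] if base[name] > 0 else None)
        for name in base
    }


if __name__ == "__main__":
    first_draft = "we measured the samples and the results are like this"
    revision = "we discuss our results as these samples interact like this"
    for name, ratio in relative_rates(revision, first_draft).items():
        print(name, ratio)
```

Here the first draft plays the role that the random sample plays in the study, and the revision plays the role of the "viral" sample. With texts this short the ratios are only a conversation starter; a class set of abstracts would give a more meaningful baseline.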