Technical Writing: In Science, Readability Breeds Virality

T.R. Girill Technical Literacy Project leader, STC and LLNL

T. R. Girill
Society for Technical Communication/Lawrence Livermore National Lab.
trgirill@acm.org

In May 2012, LLNL hosted a poster symposium at which 14 local high-school students
reported on their own year-long research projects. As more students engage in such
original research, more of them face the realistic challenge of effectively sharing their
methods and results with the scientific community. Of course, working scientists vary
greatly in their own ability to report research; the most influential are those whose articles
are downloaded, read, and cited most often. So student and professional researchers
alike want to know whether the text-usability features promoted in this series of notes
make any difference in communicating one's science: are they really more prevalent in
highly downloaded and cited papers? Marco Guerini, Alberto Pepe, and Bruno Lepri
have found a clever way to answer this "return on investment" question with relevant
empirical data ("Do linguistic style and readability of scientific abstracts affect their
virality?" 19 March 2012, http://arxiv.org/abs/1203.4238, a preprint from Proceedings of
the Sixth International AAAI Conference on Weblogs and Social Media, 2012).

The Test Cases

The NASA Astrophysics Data System (adswww.harvard.edu) holds over 9 million records,
mostly published articles on physics, geophysics, and astronomy (broadly construed).
From this database, Guerini, Pepe, and Lepri extracted three subsets to answer their title
question:
1. 3000 randomly selected articles,
2. the 3000 most frequently downloaded articles (at least 330 times each), and
3. the 3000 most frequently cited articles (at least 350 times each).
They then computed both readability-formula scores and readability-feature occurrence
rates for the random sample, and compared them with the corresponding values for
the most downloaded and most cited papers (to save computational time, they
actually used only abstracts, not whole papers). Being
downloaded means that readers of an abstract felt that the paper was probably worth
reading, while being cited means that people who read the paper were impressed
enough by it to reference it later in their own published articles. So downloading and
citing are two different measures of "virality" or influence in science publishing.
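
To make "readability-formula score" concrete, here is a minimal sketch of one well-known formula, Flesch Reading Ease, applied to two short passages. This is an illustration only: this note does not say which formulas the authors used, the syllable counter is a rough vowel-group heuristic, and the function names and sample sentences are invented for the example.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels (heuristic)."""
    lower = word.lower()
    count = len(re.findall(r"[aeiouy]+", lower))
    # Treat a trailing 'e' as silent ("rate"), except in "-le" endings ("table").
    if lower.endswith("e") and not lower.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean shorter sentences and shorter words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Invented sample passages (not taken from the study).
plain = "We measure the mass of the star. We find that it is large."
dense = ("Utilizing spectroscopic characterization methodologies, the investigation "
         "determined considerable gravitational magnitude.")

print(flesch_reading_ease(plain) > flesch_reading_ease(dense))
```

Students can run a draft abstract through a scorer like this before and after revising, and watch the score rise as sentences and words get shorter.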

Features That Matter

The three authors found that "formula readability" (which mostly means using relatively
shorter sentences and shorter words) correlates strongly with article downloads
(presumably because even sophisticated professional scientists prefer to read articles that are
relatively easy to understand). They also found a significantly higher occurrence rate for
readability-enhancing text features in both the highly downloaded and the highly cited
papers. They divided text features into classes or clusters using the standard
Linguistic Inquiry and Word Count (www.liwc.net/descriptiontable1.php), computed the
occurrence rate for each cluster in the random sample, and then used that as the
baseline (rate = 1) to compute relative feature frequency in the most downloaded and
most cited samples.
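
The baseline arithmetic described above can be sketched in a few lines. This is a hedged illustration, not the authors' code: the word set below is a tiny hypothetical stand-in for one LIWC cluster, the sample "abstracts" are invented, and pooling all words in a sample before dividing is an assumption about how the rate is defined.

```python
import re

def feature_rate(abstracts, feature_words):
    """Feature-word occurrences per word of text, pooled over all abstracts."""
    total_words = 0
    hits = 0
    for text in abstracts:
        tokens = re.findall(r"[a-z']+", text.lower())
        total_words += len(tokens)
        hits += sum(1 for token in tokens if token in feature_words)
    if total_words == 0:
        raise ValueError("no words found in the abstracts")
    return hits / total_words

def relative_rate(sample, baseline, feature_words):
    """Rate in `sample` divided by the rate in `baseline` (so baseline = 1)."""
    base = feature_rate(baseline, feature_words)
    if base == 0:
        raise ValueError("feature never occurs in the baseline sample")
    return feature_rate(sample, feature_words) / base

# Hypothetical stand-in for a LIWC cluster; the real dictionary is much larger.
FIRST_PERSON = {"i", "we", "our", "ours", "us"}

# Invented one-abstract "samples" for illustration.
random_sample = ["The spectrum was measured and the flux was derived by our team."]
downloaded = ["We measure the spectrum and we derive the flux."]

print(round(relative_rate(downloaded, random_sample, FIRST_PERSON), 2))
```

A value above 1 means the feature is more frequent in the sample than in the random baseline; a value below 1 means it is less frequent.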

Several of these text features are just the ones recommended to make technical articles
more usable, more helpful to those who try to understand and apply them. For example,
first-person wording typical of active voice (I, we, ours) occurred 3.56 times more often in the highly downloaded
articles and 1.82 times more often in the highly cited articles than in random articles.
Pronouns (this, these, who), which serve to connect related clauses and
sentences, were 3.7 times more frequent in the most downloaded and 1.84 times more
frequent in the most cited papers than in the random ones. Interpretive clues for
readers (such as variations on the strings discuss*, argue*, and interact*) occurred 1.94
times more often in most-downloaded and 1.63 times more often in most-cited papers
than in the random baseline. And finally, comparisons and contrasts (like, as) were
1.3 times more common in most-downloaded and 1.54 times more common in most-cited
papers than in random ones.
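
Patterns such as discuss* match any word that begins with the given stem. The sketch below shows one plausible way to count such matches with regular expressions. It assumes the trailing asterisk means "any word characters may follow", and the helper names and sample sentence are invented for the example.

```python
import re

def compile_patterns(patterns):
    """Turn stems like 'discuss*' into one case-insensitive whole-word regex."""
    parts = []
    for pattern in patterns:
        if pattern.endswith("*"):
            # Stem followed by any run of word characters: discuss, discussed, ...
            parts.append(re.escape(pattern[:-1]) + r"\w*")
        else:
            parts.append(re.escape(pattern))
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)

# The three stems quoted in the text above.
CLUES = compile_patterns(["discuss*", "argue*", "interact*"])

def count_matches(text: str) -> int:
    """Number of words in `text` that match any of the clue patterns."""
    return len(CLUES.findall(text))

# Invented sample sentence for illustration.
sample = "We discuss two models and argue that the interacting winds explain the data."
print(count_matches(sample))
```

Note that a stem is matched literally, so "argument" does not count as a variation on argue* under this assumption.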

Choosing To Be Readable

All of these differences in "linguistic style," in the occurrence of sets of usability-related
text features, were statistically significant. The authors interpret this to show that those
features do indeed predict high download and citation rates in real-life science publishing.
They conclude that "virality is a phenomenon with many facets" (p. 3), and that not only
technical content but also writer-chosen text features strongly affect how often a science
paper is read and cited. Just as you teach your students how to perform research, you
can also teach them how to communicate well about their research--by pointing out
and practicing these text-usability features.

Of course writing advice is easy to give but sometimes hard to take. Guerini, Pepe, and
Lepri show that they take their work seriously when they end their paper by revealing how
they edited their own draft abstract to increase the rate at which it includes the very
usability-boosting text features that they had studied. Your students can benefit by
using this same approach. [For specific classroom activities, see "Technical Writing
in Science Class" at http://www.ebstc.org/TechLit/handbook/handbooktoc.html ].
