I've now read Yuan, Liu, and @gneubig's paper "Can We Automate Scientific Reviewing?" https://arxiv.org/abs/2102.00176 

... and of course I have a few things to say: /1
I'd like to start with the positive. I appreciate that this paper is direct about what doesn't work well and, furthermore, clearly concludes that the answer to the question posed by the title is negative:

/2
Furthermore, the paper includes significant investigation into the ways in which the system produces biased results, which would need to be accounted for in any deployment context.

/3
Also, the paper is thoughtful about what the components of a good (constructive) review are and designs an evaluation that looks at most of those components in turn.

/4
However, I think this paper also provides a case study in the mismatch between technology and use case, as well as a case study in "AI hype".

/5
Re hype: The text actually positions the system as one that assists reviewers in drafting their review, especially the part that summarizes the paper: /6
This is not at all what the title of the paper suggests. Whereas journalists aren't usually in charge of the headlines their articles appear under, we academics get to write our own titles, and I think we need to be careful. (Especially given how the press picks up our work.)

/7
Also re hype: Answering the title question with "not yet" rather than "NO" subtly frames the results of this paper as a step in that direction, when they aren't.

/8
Re task/tech match: The authors cast reviewing as a variation on the task of summarization, but peer review isn't just about reacting to what's in the paper: human peer reviewers evaluate the paper under review with respect to their knowledge of the field.

/9
Yuan et al. do call out the importance of "external knowledge" in future work. However, while summarization can be cast as a text transformation task, scientific peer review cannot: it requires understanding and expertise, which citation/knowledge graphs don't provide. /10
Yuan et al. do not claim that their system understands anything, just that it can possibly provide first-draft reviews that help (especially junior) reviewers by showing them what is expected. /11
They also (I think tongue-in-cheek?) suggest that their system is a "domain expert" that can help a reader grasp the main idea of the paper. /12
But casting a system like this as a "domain expert" is vastly overselling what it can do. /13
Furthermore, it seems to me there are large risks in pre-populating reviews, in terms of how the system will nudge reviewers to value certain things---especially reviewers who are unpracticed or unsure. /14
If the purpose of this study is to actually build software to solve some problem in the world, then the software needs to be situated in its use case and its failure modes explored. /15
Without careful study of how people would make use of an automatic summarization system in the course of reviewing, I'm not prepared to accept the assertion that it would be a beneficial addition to peer review. /16
Finally, I'd like to take issue with this framing, from the introduction:
Yes, peer review takes time, effort and care. And it is valuable in exactly that measure. I don't think that being time-consuming + important is motivation for automating a task. /fin