A New Publishing Model in Computer Science
Many computer science researchers complain that our emphasis on
highly selective conference publications and our double-blind
reviewing system stifle innovation and slow the rate of progress
of science and technology.
This pamphlet proposes a new publishing model based on an open repository
and open (but anonymous) reviews, which together create a "market"
between papers and reviewing entities.
THE SHORT STORY:
Our current publication system should be redesigned to maximize
the rate of progress in our field. This means accelerating the speed
at which new ideas and results are exchanged, disseminated, and
evaluated. This also means minimizing the amount of time each of us
spends evaluating other people's work through reviewing and sifting
through the literature. A major issue is that our current system, with
its emphasis on highly-selective conferences, is highly biased against
innovative ideas and favors incremental tweaks on well-established
methods. Ideas that turn out to be highly influential are sometimes
held up for months (if not years) in reviewing purgatory, particularly
if they require several years to come to maturity (a few famous
examples are mentioned below). The friction in our publication system is
slowing the progress of our field. It makes progress incremental. And
it makes our conferences somewhat boring.
Our current publication system is a relic of a time when the
dissemination of scientific information was limited by printing and
shipping capacity and cost. Paradoxically, computer science has not
taken advantage of the Web as a communication medium to the same
extent as other fields, such as physics and mathematics.
In an attempt to maximize the efficiency of our scientific
communication system, I am proposing a new publication model that
dissociates dissemination from evaluation. Its main characteristics
are as follows:
- Authors post their papers on a repository as soon as they deem them
acceptable for public consumption. Papers can be revised and are
given version numbers. They become immediately citable (repositories
of this kind already exist in math, physics, and the social sciences:
arXiv.org, SSRN, ...).
- The repository provides a web-based infrastructure with which people
can organize themselves into "Reviewing Entities" (RE), similar to
existing journal editorial boards, conference program committees,
special interest groups, etc. Using this infrastructure, REs can
assign reviewers to papers (or simply give a weight to a review
of a paper). An RE may consist of a single individual, an informal
group of people with a common interest (what we call an "area"),
or a more traditional editorial board.
- Any RE can choose to review any paper at any time (out of the
author's control). The reviews are published and accessible with the
paper. Reviews include the name of the RE. Individual reviewers that
are part of an RE may choose to reveal their identity or not.
REs may choose to give a rating or "seal of approval" to papers
they review, so as to bring them to the attention of the communities
they cater to. REs do not "own" papers exclusively, as traditional
journals do, so that multiple REs can give ratings to a single paper.
- However, authors may formally request that a particular RE review
their paper. They may make only one such formal request at a time.
The RE is given a short time to accept or refuse to review
(e.g. 2 weeks) and a slightly longer time before the author is
allowed to submit a request to another RE (whether the requested
RE has produced a review or not). REs will have an incentive to
review good papers, as their reputation will increase with the
quality of the papers that they are the first to rate highly
(a sketch of this request protocol appears after this list).
- Reviews are published and are themselves citable documents with the
same status as regular publications. Reviewers may choose to keep
their names hidden or not. Reviews can be given a rating by readers
(who are themselves REs); hence, reviewers will be objectively evaluated
and will be assigned a kind of "karma" that indicates the average
usefulness of their reviews (many sites, such as Slashdot and Amazon,
have such a feature). This will give reviewers an incentive to do
a good job. Eventually, people will indicate their reviewing karma
on their CV.
- Papers are under a revision control system. Papers that are revised
as a consequence of a comment or review should cite that comment or
review, thereby giving credit to the reviewer (whether the reviewer
was anonymous or not, as this can be tracked by the system).
The citation identifier of a paper includes its revision number.
- Users can set up alerts for all papers approved by their
favorite REs.
- One can imagine sophisticated credit assignment systems for
reviewers and authors that:
- propagate credit down the citation graph, so that you get
credit if your paper is cited by highly-cited papers.
- give karma to reviewers whose evaluation was predictive of
the ultimate success of the paper.
Sketches of both mechanisms appear after this list.
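To make the mechanics above concrete, here is a minimal sketch, in
Python, of the formal review-request protocol: a single pending
request per paper, a short decision window for the requested RE, and
a slightly longer exclusivity window for the author. The class names
and the exact timeouts are illustrative assumptions, not part of the
proposal:

    from dataclasses import dataclass
    from datetime import date, timedelta
    from typing import Optional

    # Illustrative timeouts; the proposal only suggests "e.g. 2 weeks"
    # for the RE's decision, and a slightly longer author-side wait.
    REQUEST_DECISION_DAYS = 14  # RE must accept or refuse within this
    EXCLUSIVITY_DAYS = 21       # author must wait this long to re-request

    @dataclass
    class ReviewRequest:
        re_name: str
        requested_on: date
        accepted: Optional[bool] = None  # None = RE has not decided yet

    @dataclass
    class Paper:
        paper_id: str
        version: int = 1  # the citation identifier includes this number
        pending_request: Optional[ReviewRequest] = None

        def request_review(self, re_name: str, today: date) -> ReviewRequest:
            """Formally ask one RE for a review; one request at a time."""
            prev = self.pending_request
            if prev is not None:
                blocked_until = prev.requested_on + timedelta(days=EXCLUSIVITY_DAYS)
                if today < blocked_until:
                    # The earlier request still blocks a new one, whether
                    # the requested RE has produced a review or not.
                    raise ValueError("an earlier request is still exclusive")
            self.pending_request = ReviewRequest(re_name, requested_on=today)
            return self.pending_request

        def re_decides(self, accept: bool, today: date) -> None:
            """The requested RE accepts or refuses within its short window."""
            req = self.pending_request
            if req is None or req.accepted is not None:
                raise ValueError("no pending request to decide on")
            if today > req.requested_on + timedelta(days=REQUEST_DECISION_DAYS):
                raise ValueError("the decision window has expired")
            req.accepted = accept

Refusal is deliberately cheap: at worst, the author waits out the
exclusivity window and then asks another RE.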
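The first credit-assignment idea, propagating credit down the citation
graph, is essentially a PageRank-style computation. Here is a minimal
sketch under that assumption; the damping factor, the uniform
initialization, and the function name are illustrative choices:

    def propagate_credit(cites, damping=0.85, iters=50):
        """PageRank-style credit propagation over a citation graph.

        cites maps each paper id to the list of paper ids it cites, so
        credit flows from citing papers to the papers they cite: being
        cited by highly-credited papers raises your own credit.
        """
        papers = set(cites) | {p for refs in cites.values() for p in refs}
        n = len(papers)
        credit = {p: 1.0 / n for p in papers}
        for _ in range(iters):
            new = {p: (1.0 - damping) / n for p in papers}
            for citing, refs in cites.items():
                if refs:
                    share = damping * credit[citing] / len(refs)
                    for cited in refs:
                        new[cited] += share
            credit = new
        return credit

    # Toy example: C cites A and B; B cites A. Paper A, cited both
    # directly and through B, ends up with the most credit.
    print(propagate_credit({"C": ["A", "B"], "B": ["A"], "A": []}))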
SOME PROPERTIES OF THE PROPOSED SYSTEM:
- Basically, this system plays the role of a "stock exchange", an
open market of ideas, in which papers play the role of securities,
and REs play the role of investors. Compared to this, our current
system looks like a highly inefficient, balkanized, closed market,
controlled by a few monopolies. In the new system, REs will actually
*compete* to rate the best papers first.
- This system will fix some of the main inefficiencies of our current
system, namely its costly false negative rate (i.e. the number of
good ideas that are rejected and held in reviewing purgatory for
months or years), as well as its high barrier of entry for new ideas.
With this new system, no paper with a good idea will ever be barred
from dissemination (though it may be ignored for a time). Good ideas,
even if they do not immediately receive a high rating from a prominent
RE, will eventually bubble up to the forefront, and their authors will
get the credit they deserve. In the current system, authors of a good
idea that was rejected often get scooped, do not get credit, and
become bitter or demotivated.
- The main failure mode of the new system may be a deluge of terrible
papers posted on the repository, but that is unlikely to become
a problem: people will have little incentive to post papers that have
no chance of gathering interest, because merely posting papers will
not count for anything on their CV. What will count is how much interest
papers gather and how influential they become, in terms of RE ratings,
citations, etc. (arXiv.org does not seem to suffer from this problem).
And bad papers will simply be ignored.
- The current system of double-blind reviews has a perverse effect:
there is little cost to submitting a half-baked paper to a
conference, since the authors' reputation is not really at stake
until the conference accepts the paper. In the new system, what
you post is available to the whole world and engages your reputation.
- With the new system, bad papers will simply be ignored and will
not be a high burden on the reviewing body, as they currently are.
- When an RE refuses to review a paper, the consequence is not as
dire as a rejection from a traditional journal or conference,
because this refusal does not block or delay dissemination. At
worst, it delays the moment when a paper is brought to the attention
of the community by a couple of weeks.
- Readers will have considerably more information about the opinion
of the community on a paper, since interesting papers will be
accompanied by ratings from multiple REs as well as open reviews
and comments.
- Since reviews are published and citable, reviewers will get credit
for doing a good job. Objective "scores" (or "karma") for reviewers
may be derived from how their reviews are rated by readers, how
predictive their reviews are of the future success of the papers
they review, how often their reviews are cited in revised
versions of the paper, or whether the papers they review become
highly cited or influential (one such predictiveness score is
sketched after this list).
- News blogs such as Digg and Slashdot rely on such mechanisms
for collaborative filtering, albeit in a less formal way than
what is proposed here.
- Another potential problem with the proposed system is that "the
rich get richer": papers from prominent authors will get numerous
comments, ratings, and seals of approval. The presence of open
comments may prevent undue hype and the tendency to give prominent
authors the benefit of the doubt. Will papers from unknown authors
be utterly ignored? Perhaps, but at least they will be published.
- The system will facilitate the creation of "communities of
interest" (groups of people with a common interest in a particular
type of methods or problems). These communities will have the
ability to appear and flourish even if they are against the
currently dominant set of methods. Critics may argue that this
will lead to isolated communities of authors who just cite each
other. Yes, but this is precisely how new fields are created.
A small band of cross-citing authors becomes a full-fledged
"area" or "field" if it grows beyond a certain size.
- It is practically certain that if a paper starts to gather interest
and comes to the forefront, it *will* be properly reviewed, because
authors of alternative methods will have a vested interest in
pointing out the flaws, and collaborators will have a vested
interest in making the paper better.
- Incomplete or overlooked citations can be signaled. The authors
can take these signals into account or ignore them in future
revisions of the paper, but ignoring them would be at their own
risk, since the comments would be available for all to see.
- It is practically certain that a good idea, even if it is initially
ignored, will eventually come to the forefront. If author B
re-invents a previously-posted idea by author A without
citing him or her, author A or his/her friends and collaborators
will comment on B's paper, signaling the omission. Authors will
thus have an incentive to cite properly, to avoid embarrassment.
Some authors may attempt to illegitimately claim priority on
an idea, but they will take the risk of lowering their karma.
- Overall, this will level the playing field. People who are not in the
"in" crowd often get their papers rejected not because the papers
lack a good idea or have flaky experiments, but simply because they
are not written in the right lingo and do not cite the right papers.
Comments/reviews for such papers would signal these flaws (which could
be corrected), but the ideas would still be available to the community.
- Overall, this will reduce the number of prominent scientists who
stop coming to a particular conference or stop submitting to a
particular journal in retaliation for what they see as an unjust
rejection.
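One of the reviewer "karma" measures mentioned above scores reviewers
by how predictive their early ratings were of a paper's eventual
success. Here is one possible formula, sketched in Python; the use of
one minus the mean absolute error, and of normalized citation counts
as the success measure, are assumptions made purely for illustration:

    from statistics import mean

    def predictive_karma(ratings, outcomes):
        """Score a reviewer by how well early ratings predicted success.

        ratings  : {paper_id: early rating in [0, 1]} by one reviewer
        outcomes : {paper_id: eventual success in [0, 1]}, e.g. a
                   normalized citation count measured years later
        Returns 1 minus the mean absolute prediction error, so a
        reviewer whose ratings track eventual success scores near 1.
        """
        errors = [abs(ratings[p] - outcomes[p])
                  for p in ratings if p in outcomes]
        if not errors:
            return None  # not enough history to score this reviewer
        return 1.0 - mean(errors)

    # Example: this reviewer rated a future hit highly and a dud
    # poorly, so their karma is high (0.9 here).
    print(predictive_karma({"hit": 0.9, "dud": 0.2},
                           {"hit": 1.0, "dud": 0.1}))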
MORE DETAILS AND BACKGROUND INFORMATION
THE PROBLEMS:
The current publication model in CS has become very inefficient. Our
emphasis on highly-selective conference publications has introduced a
huge amount of friction in our system of information exchange.
Conference reviews tend to favor papers containing incremental
improvements on well-established methods, and tend to reject papers
with truly innovative ideas, particularly if these ideas are part of a
long-term vision to solve an important problem (as opposed to
"one-shot" methods that solve a specific problem). The sheer volume
of submissions to conferences and journals overwhelms reviewers, and
the quality of the reviews suffers. Reviewers get very little credit
for doing a good job with reviews and can do a bad job with few
adverse consequences. Our current system seems to be designed almost
entirely to keep score, at the expense of actually maximizing the
rate of progress of science.
Many of us have horror stories about how some of our best papers have
been rejected by conferences. Perhaps the best case in point of the
last few years is David Lowe's work on the SIFT method. After years of
being rejected from conferences starting in 1997, the journal version
published in 2004 went on to become the most highly cited paper in all
of engineering sciences in 2005.
David Lowe relates the story:
I did submit papers on earlier versions of SIFT to both ICCV and CVPR
(around 1997/98) and both were rejected. I then added more of a
systems flavor and the paper was published at ICCV 1999, but just as a
poster. By then I had decided the computer vision community was not
interested, so I applied for a patent and intended to promote it just
for industrial applications.
Another recent example is Rob Fergus's tiny images paper, which never
appeared in a conference but has already had a strong impact. I'm
sure there are hundreds of other examples.
David adds:
I very much agree with the concept like arXiv.org in which papers
are immediately published.
Because the publication of these papers was held up, the progress of
computer vision and object recognition was held back by several years.
In fact, the method only became accepted after David was able to
demonstrate his SIFT-based real-time object recognition system. So
much for the efficiency of double-blind reviews.
Other authors offer similar criticisms of our current system.
Here is a comment from Alan Yuille:
At present, my mediocre papers get accepted with oral
presentations, while my interesting novel work gets rejected
several times. By contrast, my journal reviewers are a lot slower
but give much better comments and feedback.
[....]
I think the current system is breaking down badly due to the
enormous number of papers submitted to these meetings (NIPS, ICML,
CVPR, ICCV, ECCV) and the impossibility of getting papers reviewed
properly. The system encourages the wrong type of papers and
encourages attention on short term results and minor variations of
existing methods. Even worse it rewards papers which look
superficially good but which fail to survive the more serious
reviewing done by good journals (there have been serious flaws in
some of the recent prize-winning computer vision papers).
Our current system, despite its emphasis on fairness and proper credit
assignment, actually does a pretty bad job at it. I have observed the
following phenomenon several times:
- Author A, who is not well connected in the US conference circuit
(perhaps (s)he is from a small European country, or from Asia)
publishes a new idea in an obscure local journal or conference, or
perhaps in a respected venue that is not widely read by the
relevant crowd.
- The paper is ignored for several years.
- Then author B (say, a prominent figure in the US) re-invents the
same idea independently, and publishes a paper in a highly visible
venue. This person is prominent and well connected, writes clearly
in English, can write convincing arguments, and gives many talks
and seminars on the topic.
- The idea and the paper gather interest and spur many follow-up
papers from the community.
- These new papers cite only author B, because their authors do not
know about author A.
- Author C stumbles on the earlier paper from author A and starts
citing it, remarking that A had the idea first.
- The community ignores C and keeps citing B.
Why does this happen? Because citing an obscure paper, rather than
an accepted paper by a prominent author, is dangerous and has zero
benefits. Sure, author A might be upset, but who cares about
upsetting some guy from the University of Oriental Syldavia whom you
will never have to confront at a conference and who will never be
asked to write a letter for your tenure case? On the other hand,
author B might be asked to review your next paper, your
next grant application, or your tenure case. So voicing the fact that
he does not deserve all the credit for the idea is very dangerous. Hence,
you don't cite what's right. You cite what everybody else cites.
In an open review system where people can comment on papers at any
time, this kind of situation would happen less often: with open
comments pointing to author A attached to his paper, it would be
difficult for author B not to revise the paper with the proper
citation.
In our day and age, there is no reason to stop the dissemination of
any paper. But what we need is an efficient system of collaborative
filtering, so that good ideas and important results are brought to the
attention of the masses, and so that flaws and errors are flagged.
Our current system does not really allow errors and flaws in
published papers to be flagged.
THE REAL OBJECTIVE:
The overarching objective function that an ideal publication model
should maximize is the Rate of Progress of Science (RPoS) over the
long term. Other issues are secondary, and should be optimized to the
extent that they help maximize RPoS. An essential factor for RPoS is to
eliminate publication delays and never stop a good idea from being
disseminated. Another essential factor is to have an efficient
evaluation mechanism so that good ideas and results are brought to the
attention of the community as quickly as possible, while unimportant
results are largely ignored, and wrong ones are quickly flagged (and
perhaps corrected). In other words, to maximize RPoS, we must find
ways to minimize the time each of us spends evaluating other people's
work (publicly or privately). To maximize long-term RPoS, we must have
a good system to assign credit and recognition, since people tend to
lose their motivation if they don't get the recognition they deserve.
Hence, we must have a good credit assignment system, but only to the
extent that it helps maximize RPoS.
A FEW PRINCIPLES:
Maximizing RPoS has a number of consequences:
- 1. Every paper should be available to the community, and be
citable, as soon as it is written (i.e. put up for
distribution by its authors). In our day and age, there is
no reason for papers not to be available to the world (for free)
as soon as they are written.
- 2. The evaluation process should be separate from dissemination,
because lengthy and faulty evaluations are what currently hold
back the dissemination of good papers.
- 3. Reviews, evaluations, and comments form a collaborative
filtering system. Their main purpose is to attract the attention
of the relevant communities to interesting contributions, and to
point out the flaws in all contributions (good and bad). Good and
useful papers would bubble up to the forefront.
- 4. Reviews should be open, publicly available, and should themselves
be citable documents. This would create an incentive for reviewers
to do a good job. It would also remove the incentive to hold back
useful information from the author for fear of divulging a good
idea with no chance of getting credit for it.
- 5. Bad papers should simply be ignored, so they don't create
a burden on the community at large, which indirectly slows RPoS.
Hence not every disseminated paper should necessarily have reviews.
- 6. Papers should be revisable, with an open versioning system,
so that citations and reviews can refer to a given version,
and so that revisions can take reviews and comments into
account at any time.
The proposed system attempts to abide by these principles.
QUESTIONS, PROBLEMS and ISSUES:
[Draft. To be completed]
- Q: Your proposal is incompatible with our double-blind
reviewing system. Studies have shown that non-double-blind reviewing
is biased against authors who are not from top institutions, and
authors from underrepresented groups. Non-double-blind reviewing
causes the "rich to get richer".
- A: This is clearly the main issue with this proposal. But
the systematic use of arXiv.org by the
physics and math community has demonstrated that papers from unknown
authors from obscure institutions (even from no institution at all)
get brought to the forefront whenever they contain good ideas, even
if they have some flaws (there are a few famous examples). The
proposed system is designed to prevent all false negatives,
including those that are caused by reviewer biases of any kind.
Unlike our current system, the proposed system allows reviewing
mistakes to be fixed, and the open review system will prevent
blatant bias. These papers, which contribute to the progress of
Science, would never make it through our current reviewing
system. Good ideas will eventually come to the attention of a
community and their authors will get the credit they deserve *if* we
allow the papers to be disseminated and to receive a time stamp.
The worst that can happen is that they will take time to gather
interest. In our current system, they simply don't get any credit.
- Q: Assuming someone implements this system, how do we get
the community to use it?
- A: We make it the required submission site for a couple of
major conferences.
- Q: If anyone can publish a paper in the repository without
having to meet the standards of peer review, people will publish
half-baked ideas without any experimental evaluation, or
theoretical justification, and will get undeserved credit for them.
It takes time and effort to demonstrate that a particular idea
works. People who do that should get most of the credit.
- A: First, authors don't automatically get credit for ideas
they publish and certainly don't get to choose who cites
them. Authors of follow-up papers get to choose who they cite, and
they will cite whoever they think deserves the credit. It is
already the case today that the most cited paper on a method is not
the first paper on that method, but the one that popularized the
method or produced the first convincing demonstration of it (there
are famous examples). We certainly don't need any specific
mechanisms to *prevent* ideas from getting out, unless we accept
the idea of slowing the progress of science for fear of unsettling
our current credit assignment system. In fact, authors often cite
papers they admire or were influenced by, and papers they must cite
in order to get past the reviewing process. They don't always
cite what's right.