A New Publishing Model in Computer Science
Many computer science researchers complain that our emphasis on
highly selective conference publications and our double-blind
reviewing system stifle innovation and slow the rate of progress
of science and technology.
This pamphlet proposes a new publishing model based on an open repository
and open (but anonymous) reviews, which together create a "market"
between papers and reviewing entities.
THE SHORT STORY:
Our current publication system should be redesigned to maximize
the rate of progress in our field. This means accelerating the speed
at which new ideas and results are exchanged, disseminated, and
evaluated. This also means minimizing the amount of time each of us
spends evaluating other people's work through reviewing and sifting
through the literature. A major issue is that our current system, with
its emphasis on highly-selective conferences, is highly biased against
innovative ideas and favors incremental tweaks on well-established
methods. Ideas that turn out to be highly influential are sometimes
held up for months (if not years) in reviewing purgatory, particularly
if they require several years to come to maturity (a few famous
examples are mentioned below). The friction in our publication system is
slowing the progress of our field. It makes progress incremental. And
it makes our conferences somewhat boring.
Our current publication system is a relic of a time when the
dissemination of scientific information was limited by printing and
shipping capacity and cost. Paradoxically, computer science has not
taken advantage of the Web as a communication medium to the same
extent as other fields, such as physics and mathematics.
In an attempt to maximize the efficiency of our scientific
communication system, I am proposing a new publication model that
dissociates dissemination from evaluation. Its main characteristics
are as follows:
- Authors post their papers on a repository as soon as they deem them
acceptable for public consumption. Papers can be revised and are
given version numbers. They become immediately citable (repositories
of this kind already exist in math, physics, and the social sciences:
arXiv.org, SSRN, ...).
- The repository provides a web-based infrastructure with which people
can organize themselves into "Reviewing Entities" (RE), similar to
existing journal editorial boards, conference program committees,
special interest groups, etc. Using this infrastructure, REs can
assign reviewers to papers (or simply give a weight to a review
of a paper). An RE may consist of a single individual, an informal
group of people with a common interest (what we call an "area"),
or a more traditional editorial board.
- Any RE can choose to review any paper at any time (out of the
author's control). The reviews are published and accessible with the
paper. Reviews include the name of the RE. Individual reviewers that
are part of an RE may choose to reveal their identity or not.
REs may choose to give a rating or "seal of approval" to papers
they review, so as to bring them to the attention of the communities
they cater to. REs do not "own" papers exclusively, as traditional
journals do, so that multiple REs can give ratings to a single paper.
- However, authors may formally request that a particular RE review
their paper. They may make only one such formal request at a time.
The RE is given a short time to accept or refuse to review
(e.g. 2 weeks) and a slightly longer time before the author is
allowed to submit a request to another RE (whether the requested
RE has produced a review or not). REs will have an incentive to
review good papers, as their reputation will increase with the
quality of the papers that they are the first to rate highly
(a sketch of this request protocol appears after this list).
- Reviews are published and are themselves citable documents with the
same status as regular publications. Reviewers may choose to keep
their names hidden or not. Reviews can be given a rating by readers
(who are themselves REs); hence, reviewers will be objectively evaluated
and will be assigned a kind of "karma" that indicates the average
usefulness of their reviews (many sites, such as Slashdot and Amazon,
have such a feature). This will give reviewers an incentive to do
a good job. Eventually, people will indicate their reviewing karma
on their CV.
- Papers are under a revision control system. Papers that are revised
as a consequence of a comment or review should cite that comment or
review, thereby giving credit to the reviewer (whether the reviewer
was anonymous or not, as this can be tracked by the system).
The citation identifier of a paper includes its revision number.
- Users can set up alerts for all papers approved by their
favorite REs.
- One can imagine sophisticated credit assignment systems for
reviewers and authors that:
- propagate credit down the citation graph, so that you get
credit if your paper is cited by highly-cited papers.
- give karma to reviewers whose evaluation was predictive of
the ultimate success of the paper.
Sketches of both mechanisms appear after this list.
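To make the mechanics above concrete, here is a minimal sketch, in
Python, of the formal review-request protocol: a single pending
request per paper, a short decision window for the requested RE, and
a slightly longer exclusivity window for the author. The class names
and the exact timeouts are illustrative assumptions, not part of the
proposal:

    from dataclasses import dataclass
    from datetime import date, timedelta
    from typing import Optional

    # Illustrative timeouts; the proposal only suggests "e.g. 2 weeks"
    # for the RE's decision, and a slightly longer author-side wait.
    REQUEST_DECISION_DAYS = 14  # RE must accept or refuse within this
    EXCLUSIVITY_DAYS = 21       # author must wait this long to re-request

    @dataclass
    class ReviewRequest:
        re_name: str
        requested_on: date
        accepted: Optional[bool] = None  # None = RE has not decided yet

    @dataclass
    class Paper:
        paper_id: str
        version: int = 1  # the citation identifier includes this number
        pending_request: Optional[ReviewRequest] = None

        def request_review(self, re_name: str, today: date) -> ReviewRequest:
            """Formally ask one RE for a review; one request at a time."""
            prev = self.pending_request
            if prev is not None:
                blocked_until = prev.requested_on + timedelta(days=EXCLUSIVITY_DAYS)
                if today < blocked_until:
                    # The earlier request still blocks a new one, whether
                    # the requested RE has produced a review or not.
                    raise ValueError("an earlier request is still exclusive")
            self.pending_request = ReviewRequest(re_name, requested_on=today)
            return self.pending_request

        def re_decides(self, accept: bool, today: date) -> None:
            """The requested RE accepts or refuses within its short window."""
            req = self.pending_request
            if req is None or req.accepted is not None:
                raise ValueError("no pending request to decide on")
            if today > req.requested_on + timedelta(days=REQUEST_DECISION_DAYS):
                raise ValueError("the decision window has expired")
            req.accepted = accept

Refusal is deliberately cheap: at worst, the author waits out the
exclusivity window and then asks another RE.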
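The first credit-assignment idea, propagating credit down the citation
graph, is essentially a PageRank-style computation. Here is a minimal
sketch under that assumption; the damping factor, the uniform
initialization, and the function name are illustrative choices:

    def propagate_credit(cites, damping=0.85, iters=50):
        """PageRank-style credit propagation over a citation graph.

        cites maps each paper id to the list of paper ids it cites, so
        credit flows from citing papers to the papers they cite: being
        cited by highly-credited papers raises your own credit.
        """
        papers = set(cites) | {p for refs in cites.values() for p in refs}
        n = len(papers)
        credit = {p: 1.0 / n for p in papers}
        for _ in range(iters):
            new = {p: (1.0 - damping) / n for p in papers}
            for citing, refs in cites.items():
                if refs:
                    share = damping * credit[citing] / len(refs)
                    for cited in refs:
                        new[cited] += share
            credit = new
        return credit

    # Toy example: C cites A and B; B cites A. Paper A, cited both
    # directly and through B, ends up with the most credit.
    print(propagate_credit({"C": ["A", "B"], "B": ["A"], "A": []}))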
SOME PROPERTIES OF THE PROPOSED SYSTEM:
- Basically, this system plays the role of a "stock exchange", an
open market of ideas, in which papers play the role of securities,
and REs play the role of investors. Compared to this, our current
system looks like a highly inefficient, balkanized, closed market,
controlled by a few monopolies. In the new system, REs will actually
*compete* to rate the best papers first.
- This system will fix some of the main inefficiencies of our current
system, namely its costly false negative rate (i.e. the number of
good ideas that are rejected and held in reviewing purgatory for
months or years), as well as its high barrier of entry for new ideas.
With this new system, no paper with a good idea will ever be barred
from dissemination (though it may be ignored for a time). Good ideas,
even if they do not immediately receive a high rating from a prominent
RE, will eventually bubble up to the forefront, and their authors will
get the credit they deserve. In the current system, authors of a good
idea that was rejected often get scooped, do not get credit, and
become bitter or demotivated.
- The main failure mode of the new system may be a deluge of terrible
papers posted on the repository, but that is unlikely to become
a problem: people will have little incentive to post papers that have
no chance of gathering interest, because merely posting papers will
not count for anything on their CV. What will count is how much interest
papers gather and how influential they become, in terms of RE ratings,
citations, etc. (arXiv.org does not seem to suffer from this problem).
And bad papers will simply be ignored.
- The current system of double-blind reviews has a perverse effect:
there is little cost to submitting a half-baked paper to a
conference, since the authors' reputation is not really at stake
until the conference accepts the paper. In the new system, what
you post is available to the whole world and engages your reputation.
- With the new system, bad papers will simply be ignored and will
not be a high burden on the reviewing body, as they currently are.
- When an RE refuses to review a paper, the consequence is not as
dire as a rejection from a traditional journal or conference,
because this refusal does not block or delay dissemination. At
worst, it delays the moment when a paper is brought to the attention
of the community by a couple of weeks.
- Readers will have considerably more information about the opinion
of the community on a paper, since interesting papers will be
accompanied by ratings from multiple REs as well as open reviews
and comments.
- Since reviews are published and citable, reviewers will get credit
for doing a good job. Objective "scores" (or "karma") for reviewers
may be derived from how their reviews are rated by readers, how
predictive their reviews are of the future success of the papers
they review, how often their reviews are cited in revised
versions of the paper, or whether the papers they review become
highly cited or influential (one such predictiveness score is
sketched after this list).
- News blogs such as Digg and Slashdot rely on such mechanisms
for collaborative filtering, albeit in a less formal way than
what is proposed here.
- Another potential problem with the proposed system is that "the
rich get richer": papers from prominent authors will get numerous
comments, ratings, and seals of approval. The presence of open
comments may prevent undue hype and the tendency to give prominent
authors the benefit of the doubt. Will papers from unknown authors
be utterly ignored? Perhaps, but at least they will be published.
- The system will facilitate the creation of "communities of
interest" (groups of people with a common interest in a particular
type of methods or problems). These communities will have the
ability to appear and flourish even if they are against the
currently dominant set of methods. Critics may argue that this
will lead to isolated communities of authors who just cite each
other. Yes, but this is precisely how new fields are created.
A small band of cross-citing authors becomes a full-fledged
"area" or "field" if it grows beyond a certain size.
- It is practically certain that if a paper starts to gather interest
and comes to the forefront, it *will* be properly reviewed, because
authors of alternative methods will have a vested interest in
pointing out the flaws, and collaborators will have a vested
interest in making the paper better.
- Incomplete or overlooked citations can be signaled. The authors
can take these signals into account or ignore them in future
revisions of the paper, but ignoring them would be at their own
risk, since the comments would be available for all to see.
- It is practically certain that a good idea, even if it is initially
ignored, will eventually come to the forefront. If author B
re-invents a previously-posted idea by author A without
citing him or her, author A or his/her friends and collaborators
will comment on B's paper, signaling the omission. Authors will
thus have an incentive to cite properly, to avoid embarrassment.
Some authors may attempt to illegitimately claim priority on
an idea, but they will take the risk of lowering their karma.
- Overall, this will level the playing field. People who are not in the
"in" crowd often get their papers rejected not because the papers
lack a good idea or have flaky experiments, but simply because they
are not written in the right lingo and do not cite the right papers.
Comments/reviews for such papers would signal these flaws (which could
be corrected), but the ideas would still be available to the community.
- Overall, this will reduce the number of prominent scientists who
stop coming to a particular conference or stop submitting to a
particular journal in retaliation for what they see as an unjust
rejection.
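One of the reviewer "karma" measures mentioned above scores reviewers
by how predictive their early ratings were of a paper's eventual
success. Here is one possible formula, sketched in Python; the use of
one minus the mean absolute error, and of normalized citation counts
as the success measure, are assumptions made purely for illustration:

    from statistics import mean

    def predictive_karma(ratings, outcomes):
        """Score a reviewer by how well early ratings predicted success.

        ratings  : {paper_id: early rating in [0, 1]} by one reviewer
        outcomes : {paper_id: eventual success in [0, 1]}, e.g. a
                   normalized citation count measured years later
        Returns 1 minus the mean absolute prediction error, so a
        reviewer whose ratings track eventual success scores near 1.
        """
        errors = [abs(ratings[p] - outcomes[p])
                  for p in ratings if p in outcomes]
        if not errors:
            return None  # not enough history to score this reviewer
        return 1.0 - mean(errors)

    # Example: this reviewer rated a future hit highly and a dud
    # poorly, so their karma is high (0.9 here).
    print(predictive_karma({"hit": 0.9, "dud": 0.2},
                           {"hit": 1.0, "dud": 0.1}))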
MORE DETAILS AND BACKGROUND INFORMATION
THE PROBLEMS:
The current publication model in CS has become very inefficient. Our
emphasis on highly-selective conference publications has introduced a
huge amount of friction in our system of information exchange.
Conference reviews tend to favor papers containing incremental
improvements on well-established methods, and tend to reject papers
with truly innovative ideas, particularly if these ideas are part of a
long-term vision to solve an important problem (as opposed to
"one-shot" methods that solve a specific problem). The sheer volume
of submissions to conferences and journals overwhelms reviewers, and
the quality of the reviews suffers. Reviewers get very little credit
for doing a good job with reviews and can do a bad job with few
adverse consequences. Our current system seems to be designed almost
entirely to keep score, at the expense of actually maximizing the
rate of progress of science.
Many of us have horror stories about how some of our best papers have
been rejected by conferences. Perhaps the best case in point of the
last few years is David Lowe's work on the SIFT method. After years of
being rejected from conferences starting in 1997, the journal version
published in 2004 went on to become the most highly cited paper in all
of engineering sciences in 2005.
David Lowe relates the story:
I did submit papers on earlier versions of SIFT to both ICCV and CVPR
(around 1997/98) and both were rejected. I then added more of a
systems flavor and the paper was published at ICCV 1999, but just as a
poster. By then I had decided the computer vision community was not
interested, so I applied for a patent and intended to promote it just
for industrial applications.
Another recent example is Rob Fergus's tiny images paper, which never
appeared in a conference but has already had a strong impact. I'm
sure there are hundreds of other examples.
David adds:
I very much agree with the concept like arXiv.org in which papers
are immediately published.
Because the publication of these papers was held up, the progress of
computer vision and object recognition was held back by several years.
In fact, the method only became accepted after David was able to
demonstrate his SIFT-based real-time object recognition system. So
much for the efficiency of double-blind reviews.
Other authors offer similar criticisms of our current system.
Here is a comment from Alan Yuille:
At present, my mediocre papers get accepted with oral
presentations, while my interesting novel work gets rejected
several times. By contrast, my journal reviewers are a lot slower
but give much better comments and feedback.
[....]
I think the current system is breaking down badly due to the
enormous number of papers submitted to these meetings (NIPS, ICML,
CVPR, ICCV, ECCV) and the impossibility of getting papers reviewed
properly. The system encourages the wrong type of papers and
encourages attention on short term results and minor variations of
existing methods. Even worse it rewards papers which look
superficially good but which fail to survive the more serious
reviewing done by good journals (there have been serious flaws in
some of the recent prize-winning computer vision papers).
Our current system, despite its emphasis on fairness and proper credit
assignment, actually does a pretty bad job at it. I have observed the
following phenomenon several times:
- Author A, who is not well connected in the US conference circuit
(perhaps (s)he is from a small European country, or from Asia)
publishes a new idea in an obscure local journal or conference, or
perhaps in a respected venue that is not widely read by the
relevant crowd.
- The paper is ignored for several years.
- Then author B (say, a prominent figure in the US) re-invents the
same idea independently, and publishes a paper in a highly visible
venue. This person is prominent and well connected, writes clearly
in English, can write convincing arguments, and gives many talks
and seminars on the topic.
- The idea and the paper gather interest and spur many follow-up
papers from the community.
- These new papers cite only author B, because their authors do not
know about author A.
- Author C stumbles on the earlier paper from author A and starts
citing it, remarking that A had the idea first.
- The community ignores C and keeps citing B.
Why does this happen? Because citing an obscure paper, rather than
an accepted paper by a prominent author, is dangerous and has zero
benefits. Sure, author A might be upset, but who cares about
upsetting some guy from the University of Oriental Syldavia whom you
will never have to confront at a conference and who will never be
asked to write a letter for your tenure case? On the other hand,
author B might be asked to review your next paper, your
next grant application, or your tenure case. So voicing the fact that
he does not deserve all the credit for the idea is very dangerous. Hence,
you don't cite what's right. You cite what everybody else cites.
In an open review system where people can comment on papers at any
time, this kind of situation would happen less often: with open
comments pointing to author A attached to his paper, it would be
difficult for author B not to revise the paper with the proper
citation.
In our day and age, there is no reason to stop the dissemination of
any paper. But what we need is an efficient system of collaborative
filtering, so that good ideas and important results are brought to the
attention of the masses, and so that flaws and errors are flagged.
Our current system does not really allow errors and flaws in
published papers to be flagged.
THE REAL OBJECTIVE:
The overarching objective function that an ideal publication model
should maximize is the Rate of Progress of Science (RPoS) over the
long term. Other issues are secondary, and should be optimized to the
extent that they help maximize RPoS. An essential factor for RPoS is to
eliminate publication delays and never stop a good idea from being
disseminated. Another essential factor is to have an efficient
evaluation mechanism so that good ideas and results are brought to the
attention of the community as quickly as possible, while unimportant
results are largely ignored, and wrong ones are quickly flagged (and
perhaps corrected). In other words, to maximize RPoS, we must find
ways to minimize the time each of us spends evaluating other people's
work (publicly or privately). To maximize long-term RPoS, we must have
a good system to assign credit and recognition, since people tend to
lose their motivation if they don't get the recognition they deserve.
Hence, we must have a good credit assignment system, but only to the
extent that it helps maximize RPoS.
A FEW PRINCIPLES:
Maximizing RPoS has a number of consequences:
- 1. Every paper should be available to the community, and be
citable, as soon as it is written (i.e. put up for
distribution by its authors). In our day and age, there is
no reason for papers not to be available to the world (for free)
as soon as they are written.
- 2. The evaluation process should be separate from dissemination,
because lengthy and faulty evaluations are what currently hold
back the dissemination of good papers.
- 3. Reviews, evaluations, and comments form a collaborative
filtering system. Their main purpose is to attract the attention
of the relevant communities to interesting contributions, and to
point out the flaws in all contributions (good and bad). Good and
useful papers would bubble up to the forefront.
- 4. Reviews should be open, publicly available, and should themselves
be citable documents. This would create an incentive for reviewers
to do a good job. It would also remove the incentive to hold back
useful information from the author for fear of divulging a good
idea with no chance of getting credit for it.
- 5. Bad papers should simply be ignored, so they don't create
a burden on the community at large, which indirectly slows RPoS.
Hence not every disseminated paper should necessarily have reviews.
- 6. Papers should be revisable, with an open versioning system,
so that citations and reviews can refer to a given version,
and so that revisions can take reviews and comments into
account at any time.
The proposed system attempts to abide by these principles.
QUESTIONS, PROBLEMS and ISSUES:
[Draft. To be completed]
- Q: Your proposal is incompatible with our double-blind
reviewing system. Studies have shown that non-double-blind reviewing
is biased against authors who are not from top institutions, and
authors from underrepresented groups. Non-double-blind reviewing
causes the "rich to get richer".
- A: This is clearly the main issue with this proposal. But
the systematic use of arXiv.org by the
physics and math community has demonstrated that papers from unknown
authors from obscure institutions (even from no institution at all)
get brought to the forefront whenever they contain good ideas, even
if they have some flaws (there are a few famous examples). The
proposed system is designed to prevent all false negatives,
including those that are caused by reviewer biases of any kind.
Unlike our current system, the proposed system allows reviewing
mistakes to be fixed, and the open review system will prevent
blatant bias. These papers, which contribute to the progress of
Science, would never make it through our current reviewing
system. Good ideas will eventually come to the attention of a
community and their authors will get the credit they deserve *if* we
allow the papers to be disseminated and to receive a time stamp.
The worst that can happen is that they will take time to gather
interest. In our current system, they simply don't get any credit.
- Q: Assuming someone implements this system, how do we get
the community to use it?
- A: We make it the required submission site for a couple of
major conferences.
- Q: If anyone can publish a paper in the repository without
having to meet the standards of peer review, people will publish
half-baked ideas without any experimental evaluation, or
theoretical justification, and will get undeserved credit for them.
It takes time and effort to demonstrate that a particular idea
works. People who do that should get most of the credit.
- A: First, authors don't automatically get credit for ideas
they publish and certainly don't get to choose who cites
them. Authors of follow-up papers get to choose who they cite, and
they will cite whoever they think deserves the credit. It is
already the case today that the most cited paper on a method is not
the first paper on that method, but the one that popularized the
method or produced the first convincing demonstration of it (there
are famous examples). We certainly don't need any specific
mechanisms to *prevent* ideas from getting out, unless we accept
the idea of slowing the progress of science for fear of unsettling
our current credit assignment system. In fact, authors often cite
papers they admire or were influenced by, and papers they must cite
in order to get past the reviewing process. They don't always
cite what's right.