Soft peer review: Social software and distributed scientific evaluation

Dario TARABORELLI *
Department of Psychology, University College London
Gower Street, London WC1E 6BT, United Kingdom
d.taraborelli@ucl.ac.uk

Abstract

The debate on the prospects of peer review in the Internet age and the increasing criticism leveled against the dominant role of impact factor indicators are calling for new measurable criteria to assess scientific quality. Usage-based metrics offer a new avenue to scientific quality assessment, but they face the same risks as first-generation search engines that used unreliable metrics (such as raw traffic data) to estimate content quality. In this article I analyze the contribution that social bookmarking systems can provide to the problem of usage-based metrics for scientific evaluation. I suggest that collaboratively aggregated metadata may help fill the gap between traditional citation-based criteria and raw usage factors. I submit that bottom-up, distributed evaluation models such as those afforded by social bookmarking will challenge more traditional quality assessment models in terms of coverage, efficiency and scalability. Services aggregating user-related quality indicators for online scientific content will come to occupy a key function in the scholarly communication system.

Keywords

peer review; rating; impact factor; citation analysis; usage factors; scholarly publishing; social bookmarking; collaborative annotation; online reference managers; social software; web 2.0; tagging; folksonomy.

* This paper is based on ideas previously published in a post on the Academic Productivity blog (http://www.academicproductivity.com/blog/2007/soft-peer-review-social-software-and-distributed-scientific-evaluation/). Thanks to Stevan Harnad, Christophe Heintz, Bastien Guerry and several readers of Academic Productivity for valuable feedback on earlier versions of this paper. I am also grateful to Kevin Emamy (CiteULike) and Ian Mulvany (Connotea) for disclosing facts and figures about their services. This work was partly supported by a Marie Curie fellowship from the European Commission (MEIF-CT-2006-024460).

1 Beyond peer review: usage-based metrics and scientific quality assessment

In recent years a large debate has addressed the peer-review model of scientific assessment, questioning, among other things, its ability to be affordable, accurate, timely, objective, and efficient at detecting fraud. [2] The debate has tackled in particular the issue of what measurable indicators are available to estimate the value of scientific knowledge production. The motivations behind this debate are manifold, but they are partly related to the explosion of scientific content available on the World Wide Web. The massive availability of scientific content on the Internet is challenging the role academic journals played in the past as privileged vehicles of scientific communication and as filters of scientific quality: the Web has in fact been paving the way for new forms of scientific evaluation (such as open peer review or open peer commentary [12]) that were not conceivable as such in the past.
More dramatically, the Web is blurring the traditional distinction between content that has been selected through peer review (what we may refer to as a priori scientific quality assessment) and content whose quality is determined by other criteria after its selection for publication (a posteriori scientific quality assessment). Even though the importance of rigorous pre-publication selection criteria as a condition for securing scientific quality has hardly been weighed against that of post-publication impact assessment, models such as Paul Ginsparg's two-tiered selection [8] have already started challenging the monolithic distinction between a priori and a posteriori evaluation. The impact factor [7] has undoubtedly become the de facto standard for measuring a posteriori scientific significance in many areas of research, but it has been challenged by several authors calling for more accurate or alternative indicators. [3, 9] The necessity of new assessment strategies to overcome the limits of traditional peer review and the need for new metrics to complement impact factor indicators have become the object of a lively discussion in the literature. In the field of Open Access, projects such as Citebase or OpCit have been introduced to enable the tracking of popularity metrics such as the number of views or downloads per article and to explore the relationship between usage and impact for free online papers. Harnad observes that usage-based metrics are increasingly perceived by the scientific community as a necessary complement to traditional peer review as an indicator of scientific significance:

a new potential measure of on-line impact, not available in the on-paper era, is usage, in the form of "hits". This measure is noisy [in that] it can be inflated by automated web-crawlers, short-changed by intermediate caches, abused by deliberate self hits from authors, and undiscriminating between nonspecific site browsing and item-specific reading [...], [but it] seems to have some signal-value too, partly correlated with and partly independent of citation impact. (S. Harnad, quoted in McKiernan [16])

Whereas the search engine literature has long since acknowledged that hits or raw usage data provide a poor measure of popularity (let alone quality), there has been relatively little work on potential usage-related metrics that could complement traditional quality indicators such as the impact factor in the field of scientific literature. [5, 15] A first milestone in this sense is a report published by the UK Serials Group on online usage factors (UF), whose objective was "to obtain an initial assessment of the feasibility of developing and implementing journal usage factors" as a criterion for measuring scientific quality. [18] It is worth reporting some of the results of this survey:

• the majority of publishers are supportive of the UF concept, appear to be willing, in principle, to participate in the calculation and publication of UFs, and are prepared to see their journals ranked according to UF;
• there is a diversity of opinion on the way in which UF should be calculated, in particular on how to define the following terms: total usage, specified usage period, and total number of articles published online. Tests with real usage data will be required to refine the definitions for these terms (a toy calculation illustrating how these definitional choices affect the resulting figure is sketched at the end of this section);

• there is not a significant difference between authors in different areas of academic research regarding the validity of journal Impact Factors as a measure of quality;

• the great majority of authors in all fields of academic research would welcome a new, usage-based measure of the value of journals;

• UF, were it available, would be ranked highly by librarians as a factor, not only in the evaluation of journals for potential purchase, but also in the evaluation of journals for retention or cancellation;

• publishers are, on the whole, unwilling to provide their usage data to a third party for consolidation and for the calculation of UF. The majority appear to be willing to calculate UFs for their own journals and to have this process audited;

• there are several structural problems with online usage data that would have to be addressed for UFs to be credible. Notable among these is the perception that online usage data are much more easily manipulated than citation data.

The results of this survey clearly show that usage-based metrics, as a way to complement traditional peer review, are perceived as a major need by several actors (authors, librarians, publishers) in the scientific communication system. It should be noted, though, that the scope of this survey was limited to the study of access data for online resources. Whereas usage statistics (such as those collected by the COUNTER project) certainly provide valuable information for estimating the popularity of online resources, it is questionable whether they can adequately represent quality or scientific authority. In particular, it is debatable whether they will be able to overcome the major issues that afflicted search engine research over the last decade, issues which led it to abandon raw traffic data in favor of more accurate, scalable and spam-resistant criteria for quality assessment. Online access data belong to a family of traditional ranking metrics that have recently been challenged by the so-called Web 2.0 revolution and by the diffusion of social software and socially aggregated web metrics. Surprisingly, little has been done to date to understand how to combine the benefits of social network analysis with scientific quality assessment in light of the new forms of collaboration allowed by Web 2.0 services. The question I aim to address in this paper is the following: is there any kind of measurable indicator that can bridge the gap between citation analysis and impact factor on the one hand and raw access data on the other, and thereby provide efficient measures of scientific quality as it is perceived by the academic community?

I will argue that social software (in particular social bookmarking systems) offers a unique opportunity to provide costless and accurate metrics that may in the long run become more relevant for measuring scientific impact than raw hits or other forms of usage-based statistics. I review in particular the case of social bookmarking systems targeted at the academic community, such as Nature's Connotea and CiteULike, and discuss the challenges traditional scientific evaluation processes face when compared with these systems.
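To make the definitional issues flagged in the UKSG report concrete, here is a minimal sketch in Python of a journal-level usage factor computed as counted uses divided by the number of articles published online within a usage window. All field names and figures are invented for illustration, and the robot flag stands in for the kind of filtering real services would need; this is not the calculation the report prescribes, only one reading its definitions allow.

```python
from datetime import date

# Hypothetical download log: (article_id, request_date, flagged_as_robot); all data invented
download_log = [
    ("art-001", date(2007, 1, 12), False),
    ("art-001", date(2007, 1, 12), True),   # crawler hit: should not count as a "use"
    ("art-002", date(2007, 2, 3), False),
    ("art-002", date(2007, 6, 30), False),
    ("art-003", date(2007, 7, 1), False),   # falls outside the usage window chosen below
]

# Online publication dates of the journal's articles (also invented)
published = {
    "art-001": date(2007, 1, 5),
    "art-002": date(2007, 1, 20),
    "art-003": date(2007, 5, 2),
}

def usage_factor(log, articles, window_start, window_end):
    """Counted uses within the window divided by articles published online in the window."""
    uses = sum(
        1 for article_id, day, is_robot in log
        if not is_robot and window_start <= day <= window_end and article_id in articles
    )
    n_articles = sum(1 for pub in articles.values() if window_start <= pub <= window_end)
    return uses / n_articles if n_articles else 0.0

# Changing the window, or counting robot hits as uses, changes the resulting figure.
print(usage_factor(download_log, published, date(2007, 1, 1), date(2007, 6, 30)))
```

Shrinking the window or counting crawler hits as uses yields a different figure for the same journal, which is exactly why the report calls for shared definitions of total usage, usage period and article counts.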
2 Social software and collaborative metadata

Online reference managers are extraordinary productivity tools: they allow users to file scientific references from online databases and to easily access, annotate, categorize and share these references with collaborators. It would be a mistake, though, to take this as their primary interest for the academic community. As is often the case with social software services, online reference managers are becoming powerful and costless solutions for collecting large sets of metadata, in this case socially aggregated metadata on scientific literature. An item in an online bookmarking system (e.g. a paper from an academic journal) is described by a list of tags, ratings and annotations compiled by the user when filing the item in his or her library. Online reference managers allow such metadata to be aggregated across the entire user community. Taken at the individual level, these metadata are hardly of any interest, but at a large scale, metrics based on them are likely to outperform more traditional evaluation processes in terms of coverage, speed and efficiency. Social metadata cannot offer the same guarantees as standard selection processes (insofar as they do not rely on experts' reviews and are less immune to bias and manipulation). However, they are an interesting solution for producing virtually costless evaluative representations of scientific knowledge at a very large scale.

Traditional peer review has been criticized on various grounds, but possibly the major threat it currently faces is scalability, i.e. the ability to cope with an increasingly large number of paper submissions, which, given the limited number of available reviewers and the time constraints on the publication cycle, results in ever smaller acceptance rates for high-quality journals. Although ratings based on collaborative metadata will never replace hard evaluation models such as traditional peer review, they are in a good position to outperform them in terms of efficiency and scalability, at least as soon as they reach a critical mass of users. When this happens, and as soon as their potential is fully acknowledged, I anticipate that academic content providers (including publishers, scientific portals and bibliographic databases) will be urged to integrate metadata from social software services. The following sections cover the areas in which I expect metrics from social bookmarking services targeted at the academic community to challenge traditional quality assessment indicators.
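To fix ideas, the following minimal sketch (in Python; the record fields and data are invented and do not reflect the actual data model of Connotea or CiteULike) shows the kind of per-user metadata an online reference manager collects when an item is filed, and how such records might be pooled across the whole user community before any metric is computed.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Bookmark:
    """One user's entry for one item: the basic unit of socially aggregated metadata."""
    user: str
    item_id: str                      # e.g. a DOI
    tags: List[str] = field(default_factory=list)
    rating: Optional[int] = None
    note: Optional[str] = None

def pool_by_item(bookmarks):
    """Pool every user's tags, ratings and notes under the item they describe."""
    pooled = defaultdict(lambda: {"users": set(), "tags": [], "ratings": [], "notes": []})
    for b in bookmarks:
        entry = pooled[b.item_id]
        entry["users"].add(b.user)
        entry["tags"].extend(b.tags)
        if b.rating is not None:
            entry["ratings"].append(b.rating)
        if b.note:
            entry["notes"].append(b.note)
    return pooled

# A single bookmark is of little interest; metrics only emerge from the pooled records.
library = [
    Bookmark("alice", "doi:10.1000/xyz", tags=["tagging", "folksonomy"], note="Key survey."),
    Bookmark("bob", "doi:10.1000/xyz", tags=["tagging"], rating=4),
]
print(len(pool_by_item(library)["doi:10.1000/xyz"]["users"]))  # 2
```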
2.1 Semantic relevance

A widely acknowledged application of tags as collaborative metadata is their use as semantic descriptors. Tagging is the most popular example of how social software (at least according to its advocates) helped overcome the limits of traditional, top-down approaches to content categorization. Collaboratively aggregated tags can be used to extract similarity patterns, for automatic clustering, or to improve the quality of search engine results. [4, 19] In the case of academic literature, tags can provide extensive lists of keywords for scientific papers, often more accurate and descriptive than those originally supplied by the author. Figures 1 and 2 compare the keywords used by the author and by the community of users to describe a popular article about tagging, ordered by the number of users who added a specific tag to their bookmarks as a descriptor for the article.

Figure 1: List of keywords for a popular article on "tagging" as compiled by the author, from Del.icio.us.

Figure 2: Distribution of collaboratively aggregated keywords for the same article as in Figure 1, from Del.icio.us.

Similar lists can be found in CiteULike or Connotea, although neither of these services seems to have realized so far how important it is to rank tags by the number of users who applied them to a specific item. Measuring tag density per item in social software is possibly the most reliable strategy for estimating the semantic relevance of an item without relying on expert feedback. Services that aggregate keywords compiled by multiple users to describe scientific references are therefore in the best position to become providers of virtually cost-free, collaboratively aggregated semantic metadata for large sets of scientific articles, and to challenge more traditional and costly top-down categorization approaches.
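As a minimal sketch of the ranking just described (Python; the tagging data are invented), tags can be ordered, for each item, by the number of distinct users who applied them:

```python
from collections import Counter

# (user, item_id, tag) triples as they might be pooled from a bookmarking service (invented data)
taggings = [
    ("alice", "doi:10.1000/xyz", "tagging"),
    ("alice", "doi:10.1000/xyz", "folksonomy"),
    ("bob",   "doi:10.1000/xyz", "tagging"),
    ("carol", "doi:10.1000/xyz", "tagging"),
    ("carol", "doi:10.1000/xyz", "web2.0"),
]

def ranked_tags(triples, item_id):
    """Rank an item's tags by the number of distinct users who applied them."""
    users_per_tag = Counter()
    seen = set()
    for user, item, tag in triples:
        if item == item_id and (user, tag) not in seen:
            seen.add((user, tag))
            users_per_tag[tag] += 1
    return users_per_tag.most_common()

print(ranked_tags(taggings, "doi:10.1000/xyz"))
# [('tagging', 3), ('folksonomy', 1), ('web2.0', 1)]
```

Counting distinct users rather than raw tag occurrences keeps a single enthusiastic (or self-promoting) user from dominating the ranking.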
2.2 Popularity

Another fundamental type of metadata that can be extracted from social bookmarking systems is popularity indicators. Looking at how many users have bookmarked an item in their personal reference library can provide a reliable measure of the popularity of that item within a given community. Understandably, academically oriented services (like CiteSeer, Web of Science or Google Scholar) have focused so far on citation analysis, which is the standard indicator of a paper's authority in the bibliometric tradition. I anticipate that popularity indicators from online reference managers will eventually become a factor as crucial as citation analysis for evaluating scientific content. This may sound paradoxical if we consider that complex authority measures were introduced precisely to avoid the typical biases of raw usage-based popularity indicators. Social bookmarking data, however, are likely to provide more robust indicators than usage factors insofar as they result from the intentional behavior of users interested in marking an item for future use, rather than from pure navigation patterns. Bookmarking an item is a much more relevant (and virtually more spam-resistant) kind of action for estimating user interest than merely following a link. In this sense, social bookmarking systems are likely to provide accurate figures on papers that are frequently read and cited in a given area of science.

Whether social bookmarking popularity data are better indicators than access-based factors for measuring the scientific significance of an article within a given academic community is an empirical question. It has been shown that ratings for scientific articles aggregated from an online community of biologists (F1000) strongly correlate with their impact factor. [1] A comparison of the distribution of citations, the distribution of popularity indicators in social bookmarking services, and access-based figures for a representative sample of papers would provide a much-needed contribution to our understanding of how good different kinds of usage-related metrics are at predicting scientific impact. [see for instance 6, 11]

Interestingly, a number of social bookmarking systems such as Del.icio.us have started realizing the strategic importance of redistributing the popularity data they collect. Del.icio.us recently introduced the possibility of displaying, on external websites, popularity indicators based on the number of users who filed a specific URL in their bookmarks. Similar ideas have been in circulation for years (consider for example Google's PageRank indicator or Alexa's traffic ranking in their browser toolbars), but on the whole social bookmarking systems have not yet fully acknowledged the importance of redistributing the metadata they collect.

Figure 3: Popularity indicators for an article in CiteULike.

Connotea, CiteULike and similar services should consider giving back to content providers (from which they borrow bibliographic metadata) the ability to display the popularity indicators they produce. When this happens, it is not unlikely that publishers will start displaying popularity indicators on their websites (e.g. "Article X was bookmarked by 10,234 readers") to promote their content.

2.3 Hotness

"Hotness" can be described as an indicator of short-term scientific significance, a useful measure for identifying emerging research trends within specific communities. Mapping popularity distributions on a temporal scale is in fact common practice. Indicators such as the ISI Impact Factor are systematically complemented with time-dependent metrics: the Immediacy Index, on the one hand, describes the frequency of citations an article receives within a specific timeframe, which makes it possible to identify journals that are good at providing cutting-edge information; Cited Half-Life, on the other hand, can be used to estimate how long an article is perceived as relevant in the field. Similar criteria are used by social software services (such as Del.icio.us, Technorati or Flickr) to determine what is "hot" over the last few days of activity.

Online reference managers have recently started to look at such indicators. In its current implementation, CiteULike measures "hotness" by explicitly asking users to vote for articles they like. The goal, CiteULike developer Richard Cameron explains, is to "catch influential papers as soon as possible after publication". There are several reasons to believe that explicit votes may not be the best way to capture emerging trends within a given academic community. Relying on votes (whether or not they are combined with other metrics) is a questionable strategy for measuring time-related popularity, insofar as most users of social bookmarking services are unlikely to cast a vote if they do not see its immediate benefits, whereas a large part of those users who do actively vote may do so for opportunistic reasons. In order to provide reliable figures, popularity indicators should rely on patterns that are implicitly generated by user behavior: the best way to know what a community of users likes is not to ask them, but to aggregate meaningful patterns from the natural behavior of the users who joined a given service. Hopefully, online reference management services will soon realize the importance of extracting measures of time-dependent popularity in an implicit and automatic way: most mature social software projects solved this issue by avoiding the use of explicit votes.
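As an illustration of what an implicit, time-dependent indicator might look like, the sketch below (Python; the half-life parameter and dates are invented, and this is not how CiteULike currently computes hotness) derives a recency-weighted bookmark count purely from when users filed an item, with no explicit voting involved.

```python
import math
from datetime import date

def hotness(bookmark_dates, today, half_life_days=30):
    """Recency-weighted bookmark count: each bookmark decays exponentially with age."""
    decay = math.log(2) / half_life_days
    return sum(math.exp(-decay * (today - d).days) for d in bookmark_dates)

# Two items with the same total number of bookmarks but different temporal profiles (invented)
steady = [date(2007, 1, 1), date(2007, 3, 1), date(2007, 5, 1), date(2007, 7, 1)]
recent = [date(2007, 8, 20), date(2007, 8, 25), date(2007, 8, 28), date(2007, 8, 30)]

today = date(2007, 9, 1)
print(round(hotness(steady, today), 2), round(hotness(recent, today), 2))
# The recently bookmarked item scores much higher, without anyone casting a vote.
```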
2.4 Collaborative annotation

One of the most underrated (and, in my opinion, most promising) aspects of online reference managers is the ability they provide to collaboratively annotate content. Collaborative annotation functionality was introduced by platforms such as Naboj (a service allowing collaborative annotation of arXiv preprints) or by electronic journals such as Philica, which allow open peer commentary on the articles they feature. A distinctive feature of online reference managers is that they do not require specific incentives for notes and reviews to be produced, since annotating references is an activity individual users naturally engage in when filing a reference in their library. The issue of incentives and of the cost of reviewing content for free has in fact been one of the major obstacles to the diffusion of open peer review systems, as witnessed by the failure of Nature's pilot experiment in 2006. [10] Collecting annotations from users of online reference managers, on the other hand, looks like a more viable strategy precisely because these annotations are generated spontaneously. Online reference managers allow users to add public notes and short reviews to the items they bookmark, which in turn can be used to automatically aggregate collaborative lists of annotations without any explicit incentive or call for commentary.

Could such annotations be used to extract meaningful metadata at a large scale for the purpose of measuring scientific quality and impact? The obvious reason why bottom-up, collaborative annotation cannot be compared, in this respect, with traditional refereeing is that the expertise of the reviewer cannot be directly measured. The crucial question is then to understand whether there is a viable strategy for making collaborative annotation more reliable while maintaining the advantages of social software. Weighting user contributions by independently assessed authority is an issue that was recently brought to public attention by the Citizendium vs. Wikipedia debate. The solution proposed by the Citizendium founder, to externally check the academic credentials of contributors, is certainly a good way to secure the quality of contributions against inaccuracy, abuse and vandalism. But the question remains open whether this approach is scalable without specific incentives. In what follows I suggest an alternative solution that could be implemented in online reference management systems to combine some features of anonymous refereeing with the benefits afforded by social software.

A possible bottom-up solution to the problem of ranking contributions by authority would be to rate users as a function of their perceived expertise, as measured by the user community. Asking users to rate each other directly is not a viable approach: as in the case of "hotness" measures based on explicit votes, mutual user rating is an easily biased strategy. Indirectly rating expertise by rating anonymous contributions looks like a much more robust solution. Assuming that users massively annotate the references they file in their library and agree to make these notes public, notes from multiple users can easily be aggregated and displayed for each item. Notes could then be displayed anonymously to other users, who would have the possibility to save a note in their own library if they consider it useful.
This behavior (i.e. importing someone else's annotation) could be taken as an indirect positive rating for the author of the note, whose overall score would result from the number of anonymous contributions she wrote that other users imported. These ratings could then be calculated on a per-topic basis. Suppose user A has a large number of positive ratings for comments she posted on papers tagged with dna: this will be a bottom-up indicator of her expertise on the dna topic within the user community. User A will then have different degrees of expertise for topics tag1, tag2 and tag3, as a function of how useful other users found her anonymous annotations on papers tagged respectively with tag1, tag2 and tag3. This is just an example of how valuable information could be extracted from collaborative annotations by adding an indirect rating layer to online reference management systems. Allowing the indirect rating of annotations posted as anonymous contributions would make it possible to implement a sort of soft peer review system at a large scale. This in turn would allow social software services to aggregate much larger sets of evaluative metadata about the scientific literature than traditional reviewing models will ever be able to provide.
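The sketch below (Python; the data and event names are invented, and this is only one of many ways the proposal could be implemented) makes the indirect rating layer concrete: every time another user imports one of an author's anonymous notes, the author earns one point for each tag attached to the annotated paper, and the per-tag totals form her bottom-up expertise profile.

```python
from collections import Counter, defaultdict

# Who wrote which anonymous note and which paper it annotates (invented data)
notes = {
    "note-1": {"author": "A", "paper": "p1"},
    "note-2": {"author": "A", "paper": "p2"},
    "note-3": {"author": "B", "paper": "p1"},
}
paper_tags = {"p1": ["dna", "genomics"], "p2": ["dna"]}

# Import events: (importing_user, note_id); importing a note is an implicit positive rating
imports = [("carol", "note-1"), ("dave", "note-1"), ("carol", "note-2"), ("dave", "note-3")]

def expertise_profiles(notes, paper_tags, imports):
    """Per-author, per-tag expertise scores derived from note imports by other users."""
    profiles = defaultdict(Counter)
    for importer, note_id in imports:
        note = notes[note_id]
        if importer == note["author"]:
            continue  # ignore self-imports
        for tag in paper_tags[note["paper"]]:
            profiles[note["author"]][tag] += 1
    return profiles

print(expertise_profiles(notes, paper_tags, imports)["A"])
# Counter({'dna': 3, 'genomics': 2}): A's notes on dna-tagged papers were imported three times.
```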
3 The role of collaborative evaluation in scientific knowledge production

I have reviewed a number of ways in which social software metrics might help bridge the gap between traditional quality indicators and raw usage factors, thus answering the need for more accurate metrics to evaluate scientific significance. The potential of social bookmarking to provide relatively unbiased metrics is underestimated in the current debate on usage factors. Compared to raw access data, social bookmarking metrics are likely to provide better proxies for estimating the impact of scientific papers in the academic community insofar as they are aggregated from much more specific usage patterns: the act of bookmarking an item, as opposed to the act of simply following a link or downloading a paper. Obviously, there is no guarantee that bookmarking is spam-free, or that social bookmarking is immune to self-promotional gaming, but there are several reasons to believe it is a far more reliable proxy than mere usage data:

• bookmarks require user registration, whereas usage data can be artificially inflated by robots;

• a bookmark indicates a single action by a user, whereas in the general case there is no way to know how many hits are generated by different users or by the same user visiting the same resource several times.

In this sense, social bookmarking systems offer a unique opportunity to provide a class of usage-related indicators of scientific quality that look more robust than any other kind of bottom-up solution to this problem. How will traditional top-down quality assessment cope with the diffusion of new forms of distributed scientific evaluation? Whether soft peer review will oust more traditional assessment models is a question that can only be answered by considering the conditions that any candidate alternative to peer review should meet. Former Nature editor Charles G. Jennings [14] summarizes the basic requirements for a scientific quality assessment system as follows (emphasis mine):

• It must be reliable: it must predict the significance of a paper with a level of accuracy comparable to or better than the current journal system.

• It must produce a recommendation that is easily digestible, allowing busy scientists to make quick decisions about what to read. [...]

• It must be economical, not only in terms of direct costs such as web operations, but also in terms of reviewer time invested.

• It must work fast. The peer review system produces clear-cut decisions relatively quickly (in part because editors pester reviewers to deliver their reports), whereas many forms of communal assessment such as the emergence of a statistically significant pattern of citations or expert recommendations are likely to be slow and gradual by comparison. [...]

• It must be resistant to 'gaming' by authors. Of course, savvy authors already know how to work the current system, but the separation of powers between editors and anonymous reviewers does I believe preserve some integrity to the process.

Understanding whether the evaluations enabled by social bookmarking meet these criteria is beyond the scope of the present discussion. Quantitative analyses will have to compare how peer review and distributed evaluation processes perform as competing scientific assessment systems against the above benchmarks. It is noteworthy, however, that on top of reliability requirements, several of the conditions suggested by Jennings explicitly refer to the sustainability of evaluation systems. It is not implausible that, in the long run, scientific evaluation systems will have to become independent of scientific dissemination systems (e.g. scholarly journals run by academic publishers) in order to be sustainable. Evaluation and dissemination can be regarded as two distinct functions in the scientific communication system [17] that are currently fulfilled by the same actors, i.e. peer-reviewed journals. There seems to be no reason to rule out the possibility that the relationship between evaluation and dissemination systems may change in the future under the pressure of new technologies. This is particularly likely in a situation in which scientific content, and metadata about this content, are massively available online, thus favoring the development of third-party services. The emergence of search engines as universal quality assessment institutions that orient users in content selection is the result of the pressure put on the system by the explosion of content and by the need for efficient and scalable solutions to cope with this explosion. In this sense, search engines have come to occupy a crucial epistemic function between knowledge producers and knowledge consumers on the World Wide Web. [13] Scientific knowledge transmission may face the same destiny in an even more dramatic way.

I have proposed a few ways in which social bookmarking and collaborative annotation systems could be used to extract large-scale indicators of scientific quality from user behavior without the need for specific incentives. In the long run, I expect these bottom-up, distributed processes to become more and more valuable to the academic community, and traditional publishers to acknowledge the necessity of integrating metadata collected through social software. This will be possible as soon as collaborative annotation services reach a critical mass of users and start developing facilities (ideally programmable interfaces, or APIs) to expose the data they collect and feed them back to potential consumers (publishers, individual users or other services).
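Purely as an illustration of what feeding such data back might look like, the sketch below shows a hypothetical per-article payload and the kind of badge a publisher could render from it; the field names, figures and the very existence of such an endpoint are assumptions, not a description of any existing Connotea or CiteULike interface.

```python
# Hypothetical per-article payload a bookmarking service might expose (all values invented)
article_metadata = {
    "doi": "10.1000/xyz",
    "bookmarked_by": 10234,
    "top_tags": [["tagging", 412], ["folksonomy", 187]],
    "public_notes": 58,
}

def readership_badge(payload):
    """The kind of indicator a publisher might render from such a payload (cf. section 2.2)."""
    return f"Bookmarked by {payload['bookmarked_by']:,} readers"

print(readership_badge(article_metadata))  # Bookmarked by 10,234 readers
```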
The future role of social bookmarking systems, as I envision it, is not dissimilar from that of mashup services: that of intermediate providers of aggregated metadata, sitting between information producers and information consumers. To quote the conclusions of an article on the future of the mashup economy (http://gigaom.com/2007/01/21/making-money-in-the-mashup-economy/):

[y]ou don't have to have your own data to make money off of data access. Right now, there's revenue to be had in acting as a one-stop shop for mashup developers, essentially sticking yourself right between data providers and data consumers.

A similar rationale could justify a strong presence of these services in the scientific communication system. If they succeed in doing this, they will come to occupy a crucial function in the system of scientific knowledge production and challenge traditional approaches to scientific assessment.

References

[1] Revolutionizing peer review? Nature Neuroscience, 8(4):397, April 2005. doi: 10.1038/nn0405397.

[2] Peer review and fraud. Nature, 444(7122):971–972, December 2006. doi: 10.1038/444971b.

[3] The impact factor game. PLoS Medicine, 3(6), June 2006. doi: 10.1371/journal.pmed.0030291.

[4] S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su. Optimizing web search using social annotations. In WWW '07: Proceedings of the 16th International Conference on World Wide Web, pages 501–510, New York, NY, USA, 2007. ACM Press. doi: 10.1145/1242572.1242640.

[5] J. Bollen, H. Van de Sompel, J. A. Smith, and R. Luce. Toward alternative metrics of journal impact: A comparison of download and citation data. Information Processing & Management, 41(6):1419–1440, December 2005. doi: 10.1016/j.ipm.2005.03.024.

[6] T. Brody, S. Harnad, and L. Carr. Earlier web usage statistics as predictors of later citation impact. Journal of the American Society for Information Science and Technology, 57(8):1060–1072, June 2006. doi: 10.1002/asi.v57:8.

[7] E. Garfield. The agony and the ecstasy—the history and meaning of the journal impact factor. In International Congress on Peer Review and Biomedical Publication, Chicago, September 2005. URL http://garfield.library.upenn.edu/papers/jifchicago2005.pdf.

[8] P. Ginsparg. Can peer review be better focused? Science & Technology Libraries, 22(3-4):5–17, January 2004. doi: 10.1300/J122v22n03_02. URL http://people.ccmr.cornell.edu/~ginsparg/blurb/pg02pr.html.

[9] W. Glänzel. Journal impact measures in bibliometric research. Scientometrics, 53(2):171–193, 2002. URL http://www.ingentaconnect.com/content/klu/scie/2002/00000053/00000002/00400216.

[10] S. Greaves, J. Scott, M. Clarke, L. Miller, T. Hannay, A. Thomas, and P. Campbell. Nature's trial of open peer review. Nature, December 2006. doi: 10.1038/nature05535. URL http://www.nature.com/nature/peerreview/debate/nature05535.html.
[11] S. Harnad. Open access scientometrics and the UK Research Assessment Exercise. In D. Torres-Salinas and H. F. Moed, editors, Proceedings of the 11th Annual Meeting of the International Society for Scientometrics and Informetrics, pages 27–33, 2007. URL http://eprints.ecs.soton.ac.uk/13804/.

[12] S. Harnad. Implementing peer review on the net: Scientific quality control in scholarly electronic journals, pages 103–118. MIT Press, 1996. URL http://eprints.ecs.soton.ac.uk/2900/.

[13] C. Heintz. Web search engines and distributed assessment systems. Pragmatics & Cognition, 14(2):387–409, 2006.

[14] C. G. Jennings. Quality and value: The true purpose of peer review. Nature, 2006. doi: 10.1038/nature05032. URL http://www.nature.com/nature/peerreview/debate/nature05032.html.

[15] M. Jensen. The new metrics of scholarly authority. The Chronicle of Higher Education, June 2007. URL http://chronicle.com/free/v53/i41/41b00601.htm.

[16] G. McKiernan. Peer review in the Internet age: Five (5) easy pieces. Against the Grain, 16(3):52–55, June 2004. URL http://www.public.iastate.edu/~gerrymck/DraftFive.htm.

[17] H. Roosendaal and P. Geurts. Forces and functions in scientific communication. In Cooperative Research Information Systems in Physics, Oldenburg, Germany, August 1997. URL http://www.physik.uni-oldenburg.de/conferences/crisp97/roosendaal.html.

[18] P. T. Shepherd. Final report on the investigation into the feasibility of developing and implementing journal usage factors. Technical report, United Kingdom Serials Group, May 2007. URL http://www.uksg.org/sites/uksg.org/files/Final%20Report%20on%20Usage%20Factor%20project.pdf.

[19] Y. Yanbe, A. Jatowt, S. Nakamura, and K. Tanaka. Can social bookmarking enhance search in the web? In JCDL '07: Proceedings of the 2007 Conference on Digital Libraries, pages 107–116, New York, NY, USA, 2007. ACM Press. doi: 10.1145/1255175.1255198.