Tag: Open Access

Plan U: Open access initiative

Plan U: A mandate for universal access to research using preprints 

Introduction: 

A preprint is a full draft of a research paper that is shared publicly before it has been peer reviewed. Preprints are a growing form of scholarly communication, and Plan U aims to use them to provide universal access to research. Traditional forms of scholarly communication are slow and often inaccessible. Can a preprint mandate change this?

Plan U is an open access initiative that seeks to mandate the deposit of research papers in a preprint server before publication. This mandate would come from funders. (More information is available in this article.)

 

What are the problems? 

  1. The process of getting a research paper published is slow, and arguably getting slower. This delays research being seen and used.
  2. Plan U aims to tackle issues of accessibility. Once a paper is accepted, it sits behind the journal's paywall, and even when papers are deposited in open access repositories, embargoes usually apply. This leaves research papers inaccessible to anyone without access to the journal platforms.
  3. The final issue is expense. Some authors are lucky enough to have funds to pay Article Processing Charges (APCs), which make their research freely available at the point of publication. However, even the richest universities struggle to find the money for these fees.

Preprint servers: 

Plan U suggests that by using preprint servers, access to research can be sped up and made widely available at minimal cost. Preprint servers are free for both the reader and the author. Posting preprints is already widely accepted in many disciplines and is growing in others.

arXiv is generally considered the first preprint server and has been around since 1991. arXiv hosts approximately 1.5 million papers, a number growing at around 140,000 a year. In the last five years, the success of arXiv has sparked the creation of other subject-specific preprint servers such as bioRxiv, ChemRxiv, EarthArXiv and medRxiv, to name a few. It has been estimated that depositing preprints could speed up scientific research fivefold over ten years.

arXiv logo. White text on red background

Benefits of preprints: 

There are recognised benefits to posting preprints. Plan U highlights what preprints offer the wider community; here are some of the benefits to authors:

  • Credit: Preprints are citeable pieces of work
  • Feedback: Preprints invite feedback from a wide-ranging audience
  • Visibility: Papers posted to preprint servers attract more attention as measured by alternative metrics
  • Reliable: Preprints often do not differ significantly from the published article

If you want any advice on your preprints please contact the Imperial Open Access team. 

 

Better use of money: 

Without the burden of paying high APCs, more money would be freed up to improve peer review and academic publishing processes.

Plan S vs Plan U: 

Plan S is an open access initiative that aims to make research papers freely accessible by mandating 10 key principles. The problem facing Plan S is that it requires many changes to the existing infrastructure of academic publishing.

Plan U, on the other hand, doesn’t require a lot of change as it plans to make use of preprint servers. Preprints are a growing form of scholarly communication and are widely accepted by researchers and publishers alike. Plan U is not an alternative to Plan S and the two could run concurrently if any funder wished to do so.

EarthArxiv logo. An open orange lock with blue and green earth surrounded by the text Earth ArXiv

Conclusion: 

The Plan U website is a very basic, text-only webpage. It doesn’t include information on who is responsible for Plan U; to find this out you should look at the article published in PLoS. As the website doesn’t contain any references or links to follow, I think further development is still needed. At this point Plan U is still just an idea, and whilst everyone is focused on Plan S this mandate may be one for the future.

Event: 

If you are interested in learning more about Plan U and Preprints, there will be a free event hosted at Silwood Park Campus on November 27, 2019. Please visit Eventbrite to view the programme and sign up for your free ticket. 

Most useful for: All postgraduate students (masters and PhD), and staff involved in research. 

Less gold & more green – Research Councils and Open Access at Imperial

In common with a number of other universities in receipt of Research Councils UK (RCUK/UKRI**) funding for open access (Cambridge, UCL, LSHTM), we cannot pay for every single RCUK-funded output to be made immediately open access. The Vice Provost’s Advisory Group for Research has therefore decided to restrict the use of the RCUK grant to pay only where:

  • a. the publication output is in a fully open access title that appears in the Directory of Open Access Journals (DOAJ) or
  • b. the publication title does not provide a compliant “green” (self-archiving) open access route

RCUK funded authors can still expect financial support for open access for the following:

  • Fully open access journals e.g. PLoS, BMC etc.
  • ‘Hybrid’ open access journals – but only when a compliant publisher “green” (self-archiving) open access route is unavailable or its embargo exceeds the individual research council’s maximum.

Rather than paying for open access, if you are in receipt of funding from the UK research councils you can comply by simply self-archiving (already a REF 2021 requirement), provided that the publisher’s required embargo does not exceed the maximum permitted. The vast majority of publications are compliant via the self-archiving open access route, and authors can check individual journal embargo periods via Sherpa Romeo. A short sketch of this embargo check follows the table below.

Funder | Maximum permitted embargo
MRC    | 6 months
BBSRC  | 12 months
EPSRC  | 12 months
NERC   | 12 months
STFC   | 12 months
ESRC   | 24 months
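To make the check concrete, here is a minimal sketch in Python of the logic described above. The funder values come from the table; the function name and its inputs are hypothetical illustrations rather than an actual College tool.

```python
# Illustration only: check whether a journal's green OA embargo satisfies a
# research council's maximum permitted embargo (values from the table above).
MAX_EMBARGO_MONTHS = {
    "MRC": 6,
    "BBSRC": 12,
    "EPSRC": 12,
    "NERC": 12,
    "STFC": 12,
    "ESRC": 24,
}

def green_route_compliant(funder: str, journal_embargo_months: int) -> bool:
    """Return True if self-archiving meets the funder's embargo limit."""
    return journal_embargo_months <= MAX_EMBARGO_MONTHS[funder]

# A journal with a 12-month embargo is fine for EPSRC but not for MRC:
print(green_route_compliant("EPSRC", 12))  # True
print(green_route_compliant("MRC", 12))    # False
```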

It is important to note that the choice of publication venue will not be compromised: we will pay when a journal’s embargo exceeds the permitted maximum. Funder compliance and REF 2021 output eligibility will also be ensured.

The full policy for the College’s RCUK fund, along with contact details for the OA Team, is available via the Library’s Open Access website.

**UKRI: UK Research and Innovation brings together the UK Research Councils, Innovate UK and Research England into a single organisation.

You say tomato, I say accepted manuscript

This is the second of a series of blog posts by Imperial’s Open Access Team for OA Week, our first was on Publisher Problems.

What is an accepted manuscript? Depends who you ask…

The REF 2021 open access policy requires authors of journal articles and conference proceedings to deposit their work in an institutional repository within three months of acceptance. The version required for deposit by Research England, and permitted by most publishers, is the accepted manuscript, but selecting the correct version is sometimes confusing for authors. There is a general lack of standardization in publishing, and accepted manuscripts are a good example of this. There is, in theory, an agreed definition, as follows:

The version of a journal article that has been accepted for publication in a journal. A second party (the “publisher”—see “Version of Record” below for definition) takes permanent responsibility for the article. Content and layout follow publisher’s submission requirements.

This is taken from NISO-RP-8-2008, or to give it its full title, Journal Article Versions (JAV): Recommendations of the NISO/ALPSP JAV Technical Working Group*. The definition is followed by these notes.

  1. Acceptance must follow some review process, even if limited to a single decision point about whether to publish or not. We recommend that there should be a link from the Accepted Manuscript to the journal’s website that describes its review process
  2. If the Accepted Manuscript (AM) is processed in such a way that the content and layout is unchanged (e.g., by scanning or converting directly into a PDF), this does not alter its status as an AM. This will also apply to “normalized” files where, for example, an author’s Word file is automatically processed into some standardized form by the publisher. The content has not changed so this essentially constitutes a shift of format only, and our terms are format neutral.
  3. This stage is also known as “Author’s Manuscript” by, for example, the NIH, but we believe that the key point is the acceptance of the manuscript by a second party. Elsevier refers to it as “Author’s Accepted Manuscript”. SHERPA/RoMEO refer to it as “Postprint”, but this term is counterintuitive since it implies that it refers to a version that comes after printing.

Author Confusion

 

Many authors are confused by the details of Green OA, not knowing which version(s) they can share, where they can share them, and how. This confusion arises in part because of the varying permissions of each publisher, and even of each journal within a publisher’s collection. Permissions are an issue for another day, but surely authors’ (and our) lives could be made easier if publishers agreed on a definition, such as the one above (assuming for the moment that it is satisfactory)? This is indeed the definition used by Taylor & Francis, though other publishers offer their own interpretations of what an accepted manuscript is, increasing author confusion.

Pile of papers
Which version can I upload?

In processing deposits to Spiral, Imperial’s IR, we often have to reject items because the authors have uploaded the incorrect version. We of course contact the author when this happens and request the accepted manuscript. When explaining this we try to use publisher-specific details and, if possible, give an example. A spreadsheet has been set up for this purpose.

It gives each publisher’s definition of the accepted manuscript, with a link to the information on the publisher’s site and, where the publisher provides clearly labelled accepted manuscripts, an example. It’s in its infancy at the moment, but hopefully with community input it can grow to become a useful resource for everyone. Presumably we’re all sending similar communications to authors about accepted manuscripts, so this should save us some time and increase author awareness.

Please contribute to the spreadsheet, and do let us know if you have any questions or comments.

*A Recommended Practice of the National Information Standards Organization in partnership with the Association of Learned and Professional Society Publishers. Prepared by the NISO/ALPSP Journal Article Versions (JAV) Technical Working Group.

Open access outside academia

Emily Nunn’s recent talk for the London Open Access Network (LOAN) meeting at the British Library, titled “Open Access outside academia: exploring access to medical and educational research for non-academics”, provided an interesting opportunity to look at how the non-academic public accesses, views, understands and uses academic research.

Tennyson said by Begoña V. (CC BY NC SA) https://flic.kr/p/4HJ8Kr

The talk was based on Emily’s PhD research investigating the impact of open access publishing outside traditional academic communities and focused on patients and workers in the education, charity and medical sectors. The motivations for accessing research ranged from health diagnosis and natural curiosity to tasks at work and social media coverage.

Access to research differed greatly among participants: some had institutional access via their employer (such as a library) or a university, while others relied on personal networks such as friends in academia. Although most users tend not to pay for paywalled content, there was little to no knowledge or familiarity with open access tools such as Unpaywall (a browser plug-in that locates free, legal, green open access versions of research when available). Workers in the charity sector were aware of, or had used, pirate websites to access research.

Research participants also mentioned that they sometimes found it difficult to understand and interpret academic language, statistics etc., which could be a further barrier to accessing research. There was also no mention of the “Request a copy” button, as most users were relying on Google for their searches. There was an exciting discussion around whether libraries should lead the way in research literacy, helping the lay public to understand and interpret research. One suggestion was to include lay summaries in scientific articles and to embed open access links within lay articles.

Knowledge of open access was quite low among the research participants and limited to it being a way of accessing research; re-use was not mentioned by any of the participants. There was very little understanding of the green and gold routes to open access; participants rarely found their way to green articles and mostly ended up on journal websites.

Interestingly, there was a false understanding of scholarly publishing, with research participants believing that articles were paywalled to allow the author/researcher to recoup their own costs (rather than the publisher profiteering!).

The talk led to a lively discussion among the members of LOAN, with those present wondering whether university and public libraries should be doing more open access advocacy to engage the wider public.

 

Anisha Ahmed

19 February 2018

Imperial College 2015-2016 open access compliance report to RCUK

Earlier today Imperial College submitted its annual report on compliance with the Research Councils’ open access policy to RCUK. The RCUK OA policy envisages a five-year journey, at the end of which, in 2018, 100% of RCUK-funded scholarly papers should be available as open access. To support the transition to open access, RCUK have set up a block grant that makes funds available to institutions to cover the cost of article processing charges (APCs) and other OA-related expenses. Funds are awarded in relation to institutions’ RCUK research funding, and Imperial College has the second largest allocation, just behind Cambridge and followed by UCL. The annual reports to RCUK give an overview of institutional spend and compliance.

The headline figures for the 2015/2016 College report are:

  • £1,051,130 block grant spend from April 2015 to March 2016
  • 89% overall compliance, split in 31% via the gold and 58% via the green route
  • 570 article processing charges paid at an average cost of ~£1,800
  • The top five publishers are: Elsevier, Wiley, Nature, ACS and OUP

As in every year, when discussing the RCUK report figures I think it is important to highlight that compliance rates between universities cannot meaningfully be compared without understanding the data sources and methods used. Just to give one example: the College could also have reported 81% green and 8% gold from the same data.

Why do I caution against directly comparing the numbers? For starters, research-intensive universities find it difficult to establish what 100% is. With hundreds, or in the case of Imperial College many thousands, of papers published every year, we rely on academics to manually notify us, for each paper, who the funder is. Even though the College has made much progress improving its processes and data over the past few years, we have to acknowledge that data collected through such a process will never be complete or fully accurate. For the College report we decided, as in previous years, to base our analysis on outputs we know to have been RCUK-funded. For this year the size of the sample was 1,923 papers (compared to 1,326 in 2014). With a different sample the numbers would have been different, and other universities may have taken a different approach to analysing the data.

Sadly, it is currently not easy to establish whether an output was made available open access. Publishers do not usually add licensing information to metadata, and searching for manuscripts deposited in external repositories is possible but not necessarily accurate. The process we used for analysis was as follows (a short code sketch of this cascade appears after the list):

  • Cross-reference the sample with the journal list from the Directory of Open Access Journals; class every article published in a full OA journal as compliant ‘gold’.
  • Take the remaining articles and cross-reference with the list of articles for which the College Library has paid an APC; class all those articles as compliant ‘gold’.
  • Take the remaining articles and cross-reference with the outputs from ResearchFish that show a CC BY license; class all those articles as compliant ‘gold’.
  • Take the remaining articles and cross-reference with the list of outputs deposited in the College repository Spiral; class all those articles as compliant ‘green’.
  • Take the remaining articles and cross-reference with the list of outputs that have a Europe PubMed Central ID; class all those articles as compliant ‘green’.
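For illustration, here is a minimal Python sketch of that cascade, in which each article is classed by the first rule that matches. The data sources (the DOAJ journal list, APC payment records, ResearchFish licence data, and Spiral and Europe PMC deposits) are represented as simple sets; the real analysis cross-referenced institutional systems, so treat this purely as a sketch of the logic.

```python
# Hypothetical sketch of the classification cascade described in the list above.
def classify(doi, journal_issn, doaj_issns, apc_paid_dois,
             ccby_dois, spiral_dois, epmc_dois):
    if journal_issn in doaj_issns:
        return "gold"    # published in a fully open access journal (DOAJ)
    if doi in apc_paid_dois:
        return "gold"    # College Library paid an APC for this article
    if doi in ccby_dois:
        return "gold"    # CC BY licence reported via ResearchFish
    if doi in spiral_dois:
        return "green"   # deposited in the College repository Spiral
    if doi in epmc_dois:
        return "green"   # has a Europe PubMed Central ID
    return "unknown"     # left for further checks (e.g. Lantern) or manual review
```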

As in previous years we also put the remaining outputs through Cottage Labs’ Lantern tool, but this showed no additional open access outputs. The main reason for that, I suspect, is the high compliance via the green route: some 81% of outputs in the sample had been deposited in the College repository Spiral or in Europe PMC. As the College prefers green over hybrid gold it would have been in line with our policy to report them as green, but as RCUK prefers gold OA we decided to report all outputs known to be gold as such, as in previous years.

I could write more about reporting issues around open access, but as I have done that on a few other occasions I refer those who haven’t suffered enough to my previous posts.

One other caveat should be raised for those planning to analyse the APC spend in comparison with previous years: the APC article-level data is based on APCs paid during the reporting period. This differs from the APC data reported in the previous period, which was based on APC applications published. There are, therefore, a small number of records duplicated from the previous year; these have been identified in the notes column.

If you want the data from the full report, please visit the College repository: http://hdl.handle.net/10044/1/40159

I would like to thank everyone involved in putting the information together, especially colleagues in the Central Library’s open access team.

Universities and academic software development – Software Sustainability Institute fellowship report

“At a research-intensive university like Imperial it is hard to do anything that doesn’t involve data,” noted Imperial’s Provost when he launched the KPMG Data Observatory last year. The importance of data in research is now widely acknowledged, from proclamations of a scientific Fourth Paradigm to celebrations of “data scientist” as “the sexiest job of the 21st century” and research funders mandating research data management (RDM). Comparatively, software has received less attention – and yet without software there is no data, certainly no “big” data, and no data science either. In fact, there may well be no ‘modern’ research without it – in a 2014 survey 7 out of 10 researchers said it is now impossible to do research without software.

“Better Software Better Research” – SSI, licensed CC BY NC

Despite the importance of research software, academia could improve its support for academic coders. A university career is usually measured on publications, citations, grants and, perhaps, teaching. Focusing on keeping the tools of a research group up to date is not likely to deliver any of those, and highly paid industry posts may be more appealing than short-term academic contracts.

When I was a student and part-time university staff I was one of the people who developed and maintained digital research infrastructure. At the time, senior colleagues advised us not to risk our careers by becoming ‘mere technicians’ instead of doing ‘real’ research. This attitude has since changed somewhat, but beyond research support roles the career paths for academic software developers are still murky and insecure.

Thankfully, there are now initiatives dedicated to change this. One of them is the UK’s Software Sustainability Institute (SSI), a fantastic organisation with the simple yet powerful slogan: “Better Software, Better Research”. In 2015 I became a fellow of the SSI, and through this blog post I report on some of my related activities.

Supporting Research Software Engineers

Organisations like the SSI help to create a professional identity for coding academics, or research software engineers, as they are now called. One of the recent achievements was the formation of a UK RSE community as a first step to professionalization. Imperial College now has its own RSE group, and I am pleased that I had a chance to contribute a little to its formation. The focus of my fellowship activity was on improving College support for academic software development, and I approached this through policy.

In recent years, UK research funders released a set of policies governing academic research data management. This led to universities defining their own policies and making plans for the corresponding support infrastructure. At the heart of Imperial’s RDM policy is the requirement to preserve the data needed to validate academic publications – reproducibility is a core principle of research, after all. During the policy development I suggested that we should go a step beyond funder requirements to include software. Without code, after all, there is a risk that data cannot be understood. In some cases, the code is arguably more valuable than the data generated by it. This led to our policy requiring that where software is developed as part of a project “the particular version of the software used to generate or analyse the data” has to be archived alongside the data.

One of our principles for policy development was that there would be no College requirement without us providing – directly or indirectly – solutions that enable academics to comply, and that we would seek to add value where possible. This brought up the question: how do you facilitate the archiving, and ideally wider sustainability, of research code?

One answer, in general terms, is: by supporting best practice in software development, in particular the use of version control. Being able to track contributions to code makes it possible to give credit. Being able to distinguish different versions allows researchers to archive the right code. Running a distributed version control system (DVCS) makes it easy to open up the development and share code.

In informal consultation academics pointed to the open source DVCS Git – not surprisingly perhaps, considering its global popularity. We knew from anecdotal evidence that a broad range of DVCS are used at the College. Some academics pay for commercial solutions, others use free web-based options and some groups are hosting their own. There is no central support and coordination, leading to inefficiencies and, to an extent, a lack of central College engagement with academic coders.

Imperial College survey on distributed version control

To better understand current practice, I worked with colleagues in ICT to develop a survey aimed at DVCS users across the College. We launched the survey in November 2015 and circulated it via the RSE community, academic champions and email newsletters. 263 completed responses were received – for what some would call an “esoteric” topic this was a very good response, especially considering that we only approached a fraction of our 4,000 academics directly. The responses also showed that it was not just the usual suspects, such as computer scientists, who have an interest in DVCS (fewer than half of the responses came from the Faculty of Engineering).

Results summary

  • 96% of respondents were aware of Git, and 82% actively use it
  • The main alternatives to Git are Subversion (65 users), Mercurial (18) and CVS (17)
  • Of the active Git users:
    • 75% rated themselves as expert or intermediate
    • 91% use Git for academic research, 22% in teaching and 18% for commercial work
    • 50% use Git for both closed and open development, and about a quarter each use it only or mostly for closed or open development
  • The main uses of Git are: code/documentation (99%), data/documents (53%), managing configuration files (35%), data sharing/sync (34%), backend for wiki/blog etc. (19%)
  • GitHub is by far the most popular Git web-repository (79%), followed by Bitbucket (45%) and Gitlab (22%)

Sample survey question: How do you use Git? (check all that apply)

#  | Answer                                             | Responses | %
1  | Code (programming) and its documentation           | 201       | 99%
2  | Data, documents (also e.g. static website)         | 107       | 53%
3  | Sharing data or sync                               | 69        | 34%
4  | Managing configuration files                       | 72        | 35%
5  | Backend for wiki, blog, or other web app           | 39        | 19%
6  | Backend for bug tracker / issue tracker            | 32        | 16%
7  | Backend (versioned storage) for other kind of app  | 14        | 7%
8  | Interacting with other SCM (e.g. git-svn)          | 9         | 4%
9  | Other (please specify)                             | 9         | 4%

The anonymised survey results are available at: http://dx.doi.org/10.5281/zenodo.59894

GitHub Enterprise for Imperial?

We were particularly interested in finding out whether it would be worthwhile for the College to invest in GitHub, the hosted Git environment. GitHub is free to use, as long as you don’t mind your code being publicly accessible; there is a charge for private code repositories. Some respondents expressed a preference for a College-hosted open source solution or other platforms such as Bitbucket, but many comments pointed to GitHub. Overall there was a consensus that DVCS should be, to quote a participant, “a vital part of e-infrastructure” for an institution like Imperial.

A key requirement that emerged from the consultation was being able to run private code repositories, for example for “codes with commercial or security (e.g. nuclear) related sensitivities”. I am aware that open versus closed can be a controversial topic, but as an organisation with significant industry funding we have to acknowledge that some code cannot be made available publicly. Or, as one respondent put it: “Having a local GitHub Enterprise would definitely add value for us, as we’re working with commercially sensitive data through industrial collaborations, which we can’t put in a publically accessible repository or project management site.”

DVCS platforms like GitHub make it easy for academics to collaborate and share. However, academics value platforms that preserve the integrity of the code while giving them control over what to make publicly accessible and when. The survey pointed to GitHub Enterprise as the preferred platform, a view that was fully endorsed by academics on the College’s RDM steering group.

Following the consultation, the College has made the decision to procure a site licence to GitHub Enterprise. GitHub Enterprise will become a core College service, managed by ICT. There would be no requirement to use GitHub for development, although its use will be encouraged. It was also agreed that we would not simply launch a new out-of-the-box service and hope that that would magically fix all issues. Instead some level of centrally coordinated support and training would be provided – ideally working with groups like the SSI and Software Carpentry. As a first step of the project to launch GitHub Enterprise, focus groups are being set up to gather academic requirements and guide the configuration and introduction of the new service.

Ongoing engagement

Arguably, this does not address concerns about career paths and reward systems for research software engineers. However, it demonstrates that a university like Imperial College values the code written by its staff, and is dedicated to supporting the academic development of research code. Partly as a result of the consultation, ICT, the Library and the Research Office have now increased their engagement with the RSE community. Policy development may not sound like a very exciting task, but where it leads to more communication with and better support for academics I find it worthwhile and exciting enough.

How compliant are we with HEFCE’s REF open access policy? (Why Open Access reporting is difficult, part 2)

In what is hopefully not going to become a long series, I am today dealing with the joys of compliance reporting in the context of HEFCE’s Policy for open access in the post-2014 Research Excellence Framework (REF). The policy requires that conference papers and journal articles that will be submitted to the next REF – a research assessment through which funding is allocated to UK universities – have to be deposited in a repository within three months of acceptance for publication. Outputs that are published as open access (“gold OA”) are also eligible, and during the first year of the policy the deposit deadline has been extended to three months from publication. The policy comes into force on 1 April and, considering the importance of the REF, the UK higher education sector is now pondering the question: how compliant are we?

As far as Imperial College is concerned, I can give two answers: ‘100%’ and ‘we don’t know’.

‘100%’ is the correct answer as until 1 April all College outputs remain eligible for the next REF. While correct, the answer is not very helpful when trying to assess the risks of non-compliance and for understanding where to focus communications activities. Therefore we have recently gone through a number-crunching exercise to work out how compliant we would be if the policy had been in force since May last year. In May 2015 we made a new workflow available to academics, allowing them to deposit outputs ‘on acceptance’. The same workflow allows academics to apply for funding to cover article processing charges for open access, should they wish to.

You would imagine that with ten months of data we would be able to give an answer to the question for ‘trial’ compliance, but we cannot, at least not reliably. In order to assess compliance we need to know the type of output, date of acceptance (to work out if the output falls under the policy), the date of deposit and the date of publication (to calculate if the output was deposited within three months). Additionally it would help to know whether the output has been made open access through the publisher (gold/immediate open access).
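As a minimal sketch of the date arithmetic involved (the field names, the 92-day approximation of three months, and the handling of the first-year easement are hypothetical simplifications, not HEFCE’s definition), the check might look something like this in Python:

```python
from datetime import date, timedelta

THREE_MONTHS = timedelta(days=92)  # rough approximation of three months

def deposited_in_time(accepted, deposited, published=None,
                      first_year_easement=True):
    """Was the output deposited within three months of acceptance,
    or (during the policy's first year) of publication?"""
    if deposited - accepted <= THREE_MONTHS:
        return True
    if first_year_easement and published is not None:
        return deposited - published <= THREE_MONTHS
    return False

# Example: accepted 1 May 2015, deposited 20 July 2015 -> within three months.
print(deposited_in_time(date(2015, 5, 1), date(2015, 7, 20)))  # True
```

The difficulty is not the arithmetic but obtaining those dates reliably in the first place.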

Below are eight issues that prevent us from calculating compliance:

  1. Publisher data feeds do not provide the date of acceptance
    Publishers do not usually include the date of acceptance in their data feeds, therefore we have to rely on authors manually entering the correct date on deposit. Corresponding authors would usually be alerted to acceptance, but co-authors will not always find out about acceptance, or there may be a substantial delay.
  2. Deposit systems do not always require date of acceptance
    The issue above is made worse by not all deposit systems requiring academics to enter the date of acceptance. In Symplectic Elements, the system used by Imperial, the date is mandatory only in the ‘on acceptance’ workflow; when authors deposit an output that is already registered in the system as published there is currently no requirement to add the date – resulting in the output being listed as non-compliant even if it was deposited in time. Some subject repositories do not even include fields for the date of acceptance.
  3. Difficulties with establishing the status of conference proceedings
    Policy requirements only apply to conference proceedings with an ISSN. Because of the complexities with the publishing of conference proceedings we often cannot establish whether an output falls under the policy, or at least there is a delay (and possible additional manual effort).
  4. Delays in receiving the date of publication
    It takes a while for publication metadata to make it from publishers’ into institutional systems. During this time (weeks, sometimes months) outputs cannot be classed as compliant.
  5. Publisher data feeds do not always provide the date of publication
    This may come as a surprise to some, but a significant number of metadata records do not state the full date of publication. The year is usually included, but metadata records for 18% of 2015 College outputs did not specify year or month. This percentage will be much higher for other universities, as the STEM journals (in which most College outputs are published) tend to have better metadata than arts, humanities and social sciences journals.
  6. Publisher data feeds usually do not provide the ‘first online’ date
    Technically, even where a full publication date is provided the information may not be sufficient to establish compliance. To get around the problem that publishers define publication dates differently, HEFCE’s policy states that outputs have to be deposited within three months of when the output was first published online. This information is not usually included in our data feeds.
  7. Publisher data feeds do not usually provide licence information
    Last year, Library Services at Imperial College processed some 1,000 article processing charges (APCs) for open access. We know that these outputs would meet the policy requirements. However, when the corresponding author is not based at Imperial College – last year around 55% of papers had external co-authors – we have no record of whether they requested that the output be made open access by the publisher. For fully open access journals we can work this out by cross-referencing the Directory of Open Access Journals. However, for ‘hybrid’ journals (where open access is an (often expensive) option) we cannot track this, as publisher metadata does not usually include licence information.
  8. We cannot reliably track deposits in external repositories
    Considering the effort universities across the UK in particular have put into raising awareness of open access, there is a chance that outputs co-authored with academics in other institutions have been deposited in their institutional repositories. Sadly, we cannot reliably track this due to issues with the metadata. If all authors and repositories used ORCID identifiers it would be easier, but even then institutional repositories would have to track the ORCID iDs of all authors involved in a paper, not just those based at their university. If we had DOIs for all outputs in the repositories it would be much easier to identify external deposits (a short sketch of such DOI matching follows this list).
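To illustrate the point about DOIs: if both our publication records and an external repository exposed DOIs, identifying external deposits would reduce to normalising the identifiers and intersecting two sets. The normalisation rules and the sample DOIs below are invented for illustration only.

```python
# Hypothetical DOI matching: our outputs vs. records harvested from an
# external repository. Both sides are reduced to bare, lower-case DOIs.
def normalise_doi(doi):
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

our_outputs = {normalise_doi(d) for d in ["10.1000/abc123", "10.1000/def456"]}
external_repo = {normalise_doi(d) for d in ["https://doi.org/10.1000/ABC123"]}

externally_deposited = our_outputs & external_repo
print(externally_deposited)  # {'10.1000/abc123'}
```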

Considering the issues above, reliably establishing ‘compliance’ is at this stage a largely manual effort that would take too much staff time for an institution that annually publishes some 10,000 articles and conference proceedings – certainly while the policy is not yet in force. Even come April I would rate such an activity as perhaps not the best use of public money. Arguably, publisher metadata should include at least the (correct) date of publication and also the licence, although I cannot see a reason not to include the date of acceptance. If we had that, reporting would be much easier. If we had DOIs for all outputs (delivered close to acceptance) it would be even easier as we could track deposits in external repositories reliably.

Therefore I call on all publishers: if you want to help your authors to meet funder requirements, improve your metadata. This should be in everyone’s interest.

Colleagues at Jisc have put together a document to help publishers understand and implement these and other requirements: http://scholarlycommunications.jiscinvolve.org/wp/2015/03/26/how-publishers-might-help-universities-implement-oa/

What we can report on with confidence is the number of deposits (excluding theses) to our repository Spiral during 2015: 5,511. Please note: 2015 is the year of deposit, not necessarily year of publication.