Archiving data

From Thom Group Wiki

== Example ==
Click this [https://www.repository.cam.ac.uk/handle/1810/250382 link] to see an example of properly indexed data in the university repository.

Revision as of 15:49, 28 August 2015

Data archiving procedure (for EPSRC-funded researchers)

Requirements

The EPSRC requires that all publications with a publication date on or after 1 May 2015 include a statement describing how to access the underlying research data. This means the data must be publicly available and easy to understand.

The Thom Group uses the university data repository (http://www.data.cam.ac.uk/repository) for this purpose. For every paper, there should be a corresponding directory containing a metadata file that describes its contents.


Required information

When the data was created (yyyy-mm-dd)

How the data was created:

  • If you used publicly available software, it is sufficient to state the name and version of the software, e.g. Q-Chem (Version 4.0.1, Q-Chem, Inc., Pittsburgh, PA (2007) www.q-chem.com)
  • If you used your own code, or software which is not publicly available, you should include a copy of that code or, in the worst-case scenario, details of the software and its creators, e.g. ‘Data produced using the Thom Group's qcmagic script’

How the data was plotted. Including the code you used to plot the data is sufficient, as long as it is clear how it can be run (i.e. it does not refer to data files which are missing or in a different directory).
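
As an illustration only, the three required items could be collected into a metadata file with a short Python script. This is a minimal sketch, not a group standard: the file name README.txt, the field labels, and the plotting script name plot_fig1.py are all assumptions made for the example.

```python
from datetime import date
from pathlib import Path


def format_metadata(created, software, plotting, custom_code=None):
    """Return metadata text covering when, how created, and how plotted."""
    lines = [
        "Date created: " + created.isoformat(),  # yyyy-mm-dd
        "Created with: " + software,
    ]
    if custom_code:
        # Own or non-public code should be archived alongside the data.
        lines.append("Code included: " + custom_code)
    lines.append("Plotted with: " + plotting)
    return "\n".join(lines) + "\n"


def write_metadata(directory, text, name="README.txt"):
    """Write the metadata text into the paper's directory; return the path."""
    path = Path(directory) / name  # file name is a hypothetical choice
    path.write_text(text, encoding="utf-8")
    return path


# Hypothetical values, for illustration only.
example = format_metadata(
    created=date(2014, 8, 15),
    software="Q-Chem (Version 4.0.1)",
    plotting="plot_fig1.py (Python 2.7)",
)
```

Calling write_metadata("paper_directory", example) would then place the file next to the data, provided that directory already exists.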


Archive as you go

You can make the archiving stage much easier by preparing for it as you research, rather than trying to clean up messy data and unintelligible code at the end. Here are some tips:

WHEN CREATING DATA:

Organise the data files logically. If it is possible to organise them by figure, do so. This might not be possible if your publication plots the same data in several different ways. In that case, make sure the way you organise your data is obvious to others (avoid meaningless acronyms, e.g. H2-sto_trial1_TEST5.dat).

Keep an index noting when and how the data was created:

  • WHEN: yyyy-mm-dd
  • HOW: software used, version, input file
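
One way to keep such an index up to date is to append a record each time a data file is produced. The sketch below is only a suggestion; the record layout, the function names, and the example file paths are hypothetical.

```python
from datetime import date


def index_entry(data_file, when, software, version, input_file):
    """Format one index record: which file, WHEN it was made, and HOW."""
    return (
        f"{data_file}\n"
        f"  WHEN: {when.isoformat()}\n"
        f"  HOW: {software}, version {version}, input file {input_file}\n"
    )


def append_to_index(index_path, entry):
    """Add a record to the end of the index file, creating it if needed."""
    with open(index_path, "a", encoding="utf-8") as handle:
        handle.write(entry)


# Hypothetical file names, organised by figure.
entry = index_entry(
    data_file="fig1/h2_sto3g_energies.dat",
    when=date(2014, 8, 15),
    software="Q-Chem",
    version="4.0.1",
    input_file="fig1/h2_sto3g.in",
)
```

Appending at creation time means the date and software version are recorded while they are still known, rather than reconstructed before submission.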

WHEN PLOTTING DATA:

Make your plots easy to use. Ideally, write one plotting script per figure. Make sure your script only imports data it is going to use. As with data creation, keep an index of the software version and date (e.g. Python 2.7, 2014-08-15).
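
A per-figure plotting script can record its own version stamp and confirm that its data files sit beside it before plotting. The following is a rough sketch under assumed names (DATA_FILES, version_stamp, and missing_data_files are not part of any group tooling), with the plotting itself left out.

```python
import platform
from datetime import date
from pathlib import Path

# Hypothetical: the only data files this one figure needs.
DATA_FILES = ["h2_sto3g_energies.dat"]


def version_stamp(today=None):
    """Return an index line such as 'Python 2.7, 2014-08-15'."""
    today = today or date.today()
    major, minor, _ = platform.python_version_tuple()
    return f"Python {major}.{minor}, {today.isoformat()}"


def missing_data_files(directory, names):
    """Return the names that are not present in the given directory."""
    directory = Path(directory)
    return [name for name in names if not (directory / name).is_file()]
```

Running missing_data_files on the script's own directory before archiving gives an early warning if the script still depends on files stored elsewhere.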

Example

See this example (https://www.repository.cam.ac.uk/handle/1810/250382) of properly indexed data in the university repository.