Archiving data
Data archiving procedure (for EPSRC-funded researchers)
Requirements
The EPSRC requires that all publications with a publication date on or after 1st May 2015 include a statement describing how to access the underlying research data. This means the data must be publicly available and easy to understand.
The Thom Group uses the university data repository (http://www.data.cam.ac.uk/repository) for this purpose. For every paper, there should be a corresponding directory containing a metadata file that describes its contents.
Required information
When the data was created (yyyy-mm-dd)
How the data was created:
- If you used publicly available software, it is sufficient to state the name and version of the software, e.g. Q-Chem (Version 4.0.1, Q-Chem, Inc., Pittsburgh, PA (2007) www.q-chem.com)
- If you used your own code, or software which is not publicly available, you should include a copy of that code or, in the worst case, details of the software and its creators, e.g. ‘Data produced using The Thom Group's qcmagic script’
How the data was plotted. Including the code you used to plot the data is sufficient, as long as it is clear how it can be run (i.e. it does not refer to data files which are missing or in a different directory).
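The required information above could be collected in a small metadata file written by a script. The following is only a minimal sketch: the file name METADATA.txt, the function name write_metadata and the layout of the file are assumptions made for illustration, not a format prescribed by the EPSRC or the repository.

```python
import datetime
from pathlib import Path


def write_metadata(paper_dir, created, software, plotting, filename="METADATA.txt"):
    """Write a plain-text metadata file into the directory for one paper.

    created  -- creation date as a yyyy-mm-dd string
    software -- list of strings, one per program (name, version, source)
    plotting -- short note on how the figures were plotted
    The file name and layout are hypothetical; adapt them as needed.
    """
    # Raises ValueError if the string is not a valid ISO date
    datetime.date.fromisoformat(created)

    lines = [f"Data created: {created}", "Created with:"]
    lines += [f"  - {entry}" for entry in software]
    lines.append(f"Plotted with: {plotting}")

    path = Path(paper_dir) / filename
    path.write_text("\n".join(lines) + "\n")
    return path
```

For example, write_metadata("my_paper", "2015-05-01", ["Q-Chem (Version 4.0.1, Q-Chem, Inc., Pittsburgh, PA (2007) www.q-chem.com)"], "plotting scripts included in this directory") would write my_paper/METADATA.txt, provided the directory my_paper already exists.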
Archive as you go
You can make the archiving stage much easier by preparing for it as you research, rather than trying to clean up messy data and unintelligible code at the end. Here are some tips:
WHEN CREATING DATA:
Organise the data files logically. If it is possible to organise them by figure, then do so. This might not be possible if your publication plots the same data in lots of different ways. In that case, make sure the way you organise your data is obvious to others (avoid meaningless acronyms in file names, e.g. H2-sto_trial1_TEST5.dat).
Keep an index noting when and how the data was created:
- WHEN: yyyy-mm-dd
- HOW: software used, version, input file
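One way to keep such an index up to date is to append an entry each time a data file is produced. This is a sketch under stated assumptions: the function name add_index_entry and the exact layout of an entry are hypothetical, chosen only to mirror the WHEN/HOW fields described above.

```python
import datetime


def add_index_entry(index_path, data_file, software, version, input_file, when=None):
    """Append a WHEN/HOW record for one data file to a plain-text index.

    when is a datetime.date; it defaults to today's date.
    The entry layout is illustrative, not a required format.
    """
    when = when or datetime.date.today()
    entry = (
        f"{data_file}\n"
        f"  WHEN: {when.isoformat()}\n"
        f"  HOW: {software}, version {version}, input file {input_file}\n"
    )
    # Open in append mode so earlier entries are preserved
    with open(index_path, "a") as index:
        index.write(entry)
    return entry
```

Calling it straight after a calculation finishes, e.g. add_index_entry("INDEX.txt", "h2_sto3g_energies.dat", "Q-Chem", "4.0.1", "h2_sto3g.in") (file names here are made up), means the date recorded is the date the data was actually created.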
WHEN PLOTTING DATA:
Make your plots easy to use. Ideally, have one plotting script per figure. Make sure each script only imports data it is going to use. As with data creation, keep an index of the software version and date (e.g. Python 2.7, 2014-08-15).
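A one-figure plotting script along these lines might look like the sketch below. It assumes the data is a two-column comma-separated file stored in the same directory as the figure and that matplotlib is installed; the file names fig1.dat, fig1.pdf and INDEX.txt and all function names are hypothetical.

```python
import csv
import datetime
import platform
from pathlib import Path


def load_xy(data_file):
    """Read two numeric columns (x, y) from a comma-separated file."""
    xs, ys = [], []
    with open(data_file, newline="") as handle:
        for row in csv.reader(handle):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and comment lines
            xs.append(float(row[0]))
            ys.append(float(row[1]))
    return xs, ys


def provenance_line(script_name, when=None):
    """Return an index entry of the form '<script>: Python X.Y, yyyy-mm-dd'."""
    when = when or datetime.date.today()
    major, minor, _ = platform.python_version_tuple()
    return f"{script_name}: Python {major}.{minor}, {when.isoformat()}"


def make_figure(figure_dir, data_name="fig1.dat", output_name="fig1.pdf"):
    """Plot the single data file this figure needs and log how it was made."""
    # Imported here so the helpers above can be used without matplotlib
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, writes files only
    import matplotlib.pyplot as plt

    figure_dir = Path(figure_dir)
    xs, ys = load_xy(figure_dir / data_name)  # data lives next to the figure
    fig, ax = plt.subplots()
    ax.plot(xs, ys)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig(figure_dir / output_name)
    plt.close(fig)
    with open(figure_dir / "INDEX.txt", "a") as index:
        index.write(provenance_line("fig1.py") + "\n")
```

Keeping the data file, the script and the index entry in one directory per figure means the script can be rerun by someone else without hunting for missing files.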