Submitted on behalf of IFR by
Dr Marianne Defernez & Dr Nick Walton
September 2005
Information on the consultation can be found at: www.bbsrc.ac.uk/society/consult/data_sharing_policy/Welcome.html
1. Data areas
We feel it is impracticable to cover all data fields; we agree with the two areas suggested by the Working Group, at least in the first instance.
2. Data types
Metabolomics provides an illustrative example. As noted at the “Metabomeeting” (Cambridge, July 2005, supported by BBSRC), the question of “what data” should be shared is being addressed as part of the UK initiative to formalise a “blueprint” for data-sharing in the metabolomics field. It is not a straightforward issue, even, for example, in the context of building a database of relatively standard NMR spectra. Raw data are really the only data that allow users to carry out their own analysis (as any form of processing may be usage-dependent). However, raw data are often not usable as they stand and must undergo some form of pre-processing first; this can represent a considerable part of the analysis.
Acquiring and dealing with metadata is also one of the most problematic areas. The metadata are more difficult to define than the data per se but include critical factors such as (for example) microbiological inoculation, aeration or sampling procedures that can vary, sometimes unknowingly, from operator to operator and that can bedevil inter-laboratory comparisons. In some cases, even the definition of the organism itself (e.g. genetic instability, presence of plasmids etc.) can pose a serious difficulty. It will be as important to specify how metadata are lacking or are incomplete as to indicate what metadata have been recorded, and how.
3. Timeliness etc.
It will be important to ensure that the interests of PIs, and their ability to capitalise on their efforts in collecting the data, are not compromised by data-sharing policies. We would support an appropriate period of exclusive use. The IP and secondary-use issues need to be carefully anticipated well in advance and managed on a case-by-case basis.
4. Requirements for implementation / culture
The data-sharing policy needs to be seen by PIs as providing new research opportunities, not as a constraint or as an administrative (or financial) burden. Making data publicly available is certainly not cost-free, because (1) it will undoubtedly represent an extra effort on the part of researchers, and (2) given the size of some data sets, storage could become very expensive. These costs have to be assessed against the expected benefits.
There is also a need to identify and provide guidance on appropriate tools, ensuring that their application (in storing and accessing data) is as time- and cost-efficient as possible. At present, we are unsure whether such tools exist and, if so, how robust they are.
Quality control is central to success, and curation is a major issue, especially where the data originator and the end user are entirely detached from one another.