Skip to Main Content

Research Data Management

Learn about best practices in research data management

Formatting Dates

Date formats can vary between countries. The most common confusion is between the United States and European formats:

US - April 8, 2021, or 04/08/2021 vs. European - 8th April, 2021 or 08/04/2021. 

Choosing a standard format for dates, and using a numerical notation, will help avoid confusion and errors.

ISO 8601 is the best standard for date formats:

YYYY-MM-DD =  2021-04-08 or 20210408

You can also break this down further with time notation if needed: 

YYYYMMDDTHH:MM:SS, or 20210408T15:21:09

 

Learn more about ISO 8601 here. 

Extensibility

Like you see in the example above about consistent file naming, it's helpful to use extensible file names to help organize and sort files with numerical content. When you view files in your file explorer or folders on your computer, we have all probably experienced the numbers being out of order and having to hunt for the file you need. The answer to that is extensibility!

When creating your file naming structure, think about whether you will be using image outputs or other ordered content and plan for that. If you know you will have hundreds or even thousands of files, building in that placeholder will allow you to easily order and find your files. 

Example of file extensibility
Good Bad!

AtherRat_ex001_lipitor.tif

AtherRat_ex1_lipitor.tif

AtherRat_ex002_lipitor.tif

AtherRat_ex10_lipitor.tif

AtherRat_ex003_lipitor.tif

AtherRat_ex2_lipitor.tif

AtherRat_ex004_lipitor.tif

AtherRat_ex3_lipitor.tif

 

Open Formats for Data

Best practices for preservation is to save your data on preservation formats. These four formats are the gold standard for making sure your data will be available for long term, as they can be opened and viewed on any operating system using any kind of software. They are:

  • XML: Extensible markup language -- this is used to ensure simplicity, generability and usability across the internet and can be used to save documents or web service content
  • CSV: Comma separated values -- this is an ideal way to save spreadsheets in a preservation format. Excel, Google Docs and any other spreadsheets can open CSV files
  • PDF: Ideal for saving documents in perpetuity. Note that PDFs are not easily editable, and should be used to freeze a document in time that will not be changed
  • TIFF: Tagged image file format -- the gold standard for saving image files. TIFF’s a preservation ready, and will ensure the quality of images over time.

Other tips for file naming

Avoid using special characters in your file names: 

~ ! @ # $ % ^ & * ( ) ’ “ ; < > ? { } [ ]

Most modern software probably won't allow these characters in names, but avoid them regardless. Special characters can cause confusion with coding or scripting languages or create errors. 

Avoid using abbreviations in your file names. This leads to confusion and makes the file name difficult to read. You will probably forget the abbreviation you created! Make file names clear and human-readable. 

BAD: msewt.csv

GOOD: 2018_09_20_mouse_weight.csv

File Naming Software Resources

Starting fresh with a new project and developing a file naming scheme is the best way to save time and aggravation. But if you need to clean up an existing file structure, there are tools out there to help you and make it less time-consuming. No endorsement is implied for any specific tool below, this is a list of available options. There may be more out there as well.

Why is File Naming Important?

Consistent file naming conventions help you avoid errors or duplication in your research, make your files both machine and human-readable, and make file sorting and organization easier. 

In the example below, the same sample is given two different names by two different lab members, leading to confusion and duplication of work

Example of file naming showing two different names for the same sample

 

Being specific and consistent in your file naming makes it easier to quickly read and identify files in a list, and makes it clear what type of information is contained within.

In the example below, the file name contains a date, which repetition it is from a gene expression experiment, and you can see that it is a spreadsheet file by the .csv extension. 

2016_05_10_gene_expression_rep01.csv

2016_06_01_gene_expression_rep02.csv

2016_06_20_gene_expression_rep03.csv

2016_07_11_gene_expression_rep04.csv