UsefulCode Series: A necessary header for every R script
Useful code for useful people
The UsefulCode repository (repo for short) is intended as a resource for myself, my colleagues, and anyone who wants useful examples to speed up their R coding tasks. I'll post new code to this repository as I come across it, and I welcome anyone to submit their own examples via a GitHub pull request. Every time a new example is added, a short description will be added to the GitHub README file, and I'll also post the code and description on my website!
Script_Starter.R
The Script_Starter.R file represents what I typically use as a "starter section" for each new R script in any of our projects. It comprises four components:
- Header: The header can be as long as necessary, but I recommend 6 lines at a minimum. The header states the script's purpose, the author, the creation date, the date of the most recent update, and the R version used to write the script. Altogether, the header reminds you and others of the context of the script.
#### R script used to demonstrate a useful header and setup code for each script
# Developed by Ben Block, Tetra Tech; Ben.Block@tetratech.com
# Date created: 09/27/2024
# Date last updated: 09/27/2024
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# R version 4.4.1 (2024-06-14) -- "Race for Your Life"
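The R version line does not have to be typed by hand. As a minimal sketch (not part of Script_Starter.R), the line can be assembled from `R.version.string` and `R.version$nickname`, both of which base R provides, and then pasted into the header. The variable name `version_line` is only illustrative.

```r
# Build a header line describing the R session that is currently running.
# R.version.string holds the version and release date;
# R.version$nickname holds the release nickname.
version_line <- paste0("# ", R.version.string,
                       " -- \"", R.version$nickname, "\"")

# Print the line so it can be copied into the script header
cat(version_line, "\n")
```

The exact text depends on the R installation (for example, some platforms add extra detail inside the parentheses), so it is worth regenerating the line whenever R is upgraded.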
- Libraries: All R packages that I will use for analyses are loaded in this section, at the top of the script. They should never be loaded partway through the body of the script.
# Libraries needed (for example)
library(dplyr)
library(readxl)
library(readr)
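One side benefit of listing every package up front is that a script can report missing packages before any analysis starts. The following is a rough sketch of that idea and is not part of Script_Starter.R; the names `pkgs` and `missing_pkgs` are hypothetical, and the package list simply mirrors the example above.

```r
# Packages the script expects (same list as the library() calls above)
pkgs <- c("dplyr", "readxl", "readr")

# requireNamespace() returns TRUE when a package is installed and can be
# loaded, without attaching it to the search path
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Report anything that still needs to be installed
if (length(missing_pkgs) > 0) {
  message("Install before running: ", paste(missing_pkgs, collapse = ", "))
}
```

This only reports what is missing; installing packages is left to the user, which keeps the script from changing someone's library without asking.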
- Directories: I always work in R projects. Always! So the "working directory" is always the folder in which the project is located. I declare the working directory, define the date, and specify an input directory, an output directory, and a results directory. Because the name of the results directory includes the date, a new folder is created whenever the script is re-executed on a later day, which keeps earlier results from being overwritten. This is super helpful as analyses progress. [You could use the dir.create snippet to always check/create input and output folders, but I usually do this manually at the start of every project.]
# Declare directories ####
wd <- getwd()
myDate <- format(Sys.Date(), "%Y%m%d")
input.dir <- "Input_Data"
output.dir <- "Output_Data"
# file.path() supplies the separators, so no slashes are needed here
results.dir <- paste0("Sample_Results_", myDate)
# create results folder (the output folder must already exist)
boo_Results <- dir.exists(file.path(wd, output.dir, results.dir))
if (!boo_Results) {
  dir.create(file.path(wd, output.dir, results.dir))
}
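For anyone who would rather not create the input and output folders by hand, the same `dir.exists()`/`dir.create()` pattern can cover all three folders at once. This is a sketch under assumptions rather than the author's own setup: it uses a temporary folder as a stand-in for the project folder so it can run anywhere, and the name `dirs` is hypothetical.

```r
# Stand-in for the project folder so this sketch can run anywhere;
# in a real project this would be getwd()
wd <- file.path(tempdir(), "demo_project")
myDate <- format(Sys.Date(), "%Y%m%d")

# Full paths of the input, output, and dated results folders
dirs <- file.path(wd, c("Input_Data",
                        "Output_Data",
                        file.path("Output_Data",
                                  paste0("Sample_Results_", myDate))))

# Create each folder only if it does not exist yet;
# recursive = TRUE also creates any missing parent folders
for (d in dirs) {
  if (!dir.exists(d)) {
    dir.create(d, recursive = TRUE)
  }
}
```

Because existing folders are skipped, running the block repeatedly is harmless.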
- Specify and read input files: I first specify the names of the files that I will import in one section and then read them in below. This makes the code easier to read [by a human] by avoiding code that wraps or requires the user to scroll to understand where the data are coming from. For example, I get a lot of long file names from clients (e.g., 'Here is my data 2023 - 07- 31 final version.xlsx'). It's just easier this way, trust me!
# specify input files
fn.data1 <- "Dataset1.csv"
fn.data2 <- "Dataset2.csv"
fn.data3 <- "Dataset3.xlsx"
# Read data files ####
df_data1 <- read_csv(file.path(wd, input.dir, fn.data1)
                     , na = c("NA",""), trim_ws = TRUE, skip = 0
                     , col_names = TRUE, guess_max = 100000)
df_data2 <- read_csv(file.path(wd, input.dir, fn.data2)
                     , na = c("NA",""), trim_ws = TRUE, skip = 0
                     , col_names = TRUE, guess_max = 100000)
df_data3 <- read_excel(file.path(wd, input.dir, fn.data3)
                       , na = c("NA",""), trim_ws = TRUE, skip = 0
                       , col_names = TRUE, guess_max = 100000)
# cleanup
rm(fn.data1, fn.data2, fn.data3, input.dir)
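Collecting the file names in one place also makes it cheap to confirm that every file is present before any of them is read. The sketch below illustrates that check with base R only, so it runs without extra packages. It is an assumption-laden example, not part of Script_Starter.R: the temporary folder, the toy data frame, and the names `fn.all`, `paths`, and `found` are all hypothetical.

```r
# Stand-in input folder containing one small example file
input.dir <- file.path(tempdir(), "Input_Data")
dir.create(input.dir, showWarnings = FALSE, recursive = TRUE)
write.csv(data.frame(x = 1:3),
          file.path(input.dir, "Dataset1.csv"), row.names = FALSE)

# File names gathered in one place, as in the section above;
# Dataset2.csv is deliberately not created in this sketch
fn.all <- c("Dataset1.csv", "Dataset2.csv")
paths <- file.path(input.dir, fn.all)

# Check every expected file and report the ones that are absent
found <- file.exists(paths)
if (!all(found)) {
  message("Missing input files: ", paste(fn.all[!found], collapse = ", "))
}
```

In a real script, `stop()` may be a better choice than `message()` so that nothing runs with incomplete inputs.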
I find that this starter section avoids any confusion as to what the script does, what packages it uses, where data are imported and exported, and which files are used for analyses.
That's all for now! I hope this helps and happy coding!
To keep up with this GitHub repository, sign up for my website's newsletter!