Bad Habits in R programming from Nice-R-Code site
Some Bad R Programming habits
Recently, I’ve prepared/refreshed my old programming talk materials. It was a talk I gave internally to all the summer interns in 2018. I found an very interesting and good post from the web.
This web post gives a lot of good and bad programming examples. I thought it might be worthwhile for me to share the post content here so that people including myself could be benefited from it in the future.
The original post is here.
You can also read the post below.
Avoid indexing by location unless you compute those locations.
Don’t use
T
andF
as shortcuts forTRUE
andFALSE
. This dates back to the language that predates R. For why not to do this, try runningT <- FALSE
. Now bothT
andF
areFALSE
! Try runningTRUE <- FALSE
— R prevents you from doing this.Don’t be inconsistent (i.e., be consistent) with assignment symbols, variable names, etc. The more consistent you are, the easier you will find it to revisit your code in the future.
Don’t use
attach
. I’m debating even listing this function’s name, but don’t use it. If you see it in somebody’s code, don’t copy it. Just trust me on this one.Avoid copy and pasting large chunks of code between projects. Once you get in the habit of making functions, copy those around, but I have seen huge messes of code with problems that are impossible to track down due to this.
Don’t write enormously long functions that are difficult to read (this is sometimes unavoidable to a degree in cleaning up ecological data). Try to have each function do just one thing.
Don’t write functions that depend on global variables, as this makes it hard to use those functions elsewhere (which was the point of writing them!). All data should be given as arguments to the function.
Do document your work. You probably cannot have enough describing where data files came from, what scripts do what, how particular pieces of code work. Avoid writing comments that just repeat what the code says, and try and write comments that describe your intent, reasons for approaches, sources of data / code / algorithm — things that will make your life easier when you come back to a project.
Keep things tidy. Tidy data is a wonderful data manipulation philosophy proposed by the great Hadley Wickham. You can find his paper here. In my personal opinion, every data scientist or statistical analyst should read this paper once.
Indent your code, and do it consistently (I usually set my auto-indentation by 4-space). There are many styles (google around to be amazed) but proper indentation is the difference between easy to read and hard to read code, as indent depth implies something about program structure. Take our variance function from earlier.
variance <- function(x) {
n <- length(x)
m <- mean(x)
(1 / (n-1)) * sum((x - m)^2)
}
variance(data$Height)
vs
variance <- function(x) {
n <- length(x)
m <- mean(x)
(1 / (n-1)) * sum((x - m)^2)
}
variance(data$Height)
It is much more obvious that the body of the variance function is three lines long in the second version than the first. Skimming the code, you will probably interpret the first version has only having a one line body and miss most of the function.
A good rule of thumb is to code for the person who will maintain your code. As this is usually you, that should not be too hard to motivate.
Hope you like what you just read.