R has many built-in **functions** along with several others that you can access by installing and loading various packages. This means whenever you use R, you’re already using functions whether you realize it or not. This guide will teach you how to actually *write* your own functions and explain why they can be helpful in both writing and reading code.

**The Basic Syntax of a Function**

Functions in R use the following syntax:

    function.name <- function(arg1, arg2, arg3 = 10, ...) {
      variable1 = arg1 * arg2  #some calculations
      variable1 / arg3         #return some value
    }

**function.name** – the name of the function. You can name a function whatever you’d like, but you should avoid names already used by existing R functions like *mean*, *plot*, etc., because your definition would mask them. Reserved words such as *function* cannot be used as names at all.

**arg1, arg2, arg3** – arguments of the function. Your function can have any number of arguments you’d like. Note that you can specify default values for some arguments. For example, arg3 has a default value of *10* in the example above, which is used whenever the caller does not supply arg3.

**...** – the ellipsis argument is an optional argument that allows additional arguments to be passed into the function and on to another function that it calls. This is often used for plotting.

**Function body** – the code within the curly braces {} is run every time the function is called. In general, it’s a good idea to write functions that are fairly short and do one specific thing. Note that the lines within the function body are typically indented by two spaces to make the function easier to read.

**Return value** – the value of the last expression evaluated within the curly braces {} is the value returned by the function. In the example above, the value of variable1 / arg3 will be returned by the function. Note that a function doesn’t have to return a meaningful value. For example, functions that generate plots are usually called for their side effect rather than for a returned result. Also note that you may write **return(variable1 / arg3)** explicitly in the function above, but it’s not required.
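As a minimal sketch of the return-value point above, the two hypothetical functions below (their names are illustrative and not part of base R) differ only in whether the result is returned implicitly or with an explicit call to return(). Both give the same value, and both fall back to the default for arg3 when it is not supplied.

```r
#hypothetical example functions; the names are illustrative only
divide_product_implicit <- function(arg1, arg2, arg3 = 10) {
  variable1 <- arg1 * arg2
  variable1 / arg3            #last expression is returned implicitly
}

divide_product_explicit <- function(arg1, arg2, arg3 = 10) {
  variable1 <- arg1 * arg2
  return(variable1 / arg3)    #explicit return gives the same value
}

#arg3 is not supplied, so its default of 10 is used
divide_product_implicit(4, 5)
#[1] 2

divide_product_explicit(4, 5)
#[1] 2
```

Which style to use is largely a matter of taste. An explicit return() is mainly useful when a function needs to exit early.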

The following example illustrates a function that takes three numbers as arguments and returns the product of the three numbers:

    #define the function
    multiply_three_numbers <- function(num1, num2, num3) {
      num1 * num2 * num3
    }

    #call the function
    multiply_three_numbers(1, 2, 3)
    #[1] 6

The function above took the three numbers provided as arguments and multiplied them together to return 1 * 2 * 3 = **6**.

**Using the Correct Number of Arguments**

Note that if the number of arguments provided to the function does not match the number of arguments defined in the function, and the missing arguments have no default values, an error will occur:

    #call the function with too few arguments
    multiply_three_numbers(1, 2)
    #Error in multiply_three_numbers(1, 2) : 
    #  argument "num3" is missing, with no default

    #call the function with too many arguments
    multiply_three_numbers(1, 2, 3, 4)
    #Error in multiply_three_numbers(1, 2, 3, 4) : unused argument (4)
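One way to make an argument optional is to give it a default value. The sketch below uses a hypothetical variant named multiply_with_default (not defined elsewhere in this guide) in which num3 defaults to 1. It also shows that arguments can be matched by name, in which case their order no longer matters.

```r
#hypothetical variant of the function above; num3 now has a default
multiply_with_default <- function(num1, num2, num3 = 1) {
  num1 * num2 * num3
}

#num3 is omitted, so the default value of 1 is used
multiply_with_default(2, 3)
#[1] 6

#arguments matched by name can be given in any order
multiply_with_default(num3 = 4, num1 = 2, num2 = 3)
#[1] 24
```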

**Examples of Functions**

The following examples show several different functions in action.

**Function: Find Descriptive Statistics**

The following function takes a vector as an argument and outputs a string that shows the mean, standard deviation, and number of elements in the vector:

    #define function
    descriptive_stats <- function(x) {
      #local names avoid masking the built-in mean, sd, and length functions
      mean_x = round(mean(x), 1)
      sd_x = round(sd(x), 1)
      n = length(x)
      paste("mean:", mean_x, "standard deviation:", sd_x, "count:", n)
    }

    #run function
    numbers <- c(12, 14, 15, 38, 37, 35, 24, 2, 6, 5)
    descriptive_stats(numbers)
    #[1] "mean: 18.8 standard deviation: 13.8 count: 10"

The following function performs the same calculations as the previous function, except it outputs the results as a vector:

    #define function
    descriptive_stats <- function(x) {
      #local names avoid masking the built-in mean, sd, and length functions
      mean_x = round(mean(x), 1)
      sd_x = round(sd(x), 1)
      n = length(x)
      c(mean_x, sd_x, n)
    }

    #run function
    numbers <- c(12, 14, 15, 38, 37, 35, 24, 2, 6, 5)
    descriptive_stats(numbers)
    #[1] 18.8 13.8 10.0

    #access the mean from the vector
    descriptive_stats(numbers)[1]
    #[1] 18.8

    #access the standard deviation from the vector
    descriptive_stats(numbers)[2]
    #[1] 13.8

    #access the length from the vector
    descriptive_stats(numbers)[3]
    #[1] 10
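Accessing results by position is easy to get wrong. One possible variation, sketched here on the assumption that a named vector is acceptable output, labels each element so it can be retrieved by name. The function name descriptive_stats_named and the element names are illustrative choices, not part of the original example.

```r
#hypothetical variant that returns a named vector
descriptive_stats_named <- function(x) {
  c(mean = round(mean(x), 1),
    sd = round(sd(x), 1),
    count = length(x))
}

numbers <- c(12, 14, 15, 38, 37, 35, 24, 2, 6, 5)
stats <- descriptive_stats_named(numbers)

#access each value by name rather than by position
stats[["mean"]]
#[1] 18.8

stats[["sd"]]
#[1] 13.8

stats[["count"]]
#[1] 10
```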

**Function: Extract the Last *n* Characters from a String**

The following function takes a string and a number *n* as arguments and outputs the last *n* characters of the string:

    #define function
    last_n_characters <- function(string, n){
      substr(string, nchar(string)-n+1, nchar(string))
    }

    #use function to find last 4 characters of a string
    x <- "I like to walk in the park"
    last_n_characters(x, 4)
    #[1] "park"

    #use function to find last 7 characters of a string
    x <- "My favorite color is blue"
    last_n_characters(x, 7)
    #[1] "is blue"
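Because substr() and nchar() are both vectorized, the same function should also work on a vector of several strings at once. The short sketch below repeats the definition so it can run on its own. The vector name phrases is an illustrative choice.

```r
#same definition as above, repeated so this block is self-contained
last_n_characters <- function(string, n) {
  substr(string, nchar(string) - n + 1, nchar(string))
}

#apply the function to two strings in one call
phrases <- c("I like to walk in the park", "My favorite color is blue")
last_n_characters(phrases, 4)
#[1] "park" "blue"
```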

**Function: Find Z Scores**

A **z-score** tells you how many standard deviations away from the mean a certain value lies. The following function calculates the z-scores for each value in a vector:

    #define function
    find_z_scores <- function(x){
      (x - mean(x)) / sd(x)
    }

    #use function to find z-score of each value in vector x
    x <- c(1, 14, 7, 5, 23, 12, 4, 5, 7, 4)
    find_z_scores(x)
    #[1] -1.1115724  0.8954333 -0.1852621 -0.4940322  2.2848988  0.5866632
    #[7] -0.6484172 -0.4940322 -0.1852621 -0.6484172
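A quick sanity check on a function like this is that the resulting z-scores should have a mean of 0 and a standard deviation of 1. The sketch below repeats the definition so it runs on its own. It also compares the result with the built-in scale() function, which standardizes values in the same way by default.

```r
#same definition as above, repeated so this block is self-contained
find_z_scores <- function(x) {
  (x - mean(x)) / sd(x)
}

x <- c(1, 14, 7, 5, 23, 12, 4, 5, 7, 4)
z <- find_z_scores(x)

#z-scores should be centered at 0 with a standard deviation of 1
all.equal(mean(z), 0)
#[1] TRUE

all.equal(sd(z), 1)
#[1] TRUE

#scale() returns a matrix, so convert it to a plain vector before comparing
all.equal(z, as.vector(scale(x)))
#[1] TRUE
```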

**Function: Create a Blue Plot with Pre-Defined Axis Labels**

The following function creates a scatterplot with blue dots and pre-defined axis labels. By including the argument **...** in the function definition and passing it along to *plot*, we allow the caller to supply any other arguments that the built-in *plot* function accepts.

    #define function
    blue_plot <- function(x, y, ...) {
      plot(x, y, col = 'blue', xlab = 'X-axis', ylab = 'Y-axis', ...)
    }

    #run function
    blue_plot(1:10, 1:10, pch = 17, main = 'A plot of blue triangles')

Notice that we didn’t include *pch* or *main* explicitly in the arguments of the function, but by using the ellipsis (**...**) argument, we were able to pass *pch* and *main* through to *plot* anyway.
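The same forwarding mechanism is not limited to plotting. As a small sketch, the hypothetical function rounded_mean below (the name and the digits argument are illustrative) passes anything captured by **...** on to mean(), so the caller can supply na.rm = TRUE even though the function never names that argument.

```r
#hypothetical function; extra arguments in ... are forwarded to mean()
rounded_mean <- function(x, digits = 1, ...) {
  round(mean(x, ...), digits)
}

values <- c(2, 4, NA, 9)

#without na.rm, the missing value propagates
rounded_mean(values)
#[1] NA

#na.rm is not an argument of rounded_mean, but it reaches mean() through ...
rounded_mean(values, na.rm = TRUE)
#[1] 5
```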

**What Makes a Good Function?**

A good function in R has three properties: It is (reasonably) short, performs a single operation, and has an intuitive name.

**Short** – The point of functions is to make R code easier and more convenient to both read and write. If a function is several hundred or thousands of lines long, that largely defeats the purpose of using a function.

**Performs a single operation** – A function should ideally perform a single operation to keep things as simple as possible.

**Uses intuitive names** – A function that multiplies three numbers together should possess an intuitive name like *multiply_three_numbers* as opposed to something vague like *m3*. This makes it much more convenient to understand what a function does and it allows for more efficient implementation of the function within an R script.

**Functions Increase Productivity**

Functions increase productivity by allowing you to write a function one time and simply use it over and over again to perform a single operation. Functions also make an R script more readable and user friendly.
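As a brief illustration of this reuse, the sketch below defines a small hypothetical function once (find_range_width, an illustrative name) and then applies it to several vectors with sapply() instead of repeating the calculation for each one. The example data are made up for demonstration.

```r
#hypothetical function: difference between the largest and smallest value
find_range_width <- function(x) {
  max(x) - min(x)
}

#made-up example data
datasets <- list(a = c(1, 5, 9), b = c(10, 12), c = c(3, 3, 3))

#apply the same function to every vector in the list
widths <- sapply(datasets, find_range_width)

#widths is a named vector with a = 8, b = 2, and c = 0
widths[["a"]]
#[1] 8
```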

**Additional Resources**

To learn more about functions in R, check out the following resources:

Hadley Wickham – Functions

R Documentation – Function Definition