82 Generate character vectors with rep()

Quantitative variables are great, but in simulations we’re often going to need categorical variables, as well.

In my own work these are usually some sort of “grouping” or “treatment” variable. This means I need repeated observations of each character value. The rep() function is one way to avoid writing out an entire vector manually. It replicates the elements of vectors and lists.

82.1 Using letters and LETTERS

The first argument of rep() is the vector to be repeated. One option is to write out the character vector you want to repeat. You can also get a simple character vector through the use of letters or LETTERS. These are built-in constants in R. letters contains the 26 lowercase letters of the Roman alphabet and LETTERS contains the 26 uppercase letters.

Letters can be pulled out via the extract brackets ([). I use these built-in constants for pure convenience when I need to make a basic categorical vector and it doesn’t matter what form those categories take. I find it more straightforward to type out the word and brackets than a vector of characters (complete with all those pesky quotes and such).

Here are the first two letters.

letters[1:2]
#> [1] "a" "b"

And the last 17 LETTERS.

LETTERS[10:26]
#>  [1] "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
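
As a small sketch of that convenience, the built-in constants can be indexed with any integer vector, not only a contiguous range. The typed-out vector is included purely for comparison.

```r
# Pull out non-adjacent letters by position
LETTERS[c(1, 3, 5)]
#> [1] "A" "C" "E"

# The same vector typed out by hand, quotes and all
c("A", "C", "E")
#> [1] "A" "C" "E"
```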

82.2 Repeat each element of a vector with each

There are three arguments that help us repeat the values in the vector in rep() with different patterns: each, times, and length.out. These can be used individually or in combination.

With each we repeat each unique character in the vector the defined number of times. The replication is done “elementwise”, so the repeats of each unique character are all in a row.

Let’s repeat two characters three times each. The resulting vector is 6 observations long.

This is an example of how I might make a grouping variable for simulating data to be used in a two-sample analysis.

rep(letters[1:2], each = 3)
#> [1] "a" "a" "a" "b" "b" "b"
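
To make that concrete, here is a minimal sketch of a two-sample dataset built around such a grouping variable. The names dat, group, and response, the seed, and the standard normal response are all assumptions for illustration.

```r
set.seed(16)  # arbitrary seed, only so the sketch is reproducible

# Three observations per group, six rows in total
dat <- data.frame(
  group = rep(letters[1:2], each = 3),
  response = rnorm(n = 6, mean = 0, sd = 1)  # assumed response distribution
)

# One row per observation, with the groups in consecutive rows
dat
```

Because each keeps the repeats of a value together, the rows for group "a" come first and the rows for group "b" follow.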

82.3 Repeat a whole vector with the times argument

The times argument can be used when we want to repeat the whole vector rather than repeating it elementwise.

We’ll make a two-group variable again, but this time we’ll change the repeating pattern of the values in the variable.

rep(letters[1:2], times = 3)
#> [1] "a" "b" "a" "b" "a" "b"
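
A quick way to see how each and times differ is to compare the two results side by side. In this sketch the object names by_each and by_times are made up for illustration. Both vectors hold the same values in a different order.

```r
by_each <- rep(letters[1:2], each = 3)
by_times <- rep(letters[1:2], times = 3)

# The group sizes are the same for both patterns
table(by_each)
table(by_times)

# Sorting the times version recovers the each version
identical(sort(by_times), by_each)
#> [1] TRUE
```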

82.4 Set the output vector length with the length.out argument

The length.out argument has rep() repeat the whole vector. However, it repeats the vector only until the defined length is reached. Using length.out is one way to get unbalanced groups.

Rather than defining the number of repeats like we did with each and times we define the length of the output vector.

Here we’ll make a two-group variable of length 5. This means the second group will have one fewer value than the first.

rep(letters[1:2], length.out = 5)
#> [1] "a" "b" "a" "b" "a"
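
Counting the values with table() is one way to check the imbalance. The sketch below also shows that a length.out shorter than the input simply truncates the vector. The object names unbal and short are assumptions for illustration.

```r
unbal <- rep(letters[1:2], length.out = 5)

# Group sizes: "a" appears 3 times, "b" appears 2 times
table(unbal)

# A length.out shorter than the input cuts the vector off early
short <- rep(letters[1:5], length.out = 3)
short
#> [1] "a" "b" "c"
```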

82.5 Repeat each element a different number of times

Unlike each and length.out, we can use times with a vector of values. This allows us to repeat each element of the character vector a different number of times. This is one way to simulate unbalanced groups. Using times with a vector repeats each element like each does, which can make it harder to remember which argument does what.

Let’s repeat the first element twice and the second four times.

rep(letters[1:2], times = c(2, 4))
#> [1] "a" "a" "b" "b" "b" "b"
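
When the group sizes come from somewhere else in a simulation, it can be handy to store them in a named vector and pass it to times. This is only a sketch: n_per_group and group are hypothetical names, and the sizes are the same 2 and 4 used above.

```r
# Hypothetical group sizes, named by group label
n_per_group <- c(a = 2, b = 4)

# Repeat each label according to its group size
group <- rep(names(n_per_group), times = n_per_group)
group
#> [1] "a" "a" "b" "b" "b" "b"

# A vector of equal values in times gives the same result as each
identical(rep(letters[1:2], times = c(3, 3)), rep(letters[1:2], each = 3))
#> [1] TRUE
```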

82.6 Combining each with times

As your simulation situations get more complicated, you may need more complicated patterns for your categorical variable. The each argument can be combined with times to first repeat each value elementwise (via each) and then repeat that whole pattern (via times).

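When using times this way, give it a single value to repeat the whole pattern. A vector is accepted only if its length matches the length of the vector after each has been applied, in which case it sets the number of repeats for each of those elements.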

Let’s repeat each value twice and then repeat that whole pattern 3 times.

rep(letters[1:2], each = 2, times = 3)
#>  [1] "a" "a" "b" "b" "a" "a" "b" "b" "a" "a" "b" "b"
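
A pattern like this is useful when one categorical variable is nested inside another. Here is a minimal sketch with a hypothetical design of 3 blocks, each containing 2 observations of 2 treatments. The names dat, block, and treatment and the labels "ctrl" and "trt" are assumptions for illustration.

```r
dat <- data.frame(
  # Each block label covers 4 consecutive rows
  block = rep(LETTERS[1:3], each = 4),
  # Within a block: two "ctrl" rows then two "trt" rows, repeated for 3 blocks
  treatment = rep(c("ctrl", "trt"), each = 2, times = 3)
)

# Every block-by-treatment combination has 2 observations
table(dat$block, dat$treatment)
```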

82.7 Combining each with length.out

Similarly, we can use each with length.out. If the requested length cuts the pattern short, the groups end up unbalanced.

Here we’ll repeat the two values twice each but with a total final vector length of 7.

rep(letters[1:2], each = 2, length.out = 7)
#> [1] "a" "a" "b" "b" "a" "a" "b"

Note that length.out and times don’t work together. If you supply both, length.out is given priority and times is ignored.
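
A short check of that behavior, with made-up object names with_times and without_times: supplying times alongside length.out gives the same result as leaving times out.

```r
with_times <- rep(letters[1:2], times = 3, length.out = 5)
without_times <- rep(letters[1:2], length.out = 5)

with_times
#> [1] "a" "b" "a" "b" "a"

# times had no effect on the result
identical(with_times, without_times)
#> [1] TRUE
```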