Is R's apply family more than syntactic sugar?


... in terms of execution time and / or memory.

If that is not the case, prove it with a code snippet. Note that speedup due to vectorization does not count. The speedup has to come from the apply function (tapply, sapply, ...) itself.

Reply:


The apply functions in R do not provide improved performance over other looping constructs (e.g. for). One exception is lapply, which can be a little faster because it does more of its work in C code than in R (see this question for an example).

In general, however, the rule is that you should use an apply function for clarity, not for performance.

I would add that apply functions have no side effects, which is an important distinction when it comes to functional programming with R. This can be overridden by using assign or <<-, but that can be very dangerous. Side effects also make a program harder to understand, since the state of a variable depends on its history.
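To make the point about overriding concrete, here is a minimal sketch (the names counter and squares are made up for illustration) of how `<<-` lets a function passed to sapply modify a variable outside its own call:

```r
counter <- 0

# Each call modifies `counter` in the enclosing environment via `<<-`.
squares <- sapply(1:5, function(i) {
  counter <<- counter + 1   # deliberate side effect
  i^2                       # the actual return value
})

print(counter)
print(squares)
```

After the call, counter no longer holds its initial value, which is exactly the kind of hidden state change that makes such code harder to reason about.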

Edit:

Just to emphasize this with a trivial example that computes the Fibonacci sequence recursively: it could be run multiple times to get an accurate measurement, but the point is that none of the methods performs significantly differently:
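A minimal sketch of such a comparison follows. It is an illustration rather than the original benchmark: the range 0:20 is an assumption chosen so it runs quickly, and the timings will vary by machine.

```r
# Naive recursive Fibonacci: nearly all of the time is spent inside this function.
fibo <- function(n) {
  if (n < 2) n else fibo(n - 1) + fibo(n - 2)
}

n_values <- 0:20  # assumed range, kept small so the sketch runs quickly

t_for    <- system.time(for (i in n_values) fibo(i))
t_sapply <- system.time(res_sapply <- sapply(n_values, fibo))
t_lapply <- system.time(res_lapply <- lapply(n_values, fibo))

# Elapsed seconds for each looping construct
print(c(for_loop = t_for[["elapsed"]],
        sapply   = t_sapply[["elapsed"]],
        lapply   = t_lapply[["elapsed"]]))
```

Because the work inside fibo dominates, any difference between the three constructs is likely to be lost in measurement noise.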

Edit 2:

Regarding the use of parallel packages for R (e.g. rpvm, rmpi, snow), these generally provide apply family functions (even the foreach package is essentially equivalent, despite the name). Here is a simple example of the parSapply function in snow:
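As a hedged sketch, the same idea can be shown with the parallel package that ships with R; its parSapply mirrors the snow function of the same name. The worker count of 2 and the anonymous adder function are assumptions for illustration.

```r
library(parallel)  # bundled with R; interface derived from snow

cl <- makeCluster(2)   # two local socket workers (assumed count)

# Add 3 to each element of 1:20, with the elements distributed across the workers.
res <- parSapply(cl, 1:20, function(x, y) x + y, 3)

stopCluster(cl)        # always release the workers
print(res)
```

With snow itself, the cluster would be created with makeSOCKcluster instead; the parSapply call has the same shape.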

This example uses a socket cluster, which does not require any additional software to be installed. Otherwise, you'll need something like PVM or MPI (see Tierney's clustering page). snow provides several apply-style functions, including parLapply, parSapply, parApply, parRapply and parCapply.

It makes sense that apply functions should be used for parallel execution, because they have no side effects. If you change a variable's value within a for loop, it is set globally. All apply functions, on the other hand, can safely be used in parallel because changes are local to the function call (unless you use assign or <<-, in which case you can introduce side effects). Needless to say, it is important to be careful with local and global variables, especially when it comes to parallel execution.

Edit:

Here is a trivial example to demonstrate the difference between for and *apply in relation to side effects:
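A minimal sketch of that comparison, using a hypothetical vector x and helper names chosen for illustration:

```r
x <- 1:10

# *apply: the assignment happens inside the anonymous function's own
# environment, so the outer x is left untouched.
invisible(lapply(2:3, function(i) x <- x * i))
x_after_lapply <- x

# for: the loop body runs in the calling environment, so x is overwritten
# on every iteration (first times 2, then times 3).
for (i in 2:3) x <- x * i
x_after_for <- x

print(x_after_lapply)
print(x_after_for)
```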

Note how the variable in the parent environment is changed by the for loop, but not by *apply.







Sometimes the speedup can be substantial, for example when you have to nest for loops to get the average over a grouping by more than one factor. Here are two approaches that give you exactly the same result:
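The following is a sketch of two such approaches, not the original code: the data, the sample size and the names loop_means and tapply_means are assumptions. It averages a numeric vector over every combination of a 5-level and a 10-level factor.

```r
set.seed(1)
n  <- 10000                                   # assumed sample size
x  <- rnorm(n)
f1 <- factor(sample(letters[1:5],  n, replace = TRUE))
f2 <- factor(sample(letters[1:10], n, replace = TRUE))

# Approach 1: nested for loops over the levels of both factors
loop_means <- function(x, f1, f2) {
  res <- matrix(NA_real_, nlevels(f1), nlevels(f2),
                dimnames = list(levels(f1), levels(f2)))
  for (i in levels(f1)) {
    for (j in levels(f2)) {
      res[i, j] <- mean(x[f1 == i & f2 == j])
    }
  }
  res
}

# Approach 2: a single tapply call over both factors
tapply_means <- function(x, f1, f2) tapply(x, list(f1, f2), mean)

t_loop   <- system.time(m_loop   <- loop_means(x, f1, f2))
t_tapply <- system.time(m_tapply <- tapply_means(x, f1, f2))

print(all.equal(m_loop, m_tapply))
print(c(loop = t_loop[["elapsed"]], tapply = t_tapply[["elapsed"]]))
```

How large the gap is depends heavily on how the loop is written and on the data size, so treat any timing from this sketch as indicative only.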

Both give exactly the same result, namely a 5 x 10 matrix with the averages and named rows and columns. But their running times differ considerably.

There you go. What did I win? ;-)







... and as I just wrote elsewhere, vapply is your friend! ... it's like sapply, but you also specify the return value type which makes it much faster.
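As a small illustration (the list x is made up), vapply takes a template for the return value through FUN.VALUE and raises an error if a result does not match it:

```r
x <- list(a = 1:3, b = 4:6, c = 7:9)

s <- sapply(x, mean)                          # return type is guessed from the results
v <- vapply(x, mean, FUN.VALUE = numeric(1))  # return type is declared up front

print(identical(s, v))

# range() returns two values, which violates the declared numeric(1) template.
err <- tryCatch(vapply(x, range, FUN.VALUE = numeric(1)),
                error = function(e) e)
print(inherits(err, "error"))
```

Besides any speed benefit, the declared type means a function that unexpectedly returns the wrong shape fails loudly instead of silently changing the result type.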

Update from January 1, 2020:



I've written elsewhere that an example like Shane's doesn't really stress the difference in performance between the various kinds of looping syntax, since all of the time is spent inside the function rather than in the loop itself. In addition, the code unfairly compares a for loop that stores nothing with apply family functions that return a value. Here is a slightly different example that highlights the point.
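A sketch of what such a comparison can look like, under assumed sizes and names: the function foo is deliberately trivial so that loop overhead dominates, and both variants store their results.

```r
foo <- function(x) x + 1    # trivial on purpose: the loop overhead dominates

y <- numeric(1e5)           # assumed input size

# for loop that stores every result in a preallocated vector
t_for <- system.time({
  z_for <- numeric(length(y))
  for (i in seq_along(y)) z_for[i] <- foo(y[i])
})

# lapply collects the results itself (as a list)
t_lapply <- system.time(z_lapply <- lapply(y, foo))

print(c(for_loop = t_for[["elapsed"]], lapply = t_lapply[["elapsed"]]))
```

The measured ratio depends on the R version, since newer versions byte-compile loops, so the numbers are worth re-running locally rather than taken as fixed.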

If you plan to save the result, apply family functions can be much more than syntactic sugar.

(Simply unlisting z takes only 0.2 s, so lapply is still much faster. Initializing z inside the for loop is quite fast; since I am reporting the average of the last 5 of 6 runs, moving it outside system.time would hardly change anything.)

Another thing to note, however, is that there is one more reason to use apply family functions, independent of their performance, clarity, or lack of side effects. A for loop typically encourages putting as much as possible inside the loop, because each loop requires setting up variables to store information (among other possible operations). Apply statements tend to be biased the other way. Often you want to perform multiple operations on your data, some of which can be vectorized and some of which cannot. In R, unlike in other languages, it is best to separate these operations: run those that cannot be vectorized in an apply statement (or a vectorized version of the function) and those that can as true vector operations. This often speeds things up enormously.

Taking Joris Meys's example, where he replaces a traditional for loop with a handy R function, we can use it to show that writing code in a more R-friendly way gives a similar speedup without the specialized function.
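One way this can be sketched (an illustration under assumed data and names, not the code from the original post) is to build the grouping in vectorized steps and leave only the mean itself inside a single apply-style call:

```r
set.seed(1)
n  <- 10000                                   # assumed sample size
x  <- rnorm(n)
f1 <- factor(sample(letters[1:5],  n, replace = TRUE))
f2 <- factor(sample(letters[1:10], n, replace = TRUE))

# Vectorized setup: one combined factor whose levels vary fastest in f1
grp <- interaction(f1, f2)

# The only per-group operation: one mean per cell
cell_means <- vapply(split(x, grp), mean, numeric(1))

# Vectorized reshape into a 5 x 10 matrix (filled column by column)
m_split <- matrix(cell_means, nrow = nlevels(f1),
                  dimnames = list(levels(f1), levels(f2)))

# Reference result from the specialized function
m_ref <- tapply(x, list(f1, f2), mean)
print(all.equal(m_split, m_ref))
```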

This is much faster than the for loop and just a little slower than the built-in optimized function. That's not because the apply call is so much faster than for, but because only one operation is performed on each iteration. In this code, everything else is vectorized. In Joris Meys's traditional for loop, many (7?) operations occur in each iteration, and there is quite a bit of setup just to get it to run. Note also how much more compact this is than the for version.







When applying functions over subsets of a vector, tapply can be considerably faster than a for loop. Example:
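A sketch of such a comparison, with assumed data and the hypothetical helper names sum_tapply and sum_loop, summing a value column per id:

```r
set.seed(42)
d <- data.frame(id = rep(letters[1:10], 1000), value = rnorm(10000))

# Grouped sum via tapply
sum_tapply <- function(d) tapply(d$value, d$id, sum)

# The same grouped sum via an explicit loop over the unique ids
sum_loop <- function(d) {
  ids <- unique(d$id)
  res <- numeric(length(ids))
  for (k in seq_along(ids)) res[k] <- sum(d$value[d$id == ids[k]])
  names(res) <- as.character(ids)
  res
}

t_tapply <- system.time(r_tapply <- sum_tapply(d))
t_loop   <- system.time(r_loop   <- sum_loop(d))

print(c(tapply = t_tapply[["elapsed"]], loop = t_loop[["elapsed"]]))
```

The data here are small, so both variants finish almost instantly; a larger data frame is needed to see a meaningful difference.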

With apply, however, there is no speedup in most situations, and in some cases it can even be much slower:
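To try this out, here is a sketch that computes column sums of a matrix once with apply and once with an explicit loop. The matrix size and the function names are assumptions for illustration.

```r
set.seed(1)
mat <- matrix(rnorm(1e5), nrow = 100)   # assumed 100 x 1000 matrix

col_sums_apply <- function(m) apply(m, 2, sum)

col_sums_loop <- function(m) {
  res <- numeric(ncol(m))
  for (j in seq_len(ncol(m))) res[j] <- sum(m[, j])
  res
}

t_apply <- system.time(s_apply <- col_sums_apply(mat))
t_loop  <- system.time(s_loop  <- col_sums_loop(mat))

print(c(apply = t_apply[["elapsed"]], loop = t_loop[["elapsed"]]))
```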

But for these situations we have colSums and rowSums:
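For sums over rows or columns of a matrix, base R ships dedicated functions, so neither apply nor a loop is needed. A minimal sketch on an assumed random matrix:

```r
set.seed(1)
mat <- matrix(rnorm(1e5), nrow = 100)   # assumed 100 x 1000 matrix

cs <- colSums(mat)   # one sum per column
rs <- rowSums(mat)   # one sum per row

# Same values as the apply-based versions, up to floating-point tolerance
print(all.equal(cs, apply(mat, 2, sum)))
print(all.equal(rs, apply(mat, 1, sum)))
```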

