What is the programming language R used for?

Introduction to RStudio

As the course progresses, we will alternate between two different approaches to learning, which I call “top down” and “bottom up”; I will explain the idea behind each of them in detail. Getting started with R is made much easier by RStudio. We will discuss the features and handling of this program in detail below.

Regardless of the approach, we will carry out exercises in R on each topic right away. Most of these exercises are described in these documents, and the corresponding R code can be copied directly from them into RStudio and executed.

In addition, homework assignments must be completed between the course sessions. I will announce the exact procedure during the course. Your performance in the course will be assessed through interactive online questions. I will also announce details on this during the course.

Aims

  • Layout of RStudio.
  • Where and how to find useful information about R and RStudio.
  • Definition of data and objects in R.
  • Basic logic of commands and functions in R.
  • Reading in and editing data.
  • Requesting simple descriptive statistics.
  • Creating and editing simple graphics.
  • Knowing some useful (statistical) functions.

General

Below is a brief overview of the history of R.

History of R¹

Figure 1: From F to R

  • Developed in 1992, R is closely based on the S language. Another source of inspiration was the Scheme programming language. The software was first presented to the public in 1993, and since June 1995 R has been licensed under the GNU General Public License.
  • By 1996 or 1997, between 50 and 100 people on a mailing list were working together to improve the language. In 1997 the R Development Core Team (today the R Core Team) was formed; it oversees the further development of R and can change the source code.
  • The Comprehensive R Archive Network (CRAN), a platform for packages, started on April 23, 1997, making it easier for users to share self-written functions with others.
  • R for macOS has been available since April 2001. In September 2002 the members of the R Development Core Team founded the non-profit association The R Foundation for Statistical Computing in Vienna, which represents R to the outside world.
  • R version 2.0 was released on October 4, 2004. Since then, R has used lazy loading, which loads data quickly and with little memory usage.
  • From version 2.1 (April 18, 2005) onward, R supports different languages (internationalization) and character encodings, especially UTF-8.
  • Version 2.11 (April 2010) made R usable on 64-bit Windows systems, where it can address up to eight terabytes of RAM, and thus opened the door to big data processing.
  • In October 2011, version 2.14 introduced built-in parallel execution of functions (package parallel).
  • In addition to all these technical developments, the community has developed countless packages for every conceivable area of application.
  • As of November 2018, there are 13,346 packages on CRAN alone (you can query the current number at any time with the command available.packages()). Packages from other repositories (e.g. Bioconductor² with 1,649 packages) are not even counted!
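The last bullet mentions available.packages(). A minimal sketch of that query follows, assuming an internet connection; the mirror URL https://cloud.r-project.org is our choice for illustration (any CRAN mirror works) and is set explicitly because non-interactive R sessions have no default mirror:

```r
# Download the package index of one CRAN mirror.
# Assumption: the cloud mirror below is reachable from your network.
pkgs <- available.packages(repos = "https://cloud.r-project.org")

# Each row of the returned matrix describes one package,
# so the number of rows is the current number of CRAN packages.
n_pkgs <- nrow(pkgs)
n_pkgs
```

The printed count will differ from the 2018 figure, because CRAN keeps growing. Without network access, R issues a warning and the count is simply 0.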

Why use R?

R is a programming language that is used particularly for the analysis and visualization of data. Many new developments in statistics happen in R. Even without previous knowledge of a programming language, you can quickly find your way around in R.

With the help of RStudio and the countless tutorials and help pages, you can write your first useful programs after a very short time. All of these tools are open source, i.e. transparent, freely modifiable and, above all, free of charge. If you weigh the advantages and disadvantages of using R, the decision to invest work in R is relatively easy to justify:

  • Advantages
    • Compared with common programs for statistical analysis such as “SPSS”, R has the advantage that it is available free of charge all over the world.
    • R can import most popular data formats and stores the data sets you create in an open format.
    • In addition, R provides more powerful and more numerous analysis methods than many other programs.
    • R is a programming environment. Functions can easily be adapted to your own needs. Complex problems can be solved even if the developers have not (yet) implemented a solution for them.
    • R is continuously developed further by the scientific community. New statistical methods are usually integrated into R as well. A standardized package system makes it easy to install additional packages and to publish your own.
    • R creates professional graphics in a variety of formats.
    • R facilitates the reproducibility of studies through partially self-documenting code.
    • R offers a variety of online tutorials, blogs and other forms of assistance.
  • Disadvantages
    • For beginners, the functionality and operation of R will undoubtedly take some getting used to.
    • The formatting of results in R (tables, graphics, etc.) often has to be done by the user in additional program steps.
    • R is basically code-based. User-friendly dialog windows for setting the parameters of analyses must be created by the programmer with the help of appropriate program packages (e.g. Shiny, Java, etc.).
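Two of the advantages listed above, an open data format and easily adaptable functions, can be sketched in a few lines of base R. This is a minimal illustration using the built-in mtcars data set; the function name describe() is hypothetical and chosen only for this example:

```r
# mtcars is a data set that ships with R.
# Export it to an open, plain-text format (CSV) in a temporary file ...
csv_file <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_file)

# ... and read it back in; row.names = 1 restores the car names
mtcars_csv <- read.csv(csv_file, row.names = 1)

# Hypothetical self-written function that combines existing functions:
# number of non-missing values, mean and standard deviation in one call
describe <- function(x, digits = 2) {
  x <- x[!is.na(x)]
  round(c(n = length(x), mean = mean(x), sd = sd(x)), digits)
}

describe(mtcars_csv$mpg)
```

Because the CSV file is plain text, it can be opened by practically any other program, and describe() can be changed at any time, for example to report the median as well.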

When should you use R?

A hardcoRe user would not understand this question and would probably go back to programming, shaking their head, without paying the questioner any further attention. For everyone else, however, the question arises whether, and from what point, the effort of learning a programming language is worthwhile. Ultimately, everyone has to decide this for themselves. Good reasons could be, for example, if:

  • no funds are available to purchase commercial statistical software.
  • the available software cannot solve the statistical problem at hand.
  • you want to develop an evaluation strategy (sanity checks, standard procedures, etc.) in a team.
  • you want to use dynamic reporting during a research process.
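To illustrate the point about shared sanity checks, here is a minimal sketch of a check function that a team could agree on and run before every analysis. The function check_data() and the checks it performs are hypothetical examples, demonstrated on R's built-in airquality data:

```r
# Hypothetical team-wide sanity check, run before every analysis
check_data <- function(df, required_cols) {
  # Stop early if columns the analysis relies on are missing
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Report the number of rows and the share of missing values per column
  list(
    n_rows   = nrow(df),
    na_share = colMeans(is.na(df[required_cols]))
  )
}

res <- check_data(airquality, c("Ozone", "Temp"))
res$na_share
```

Because the checks live in one shared function, every team member applies the same rules; which checks make sense would of course depend on the team's data.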

When should you not use R?

If the sole aim is to analyze a data set exploratively and/or with common analysis methods, you can fall back on existing user-friendly applications. Below, we briefly take a look at two very useful programs.

If programs such as SPSS, SAS or similar are available, you should be clear about your goals before starting with R and weigh costs against benefits. Very often it is much easier to learn the syntax of the respective program than to solve everything in R. Note also that SPSS (at least) already offers an interface to R. Details can be found in the corresponding manuals.

Other important considerations when switching to or introducing R are:

  • How many employees use a statistical analysis program, and how many of them would have to learn R?
  • Which analysis methods do I want to use in the long term? Are there procedures that cannot be implemented with SPSS, SAS, etc.?
  • How far can I rely on the correctness of the results from R? After all, everything is free software, and no liability is assumed for the correctness of the results. Does that play a role in my decision?

Alternatives to R

HardcoRe users do not know the word “alternative” in connection with R. The following figures therefore give an overview of the most important statistical programs on the market as of November 2018³, without claiming that any of them is a real alternative to R.

Free software:

Figure 2: Freely available statistics programs

Proprietary software:

Figure 3: Fee-based statistics programs

As of November 2018, R ranks 14th in the TIOBE index⁴, three places ahead of MATLAB! SPSS is not included in the current ranking. The usage history of R since 2010 suggests that R will retain, if not significantly expand, its importance for statistical analysis in the years to come.

Figure 4: History of the TIOBE index for R since 2008 (the ratings are calculated by counting hits on the most popular search engines; the number of hits determines a language's rating. The hits are normalized per search engine across all languages in the list, so all languages together have a score of 100%.)

In addition to this list, there are many other applications, some of which have been tailor-made for very specific analysis techniques; if necessary, simply search for them on the web.

JAMOVI

JAMOVI is a computer program for data analysis and statistical testing.

Figure 5: Brief description of JAMOVI

JAMOVI uses R for its statistical computations. This means you can switch as needed between the convenience of JAMOVI and a more detailed further development of the analyses in R. The R package jmv provides the prerequisites for switching easily between R and JAMOVI. To make the R code used by JAMOVI visible, select the following setting in JAMOVI:

Figure 6: Representation of the R code in JAMOVI

If you copy this code from JAMOVI into R, you can edit and extend it:
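The code that JAMOVI displayed is not reproduced in this text. As a hedged sketch, assuming the jmv package has been installed from CRAN, a descriptive-statistics call of the kind JAMOVI generates might look like this when run in R on the built-in ToothGrowth data:

```r
# Sketch of a jmv call as it might be copied from JAMOVI into R.
# Assumption: the jmv package is installed (install.packages("jmv")).
if (requireNamespace("jmv", quietly = TRUE)) {
  desc <- jmv::descriptives(
    data = ToothGrowth,        # built-in example data set
    vars = c("len", "dose")    # variables to describe
  )
  print(desc)                  # result table, now editable in R
} else {
  message("Please install jmv first: install.packages(\"jmv\")")
}
```

Once the call is in an R script, it can be changed, rerun on other data or combined with any other R code.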

JASP

JASP is also a computer program for data analysis and statistical testing, with a range of functions similar to JAMOVI's. A special feature is that most functions are available not only in the usual (“frequentist”) form but also in a second form based on Bayesian statistics. As with JAMOVI, the range of functions can be extended with modules.

Figure 7: Brief description of JASP

Besides Bayesian statistics, JASP's data sets for getting started with the program are also worth mentioning. They are sorted by subject area and referenced by origin, so you can experiment with the corresponding procedures and, if necessary, look up the literature that uses the same data sets to explain them. You can also export these data and use them in JAMOVI!