Learn R


The Lynda course continues with examples on how to create charts and plots, check statistical assumptions and the reliability of your data, look for data outliers, and use other data analysis tools. Finally, learn how to get charts and tables out of R and share your results with presentations and web pages.

Topics include:

  • What is R?
  • Installing R
  • Creating bar charts for categorical variables
  • Building histograms
  • Calculating frequencies and descriptives
  • Computing new variables
  • Creating scatterplots
  • Comparing means

Up and Running with R

Vignettes show worked examples of how to use a package.

# Brings up list of vignettes (examples) in editor window
> vignette(package = "qcc")
# Open web page with hyperlinks for vignette PDFs etc.
> browseVignettes(package = "qcc")

Charts and Statistics for One Variable

> barplot(site.freq[order(site.freq, decreasing = TRUE)])
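
The call above assumes that site.freq already holds a table of counts. As a minimal sketch with made-up data (the site vector below is hypothetical, not the course's survey file), such a table can be built with table() and sorted with order():

```r
# Hypothetical responses; the course data set is not reproduced here
site <- c("Facebook", "Twitter", "Facebook", "LinkedIn",
          "Facebook", "Twitter", "None")

# Count how often each category occurs
site.freq <- table(site)

# order() returns the positions that sort the counts;
# decreasing = TRUE puts the most frequent category first
sorted.freq <- site.freq[order(site.freq, decreasing = TRUE)]
print(sorted.freq)

# Draw the sorted bars
barplot(sorted.freq)
```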

We can specify the color of each bar.

> fbba <- c(rep("gray", 5),
+           rgb(59, 89, 152, maxColorValue = 255))
> barplot(site.freq[order(site.freq)], horiz = TRUE, col = fbba)

We can customize the bar plot further.

barplot(site.freq[order(site.freq)],
        horiz = TRUE,      # Horizontal
        col = fbba,        # Use colors "fbba"
        border = NA,       # No borders
        xlim = c(0, 100),  # Scale from 0-100
        main = "Preferred Social Networking Site\nA Survey of 202 Users",
        xlab = "Number of Users")

hist(sn$Age,
     #border = NA,
     col = "beige", # Or use: col = colors() [18]
     main = "Ages of Respondents\nSocial Networking Survey of 202 Users",
     xlab = "Age of Respondents")

R has a built-in list of colors. We can refer to a color chart to find a color by name or by index.
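
As a small sketch using only base R, colors() returns the vector of built-in color names, so a color can be looked up in either direction without a chart:

```r
# All built-in color names, as a character vector
all.colors <- colors()

# Look up a color name by its index
all.colors[18]

# Look up the index of a color by its name
which(all.colors == "beige")

# Either form can be passed to col =
# (the values plotted here are arbitrary, for illustration only)
barplot(c(3, 5, 2), col = all.colors[18])
```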

boxplot(sn$Age,
        col = "beige",
        notch = TRUE,
        horizontal = TRUE,
        main = "Ages of Respondents\nSocial Networking Survey of 202 Users",
        xlab = "Age of Respondents")

We can use the summary() function to describe the data.
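
A minimal sketch of what summary() reports, using made-up vectors (both age and gender below are hypothetical, not the survey data):

```r
# Hypothetical ages, for illustration only
age <- c(21, 25, 25, 30, 34, 41, 52, NA)

# Minimum, quartiles, median, mean, maximum, and the count of NAs
summary(age)

# For a categorical variable stored as a factor,
# summary() gives the count in each level
gender <- factor(c("F", "M", "F", "F", "M", "F", "M", "F"))
summary(gender)
```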

Charts for Associations

# Is there an association between the percentage of people
# in a state with college degrees and interest in
# data visualization?
plot(google$degree, google$data_viz,
     main = "Interest in Data Visualization Searches\nby Percent of Population with College Degrees",
     xlab = "Population with College Degrees",
     ylab = "Searches for \"Data Visualization\"",
     pch = 20,
     col = "grey")
# Linear regression line (y ~ x) 
abline(lm(google$data_viz ~ google$degree), col="red")
# Lowess smoother line (x, y)
lines(lowess(google$degree, google$data_viz), col="blue")
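
The google data frame comes from the course files and is not reproduced here. As a rough, self-contained sketch of the same pattern, the following uses R's built-in cars data set as a stand-in (an assumption for illustration, not the course data):

```r
# Built-in data set: speed (mph) and stopping distance (ft) of 50 cars
data(cars)

# Scatterplot of the two variables
plot(cars$speed, cars$dist,
     main = "Stopping Distance by Speed",
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     pch = 20,
     col = "grey")

# Fit the linear model once so it can be drawn and inspected
fit <- lm(dist ~ speed, data = cars)
abline(fit, col = "red")

# Lowess smoother takes (x, y) rather than a formula
lines(lowess(cars$speed, cars$dist), col = "blue")

# Intercept and slope of the red line
coef(fit)
```

Storing the model in fit, rather than calling lm() inside abline(), makes it possible to inspect the coefficients afterwards.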

# Use "Pairs Plot" from "psych" package
install.packages("psych")  # Only needed once per R installation
library("psych")
pairs.panels(google[c(3, 7, 4, 5)], gap = 0)

With the psych package, we can draw a scatterplot matrix that carries more information.

On the diagonal, each variable gets a histogram, and on top of it we have overlaid what is called a kernel density estimator. It's like a normal distribution, but it can have bumps in it; you can see that on degree. At the very bottom of each histogram, and it's really tiny, there is a sort of dot plot made of tiny vertical lines that shows where the actual scores are. Then we have the scatterplots, on the bottom-left side of the matrix. Each scatterplot has a dot at the means of the two variables.

A lowess smoother runs through each scatterplot; that's the curved red line. The ellipse is sort of a confidence interval for the correlation coefficient: the longer and narrower the ellipse, the stronger the association, and the rounder it is, the weaker the association. The numbers on the top-right side mirror these scatterplots; they are the correlation coefficients for each pair. So, for instance, we can see that the correlation between data_viz and degree is positive, and it's 0.75. That's a very strong association.

Correlations range from -1 to 1. Zero means no linear relationship, and 1 or -1 means a perfect linear relationship. They are positive if there is an uphill relationship and negative if it's downhill. On the other hand, the scatterplot of interest in data_viz against interest in NBA as a search term (that's the one in the very bottom left) is kind of circular and scattered all over the place. If you look at the very top right of this matrix, you see the correlation is 0.23. It's not very strong. Anyhow, this is a really rich kind of matrix: it shows histograms, dot plots, kernel density estimators, scatterplots with lowess smoothers, and correlations, and it's one of the great reasons for using the psych package.
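
A quick sketch of how the sign and size of a correlation coefficient behave, using base R's cor(). The vector x is made up, and the built-in mtcars data set stands in for the course's google data (an assumption for illustration):

```r
x <- 1:10

# Perfect uphill linear relationship: correlation of 1
cor(x, 2 * x + 1)

# Perfect downhill linear relationship: correlation of -1
cor(x, -3 * x)

# Correlation matrix for several columns at once,
# using the built-in mtcars data set as a stand-in
round(cor(mtcars[c("mpg", "wt", "hp")]), 2)
```

The matrix returned by cor() holds the same kind of numbers that pairs.panels() prints in the upper-right cells.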
