Because I was fully expecting to fail the assessment, I took screenshots of the questions, thinking I would work them out later and try again. Having passed it instead, I am curious enough to go back and work out the answers anyway. It's very possible that I got the following problems wrong, so do leave a comment if you disagree with my answers.
If you are doing the LinkedIn R assessment test, I highly recommend having R open. If you don't have it installed, use an online R interpreter such as https://www.tutorialspoint.com/execute_r_online.php.
R Programming Notesheet
I am also reviewing R more seriously now (June 2020), and have started a note sheet covering R syntax. For those of you who are interviewing and need a refresher on R, check the post here. It is still a work in progress.
LinkedIn R Assessment Questions
So here's the handful of LinkedIn R Assessment questions and my answers. Again, might be right; might be wrong.
1) The correlation between predictors a and b is 0.0. The statement anova(lm(y ~ a+b, dataset)) returns an SS total of 120 and an SS of y on a of 67.5. The total R-squared is 0.75. What percent of the variability of y is attributable to predictor b?
This is a statistics question. Even without knowing anything about R, seeing anova(lm(y ~ a+b, dataset)) should suggest to you that it's asking about the ANOVA table of a standard OLS regression.
The formula for R^2 is SSr/SSt (equivalently, 1 - SSe/SSt), where SSr is the "explained" (regression) sum of squares, SSe is the residual sum of squares, and SSt is the total sum of squares.
We are given that the independent variables are uncorrelated, so we can just sum the SSr for y ~ a and the SSr for y ~ b to get the SSr of y ~ a+b. We are also given that the sum of squares for y ~ a is 67.5, the SSt of y ~ a+b is 120, and R^2 (for y ~ a+b) is 0.75.
Knowing that R^2 = SSr/SSt, we can fill in the numbers as 0.75 = (67.5+?)/120 and solve for ?. ? is 22.5.
So the sum of squares attributable to b is 22.5. Since the question asks for a percent, that is 22.5/120 = 18.75% of the variability of y.
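The arithmetic can be checked in R. This is only a sketch of the calculation using the numbers given in the question; the variable names are my own, and it also expresses the result as a percent of SS total.

```r
ss_total <- 120    # SS total from the question
ss_a     <- 67.5   # SS of y on a
r2       <- 0.75   # total R-squared

# With uncorrelated predictors, SS(a) + SS(b) = R^2 * SS total
ss_b  <- r2 * ss_total - ss_a
pct_b <- 100 * ss_b / ss_total

ss_b    # 22.5
pct_b   # 18.75
```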
2) How many values does each element of xrange contain?
rawdata <- c(rnorm(...), ...)
fact <- gl(...)
xrange <- tapply(rawdata, fact, range, simplify=TRUE)
The answer is 2.
rnorm() generates random normal variates. c() just concatenates them into one vector. The gl function generates factor levels: gl(n,k) expects an integer n giving the number of levels and an integer k giving the number of replications, sort of like repmat in MATLAB. So gl(2,3) gives "1 1 1 2 2 2". Finally, tapply breaks the vector into groups by factor level and applies a function to each group; here that function is range. The details don't matter for this question, other than that you should know range returns 2 numbers (the minimum and the maximum).
You can just type the above into R and find out. If you do that, you'll get something like:
$`1`
[1] -1.136463 1.295984
$`2`
[1] -1.864532 1.330702
$`3`
[1] -1.437063 2.343262
$`4`
[1] -1.602229 2.204354
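Since the exact arguments don't matter for the question, here is a small sketch with made-up sizes (4 groups of 10 draws each, which is my assumption and not the original code) that confirms each element of the result holds 2 values.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
rawdata <- rnorm(40)             # 40 random normal draws (assumed size)
fact    <- gl(4, 10)             # 4 levels, each repeated 10 times
xrange  <- tapply(rawdata, fact, range, simplify = TRUE)

length(xrange)                   # number of groups
sapply(xrange, length)           # values per group: min and max
```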
3) A data frame named pizza includes a numeric column named week. Missing values are coded as NA. Which function returns the percent of rows with an NA value for the week?
So missing values are encoded as NA.
From Matlab, I know is.na will probably return a T/F value for each element, indicating whether it is NA or not.
Since we want the rows with NA, we do not need the ! to flip the T/F values. Taking the mean of those T/F values gives the fraction of rows that are NA (multiply by 100 for a percent), so the answer is mean(is.na(pizza$week)). So it is just a matter of making sure I know the names of the functions in R (instead of Matlab or Stata or SAS or Perl).
You can check using the following code:
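pizza <- data.frame(week = c(NA, NA, 2))  # toy data frame with a week column
is.na(pizza$week)        # TRUE TRUE FALSE
mean(is.na(pizza$week))  # fraction of rows that are NA (2/3 here)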
4) Which set of two statements -- followed by the cbind() function -- results in a data frame named vbound?
cbind takes objects and binds them together column-wise. The result depends on what goes in: if you combine two plain vectors, you get back a matrix, but if at least one of the arguments is a data frame, you get back a data frame. So to get a data frame, you would do something like this:
V1 <- c(...)
V2 <- data.frame(c(...))
Vbound <- cbind(V1, V2)
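A minimal sketch of how cbind behaves, using made-up values and a hypothetical column name b: binding a vector to a data frame gives a data frame, while binding two plain vectors gives a matrix.

```r
v1 <- c(1, 2, 3)                      # plain numeric vector (made-up values)
v2 <- data.frame(b = c(4, 5, 6))      # one-column data frame (made-up values)

vbound <- cbind(v1, v2)
is.data.frame(vbound)                 # TRUE: a data frame is involved

m <- cbind(v1, v1)
is.matrix(m)                          # TRUE: two plain vectors give a matrix
```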
5) X is a vector of type integer, as shown in line 1 below. What is the type of the result returned by the statement > median(x)?
x <- c(...)
You can sort of guess that the L suffix on each value is just a way to specify type: L marks a value as integer type. median, I am guessing, either returns a predetermined type or returns the most general type in the vector. Since the whole vector is integer type, it probably returns an integer if it can. (It can't if there is an even number of elements in the vector, since then it needs to return the average of the two middle numbers, which is likely a decimal.)
From the documentation: "The default method returns a length-one object of the same type as x, except when x is logical or integer of even length, when the result will be double."
So the answer is double if there is an even number of items in x, and integer if there is an odd number of items.
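This is easy to confirm with typeof. The two vectors below are made-up examples (not the one from the question), one of odd length and one of even length.

```r
x_odd  <- c(3L, 1L, 2L)        # odd length, made-up values
x_even <- c(3L, 1L, 2L, 4L)    # even length, made-up values

typeof(x_odd)                  # "integer"
typeof(median(x_odd))          # "integer"
typeof(median(x_even))         # "double"
```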
6) Review the code below. Can you coerce mylist to a data frame using as.data.frame?
mylist <- list(...)
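Since a list is a generic container (like a cell array in MATLAB) and a data frame is itself a list of equal-length columns, I am guessing the answer is yes. As the question suggests, this is a "coercion", and as.data.frame handles it directly as long as the elements have compatible lengths. The following works.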
mylist <- list(...)
df <- as.data.frame(mylist)
df
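As a small illustration, here is the same coercion on a made-up named list (the names id and label are hypothetical, not from the question). Each list element becomes a column.

```r
mylist <- list(id = 1:3, label = c("a", "b", "c"))  # made-up named list
df <- as.data.frame(mylist)

is.data.frame(df)   # TRUE
dim(df)             # 3 rows, 2 columns
names(df)           # "id" "label"
```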
7) How do you show the names of all the objects in memory that contain the letter V?
It's purely a syntax question. I just entered some sample code to check. Turns out the right answer is ls(pat="V"), where pat is matched to the pattern argument of ls by R's partial argument matching.
Example code:
aa = 3
aV = 4
ls(pat="V")
8) Why does sum(!is.na(pizza$week)) return the number of rows with valid, non-NA values in the column named week?
See question 3 above. The ! negates the true/false values, and T counts as 1 and F as 0 when summed, as in most programming languages. So !is.na gives the things that are not NA the value 1. Summing these 1s and 0s returns the number of things in the column that are not NA.
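Questions 3 and 8 can be checked side by side. This sketch uses a made-up pizza data frame with five rows, two of which are NA.

```r
pizza <- data.frame(week = c(1, NA, 3, NA, 5))  # made-up data

is.na(pizza$week)          # FALSE TRUE FALSE TRUE FALSE
sum(!is.na(pizza$week))    # 3 valid rows
mean(is.na(pizza$week))    # 0.4, i.e. 40% of rows are NA
```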