Thursday, 17 July 2014

The Data Analysts Toolkit Day 4: Getting to know R a bit better

So what actually is R? Let's take a look at running some commands, and we can talk about it as we go along.

R is a powerful environment for statistical computing
It is like a calculator and its power is: 
… it lets you save results in variables. Let's take a look:

> x <- 3="" font="">
> y <- 5="" font="">
> z = 4
> x + y + z

Ok, so we can run little commands and it gives us numbers back. What's all the fuss about?

We can save data in files. They are saved as .RData files. R lets us import and export data in lots of different ways. If it sounds fun, it is because it is! Let's take a look.

> someData <- 1:10="" font="" nbsp="">
> save(someData, file = "D:/DataAnalystsToolkit/someData.Rdata") 
> rm(someData) 
> load("D:/DataAnalystsToolkit/someData.Rdata") 
> print(someData)

Well, that is great if our data is in RData format. But normally we need to import data from different sources.

Import from CSV File
Read.csv()
MyDataFrame <- div="" filepath.csv="" nbsp="" read.csv="">
print(MyDataFrame)

Alternatively we can load data in directly from RStudio. 
Go to Tools in RStudio, and select Import Dataset. 
Select the file CountryCodes.csv and select the Import button.
In RStudio, you will now see the data in the data pane.

The console window will show the following:
> #import dataset
> CountryCodes <- directory="" files="" header="F)</font" orking="" ountrycodes.csv="" read.csv="" rogram="">
>   View(CountryCodes)

Once the data is imported, we can check the data.

dim(CountryCodes)
head(CountryCodes)
tail(CountryCodes) 


So how do we connect R to SQL Server?

The Package RODBC provides R with a connection to ODBC databases:

> library(RODBC) 
> myodbcConnect <- dsn="servername" font="" nbsp="" odbcconnect="" pwd="******" uid="userid">

myQuery <- ...="" div="" from="" mytable="" nbsp="" where="">
# or read query from file 
# myQuery <- ataanalyststoolkit="" myquery.sql="" nchars="99999) </div" readchar="">
myData <- color="#ffa500" font="" sqlquery="">myodbcConnect
, myQuery, errors=TRUE) 
odbcCloseAll()

I hope that the rather bizarre colour coding will help you to see how the variables add up together to create a query.

RODBC also works for importing data from Excel files:

library(RODBC) 
filename <- ataanalyststoolkit="" font="" myspreadsheet.xls="" nbsp="">
myxlsFile <- filename="" odbcconnectexcel="" readonly="FALSE) </font">
sqlSave(myxlsFile, a, rownames = FALSE) 
b <- a="" font="" myxlsfile="" nbsp="" sqlfetch="">
odbcCloseAll()

Next, we will look in more detail how R and PowerBI complement one another.










Post a Comment