Wednesday, 20 August 2014

Jen's Diary, figuring it out this week


He started keeping a journal - had been, in fact, secretly doing so for some time: the furtive act of a deranged person (Philip Dick, Valis).
 
What  have I been up to this week? There’s lots of activity been happening at Jen Towers, as usual. That, plus I have a stinking, miserable cold so please feel free to send me a remote hug! I’m writing this in my dressing gown and drinking Lemsip.


As before, I don't officially speak for PASS but I can see that people were reading the last installment, so I thought I'd write another. So it's your fault really!
 
This Friday 22nd August, I will be doing a twitter ‘Ask Jen’ online session again, which is specifically aimed at European time zones and for the good SQL Server SQLFamily in Cape Town, Dubai and so on, for example. Check out the time zone here.  I got some good questions last time, and I thought I’d repeat the exercise. I cannot guarantee that I will be able to answer everything but I will try.

 Those of you who have read your Connector this week will have seen Adam Jorgensen’s discussion about the Fiscal Year 2015 Budget. Ever since I joined the Board last January, we have all been doing a lot of work on getting the budget right. My specific area is the Virtual Chapters, and I have spent a lot of time discussing it with my ever-patient HQ Counterpart (Thank you Lana!). The focus was on growth, and I’m really excited about the Virtual Chapters. For the purposes of transparency, you can see the actual budget for the Virtual Chapters here on page 13.   The VC focus was very much on growth; with the news recently that we gave over 21,000 hours of training material last year to the community – for free – I hope that this helps to see how worthwhile the Virtual Chapters are to SQLfamily around the world. If you haven’t signed up for one, why not join today? http://www.sqlpass.org/PASSChapters/VirtualChapters.aspx Even now, for the VCs, I am starting to think about how we can make the process easier for next time, and taking away points to help us to learn. I’m very grateful to the Finance team, Adam and the others who took time to explain the budgeting process.


From the Board perspective, I’ve spent a few hours on the phone this week but I’m not complaining. The PASS HQ and Board members and other volunteers always leave me with that feeling of ‘oh man, I wish I’d thought of that!’ They are truly a bunch of very smart people and I learn something every day. In fact, I'd go right into Imposter Syndrome only I'm not sure I'm smart enough for that.
 
What have we talked about for a few hours, I hear you ask? Well, it wasn’t one single call, fortunately! I had meetings to talk about exciting things for Summit and PASS BA Conference. I’m sorry for the teaser but everyone works really hard, so I won’t steal anyone’s thunder since it wouldn’t be fair. All I’ll just say that you need to keep reading your Connector emails for news.
 

I have had another week of trying to wrap my head around activities in 28 Virtual Chapters. This week, my focus has been on trying to see what I can do to help Virtual Chapters that need new volunteers since they are growing so fast. If anyone is interested in being part of the Virtual Chapter story, please do get in touch. Why volunteer for the VCs? Well, it means that you reach people globally by offering community education. I love holding my user groups since I love in-person events. However, virtual events mean that you help that person at their desk, working away, who isn’t able for any reason to attend virtual events. For example, I’m a mom, and I find it really hard to attend user groups unless I book a babysitter well in advance (and I do). However, I can be part of the virtual chapters at any point because I don’t need to leave my house. It is huge to be able to have hundreds of people online, participating in learning. I love that buzz!
 

I’ve also been working with the team, evaluating some new ideas for a couple of Virtual Chapters. I love people’s ideas and people in the community are so smart. If you have any ideas, please do get in touch. We have a very informal process. Firstly, we have to make sure that the new idea isn’t close to any existing, established Virtual Chapters. This would mean splitting the audience, and there is no need to do that. We don’t tend to split out technologies very much, but we did split out the Excel BI Virtual Chapter based on a need for Excel information. This particular VC isn’t aimed at people who have SQL Server backgrounds. Instead, it is aimed at people who use Excel in work every day, regardless of the data source. 

 
More locally, I’m working with community legend Neil Hambly on next week’s London PASS Chapter user group. We are trying out a new idea of having a ‘theme’, so we are having a Special Edition on Performance. Neil is a seasoned presenter, and he will be presenting on waits. After that, we will have Ash presenting on demystifying flash memory for the SQL Server DBA. I think that both sessions will help people who care about SQL Server performance, and I’ve always wanted to know more about the using flash as opposed to other technologies. It promises to be a good one.

 

Enjoy your day J

 

 

Sunday, 17 August 2014

Day 6: The Data Analysts Toolkit: Why are Excel and R useful together, and how do we connect them?





Why is analytics interesting? Well, companies are starting to view it as profitable. For example, McKinsey showed analytics was worth 100Bn today, and estimated to be over 320Bn by 2020.
When I speak to customers, this is the 'end goal' - they want to use their data in order to analyse and predict what their customers are saying to them. However, it seems that folks can be a bit vague on what predictive modelling actually is.



I think that this is why Power BI and Excel are a good mix together. It makes concepts like Predictive Modelling accessible, after a bit of a learning curve. Excel is accessible and user-friendly, and we can enhance our stats delivery using R as well as Excel.


One area of interest is Predictive Modelling. This is the process of using a statistical or model to predict the value of a target variable. What does this actually mean?  Predictive modelling is where we work to the predict values in new data, rather than trying to explain an existing data set. To do this, we work with variables. By their nature, these vary; if they didn’t, they would be called a constant.

One pioneer was Francis Galton, who was a bit of an Indiana Jones in his day.  Although he wrote in the 19th century, his work is considered good and clear enough to read today. Therefore, this research has a long lineage, although it seems to be a new thing. We will start with the simplest: linear regression.

Linear regression compares two variables x and y to answer the question, “How does y change with x?” For predictive modelling, we start out with what are known as ‘predictor variables’; in terms of this question, this would be x. The result is called the target variable. In this question, this would be y. Why would we do this?


  • Machine Learning
  • Statistics
  • Programming with Software
  • Programming with Data 
  • Fun!

Why would businesses work with it at all?

  • to discover new knowledge and patterns in the data
  • to improve business results 
  • to deliver better customised services
If we have only one predictor variable and the response and the predictor variable have a linear relationship, the data can be analyzed with a simple linear model. When there is more than one predictor variable, we would use multiple regression. In this case, our question would be: , “How does y change with multiple x?” 

In fitting statistical models in which some variables are used to predict others, we want to find is that the x and y variables do not vary independently of each other, but that they tend to vary together. We hope to find that y is varying as a straight-line function of x.

If we were to visualise the data, we would hope to find a pleasing line chart which shows y and x  relating to each other in a straight line, with a minimal amount of ‘noise’ in the chart. Visualising the data means that the relationship is very clear; analysing the data means that the data itself is robust and it has been checked.

I think that's why, in practice, Power BI, Excel and R work well together. R has got some great visualisations, but people are very comfortable with Excel for visualisations. All that loading packages stuff you have to do in R... it doesn't work for everyone. So we use R and Excel, at a high level, as follows:

  • We cleanse and prepare data with Excel or Power Query
  • We use RODBC to load data into R
  • We analyse and verify the data in R
  • We build models in R
  • We load the data back into Excel using RODBC
  • We visualise the data for results

Excel is, after all, one of the world's most successful software applications ever, with reputedly over one billion users. Using them both together means that you get the best of both words: R for analysis and model building: Excel is the 'default' for munging data around, and visualising it. I'm sure that one of the most popular buttons on software such as Tableau, QlikView et al is the 'Export to Excel' or 'Export to CSV' functionality. I'd be interested to know in what people think about that!

Building linear regression models in R is very simple; in our next session, we will look at how to do that, and then how to visualise it in Excel. Doing all this is easier than you think, and I will show you how.


Tuesday, 12 August 2014

Like bookclubs? Interested in learning Javascript online?


Like bookclubs? Interested in learning Javascript online? 

Ike Ellis is the lead for the PASS Book Readers VirtualChapter, and we're going to learn JavaScript in 8 weeks by following the recommended text and having fun discussions. 

August 20th is the first session, so you have plenty of time to get the book!

Sign up for the webinar at this link:



Friday, 8 August 2014

Jen's Diary, figuring it out this week

Don’t read my diary when I’m gone. Ok, I’m going to work now, when you wake up this morning, please read my diary. Look through my things, and figure me out. (Kurt Cobain)

I liked Andy Warren's suggestion about blogging more often about PASS.  At first, I was flattered that he'd actually read my blog at all! Then, I thought about it and, although I don't officially speak for PASS (I'm not on the Executive Committee), folks might like to know what I've been working on for the PASS Board. I should probably do this regularly because I do something "PASS" pretty every evening.

I've been up to a lot of other PASS work as well as organising SQLSaturday London Business Analytics edition, as part of my role for the Board. I will blog separately about this work. 

Guide, Encourage and Motivate Volunteers has been the theme this week. Well, I've been so busy that I had to look in my email to see what I've been doing! I'm a believer in individual contact where possible and I spend a lot of time helping folks over email or Skype. 

This week, I've had a lot of email traffic from community members who are volunteers, interested in volunteering, and some follow up from SQLBits where people are interested in PASS Chapters and want to get involved. A couple of folks are interested in setting up PASS Chapters at various locations around Europe, and a few others would like to volunteer but aren't really sure how to get started. There's the usual amount of techie questions that I get directly as well, which usually happens via Facebook or direct to my email account. I've been lucky to point one particular person to the High Availability and Disaster Recovery Virtual Chapter, since Ryan Adam's session was perfect for him. My portfolio is the Virtual Chapters, and there is an ongoing work there; for example, helping to find speakers or Virtual Chapters leader support. We are working on the Virtual Chapter sessions for September, so watch this space for more details. 

I was chuffed to be on the PASS website, talking about the benefits of running for PASS Board.  I made a little video and posted it on YouTube, and lo and behold, it was on the PASS site! I was pleased to see Adam Jorgensen's video as well. 

Why run for PASS? My video is below:




Here's a great infographic, which says it without my Scottish accent.





















I hope that helps to see what I've been doing this week. Any questions, please let me know.


August 8th, 2014

Free Windows Azure Hybrid Cloud eBook


Hearing buzz around hybrid cloud solutions? Got some questions?
For a limited time, download an eBook from Microsoft solution architects that shares their perspective on what hybrid IT is, why it’s important, and how you should determine whether a hybrid cloud is right for your business.
Download a copy today!

Thursday, 17 July 2014

The Data Analysts Toolkit Day 5: How do R and Power BI fit together?

How do R and Power BI fit together?

Technically, it is about munging data around between R and Excel and the Power BI components. 

You can use RODBC to connect to data between R and SQL Server, or R and Excel. Alternatively you can import data in.

Why else might you use R?

  • Pivot Tables are not always enough
  • Scaling Data (ScaleR)
  • R is very good at static data visualisation but Power BI and Excel are very good at dynamic data visualisation
  • You want to double check your results or do further analysis
They complement one another; they do not replace one another.

You may have heard my story about one organisation calculating the median incorrectly. 

The truth is, people don't often check their data. I help design data warehouses all the time, and I don't always hear people talk about reconciliation. I do hear about people racking up SSAS and diving straight in.

Just because something works technically, does not mean it is correct.

Upworthy and Facebook use R. A lot. So why not you? It is achievable.
Why R, and not some other package?

  • R most widely used data analysis software - used by 2M + data scientist, statisticians and analysts
  • Most powerful statistical programming language
  • used with RStudio, it can help you for the purposes of productivity
  • Create beautiful and unique data visualisations - as seen in New York Times, Twitter and Flowing Data
  • Thriving open-source community - leading edge of analytics research
  • Fills the talent gap - new graduates prefer R.
  • It's fun!

Excel is used by an estimated 1.3 billion people on the planet. That sounds really impressive, until you think that many people are often using it wrong!
R just helps you to do that double check, the sanity check, to see if your data is correct.


The Data Analysts Toolkit Day 4: Getting to know R a bit better

So what actually is R? Let's take a look at running some commands, and we can talk about it as we go along.

R is a powerful environment for statistical computing
It is like a calculator and its power is: 
… it lets you save results in variables. Let's take a look:

> x <- 3="" font="">
> y <- 5="" font="">
> z = 4
> x + y + z

Ok, so we can run little commands and it gives us numbers back. What's all the fuss about?

We can save data in files. They are saved as .RData files. R lets us import and export data in lots of different ways. If it sounds fun, it is because it is! Let's take a look.

> someData <- 1:10="" font="" nbsp="">
> save(someData, file = "D:/DataAnalystsToolkit/someData.Rdata") 
> rm(someData) 
> load("D:/DataAnalystsToolkit/someData.Rdata") 
> print(someData)

Well, that is great if our data is in RData format. But normally we need to import data from different sources.

Import from CSV File
Read.csv()
MyDataFrame <- div="" filepath.csv="" nbsp="" read.csv="">
print(MyDataFrame)

Alternatively we can load data in directly from RStudio. 
Go to Tools in RStudio, and select Import Dataset. 
Select the file CountryCodes.csv and select the Import button.
In RStudio, you will now see the data in the data pane.

The console window will show the following:
> #import dataset
> CountryCodes <- directory="" files="" header="F)</font" orking="" ountrycodes.csv="" read.csv="" rogram="">
>   View(CountryCodes)

Once the data is imported, we can check the data.

dim(CountryCodes)
head(CountryCodes)
tail(CountryCodes) 


So how do we connect R to SQL Server?

The Package RODBC provides R with a connection to ODBC databases:

> library(RODBC) 
> myodbcConnect <- dsn="servername" font="" nbsp="" odbcconnect="" pwd="******" uid="userid">

myQuery <- ...="" div="" from="" mytable="" nbsp="" where="">
# or read query from file 
# myQuery <- ataanalyststoolkit="" myquery.sql="" nchars="99999) </div" readchar="">
myData <- color="#ffa500" font="" sqlquery="">myodbcConnect
, myQuery, errors=TRUE) 
odbcCloseAll()

I hope that the rather bizarre colour coding will help you to see how the variables add up together to create a query.

RODBC also works for importing data from Excel files:

library(RODBC) 
filename <- ataanalyststoolkit="" font="" myspreadsheet.xls="" nbsp="">
myxlsFile <- filename="" odbcconnectexcel="" readonly="FALSE) </font">
sqlSave(myxlsFile, a, rownames = FALSE) 
b <- a="" font="" myxlsfile="" nbsp="" sqlfetch="">
odbcCloseAll()

Next, we will look in more detail how R and PowerBI complement one another.