Saturday, February 1, 2014

Plotting GlucoseBuddy Data Using R, Starting from the Beginning

This page is about people who have a lot of data, divided into categories, and want to understand that data to draw conclusions. This is about people with type 1 diabetes, who have a lot of data about their blood glucose levels, but may have trouble figuring out from their data exactly what they are doing well, and where they could put effort into improving. In short, you have simple plots like this that just don't capture all the nuances of your life:


You may have logged tons of data, and just averaging it all together doesn't do justice to either your effort, or the biochemical reasons to avoid extreme blood glucose values. This article is about seeing all your data, at once, so you can draw conclusions, in the simplest way possible. For me, that plot looks something like this:



This article is about you making the same plots with your data, using free tools.


Background: Why the Numbers are Important

Diabetes is a disease of unregulated sugar levels in the blood stream. Normally, the human body continually adjusts blood glucose levels [wikipedia] to maintain them in a fairly narrow range, between 4 mM and 7 mM. 5 mM (mM = millimolar) is the same thing as 5 mmol/L, but to convert to mg/dL, you would need to multiply by 18. How much glucose [PubChem] does it take to make 5 mM (90 mg/dL) in the human body? With a slight bit of math, we can turn molarity (mM) [wikipedia] into an amount that you may be familiar with from your kitchen:

5 mM * 0.0001 M/mM = 0.005 M
0.005 M * 180.15 g/mol = 0.9 g/L
0.9 g/L * 0.03527 oz/g = 0.032 oz sugar per liter blood
0.032 oz sugar per liter blood * 5 liters blood per person = 0.16 oz sugar

Now it's not actually fair or correct to convert a measure of mass (oz) to a measure of volume (cup/tablespoon/teaspoon), but we'll do it the best we can just to get a sense of how much stuff we're talking about here.

0.16 oz = 0.023 cups [based upon granulated sugar]
0.023 cups * 48 teaspoons/cup =
1 teaspoon Glucose / 150 lb person to raise blood glucose by 5 mM.

Sucrose is the white granulated sugar that you bake with, and it is a 'disaccharide', meaning that it made of two sugar units, namely glucose and fructose. Sucrose has roughly 1/2 the glycemic index of glucose, being made up of 1/2 glucose and 1/2 fructose. Therefore, about 2 teaspoons of sucrose would raise the average person's blood sugar 5mM, assuming you absorb everything you eat. Now a can of soda has 10 teaspoons of sugar in it, and drinking one would put the average diabetic way over their blood glucose limit. Over the long term, high blood glucose levels can cause any of a number of health problems, such as heart disease, kidney, eye, and nerve damage.

My sister has type 1 diabetes [MedlinePlus]. People with type 1 diabetes don't make insulin, which is the compound that the human body uses regulate blood glucose levels. That means she has to do it on her own, without the help of her body. It's not easy. Actually, it's quite hard. The rest of us get to take for granted that our bodies are dutifully checking and adjusting our blood glucose levels. So I might be eating a Snickers bar or a KitKat, no big deal, my body will make some adjustments. My body may ask cells at my waistline to mop of the excess glucose, and I remain otherwise healthy. But that same Snickers bar could push the blood glucose of a type 1 diabetic way out of control.

The solution provided by modern medicine for the problem of a type 1 diabetic is to inject insulin, manually, to consciously emulate what the human body normally does unconsciously. It takes discipline, and a lot of energy. Every day, she needs to manage her blood glucose at the microscopic level. It's not easy to think about the big picture when you're caught up in the minute-to-minute details. So this is where I thought I could help.

I knew that she was recording her blood glucose levels using Glucose Buddy [GlucoseBuddy], an app that runs on your smartphone or on their website and logs the results from your glucose tests. She was generating tons of data, but didn't have time to make a big project out of analyzing it. I wanted to find a way to plot the data for her so she could see the big picture, but not just "that was a bad week", but in a way that drew comparisons that might be hard to notice just looking at the numbers as they come in.

I was learning the program R [R Project], which is a free program that is great for analyzing data. What follows is a step-by-step explanation of how I loaded, plotted, and distilled the data. People already using glucose buddy might be able to just run the commands, and get a fresh look at how their blood glucose levels are doing. There seem to be tons of people learning to use R [stackoverflow], and maybe some of those people will find this example helpful for more distant appications as well.


GlucoseBuddy: Getting Your Data

GlucoseBuddy.com is a great way to get started logging and evaluating your glucose levels. My goal here is to improve on one of the plots offered by glucosebuddy.com, so you can see all your data, not just the average over some long interval. The idea here is that if you made the effort to log your data, you probably want to see it.

To start, you'll need your data that you logged at glucosebuddy.com. Login, and download your data to a CSV file, which is a comma separated list format that can be read by any program. MsExcel is a great way to open the file and look around, and even plot data. If you don't have data but still want to follow along, [Example Data].

When you load it in a spreadsheet, it will look something like this:




Starting with R: Don't Be Afraid

R is free and powerful, albeit somewhat confusing. Hence, this guide is meant to entice you to put a little effort into learning it. You can start by getting it here: R project. There are graphical user interface versions of it, but I think the simplest and most powerful version is the command line. Start R, and type the following command:

help(plot)

This command gives you a description of how to use the 'plot' command. Use the up and down arrows to read the help notes, then press 'q' to exit the help display when you're done. The help() command can be used to lookup info on any command. But before we can plot data, we need to load it first. We will load up some data from Glucose Buddy, from a file named 'MyExportedGlucoseBuddyLogs.csv'.

read.table("MyExportedGlucoseBuddyLogs.csv")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 

  line 1 did not have 14 elements

The above command was supposed to read in all the data, but it failed. R expected the file to use spaces to separate the data, while the file actually used commas. Let's rerun the last command with some changes. Press the 'up' arrow to get the last command you entered on the command line, then use left and right arrows to position the cursor and edit the command. We need to tell R that the GlucoseBuddy data has a header line, and that it uses commas to separate the data columns:

read.table("MyExportedGlucoseBuddyLogs.csv", header=TRUE, sep=",", as.is=TRUE)
    Type Value   Unit    Name            Event           Date.Time Notes

  1    M   2.0  units humalog     After Dinner 05/07/2012 22:12:53      
  2   BG   7.8 mmol/L               Before Bed 05/08/2012 00:11:35      
  3    M  11.0  units  lantus       Before Bed 05/08/2012 00:13:18      
  4   BG   5.4 mmol/L         Before Breakfast 05/08/2012 07:09:34      
etc...

You're likely to have so much data that it all scrolls off the top of the screen. To capture the output into a variable, change the command to read as follows:

csv <- read.table("MyExportedGlucoseBuddyLogs.csv", header=TRUE, sep=",", as.is=TRUE)

Now the variable 'csv' has stored all the data that was loaded. We can see the contents of the variable 'csv' by just typing its name at the command line:

csv
    Type Value   Unit    Name            Event           Date.Time Notes

  1    M   2.0  units humalog     After Dinner 05/07/2012 22:12:53      
  2   BG   7.8 mmol/L               Before Bed 05/08/2012 00:11:35      
  3    M  11.0  units  lantus       Before Bed 05/08/2012 00:13:18      
  4   BG   5.4 mmol/L         Before Breakfast 05/08/2012 07:09:34    
etc...

R keeps track of the different columns in the data file by their name that was given by the 'header' line in the file. To list the column names:

names(csv)
[1] "Type"      "Value"     "Unit"      "Name"      "Event"     "Date.Time" 
[7] "Notes"

Re-typing these commands can be tedious, even with 'up' and 'down' arrows to recall previous commands, or even if you're using cut-and-paste. To get around re-typing text, I like to put all the commands in a text file (i.e. "myRCommandFile.R"). Then when I update the commands using a text editor, I can re-run all the commands in the file with this single command:

source("myRCommandFile.R")

Each time you run the 'source()' command, every command within that file is run, so you can easily reinitialize all the variables that you are using, or just keep track of the comands that you have used. You can also make comments in your file by putting a ' # ' at the beginning. Staying organized is important, because things always get more complicated than you anticipated.

Getting back to your data, you can look at the values for only one column name from your GlucoseBuddy data (i.e. 'Value') with the following:

csv$Type
  [1] "BG"  "M"   "BG"  "BG"  "BG"  "M"   "M"   "BG"  "BG"  "BG"  "BG"  "M"   
 [13] "BG"  "M"   "BG"  "BG"  "BG"  "M"   "BG"  "BG"  "BG"  "BG"  "M"   "M"   
 [25] "BG"  "BG"  "BG"  "BG"  "M"   "BG"  "BG"  "BG"  "M"   "M"   "BG"  "BG"  
 [37] "BG"  "BG"  "BG"  "M"   "BG"  "M"   "BG"  "BG"  "BG"  "BG"  "M"   "M"
etc...

The terminal prints only the data for the 'Type' column for every row in the dataset. The 'BG' entries are blood glucose measurements. To extract out only the blood glucose measurments, we need a subset of the dataset where 'Type' is equal to 'BG'. Let's immediately save this set of data to a variable 'bloodGlucose'.

bloodGlucose <- subset(csv,csv$Type=="BG") 

Now when you type 'bloodGlucose$Type', the only values you see should be 'BG', because you just selected that subset. You can see the blood glucose values for each of these data points using the 'Value' column name:

bloodGlucose$Value
  [1]  6.4  7.2  6.4  6.7  5.1  4.1  6.2  8.2  7.3  7.2 10.2  8.6  5.6  5.5  5.2 
 [16]  5.3  6.7  6.8  6.9  5.4  8.7  6.9  4.8  5.3  4.3  5.8  4.1  5.5  4.9  9.3 
 [31]  7.0  6.2 10.1  5.9  7.5  7.3 14.4  6.7  4.6  4.8  9.1  4.2  8.3  6.5  5.3 
etc...

Plotting with R: Show Me the Data

With the blood glucose data stored in memory in a variable 'bloodGlucose', let's see some results. With one typed command, let's visually summarize all the data, as divided up by the 'Event' that the blood glucose reading was logged under. We'll use a boxplot, which shows the average value of the data as a line through the center of a box. The box covers the upper and lower 'quartiles', or points between 25% and 75% of the data. The 'whiskers' to the box plot summarize the spread of the data beyond the quartiles, while some 'outliers' may lie outside the whiskers (wikipedia). You can make one like this:

boxplot(as.numeric(bloodGlucose$Value) ~ factor(bloodGlucose$Event,ordered=TRUE),
 las=2,
 main="Daily Blood Glucose",
 ylab="Blood Glucose (mM)") 

The boxplot command worked like this: 'boxplot(<y-data-column> ~ <x-data-column>)'. To turn the list of character strings stored in 'Event' into a set of categories, I used 'factor()'. This transforms the X-axis data into an enumerated data type. I slipped in the extra parameter 'las=2' to rotate the labels so they all fit. To find out where I got that parameter, you could do 'help(par)' to read about graphing parameters. Or you could skip it. Either way, the text still falls off the bottom of the graph. To fix it, we need to set another graphing parameter so that R knows to leave space for our long words. We need to set the parameter 'fig'. Coincidentally, you could also learn more about this with 'help(par)', or you could just google it. The following command displays the current values of 'fig'.

par('fig')
[1] 0 1 0 1

The parameter 'fig' is actually a list of 4 numbers, specifying the size of the plot on a scale of 0 to 1, for X and Y. The R command 'c()' makes a list out of the values you give it. Let's set aside 1/10 of the plot space at the bottom to fit our long labels. Reset the 'fig' parameter as follows (if it opens a blank window after the 'par()' command, leave it open), then replot the boxplot.

par(fig=c(0,1,0.1,1))
boxplot(as.numeric(bloodGlucose$Value) ~ factor(bloodGlucose$Event,ordered=TRUE),
 las=2,
 main="Daily Blood Glucose",
 ylab="Blood Glucose (mM)")

Loading Packages to Help Us

Next up is showing all the data points, not just the boxes. We'll use the R package 'beeswarm', which makes a one dimensional scatter plot for you. If you haven't installed the package already, you can do so with the following R command. A graphical menu pops up, then you pick which mirror to download from, and then it should install.

install.packages('beeswarm')

With the beeswarm package installed, you need to load the library. While you install the package only once, you have to load the library every time you start R, including the time that you have just installed it. Note that there are no quotes this time.

library(beeswarm)

Let's replot the data using beeswarm.

beeswarm(as.numeric(bloodGlucose$Value) ~ factor(bloodGlucose$Event,ordered=TRUE),
 las=2,
 cex=0.5,
 spacing=0.8,
 xlab="",
 main="Daily Blood Glucose",
 ylab="Blood Glucose (mM)")

The extra parameter 'cex=0.5' makes each point half the default size, and 'spacing=0.8' slightly reduces the spacing between points in a category.

Now let's color the points by the time of day that the data was recorded. The time of each data point is stored in column 'Date.Time'. To look at the time for data in the first row:

bloodGlucose$Date.Time[1]
[1] "04/10/2012 01:29:00"

While that looks like a date, R doesn't actually know that it is, because we loaded it as a string of characters. You can see that with this command:

class(bloodGlucose$Date.Time) 
[1] "character" 

We can parse the character string into a time data type like this, storing the time data type in place of the original data:

bloodGlucose$Date.Time <- strptime(bloodGlucose$Date.Time,"%m/%d/%Y %H:%M:%S")

To see that it worked, we can peek again at the first row of data.

bloodGlucose$Date.Time[1] 
[1] "2012-04-10 01:29:00" 

We can check that we converted the data to a time data type by using the 'class()' command again.

class(bloodGlucose$Date.Time) 
[1] "POSIXlt" "POSIXt" 

We are ready to use the time data to color the points by their time of day. The parameter 'pwcol' specifies a list of colors that corresponds to each data point. The parameter 'pch' specifies the point shape. Then we use the 'format()' command to take a time object and format it how we like, in this case as just the hour of the day on a 24 hour clock.

beeswarm(as.numeric(bloodGlucose$Value) ~ factor(bloodGlucose$Event,ordered=TRUE),
 pwcol=format(bloodGlucose$Date.Time,format="%H"),
 las=2,
 cex=0.5,
 spacing=0.8,
 xlab="",
 main="Daily Blood Glucose",
 ylab="Blood Glucose (mM)",
 pch=7)

The colors are surprisingly meaningless, because the default palette doesn't put similar colors next to each other. To get a rainbow effect while moving through the hours of the day, we need to redefine the palette, and then rerun the previous beeswarm command.

myPalette <- rainbow(24)
palette(myPalette)
beeswarm(as.numeric(bloodGlucose$Value) ~ factor(bloodGlucose$Event,ordered=TRUE),
 las=2,
 cex=0.5,
 spacing=0.8,
 xlab="",
 main="Daily Blood Glucose",
 ylab="Blood Glucose (mM)",
 pwcol=format(bloodGlucose$Date.Time,format="%H"),
 pch=7)

Now there appears to be some organization to the color scheme. It's just that we need a legend to see what that organization is. Here we'll use the 'shape' library for it's function 'colorlegend()'. As before, install the library if you haven't already.

install.packages('shape')
Load the 'shape' library, which will allow us to use the function 'colorlegend()' to add a legend to your existing plot.
library(shape)
colorlegend(col=myPalette,zlim=c(0,24),zval=seq(0,22,2),posx=c(0.9,0.93),posy=c(0.1,0.9))

I added some parameters to the 'colorlegend()' command to set our 24 hour range, to label every other hour between 0 and 22, and to position the legend on the right side of the plot. We have a problem with the legend overlapping the plot, as we did with the text labels running off the bottom of the screen before. This time we will adjust the figure margins, since the overall figure size needs to stay big (so the legend can be plotted), we just need the main plot to stay out of the right side. The default values for the margin may not be intuitive at first, but they leave room for the figure title, the axes labels, and the axes themselves. First we peek at the default margin settings, then we increase the right margin by resetting the parameter 'mar'.

> par("mar") 
[1] 5.1 4.1 4.1 2.1 
par(mar=c(5.1,4.1,4.1,5.1))
beeswarm(as.numeric(bloodGlucose$Value) ~ factor(bloodGlucose$Event,ordered=TRUE),
 pwcol=format(bloodGlucose$Date.Time,format="%H"),
 las=2,
 cex=0.5,
 spacing=0.8,
 xlab="",
 main="Daily Blood Glucose",
 ylab="Blood Glucose (mM)",
 pch=7) 
colorlegend(col=myPalette,zlim=c(0,24),zval=seq(0,22,2),posx=c(0.9,0.93),posy=c(0.1,0.9))

I think the plot is starting to look good. There is one more thing I'd like to do to put the data in an intuitive order. But first, particularly if you're new to R, take a minute to congratulate yourself. Then we'll dig into the details of sorting the X-axis.


Sorting and Rearranging Data within R

Remember that we used 'factor()' to make an enumerated data type, or set of categories, from our list of character strings that defined our 'Event'. The enumerated data type defines the categories, then assigns each row in our data set to one of those categories. To print out those categories, you can use the 'levels()' command on a factor, like this:

events <- levels(factor(bloodGlucose$Event)) 

events 
 [1] "After Activity"   "After Breakfast"  "After Dinner"     "After Lunch"     
 [5] "After Snack"      "Before Activity"  "Before Bed"       "Before Breakfast" 
 [9] "Before Dinner"    "Before Lunch"     "Before Snack"     "During Activity" 
[13] "During Night"     "Out Of Bed"      

To assign an average time to each 'Event', first we need to be able to work with time as a number. Assuming that you still have the 'POSIXlt' type time data (see above), we can print out the hours of the day associated with the first 10 data rows using the following:

> format(bloodGlucose$Date.Time,format="%H")[1:10] 
 [1] "01" "01" "03" "14" "18" "21" "22" "22" "07" "13"

Note that there are quotes around all the numbers. That's because R thinks that they are character strings. To convert them to a number, at least as far as R is concerned, use the 'as.numeric()' function.

> as.numeric(format(bloodGlucose$Date.Time,format="%H"))[1:10] 
 [1]  1  1  3 14 18 21 22 22  7 13 

Once we have converted the values into 'numbers', then we can do things like calculate their average, or 'mean()'.

> mean(as.numeric(format(bloodGlucose$Date.Time,format="%H"))[1:10]) 
[1] 12.2 

Using these commands, we can make a list of categories that summarizes all the events, and we can calculate the average time of an event. The next trick will be putting these two ideas together by calcularing the average hour of the day for each category. You could choose to go through all the categories yourself, explicitly. But in doing so, you'd be missing out on the whole point of R, where you can write generic code that will deal with it for you. Suppose some months your data doesn't have a 'Before Exercise' data point. Do you want to manually redraw it every time? What if one month you add the category 'During Activity', are you going to scroll through your excel spreadsheet to see if you need to add new categories to your little R script?

In a moment, we will write some code that will slay the problem once and for all. We will define a function that takes as parameters the 'Event' name (categoryName parameter), and the whole blood glucose dataset (bloodGlucose parameter). Just to show the syntax for defining a function, we will define an example function that takes one parameter, and then prints it to the terminal. This example function is stored in variable 'myFunction'. Note the comment line beginning with a ' # ' that will remind us what the source code is doing.

myFunction <- function(myInput) {
        # This function prints the input parameter: myInput
        print(myInput)
        }

myFunction("hello")
[1] "hello"

To define the function that calculates the mean hour of the day for a particular 'Event', it will take as parameters the 'Event' name (categoryName), and the 'bloodGlucose' dataset (bloodGlucose). I have defined one additional parameter, because it is simpler to think about the end of the day as when you go to sleep, which is not necessarily midnight. To define the hour that the day ends, I will use one more parameter named 'rolloverTime', which marks the division between your morning or night. If a value for 'rolloverTime' is not specified, it has a default value of 4am.

calcBloodGlucMean <- function(categoryName,bloodGlucose,rolloverTime=4) { 
        # select subset of times that are for this 'Event' 
        hourList <- subset(bloodGlucose,bloodGlucose$Event==categoryName)$Date.Time 

        # select only the hour of the day out of the time data type
        #print(hourList)  # Error if BloodGlucose$Date.Time is not  “POSIXt”, i.e. “NA”
        hourList <- as.numeric( format( hourList, format="%H")) 

        # make effective end of day equal to rollover time instead of midnight
        hourList[hourList < rolloverTime] <- hourList[hourList < rolloverTime]+24 

        # use modulus in case all times were after midnight, but before your rollover time.
        #   i.e. a 1am average time will still be '01:00', not '25:00'.
        hourMean <- mean(hourList) %% 24 

        return(hourMean) 
        } 

The 'apply' functions in R can be very useful, but they it can take some time to get comfortable with them. These functions are part of what make R so special, and my understanding of them is still comparatively rudimentary [stackoverflow]. Here we use the list apply function 'sapply()' to apply the calcBloodGlucMean function to every event in our 'Event' list:

  sapply(events,calcBloodGlucMean,bloodGlucose=bloodGlucose) 
    After Activity  After Breakfast     After Dinner      After Lunch  
       17.153846        11.280000        20.926829        14.839506  
     After Snack  Before Activity       Before Bed Before Breakfast  
       16.933333        18.142857        22.326848         9.095745  
   Before Dinner     Before Lunch     Before Snack  During Activity  
       17.632558        12.341463        15.769231        17.500000  
    During Night       Out Of Bed  
       23.800000        10.473684  

There it is, the average hour of the day for all the events in the dataset. But I would find it easier to look at and to use if the data were presented in a table. We can make a table by wrapping the result into a new 'data.frame()'. The whole blood glucose dataset is already stored in a data frame, which is why the 'bloodGlucose$Event' syntax works to pull out our individual columns. Let's define a data.frame with two columns: 'Event' and 'AvgTime'. 'Event' will hold all the event names, and 'AvgTime' will hold their respective mean hour of the day.

eventAvgTime <- data.frame(  
 Event=unlist(events),  
 AvgTime=unlist(lapply(events,calcBloodGlucMean,bloodGlucose=bloodGlucose))  
 )  

eventAvgTime
              Event   AvgTime 
1    After Activity 17.153846 
2   After Breakfast 11.280000 
3      After Dinner 20.926829 
4       After Lunch 14.839506 
5       After Snack 16.933333 
6   Before Activity 18.142857 
7        Before Bed 22.326848 
8  Before Breakfast  9.095745 
9     Before Dinner 17.632558 
10     Before Lunch 12.341463 
11     Before Snack 15.769231 
12  During Activity 17.500000 
13     During Night 23.800000 
14       Out Of Bed 10.473684 

I needed the 'unlist()' command to get the values out of the list that was returned by the function 'lapply()', before inserting them into the data frame. Lists of lists in R actually still confuse me, but this worked. Alternatively, we could have used the 'sapply()' function that we used above, but I liked how 'lapply()' left the row names as their numeric defaults.

The finishing touch will be to sort this data frame, based upon the average time of the 'Event' during the day. The 'order()' command returns a list of row indices that would put the data in order. Those reordered indices can then be used to get each row out of the original data frame, thereby sorting it. Here we will just overwrite the original variable with the sorted data, but you could have put it in another variable name if you wanted.

# reorder the events by mean time 
eventAvgTime <- eventAvgTime[order(eventAvgTime$AvgTime),]  

eventAvgTime
              Event   AvgTime 
8  Before Breakfast  9.095745 
14       Out Of Bed 10.473684 
2   After Breakfast 11.280000 
10     Before Lunch 12.341463 
4       After Lunch 14.839506 
11     Before Snack 15.769231 
5       After Snack 16.933333 
1    After Activity 17.153846 
12  During Activity 17.500000 
9     Before Dinner 17.632558 
6   Before Activity 18.142857 
3      After Dinner 20.926829 
7        Before Bed 22.326848 
13     During Night 23.800000

We now have an ordered list for 'Event'. The beeswarm plot used a factor to describe the categories in the plot. We need to create an ordered factor so that our 'Event' data is plotted in order by beeswarm. Here we replace the original factor stored in bloodGlucose$Event with a factor that has been reordered according to our function. The first argument to factor is the data to divide into categories, the second argument, 'levels', names the categories, and the third argument, 'ordered', tells R that we want our data in the exact order that we supplied in 'levels'.

# recreate original Event as ordered factor  
bloodGlucose$Event <- factor(bloodGlucose$Event,levels=eventAvgTime$Event,ordered=TRUE)

You can check that the factor now has the categories in the right order, using the 'levels()' command.

levels(bloodGlucose$Event) 
 [1] "Before Breakfast" "Out Of Bed"       "After Breakfast"  "Before Lunch"     
 [5] "After Lunch"      "Before Snack"     "After Snack"      "After Activity"   
 [9] "During Activity"  "Before Dinner"    "Before Activity"  "After Dinner"     
[13] "Before Bed"       "During Night"

We can now replot the data, and the categories will be in the order that they occur throughout the day. As a bonus, I'll overlay the boxplot on top of the beeswarm plot.

myPalette <- rainbow(24)
palette(myPalette) 
beeswarm(as.numeric(bloodGlucose$Value) ~ bloodGlucose$Event,
 las=2,
 cex=0.5,
 spacing=0.8,
 xlab="",
 main="Daily Blood Glucose",
 ylab="Blood Glucose (mM)",
 pwcol=as.numeric(format(bloodGlucose$Date.Time,format="%H"))+24,
 pch=7) 
boxplot(as.numeric(bloodGlucose$Value) ~ bloodGlucose$Event,
 las=2,
 add=TRUE) 
colorlegend(col=myPalette,zlim=c(0,24),zval=seq(0,22,2),posx=c(0.9,0.93),posy=c(0.1,0.9))

One additional note about the commands above: I added an adjustment to the pwcol argument. It seems that white is used when a point has a value of '0' (midnight). The fix makes use of how the palette wraps back around when you go past the last color, which is 24, so I just add 24 to all the pwcol values. You may note that the highest BG value in 'Before Bed' is only colored red like it should be after this using this fix.


Details: Titles, Labels, and Axes

The default Y-axis only label 3 points, and I would like more. While he default axis labels in R often leave me wanting, R gives you full control to fix it. First we will make a plot that gets rid of the default Y-axis labels by using the plotting parameter 'yaxt' and setting it to 'n', which translates into English as: Y-AxisTicks=NO.

beeswarm(as.numeric(bloodGlucose$Value) ~ bloodGlucose$Event,
 yaxt="n",
 pwcol=as.numeric(format(bloodGlucose$Date.Time,format="%H"))+24,
 las=2,
 cex=0.5,
 spacing=0.8,
 xlab="",
 main="This is the Title",
 ylab="This is the Y-axis",
 pch=7) 

Now it's up to us which points get labeled. The command 'seq()' creates a list of numbers in the range between the first argument and the second argument, using the increment between numbers provided by the third argument. Once we have made that sequence of numbers, we can apply them to the existing plot with the command 'axis()'.

# create a Y axis that labels every integer
ylabels <- seq(floor(min(bloodGlucose$Value)),ceiling(max(bloodGlucose$Value)),1) 

ylabels 
 [1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 
axis(2,at=ylabels,las=1) 

Remember you can use the command 'help(axis)' to see what the parameters mean (also written as '?axis'). We now have a plot with every unit increase of bloodGlucose labeled. If you want to change the axis, you have to redo the plot, and then apply the changed axis command to the new plot.
For a type I diabetic, a generally recommended range of blood glucose values is between 5 and 7.2 mM [wikipedia]. To highlight this range on the plot, I want to make the background a slightly different color in that interval. Let's make everything below 5 and above 7.2 a gray background, while leaving the middle range white. Using a polygon draw command, we can change the colors in those specific areas. To do so, we need the points that mark the extremities of the plot. After a plot has been drawn, the corner points that mark the extremities of the plot are available in the parameter 'usr'. We can take these coordinates, and then plot two boxes: one with the upper corners of the plot and bounded by 7.2, and another with the lower corners of the plot and bounded by 5. This will add the polygon to the plot.

# list extremeties of the plot
par('usr') 
[1] -0.06000 15.06000  1.35832 17.44168 
plotCorners <- par('usr') 
# define gray with hexadecimal RGBA color code: '#RRGGBBAA'
bColor <- "#EEEEEEFF"
# target range of 5-7.2 (90-130 mg/dL)
polygon(c(plotCorners[1],plotCorners[2], plotCorners[2],plotCorners[1]), 
         c(plotCorners[3],plotCorners[3], 5,5), 
         col=bColor,lty=0) 
polygon(c(plotCorners[1],plotCorners[2], plotCorners[2],plotCorners[1]), 
         c(9,9,plotCorners[4],plotCorners[4]), 
         col=bColor,lty=0) 
polygon(c(plotCorners[1],plotCorners[2], plotCorners[2],plotCorners[1]), 
         c(plotCorners[3],plotCorners[3],plotCorners[4],plotCorners[4]),  
         col=NULL) 

The polygon covered our plot. We can replot the orginal plot over the top again to get the result that we want. We do this by adding the beeswarm plot on top of existing graph with the graphical parameter 'add=TRUE'.

beeswarm(as.numeric(bloodGlucose$Value) ~ bloodGlucose$Event,
 add=TRUE,
 yaxt="n",
 pwcol=as.numeric(format(bloodGlucose$Date.Time,format="%H"))+24,
 las=2,
 cex=0.5,
 spacing=0.8,
 xlab="",
 main="This is the Title",
 ylab="This is the Y-axis",
 pch=7) 

There is a lot of information in that plot. You could generate monthly or quarterly plots to see how things are going. Maybe it would make be easier to explain how you're doing to your doctor. My hope is that your data helps you understand what your body is doing, and that you are your own expert at what your body needs at any given time. Here are all the plotting commands together, with a slight change of style for the points, and some added text labeling the legend.

beeswarm(as.numeric(bloodGlucose$Value) ~ bloodGlucose$Event,
 yaxt="n",
 las=2,
 pwbg=as.numeric(format(bloodGlucose$Date.Time,format="%H"))+24,
 cex=0.6,spacing=0.8,
 xlab="",
 main="Daily Blood Glucose",
 ylab="Blood Glucose (mM)",
 pch=21)  
polygon(c(plotCorners[1],plotCorners[2], plotCorners[2],plotCorners[1]), 
        c(plotCorners[3],plotCorners[3], 5,5), 
        col=bColor,lty=0) 
polygon(c(plotCorners[1],plotCorners[2], plotCorners[2],plotCorners[1]), 
        c(9,9,plotCorners[4],plotCorners[4]), 
        col=bColor,lty=0) 
polygon(c(plotCorners[1],plotCorners[2], plotCorners[2],plotCorners[1]), 
         c(plotCorners[3],plotCorners[3],plotCorners[4],plotCorners[4]),  
         col=NULL) 
beeswarm(as.numeric(bloodGlucose[,2]) ~ bloodGlucose$Event,
 add=TRUE,
 yaxt="n",
 las=2,
 pwbg=as.numeric(format(bloodGlucose$Date.Time,format="%H"))+24,
 cex=0.6,
 spacing=0.8,
 xlab="",
 main="Daily Blood Glucose",
 ylab="Blood Glucose (mM)",
 pch=21)  
boxplot(as.numeric(bloodGlucose$Value) ~ bloodGlucose$Event,
 las=2,
 add=TRUE,
 cex=0.6) 
axis(2,at=ylabels,las=1) 
colorlegend(col=myPalette,zlim=c(0,24),zval=seq(0,22,2),posx=c(0.9,0.93),posy=c(0.1,0.9))
text(x=(par("usr")[1]+0.92*(par("usr")[2]-par("usr")[1])),
 y=(par("usr")[3]+0.95*(par("usr")[4]-par("usr")[3])),
 labels="Time(hr)") 

And there it is. Congratulations, you've done it. Best wishes for your data crunching.



Updated: 3/1/2014

No comments :

Post a Comment

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License .