Friday, February 28, 2014

Getting a Look at Insulin Doses, using R

Type I diabetics need to manage a lot of information to keep their blood glucose levels healthy. Many have logs of their data, such as blood glucose concentrations and insulin doses. A large number of tools exist to help with this, such as Glucose Buddy. If you're logging the data yourself, you probably already know the trends in the data just from looking at each data point as it comes, but it can be harder to compare details across larger time intervals, say maybe if blood glucose values were more variable in the morning, from one month to the next. In this article I wanted to discuss a number of different ways to represent the insulin doses graphically. Then if you wanted to compare one month to the next, or maybe weekends to weekdays, you could make two of these graphs and hold them side-by-side, and hopefuly the differences will be self-evident.

Below is my favorite way of plotting the data, but I worked through a variety of alternatives while constructing the graph. The progression used a number of different plotting styles in R, and this article is the story of how to make them, including the one shown below.



To begin plotting your own data, you first need the free program R [R Project]. Then you need your data exported from Glucose Buddy, as a ".csv" file. If you have data from somewhere else, you would need to load it into R in a table with column names that correspond to the column headings in this sample Glucose Buddy log file [Example Data]. In an earlier post, there is more discussion about getting up and running with R and importing data. The commands below, run at the command line in R, should import the data from your log file, parse the date/time information, and extract insulin doses to a variable "medicine".

# load Glucose Buddy log
logFileName <- 'MyExportedGlucoseBuddyLogs.csv'
csv <- read.table(logFileName,sep=",",as.is=TRUE,header=TRUE)

# convert date and time character string to a time object
csv$Date.Time <- strptime(csv$Date.Time, format="%m/%d/%Y %H:%M:%S")

# extract only the insulin dosage info from the log file
medicine <- csv[csv$Type=="M",]

# convert hour of the day into a floating point value according to minutes
medicineHour <- as.numeric(format(medicine$Date.Time,format="%H")) +
                           as.numeric(format(medicine$Date.Time,format="%M"))/60

The last command listed above creates an array that represents the hours and minutes of the day as a single floating point value. With this data loaded, we can go straight to plotting. We start with a 'symbol' plot, using circles with diameters that correspond to the amount of insulin given at a particular time. The circles are plotted using transparency (i.e. color with an alpha channel) so that overlapping data points can be seen.

symbols(x=medicineHour,
       y=medicine$Value,
       circles=sqrt(medicine$Value/10),
       inches=FALSE,
       fg="#000000FF",
       bg="#5555FF55",
       xlab='Time of Day (hr)',
       ylab='Insulin Dose (units)',
       las=1,
       xaxp=c(0,24,8),
       yaxp=c(0,14,7),
       main="Insulin Dose by Time of Day")

axis(side=1,at=c(1:24),labels=rep("",24))
axis(side=2,at=c(1:14),labels=rep("",14))

plotCorners <- par('usr')
polygon(c(plotCorners[1],plotCorners[2], plotCorners[2],plotCorners[1]),
        c(plotCorners[3],plotCorners[3],plotCorners[4],plotCorners[4]),
        col=NULL)


If you want a description of the arguments to the 'symbols' command, you could pull up the help screen for it by typing 'help(symbols)' or '?symbols'. The added axis commands draw minor ticks without labels on the axes, and the polygon command redraws the plot perimeter, which is sometimes partially covered by the contents of the plot. The larger circles highlight where the larger doses are, and the transparency of the circles indicates where data points have stracked on top of each other. Still, I find it hard to compare the density of blue transparency across the plot.

Using 2D Histograms to Show Frequency

The overlapping transparent symbols are essentially emulating a 2D histogram, where the color at a given point represents the frequency that data point is observed. So let's use some of the 2D histogram plotting functions in R to see if they are an improvement. The next example uses 'hist2d' from the 'gplots' libary, which you may need to download. If you need directions on downloading an R library, you can find a few in this previous article. I also used the library 'RColorBrewer' to create the color gradient for the palette. Finally, because a few of the data points are very frequent while most of the rest are comparatively sparse, I re-weighted the color scheme by the cube root of the frequency, which is accomplished by the 'FUN' (function) argument. You could also not re-weight the data, but in this case I thought it brought out more of the details visually.

library(gplots)
library(RColorBrewer)
hist2d(medicineHour,y=medicine$Value,nbins=c(30,14),
       FUN=function(x){length(x)^0.33},
       col=c(colorRampPalette(brewer.pal(9,"Blues"))(15)),
       xlab='Time of Day (hr)',
       ylab='Insulin Dose (units)',
       las=1,
       xaxp=c(0,24,12),yaxp=c(0,14,7),
       main="Insulin Dose by Time of Day")

axis(side=1,at=c(1:24),labels=rep("",24))
axis(side=2,at=c(1:14),labels=rep("",14))

plotCorners <- par('usr')
polygon(c(plotCorners[1],plotCorners[2], plotCorners[2],plotCorners[1]),
        c(plotCorners[3],plotCorners[3],plotCorners[4],plotCorners[4]),
        col=NULL)


The 2D histogram still looks sparse, and some recurring insulin doses that are recorded as midnight dominate the plot. There are a number of data points in the middle of the day, but it is hard for the eye to integrate which regions really are more frequent than others. Some other plotting functions in R such as 'image' and 'contour' will plot data if you already have x, y, and z values, but they won't bin the data for us to make a histogram. To use these other functions, we can capture the histogram values that 'hist2d' generated, and then replot these values ourselves.

# capture histogram data and format as matrix
binOutput <- hist2d(medicineHour,
                         y=medicine$Value,nbins=c(30,14),
                         FUN=function(x){length(x)^0.33},
                         col=c(colorRampPalette(brewer.pal(9,"Blues"))(15)),
                         show=F)[1]
binOutput <- as.matrix(binOutput$counts)

# reset 'NA' values to 0
binOutput[is.na(binOutput)] <- 0

The argument 'show=F' instructed hist2d to not plot anything, and just return the histogram data. We convert the data to a 'matrix' type for use with 'image' and 'contour'. We also need set 'NA' values to 0, so these outliers don't mess up our color scheme. Now we are ready to plot with 'contour' and 'image'.

contour(binOutput,
      xaxs='i',
      yaxs='i',
      las=1,
      xlab='Time of Day (hr)',
      ylab='Insulin Dose (units)',
      main='Insulin Dose by Time of Day',
      drawlabels=FALSE)


The contour plot is a nice way to illustrate how the 2D histogram is a lot like a topographical map, where height represents frequency. The default values here are a great start, but I'd rather not have the sharp edges, since the data is approximating a probability distribution for insulin doses throughout the day, and probability distributions are smooth when there is substantial uncertainty, which we have here. One clever way to smooth out the plot without much work is to just use the command 'smoothScatter'. To do the smoothing, it takes two additional parameters (nbin = number of bins, bandwidth = smoothing diameter), but it doesn't require any other additional information.

smoothScatter(x=medicineHour,
      y=medicine$Value,
      nrpoints=0,
      transformation=function(x) x^0.65,
      colramp=colorRampPalette(c("blue","green","red",'purple')),
      cex=5,
      nbin=200,
      bandwidth=0.4,
      xlab='Time of Day (hr)',
      ylab='Insulin Dose (units)',
      xaxp=c(0,24,12),
      yaxp=c(c(0,16,8)),
      xaxs='i',
      yaxs='i',
      las=1,
      main='Insulin Dose by Time of Day')

axis(side=1,at=c(1:24),labels=rep("",24))
axis(side=2,at=c(1:14),labels=rep("",14))


The smoothing highlights the distribution of the points, while de-emphasizing where the individual points actually were. The command 'smoothScatter' actually uses the command 'bkde2D'. Let's run our data through 'bkde2D' ourselves, then we can plot the smoothed data a couple of different ways.

# make a table with hour of the day as X, and insulin dose as Y
medicineData <- cbind(medicineHour,y=medicine$Value)

library(KernSmooth)
smoothOutput <- bkde2D(medicineData,
                               bandwidth=c(1.7,1),
                               gridsize=c(100,100),
                               range.x=list(c(0,24),c(0,15)))

Our previous contour plot had sharp edges, but since the smoothed data set is interpolated to have many more data points, we will get nice smooth contour lines from the new smoothed data.

contour(x=smoothOutput$x1,
      y=smoothOutput$x2,
      z=smoothOutput$fhat^0.5,
      drawlabels=FALSE,
      nlevels=10,
      xaxp=c(0,24,12),
      yaxp=c(c(0,16,8)),
      xaxs='i',
      yaxs='i',
      las=1,
      main='Insulin Dose by Time of Day',
      xlab='Time of Day (hr)',
      ylab='Insulin Dose (units)')

axis(side=1,at=c(1:24),labels=rep("",24))
axis(side=2,at=c(1:14),labels=rep("",14))


The smooth lines make the global distribution of insulin doses easier to see, without getting bogged down in the details. You could adjust the 'bandwidth' and 'gridsize' arguments to 'bkde2D' to see more or less features in the data. Sticking with these values, we could use 'persp' for a 3D perspective of this contour plot.

persp(smoothOutput$fhat,
      phi=25,
      theta=-35,
      expand=0.5,
      xlab='Time of Day (hr)',
      ylab='Insulin Dose (units)',
      zlab='Frequency',
      xaxp=c(0,24,12),
      yaxp=c(c(0,16,8)),
      main='Insulin Dose by Time of Day')


The 3D plot is interesting, but I don't think it is more informative, since the perspective actually hides some of the data. To finish up our sightseeing tour of R graphs, we'll return to the 2D histogram, but this time plotting our smoothed values directly with 'image'. We'll add a legend from the library 'shape', starting with a rainbow color scheme.

library(shape)

# set the color palette
myPalette <- rainbow(32)

# leave space for color legend
par(fig=c(0,0.85,0,1))

# plot the 2D histogram from smoothed values
image(x=smoothOutput$x1,
      y=smoothOutput$x2,
      z=smoothOutput$fhat^0.5,
      col=myPalette,
      xlab='Time of Day (hr)',
      ylab='Insulin Dose (units)',
      las=1,
      xaxp=c(0,24,12),
      yaxp=c(c(0,16,8)),
      main='Insulin Dose by Time of Day')
axis(side=1,at=c(1:24),labels=rep("",24))
axis(side=2,at=c(1:14),labels=rep("",14))
plotCorners <- par('usr')
polygon(c(plotCorners[1],plotCorners[2], plotCorners[2],plotCorners[1]),
      c(plotCorners[3],plotCorners[3],plotCorners[4],plotCorners[4]),
      col=NULL)

# add in a color figure legend at the right
par(fig=c(0.85,1.00,0,1))
colorlegend(col=myPalette,
        zlim=c(0,max(medicineData[,2])),
        zval=c(-1000),
        posx=c(0,0.3),
        posy=c(0.1,0.85))

# manually add some labels, since colorlegend only does numeric labels.
text(x=0.85*mean(par("usr")[1:2]),
        y=(par("usr")[3]+0.95*(par("usr")[4]-par("usr")[3])),
        labels="Frequency")
text(x=0.65*mean(par("usr")[1:2]),
        y=(par("usr")[3]+0.85*(par("usr")[4]-par("usr")[3])),
        labels="Often",pos=4)
text(x=0.65*mean(par("usr")[1:2]),
        y=(par("usr")[3]+0.1*(par("usr")[4]-par("usr")[3])),
        labels="Never",pos=4)


The colored contour lines really highlight the distribution of the doses, but to me, red just doesn't feel like it should be the lowest value. In the rainbow palette, red is also the highest value as the palette loops back on itself, potentially creating some confusion. Let's create our own palette, going from colors that feel cold to colors that feel hot. For good measure, we increase the palette size from 32 to 64, and then just re-use the code from above, except of course where the 'myPalette' was defined.

# set the new custom palette (using RColorBrewer)
myPalette <- colorRampPalette(c('black','blue','cyan','green','yellow','red'))(64)

# repeat the code above, leaving out the original definition of 'myPalette'


There it is, an approximation of the probability distribution of insulin doses by various times of day, given whatever data you decided to throw at it. It's a nice way to look at the numbers, without directly looking at the numbers.



Updated: 3/1/2014

No comments :

Post a Comment

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License .