Hi All,
While working on a project analyzing advertising clicks, we just put together an [R] S4 class that fits a finite mixture of logistic regression models on a data stream (i.e., row by row). It can be found under Downloads. Here is an example of its use: the code below generates a dataset and plots it. In the final lines we run through the dataset row by row and fit models with 1, 2, and 3 clusters:
    ## Usage examples:
    source("onlineMixtureLogistic.R")
    library(lattice)

    # Create a dataset:
    set.seed(12345)
    n <- 10e4          # Number of subjects
    k <- 2             # Number of predictors (including intercept)
    j <- 2             # Number of clusters
    pj <- c(.3, .7)    # Cluster probabilities
    betas <- matrix(c(c(3, -2.5), c(-2, 5)), nrow=j, byrow=TRUE)
    X <- matrix(c(rep(1, n), runif((k-1) * n, -5, 5)), ncol=k)
    cluster <- sample(1:j, n, TRUE, pj)
    y <- gen.mixture(X, betas, cluster)

    # Plot the dataset:
    xyplot(jitter(y) ~ X[,2], groups=cluster)

    # Inspect the elements:
    betas
    table(cluster) / sum(table(cluster))

    # Instantiate object (predictors, clusters)
    oLM1 <- OnlineLogMixture(k, 1)
    oLM2 <- OnlineLogMixture(k, 2)
    oLM3 <- OnlineLogMixture(k, 3)

    # Feed the data one row at a time to each model
    for (i in 1:nrow(X)) {
      oLM1 <- add.observation(oLM1, y[i], X[i,])
      oLM2 <- add.observation(oLM2, y[i], X[i,])
      oLM3 <- add.observation(oLM3, y[i], X[i,])
    }

    summary(oLM1)
    summary(oLM2)
    summary(oLM3)
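For readers who want a feel for what a row-by-row update could look like without downloading the class, here is a minimal, self-contained sketch. It is not the code in onlineMixtureLogistic.R: the names `gen_mixture_sketch` and `online_step_sketch`, the learning rate `lr`, and the update rule itself (a stochastic-gradient variant of an EM step) are all assumptions made for illustration. The actual `gen.mixture` and `add.observation` may well work differently.

```r
# Hypothetical helpers for illustration; NOT the API of onlineMixtureLogistic.R.
sigmoid <- function(z) 1 / (1 + exp(-z))

# Draw binary outcomes: row i uses the coefficient vector of its cluster.
gen_mixture_sketch <- function(X, betas, cluster) {
  p <- sigmoid(rowSums(X * betas[cluster, , drop = FALSE]))
  rbinom(nrow(X), size = 1, prob = p)
}

# One streaming update for a J-component mixture (assumed update rule).
# state: list(beta = J x k coefficient matrix, pi = length-J mixing weights)
online_step_sketch <- function(state, y, x, lr = 0.05) {
  p <- sigmoid(as.vector(state$beta %*% x))   # P(y = 1 | component)
  p <- pmin(pmax(p, 1e-8), 1 - 1e-8)          # keep away from exact 0 and 1
  lik <- if (y == 1) p else 1 - p             # likelihood of y per component
  r <- state$pi * lik
  r <- r / sum(r)                             # responsibilities (E-step)
  # Responsibility-weighted logistic gradient step for each component
  state$beta <- state$beta + lr * outer(r * (y - p), x)
  # Move the mixing weights a small step toward the responsibilities
  state$pi <- (1 - lr) * state$pi + lr * r
  state
}

# Small usage example with the same true parameters as above
set.seed(1)
n <- 2000
X <- cbind(1, runif(n, -5, 5))
betas <- matrix(c(3, -2.5, -2, 5), nrow = 2, byrow = TRUE)
cluster <- sample(1:2, n, replace = TRUE, prob = c(.3, .7))
y <- gen_mixture_sketch(X, betas, cluster)

# Random starting values break the symmetry between components
state <- list(beta = matrix(rnorm(4, sd = 0.1), nrow = 2), pi = c(.5, .5))
for (i in seq_len(n)) {
  state <- online_step_sketch(state, y[i], X[i, ])
}
state$pi
state$beta
```

Each observation is seen once and then discarded, so memory use does not grow with the stream. Estimates from a sketch like this depend on the starting values and on `lr`, and the component labels are arbitrary, so compare the fitted rows of `state$beta` with `betas` up to a permutation.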