Chapter 4 Variable Selection

4.1 Functional regression with points of impact

4.1.1 State of the Art

4.1.2 Local maxima distance correlation approach (LMDC), (Ordóñez et al. 2018)

  • In this work we study the utility of distance correlation (Székely, Rizzo, and Bakirov 2007) as an intrinsic method for variable selection.

  • Neither projection nor transformation of the variables is needed. Moreover, it is unnecessary to assume an a priori regression model.

  • The LMDC approach consists of calculating the local maxima of the distance correlation along the curve.

4.1.3 LMDC Algorithm: LMDC.select() function

  1. Calculate the distance correlation (DC) \(\hat{\mathcal{R}}(t) = \left\lbrace \hat{\mathcal{R}}(X(t_j),Y) \right\rbrace_{j=1}^{N}\) from the data \(\left\lbrace X_i(t_j),Y_i \right\rbrace_{i=1}^{n}\).

  2. Calculate the local maxima of \(\hat{\mathcal{R}}(t)\). Only the local maxima that are significant at a default significance level are selected. Denoting the argument values (argvals) of the local maxima as \(\tilde t_1,\tilde t_2,\ldots,\tilde t_{\tilde{N}}\) (\(\tilde{N}<N\)), we order them from highest to lowest value of DC, that is, \(\hat{\mathcal{R}}(\tilde t_1) \geq \hat{\mathcal{R}}(\tilde t_2) \geq \ldots \geq \hat{\mathcal{R}}(\tilde t_{\tilde{N}})\).
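The following hand-rolled sketch reproduces steps 1–2 with the dcor() function of the energy package and a simple local-maxima search; it is only meant to illustrate what LMDC.select() computes internally, not its exact implementation, and the object names are illustrative.

library(fda.usc)
library(energy)
data(tecator)
X <- fdata.deriv(tecator[["absorp.fdata"]], nderiv = 2)[["data"]]
y <- tecator[["y"]][["Fat"]]
# Step 1: pointwise distance correlation between each X(., t_j) and Y
R.t <- apply(X, 2, function(xj) dcor(xj, y))
# Step 2: local maxima of R(t), ordered by decreasing distance correlation
lm.idx <- which(diff(sign(diff(R.t))) == -2) + 1
lm.idx <- lm.idx[order(R.t[lm.idx], decreasing = TRUE)]
round(R.t[lm.idx], 3)

The LMDC.select() call shown next wraps these two steps and additionally screens the local maxima with a significance test (pvalue) and a tolerance argument (tol) for the search.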

library(fda.usc)
data(tecator)
X.d2 <- fdata.deriv(tecator[["absorp.fdata"]], nderiv = 2)
colnames(X.d2[["data"]]) <- paste0("X", round(X.d2[["argvals"]]))
dat <- data.frame("y" = tecator[["y"]][["Fat"]], X.d2[["data"]])
tol <- 0.2
dc.raw <- LMDC.select("y", data = dat, tol = tol, pvalue = 0.05,
                      plot = FALSE)
# Preselected impact points 
covar<-names(dat)[-1][dc.raw[["maxLocal"]]]
covar
##  [1] "X933"  "X1046" "X907"  "X886"  "X896"  "X1010" "X1020" "X1030" "X945" 
## [10] "X876"  "X915"  "X862"  "X993"
length(covar)
## [1] 13

4.1.4 LMDC Algorithm: LMDC.regre() function

  1. (Optionally) Check whether the relationship between the response and the predictor variables is linear, \(H_0:\,Y=\big<X,\beta\big>+\epsilon\), versus a general alternative, using the test of linearity proposed in García-Portugués, González-Manteiga, and Febrero-Bande (2014).
ftest <- flm.test(dat[, -1], dat[, "y"], verbose = FALSE, plot.it = FALSE)
## [1] "PLS1"
## [1] "PLS2"
ftest
## 
##  PCvM test for the functional linear model using optimal PLS basis
##  representation
## 
## data:  Y=<X,b>+e
## PCvM statistic = 216.51, p-value < 2.2e-16
ftest[["p.value"]]
## [1] 0
  2. Fit a regression model for the response of interest \(Y\) using the vector of covariates \(X(\tilde{t})=\{X(\tilde t_1), \ldots, X(\tilde{t}_{\tilde{N}})\}\). A linear model is used if the null hypothesis is not rejected and a nonparametric model (e.g. a generalized additive model) otherwise.

  3. (Optionally) Once the type of model has been selected, we propose to apply a forward stepwise regression method to determine the significant covariates, taking advantage of the fact that the local maxima have already been ordered. We start with a model containing the first covariate (the one with the highest value of distance correlation) and add the remaining ordered covariates to the model in turn. This substantially reduces the computing time.

if (ftest$p.value > 0.05) { # linear relationship: step-wise lm is recommended
  out <- LMDC.regre(y = "y", covar = covar, data = dat, pvalue = 0.05,
                    method = "lm")
} else {                    # non-linear relationship: step-wise gam is recommended
  out <- LMDC.regre(y = "y", covar = covar, data = dat, pvalue = 0.05,
                    method = "gam")
}
out
## $model
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ s(X933, k = 4) + s(X1046, k = 4) + s(X907, k = 4) + s(X886, 
##     k = 4) + s(X896, k = 4) + s(X1010, k = 4) + s(X1020, k = 4) + 
##     s(X1030, k = 4) + s(X945, k = 4) + s(X876, k = 4) + s(X915, 
##     k = 4) + s(X993, k = 4)
## 
## Estimated degrees of freedom:
## 3.00 2.55 2.83 2.00 1.00 1.00 2.94 
## 2.92 2.81 2.57 1.00 2.85  total = 28.48 
## 
## GCV score: 0.4403176     
## 
## $xvar
##  [1] "X933"  "X1046" "X907"  "X886"  "X896"  "X1010" "X1020" "X1030" "X945" 
## [10] "X876"  "X915"  "X993" 
## 
## $pred
## NULL
## 
## $edf
## [1] 28.48142
## 
## $nvar
## [1] 12

Differences in mean squared prediction error between the linear model (using lm) and the non-linear model (using gam):

out <- LMDC.regre(y = "y", covar = covar, data = dat[1:165, ],
                  newdata = dat[166:215, ], pvalue = 0.05, method = "lm")
mean((out$pred-dat$y[166:215])^2)
## [1] 11.75474
out <- LMDC.regre(y = "y", covar = covar, data = dat[1:165, ],
                  newdata = dat[166:215, ], pvalue = 0.05, method = "gam")
mean((out$pred-dat$y[166:215])^2)
## [1] 1.148573

Binary classification example (Impact point selection, model estimation and prediction)

data(tecator)
X.d2 <- fdata.deriv(tecator[["absorp.fdata"]], nderiv = 2)
colnames(X.d2[["data"]]) <- paste0("X", round(X.d2[["argvals"]]))
y2groups <- ifelse(tecator[["y"]][["Fat"]] < 12, 0, 1)
dat <- data.frame("y2groups" = y2groups, X.d2[["data"]])
tol <- 0.1
dc.raw <- LMDC.select("y2groups", data = dat, tol = tol, pvalue = 0.05,
                      plot = FALSE)
# Preselected impact points 
covar<-names(dat)[-1][dc.raw[["maxLocal"]]]
covar
##  [1] "X945"  "X905"  "X933"  "X886"  "X1020" "X876"  "X1030" "X896"  "X1046"
## [10] "X862"  "X1010" "X915"  "X965"
length(covar)
## [1] 13
# GLM model (binomial family); other multivariate models could be used
ind <- 1:129
ldata<-list("df"=dat[ind,])
form.glm<-formula(paste0("y2groups~",paste0(covar,collapse="+")))
out.glm  <- classif.glm(form.glm, data = ldata)
summary(out.glm)
##      - SUMMARY - 
## 
## -Probability of correct classification by group (prob.classification):
## 0 1 
## 1 1 
## 
## -Confusion matrix between the theoretical groups (by rows)
##   and estimated groups (by column) 
##    
##      0  1
##   0 58  0
##   1  0 71
## 
## -Probability of correct classification:  1
# Prediction
newldata<-list("df"=dat[-ind,])
pred.glm<-predict(out.glm,newldata)

# Confusion matrix
table(newldata$df$y2groups,pred.glm)
##    pred.glm
##      0  1
##   0 40  0
##   1  3 43

4.2 Variable selection in functional regression

Febrero-Bande, González-Manteiga, and Oviedo de la Fuente (2019) consider the problem of variable selection in regression models in which functional variables may be mixed with other types of variables (scalar, multivariate, directional, etc.).

Our proposal begins with a simple null model and sequentially selects a new variable to be incorporated into the model, based on the distance correlation proposed by Székely, Rizzo, and Bakirov (2007). For the sake of simplicity, only additive models are used:

\[ Y_i=\alpha+\sum_{j=1}^Jf_j({X_i^{(j)}})+\varepsilon_i,\quad i=1,\ldots,N \]

The proposed algorithm can also assess the type of contribution (linear, nonlinear, …) of each variable. It has shown quite promising results when applied to simulations and real data sets.
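As an illustration of the additive form above with mixed contribution types, the following minimal sketch fits a model with one linear term and one smooth term using mgcv; the two scalar covariates from tecator and the choice of which term is linear are purely illustrative and unrelated to the selection procedure itself.

library(mgcv)
library(fda.usc)
data(tecator)
df <- data.frame(Fat = tecator$y$Fat,
                 Water = tecator$y$Water,
                 Protein = tecator$y$Protein)
# Additive model: Water enters linearly, Protein through a smooth f_j()
fit <- gam(Fat ~ Water + s(Protein, k = 4), data = df)
summary(fit)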

4.2.1 State of the Art

  • Stepwise regression, Akaike (1973). The main idea is to use diagnostic tools, directly derived from the linear model, to evaluate the contribution of a new covariate and decide whether it should be included in the model. The final subset is usually constructed using forward and/or backward selection.

  • Feature selection using the LASSO. The work by Tibshirani (1996) proposing the LASSO estimator includes an \(l_1\)-type constraint on the coefficient vector \(\beta\). Several proposals follow the same line but use other penalties or constraints, such as LARS (Efron et al. 2004) and COSSO (Lin and Zhang 2006). Each of these methods is tied to a specific model, all the covariates must be included in the model at the same time, and, for functional data problems, preliminary steps such as variable standardization are commonly required.

  • Berrendero, Cuevas, and Torrecilla (2016) use the minimum Redundancy Maximum Relevance (mRMR) procedure to choose the most relevant design points in a functional classification setting.

  • Pure feature selection methods, in which each covariate is selected without reference to a model. This is the approach employed in minimum Redundancy Maximum Relevance (mRMR) (Peng, Long, and Ding 2005), where a new candidate covariate must have high relevance with the response while maintaining low redundancy with the covariates already selected in the model. The main advantage of this approach is that it is an incremental rule, but the measures of redundancy and relevance must be chosen as a function of the regression model applied in order to ensure good predictive results in the final model. Berrendero, Cuevas, and Torrecilla (2018) used Reproducing Kernel Hilbert Spaces (RKHS) for variable selection in the functional linear model (FLM).

  • Boosting, see F. Ferraty and Vieu (2009) in a functional data context. At each step, boosting selects the best covariate/model with respect to the unexplained part of the response. The final prediction is constructed as a combination of the different steps.

  • Partial distance correlation (PDC), used by Yenigün and Rizzo (2015) for variable selection in multivariate linear models. A definition of the PDC between \(X\) and \(Y\) given \(Z\) was introduced based on computing the distance correlation between the residuals of two models: \(Y\) with respect to \(Z\) and \(X\) with respect to \(Z\). PDC is constructed under linearity assumptions among the variables, and its implementation only uses the distance matrices among the elements of \(X\), \(Y\) and \(Z\) (so the variables should have a similar scale); a numerical sketch is given after the next paragraph.

Specifically, \(Z\) (the variables already in the model) could be a mix of functional, scalar or multivariate variables, for which an appropriate joint distance may be hard to compute. Even restricting ourselves to the scalar case, those variables should have a similar scale.
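The following minimal sketch illustrates the residual-based idea behind PDC with simulated scalar variables; x, y and z are purely illustrative, and pdcor() is the implementation of partial distance correlation in the energy package (based on U-centered distance matrices), so its value need not coincide exactly with the residual-based quantity.

library(energy)
set.seed(1)
n <- 100
z <- rnorm(n)
x <- z + rnorm(n)
y <- 2 * z + rnorm(n)
# Distance correlation between the residuals of Y ~ Z and of X ~ Z
res.y <- resid(lm(y ~ z))
res.x <- resid(lm(x ~ z))
dcor(res.x, res.y)
# Partial distance correlation as implemented in energy
pdcor(x, y, z)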

4.2.2 Algorithm

None of the previous solutions is completely satisfactory in a functional data framework, especially when the number of possible covariates can be arbitrarily large. We are interested in an automatic regression procedure capable of dealing with a large number of covariates of different nature, possibly very closely related to one another.

The key to the whole procedure is the extensive use of the DC, which has two important advantages: the choice of the variate is made without assuming a model, and the quantity can be computed for variates of different nature because it only depends on distances. The distance correlation is computed between the residuals of the current model and each candidate. Since the residuals have the same nature as the response variable, the DC can always be computed at each step.

Although our proposal is presented in a very general way, we have restricted ourselves to additive models, which offer a balanced compromise between predictive ability and simplicity. The results obtained are quite promising, even in scenarios where no competitors are available because no other procedure can deal with variates of different nature in a homogeneous way.

The procedure was applied to a real problem related to the Iberian Energy Market (price and demand), where the number of possible covariates is very large. The algorithm was able to find synthetic regression models offering interesting insights into the relationship between the response and the covariates. The final selected models mix functional, scalar and categorical information.

Our algorithm can be formalized as follows:

  1. Let \(Y\) be the response and \(S=\{X^1,\ldots,X^p\}\) the set of all possible predictors.

  2. Set \(\hat{Y}=\bar{Y}\), and let \(M^{(0)}=\emptyset\) be the initial set of variates included in the model. Set \(i=0\).

  3. Compute the residuals of the current model: \(\hat{\varepsilon}=Y-\hat{Y}\).

  4. Choose \(X^j\in S\) such that: 1) \(\mathcal{R}\{\hat{\varepsilon},X^j\}\ge \mathcal{R}\{\hat{\varepsilon},X^k\},\ \forall X^k\in S,\ k\ne j\), and 2) the null hypothesis of the test of independence between \(X^j\) and \(\hat{\varepsilon}\) is rejected. IF NOT, END.

  5. Update the sets \(M\) and \(S\): \(M^{(i+1)}=M^{(i)}\cup\{X^j\}\), and \(S=S\backslash\{X^j\}\).

  6. Compute the new model for \(Y\) using \(M^{(i+1)}\) choosing the best contribution of the new covariate. Typically, there will be a catalog of all possible ways of constructing correct models with the variates in \(M^{(i+1)}\) fixing the contributions of the variates in \(M^{(i)}\) and adding the new one.

  7. Analyze the contribution of \(X^j\) in the new model with respect to the current one:

  • IF this contribution is not relevant (typically, comparing with the current model), THEN \(M^{(i+1)}=M^{(i+1)}\backslash\{X^j\}\) and the current model remains unaltered;

  • ELSE the new model becomes the current model and provides new predictions (\(\hat{Y}\)). Throughout the paper we have employed an additive model, \(\hat{Y}=\bar{Y}+\sum_{m\in M}\hat{f}_m\left(X^{(m)}\right)\), where at each step \(\hat{f}_m\) can be linear or nonlinear.

  8. Update the number of iterations: \(i=i+1\), and go to step 3.

  9. END.

The current model is the final model, with the variates included in \(M^{(i)}\). \(S\) is either the empty set or contains those variables for which the null hypothesis of independence with respect to the residuals of the current model is not rejected.
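As a minimal illustration of steps 3–4 (not the internals of fregre.gsam.vs()), the following sketch runs one selection step with two scalar candidates from tecator, using dcor() and dcor.test() from the energy package; the candidate set and the number of permutation replicates are purely illustrative.

library(fda.usc)
library(energy)
data(tecator)
df <- data.frame(Fat = tecator$y$Fat,
                 Protein = tecator$y$Protein,
                 Water = tecator$y$Water)
# Step 3: residuals of the current (null) model, i.e. Y - mean(Y)
eps <- df$Fat - mean(df$Fat)
# Step 4: candidate with the largest DC with the residuals, kept only if the
# permutation test of independence rejects
cand <- c("Protein", "Water")
dc <- sapply(cand, function(v) dcor(df[[v]], eps))
best <- names(which.max(dc))
dcor.test(df[[best]], eps, R = 199)$p.value
best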

4.2.3 R example

data(tecator)
y <- tecator$y$Fat

# Potential functional covariates
x  <- tecator$absorp.fdata
x1 <- fdata.deriv(x)
x2 <- fdata.deriv(x, nderiv = 2)

# Potential factor covariates
xcat0 <- cut(rnorm(length(y)), 4)
xcat1 <- cut(tecator$y$Protein, 4)
xcat2 <- cut(tecator$y$Water, 4)
ind <- 1:129

# 3 functional covariates (x, x1, x2), 2 factor covariates (xcat1, xcat2)
# and 100 potential scalar covariates (impact points of x1)
dat    <- data.frame("Fat"=y, x1$data, xcat1, xcat2)
ldat <- list("df"=dat[ind,],"x"=x[ind,],"x1"=x1[ind,],"x2"=x2[ind,])

# Time consuming
res.gam1<-fregre.gsam.vs(data=ldat,y="Fat")
summary(res.gam1$model)
##       Fat             X97                x2.PC1              x2.PC2          
##  Min.   : 0.90   Min.   :-0.015836   Min.   :-0.003367   Min.   :-3.082e-03  
##  1st Qu.: 7.70   1st Qu.:-0.010165   1st Qu.:-0.002367   1st Qu.:-7.165e-04  
##  Median :14.60   Median :-0.009397   Median :-0.000950   Median :-2.801e-05  
##  Mean   :18.24   Mean   :-0.009414   Mean   : 0.000000   Mean   : 0.000e+00  
##  3rd Qu.:27.80   3rd Qu.:-0.008574   3rd Qu.: 0.001717   3rd Qu.: 6.665e-04  
##  Max.   :49.10   Max.   :-0.006377   Max.   : 0.009573   Max.   : 6.246e-03  
##      x2.PC3               x2.PC4          
##  Min.   :-0.0032986   Min.   :-1.041e-03  
##  1st Qu.:-0.0001912   1st Qu.:-1.193e-04  
##  Median : 0.0001008   Median : 2.363e-05  
##  Mean   : 0.0000000   Mean   : 0.000e+00  
##  3rd Qu.: 0.0002728   3rd Qu.: 1.270e-04  
##  Max.   : 0.0034901   Max.   : 1.131e-03
# Prediction like fregre.gsam() 
newldat <- list("df"=dat[-ind,],"x"=x[-ind,],"x1"=x1[-ind,],"x2"=x2[-ind,])
pred.gam1<-predict(res.gam1,newldat)
plot(dat[-ind,"Fat"],pred.gam1)

References

Akaike, Hirotugu. 1973. “Maximum Likelihood Identification of Gaussian Autoregressive Moving Average Models.” Biometrika 60 (2): 255–65.
Berrendero, José R, Antonio Cuevas, and José L Torrecilla. 2018. “On the Use of Reproducing Kernel Hilbert Spaces in Functional Classification.” Journal of the American Statistical Association 113: 1210–18.
Berrendero, José R, Antonio Cuevas, and José L Torrecilla. 2016. “Variable Selection in Functional Data Classification: A Maxima-Hunting Proposal.” Statistica Sinica, 619–38.
Efron, Bradley, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. 2004. “Least Angle Regression.” The Annals of Statistics 32 (2): 407–99.
Febrero-Bande, Manuel, Wenceslao González-Manteiga, and Manuel Oviedo de la Fuente. 2019. “Variable Selection in Functional Additive Regression Models.” Computational Statistics 34 (2): 469–87. https://doi.org/10.1007/s00180-018-0844-5.
Ferraty, Frédéric, P Hall, and Philippe Vieu. 2010. “Most-Predictive Design Points for Functional Data Predictors.” Biometrika 97 (4): 807–24.
Ferraty, F., and P. Vieu. 2009. “Additive Prediction and Boosting for Functional Data.” Comput. Statist. Data Anal. 53 (4): 1400–1413. http://www.sciencedirect.com/science/article/pii/S0167947308005628.
García-Portugués, Eduardo, Wenceslao González-Manteiga, and Manuel Febrero-Bande. 2014. “A Goodness-of-Fit Test for the Functional Linear Model with Scalar Response.” Journal of Computational and Graphical Statistics 23 (3): 761–78.
Lin, Yi, and Hao Helen Zhang. 2006. “Component Selection and Smoothing in Multivariate Nonparametric Regression.” The Annals of Statistics 34 (5): 2272–97.
Ordóñez, Celestino, Manuel Oviedo de la Fuente, Javier Roca-Pardiñas, and José Ramón Rodríguez-Pérez. 2018. “Determining Optimum Wavelengths for Leaf Water Content Estimation from Reflectance: A Distance Correlation Approach.” Chemometrics and Intelligent Laboratory Systems 173: 41–50.
Peng, Hanchuan, Fuhui Long, and Chris Ding. 2005. “Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy.” IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 8: 1226–38.
Székely, Gábor J, Maria L Rizzo, and Nail K Bakirov. 2007. “Measuring and Testing Dependence by Correlation of Distances.” Ann. Statist. 35 (6): 2769–94. http://projecteuclid.org/euclid.aos/1201012979.
Yenigün, C Deniz, and Maria L Rizzo. 2015. “Variable Selection in Regression Using Maximal Correlation and Distance Correlation.” Journal of Statistical Computation and Simulation 85 (8): 1692–1705.
Zhao, Yihong, Huaihou Chen, and R Todd Ogden. 2015. “Wavelet-Based Weighted LASSO and Screening Approaches in Functional Linear Regression.” Journal of Computational and Graphical Statistics 24 (3): 655–75.