Projection Pursuit Random Forest (PPforest)

PPforest implements a random forest built from projection pursuit trees (based on the PPtreeViz package).

Usage

PPforest(data, class, std = TRUE, size.tr, m, PPmethod, size.p,
 lambda = .1, parallel = FALSE, cores = 2, rule = 1)

Arguments

data: Data frame with the complete data set.
class: A character with the name of the class variable.
std: if TRUE, standardize the data set; this is needed to compute the global importance measure.
size.tr: proportion of the data used for training when the data are split into training and test sets.
m: number of bootstrap replicates, which corresponds to the number of trees to grow. To ensure that each observation is predicted a few times, this number should not be too small. The default is m = 500.
PPmethod: projection pursuit index to optimize in each classification tree. The options are LDA (linear discriminant) and PDA (penalized linear discriminant). The default is LDA.
size.p: proportion of variables randomly sampled in each split.
lambda: penalty parameter in the PDA index, between 0 and 1. If lambda = 0, no penalty is added and the PDA index is the same as the LDA index. If lambda = 1, all variables are treated as uncorrelated. The default value is lambda = 0.1.
parallel: logical; if TRUE, the function is run in parallel.
cores: number of cores used when running in parallel.
rule: split rule
 1: mean of two group means
 2: weighted mean of two group means - weight with group size
 3: weighted mean of two group means - weight with group sd
 4: weighted mean of two group means - weight with group se
 5: mean of two group medians
 6: weighted mean of two group medians - weight with group size
 7: weighted mean of two group medians - weight with group IQR
 8: weighted mean of two group medians - weight with group IQR and size
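The effect of lambda described above can be made concrete with a small base-R sketch. It assumes the PDA index for a single projection vector a has the form 1 - (a'W_PDA a) / (a'(W_PDA + B) a), where W and B are the within-group and between-group sums of squares and W_PDA keeps the diagonal of W while scaling its off-diagonal entries by 1 - lambda. That form is an assumption taken from the projection pursuit literature, not from this page, and pda_index is a hypothetical helper that is not part of PPforest.

```r
# Hypothetical helper (not part of PPforest): PDA-style index for a 1-D
# projection a, under the assumed form 1 - a'W_pda a / a'(W_pda + B) a.
pda_index <- function(x, cls, a, lambda = 0.1) {
  x <- as.matrix(x)
  cls <- factor(cls)
  p <- ncol(x)
  grand <- colMeans(x)
  B <- matrix(0, p, p)  # between-group sums of squares
  W <- matrix(0, p, p)  # within-group sums of squares
  for (g in levels(cls)) {
    xg <- x[cls == g, , drop = FALSE]
    mg <- colMeans(xg)
    B <- B + nrow(xg) * tcrossprod(mg - grand)
    W <- W + crossprod(sweep(xg, 2, mg))
  }
  # Keep the diagonal of W and shrink the off-diagonal entries by (1 - lambda).
  W_pda <- (1 - lambda) * W + lambda * diag(diag(W), nrow = p)
  num <- drop(t(a) %*% W_pda %*% a)
  den <- drop(t(a) %*% (W_pda + B) %*% a)
  1 - num / den
}

# Illustration on the built-in iris data with a hypothetical projection.
a <- c(0.5, -0.5, 0.5, -0.5)
idx_lda <- pda_index(iris[, 1:4], iris$Species, a, lambda = 0)  # no penalty
idx_unc <- pda_index(iris[, 1:4], iris$Species, a, lambda = 1)  # correlations ignored
```

Under this assumed form, lambda = 0 leaves W unchanged, which gives the LDA-style index, and lambda = 1 drops every off-diagonal entry of W, which treats the variables as uncorrelated. A projection onto a single variable, such as c(1, 0, 0, 0), gives the same value for every lambda because only the diagonal of W enters the index.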

Value

An object of class PPforest with the following components.

prediction.training: predicted values for training data set.
training.error: error of the training data set.
prediction.test: predicted values for the test data set if testap = TRUE (default).
error.test: error of the test data set if testap = TRUE (default).
oob.error.forest: out of bag error in the forest.
oob.error.tree: out of bag error for each tree in the forest.
boot.samp: information on the bootstrap samples.
output.trees: output from trees_pp for each bootstrap sample.
proximity: proximity matrix. If two cases are classified in the same terminal node, their proximity is increased by one. In PPforest there is one terminal node per class.
votes: a matrix with one row for each input data point and one column for each class, giving the fraction of (OOB) votes from the PPforest.
n.tree: number of trees grown in PPforest.
n.var: number of predictor variables selected for splitting at each node.
type: classification.
confusion: confusion matrix of the prediction (based on OOB data).
call: the original call to PPforest.
train: the training data, based on the size.tr sample proportion.
test: the test data, based on the 1 - size.tr sample proportion.
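The proximity component described above can be illustrated with a minimal base-R sketch of the counting rule: each tree adds one to the proximity of every pair of cases that end up in the same terminal node. The node labels below are made up for illustration, and proximity_counts is a hypothetical helper. Whether PPforest rescales these counts or restricts them to particular cases is not stated here, so the sketch only shows raw counts.

```r
# Hypothetical terminal-node labels: one row per case, one column per tree.
nodes <- cbind(
  tree1 = c("A", "A", "B", "B"),
  tree2 = c("A", "B", "B", "B"),
  tree3 = c("A", "A", "A", "B")
)

# Hypothetical helper: count, over all trees, how often two cases share a node.
proximity_counts <- function(nodes) {
  n <- nrow(nodes)
  prox <- matrix(0, n, n)
  for (t in seq_len(ncol(nodes))) {
    # outer() flags every pair of cases with the same terminal node in tree t.
    prox <- prox + outer(nodes[, t], nodes[, t], "==")
  }
  prox
}

prox <- proximity_counts(nodes)
prox
```

In this toy input, cases 1 and 2 share a node in two of the three trees, so prox[1, 2] is 2, while cases 1 and 4 never share a node, so prox[1, 4] is 0.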

References

Natalia da Silva, Dianne Cook & Eun-Kyung Lee (2021) A Projection Pursuit Forest Algorithm for Supervised Classification, Journal of Computational and Graphical Statistics, DOI: 10.1080/10618600.2020.1870480

Examples

#crab example with 70% of the observations used as training
library(PPforest)
set.seed(123)
pprf.crab <- PPforest(data = crab, class = 'Type',
 std = FALSE, size.tr = 0.7, m = 200, size.p = .5,
 PPmethod = 'LDA', parallel = TRUE, cores = 2, rule = 1)
pprf.crab
#> 
#> Call:
#>  PPforest(data = crab, class = "Type", std = FALSE, size.tr = 0.7,
#>      m = 200, PPmethod = "LDA", size.p = 0.5, parallel = TRUE,
#>      cores = 2, rule = 1)
#>                Type of random forest: Classification
#>                      Number of trees: 200
#> No. of variables tried at each split: 3
#> 
#>         OOB estimate of  error rate: 5%
#> Confusion matrix:
#>              BlueFemale BlueMale OrangeFemale OrangeMale class.error
#> BlueFemale           33        2            0          0        0.06
#> BlueMale              3       32            0          0        0.09
#> OrangeFemale          0        0           33          2        0.06
#> OrangeMale            0        0            0         35        0.00
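The error figures in the printed output can be checked by hand from the confusion counts. The sketch below copies the counts from the output above and assumes that rows hold the observed classes; under that assumption the row-wise error rates reproduce the printed class.error column. The object names are illustrative only.

```r
# Confusion counts copied from the printed output (rows assumed to be observed classes).
classes <- c("BlueFemale", "BlueMale", "OrangeFemale", "OrangeMale")
confusion <- matrix(
  c(33,  2,  0,  0,
     3, 32,  0,  0,
     0,  0, 33,  2,
     0,  0,  0, 35),
  nrow = 4, byrow = TRUE, dimnames = list(classes, classes)
)

# Per-class error: share of each row that falls off the diagonal.
class_error <- 1 - diag(confusion) / rowSums(confusion)
# Overall OOB error: share of all cases that fall off the diagonal.
oob_error <- 1 - sum(diag(confusion)) / sum(confusion)

round(class_error, 2)  # 0.06 0.09 0.06 0.00
round(100 * oob_error) # 5
```

The 140 cases in the matrix are the training portion of the data selected by size.tr = 0.7, and 7 misclassified cases out of 140 give the 5% OOB error rate.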