PPforest
Implements a random forest using the projection pursuit trees algorithm (based on the PPtreeViz package).
Usage
PPforest(data, y, std = 'scale', size.tr, m, PPmethod, size.p,
lambda = .1, parallel = FALSE, cores = 2, rule = 1)
Arguments
- data
Data frame with the complete data set.
- y
A character string with the name of the response variable.
- std
standardization of the data set; standardized data are needed to compute the global importance measure. The default is std = 'scale'; the example below uses std = 'no' to skip standardization.
- size.tr
proportion of observations assigned to the training set when the data are split into training and test sets.
- m
number of bootstrap replicates; this corresponds to the number of trees to grow. To ensure that each observation is predicted several times out of bag, this number should not be too small. The default is m = 500.
- PPmethod
projection pursuit index to optimize in each classification tree. The options are LDA (linear discriminant) and PDA (penalized linear discriminant). The default is PPmethod = 'LDA'; see the PDA sketch after this argument list.
- size.p
proportion of variables randomly sampled in each split.
- lambda
penalty parameter for the PDA index, between 0 and 1. If lambda = 0, no penalty is added and the PDA index is the same as the LDA index; if lambda = 1, all variables are treated as uncorrelated. The default is lambda = 0.1.
- parallel
logical; if TRUE, the function is run in parallel.
- cores
number of cores used in the parallelization
- rule
split rule:
1: mean of the two group means
2: weighted mean of the two group means, weighted by group size
3: weighted mean of the two group means, weighted by group standard deviation
4: weighted mean of the two group means, weighted by group standard error
5: mean of the two group medians
6: weighted mean of the two group medians, weighted by group size
7: weighted mean of the two group medians, weighted by group IQR
8: weighted mean of the two group medians, weighted by group IQR and size
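As a point of reference, a minimal call sketch using the PDA index with the crab data from the Examples section below; the size.p, lambda, and rule values here are illustrative choices, not recommendations.

# illustrative only: PDA index with a moderate penalty, a random subset of
# variables at each split, and the size-weighted split rule (rule = 2)
pprf.pda <- PPforest(data = crab, y = 'Type',
    std = 'no', size.tr = 0.8, m = 100, size.p = 0.6,
    PPmethod = 'PDA', lambda = 0.3, parallel = FALSE, rule = 2)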
Value
An object of class PPforest with the following components (a short sketch of accessing some of them follows the example below).
- prediction.training
predicted values for training data set.
- training.error
error of the training data set.
- prediction.test
predicted values for the test data set if testap = TRUE (default).
- error.test
error of the test data set if testap = TRUE (default).
- oob.error.forest
out of bag error in the forest.
- oob.error.tree
out of bag error for each tree in the forest.
- boot.samp
information about the bootstrap samples.
- output.trees
output from trees_pp for each bootstrap sample.
- proximity
proximity matrix; if two cases are classified in the same terminal node, their entry in the proximity matrix is increased by one. In PPforest there is one terminal node per class.
- votes
a matrix with one row for each input data point and one column for each class, giving the fraction of (OOB) votes from the PPforest.
- n.tree
number of trees grown in the PPforest.
- n.var
number of predictor variables selected for splitting at each node.
- type
classification.
- confusion
confusion matrix of the prediction (based on OOB data).
- call
the original call to PPforest.
- train
the training data, based on the size.tr sample proportion.
- test
the test data, based on the 1 - size.tr sample proportion.
References
da Silva, N., Cook, D., & Lee, E. K. (2021). A projection pursuit forest algorithm for supervised classification. Journal of Computational and Graphical Statistics, 30(4), 1168-1180.
Examples
# crab example using 80% of the observations for training (size.tr = 0.8)
set.seed(123)
pprf.crab <- PPforest(data = crab, y = 'Type',
std = 'no', size.tr = 0.8, m = 100, size.p = 1,
PPmethod = 'LDA' , parallel = TRUE, cores = 2, rule = 1)
pprf.crab
#>
#> Call:
#> PPforest(data = crab, y = "Type", std = "no", size.tr = 0.8, m = 100, PPmethod = "LDA", size.p = 1, parallel = TRUE, cores = 2, rule = 1)
#> Type of random forest: Classification
#> Number of trees: 100
#> No. of variables tried at each split: 5
#>
#> OOB estimate of error rate: 6.25%
#> Confusion matrix:
#> BlueFemale BlueMale OrangeFemale OrangeMale class.error
#> BlueFemale 37 3 0 0 0.07
#> BlueMale 6 34 0 0 0.15
#> OrangeFemale 0 0 39 1 0.03
#> OrangeMale 0 0 0 40 0.00
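Following on from the example above, a short sketch of inspecting some of the components documented in the Value section. Exact numbers depend on the random seed and the training/test split, and the last line assumes the vote matrix carries the class labels as column names.

pprf.crab$oob.error.forest          # OOB error for the whole forest
head(pprf.crab$oob.error.tree)      # OOB error of individual trees
pprf.crab$confusion                 # OOB confusion matrix printed above
head(pprf.crab$prediction.test)     # predictions for the held-out 20%
pprf.crab$error.test                # test-set error
# OOB-vote-based class for each training case (assumes class labels are
# the column names of the votes matrix)
head(colnames(pprf.crab$votes)[apply(pprf.crab$votes, 1, which.max)])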