
PPforest implements a random forest using projection pursuit trees (based on the PPtreeViz package).

Usage

PPforest(data, class, std = TRUE, size.tr, m, PPmethod, size.p,
 lambda = .1, parallel = FALSE, cores = 2, rule = 1)

Arguments

data

Data frame with the complete data set.

class

A character string with the name of the class variable.

std

If TRUE, standardize the data set. This is needed to compute the global importance measure.
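As a rough illustration, standardizing usually means centering each numeric predictor and scaling it to unit standard deviation. The sketch below does this with base R's `scale()` on a small hypothetical data frame `toy`. Whether PPforest applies exactly this transformation internally is an assumption.

```r
# Hypothetical toy data: two numeric predictors and a class column
toy <- data.frame(
  x1 = c(1, 2, 3, 4),
  x2 = c(10, 20, 30, 60),
  Type = c("a", "a", "b", "b")
)

# Assumption: every column except the class is a numeric predictor that
# is centered to mean 0 and scaled to standard deviation 1
pred_cols <- setdiff(names(toy), "Type")
toy_std <- toy
toy_std[pred_cols] <- as.data.frame(scale(toy[pred_cols]))

round(colMeans(toy_std[pred_cols]), 10)  # 0 for each predictor
apply(toy_std[pred_cols], 2, sd)         # 1 for each predictor
```

The class column is left untouched; only the predictors are rescaled.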

size.tr

Proportion of observations used for training when the data are split into training and test sets.
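A minimal sketch of a proportional train/test split. It assumes a simple random (non-stratified) split of `round(size.tr * n)` rows. The data frame `dat` is hypothetical, and PPforest's own sampling scheme may differ (for example, it may sample within classes).

```r
set.seed(1)
# Hypothetical data set with 10 observations and a class column
dat <- data.frame(x = rnorm(10), Type = rep(c("a", "b"), each = 5))
size.tr <- 0.7

# Assumption: simple random split of round(size.tr * n) rows for training
n_train <- round(size.tr * nrow(dat))
train_idx <- sample(seq_len(nrow(dat)), n_train)
train <- dat[train_idx, ]
test <- dat[-train_idx, ]  # remaining 1 - size.tr proportion

c(train = nrow(train), test = nrow(test))  # 7 and 3
```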

m

Number of bootstrap replicates, which is also the number of trees to grow. To ensure that each observation is predicted several times, this number should not be too small. The default is m = 500.
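The advice not to choose m too small follows from a standard bootstrap fact. A given observation is left out of a bootstrap sample of size n with probability (1 - 1/n)^n, which is close to exp(-1) ≈ 0.368. It is therefore out of bag for roughly 0.37 m of the m trees. The sketch assumes simple (non-stratified) bootstrap sampling and uses n = 200, the number of rows in the crab example.

```r
n <- 200   # number of observations (crab has 200 rows)
m <- 500   # number of bootstrap replicates, i.e. trees

# Probability that a given observation is left out of one bootstrap sample
p_oob <- (1 - 1 / n)^n
p_oob        # about 0.367, close to exp(-1)
m * p_oob    # expected number of trees for which a case is OOB, about 183

# Quick simulation: count how often each observation is out of bag
set.seed(123)
oob_counts <- integer(n)
for (b in seq_len(m)) {
  in_bag <- unique(sample(seq_len(n), n, replace = TRUE))
  oob <- setdiff(seq_len(n), in_bag)
  oob_counts[oob] <- oob_counts[oob] + 1L
}
summary(oob_counts)
```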

PPmethod

Projection pursuit index to optimize in each classification tree. The options are LDA (linear discriminant) and PDA (penalized linear discriminant). The default is LDA.

size.p

Proportion of variables randomly sampled at each split.
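A sketch of how size.p could translate into the number of variables tried at each split. It assumes the count is `floor(size.p * p)` with a minimum of one. With 5 predictors (assumed here for the crab data), this agrees with the 2 variables reported for size.p = .5 in the crab example. The variable names are hypothetical.

```r
p <- 5          # assumed number of predictors (as in the crab data)
size.p <- 0.5

# Assumption: number of variables tried = floor(size.p * p), at least 1
n_var <- max(1, floor(size.p * p))

# Hypothetical predictor names; a new random subset is drawn at each split
vars <- paste0("x", seq_len(p))
set.seed(42)
split_vars <- sample(vars, n_var)

n_var       # 2
split_vars  # two distinct predictors chosen at random
```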

lambda

Penalty parameter for the PDA index, between 0 and 1. If lambda = 0, no penalty is added and the PDA index is the same as the LDA index. If lambda = 1, all variables are treated as uncorrelated. The default is lambda = 0.1.
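A sketch of the penalty described above. It assumes that the PDA index takes the LDA index 1 - |A'WA| / |A'(W + B)A| and replaces the within-group scatter matrix W by (1 - lambda) W + lambda diag(W), which shrinks the correlations toward zero. This reproduces the lambda = 0 and lambda = 1 behavior stated above but is not necessarily the exact formula used in PPtreeViz. `scatter_matrices()` and `pda_index()` are hypothetical helpers.

```r
# Hypothetical helper: within-group (W) and between-group (B) scatter matrices
scatter_matrices <- function(X, cls) {
  X <- as.matrix(X)
  overall <- colMeans(X)
  W <- matrix(0, ncol(X), ncol(X))
  B <- matrix(0, ncol(X), ncol(X))
  for (g in unique(cls)) {
    Xg <- X[cls == g, , drop = FALSE]
    mg <- colMeans(Xg)
    W <- W + crossprod(sweep(Xg, 2, mg))          # spread around group mean
    B <- B + nrow(Xg) * tcrossprod(mg - overall)  # group mean vs overall mean
  }
  list(W = W, B = B)
}

# Hypothetical PDA-style index: off-diagonal entries of W are shrunk by
# (1 - lambda); lambda = 0 gives the LDA index, lambda = 1 a diagonal W
pda_index <- function(X, cls, A, lambda) {
  S <- scatter_matrices(X, cls)
  W_lambda <- (1 - lambda) * S$W + lambda * diag(diag(S$W))
  A <- as.matrix(A)
  1 - det(t(A) %*% W_lambda %*% A) / det(t(A) %*% (W_lambda + S$B) %*% A)
}

set.seed(1)
X <- matrix(rnorm(60), ncol = 3)
X[11:20, 1] <- X[11:20, 1] + 2               # shift class "b" on variable 1
cls <- rep(c("a", "b"), each = 10)
A <- matrix(c(1, 1, 0) / sqrt(2), ncol = 1)  # a one-dimensional projection

pda_index(X, cls, A, lambda = 0)    # equals the LDA index
pda_index(X, cls, A, lambda = 0.1)  # default penalty
```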

parallel

Logical; if TRUE, the function is run in parallel.

cores

Number of cores used for parallelization.

rule

Split rule used to choose the cutoff at each node:
1: mean of the two group means
2: weighted mean of the two group means, weighted by group size
3: weighted mean of the two group means, weighted by group sd
4: weighted mean of the two group means, weighted by group se
5: mean of the two group medians
6: weighted mean of the two group medians, weighted by group size
7: weighted mean of the two group medians, weighted by group IQR
8: weighted mean of the two group medians, weighted by group IQR and size
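For intuition, here is a sketch of four of these rules (1, 2, 5 and 6) applied to two groups of projected values. The hypothetical `split_cutoff()` assumes that "weighted by group size" means weighting each group's mean or median by its number of observations. The remaining rules, and the exact weighting in PPtreeViz, are not reproduced here.

```r
# Hypothetical helper: cutoff between two groups of projected values z1, z2
split_cutoff <- function(z1, z2, rule = 1) {
  n1 <- length(z1)
  n2 <- length(z2)
  switch(as.character(rule),
    "1" = (mean(z1) + mean(z2)) / 2,                        # mean of means
    "2" = (n1 * mean(z1) + n2 * mean(z2)) / (n1 + n2),      # size-weighted means
    "5" = (median(z1) + median(z2)) / 2,                    # mean of medians
    "6" = (n1 * median(z1) + n2 * median(z2)) / (n1 + n2),  # size-weighted medians
    stop("only rules 1, 2, 5 and 6 are sketched here")
  )
}

z1 <- c(-2, -1.5, -1)   # projected values of group 1
z2 <- c(1, 2, 3, 10)    # projected values of group 2 (one large value)

sapply(c(1, 2, 5, 6), function(r) split_cutoff(z1, z2, r))
# 1.2500000 1.6428571 0.5000000 0.7857143
```

The median-based rules (5 and 6) are less affected by the single large value in `z2` than the mean-based rules.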

Value

An object of class PPforest with the following components:

prediction.training

Predicted values for the training data set.

training.error

Error on the training data set.

prediction.test

Predicted values for the test data set, when a test set is created (size.tr < 1).

error.test

Error on the test data set, when a test set is created (size.tr < 1).

oob.error.forest

Out-of-bag (OOB) error of the forest.

oob.error.tree

Out-of-bag error for each tree in the forest.
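The two OOB errors can be illustrated with a small hypothetical matrix of out-of-bag predictions: one row per observation, one column per tree, and NA where the observation was in that tree's bootstrap sample. The sketch assumes the usual random-forest definitions. A tree's OOB error is its misclassification rate on its own OOB cases, and the forest's OOB error is the misclassification rate of the majority vote over the trees for which each case was OOB. PPforest's internal data structures differ.

```r
# Hypothetical true classes and OOB predictions: rows = observations,
# columns = trees; NA means the case was in that tree's bootstrap sample
y <- c("a", "a", "b", "b")
pred <- matrix(c(
  "a", NA,  "a", "b",
  "b", "b", NA,  "a",
  NA,  "b", "b", "b",
  "b", "a", "b", NA
), nrow = 4, byrow = TRUE)

# Per-tree OOB error: misclassification rate on that tree's OOB cases
oob_error_tree <- apply(pred, 2, function(p) {
  oob <- !is.na(p)
  mean(p[oob] != y[oob])
})

# Forest OOB error: majority vote over the trees for which a case is OOB
majority <- apply(pred, 1, function(p) {
  tab <- table(p[!is.na(p)])
  names(tab)[which.max(tab)]
})
oob_error_forest <- mean(majority != y)

oob_error_tree    # 1/3, 2/3, 0, 1/3
oob_error_forest  # 0.25 (only the second case is misclassified)
```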

boot.samp

Information about the bootstrap samples.

output.trees

Output from trees_pp for each bootstrap sample.

proximity

Proximity matrix. If two cases are classified in the same terminal node, their entry in the proximity matrix is increased by one. In PPforest there is one terminal node per class.
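A minimal sketch of this counting rule. It assumes that the predicted class of a case in a tree identifies its terminal node (one terminal node per class) and that the counts are left unnormalized. Whether PPforest counts all cases or only OOB cases, and whether it divides by the number of trees, is not stated here. The `pred` matrix is hypothetical.

```r
# Hypothetical predictions: rows = observations, columns = trees.
# With one terminal node per class, the predicted class identifies the node.
pred <- matrix(c(
  "a", "a", "b",
  "a", "b", "b",
  "b", "b", "b"
), nrow = 3, byrow = TRUE)

n <- nrow(pred)
proximity <- matrix(0, n, n)
for (b in seq_len(ncol(pred))) {
  # same[i, j] is TRUE when cases i and j land in the same terminal node
  same <- outer(pred[, b], pred[, b], "==")
  proximity <- proximity + same
}
proximity
# diagonal = 3 (number of trees); [1,2] = 2, [1,3] = 1, [2,3] = 2
```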

votes

A matrix with one row for each input data point and one column for each class, giving the fraction of (OOB) votes from the PPforest.
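A sketch of how such a vote-fraction matrix can be computed from hypothetical OOB predictions. For each case, it counts each class among the trees for which that case was out of bag. The class labels and the `pred` matrix are made up for illustration.

```r
classes <- c("a", "b", "c")
# Hypothetical OOB predictions: rows = observations, columns = trees;
# NA means the case was in that tree's bootstrap sample
pred <- matrix(c(
  "a", "a", NA,  "b",
  NA,  "c", "c", "c"
), nrow = 2, byrow = TRUE)

# Fraction of OOB votes for each class, one row per observation
votes <- t(apply(pred, 1, function(p) {
  p <- p[!is.na(p)]
  table(factor(p, levels = classes)) / length(p)
}))
votes
# row 1: 2/3, 1/3, 0    row 2: 0, 0, 1
```

Each row sums to one, and the class with the largest fraction is the forest's OOB prediction for that case.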

n.tree

Number of trees grown in the PPforest.

n.var

Number of predictor variables randomly selected for splitting at each node.

type

Classification.

confusion

Confusion matrix of the prediction (based on OOB data).

call

The original call to PPforest.

train

Training data, sampled using the size.tr proportion.

test

Test data, the remaining 1 - size.tr proportion of the observations.

References

Natalia da Silva, Dianne Cook & Eun-Kyung Lee (2021) A Projection Pursuit Forest Algorithm for Supervised Classification, Journal of Computational and Graphical Statistics, DOI: 10.1080/10618600.2020.1870480

Examples

# crab example with all observations used for training
library(PPforest)

pprf.crab <- PPforest(data = crab, class = 'Type',
 std = FALSE, size.tr = 1, m = 200, size.p = .5,
 PPmethod = 'LDA', parallel = TRUE, cores = 2, rule = 1)
pprf.crab
#> 
#> Call:
#>  PPforest(data = crab, class = "Type", std = FALSE, size.tr = 1,      m = 200, PPmethod = "LDA", size.p = 0.5, parallel = TRUE,      cores = 2, rule = 1) 
#>                Type of random forest: Classification
#>                      Number of trees: 200
#> No. of variables tried at each split: 2
#> 
#>         OOB estimate of  error rate: 6.5%
#> Confusion matrix:
#>              BlueFemale BlueMale OrangeFemale OrangeMale class.error
#> BlueFemale           48        2            0          0        0.04
#> BlueMale              6       44            0          0        0.12
#> OrangeFemale          0        0           46          4        0.08
#> OrangeMale            0        1            0         49        0.02
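The class.error column and the OOB error rate printed above can be recomputed from the confusion matrix. The sketch reads rows as the true class and columns as the predicted class, which is consistent with each row summing to 50. It only re-derives the printed numbers and does not call PPforest.

```r
# Confusion matrix copied from the crab example output above
# (rows = true class, columns = predicted class)
conf <- matrix(c(
  48,  2,  0,  0,
   6, 44,  0,  0,
   0,  0, 46,  4,
   0,  1,  0, 49
), nrow = 4, byrow = TRUE)
cls <- c("BlueFemale", "BlueMale", "OrangeFemale", "OrangeMale")
dimnames(conf) <- list(cls, cls)

# Per-class error: off-diagonal counts divided by the row total
class.error <- 1 - diag(conf) / rowSums(conf)

# Overall OOB error rate: misclassified cases over all cases
oob_rate <- 1 - sum(diag(conf)) / sum(conf)

round(class.error, 2)   # 0.04 0.12 0.08 0.02
oob_rate                # 0.065, i.e. 6.5%
```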