RPFP: Regression by Partitioning Feature Projections

Developed by
Ilhan Uysal and H.Altay Guvenir

Department of Computer Engineering
Bilkent University
28 July 1999

Copyright (c) 1999 by Ilhan Uysal and H.Altay Guvenir.
Permission is expressly granted to use this code in any non-commercial work, provided that this notice is preserved.

The C program (rpfp.c) implements the RPFP (Regression by Partitioning Feature Projections) method, which approximates a continuous function from a given data set.

rpfp is invoked as:

rpfp <DOMAIN> -k <K> -v <V>

Here <DOMAIN> is the name of the domain, the -k option sets the number of instances left in the local region of the query after partitioning, and the -v option sets the level of verbosity. The rpfp program expects the following files in the current directory:

<DOMAIN>.info : Information file that records the types of the features
<DOMAIN>.train : Train set (Predicted feature is the last column)
<DOMAIN>.test : Test set (Predicted feature is the last column)

The output is written to a file:

If the verbosity option is set, the intermediate activities are reported to a file called <DOMAIN>.log.rpfp.

An example run on the housing data:

rpfp housing -k 10 -v 3
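
The invocation above can be read as a domain name followed by two options that each take an integer. The following is a minimal sketch of that reading, not the code in rpfp.c: the struct rpfp_args and the functions parse_args and domain_file are hypothetical names introduced here for illustration. It assumes the domain comes first and that the caller fills in starting values for k and verbosity, since this file does not say whether the options may be omitted.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the command-line settings (not from rpfp.c). */
struct rpfp_args {
    const char *domain;  /* <DOMAIN> */
    int k;               /* value given with -k */
    int verbosity;       /* value given with -v */
};

/* Parse "rpfp <DOMAIN> -k <K> -v <V>".
   Returns 0 on success and -1 on a usage error. Fields for options that
   are absent keep whatever value the caller stored beforehand.
   atoi does no error checking; a stricter parser would use strtol. */
int parse_args(int argc, char **argv, struct rpfp_args *out)
{
    if (argc < 2 || argv[1][0] == '-')
        return -1;
    out->domain = argv[1];
    for (int i = 2; i < argc; i++) {
        if (strcmp(argv[i], "-k") == 0 && i + 1 < argc)
            out->k = atoi(argv[++i]);
        else if (strcmp(argv[i], "-v") == 0 && i + 1 < argc)
            out->verbosity = atoi(argv[++i]);
        else
            return -1;
    }
    return 0;
}

/* Build "<DOMAIN>.<ext>" (for example "housing.info") into buf.
   Returns 0 on success and -1 if the name does not fit. */
int domain_file(char *buf, size_t size, const char *domain, const char *ext)
{
    int n = snprintf(buf, size, "%s.%s", domain, ext);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With the example run, parse_args would yield the domain housing, k = 10 and verbosity 3, and domain_file(buf, sizeof buf, "housing", "train") would produce the name housing.train.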

The rpfp program reads information about the domain from the <DOMAIN>.info file. This file gives the number of features and their types. It must contain a line starting with the keyword Features. For example,

Features l l n l

indicates that there are 4 features: the 1st, 2nd and 4th features take linear values, while the 3rd feature is nominal. Only 2 values are accepted, namely l (linear) and n (nominal).
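
As an illustration of the Features line format, here is a minimal sketch of a parser for it. It is not taken from rpfp.c: the enum feature_type and the function parse_features_line are hypothetical names, and the sketch assumes the type codes are single characters separated by whitespace.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical type codes for the two accepted values. */
enum feature_type { FEATURE_LINEAR, FEATURE_NOMINAL };

/* Parse a line such as "Features l l n l".
   Stores one type per feature in types[] and returns the number of
   features. Returns -1 if the line does not start with the keyword
   Features, contains a code other than l or n, or lists more than
   max features. */
int parse_features_line(const char *line, enum feature_type *types, int max)
{
    static const char keyword[] = "Features";
    size_t klen = sizeof keyword - 1;

    if (strncmp(line, keyword, klen) != 0)
        return -1;
    /* The keyword must be followed by whitespace or the end of the line. */
    if (line[klen] != '\0' && !isspace((unsigned char)line[klen]))
        return -1;

    int count = 0;
    for (const char *p = line + klen; *p != '\0'; p++) {
        if (isspace((unsigned char)*p))
            continue;               /* skip separators and the newline */
        if (count == max)
            return -1;              /* more features than the caller allows */
        if (*p == 'l')
            types[count++] = FEATURE_LINEAR;
        else if (*p == 'n')
            types[count++] = FEATURE_NOMINAL;
        else
            return -1;              /* only l and n are accepted */
    }
    return count;
}
```

For the example line above, the function returns 4 and marks the 3rd feature as nominal.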

To measure performance in experiments, a shell script, cv, can be employed. The cv script is invoked as:

cv <inducer> <DOMAIN> <fold> <k>

For example,

cv rpfp housing 5 10

invokes rpfp with 5-fold cross-validation and k=10 on the housing data. To use cv, two files must be in the current directory:

<DOMAIN>.data : Data set
<DOMAIN>.info : Information file.
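
How the cv script divides <DOMAIN>.data into folds is not described here. As a general illustration of fold-based cross-validation only, the sketch below assigns instances to folds in round-robin order; the function names fold_of and test_size are hypothetical, and the round-robin rule is an assumption rather than the behaviour of cv.

```c
/* Fold that instance i belongs to under a round-robin split.
   This rule is an assumption made for illustration. */
int fold_of(int i, int folds)
{
    return i % folds;
}

/* Number of the n instances that form the test set of the given fold.
   The remaining instances would form the training set for that fold. */
int test_size(int n, int folds, int fold)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (fold_of(i, folds) == fold)
            count++;
    return count;
}
```

In this scheme each instance is used for testing in exactly one fold and for training in all the others, so the test set sizes over all folds add up to n.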