RPFP: Regression by Partitioning Feature Projections

Developed by
Ilhan Uysal and H.Altay Guvenir
{uilhan,guvenir}@cs.bilkent.edu.tr

Department of Computer Engineering
Bilkent University
28 July 1999

RPFP.tar.gz
Copyright (c) 1999 by Ilhan Uysal and H.Altay Guvenir.
Permission is expressly granted to use this code in any non-commercial work, provided that this notice is preserved.

The C program (rpfp.c) implements the RPFP (Regression by Partitioning Feature Projections) method, which approximates a continuous function from a given data set.

rpfp is invoked as:

rpfp <DOMAIN> -k <K> -v <V>

Here <DOMAIN> is the name of the domain, the -k option sets the number of instances left in the local region of the query after partitioning, and the -v option sets the level of verbosity. The rpfp program expects the following files in the current directory:

<DOMAIN>.info : Information file that records the types of the features
<DOMAIN>.train : Train set (Predicted feature is the last column)
<DOMAIN>.test : Test set (Predicted feature is the last column)

The output is written to the file:
<DOMAIN>.res.rpfp

If the verbosity option is set, intermediate steps are reported in a file called <DOMAIN>.log.rpfp.
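
All of the file names above are the domain name followed by a fixed suffix. The fragment below is a minimal sketch of that naming convention. The helper domain_file is hypothetical and is not taken from rpfp.c.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not taken from rpfp.c.
   Writes "<domain><suffix>" into buf, e.g. "housing" + ".train".
   Returns 0 on success, -1 if the name does not fit in buf. */
int domain_file(char *buf, size_t size, const char *domain, const char *suffix)
{
    int n = snprintf(buf, size, "%s%s", domain, suffix);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With the domain housing, the suffixes .info, .train, .test, .res.rpfp and .log.rpfp would give the five file names described above.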

An example run on the housing data:

rpfp housing -k 10 -v 3

The rpfp program reads information about the domain from the <DOMAIN>.info file, which gives the number of features and their types. It must contain a line starting with the keyword Features. For example,

Features l l n l

indicates that there are 4 features: the 1st, 2nd and 4th features take linear values, while the 3rd feature is nominal. Only 2 type codes are accepted, namely l (linear) and n (nominal).
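
The Features line format described above can be read with a few lines of C. This is only a sketch: the function parse_features_line is hypothetical, not the parser used in rpfp.c, and it assumes that the type codes are single characters separated by whitespace.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parser, not taken from rpfp.c.
   If line starts with the keyword "Features", stores each type code
   ('l' = linear, 'n' = nominal) in types[] and returns the feature count.
   Returns 0 if the line is not a Features line, and -1 on an unknown
   type code or when more than max features are listed. */
int parse_features_line(const char *line, char types[], int max)
{
    static const char keyword[] = "Features";
    size_t klen = sizeof keyword - 1;
    int count = 0;
    const char *p;

    if (strncmp(line, keyword, klen) != 0)
        return 0;

    for (p = line + klen; *p != '\0'; p++) {
        if (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
            continue;              /* skip whitespace between codes */
        if (*p != 'l' && *p != 'n')
            return -1;             /* only l and n are accepted */
        if (count >= max)
            return -1;             /* caller's array is too small */
        types[count++] = *p;
    }
    return count;
}
```

For the example line "Features l l n l", this sketch returns 4 and fills types with l, l, n, l.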

To measure performance, a shell script, cv, can be used to run experiments. The cv script is invoked as:

cv <inducer> <DOMAIN> <fold> <k>

An example run for rpfp is:

cv rpfp housing 5 10

This invokes rpfp with 5-fold cross-validation and k=10 on the housing data. To use cv, two files must be in the current directory:

<DOMAIN>.data : Data set
<DOMAIN>.info : Information file
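
How cv divides <DOMAIN>.data into folds is not described here. As a rough illustration of what a fold split involves, the sketch below assigns instance i to fold i mod folds. This round-robin scheme and both function names are assumptions made for illustration, not the behaviour of the cv script.

```c
/* Hypothetical illustration, not the cv script's actual logic.
   Returns the fold (0 .. folds-1) whose test set holds instance i. */
int fold_of(int i, int folds)
{
    return i % folds;
}

/* Counts how many of n instances fall into the test set of one fold;
   the remaining instances would form that fold's training set. */
int fold_test_size(int n, int folds, int fold)
{
    int i, count = 0;

    for (i = 0; i < n; i++)
        if (fold_of(i, folds) == fold)
            count++;
    return count;
}
```

Under this assumed scheme, every instance is tested exactly once across the folds, and the fold sizes differ by at most one.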