RFP: Regression on Feature Projections

Developed by
Ilhan Uysal, H.Altay Guvenir and Tolga Aydin

Department of Computer Engineering
Bilkent University
11 March 1999

Copyright (c) 1999 by Ilhan Uysal, H.Altay Guvenir, and Tolga Aydin.
Permission is expressly granted to use this code in any non-commercial work, provided that this notice is preserved.

The C program (rfp.c) implements the RFP (Regression on Feature Projections) method, which approximates a continuous function from a given data set.

rfp is invoked as:

rfp <DOMAIN> -k <K> -v <V>

Here <DOMAIN> is the name of the domain, the -k option sets the number of instances left after partitioning for the local region of the query, and the -v option sets the level of verbosity. The rfp program expects the following files in the current directory:

<DOMAIN>.info : Information file that records the types of the features
<DOMAIN>.train : Train set (Predicted feature is the last column)
<DOMAIN>.test : Test set (Predicted feature is the last column)
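The invocation and file naming described above can be sketched in C as follows. This is only an illustration and is not taken from rfp.c: the names RfpOptions, parse_rfp_args and domain_file_name are hypothetical, and the default values of 0 for -k and -v are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the command-line settings. */
typedef struct {
    const char *domain;  /* <DOMAIN> */
    int k;               /* value given with -k */
    int verbosity;       /* value given with -v */
} RfpOptions;

/* Parses "rfp <DOMAIN> -k <K> -v <V>".
 * Returns 0 on success, -1 on a malformed command line. */
int parse_rfp_args(int argc, char **argv, RfpOptions *opt)
{
    int i;

    assert(argv != NULL && opt != NULL);
    if (argc < 2)
        return -1;
    opt->domain = argv[1];
    opt->k = 0;          /* assumed default, not from rfp.c */
    opt->verbosity = 0;  /* assumed default, not from rfp.c */
    for (i = 2; i < argc; i++) {
        if (strcmp(argv[i], "-k") == 0 && i + 1 < argc)
            opt->k = atoi(argv[++i]);
        else if (strcmp(argv[i], "-v") == 0 && i + 1 < argc)
            opt->verbosity = atoi(argv[++i]);
        else
            return -1;   /* unknown option or missing value */
    }
    return 0;
}

/* Writes "<domain>.<ext>" into buf.
 * Returns 0 on success, -1 if the name does not fit. */
int domain_file_name(char *buf, size_t size,
                     const char *domain, const char *ext)
{
    int n;

    assert(buf != NULL && domain != NULL && ext != NULL);
    n = snprintf(buf, size, "%s.%s", domain, ext);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Under these assumptions, calling domain_file_name with the domain "housing" and the extensions "info", "train" and "test" yields the three expected input file names.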

The output is written to a file:

If the verbosity option is set, the intermediate activities are reported to a file called <DOMAIN>.log.rfp.

An example run on the housing data:

rfp housing -k 10 -v 3

The rfp program reads information about the domain from the <DOMAIN>.info file. This file gives the number of features and their types. It must contain a line starting with the keyword Features. For example,

Features l l n l

indicates that there are 4 features: the 1st, 2nd and 4th features take linear values, while the 3rd feature is nominal. Only 2 type codes are accepted, namely l (linear) and n (nominal).
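As an illustration of the Features line format, the following C sketch parses one line into an array of feature types. It is not the parser used in rfp.c: the names FeatureType and parse_features_line are hypothetical, and the requirement that type codes be separated by whitespace is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical representation of the two accepted type codes. */
typedef enum { FEATURE_LINEAR, FEATURE_NOMINAL } FeatureType;

/* Parses a line such as "Features l l n l".
 * Returns the number of features found,
 *         0 if the line does not start with the keyword Features,
 *        -1 on an unknown type code or more than max_types codes. */
int parse_features_line(const char *line, FeatureType *types, int max_types)
{
    static const char keyword[] = "Features";
    const size_t klen = sizeof keyword - 1;
    const char *p;
    int count = 0;

    assert(line != NULL && types != NULL);
    if (strncmp(line, keyword, klen) != 0)
        return 0;
    p = line + klen;
    if (*p != '\0' && !isspace((unsigned char)*p))
        return 0;                       /* e.g. "Featuresx" */

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
            continue;
        }
        /* Assumption: each type code is a single character. */
        if (p[1] != '\0' && !isspace((unsigned char)p[1]))
            return -1;
        if (count >= max_types)
            return -1;
        if (*p == 'l')
            types[count++] = FEATURE_LINEAR;
        else if (*p == 'n')
            types[count++] = FEATURE_NOMINAL;
        else
            return -1;                  /* only l and n are accepted */
        p++;
    }
    return count;
}
```

For the example line above, this sketch reports 4 features, with the 3rd marked nominal and the others linear.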

For performance measurement, a shell script, cv, can be used to run experiments. The cv script is invoked as:

cv <inducer> <DOMAIN> <fold> <k>

An example run for rfp is:

cv rpfp housing 5 10

This invokes rpfp with 5-fold cross-validation and k=10 on the housing data. To use cv, two files must be present in the directory:

<DOMAIN>.data : Data set
<DOMAIN>.info : Information file.
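How cv divides <DOMAIN>.data into folds is not described here. The C sketch below shows one common way to form the training and test sets for a single fold, assuming a round-robin assignment of instances to folds. The function name split_fold and the assignment rule are assumptions for illustration, not the behaviour of the cv script.

```c
#include <assert.h>

/* Splits instance indices 0..n-1 for one round of cross-validation.
 * Assumption: instance i is held out for testing when i % folds == fold.
 * train_idx and test_idx must each have room for n entries. */
void split_fold(int n, int folds, int fold,
                int *train_idx, int *test_idx,
                int *n_train, int *n_test)
{
    int i;

    assert(folds > 0 && fold >= 0 && fold < folds);
    assert(train_idx != 0 && test_idx != 0 && n_train != 0 && n_test != 0);

    *n_train = 0;
    *n_test = 0;
    for (i = 0; i < n; i++) {
        if (i % folds == fold)
            test_idx[(*n_test)++] = i;    /* goes to the test set */
        else
            train_idx[(*n_train)++] = i;  /* goes to the training set */
    }
}
```

With 10 instances and 5 folds, fold 0 of this sketch holds out instances 0 and 5 and trains on the remaining 8. Each fold's training and test sets could then be written out as <DOMAIN>.train and <DOMAIN>.test before the inducer is run.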