RSBFP: Regression by Selecting Best Feature Projections

Developed by
Tolga Aydin and H.Altay Guvenir

Department of Computer Engineering
Bilkent University
12 August 2000

Copyright (c) 2000 by Tolga Aydin and H.Altay Guvenir.
Permission is expressly granted to use this code in any non-commercial work, provided that this notice is preserved.

The C program (rsbfp.c) implements the RSBFP (Regression by Selecting Best Feature Projections) method, which approximates a continuous function from a given data set.

rsbfp is invoked as:

rsbfp <DOMAIN> [-v <V>]

Here <DOMAIN> is the name of the domain, and the optional -v option sets the level of verbosity to <V>.

The rsbfp program expects the following files in the current directory:

<DOMAIN>.info : Information file that records types of features
<DOMAIN>.train : Training set (Predicted feature is the last column)
<DOMAIN>.test : Querying set (Predicted feature is the last column)

The output is written to a file:

If the verbosity option is set, the intermediate activities are reported to a file called <DOMAIN>.log.rsbfp.

An example run on the buying data set is invoked as:

rsbfp buying -v 3

The rsbfp program reads information about the domain from the <DOMAIN>.info file. This file gives information about the number of features and their types. It must contain a line starting with the keyword Features. For example,

Features l l n l

indicates that there are 4 features: the 1st, 2nd and 4th features take on linear values (l), while the 3rd feature is categorical (n).
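The reading of the Features line described above can be sketched in C as follows. This is only an illustration and is not taken from rsbfp.c: the names parse_features_line and FeatureType are hypothetical, and the sketch assumes that type codes are single characters separated by whitespace and that l and n are the only codes in use.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type names for illustration; not taken from rsbfp.c. */
typedef enum { FEATURE_LINEAR, FEATURE_CATEGORICAL } FeatureType;

/*
 * Parse one line of a <DOMAIN>.info file.
 * Returns the number of features found, 0 if the line does not start with
 * the keyword "Features", or -1 on an unknown type code or when more than
 * max_types features are listed.
 * Assumption: codes are single characters, 'l' (linear) or 'n' (categorical).
 */
int parse_features_line(const char *line, FeatureType *types, int max_types)
{
    static const char keyword[] = "Features";
    const size_t klen = sizeof keyword - 1;
    const char *p;
    int count = 0;

    assert(line != NULL && types != NULL);

    /* The keyword must open the line and be followed by whitespace or end. */
    if (strncmp(line, keyword, klen) != 0)
        return 0;
    if (line[klen] != '\0' && strchr(" \t\r\n", line[klen]) == NULL)
        return 0;

    for (p = line + klen; *p != '\0'; p++) {
        if (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
            continue;                      /* skip separators */
        if (count >= max_types)
            return -1;                     /* caller's array is too small */
        if (*p == 'l')
            types[count++] = FEATURE_LINEAR;
        else if (*p == 'n')
            types[count++] = FEATURE_CATEGORICAL;
        else
            return -1;                     /* unknown type code */
    }
    return count;
}
```

Called on the line "Features l l n l" with an array of at least 4 entries, the function returns 4 and marks the 3rd entry as categorical.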

To measure performance, a shell script, cv, can be employed. The cv script is invoked as:

cv <inducer> <DOMAIN> <fold>

An example run for rsbfp is:

cv rsbfp buying 10

This example runs rsbfp on the buying data set using 10-fold cross-validation. To use cv, two files must be in the directory:

<DOMAIN>.data : Data set
<DOMAIN>.info : Information file.
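
For readers unfamiliar with k-fold cross-validation, the following C sketch shows one common way to divide the instances of a data set into folds. It is not a description of the cv script, whose splitting strategy is not documented here: the round-robin assignment and the names assign_folds and fold_test_size are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assign each of n instances to one of k folds in round-robin order.
 * fold_of[i] receives the fold (0..k-1) in which instance i is held out
 * for testing; in every other fold it belongs to the training set.
 * Hypothetical helper, not part of rsbfp or cv.
 */
void assign_folds(size_t n, int k, int *fold_of)
{
    size_t i;

    assert(k > 0 && fold_of != NULL);
    for (i = 0; i < n; i++)
        fold_of[i] = (int)(i % (size_t)k);
}

/* Number of instances held out for testing in the given fold. */
size_t fold_test_size(size_t n, int k, int fold)
{
    assert(k > 0 && fold >= 0 && fold < k);
    /* The first n % k folds receive one extra instance. */
    return n / (size_t)k + ((size_t)fold < n % (size_t)k ? 1 : 0);
}
```

With 25 instances and 10 folds, for example, folds 0 to 4 each hold out 3 instances and folds 5 to 9 each hold out 2, so every instance is used for testing exactly once.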