RSBFP: Regression by Selecting Best Feature Projections

Developed by
Tolga Aydin and H.Altay Guvenir
{atolga,guvenir}@cs.bilkent.edu.tr

Department of Computer Engineering
Bilkent University
12 August 2000

RSBFP.tar.gz
Copyright (c) 2000 by Tolga Aydin and H.Altay Guvenir.
Permission is expressly granted to use this code in any non-commercial work, provided that this notice is preserved.

The C program (rsbfp.c) implements the RSBFP (Regression by Selecting Best Feature Projections) method, which approximates a continuous function from a given data set.

rsbfp is invoked as:

rsbfp <DOMAIN> [-v <V>]

Here <DOMAIN> is the name of the domain, and the -v option sets the level of verbosity.

The rsbfp program expects the following files in the current directory:

<DOMAIN>.info : Information file that records types of features
<DOMAIN>.train : Training set (Predicted feature is the last column)
<DOMAIN>.test : Querying set (Predicted feature is the last column)

The output is written to a file:
<DOMAIN>.result.rsbfp

If the verbosity option is set, intermediate steps are reported to a file called <DOMAIN>.log.rsbfp.

An example run on the buying data set:
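The file names above all follow the pattern <DOMAIN>.<suffix>. As a minimal sketch of how such names could be built (the helper domain_file_name is hypothetical and is not taken from rsbfp.c), one might write:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of rsbfp.c: writes "<domain>.<suffix>"
 * into buf, e.g. "buying" and "result.rsbfp" give "buying.result.rsbfp".
 * Returns 0 on success, -1 if the name does not fit in buf. */
int domain_file_name(char *buf, size_t size,
                     const char *domain, const char *suffix)
{
    int n;

    assert(buf != NULL && domain != NULL && suffix != NULL);
    n = snprintf(buf, size, "%s.%s", domain, suffix);
    /* snprintf returns the length it wanted to write; a value >= size
     * means the result was truncated. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Checking the return value avoids silently opening a truncated file name when the domain name is long.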

rsbfp buying -v 3

The rsbfp program reads information about the domain from the <DOMAIN>.info file. This file gives information about the number of features and their types. It must contain a line starting with the keyword Features. For example,

Features l l n l

indicates that there are 4 features; 1st, 2nd and 4th features take on linear values, while the 3rd feature is categorical.
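To illustrate the format, here is a minimal sketch of reading such a line in C. The function parse_features_line is hypothetical and is not the parser used in rsbfp.c; it assumes that l and n are the only type codes, as in the example above, and that codes are separated by whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not taken from rsbfp.c: parses a line such as
 * "Features l l n l" and stores one type code per feature in types[].
 * Returns the number of features, or -1 if the line does not start with
 * the keyword "Features", contains a code other than 'l' or 'n', or
 * lists more than max_features codes. */
int parse_features_line(const char *line, char *types, int max_features)
{
    static const char keyword[] = "Features";
    size_t klen = sizeof keyword - 1;
    int count = 0;

    assert(line != NULL && types != NULL);

    if (strncmp(line, keyword, klen) != 0)
        return -1;
    line += klen;
    /* The keyword must be followed by whitespace or the end of the line. */
    if (*line != '\0' && !isspace((unsigned char)*line))
        return -1;

    while (*line != '\0') {
        if (isspace((unsigned char)*line)) {
            line++;
            continue;
        }
        /* Each code is a single character: 'l' (linear) or 'n' (categorical). */
        if ((*line != 'l' && *line != 'n') ||
            (line[1] != '\0' && !isspace((unsigned char)line[1])))
            return -1;
        if (count >= max_features)
            return -1;
        types[count++] = *line++;
    }
    return count;
}
```

For the line shown above, this sketch would report 4 features and fill types with l, l, n, l.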

To measure performance, a shell script, cv, can be used. The cv script is invoked as:

cv <inducer> <DOMAIN> <fold>

An example run for rsbfp is:

cv rsbfp buying 10

This example runs rsbfp on the buying data set using 10-fold cross-validation. To use cv, two files must be present in the directory:

<DOMAIN>.data : Data set
<DOMAIN>.info : Information file.
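
This README does not describe how cv divides <DOMAIN>.data into folds. Purely as an illustration of k-fold cross-validation, the sketch below assigns instances to folds in round-robin order; the functions in_test_fold and test_fold_size are hypothetical, and the actual script may split the data differently.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: round-robin fold assignment. Instance number
 * `index` (counting from 0) belongs to the test set of round `fold`
 * when index modulo folds equals fold; in every other round it is
 * part of the training set. */
int in_test_fold(size_t index, int folds, int fold)
{
    assert(folds > 0 && fold >= 0 && fold < folds);
    return (int)(index % (size_t)folds) == fold;
}

/* Number of test instances in round `fold` when n instances are
 * assigned round-robin: the first (n modulo folds) folds receive
 * one extra instance. */
size_t test_fold_size(size_t n, int folds, int fold)
{
    size_t base, extra;

    assert(folds > 0 && fold >= 0 && fold < folds);
    base = n / (size_t)folds;
    extra = n % (size_t)folds;
    return base + ((size_t)fold < extra ? 1 : 0);
}
```

With this scheme every instance is used for testing exactly once across the rounds, and fold sizes differ by at most one.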