weka, data-mining, linear-regression, equation

Linear regression with a nominal attribute in Weka


I am having trouble interpreting the results of running the linear regression classifier on the cpu.with.vendor.arff training set. How should I interpret the first 11 terms of the equation, the ones that list values of the nominal attribute vendor?

=== Run information ===

Scheme:weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8

Linear Regression Model

class =

-152.7641 * vendor=microdata,prime,formation,harris,dec,wang,perkin-elmer,nixdorf,bti,sratus,dg,burroughs,cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
141.8644 * vendor=prime,formation,harris,dec,wang,perkin-elmer,nixdorf,bti,sratus,dg,burroughs,cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
-38.2268 * vendor=burroughs,cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
39.4748 * vendor=cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
-39.5986 * vendor=honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
21.412  * vendor=ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
-41.2397 * vendor=gould,siemens,nas,adviser,sperry,amdahl +
32.0545 * vendor=siemens,nas,adviser,sperry,amdahl +
-113.6927 * vendor=adviser,sperry,amdahl +
176.5205 * vendor=sperry,amdahl +
-51.2583 * vendor=amdahl +
0.0616 * MYCT +
0.0171 * MMIN +
0.0054 * MMAX +
0.6654 * CACH +
-1.4159 * CHMIN +
1.5538 * CHMAX +
-41.4854

Solution

  • Each vendor=... term is a 0/1 indicator: it equals 1 if the instance's vendor is any of the values listed in that term, and 0 otherwise.

    For example, in line 1:

    -152.7641 * vendor=microdata,prime,formation,harris,dec,wang,perkin-elmer,nixdorf,bti,sratus,dg,burroughs,cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl
    

    This term subtracts 152.7641 from the predicted class value if and only if the vendor is one of [microdata, prime, formation, harris, dec, wang, perkin-elmer, nixdorf, bti, sratus, dg, burroughs, cambex, magnuson, honeywell, ipl, ibm, cdc, ncr, basf, gould, siemens, nas, adviser, sperry, amdahl].

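    The other 10 vendor terms are evaluated the same way, and every term whose list contains the vendor contributes its coefficient, so the net offset differs from vendor to vendor. For example, prime matches only the first two terms, giving a net offset of -152.7641 + 141.8644 = -10.8997, while a vendor that appears in none of the lists gets no offset at all.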

    Hope this helps!
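
To make the rule concrete, here is a minimal Python sketch of evaluating the equation by hand. It assumes the coefficients and vendor lists are copied from the output in the question. The names ORDERED_VENDORS, TERMS, vendor_offset and predict are illustrative and not part of Weka. The sketch relies on the observation that each term's vendor list is a suffix of the first term's list, so a term can be identified by the first vendor it names.

```python
# Vendor order as it appears in the first (longest) term of the Weka output.
ORDERED_VENDORS = [
    "microdata", "prime", "formation", "harris", "dec", "wang",
    "perkin-elmer", "nixdorf", "bti", "sratus", "dg", "burroughs",
    "cambex", "magnuson", "honeywell", "ipl", "ibm", "cdc", "ncr",
    "basf", "gould", "siemens", "nas", "adviser", "sperry", "amdahl",
]

# (coefficient, first vendor named in that term's list).
# A term's list runs from that vendor to the end of ORDERED_VENDORS.
TERMS = [
    (-152.7641, "microdata"),
    (141.8644, "prime"),
    (-38.2268, "burroughs"),
    (39.4748, "cambex"),
    (-39.5986, "honeywell"),
    (21.412, "ipl"),
    (-41.2397, "gould"),
    (32.0545, "siemens"),
    (-113.6927, "adviser"),
    (176.5205, "sperry"),
    (-51.2583, "amdahl"),
]


def vendor_offset(vendor):
    """Sum the coefficients of every term whose list contains the vendor."""
    if vendor not in ORDERED_VENDORS:
        return 0.0  # every indicator is 0 for a vendor in none of the lists
    pos = ORDERED_VENDORS.index(vendor)
    return sum(
        coef for coef, start in TERMS
        if pos >= ORDERED_VENDORS.index(start)  # indicator is 1 for this term
    )


def predict(vendor, myct, mmin, mmax, cach, chmin, chmax):
    """Evaluate the full regression equation for one instance."""
    return (
        vendor_offset(vendor)
        + 0.0616 * myct
        + 0.0171 * mmin
        + 0.0054 * mmax
        + 0.6654 * cach
        - 1.4159 * chmin
        + 1.5538 * chmax
        - 41.4854
    )


if __name__ == "__main__":
    for name in ("microdata", "prime", "ibm", "amdahl"):
        print(name, round(vendor_offset(name), 4))
```

With these numbers, microdata matches only the first term, prime matches the first two, ibm matches the first six, and amdahl matches all eleven, which is why each vendor ends up with its own net offset.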