I have the following output from Weka for SVM classification. I want to plot the SVM classifier output as anomaly or normal. How can I get the SVM scoring function out of this output?
=== Run information ===
Scheme: weka.classifiers.functions.SMO -C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -E 1.0 -C 250007"
Relation: KDDTrain
Instances: 125973
Attributes: 42
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
Kernel used:
Linear Kernel: K(x,y) = <x,y>
Classifier for classes: normal, anomaly
Machine linear: showing attribute weights, not support vectors.
-0.0498 * (normalized) duration
+ 0.5131 * (normalized) protocol_type=tcp
+ -0.6236 * (normalized) protocol_type=udp
+ 0.1105 * (normalized) protocol_type=icmp
+ -1.1861 * (normalized) service=auth
+ 0 * (normalized) service=bgp
+ 0 * (normalized) service=courier
+ 1 * (normalized) service=csnet_ns
+ 1 * (normalized) service=ctf
+ 1 * (normalized) service=daytime
+ -0 * (normalized) service=discard
+ -1.2505 * (normalized) service=domain
+ -0.6878 * (normalized) service=domain_u
+ 0.9418 * (normalized) service=echo
+ 1.1964 * (normalized) service=eco_i
+ 0.9767 * (normalized) service=ecr_i
+ 0.0073 * (normalized) service=efs
+ 0.0595 * (normalized) service=exec
+ -1.4426 * (normalized) service=finger
+ -1.047 * (normalized) service=ftp
+ -1.4225 * (normalized) service=ftp_data
+ 2 * (normalized) service=gopher
+ 1 * (normalized) service=hostnames
+ -0.9961 * (normalized) service=http
+ 0.7255 * (normalized) service=http_443
+ 0.5128 * (normalized) service=imap4
+ -6.3664 * (normalized) service=IRC
+ 1 * (normalized) service=iso_tsap
+ -0 * (normalized) service=klogin
+ 0 * (normalized) service=kshell
+ 0.7422 * (normalized) service=ldap
+ 1 * (normalized) service=link
+ 0.5993 * (normalized) service=login
+ 1 * (normalized) service=mtp
+ 1 * (normalized) service=name
+ 0.2322 * (normalized) service=netbios_dgm
+ 0.213 * (normalized) service=netbios_ns
+ 0.1902 * (normalized) service=netbios_ssn
+ 1.1472 * (normalized) service=netstat
+ 0.0504 * (normalized) service=nnsp
+ 1.058 * (normalized) service=nntp
+ -1 * (normalized) service=ntp_u
+ -1.5344 * (normalized) service=other
+ 1.3595 * (normalized) service=pm_dump
+ 0.8355 * (normalized) service=pop_2
+ -2 * (normalized) service=pop_3
+ 0 * (normalized) service=printer
+ 1.051 * (normalized) service=private
+ -0.3082 * (normalized) service=red_i
+ 1.0034 * (normalized) service=remote_job
+ 1.0112 * (normalized) service=rje
+ -1.0454 * (normalized) service=shell
+ -1.6948 * (normalized) service=smtp
+ 0.1388 * (normalized) service=sql_net
+ -0.3438 * (normalized) service=ssh
+ 1 * (normalized) service=supdup
+ 0.8756 * (normalized) service=systat
+ -1.6856 * (normalized) service=telnet
+ -0 * (normalized) service=tim_i
+ -0.8579 * (normalized) service=time
+ -0.726 * (normalized) service=urh_i
+ -1.0285 * (normalized) service=urp_i
+ 1.0347 * (normalized) service=uucp
+ 0 * (normalized) service=uucp_path
+ 0 * (normalized) service=vmnet
+ 1 * (normalized) service=whois
+ -1.3388 * (normalized) service=X11
+ 0 * (normalized) service=Z39_50
+ 1.7882 * (normalized) flag=OTH
+ -3.0982 * (normalized) flag=REJ
+ -1.7279 * (normalized) flag=RSTO
+ 1 * (normalized) flag=RSTOS0
+ 2.4264 * (normalized) flag=RSTR
+ 1.5906 * (normalized) flag=S0
+ -1.952 * (normalized) flag=S1
+ -0.9628 * (normalized) flag=S2
+ -0.3455 * (normalized) flag=S3
+ 1.2757 * (normalized) flag=SF
+ 0.0054 * (normalized) flag=SH
+ 0.8742 * (normalized) src_bytes
+ 0.0542 * (normalized) dst_bytes
+ -1.2659 * (normalized) land=1
+ 2.7922 * (normalized) wrong_fragment
+ 0.0662 * (normalized) urgent
+ 8.1153 * (normalized) hot
+ 2.4822 * (normalized) num_failed_logins
+ 0.2242 * (normalized) logged_in=1
+ -0.0544 * (normalized) num_compromised
+ 0.9248 * (normalized) root_shell
+ -2.363 * (normalized) su_attempted
+ -0.2024 * (normalized) num_root
+ -1.2791 * (normalized) num_file_creations
+ -0.0314 * (normalized) num_shells
+ -1.4125 * (normalized) num_access_files
+ -0.0154 * (normalized) is_host_login=1
+ -2.3307 * (normalized) is_guest_login=1
+ 4.3191 * (normalized) count
+ -2.7484 * (normalized) srv_count
+ -0.6276 * (normalized) serror_rate
+ 2.843 * (normalized) srv_serror_rate
+ 0.6105 * (normalized) rerror_rate
+ 3.1388 * (normalized) srv_rerror_rate
+ -0.1262 * (normalized) same_srv_rate
+ -0.1825 * (normalized) diff_srv_rate
+ 0.2961 * (normalized) srv_diff_host_rate
+ 0.7812 * (normalized) dst_host_count
+ -1.0053 * (normalized) dst_host_srv_count
+ 0.0284 * (normalized) dst_host_same_srv_rate
+ 0.4419 * (normalized) dst_host_diff_srv_rate
+ 1.384 * (normalized) dst_host_same_src_port_rate
+ 0.8004 * (normalized) dst_host_srv_diff_host_rate
+ 0.2301 * (normalized) dst_host_serror_rate
+ 0.6401 * (normalized) dst_host_srv_serror_rate
+ 0.6422 * (normalized) dst_host_rerror_rate
+ 0.3692 * (normalized) dst_host_srv_rerror_rate
- 2.5266
Number of kernel evaluations: -1049600465
Output prediction - sample output
inst# actual predicted error prediction
1 1:normal 1:normal 1
2 1:normal 1:normal 1
3 2:anomaly 2:anomaly 1
4 1:normal 1:normal 1
5 1:normal 1:normal 1
6 2:anomaly 2:anomaly 1
7 2:anomaly 2:anomaly 1
8 2:anomaly 2:anomaly 1
9 2:anomaly 2:anomaly 1
10 2:anomaly 2:anomaly 1
11 2:anomaly 2:anomaly 1
12 2:anomaly 2:anomaly 1
13 1:normal 1:normal 1
14 2:anomaly 1:normal + 1
15 2:anomaly 2:anomaly 1
16 2:anomaly 2:anomaly 1
17 1:normal 1:normal 1
18 2:anomaly 2:anomaly 1
19 1:normal 1:normal 1
20 1:normal 1:normal 1
21 2:anomaly 2:anomaly 1
22 2:anomaly 2:anomaly 1
23 1:normal 1:normal 1
24 1:normal 1:normal 1
25 2:anomaly 2:anomaly 1
26 1:normal 1:normal 1
27 2:anomaly 2:anomaly 1
28 1:normal 1:normal 1
29 1:normal 1:normal 1
30 1:normal 1:normal 1
31 2:anomaly 2:anomaly 1
32 2:anomaly 2:anomaly 1
33 1:normal 1:normal 1
34 2:anomaly 2:anomaly 1
35 1:normal 1:normal 1
36 1:normal 1:normal 1
37 1:normal 1:normal 1
38 2:anomaly 2:anomaly 1
39 1:normal 1:normal 1
40 2:anomaly 2:anomaly 1
41 2:anomaly 2:anomaly 1
42 2:anomaly 2:anomaly 1
43 1:normal 1:normal 1
44 1:normal 1:normal 1
45 1:normal 1:normal 1
46 2:anomaly 2:anomaly 1
47 2:anomaly 2:anomaly 1
48 1:normal 1:normal 1
49 2:anomaly 1:normal + 1
50 2:anomaly 2:anomaly 1
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.986 0.039 0.967 0.986 0.976 0.948 0.973 0.960 normal
0.961 0.014 0.983 0.961 0.972 0.948 0.973 0.963 anomaly
Weighted Avg. 0.974 0.028 0.974 0.974 0.974 0.948 0.973 0.962
=== Confusion Matrix ===
a b <-- classified as
66389 954 | a = normal
2301 56329 | b = anomaly
That output is the scoring function. Read each equals sign as a Boolean test that evaluates to 1 for true and 0 for false. Thus, out of all the possible values of a classification (nominal) attribute, only one coefficient will affect the score.
For example, let's consider only the first three attributes, with these normalized inputs and resulting values:
duration        2.0      -0.0498 * 2.0 => -0.0996
protocol_type   icmp      0.1105
service         eco_i     1.1964
Note that the other protocol_type and service terms (such as -0.6236 * protocol_type=udp) have comparisons that evaluate to 0 (protocol_type=udp becomes 0), so those coefficients won't affect the overall sum.
From these three attributes, the score so far is the sum of these three terms, or 1.2073. Continue with the other 39 attributes, plus the constant -2.5266 at the end, and there's your vector's score.
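The partial sum above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the coefficients are copied from the model output, the input values are the ones from the example, and the helper name indicator is made up for illustration.

```python
# Coefficients copied from the Weka model output (only the terms needed here).
COEF = {
    "duration": -0.0498,
    "protocol_type=tcp": 0.5131,
    "protocol_type=udp": -0.6236,
    "protocol_type=icmp": 0.1105,
    "service=eco_i": 1.1964,
}


def indicator(actual, expected):
    """Hypothetical helper: the 'attribute=value' test, 1.0 if true else 0.0."""
    return 1.0 if actual == expected else 0.0


# Example inputs from the text (duration is taken as already normalized).
duration, protocol_type, service = 2.0, "icmp", "eco_i"

partial = (
    COEF["duration"] * duration
    + COEF["protocol_type=tcp"] * indicator(protocol_type, "tcp")
    + COEF["protocol_type=udp"] * indicator(protocol_type, "udp")
    + COEF["protocol_type=icmp"] * indicator(protocol_type, "icmp")
    + COEF["service=eco_i"] * indicator(service, "eco_i")
)

print(round(partial, 4))  # 1.2073
```

Only the icmp and eco_i indicator terms survive; the tcp and udp terms are multiplied by 0 and drop out.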
Does that explain it well enough?
The critical phrase in the blog you cite is:
if the output of the scoring function is negative then the input is classified as belonging to class y = -1. If the score is positive, the input is classified as belonging to class y = 1.
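Yes, it's that simple: implement that nice, linear scoring function (42 variables, 116 terms) and plug in a vector. The sign of the result gives the class. One caution on which sign means which: the blog's y = 1 and y = -1 still have to be mapped onto your two labels. As far as I know, Weka's SMO assigns a positive output to the second class listed in "Classifier for classes: normal, anomaly", so positive should mean anomaly and negative normal. The weights agree with that reading (hot, num_failed_logins and wrong_fragment are all strongly positive), but confirm it by scoring a few rows from your prediction output.
Yes, your model is significantly longer than the blog's example. That example is based on two continuous features; you have 42 features, three of which are classification features (hence the extra 73 terms). The example has 3 support vectors; yours will have far more (the count depends on the data, not just on the number of dimensions), but with a linear kernel Weka has already folded them into the attribute weights shown above, so you never need to look at them. Even so, this 42-dimensional model operates on the same principle: the sign of the score decides the class.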
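Here is a minimal sketch of implementing the scoring function by reading the terms straight out of the Weka text. The function names (parse_model, score, classify), the regular expressions, and the dict-based instance format are my own assumptions, not anything Weka provides. It also assumes that numeric inputs have already been normalized the same way Weka normalized the training data, since the model output does not include the ranges it used. Which label a positive score maps to is left as an explicit argument, so you can check it against a few rows of Weka's own predictions.

```python
import re

# Matches lines such as "+ -0.6236 * (normalized) protocol_type=udp"
TERM = re.compile(r"^\s*\+?\s*(-?\d+(?:\.\d+)?)\s*\*\s*\(normalized\)\s*(\S+)\s*$")
# Matches the trailing constant, e.g. "- 2.5266"
BIAS = re.compile(r"^\s*([+-])\s*(\d+(?:\.\d+)?)\s*$")


def parse_model(text):
    """Return (weights, bias) parsed from the attribute-weight listing."""
    weights, bias = {}, 0.0
    for line in text.splitlines():
        m = TERM.match(line)
        if m:
            weights[m.group(2)] = float(m.group(1))
            continue
        m = BIAS.match(line)
        if m:
            bias = float(m.group(2)) * (-1.0 if m.group(1) == "-" else 1.0)
    return weights, bias


def score(weights, bias, instance):
    """Linear score; 'attr=value' terms count as 1 when the instance matches."""
    total = bias
    for name, w in weights.items():
        if "=" in name:
            attr, value = name.split("=", 1)
            x = 1.0 if instance.get(attr) == value else 0.0
        else:
            x = float(instance.get(name, 0.0))  # assumed already normalized
        total += w * x
    return total


def classify(s, positive_label, negative_label):
    """Map the sign of the score to a label; the mapping is the caller's choice."""
    return positive_label if s > 0 else negative_label


# A few terms copied from the model output, plus its constant.
MODEL = """\
-0.0498 * (normalized) duration
+ 0.5131 * (normalized) protocol_type=tcp
+ -0.6236 * (normalized) protocol_type=udp
+ 0.1105 * (normalized) protocol_type=icmp
+ 1.1964 * (normalized) service=eco_i
- 2.5266
"""

weights, bias = parse_model(MODEL)
s = score(weights, bias, {"duration": 0.5, "protocol_type": "icmp", "service": "eco_i"})
# Replace POS/NEG with your two class names once the sign convention is checked.
print(round(s, 4), classify(s, positive_label="POS", negative_label="NEG"))
```

To use it for real, paste the whole weight listing into MODEL and pass every attribute of the instance; nominal values go in as strings (for example "land": "1").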
As for your desire to map to a 2-dimensional display ... it's possible ... but I don't know what you'd find meaningful in this instance. Mapping 42 variables to 3 creates a lot of congestion in our space. I've seen some nice tricks here and there, especially with gradient fields where the force vectors are in the same spatial interpretation as the data points. A weather map manages to represent x,y,z coordinates of a measurement, adding wind velocity (3D), cloud cover, and maybe a couple other metrics into the display. That's maybe 10 symbolic dimensions.
In your case, we could perhaps just drop the dimensions with coefficients smaller than 0.07 in magnitude as being insignificant; that saves 6 features. The three classification features we could perhaps represent with color, dashed/dotted/solid symbol, and a tiny text overlay on the O or X (normal/anomaly data). That's 9 down without using Cartesian position (x,y,z coordinates, assuming the plot is meaningful in 3D).
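Picking out the small coefficients is easy to automate. The sketch below uses only a handful of the coefficients from the model output (not the full list), and the 0.07 cutoff is the one suggested above. Comparing magnitudes like this is only reasonable because the inputs are normalized to a common range, and a small weight is a hint rather than proof that a feature is irrelevant.

```python
# A subset of the numeric-feature coefficients from the model output.
COEF = {
    "duration": -0.0498,
    "src_bytes": 0.8742,
    "dst_bytes": 0.0542,
    "wrong_fragment": 2.7922,
    "urgent": 0.0662,
    "hot": 8.1153,
    "num_compromised": -0.0544,
    "num_shells": -0.0314,
    "count": 4.3191,
}

THRESHOLD = 0.07  # cutoff on the absolute value of the coefficient

# Features whose weight is too small to move the score much.
small = sorted(name for name, w in COEF.items() if abs(w) < THRESHOLD)
# Features worth keeping, largest magnitude first.
kept = sorted((n for n in COEF if abs(COEF[n]) >= THRESHOLD),
              key=lambda n: abs(COEF[n]), reverse=True)

print(small)  # ['dst_bytes', 'duration', 'num_compromised', 'num_shells', 'urgent']
print(kept)   # ['hot', 'count', 'wrong_fragment', 'src_bytes']
```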
However, I don't know your data nearly well enough to suggest where we might cram the remaining 33 features into 2 or 3 dimensions. Can you somehow combine any of those inputs? Does a linear combination of multiple features give you a result that is still meaningful in prediction?
If not, then we're stuck with the canonical approach: pick interesting combinations of features (usually pairs). Plot a graph for each, ignoring the other features entirely. If none of those make visual sense ... there's our answer: no, we can't plot the data nicely. Sorry, but reality often does this to us in a complex environment, and we handle the data in tables, correlations, and other methods we can handle with our 3D minds.
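One way to choose the pairs, sketched below, is to take the few features with the largest coefficient magnitudes and enumerate every pair among them. Treating "largest weight" as "interesting" is my own heuristic, the function name top_pairs is made up, and the coefficients are again just a subset of the model output. Each resulting pair would then go to whatever scatter-plot tool you prefer.

```python
from itertools import combinations

# A subset of the coefficients from the model output.
COEF = {
    "hot": 8.1153,
    "count": 4.3191,
    "srv_rerror_rate": 3.1388,
    "srv_serror_rate": 2.843,
    "wrong_fragment": 2.7922,
    "duration": -0.0498,
}


def top_pairs(coef, k):
    """Return all pairs among the k features with the largest |coefficient|."""
    top = sorted(coef, key=lambda name: abs(coef[name]), reverse=True)[:k]
    return list(combinations(top, 2))


pairs = top_pairs(COEF, 3)
for x_name, y_name in pairs:
    print(f"plot {x_name} vs {y_name}")
```

With k features you get k*(k-1)/2 plots, so keep k small enough that you will actually look at all of them.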