I am trying to implement an indoor location tracking system using Bluetooth dongles. The idea is to walk around with an Android device and calculate your location in a room from the signal strengths of Bluetooth dongles placed around the room. To do this I have decided to use machine learning to map the RSSI values to an approximate distance (in metres, for example). I have been told by a lecturer in my college that LibSVM is what I'm looking for, so I've been doing some reading. I had a look at this tutorial and can't seem to get my head around the data that's needed to train the system. The data that I will have is:
I understand the data has to be in SVM format, but I'm a bit unsure of what it should look like in terms of input and output data. The example below, taken from the tutorial I mentioned, has "man" as one class and "woman" as another. So in my case would I just have one class, "dongle"? And should the values for each dongle reflect the values I have stored in my database?
man voice:low figure:big income:good
woman voice:high figure:slim income:fair
- Convert the feature values to their numeric representations. Let's say the best salary would be 5 and the worst salary 1 (or no salary = 0); the same goes for the other enumerated variables.
- We have 2 classes, man and woman. Convert the classes to numeric values: man = 1, woman = -1.
- Save it in libsvm data format:
[class/target] 1:[firstFeatureValue] 2:[secondFeatureValue] etc. For example, a woman with a great salary, low voice and small figure would be encoded like: -1 1:5 2:1.5 3:1.8
In general, the input file format for LibSVM is one record per line:
[label] [index1]:[value1] [index2]:[value2] ...
[label] [index1]:[value1] [index2]:[value2] ...
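To check my understanding, here is how I imagine writing such records out (the numeric encodings are just my own guesses, not from the tutorial):

# Sketch only: encode the tutorial's man/woman example into LibSVM format.
# The numeric values below are arbitrary illustrative encodings.

def to_libsvm_line(label, features):
    """Format one record as: [label] 1:v1 2:v2 ..."""
    parts = [str(label)] + ["%d:%g" % (i + 1, v) for i, v in enumerate(features)]
    return " ".join(parts)

# Classes: man = 1, woman = -1. Features: [income, voice, figure].
records = [
    (1,  [3, 1.0, 2.0]),   # a man with good income, low voice, big figure
    (-1, [5, 1.5, 1.8]),   # the tutorial's woman with a great salary
]

with open("train.txt", "w") as f:
    for label, feats in records:
        f.write(to_libsvm_line(label, feats) + "\n")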
Could someone give me an example of what I should be aiming for?
This is all brand new to me, so any helpful hints or tips to get me going would be great. Thanks in advance.
I've implemented a WiFi fingerprinting system for indoor localization, so I'm aware of some of the issues here.
First, to determine your location, are you performing fingerprinting or signal-strength trilateration (which people often mistakenly call triangulation)? Trilateration is the process of intersecting multiple spheres to find a location in space. Fingerprinting, on the other hand, is a classification problem that resolves signals to a location without calculating any actual distances.
Trilateration is extremely difficult indoors due to wireless effects like multi-path fading. These effects will cause your signal to attenuate, which in turn will cause your distance estimates to be off.
Fingerprinting is simply a classification problem. Like trilateration, it assumes that the locations of the dongles do not change. Unlike trilateration, however, it does not use distances at all.
Trilateration has the advantage that, assuming the distance estimates are accurate (which in reality is difficult to achieve), you can resolve your location over a continuous (non-discrete) range. Since fingerprinting is a classification problem, it must classify to one of a fixed set of discrete locations; for example, if your Bluetooth radios are arranged along the perimeter of a room, you may end up discretizing the interior of the room into a 3x3 grid of possible locations.
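To make the distinction concrete, here is a rough 2D trilateration sketch (the dongle positions and distance estimates are made up purely for illustration): it linearises the circle equations by subtracting the first one from the rest and solves the result by least squares.

import numpy as np

def trilaterate_2d(anchors, distances):
    """Least-squares 2D trilateration.
    anchors: (n, 2) array of known dongle positions
    distances: (n,) array of estimated distances to each dongle"""
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    x0, y0 = anchors[0]
    d0 = d[0]
    # Subtract the first circle equation from the others to get a linear system.
    A = 2 * (anchors[0] - anchors[1:])            # rows: [2(x0 - xi), 2(y0 - yi)]
    b = (d[1:] ** 2 - d0 ** 2
         - anchors[1:, 0] ** 2 + x0 ** 2
         - anchors[1:, 1] ** 2 + y0 ** 2)
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Made-up example: three dongles in the corners of a 10 m x 10 m room.
anchors = [(0, 0), (10, 0), (0, 10)]
distances = [7.1, 7.0, 7.2]   # noisy distance estimates in metres
print(trilaterate_2d(anchors, distances))   # roughly [5.07, 4.93]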
If you are going with fingerprinting, then you will need to collect training data with feature vectors that look like:
MAC_1:-87, MAC_2:-40, MAC_3:-91, class=location_A
MAC_1:-31, MAC_2:-90, MAC_3:-79, class=location_B
where, for each location in the room, you read the RSSI from all the available Bluetooth radios you can sense. You should take at least 10 readings per location. For WiFi, the RSSI values are integers (in dBm) in the range of roughly -100 to -1, where, for example, -20 dBm means you are really close to the radio.
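Since you asked specifically about the LibSVM input format: each location becomes a class label and each dongle MAC becomes a feature index. Here is a rough sketch of that conversion (the MACs, RSSI values and labels are just the toy examples above, and the "-100 means not heard" convention is my own assumption):

# Sketch: turn fingerprint readings into LibSVM training lines.
# One class label per location, one feature index per dongle MAC.
macs = ["MAC_1", "MAC_2", "MAC_3"]                 # fixed feature order
locations = {"location_A": 1, "location_B": 2}     # class labels

readings = [
    ("location_A", {"MAC_1": -87, "MAC_2": -40, "MAC_3": -91}),
    ("location_B", {"MAC_1": -31, "MAC_2": -90, "MAC_3": -79}),
]

with open("fingerprints.train", "w") as f:
    for loc, rssi in readings:
        feats = " ".join("%d:%d" % (i + 1, rssi.get(mac, -100))  # -100 = radio not heard
                         for i, mac in enumerate(macs))
        f.write("%d %s\n" % (locations[loc], feats))

# Produces lines like:
# 1 1:-87 2:-40 3:-91
# 2 1:-31 2:-90 3:-79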
Now, when you are trying to perform the classification, you will take a reading like:
MAC_1:-89, MAC_2:-71, MAC_3:-22, class=?
The problem is to classify those RSSI readings to one of the locations.
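With the data in that shape you can train a multi-class SVM and classify new readings. A minimal sketch using scikit-learn's SVC (which wraps libsvm internally); the numbers simply reuse the toy readings above, and the kernel and parameters are defaults, not tuned values:

from sklearn.svm import SVC

# Training fingerprints: rows are RSSI readings [MAC_1, MAC_2, MAC_3],
# labels are location names. The values reuse the toy examples above.
X_train = [[-87, -40, -91],
           [-85, -42, -90],
           [-31, -90, -79],
           [-33, -88, -80]]
y_train = ["location_A", "location_A", "location_B", "location_B"]

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

# Classify a new, unlabelled reading.
print(clf.predict([[-89, -71, -22]]))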
In my previous work, I used a Naive Bayes classifier rather than an SVM because Naive Bayes handles missing features easily (you can assign a small probability mass to a missing feature). Also, since the RSSI values are numeric, I used a Gaussian PDF to compute the likelihood P(MAC_i = RSSI_i | location).
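For completeness, here is roughly what that looks like (a simplified sketch, not my actual code): each dongle's RSSI at each location is modelled as a Gaussian, and classification picks the location that maximises the prior times the product of the per-dongle likelihoods.

import math
from collections import defaultdict

def train_gaussian_nb(readings):
    """readings: list of (location, [rssi_1, rssi_2, ...]).
    Returns (prior, per-feature means, per-feature variances) for each location."""
    by_loc = defaultdict(list)
    for loc, rssi in readings:
        by_loc[loc].append(rssi)
    model = {}
    total = len(readings)
    for loc, rows in by_loc.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-3)
                     for col, m in zip(zip(*rows), means)]
        model[loc] = (n / total, means, variances)
    return model

def classify(model, rssi):
    """Pick the location maximising log prior + sum of log Gaussian likelihoods."""
    def log_gauss(x, m, var):
        return -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
    scores = {loc: math.log(prior) + sum(log_gauss(x, m, v)
                                         for x, m, v in zip(rssi, means, variances))
              for loc, (prior, means, variances) in model.items()}
    return max(scores, key=scores.get)

# Toy data reusing the earlier fingerprints.
data = [("location_A", [-87, -40, -91]), ("location_A", [-85, -42, -90]),
        ("location_B", [-31, -90, -79]), ("location_B", [-33, -88, -80])]
print(classify(train_gaussian_nb(data), [-86, -41, -92]))   # -> location_A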