My question is not on the algorithm itself, but on the performance of F# when its pushed to its limit. In this instance a program, where I try to solve the problem with 25 cities by brute force (Dynamic Programming)
(The algorithm seemed to work fine for a small example, and I'm confident it does its job)
I'm printing a count variable in the console output window to monitor progress.
As expected for the first up to m=12
iterations, runtime increases exponentially.
m=2
1887ms;
m=3
1870ms;
m=4
1902ms;
m=5
2074ms;
m=6
2954ms;
m=7
6261ms;
m=8
16746ms;
m=9
38442ms;
m=10
80396ms;
m=11
140985ms;
m=12
207950ms;
So although the performance of before iteration 13 is not stellar, at least its seems "manageable". Actually, if my understanding is correct, iteration 12 and 13 are the most CPU heavy. Still, I would have expected from the pattern of execution, an execution time of around 300000ms for that iteration, and its not the case.
Zooming on the control monitor of my (new) iMAC Retina running on MacOS X 10.11.3 El Capitan, equipped with 3.3Ghz i7 quad-core, 16 GB RAM, and a SSD, F# running inside Xamarin.Studio 6.0. I see the memory usage by the program is a big 3 GB. It has not been optimized at all, but it should be well inside the capabilities of the machine and should not be a burden?
m=13
very very very slow progress, at the pace, it seemed it would take hours to compute. At this stage, on the CPU side of the monitor, it says the process uses 105% of CPU (first column on the left). After 1 hour (and 2/3 of completing the iteration) it crashed with the following message:
Error: Garbage collector could not allocate 16384 bytes of memory for major heap section.
I'm a bit surprised I need to do Garbage Collection because, memory didn't look like the main issue?
I'm defining an enormous Array2D
with 2^24 entries and 24 columns = 17M * 24 entries of (float32*sbyte) using 32+8=40 bits = 5 bytes each so 2 GB
Perhaps thats where the problem is that I make a for S in Subset_of_size_m
loop?
This has to be done 2,700,000 times at iteration 13 (stopped at 1,860,000) in this loop I make some assignments of lists. Perhaps there is a need for Garbage Collection there?
I've never done this before in F# (only in R where it sufficed to write the command GC()
or remove(object)
from time to time. Monitoring in the control monitor, memory never seems to be an issue compared to available RAM? What's happening?
I understand I should perhaps find a better algorithm which is less memory-intensive but anyway its the good occasion to learn about this problem, and compared to other people who solved the problem, the execution time if everything went as planned, was not totally ridiculous for a first (brutal) try.
Here is the source code
//////// Travelling Salesman problem ////////
open System
open System.Collections
open System.Collections.Generic
open System.IO
open System.Windows
open FSharp.Charting
exception InnerError of string
let stopWatch = System.Diagnostics.Stopwatch.StartNew()
///////////////// preparing the data /////////////////
// format of the files
//[number_of_cities]
//[x_1] [y_1] // coordinate
let x = File.ReadAllLines "/Users/francois-guillaume.rideau/Documents/MOOC/TSP.txt"
let split (text:string)=
text.Split [|'\t';' '|]
let splitInto2Values (A: string []) =
(float A.[0],float A.[1])
let parseLine (line:string) =
line
|> split
|> splitInto2Values
let num_cities = int x.[0] // 25
let cities = x |> Array.tail |> Array.map parseLine // [x_1][y_1]
let dist_matrix = Array2D.create num_cities num_cities 0.0f
for i in 0..(num_cities-1)do
for j in 0..(num_cities-1) do
dist_matrix.[i,j]<- float32 (sqrt ( (fst cities.[i] - fst cities.[j])*(fst cities.[i] - fst cities.[j]) + (snd cities.[i] - snd cities.[j])*(snd cities.[i] - snd cities.[j]) ))
let arrayOfArrays = [| [| 0.0f; 2.0f;1.0f;4.0f |]; [|2.0f;0.0f; 3.0f;5.0f |]; [|1.0f;3.0f; 0.0f;6.0f |]; [|4.0f;5.0f; 6.0f;0.0f |] |]
let example = Array2D.init 4 4 (fun i j -> arrayOfArrays.[i].[j])
// Dynamic programming algorithm
// Subproblems:
// For every destination j in {1,2,......n} every subset S in {1,2,....n} that contains 1 and j, let
// A(S,j) = minimum length of a path of a path from 1 to j that visits precisely the vertices of S [exactly once each]
// create A = Array2D indexed by subsets S that contain 1 and destinations j
// Base Case A[S,1] = 0 if S = {1} , +infinity otherwise
// for m = 2,3,4,.... n (m = subproblem size)
// for each Set S in {1,2,...n} of size m that contains 1
// for each j in S, j different from 1:
// A[S,j] = min (k in S, k <> j) {A[S-{j},k]+Ckj}
// Return min (j=2 to n) A[{1,2,3,....,n},j]+Cj1
let limit = 100000000.0f
//// the first city is city 0 in array D. we put it apart,
//// then we represents S as integers.
//// we take the binary representation of integer, and the pth bit indicates whether city p+1 belongs to S
//// for example S =11 = 1+2+8 contains cities 2,3 and 9 (members 11 will return [(0, 1); (1, 2); (3, 8)])
/////////////////////////////// with path ///////////////////////////////////////////
let TSP_all_c_Dynamic_Programming_with_path_main(D:float32 [,]) = // solves the TSP problem when ALL cities are connected together with an exhaustive search in exponential time O(n^2 2^n)
// the input is a distance matrix in float32
// memory usage is not optimized at all....
let num_cities = Array2D.length1 D
let l2 = Array2D.length2 D
if (l2<>num_cities) then failwith "Distance matrix is not a squared matrix"
let powers_of_2 = [|1;2;4;8;16;32;64;128;256;512;1024;2048;4096;8192;16384;32768;65536;131072;262144;524288;1048576;2097152;4194304;8388608;16777216|]
let power2 k =
if ((k >= 25) || (k<0)) then raise (InnerError("power exponent not allowed"))
else powers_of_2.[k]
let num_subsets = power2 (num_cities-1)
let S_full = num_subsets-1
let A = Array2D.create num_subsets (num_cities-1) (limit,-1y)
A.[0,0]<-(-0.0f,-2y)
let rec sumbits (n:int):int=
let rec helper acc m =
match m with
| 0 -> acc
| 1 -> acc+1 // remove this ?
| _ -> let r = m%2
helper (acc+r) (m>>>1)
helper 0 n
let hashtable = Array2D.create (num_cities-1) num_subsets false // hashtable.[i-1,n]=true if (sumbits n) = i
for k in 1..(num_subsets-1) do hashtable.[(sumbits k)-1,k]<-true
// member returns [(p,2^p);....] if the p+1th city is in S
let members S = [for j in 0..(num_cities-2) do let a= powers_of_2.[j] &&& S
if (a<>0) then yield (j,a)] // list length = num_cities-1
for m in 2..num_cities do // S size m
printfn "m=%A" m
let stopWatch = System.Diagnostics.Stopwatch.StartNew()
let mutable count = 1
let Subset_of_size_m = hashtable.[m-2,0..] |> Seq.mapi (fun i x -> (i,x)) |> Seq.filter (fun (a,b)-> (b=true)) |> Seq.map fst |> Seq.toList
for S in Subset_of_size_m do
if m = 2 then let (i,S') = (members S).Head
A.[S',i]<- (D.[0,i+1],-1y) // distance from 0 to city i+1
else
let S'_list = List.fold (fun acc x -> let a = (((snd x)^^^S_full)&&&S) // list of subsets of size m-1
if a = S then acc else (fst x,a)::acc ) [] (members S)
for (j,S') in S'_list do
A.[S,j] <- ([for (k,expk) in (members S') do
yield (fst A.[S',k]+D.[j+1,k+1],k) ]|> List.min |> fun (a,b)->(a,sbyte b))// can do faster than that if we remember from above ?
count <- count + 1 // to check progress in the console
if count%10000 =0 then printfn "count=%A" count
printfn "%f" stopWatch.Elapsed.TotalMilliseconds
// printfn "%A" A
// A.[num_subsets-1,0..]
A
let calculate_path_TSP_all_c_Dynamic_Programming_with_path (D:float32 [,]) =
// calls the core subroutine defined above
let A' = TSP_all_c_Dynamic_Programming_with_path_main D
// memory usage is not optimized at all....
// from here its smooth sailing, just analyzing the results.
let num_cities = Array2D.length1 D
let l2 = Array2D.length2 D
if (l2<>num_cities) then failwith "Distance matrix is not a squared matrix"
let powers_of_2 = [|1;2;4;8;16;32;64;128;256;512;1024;2048;4096;8192;16384;32768;65536;131072;262144;524288;1048576;2097152;4194304;8388608;16777216|]
let power2 k =
if ((k >= 25) || (k<0)) then raise (InnerError("power exponent not allowed"))
else powers_of_2.[k]
let num_subsets = power2 (num_cities-1)
let S_full = num_subsets-1
let A'' = A'.[S_full,0..]
let res' = [for i in 0..num_cities-2 do yield (fst A''.[i]+ example.[0,i+1]) ] |> Seq.mapi (fun i x -> (i, x)) |> Seq.minBy snd
printfn "TSP answer = %A" res'
let path = Array.create (num_cities+1) -1y
let mutable current_city = sbyte (fst res')
path.[num_cities-1]<- current_city// the last city
let mutable current_S = S_full
let mutable next_S = -2
let mutable next_city = -2y
for k in num_cities-2..-1..1 do
next_city <- snd A'.[current_S,int current_city]
next_S <- (S_full^^^powers_of_2.[int current_city]) &&& current_S
//printfn "k=%A current_S=%A next_city=%A" k current_S next_city
path.[k]<-next_city
current_S<-next_S
current_city<-next_city
for i in 0..num_cities do path.[i]<-path.[i]+1y
printfn "path=%A" path
////// main program /////
calculate_path_TSP_all_c_Dynamic_Programming_with_path dist_matrix
I haven't analyzed you code at all, but since you're targeting Debug/x86
configuration it means that you app:
Try changing it to something like Release/x64
(*). You can also tweak parameters of Mono's built-in garbage collector.
You can expect that the number of elements processed by your algorithm will decrease dramatically when data used by your program takes majority of memory available for a process. In such case, you program spends majority of time doing garbage collections (each GC can free only a small amount of memory because majority of objects is alive) and only a fraction of a time doing "real" computations (and computations require memory allocations which trigger GC).
The other thing that might be helpful when you're investigating memory usage is a memory profiler.
You mentioned that in R you are able to force garbage collection; in .NET you can also do that by calling GC.Collect()
, however, using it is discouraged in majority of cases (garbage collection is performed automatically when it is necessary).
*You need a compiler producing 64-bits assemblies and a 64-bit Common Language Runtime to run your code. Mono 2.10 on MacOS X doesn't support 64-bits unless it is built from sources:
Support for 64-bit VMs as of Mono 2.10 is only available if you build Mono from source code and install your own copy of the VM. In the future we will ship both mono and mono64 binaries for our users.
Newer versions have universal installers and I guess that they support 64-bits.