Tags: java, multithreading, parallel-processing, linear-search

Creating more threads in Java to parallelize linear search consumes more time


I have written a Java program to parallelize the linear search algorithm. The input is 100000 random numbers and N is the number of threads; each thread takes 100000/N of the elements and performs a linear search on its chunk.

I expected the execution time to decrease as N increases, but instead it increases!

Why? The model used is a parallel searching algorithm in the CRCW shared-memory SIMD model.

import java.io.*;
import java.util.*;

// Shared problem data: input size, thread count, the array, and the search item.
class data
{
    static int n;
    static int N;
    static int[] arr;
    static int result;
    static int eachProcSize;   // number of elements handled by each processor
    static int item;

    static void declare(int size, int noOfProcessors, int element)
    {
        n = size;
        item = element;
        N = noOfProcessors;
        arr = new int[size];
        eachProcSize = (int) Math.ceil((double) n / N);
        result = 0;
    }
}


// Simulated CRCW shared memory: one result slot per processor plus a found flag.
class sharedMemory
{
    static int n;
    static int ar[];
    static int flag;

    static void declare(int size)
    {
        n = size;
        ar = new int[n];
        flag = 0;
    }

    static synchronized void write(int index, int value)
    {
        ar[index] = value;
    }

    static synchronized void setFlag()
    {
        flag = 1;
    }
}
class monitorThread extends Thread
{
    int i;

    monitorThread(int index)
    {
        i = index;
    }

    public void run()
    {
        // if processor i has found the element then set the flag
        if (sharedMemory.ar[i] == 1) sharedMemory.setFlag();
    }
}

class processor extends Thread
{
    int i;
    int size;
    int n = data.n;
    int N = data.eachProcSize;
    int[] arr;

    processor(int pos)
    {
        i = pos;
        size = data.eachProcSize;
        arr = new int[size];
        // copy this processor's chunk of the input into a private array
        for (int j = 0; j < size && (i * N + j) < n; j++)
        {
            arr[j] = data.arr[i * N + j];
        }
    }

    public int linearSearch(int item, int[] list)
    {
        for (int k = 0; k < list.length; k++)
            if (list[k] == item) return 1;
        return 0;
    }

    public void run()
    {
        System.out.println("processor " + i + " started and has started reading from " + arr[0]);
        System.out.println("processor " + i + " is performing linear search ..");

        if (linearSearch(data.item, arr) == 1) sharedMemory.write(i, 1);
        else sharedMemory.write(i, 0);
    }
}



public class CRCW_SMSIMD_SEARCH
{
    public static void main(String args[])
    {
        Scanner in = new Scanner(System.in);
        int i;

        System.out.println("Enter no. of elements");
        int n = in.nextInt();

        System.out.println("Enter no. of processors");
        int N = in.nextInt();

        System.out.println("Enter the element you want to find..");
        int item = in.nextInt();

        data.declare(n, N, item);
        System.out.println("each proc size is " + data.eachProcSize);

        sharedMemory.declare(N);

        System.out.println(n + " elements are randomly generated :");

        // random data generation
        Random ran = new Random();
        for (i = 0; i < n; i++)
        {
            data.arr[i] = ran.nextInt(10000000);
        }

        Thread t[] = new Thread[N];
        System.out.println("starting " + N + " processors and reading data....\n");

        for (i = 0; i < N; i++)
        {
            t[i] = new processor(i);
        }

        Thread[] monitor = new Thread[N];
        for (i = 0; i < N; i++)
            monitor[i] = new monitorThread(i);

        long start = System.currentTimeMillis();

        for (i = 0; i < N; i++)
            t[i].start();

        try
        {
            for (i = 0; i < N; i++)
                t[i].join();
        } catch (Exception e) { e.printStackTrace(); }

        // check the shared memory to see whether any processor has written 1
        for (i = 0; i < N; i++)
            monitor[i].start();

        // wait for the monitors, otherwise the flag may be read before it is set
        try
        {
            for (i = 0; i < N; i++)
                monitor[i].join();
        } catch (Exception e) { e.printStackTrace(); }

        long stop = System.currentTimeMillis();
        long timeTaken = stop - start;

        if (sharedMemory.flag == 1)
            System.out.println("\nThe element was successfully found..");
        else
            System.out.println("\nThe element was not found");

        System.out.println("\n\ntime taken (in ms) by " + N + " processors is: " + timeTaken);
    }
}

Solution

  • The most likely reason for this is a decrease in cache performance.

    The cache exploits locality of reference by reading ahead through memory. As a result, the cache already holds the data your program is about to request before the program requests it, which improves performance.

    When your program searches N address ranges at the same time, the cache tries to read ahead at all of these locations. Each thread therefore gets a much smaller share of the cache, so the number of cache misses that must be served simultaneously goes up. Memory bandwidth is limited, so while one thread is having its cache lines filled, the remaining N-1 threads wait on memory and make no progress.
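    A quick way to see this for yourself is to time only the start-to-join interval for increasing thread counts. The sketch below is a hypothetical harness, not the original program: the class name `ParallelScan`, the planted sentinel value, and the chosen thread counts are all assumptions made for illustration. Each worker scans its own contiguous chunk of one shared array, and on most machines the measured time stops improving (or worsens) once the thread count exceeds what the cores and memory system can usefully serve.

    ```java
    import java.util.Random;
    import java.util.concurrent.atomic.AtomicBoolean;

    public class ParallelScan {
        // Chunked parallel linear search: thread p scans a[p*chunk .. p*chunk+chunk).
        // Only the start-to-join interval is timed, excluding thread construction.
        static boolean parallelContains(int[] a, int item, int nThreads) {
            AtomicBoolean found = new AtomicBoolean(false);
            Thread[] workers = new Thread[nThreads];
            int chunk = (a.length + nThreads - 1) / nThreads;  // ceil(n / N)
            for (int p = 0; p < nThreads; p++) {
                final int lo = p * chunk;
                final int hi = Math.min(a.length, lo + chunk);
                workers[p] = new Thread(() -> {
                    for (int j = lo; j < hi; j++) {
                        if (a[j] == item) { found.set(true); return; }
                    }
                });
            }
            long start = System.nanoTime();
            try {
                for (Thread w : workers) w.start();
                for (Thread w : workers) w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.printf("%2d threads: %.3f ms%n", nThreads,
                              (System.nanoTime() - start) / 1e6);
            return found.get();
        }

        public static void main(String[] args) {
            // 100000 random values in [0, 10000000), as in the question.
            int[] a = new Random(42).ints(100_000, 0, 10_000_000).toArray();
            a[70_000] = -1;  // plant a sentinel that the search must find
            for (int n : new int[]{1, 2, 4, 8, 16}) {
                if (!parallelContains(a, -1, n))
                    throw new AssertionError("sentinel not found with " + n + " threads");
            }
        }
    }
    ```

    Note that the timed region covers only the search itself; in the original program, measuring around thread creation, copying, and `println` calls inflates the per-thread overhead and can easily dominate a scan of only 100000/N integers.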