Please consider the simple code below:
thrust::device_vector<int> positions(6);
thrust::sequence(positions.begin(), positions.end());
thrust::pair<thrust::device_vector<int>::iterator, thrust::device_vector<int>::iterator > end;
//copyListOfNgramCounteachdoc contains: 0,1,1,1,1,3
end.first = copyListOfNgramCounteachdoc.begin();
end.second = positions.begin();
for (int i = 0; i < numDocs; i++) {
    end = thrust::unique_by_key(end.first, end.first + 3, end.second);
}
int length = end.first - copyListOfNgramCounteachdoc.begin();
cout << "the value of end -s is: " << length;
for (int i = 0; i < length; i++) {
    cout << copyListOfNgramCounteachdoc[i];
}
I expected this code to output 0,1,1,3; however, the output is 0,1,1. Can anyone tell me what I am missing? Note: copyListOfNgramCounteachdoc contains 0,1,1,1,1,3, and its type is thrust::device_vector<int>.
EDIT:
end.first = storeNcCounts.begin();
end.second = storeCompactedPositions.begin();
int indexToWriteForIndexesarr = 0;
for (int i = 0; i < numDocs; i++) {
    iter = end.first;
    // Compact document i's keys and positions into the output arrays.
    end = thrust::unique_by_key_copy(
        copyListOfNgramCounteachdoc.begin() + (i * numUniqueNgrams),
        copyListOfNgramCounteachdoc.begin() + (i * numUniqueNgrams) + numUniqueNgrams,
        positions.begin() + (i * numUniqueNgrams),
        end.first,
        end.second);
    int numElementsCopied = (end.first - iter);
    endIndex = beginIndex + numElementsCopied - 1;
    // Record the inclusive [beginIndex, endIndex] range written for document i.
    storeBeginIndexEndIndexSCNCtoRead[indexToWriteForIndexesarr++] = beginIndex;
    storeBeginIndexEndIndexSCNCtoRead[indexToWriteForIndexesarr++] = endIndex;
    beginIndex = endIndex + 1;
}
I think what you want to use in this case is thrust::unique_by_key_copy, but read on.
The problem is that unique_by_key does not update your input array unless it has to. On the first call, it can return a sequence of unique keys just by dropping the duplicate 1, that is, by moving the returned iterator forward without actually compacting the input array.
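As a host-side illustration of why chaining the calls gives 0,1,1 for the question's data, here is a minimal sketch that uses std::unique from the C++ standard library in place of thrust::unique_by_key. It assumes the two behave alike for the keys, that numDocs is 2, and that each window is 3 elements long; chained_unique_length is a hypothetical helper name, not part of Thrust.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Mimics the question's loop: each call starts at the iterator returned by
// the previous call, not at the start of the next 3-element document.
// Hypothetical helper; std::unique stands in for thrust::unique_by_key.
std::size_t chained_unique_length(std::vector<int>& v,
                                  std::size_t window,
                                  std::size_t numCalls) {
    auto first = v.begin();
    for (std::size_t i = 0; i < numCalls; ++i) {
        // Clamp the window so we never step past the end of the vector.
        std::size_t remaining = static_cast<std::size_t>(v.end() - first);
        auto last = first + static_cast<std::ptrdiff_t>(std::min(window, remaining));
        // The returned iterator becomes the start of the next window.
        first = std::unique(first, last);
    }
    return static_cast<std::size_t>(first - v.begin());
}
```

With the keys 0,1,1,1,1,3, the first call covers elements 0..2 and returns an iterator to element 2, so the second call covers elements 2..4 (all 1s) and returns an iterator to element 3. The reported length is therefore 3.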
You can see what is happening if you replace your loop with this one:
end.first = copyListOfNgramCounteachdoc.begin();
end.second = positions.begin();
thrust::device_vector<int>::iterator iter;
for (int i = 0; i < numDocs; i++) {
    cout << "before ";
    for (iter = end.first; iter != end.first + 3; iter++) cout << *iter;
    end = thrust::unique_by_key(end.first, end.first + 3, end.second);
    cout << " after ";
    for (iter = copyListOfNgramCounteachdoc.begin(); iter != end.first; iter++) cout << *iter;
    cout << endl;
    // Print the whole key array to show that it is unchanged.
    for (int j = 0; j < 6; j++) cout << copyListOfNgramCounteachdoc[j];
    cout << endl;
}
For this code I get this output:
before 011 after 01
011223
before 122 after 0112
011223
You can see that the values in copyListOfNgramCounteachdoc are not changing. This is valid behavior. If you had used unique_by_key_copy instead of unique_by_key, then Thrust would have been forced to actually compact the values in order to guarantee uniqueness, but in this case, since there are only two distinct values in each sequence, there is no need. The docs say:
The return value is an iterator new_last such that no two consecutive elements in the range [first, new_last) are equal. The iterators in the range [new_last, last) are all still dereferenceable, but the elements that they point to are unspecified. unique is stable, meaning that the relative order of elements that are not removed is unchanged.
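The contract quoted above can be tried on the host with std::unique, which documents the same new_last behavior. This is only a sketch under that assumption, and unique_prefix_length is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the length of the de-duplicated prefix [v.begin(), new_last).
// The vector keeps its original size; elements from new_last onward are
// still dereferenceable but should not be relied upon.
std::size_t unique_prefix_length(std::vector<int>& v) {
    auto new_last = std::unique(v.begin(), v.end());
    return static_cast<std::size_t>(new_last - v.begin());
}
```

Only the prefix up to the returned length should be read afterwards, which is why the original code computes its length from the returned iterator.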
If you use unique_by_key_copy, then Thrust will be forced to copy the unique keys and values (with obvious cost implications), and you should see the behavior you were expecting.
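To make the expected result concrete, here is a plain C++ sketch of what a per-document unique_by_key_copy loop computes. It runs on the host and does not call Thrust; segmented_unique_by_key_copy and segLen are hypothetical names, and it assumes values has at least as many elements as keys.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for calling thrust::unique_by_key_copy once
// per document. Within each segment of segLen keys, the first element of every
// run of equal consecutive keys is appended to the outputs with its value.
// Assumes values.size() >= keys.size().
std::pair<std::vector<int>, std::vector<int>>
segmented_unique_by_key_copy(const std::vector<int>& keys,
                             const std::vector<int>& values,
                             std::size_t segLen) {
    std::vector<int> outKeys, outValues;
    if (segLen == 0) return {outKeys, outValues};
    for (std::size_t start = 0; start < keys.size(); start += segLen) {
        std::size_t stop = std::min(start + segLen, keys.size());
        for (std::size_t i = start; i < stop; ++i) {
            // Keep the first key of a segment, or a key that differs from
            // its predecessor within the same segment.
            if (i == start || keys[i] != keys[i - 1]) {
                outKeys.push_back(keys[i]);
                outValues.push_back(values[i]);
            }
        }
    }
    return {outKeys, outValues};
}
```

For the keys 0,1,1,1,1,3 with positions 0..5 and segLen 3, this gives the keys 0,1,1,3 and the positions 0,1,3,5. Each segment starts at a fixed offset of the input instead of at the iterator returned by the previous call.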
BTW, if you can do this in a single call to unique_by_key rather than calling it in a loop, I suggest that you do so.