checking whether a value of a variables belong to a set bootstrap

I have an array of integer say

theIndex = [ 1 2 6 7 17 2]

I have a dataframe with one column dataset[:id] containing integers say

dataset = DataFrame(id=[ 1, 1, 2, 2, 3, 3, 3, 4, 4, 4])

I want to select all observations in dataset that belongs to the index. and if they appear twice (or more) in the index, I want to select them twice (or more)

At the moment, I am doing it the dumb way.

theIndex = [ 1 2 6 7 17 2]
dataset = DataFrame(id=[ 1, 1, 2, 2, 3, 3, 3, 4, 4, 4])
dataset2 = DataFrame(id=Int64[])
for ii1=1:size(theIndex,2)
    for ii2=1:size(dataset[:id],1)
        any(i->i.==dataset[ii2,:id],theIndex[ii1]) ? 
        push!(dataset2,dataset[ii2,:id]) : nothing
    end
end

any more elegant solution?

Solution

Essentially, the question wants to calculate a SQL JOIN between theIndex and dataset. Unfortunately, this functionality is not fully implemented internally by DataFrames. So here is a quick (and efficient) simulation of a JOIN for this purpose:

using DataStructures

sort!(dataset, cols=:id]
j = 1
newvec = Vector{Int}() 
for (val,cnt) in SortedDict(countmap(theIndex))
    while j<=nrow(dataset)
        dataset[j,:id] > val && break
        dataset[j,:id] == val && append!(newvec,fill(j,cnt))
        j += 1
    end
end
dataset2 = dataset[newvec,:]

DataStructures package is used for the SortedDict. This implementation is should be more efficient than other multi-loop approaches.