python, numpy, refactoring

Can we vectorize nested loops that update matrix values?


I wrote a piece of code, but I am not sure whether the loops can be removed and the code vectorized to make it faster. Can you please give suggestions? I am just updating a co-occurrence matrix.

    M = np.zeros((num_words,num_words))
    word2Ind = {words[i]:i  for i in range(len(words))}

    for document in corpus:
        for i,word in enumerate(document):
            for j in range(i - window_size ,i + window_size + 1):
                if i != j and j >= 0 and j <= len(document) - 1:
                    M[word2Ind[document[i]],word2Ind[document[j]]] += 1

Solution

  • Since word2Ind is only ever used in expressions of the form word2Ind[document[...]], you could at least compute the indices for each document once up front and then work from those indices:

    M = np.zeros((num_words,num_words))
    word2Ind = {words[i]:i  for i in range(len(words))}
    
    for document in corpus:
        IX=[word2Ind[d] for d in document]
        for i,word in enumerate(document):
            for j in range(i - window_size ,i + window_size + 1):
                if i != j and j >= 0 and j <= len(document) - 1:
                    M[IX[i], IX[j]] += 1
    

    It then becomes easier to partially vectorize: loop over the offsets within the window instead of over every position in the document.

    M = np.zeros((num_words,num_words))
    word2Ind = {words[i]:i  for i in range(len(words))}
    
    for document in corpus:
        IX=np.array([word2Ind[d] for d in document], dtype=np.uint32)
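        for j in range(1, window_size + 1):
            # np.add.at accumulates repeated (row, col) pairs; a plain
            # M[rows, cols] += 1 would count each repeated pair only once
            np.add.at(M, (IX[:-j], IX[j:]), 1)
            np.add.at(M, (IX[j:], IX[:-j]), 1)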
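
One thing to watch with this kind of indexing: when the same (row, col) pair occurs more than once in the index arrays, an in-place `M[rows, cols] += 1` adds 1 only once for that pair, so repeated word pairs within a document get undercounted. The sketch below is one way to check this on a toy example. It is only an illustration: the function names `cooccurrence_loop` and `cooccurrence_vectorized`, the toy `corpus`, and the `words` list are made up for this example and are not from the question. It uses `np.add.at`, which performs unbuffered accumulation, and compares the result with the original triple loop.

```python
import numpy as np


def cooccurrence_loop(corpus, words, window_size):
    """Reference version: the triple loop from the question."""
    word2ind = {w: i for i, w in enumerate(words)}
    M = np.zeros((len(words), len(words)))
    for document in corpus:
        ix = [word2ind[w] for w in document]
        for i in range(len(ix)):
            for j in range(i - window_size, i + window_size + 1):
                if i != j and 0 <= j < len(ix):
                    M[ix[i], ix[j]] += 1
    return M


def cooccurrence_vectorized(corpus, words, window_size):
    """Loop over window offsets only; each offset is handled with array ops."""
    word2ind = {w: i for i, w in enumerate(words)}
    M = np.zeros((len(words), len(words)))
    for document in corpus:
        ix = np.array([word2ind[w] for w in document], dtype=np.intp)
        # Offsets larger than len(ix) - 1 cannot produce any pair.
        for offset in range(1, min(window_size, len(ix) - 1) + 1):
            left, right = ix[:-offset], ix[offset:]
            # np.add.at is unbuffered, so every repeated pair is counted.
            np.add.at(M, (left, right), 1)
            np.add.at(M, (right, left), 1)
    return M


# Toy data (hypothetical): "a b" appears twice in the first document.
corpus = [["a", "b", "a", "b", "c"], ["c", "a", "a"]]
words = ["a", "b", "c"]

M_loop = cooccurrence_loop(corpus, words, window_size=2)
M_vec = cooccurrence_vectorized(corpus, words, window_size=2)
print(np.array_equal(M_loop, M_vec))

# For comparison: plain fancy-index += on the adjacent pairs of document 1.
ix = np.array([0, 1, 0, 1, 2])
M_plain = np.zeros((3, 3))
M_plain[ix[:-1], ix[1:]] += 1  # the repeated pair (a, b) is added only once
M_at = np.zeros((3, 3))
np.add.at(M_at, (ix[:-1], ix[1:]), 1)
print(M_plain[0, 1], M_at[0, 1])
```

The remaining Python loops run over documents and over window offsets, so the cost per document is `window_size` array operations rather than one Python-level iteration per word pair. `np.add.at` is typically slower than a plain indexed add, so whether this beats the original loop on a given corpus is worth measuring rather than assuming.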