
Why am I receiving additional elements through channel?


I've run into a peculiar problem which I unfortunately haven't been able to reproduce in a minimal working example. I'll try to explain it, and hopefully you can at least give me a hint.

I have two protocols: A & B. For each protocol there is one central party p1 and three outer parties; let's call those pn. Each party is implemented as a separate goroutine.

Protocol A is as follows:

  1. All parties perform a calculation, separately, and send their result of type *big.Int to p1.
  2. p1 receives all results and puts them in a slice which it sends back to each party pn.
  3. All parties receive the slice and perform a new calculation based on it, and send their result of type *DecryptionShare to p1.
  4. p1 receives all data and calculates a result.
  5. All parties output a result *big.Int.

To help with this I have three channels: one for sending data p1 -> pn, one for pn -> p1, and one for returning the final results to the main thread (i.e. all pn read from and write to the same channels). The results that pn sends in steps 1 and 3 have different types, though, so that channel has type interface{}.

Protocol B first initiates protocol A and then performs further calculations, which are irrelevant.

Here's my problem:

Protocol A on its own never shows any problems. But when I call B, it panics in A in ~10% of the runs, even though the only difference is that B passes the input parameters on to A.

The error showing is

panic: interface conversion: interface {} is *big.Int, not *DecryptionShare

implying that p1 receives a *big.Int while it is at step 4, although it already received every party's *big.Int in step 2.

I have tried staying at step 2 a while longer using time.Sleep and select, but I never get an additional *big.Int at that step; it only occasionally shows up at step 4.

If I use two separate channels, chan *big.Int and chan *DecryptionShare, instead of chan interface{}, protocol B terminates correctly, which also implies that everything is read correctly from the channels (i.e. no thread is left blocked). I was hoping to avoid this, though, as I already have numerous channels in play.
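For illustration, the two-channel variant could look roughly like the sketch below. It is a simplified stand-in rather than the real protocol: `runProtocolA` is a hypothetical function, the "calculations" are placeholder arithmetic, and the channels are created per call so that no other protocol can write to them.

```go
package main

import (
	"fmt"
	"math/big"
)

// DecryptionShare mirrors the stand-in struct from the question.
type DecryptionShare struct {
	index int
	p     *big.Int
}

// runProtocolA is a hypothetical, simplified protocol A that creates its
// own typed channels on every call.
func runProtocolA(n int) []*DecryptionShare {
	values := make(chan *big.Int)         // step 1: pn -> p1
	broadcast := make(chan []*big.Int)    // step 2: p1 -> pn
	shares := make(chan *DecryptionShare) // step 3: pn -> p1

	for id := 1; id < n; id++ {
		go func(id int) {
			values <- big.NewInt(int64(id)) // placeholder for the real calculation
			all := <-broadcast
			sum := new(big.Int)
			for _, v := range all {
				sum.Add(sum, v)
			}
			shares <- &DecryptionShare{index: id, p: sum}
		}(id)
	}

	// p1: collect all values, then send the slice back to every party.
	all := make([]*big.Int, n)
	all[0] = big.NewInt(0) // p1's own placeholder value
	for i := 1; i < n; i++ {
		all[i] = <-values
	}
	for i := 1; i < n; i++ {
		broadcast <- all
	}

	// p1: collect one share per outer party, stored by the party's index.
	out := make([]*DecryptionShare, n)
	for i := 1; i < n; i++ {
		s := <-shares
		out[s.index] = s
	}
	return out
}

func main() {
	for _, s := range runProtocolA(4)[1:] {
		fmt.Println(s.index, s.p)
	}
}
```

With typed channels a value of the wrong type cannot be sent at all, so a mix-up of this kind is rejected at compile time instead of surfacing as a sporadic panic.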

Does anyone have any ideas on why this panic occurs?

EDIT: Here's a minimal working example, though it doesn't reproduce the error. Hopefully it provides some insight. *DecryptionShare is simplified to a small stand-in struct.

package tpsi

import (
    "crypto/rand"
    "fmt"
    "math/big"
    "testing"
)

type DecryptionShare struct {
    index int
    p *big.Int
}

func TestErs(t *testing.T) {
    message_channel1 := make(chan interface{})
    message_channel2 := make(chan []*big.Int)
    return_channel := make(chan *big.Int)
    n := 4
    go CentralParty(n, message_channel2, message_channel1, return_channel)
    for i := 1; i < n; i += 1 {
        go OtherParty(message_channel1, message_channel2, return_channel)
    }

    for i := 0; i < n; i += 1 {
        fmt.Println(<-return_channel)
    }
    t.Error("for display") // fail on purpose so the printed values are shown
}

func CentralParty(n int, sender_channel chan<- []*big.Int, reciever_channel <-chan interface{}, return_channel chan<- *big.Int) {
    p, err := rand.Prime(rand.Reader, 256)
    if err != nil {
        panic(err)
    }

    all_p := make([]*big.Int, n)
    all_p[0] = p

    for i := 1; i < n; i += 1 {
        all_p[i] = (<-reciever_channel).(*big.Int)
    }

    for i := 1; i < n; i += 1 {
        sender_channel <- all_p
    }

    shares := make([]*DecryptionShare, n) // sized by n rather than a hard-coded 4
    for i := 1; i < n; i += 1 {
        shares[i] = (<-reciever_channel).(*DecryptionShare)
    }

    return_channel <- shares[1].p

}

func OtherParty(sender_channel chan<- interface{}, reciever_channel <-chan []*big.Int, return_channel chan<- *big.Int) {
    p, err := rand.Prime(rand.Reader, 256)
    if err != nil {
        panic(err)
    }
    
    sender_channel <- p

    all_p := <-reciever_channel

    var ds DecryptionShare
    ds.p = p
    ds.index = all_p[0].BitLen()
    sender_channel <- &ds

    return_channel <- p

}

Solution

  • Under the joint pressure of several commenters, I forced myself to build an MWE. And as suggested by @oakad, I found the bug while doing so.

    The error came (unsurprisingly) from protocol B, which reused the chan interface{} and once again sent the first data type, *big.Int, on it, thereby introducing a race condition.

    I had completely neglected to consider race conditions across protocols.

    Thank you for the comments!