Hi guys, I'm moving from Python 3 to Go, so I'm rewriting a library I created to get better performance.
I'm facing a problem because I'm new to Go. I'm using an API with a request limit to download hundreds of JSON documents, and I want to make as few requests as possible. Some of the URLs are duplicated, so my first idea was to pass a map[stringLink]*myJsonReceived between my downloading functions (goroutines). Before downloading, each goroutine checks whether the link is already being processed by another one. If it is, instead of requesting it again and wasting bandwidth and API calls, it should wait for the other goroutine to finish downloading and then read the result from the map.
I have a few options:
1) Each goroutine checks whether the link is in the map; if so, it polls every 0.05s until the pointer in the map is no longer nil and contains the JSON. (Probably the worst way, but it works.)
2) Change the map passed between goroutines to map[stringlink]chan myjson. This is probably the most efficient way, but I have no idea how to send a single message on a channel and have it received by multiple waiting goroutines.
3) Use option (2) but add a counter to the struct: each time a goroutine finds that the URL is already requested, it adds 1 to the counter and waits for the response on the channel. When the downloading goroutine completes, it sends X messages on the channel. But this would force me to add too many locks around the map, which is a waste of performance.
Note: I need the map at the end, after all functions have finished, to save the downloaded JSON documents into my database so they are never downloaded again.
Thank you all in advance for your help.
What I would do to solve your task is use a goroutine pool. There would be a producer which sends URLs on a channel, and the worker goroutines would range over this channel to receive URLs to handle (fetch). Once a URL is "done", the same worker goroutine could also save it into the database, or deliver the result on a result channel to a "collector" goroutine, which could do the save sequentially should that be a requirement.
This construction by design makes sure every URL sent on the channel is received by only one worker goroutine, so you do not need any other synchronization (which you would need if you used a shared map). As long as the producer sends each distinct URL only once, no URL is fetched twice. For more about channels, see What are golang channels used for?
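A minimal sketch of that construction might look like the following. The names fetchAll, result, and the fetch callback are hypothetical and only stand in for your real HTTP call and types. The sketch assumes the producer can see all URLs and skip duplicates before queuing them, and that the collector builds the map you want to save at the end.

```go
package main

import (
	"fmt"
	"sync"
)

// result pairs a URL with the body fetched for it (hypothetical type).
type result struct {
	url  string
	body string
}

// fetchAll fetches each distinct URL once using a fixed number of workers
// and returns the results keyed by URL. fetch is a hypothetical stand-in
// for the real download function. workers should be at least 1.
func fetchAll(urls []string, workers int, fetch func(string) string) map[string]string {
	jobs := make(chan string)
	results := make(chan result)

	// Producer: only this goroutine touches seen, so no lock is needed.
	go func() {
		defer close(jobs)
		seen := make(map[string]bool)
		for _, u := range urls {
			if seen[u] {
				continue // duplicate URL, already queued
			}
			seen[u] = true
			jobs <- u
		}
	}()

	// Workers: each URL sent on jobs is received by exactly one worker.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				results <- result{url: u, body: fetch(u)}
			}
		}()
	}

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Collector: runs in the caller's goroutine and is the only writer of out.
	out := make(map[string]string)
	for r := range results {
		out[r.url] = r.body
	}
	return out
}

func main() {
	urls := []string{"a", "b", "a", "c", "b"}
	got := fetchAll(urls, 3, func(u string) string { return "json for " + u })
	fmt.Println(len(got)) // prints 3
}
```

The returned map is only ever written by the collector, so it can be saved to the database after fetchAll returns without any locking.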
Go favors communication between goroutines (channels) over shared variables. Quoting from Effective Go: Share by communicating:
Do not communicate by sharing memory; instead, share memory by communicating.
For an example of how to create worker pools, see Is this an idiomatic worker thread pool in Go?
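If the duplicates cannot be filtered out before the URLs are queued, option (2) from the question can still be made to work. A receive on a closed channel returns immediately, so closing a channel acts as a broadcast to every goroutine waiting on it, and no counter is needed. The sketch below is one possible shape, not the only one: cache, entry, and the fetch callback are hypothetical names, and the mutex is held only for the map lookup, never during the download.

```go
package main

import (
	"fmt"
	"sync"
)

// entry holds one in-flight or finished download (hypothetical type).
type entry struct {
	done chan struct{} // closed once body has been set
	body string
}

// cache deduplicates concurrent fetches of the same URL.
type cache struct {
	mu      sync.Mutex
	entries map[string]*entry
}

func newCache() *cache {
	return &cache{entries: make(map[string]*entry)}
}

// get returns the body for url, calling fetch at most once per URL.
// fetch is a hypothetical stand-in for the real download function.
func (c *cache) get(url string, fetch func(string) string) string {
	c.mu.Lock()
	e, ok := c.entries[url]
	if !ok {
		// First caller for this URL: register the entry, then download.
		e = &entry{done: make(chan struct{})}
		c.entries[url] = e
		c.mu.Unlock()

		e.body = fetch(url) // slow call happens outside the lock
		close(e.done)       // wakes every waiting goroutine at once
		return e.body
	}
	c.mu.Unlock()

	<-e.done // returns immediately if the channel is already closed
	return e.body
}

func main() {
	c := newCache()
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.get("https://example.com/a.json", func(u string) string {
				return "json for " + u
			})
		}()
	}
	wg.Wait()
	fmt.Println(len(c.entries)) // prints 1
}
```

Once all goroutines have finished, c.entries contains every downloaded body and can be written to the database in one pass.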