Every corner of Julia's documentation is filled with reminders to "avoid global scope variables". But I fail to see how this could be beneficial even in some of the most common data analysis scenarios, probably due to a misunderstanding regarding how Julia's compiler works.
For example, one function I use checks whether each token of a document belongs to a huge lexicon of acceptable tokens. Currently, I use something like this:
using CSV, DataFrames
accepted_tokens = @chain begin
CSV.read("accepted_tokens.csv", DataFrame)
Set{String}(_.tokens)
end
function redact_document(doc::String)
tokens = split(doc, " ")
redacted_tokens = [token in accepted_tokens ? token : "REDACTED" for token in tokens]
return join(" ", redacted_tokens)
end
Now, since redact_document
is the only function the uses accepted_tokens
I of course could just assign the variable inside the function, like this:
function redact_document(doc::String)
accepted_tokens = @chain begin
CSV.read("accepted_tokens.csv", DataFrame)
Set{String}(_.tokens)
end
tokens = split(doc, " ")
redacted_tokens = [token in accepted_tokens ? token : "REDACTED" for token in tokens]
return join(" ", redacted_tokens)
end
The reason I don't do this is that it seems to me that in this case accedted_tokens
would need to be assigned each time redact_document
is called, which seems like a total waste of time, given that I'd have to read a huge file from disk every time, instead of creating/assigning the variable just once (albeit in the global scope). I also don't want to declare accepted_tokens
as a constant, since I might want to tweak the lexicon as I develop my script.
Am I right on my reading of the code? Or, as I suspect, the compiler is smarter than what I take it to be, and I should still be wrapping my variables within the functions that use them?
While all has been said in comments, just for cleanness your code should look like this (you should pass accepted_tokens
as argument rather than to use a global variable):
function redact_document(doc::AbstractString, accepted_tokens::AbstractSet{<:AbstractString})
tokens = split(doc, " ")
redacted_tokens = [token in accepted_tokens ? token : "REDACTED" for token in tokens]
return join(" ", redacted_tokens)
end
The type declarations for function arguments are optional (do not affect performance), but if you use the usually it is better to use their abstract counterparts.