I have a list of functions and values I'd like to put into a nested list. I'd like the result to be a LISP-style list (something that looks close to LISP-style executable code) that I can easily process later.

This list comes from a "sentence" that gets split into a list by word. It keeps defined (multi-word) phrases together by checking any multi-word tokens in the DB first, gluing them, and then separating them later. Any words not in the database are ignored, as is punctuation; it just matches actual stored Tokens. This lets me write a sentence that can get translated into functions I can process later.

We can ignore the fact that these are functions and just think of them as strings, since that's how they're stored, but "functions and arguments" describes the use case perfectly, since that's what they are. A function will be the first item in a list, followed by its arguments in the same list. A function with no arguments will be in a list with only that one element. An argument that is itself a function will be in its own list (with its arguments, if any). The number of arguments each function takes is preset and won't vary (though a function could call other functions that take their own arguments). Nesting should technically be able to go infinitely deep, though I'm sure a few levels will suffice if limiting helps; this is a recursive kind of problem, so depth shouldn't really matter.

If it helps, I'm using Django, so I have access to any model methods there, since Token is a Model, as are the sentences.
I'm calling the list items "Tokens". They can be more than one word, and they're stored in a database. Each Token has: symbol, val, kind.

Symbol: pretty-format string to search for in the sentence
Value: the thing we want in the list
Kind: an integer; the number of args, or a code for other kinds
KIND_CHOICES = [
    (0, 'Argless Function'),
    (1, '1-Arg Function'),
    (2, '2-Arg Function'),
    (3, '3-Arg Function'),
    (4, '4-Arg Function'),
    (6, 'Value'),
    (7, 'Ignore'),
]
Let's use these Tokens for an example, as ("Symbol", 'val', kind):

("Walk", 'walk', 1)
("To", 'to', 1)
("Sandwich Shop", 'sandwich-shop', 6)
("Order", 'place_order', 2)
("Toasted", 'toast', 1)
("Sandwich", 'sandwich', 6)
("Dine-In", 'here', 0)
("Eat", 'eat', 1)
("Back", 'back', 1)
("Home", 'residence', 6)
("Nap", 'sleep', 0)
("on the sofa", 7)
Here's an example sentence:
Walk To the Sandwich Shop, Order your favorite Toasted Sandwich for Dine-In, Eat your Sandwich, Walk Back Home, then finally Nap on the sofa.
The first list I end up with, from my current (working) cleanup functions, is:
['walk','to','sandwich-shop','place_order','toast','sandwich','here','eat','sandwich','walk','back','residence','sleep']
then, finally, the part I just can't get right (I'm off by one, get duplicates or missing tokens, or end up with the wrong structure):
[['walk',['to','sandwich-shop']],['place_order',['toast','sandwich'],['here']],['eat','sandwich'],['walk',['back','residence']],['sleep']]
Some of my attempts have involved: using a repeated placeholder string for the arguments, with various replace_or_append implementation attempts; inserting empty elements in a list for arguments, then using a put_in_next_empty_spot implementation (a few tries at that); and some simpler looping with an incremented index and pop. I'm just stuck on this for some reason and could use some brainpower on what seems like it should be a pretty simple problem to solve.
Here's some example code from one terribly failed attempt:
def nest(self, token, l, idx):
    if token.is_func:
        result = [token.val]
        if token.arg_count:
            for i in range(token.arg_count):
                idx += 1
                f = self.funcs.pop(idx)
                t = self.get_token(f)
                result = self.nest(t, result, idx)
        l.append(result)
    else:
        l.append(token.val)
    return l

def nestify(self):
    nested = []
    for i, s in enumerate(self.funcs):
        token = self.get_token(s)
        nested = self.nest(token, nested, i)
    return nested
I've used some convenience methods I added to Token to check its .kind property (the integer), to get the Token object, etc. self.funcs here is the first list of functions mentioned above (the cleaned-up flat list).
Original Sentence:
"Walk To Sandwich Shop Dine-In Nap"
Becomes (this works, and is what we want):
['walk', 'to', 'sandwich-shop', 'here', 'nap']
Which SHOULD Become:
[['walk', ['to', 'sandwich-shop']], ['here'], ['nap']]
But unfortunately, I'm getting:
['walk', ['to', ['here'], ['nap']], 'sandwich-shop']
To help clarify how the output should look: when I process the entire list, the rule is "the first item in each list is a function". That way, if it's a single-item list, it can just call that function, and if it's a multi-item list, we know the remaining items (or lists of items) are the arguments to that function.
Using our desired output list from the longer sentence example above:
[['walk',['to','sandwich-shop']],['place_order',['toast','sandwich'],['here']],['eat','sandwich'],['walk',['back','residence']],['sleep']]
If we take the very first item (a list):
['walk',['to','sandwich-shop']]
We'd process the 'walk' function passing in one arg: the result of evaluating the function 'to', passing it the argument 'sandwich-shop'.
If we take the second item (another list):
['place_order',['toast','sandwich'],['here']]
We'd call 'place_order' passing it the two args it wants: the first will be the result of evaluating the 'toast' function, passing 'sandwich' to 'toast'. The second arg to 'place_order' will be the result of calling/evaluating 'here'.
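The processing I'm describing can be sketched as a tiny recursive evaluator. (FUNCS and its lambdas here are just hypothetical stand-ins for my real functions, only to show the shape of the walk.)

```python
# Hypothetical registry mapping stored function names to callables.
FUNCS = {
    'to': lambda place: f'at {place}',
    'walk': lambda dest: f'walked {dest}',
    'here': lambda: 'dine-in',
}

def evaluate(item):
    """A bare string is a value; a list is a function call whose
    remaining items are recursively evaluated as its arguments."""
    if not isinstance(item, list):
        return item                  # plain value, e.g. 'sandwich'
    func, *args = item               # first element names the function
    return FUNCS[func](*(evaluate(a) for a in args))

evaluate(['walk', ['to', 'sandwich-shop']])  # 'walked at sandwich-shop'
evaluate(['here'])                           # 'dine-in'
```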
Hopefully this makes sense. :)
Any help here would be so very appreciated! While I'm new here, I've read the intro to posting a question and have hopefully provided everything needed.
Thanks!
Joy
To build the nested lists based on the argument specifications, you can use recursion with collections.deque. By using the reference to the deque passed to nest_tokens, you can mutate the tokenized result in place, popping off the number of arguments required for each "function":
from collections import deque

def tokenize(s, tokens):
    while s:
        if (o := [(a, b) for a, *b in tokens if s.startswith(a)]):
            # take the longest matching symbol, so "Sandwich Shop" beats "Sandwich"
            j, k = max(o, key=lambda x: len(x[0]))
            yield k
            s = s[len(j):]
        else:
            s = s[1:]

def nest_tokens(tokens, to_d=None):
    c = 0
    while tokens and (to_d is None or c < to_d):
        if (n := tokens.popleft())[1] == 6:
            yield n[0]  # found a value, yield it as-is
        else:
            # found a function token: yield it as a one-item list if argless,
            # otherwise recursively call nest_tokens to produce its arguments
            yield [n[0]] if not n[1] else [n[0], *nest_tokens(tokens, n[1])]
        c += 1

def nest_tokenized(s, tokens, choice):
    c = dict(choice)
    t = [(*i, l) for i in tokenize(s, tokens) if (l := c.get(i[-1])) != 'Ignore']
    return list(nest_tokens(deque(t)))
KIND_CHOICES = [
    (0, 'Argless Function'), (1, '1-Arg Function'), (2, '2-Arg Function'),
    (3, '3-Arg Function'), (4, '4-Arg Function'), (6, 'Value'), (7, 'Ignore'),
]
tokens = [
    ("Walk", 'walk', 1), ("To", 'to', 1), ("Sandwich Shop", 'sandwich-shop', 6),
    ("Order", 'place_order', 2), ("Toasted", 'toast', 1), ("Sandwich", 'sandwich', 6),
    ("Dine-In", 'here', 0), ("Eat", 'eat', 1), ("Back", 'back', 1),
    ("Home", 'residence', 6), ("Nap", 'sleep', 0), ("on the sofa", 7),
]
s = 'Walk To the Sandwich Shop, Order your favorite Toasted Sandwich for Dine-In, Eat your Sandwich, Walk Back Home, then finally Nap on the sofa.'
s1 = "Walk To Sandwich Shop Dine-In Nap"
print(nest_tokenized(s, tokens, KIND_CHOICES))
print(nest_tokenized(s1, tokens, KIND_CHOICES))
Output:
[['walk', ['to', 'sandwich-shop']], ['place_order', ['toast', 'sandwich'], ['here']], ['eat', 'sandwich'], ['walk', ['back', 'residence']], ['sleep']]
[['walk', ['to', 'sandwich-shop']], ['here'], ['sleep']]
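To see the in-place consumption concretely, nest_tokens (restated below so the snippet runs on its own) can be called directly on a hand-built deque of (val, kind) pairs; each function pops exactly as many following tokens as it has argument slots:

```python
from collections import deque

def nest_tokens(tokens, to_d=None):
    # same generator as above: pop a token, recursing once per argument slot
    c = 0
    while tokens and (to_d is None or c < to_d):
        if (n := tokens.popleft())[1] == 6:
            yield n[0]  # a plain value
        else:
            yield [n[0]] if not n[1] else [n[0], *nest_tokens(tokens, n[1])]
        c += 1

# 'walk' (1 arg) consumes 'to' (1 arg), which consumes 'sandwich-shop' (a value)
flat = deque([('walk', 1), ('to', 1), ('sandwich-shop', 6), ('here', 0)])
print(list(nest_tokens(flat)))  # [['walk', ['to', 'sandwich-shop']], ['here']]
```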