
Why is user_input = post_data.split('=')[1] present in the following section of code?


I am working on a piece of Python code to handle the POST request method from an HTML server. The block of code below was provided to me in the answer to a question I asked earlier. I've been going through it line by line to make sure that I understand the logic at play here.

But I'm hitting a snag on this line in particular.

user_input = post_data.split('=')[1]

So I'm going to outline what I think the logic is doing and then provide the actual code. I'm hoping someone can correct me where appropriate so that I actually understand the logic. I'd also like an explanation of why the line of code indicated above exists at all; I don't understand why the split method is needed in this case.

Could post_data not simply be taken as is and printed out instead? Or would that cause problems for some reason?

This top section here I am providing for context, as it's the rest of the program.

# Python 3 server example
from http.server import BaseHTTPRequestHandler, HTTPServer
import time #Why is time imported if it's not used? Hypothesis: the send response method on line 10 states among other things to send the current date. Thus time is needed to determine current date?

hostName = "localhost"
serverPort = 8080

class MyServer(BaseHTTPRequestHandler): 
    def do_GET(self): 
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(bytes("<html><head><title>https://pythonbasics.org</title></head>", "utf-8"))
        self.wfile.write(bytes("<p>Request: %s</p>" % self.path, "utf-8"))
        self.wfile.write(bytes("<body>", "utf-8"))
        self.wfile.write(bytes("<p>Hello world! This is a webpage!</p>", "utf-8"))
        self.wfile.write(bytes("<p> And hello to you! Please enter your name below!</p>", "utf-8"))
        self.wfile.write(bytes("""<form action="/" method="post">
                                   <label for="name">Please enter your name:</label><br>
                                   <input type="text" id="name" name="name"><br>
                                   <input type = "submit" value = "Click me!">
                                </form>  
                              """, "utf-8" )) 
        self.wfile.write(bytes())
        self.wfile.write(bytes("</body></html>", "utf-8"))

This section below is the portion of the code I am wanting to verify my understanding of. What I think is happening here logically and some additional questions have been included in comments within the code.

    def do_POST(self):
        content_length = int(self.headers['Content-Length']) 
        #The content-string above is an html header responsible for declaring the length of the text being passed to the server via the POST method. 
        #However, Content-Length is never declared a value that I can tell, not here or in the rest of the program. So, does HTML then by default take the entire document provided by the POST method if Content-Length is not assigned any value?
        # It is then coerced into an int and passed to the content_length variable. But what is the purpose of the ".headers" method?
       
        post_data = self.rfile.read(content_length).decode('utf-8') 
        #utf-8 is the unicode format for encoding/decoding the given text. .read is being passed the length of the message and thus is reading the entire message. The message is then passed to .rfile, I am not sure why this is, why is the standard .read method not sufficient?
        
        user_input = post_data.split('=')[1] 
        #I really have no idea why the split method is needed at all. Could post_data not be used as it is?

        self.send_response(200) 
        #Sends the webpage the 200 response code indicating the server processed the request correctly.
        self.send_header('Content-type', 'text/html')
        #Declares that the data about to be sent is of the type text, and is written in HTML
        self.end_headers()

        self.wfile.write(bytes("<html><head><title>https://pythonbasics.org</title></head>", "utf-8"))
        #I don't understand the use of the bytes class in these lines. I'm assuming that html needs information passed to it to be encoded into bytes for the transfer? This class does so?
        self.wfile.write(bytes("<body>", "utf-8"))
        self.wfile.write(bytes(f"<p>Hello {user_input}!</p>", "utf-8"))
        #At the beginning of this string after the bytes class there is a single "f" present. Why is this? Is it something to do with html coding or python? Also am I right in thinking that {user_input} is the syntax in HTML for inserting a variable?
        self.wfile.write(bytes("</body></html>", "utf-8"))

Solution

  • One of the many things you develop when working as a software developer is the ability to try things out. What's the worst thing that can happen? Errors, errors everywhere (read it with Buzz Lightyear's voice if you get the reference)...

    Do we enjoy running into bugs and errors? Well, usually not... but we're not always sad either. An error is an opportunity to learn by cleaning your code, creating useful logs, learning about a new package functionality, etc.

    My point here is: do not be afraid to test. If you're facing someone else's code and you're unsure of what something does or why it's even there, don't be scared to remove it and rerun the code!

    For example... you've noticed that the time package is imported but not used. You can remove it and see what happens! (Spoiler: It works just fine. That package wasn't needed for your code to run at all)

    Another useful ability is to search and read package documentation. Yeah, it's boring sometimes, but in most scenarios, it will save your day and even teach you a new thing or two.

    For example... you've asked about the use of bytes inside the write method. Well, first, you need to understand that your server is based on a built-in class from the http.server module called BaseHTTPRequestHandler. More precisely, your class inherits from BaseHTTPRequestHandler, which means your class gets all of its attributes and methods (such as wfile) unless you override them.

    (Tip: if you don't understand the concept of inheritance maybe you should take a step back and reinforce the concepts of Object-Oriented Programming before diving into examples such as this webserver.)
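If you want to see the inheritance at work, here is a minimal sketch. The class body is left empty on purpose (that is my simplification, not your actual server), so every method it has comes from BaseHTTPRequestHandler:

```python
from http.server import BaseHTTPRequestHandler


class MyServer(BaseHTTPRequestHandler):
    # Nothing defined here on purpose: everything used below is inherited.
    pass


# MyServer is a subclass of BaseHTTPRequestHandler...
is_subclass = issubclass(MyServer, BaseHTTPRequestHandler)

# ...so methods such as send_response are the parent's own methods.
inherits_send_response = (
    MyServer.send_response is BaseHTTPRequestHandler.send_response
)

print(is_subclass)
print(inherits_send_response)
```

This is why your code can call self.send_response, self.send_header and self.end_headers without ever defining them.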

    It's always a good idea to look at the docs when inheriting from an already coded class. I'll save you a few clicks and put the link directly here.

    There you can find information about the attribute wfile. It's said that

    wfile
    Contains the output stream for writing a response back to the client. Proper adherence to the HTTP protocol must be used when writing to this stream in order to achieve successful interoperation with HTTP clients.

    Changed in version 3.6: This is an io.BufferedIOBase stream.

    The docs are actually redirecting us to another page of documentation, which is fine. It often happens. We now need to know what an "io.BufferedIOBase stream" is. Or, more specifically, we need to know what its write method does.

    The docs are pretty straightforward and give us the answer in the first sentence:

    write(b, /)
    Write the given bytes-like object, b, and return the number of bytes written (always equal to the length of b in bytes, since if the write fails an OSError will be raised). [...]

    The puzzle has been solved. The write method needs a "bytes-like" object as input, so that's why we use bytes all around when calling self.wfile.write.
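A small sketch of that point. Here io.BytesIO is only a stand-in for self.wfile (an assumption for illustration; the real wfile is connected to the client's socket), but it is also a binary stream and its write method behaves the same way with respect to str and bytes:

```python
import io

html = "<p>Hello world!</p>"

# Two equivalent ways of turning a str into bytes.
as_bytes = bytes(html, "utf-8")
also_bytes = html.encode("utf-8")

# In-memory binary stream standing in for self.wfile.
fake_wfile = io.BytesIO()
written = fake_wfile.write(as_bytes)  # returns the number of bytes written

try:
    fake_wfile.write(html)  # a str is not a bytes-like object
    str_rejected = False
except TypeError:
    str_rejected = True

print(as_bytes == also_bytes)
print(written)
print(str_rejected)
```

So bytes("...", "utf-8") is Python converting text into raw bytes before sending it; it has nothing to do with HTML itself.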

    These two skills (testing and reading docs) are core tools for you to develop to understand better how things work in the software-developing world. I felt that it was important to give you this background before going to your specific question. Now let's get to it.

    Why the need for split?

    I'll take some time here to give you an overview of some other topics as well. I'm no expert, but hopefully, you'll find this interesting.

    When using HTML forms, it's useful to understand how the information passed by the user is sent to the server. To see this, run your server code, open its page in any browser, and then open the "devtools" (check how to do it on this link). Finally, open the "Network" tab.

    You'll see it's empty at first, but then just test your code: input your name and click the button. Now, a few requests may have appeared.

    Look for the POST request, since it's the one triggered by the button click. How do we know that? In your code the <form> tag has the following options: <form action="/" method="post">, which tells the browser to send a POST request to the server when that button is clicked. The server then handles that request in do_POST.

    After selecting the POST request, look for its headers. It has a lot of them, but only one is useful right now: the "Content-Type". It states how the input data from the user will be written in the request. See the screenshot below.

    A piece of the POST request header

    The application/x-www-form-urlencoded value states that the information will be written as name-value pairs, like name1=value1&name2=value2&name3=value3 etc. (There are some answers over SO that talk about this, such as this one)
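You can reproduce this format in Python without a browser. The sketch below uses urllib.parse from the standard library; the field names and values are made up for illustration:

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields, encoded the way a form body is encoded.
body = urlencode({"name": "X", "city": "New York"})
print(body)  # name=X&city=New+York

# And decoded back into a dict of lists (a field may appear more than once).
fields = parse_qs(body)
print(fields)  # {'name': ['X'], 'city': ['New York']}
```

Note how the space in "New York" travels as a + sign: values are encoded on the way out and have to be decoded on the way in.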

    This starts to ring a bell, doesn't it? But let's dive in some more. You can see the actual information sent to the server on the "Payload" tab. I've taken a screenshot of it for you as well.

    The POST request payload

    You can guess from that that I wrote a single X and clicked the button. But see, this is the information the browser is sending. The name of the property is "name" because of the name attribute of the <input> tag in your form (the label's for attribute only ties the label to the input's id). If you change it to something like <input type="text" id="name" name="banana"> we would see banana=X in the browser.

    That's how the browser sends the information to the server. From the server side, you need to read it somehow. This is achieved by the rfile.read method.

    Now that you know your way around the docs, you'll understand that this method needs an integer for the number of bytes to be read. Happily, this number of bytes is sent by the browser in the request header under the key "Content-Length".

    You can check in the first screenshot that in my case the "Content-Length" was 6: the name of the property (name) is 4 bytes long; the = sign is 1 byte long; and my answer (X) is also 1 byte long. That's why the Content-Length is 6.
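The same arithmetic can be checked in a few lines. As a simplification, io.BytesIO stands in for self.rfile here (the real rfile reads from the client's connection), and the payload is the name=X from my test:

```python
import io

payload = "name=X".encode("utf-8")
content_length = len(payload)  # 4 bytes + 1 byte + 1 byte
print(content_length)  # 6

# In-memory binary stream standing in for self.rfile.
fake_rfile = io.BytesIO(payload)

# Same two steps as in do_POST: read exactly that many bytes, then decode.
post_data = fake_rfile.read(content_length).decode("utf-8")
print(post_data)  # name=X
```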

    By now you can probably figure everything out:

    • You need the content_length variable to get the size of the POST payload in number of bytes
    • When reading the payload you get the information like a "name/value" pair separated by a =
    • Since you want to print only "Hello {user_answer}" you need only the value, so you split the payload name=user_answer on the = sign and take the second piece (index 1), and voilà.
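The split step can be tried on its own. The sketch below also shows urllib.parse.parse_qs as an alternative; that is my suggestion, not what the code in the question does, and the payloads are hypothetical:

```python
from urllib.parse import parse_qs

post_data = "name=Ada+Lovelace"  # hypothetical payload for a name with a space

via_split = post_data.split("=")[1]
via_parse_qs = parse_qs(post_data)["name"][0]

print(via_split)     # Ada+Lovelace
print(via_parse_qs)  # Ada Lovelace

# With more than one field, splitting on "=" picks up part of the next pair.
two_fields = "name=Ada&age=36"
print(two_fields.split("=")[1])         # Ada&age
print(parse_qs(two_fields)["name"][0])  # Ada
```

For a single field with a simple value, split('=')[1] is enough, which is why the original code gets away with it. parse_qs is the safer choice once the form has several fields or the value contains spaces or special characters.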

    This was a pretty long explanation of the "why" the code was written like this. I often enjoy this journey of reading docs, but it can be boring, I know.

    Another way to understand the "why" would be to change your code to print the entire post_data without the split. Then you would see the name=answer printed out, and you'd probably figure everything out by yourself.
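If you'd rather run that experiment without a browser, here is a rough self-contained sketch. EchoHandler, the port choice and the use of http.client to send the request are all my own assumptions for the sake of the test; only the body of do_POST mirrors your code, minus the split:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        content_length = int(self.headers["Content-Length"])
        post_data = self.rfile.read(content_length).decode("utf-8")
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        # No split here: send the raw payload back to the client.
        self.wfile.write(bytes(f"<p>Raw payload: {post_data}</p>", "utf-8"))

    def log_message(self, format, *args):
        pass  # keep the console quiet for this test


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Send the same body a browser would send for the form with "X" typed in.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request(
    "POST",
    "/",
    body="name=X",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
page = conn.getresponse().read().decode("utf-8")
conn.close()

server.shutdown()
server.server_close()

print(page)  # <p>Raw payload: name=X</p>
```

Seeing name=X come back instead of just X makes it clear what the split was there to remove.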

    I hope you've learned a thing or two. Feel free to ask more if you like. Happy coding to you (: