python, python-3.x, robotframework

Retrieve all file names in folders/sub-folders in Python / RobotFramework


Please bear with me!

I have a site that I'm trying to retrieve file names from. I'll use example.com as a stand-in for the site name.

I am using Robot Framework to log into the site and then navigate to the location where the folders and files live. My automation needs to go to the folder that contains a given file and then make changes if needed.

At this point, I'm at the path example.com/applications/folders/ which shows me the following:

Folder1
Folder2
etc...

I was planning to write a Python script to iterate through each folder and sub-folder and grab the file names. But when I opened the Network tab and played around with the Request URL, it looked like there might be an undocumented API, though I may be completely wrong. The Request URL looks something like this:

http://example.com/exp/api/app/objects/appFolder/?brief=true

Since I noticed "api" in the Request URL, is it safe to assume that there's an API I can use? I'm working with this site without any API documentation, so I have no clue what the correct REST calls are.

Just to add, I'm not too familiar with the content of the Network tab, so everything to do with Name, Status, Type, Initiator, Size, etc. is new to me. I would appreciate a link or anything else that helped you understand what is going on there when you click around.

Anyway, is there a way to tackle this with Robot Framework? I have an idea of what to do, but I'm not sure which approach to take to retrieve the file names.

Thanks guys.


Solution

  • This seems more like a penetration testing (hacking) question, to be honest. That being said, a disclaimer: do this only on sites you manage or have explicit permission to test.

    What you seem to be aiming at is a way to scan for files on a website. I am not familiar with Robot Framework, though I have dabbled in Selenium. Tools for this kind of thing already exist, mostly based on brute-force or dictionary-based scans, as some discussed here. Another resource is this website that talks about specific Python packages and their pen-test uses.

    All that being said, from my short search, Robot Framework is similar to Selenium, so you could theoretically do the same thing those other tools do: log in, then use a prepared dictionary or a brute-force generator to go through all the possible file names, and put each one you find in a list.

    Another approach is to intercept the communication and try to understand the API through Wireshark and the browser's Network tab (as in Firefox or Chrome), then try common RESTful API patterns (ways to retrieve single items vs. lists, e.g. items/<item_id>). This can of course be combined with the previous approach.
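
    As a rough illustration of the "list vs. item" pattern, here is a minimal Python sketch that builds the two URL shapes and fetches one of them as JSON. Everything specific in it is an assumption: the helper names `candidate_urls` and `fetch_json` are made up for this example, the cookie value is a placeholder for whatever your logged-in browser session sends, and GET is only a guess at the method (the Network tab shows the real one). Only try this on a site you are allowed to test.

```python
import json
import urllib.request
from urllib.parse import quote, urljoin


def candidate_urls(api_root, collection, item_id=None):
    """Build the list-style URL, plus the item-style URL if an id is given.

    Hypothetical helper: the real API may not follow this pattern at all.
    """
    root = api_root.rstrip("/") + "/"
    list_url = urljoin(root, quote(collection) + "/")
    urls = [list_url]
    if item_id is not None:
        urls.append(urljoin(list_url, quote(str(item_id))))
    return urls


def fetch_json(url, cookie_header):
    """GET a URL with the session cookie and decode the body as JSON.

    Assumes the endpoint answers GET requests with a JSON body.
    """
    request = urllib.request.Request(
        url,
        headers={"Cookie": cookie_header, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # The root below is taken from the Request URL in the question;
    # the item id 123 is invented purely for illustration.
    for url in candidate_urls(
        "http://example.com/exp/api/app/objects", "appFolder", 123
    ):
        print(url)
```

    Comparing what the list URL and an item URL return (if they return anything) is usually the quickest way to see how the API identifies folders.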

    Another, ethical approach (highly recommended) is to contact the company in question directly and ask for API instructions. I would assume the http://example.com/exp/api/app/objects/appFolder/?brief=true request probably contains the path of the folder that you're trying to get in the POST request. All that being said, without a specific website, it is impossible to know exactly what the solution to your question would be.

    TL;DR: I recommend asking the company/website or doing some detective work on the API POST request; without a specific link I can't be more precise.
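
    If the folder endpoint does turn out to return a listing, collecting every file name is a plain tree walk. The sketch below is only that, a sketch: the response shape (entries with `id`, `name` and `type` keys, where `type` is `"folder"` or something else) is entirely assumed, since the real API is undocumented, and `collect_file_names` and `fetch_children` are hypothetical names. The function takes the fetching step as a parameter so it can be tried against fake data first.

```python
def collect_file_names(fetch_children, root_id):
    """Walk folders starting at root_id and return every file path, sorted.

    fetch_children(folder_id) must return a list of dicts shaped like
    {"id": ..., "name": ..., "type": "folder" or anything else}.
    That shape is an assumption, not the site's documented format.
    """
    names = []
    seen = set()            # guards against folders that reference each other
    stack = [(root_id, "")]
    while stack:
        folder_id, prefix = stack.pop()
        if folder_id in seen:
            continue
        seen.add(folder_id)
        for entry in fetch_children(folder_id):
            path = f"{prefix}/{entry['name']}" if prefix else entry["name"]
            if entry.get("type") == "folder":
                stack.append((entry["id"], path))
            else:
                names.append(path)
    return sorted(names)


if __name__ == "__main__":
    # Fake data standing in for real API responses.
    fake_tree = {
        "root": [
            {"id": "f1", "name": "Folder1", "type": "folder"},
            {"id": "a", "name": "readme.txt", "type": "file"},
        ],
        "f1": [{"id": "b", "name": "report.pdf", "type": "file"}],
    }
    print(collect_file_names(fake_tree.__getitem__, "root"))
```

    Once the real response format is known, `fetch_children` would be replaced by a function that calls the API for one folder and maps its response onto that shape.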