I'm working primarily in Python on a database with 17,000,000 records covering 2,800,000 unique IDs. Each record represents an event in a shipping process, and every record has three fields: ID, EVENT, and TIMESTAMP (datetime). One event usually, but not always, starts the sequence, and there are multiple possible outcomes, e.g. delivered, returned, etc.
My goal is to find the most common path the IDs take, that is, the order in which events occur, and to identify the bottlenecks in the process.
Is there any visualization tool I can use with Python that has this structure built into it? How would you recommend I approach this issue?
Thank you
This is not an answer for Python, but given the question, I think there is a better way to perform such analysis.
I do not know if you have ever heard of process mining, but I think it is a perfect fit for your case. Basically, process mining consists of analyzing the flow of a process. I have worked with different tools, some of them are:
Basically, all you need to do is define which column is the ID (you already have it), then choose which column represents the timestamp (you already have it) and which one holds the name of the event (you also have it).
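If you want to try this mapping in Python before reaching for a dedicated tool, here is a minimal pandas sketch of the "most common path" part. The column names ID, EVENT and TIMESTAMP come from the question; the sample rows and the event names (created, shipped, delivered, returned) are made up for illustration, and I have not tested this at the 17,000,000-row scale.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
events = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3],
    "EVENT": ["created", "shipped", "delivered",
              "created", "shipped", "delivered",
              "created", "returned"],
    "TIMESTAMP": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-02 09:00", "2024-01-04 10:00",
        "2024-01-01 12:00", "2024-01-03 12:00", "2024-01-04 12:00",
        "2024-01-02 08:00", "2024-01-05 08:00",
    ]),
})

# Order the events of each ID by time, then join them into one path string.
ordered = events.sort_values(["ID", "TIMESTAMP"])
paths = ordered.groupby("ID")["EVENT"].agg(" > ".join)

# Count how many IDs follow each distinct path, most frequent first.
path_counts = paths.value_counts()
print(path_counts)
```

The first row of `path_counts` is the most common path; the long tail of rare paths is usually where the exceptions in the process show up.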
Any of these process mining tools will be able to give you a detailed analysis of your flow: which path is the most common, the average time for each event, etc. If you add more attributes, they can even indicate why a case follows one path rather than another, based on those attributes.
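For the bottleneck side, a rough pandas equivalent is to measure the time between each event and the next event of the same ID, then average it per transition. This is only a sketch under the same assumptions: the column names are from the question, while the sample rows, event names and the helper columns NEXT_EVENT, NEXT_TIMESTAMP and HOURS are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
events = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3],
    "EVENT": ["created", "shipped", "delivered",
              "created", "shipped", "delivered",
              "created", "returned"],
    "TIMESTAMP": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-02 09:00", "2024-01-04 10:00",
        "2024-01-01 12:00", "2024-01-03 12:00", "2024-01-04 12:00",
        "2024-01-02 08:00", "2024-01-05 08:00",
    ]),
})

ordered = events.sort_values(["ID", "TIMESTAMP"])

# Pair every event with the next event of the same ID.
ordered["NEXT_EVENT"] = ordered.groupby("ID")["EVENT"].shift(-1)
ordered["NEXT_TIMESTAMP"] = ordered.groupby("ID")["TIMESTAMP"].shift(-1)

# The last event of each ID has no successor, so drop those rows.
steps = ordered.dropna(subset=["NEXT_EVENT"]).copy()
steps["HOURS"] = (
    steps["NEXT_TIMESTAMP"] - steps["TIMESTAMP"]
).dt.total_seconds() / 3600

# Frequency and mean duration per transition, slowest first.
transition_stats = (
    steps.groupby(["EVENT", "NEXT_EVENT"])["HOURS"]
    .agg(["count", "mean"])
    .sort_values("mean", ascending=False)
)
print(transition_stats)
```

Transitions that are both frequent and slow are the natural bottleneck candidates. The mean is sensitive to outliers, so the median may be a better choice on real data.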