c++graph adjacency-matrix undirected-graph

Question about adjacency matrix implementation

I am having issues with a graph implementation using an adjacency matrix. As a little background, I have to read from a file each line containing an actor, a movie they played in, and the year it was made. My job is to create a graph from the file. The graph would then consist of vertexes of actors whose edge relation is that they starred in the same film together. In the file, the actor is the vertex, the movie and year it was published is the edge. I have found a way to store the vertexes in an std::vector and have also created a struct called MovieInformation that stores a string movieName and int movie year. These are also stored in a vector< pair< int,MovieInformation>>.

However, I don't know how to insert the information into the vectors in a way such that the information of the actor and the movie they starred stays together.

An example input from the file would be:

Alex Weir   Stop Making Sense   1984
Steven Scales   Stop Making Sense   1984
Ruben Blades    Disorganized Crime  1989
Hoyt Axton  Disorganized Crime  1989
Fred Gwynne Disorganized Crime  1989

while the output would be

(actor)--[movie#@year]-->(actor)--...

As for the code part, there isn't much written except for what was explained above.

Solution

First off, you'd need to simplify the vertex identification process. I'd suggest enumerating them by storing them in an unordered map with the actor names as string keys and their corresponding indices as values. That would yield O(1) expected time to access the index of an actor. Then, you could form sets movies which contain the indices of actors that starred in the same movie. Of course, you'd have to take into account the year of the movie as well, as there can be two different movies with the name M which came out in different years. Once you form those sets, then you'd know that for a set s corresponding to movie M that came out in the year Y, all the pairs of vertices in s have an edge between them.

Enumeration process would look like the code below.

void populate(vector<string>& actorNames, unordered_map<string, int>& indexMap) {
    int nextIndexToAssign = 0;
    for(int i=0; i < input.size(); ++i) {
        if(indexMap.find(input[i]) == indexMap.end()) {
            indexMap[input[i]] = nextIndexToAssign++;
        }
    }
}

After running a code similar to the implementation above, you can access the index of an actor through indexMap[actorName] where actorName is the string variable that stores the name of an actor. You can do something similar with movies, where the key would be a pair<string, int> instead of string.

Once the enumeration is done, you can use a vector of sets as buckets to store the actors who have starred in the same movie.

typedef pair< string, pair<string, int> > InputLine;
vector<InputLine> input; // defined somewhere, has size N
...

M = movieIndices.size();
vector< set<int> > actorsInMovie(M);
for(int i=0; i < N; ++i) {
    string& actor = input[i].first;
    string& movie = input[i].second;
    int actorIndex = actorIndices[actor];
    int movieIndex = movieIndices[movie];
    actorsInMovie[movieIndex].insert(actorIndex);
}

Then, all you would need to do is to iterate over that actorsInMovie and for each pair of indices u and v in actorsInMovie[i], put an edge between u and v.

Note that the code above has not been tested, and due to the lack of any source code in the question statement, is intended to serve only as pseudocode.

Considering that there are N lines of input, there can be O(N) movies and O(N) actors. With that information, we can say that enumerating the edges (i.e., movie&year pairs) would take O(N) expected time. Additionally, enumerating each movie and forming a set for each of them would take O(N) expected time. Forming of the edges of the graph would, in the worst case scenario of having all the actors in the same movie, take O(N*N) time as you would need to set an edge between all pairs of O(N) actors. Overall, even if you had the worst case time complexity for an unordered_map, you would end up with a time complexity of O(N*N) in the worst case.

The specifics of the adjacency matrix does not alter that time complexity. You just need to initialize all the cells of an N x N matrix adjacencyMatrix with 0, indicating that there is no connection between any vertices a and b. Then you conduct the enumeration and grouping (i.e., sets) strategy I covered above. Finally, for each pair of actors with indices i and j, you set adjacencyMatrix[i][j] = adjacencyMatrix[j][i] = 1. The overall time complexity is still O(N*N). And since you would require O(N*N) time and space complexity to allocate and populate an NxN matrix, that is the optimal time and space complexity you can possibly achieve, unless any additional details/restrictions for your use case are provided.