I am having issues with a graph implementation using an adjacency matrix. As a little background, I have to read from a file each line containing an actor, a movie they played in, and the year it was made. My job is to create a graph from the file. The graph would then consist of vertexes of actors whose edge relation is that they starred in the same film together. In the file, the actor is the vertex, the movie and year it was published is the edge.
I have found a way to store the vertexes in an std::vector and have also created a struct
called MovieInformation
that stores a string movieName
and int movie year
. These are also stored in a vector< pair< int,MovieInformation>>
.
However, I don't know how to insert the information into the vector
s in a way such that the information of the actor and the movie they starred stays together.
An example input from the file would be:
Alex Weir Stop Making Sense 1984 Steven Scales Stop Making Sense 1984 Ruben Blades Disorganized Crime 1989 Hoyt Axton Disorganized Crime 1989 Fred Gwynne Disorganized Crime 1989
while the output would be
(actor)--[movie#@year]-->(actor)--...
As for the code part, there isn't much written except for what was explained above.
First off, you'd need to simplify the vertex identification process. I'd suggest enumerating them by storing them in an unordered map with the actor names as string keys and their corresponding indices as values. That would yield O(1)
expected time to access the index of an actor. Then, you could form sets movies which contain the indices of actors that starred in the same movie. Of course, you'd have to take into account the year of the movie as well, as there can be two different movies with the name M
which came out in different years. Once you form those sets, then you'd know that for a set s
corresponding to movie M
that came out in the year Y
, all the pairs of vertices in s
have an edge between them.
Enumeration process would look like the code below.
void populate(vector<string>& actorNames, unordered_map<string, int>& indexMap) {
int nextIndexToAssign = 0;
for(int i=0; i < input.size(); ++i) {
if(indexMap.find(input[i]) == indexMap.end()) {
indexMap[input[i]] = nextIndexToAssign++;
}
}
}
After running a code similar to the implementation above, you can access the index of an actor through indexMap[actorName]
where actorName
is the string variable that stores the name of an actor. You can do something similar with movies, where the key would be a pair<string, int>
instead of string
.
Once the enumeration is done, you can use a vector of set
s as buckets to store the actors who have starred in the same movie.
typedef pair< string, pair<string, int> > InputLine;
vector<InputLine> input; // defined somewhere, has size N
...
M = movieIndices.size();
vector< set<int> > actorsInMovie(M);
for(int i=0; i < N; ++i) {
string& actor = input[i].first;
string& movie = input[i].second;
int actorIndex = actorIndices[actor];
int movieIndex = movieIndices[movie];
actorsInMovie[movieIndex].insert(actorIndex);
}
Then, all you would need to do is to iterate over that actorsInMovie
and for each pair of indices u
and v
in actorsInMovie[i]
, put an edge between u
and v
.
Note that the code above has not been tested, and due to the lack of any source code in the question statement, is intended to serve only as pseudocode.
Considering that there are N
lines of input, there can be O(N)
movies and O(N)
actors. With that information, we can say that enumerating the edges (i.e., movie&year pairs) would take O(N)
expected time. Additionally, enumerating each movie and forming a set for each of them would take O(N)
expected time. Forming of the edges of the graph would, in the worst case scenario of having all the actors in the same movie, take O(N*N)
time as you would need to set an edge between all pairs of O(N)
actors. Overall, even if you had the worst case time complexity for an unordered_map
, you would end up with a time complexity of O(N*N)
in the worst case.
The specifics of the adjacency matrix does not alter that time complexity. You just need to initialize all the cells of an N x N
matrix adjacencyMatrix
with 0
, indicating that there is no connection between any vertices a
and b
. Then you conduct the enumeration and grouping (i.e., sets) strategy I covered above. Finally, for each pair of actors with indices i
and j
, you set adjacencyMatrix[i][j] = adjacencyMatrix[j][i] = 1
. The overall time complexity is still O(N*N)
. And since you would require O(N*N)
time and space complexity to allocate and populate an NxN
matrix, that is the optimal time and space complexity you can possibly achieve, unless any additional details/restrictions for your use case are provided.