Search code examples
rgooglevissankey-diagram

Making a Sankey Diagram with googleVis in R


I am trying to create a Sankey diagram in R, using the googleVis package. (The data.frame that I am using can be found below) What I want the diagram to do is go from the Type, to the Organization, then to the team (Tm), while the size represents the number of (Name) players. From what I have read, one can only three columns. I, therefore, did that using this code

BrewersDraft <- sqldf("SELECT Type, Organization, COUNT(Name) AS PLAYERS 
                      FROM df  
                      GROUP BY 1,2
                      UNION ALL 
                      SELECT Type, (Tm) AS MLB_TEAM, COUNT(Name) AS PLAYERS
                      FROM df 
                      GROUP BY 1,2") 

The data now looks like this:

Type                                          Organization
1 College/University                       Bradley University (Peoria, IL)
2 College/University California State University Fullerton (Fullerton, CA)
3 College/University                      Clemson University (Clemson, SC)
4 College/University    East Tennessee State University (Johnson City, TN)
5 College/University                  Faulkner University (Montgomery, AL)
6 College/University                           Felician College (Lodi, NJ)
  PLAYERS
1       1
2       1
3       1
4       1
5       1
6       1

The "Brewers" value is also in the Organization value. Then I used this code to create the Sankey Diagram:

plot(gvisSankey(BrewersDraft, from = "Type", to="Organization_Type", weight = "PLAYERS",
                options = list(height=800, width=850,
                               sankey="{
                               link:{color:{fill: 'lightblue'}}}")))

The problem is that the Brewers value in the Sankey diagram is with all of the Organization variables when I want the Organization variables to flow to the Brewers variable.

enter image description here

It should look similar to the example on this website, https://thedatagame.com.au/2015/12/14/visualising-the-2015-nba-draft-in-r/ Only difference being that all of the Organization is only going to one team, instead of many.

Can anybody help me? Thank you, it would be much appreciated.

The original data frame.

Year Rnd OvPck RdPck      Tm                      Name    Pos
1  2016   1     5     5 Brewers        Corey Ray (minors)  OF
2  2016   2    46     5 Brewers      Lucas Erceg (minors)  3B
3  2016   2    75    34 Brewers  Mario Feliciano (minors)   C
4  2016   3    82     5 Brewers      Braden Webb (minors) RHP
5  2016   4   111     5 Brewers    Corbin Burnes (minors) RHP
6  2016   5   141     5 Brewers       Zack Brown (minors) RHP
7  2016   6   171     5 Brewers     Payton Henry (minors)   C
8  2016   7   201     5 Brewers     Daniel Brown (minors) LHP
9  2016   8   231     5 Brewers Francisco Thomas (minors)  SS
10 2016   9   261     5 Brewers        Trey York (minors)  2B
11 2016  10   291     5 Brewers        Blake Fox (minors) LHP
12 2016  11   321     5 Brewers  Chad McClanahan (minors)  3B
13 2016  12   351     5 Brewers  Trever Morrison (minors)  SS
14 2016  13   381     5 Brewers   Thomas Jankins (minors) RHP
15 2016  14   411     5 Brewers   Gabriel Garcia (minors)   C
16 2016  15   441     5 Brewers  Scott Serigstad (minors) RHP
17 2016  16   471     5 Brewers       Louie Crow (minors) RHP
18 2016  17   501     5 Brewers    Weston Wilson (minors)  3B
19 2016  18   531     5 Brewers    Cooper Hummel (minors)   C
20 2016  19   561     5 Brewers       Zach Clark (minors)  CF
21 2016  20   591     5 Brewers       Jared Horn (minors) RHP
22 2016  21   621     5 Brewers Nathan Rodriguez (minors)   C
23 2016  22   651     5 Brewers      Cam Roegner (minors) LHP
24 2016  23   681     5 Brewers    Ronnie Gideon (minors)  1B
25 2016  24   711     5 Brewers Michael Gonzalez (minors) RHP
26 2016  25   741     5 Brewers     Blake Lillis (minors) LHP
27 2016  26   771     5 Brewers    Nick Roscetti (minors)  SS
28 2016  27   801     5 Brewers        Nick Cain (minors)  RF
29 2016  28   831     5 Brewers    Andrew Vernon (minors) RHP
30 2016  29   861     5 Brewers    Brennan Price (minors) RHP
31 2016  30   891     5 Brewers     Dalton Brown (minors) RHP
32 2016  31   921     5 Brewers     Ryan Aguilar (minors)  1B
33 2016  32   951     5 Brewers     Wilson Adams (minors) RHP
34 2016  33   981     5 Brewers    Emerson Gibbs (minors) RHP
35 2016  34  1011     5 Brewers       Matt Smith (minors) RHP
36 2016  35  1041     5 Brewers   Chase Williams (minors) RHP
37 2016  36  1071     5 Brewers      Parker Bean (minors) RHP
38 2016  37  1101     5 Brewers     Jomar Cortes (minors)  SS
39 2016  38  1131     5 Brewers     Caleb Whalen (minors)  CF
40 2016  39  1161     5 Brewers       Jose Gomez (minors)  CF
41 2016  40  1191     5 Brewers     Kyle Serrano (minors) RHP
                 Type                                          Organization
1  College/University             University of Louisville (Louisville, KY)
2  College/University                          Menlo College (Atherton, CA)
3         High School         Carlos Beltran Baseball Academy (Florida, PR)
4  College/University           University of South Carolina (Columbia, SC)
5  College/University         St. Mary's College of California (Moraga, CA)
6  College/University                University of Kentucky (Lexington, KY)
7         High School                Pleasant Grove HS (Pleasant Grove, UT)
8  College/University  Mississippi State University (Mississippi State, MS)
9         High School                            Osceola HS (Kissimmee, FL)
10 College/University    East Tennessee State University (Johnson City, TN)
11 College/University                         Rice University (Houston, TX)
12        High School              Brophy College Preparatory (Phoenix, AZ)
13 College/University               Oregon State University (Corvallis, OR)
14 College/University                       Quinnipiac College (Hamden, CT)
15     Junior College       Broward Community College (Fort Lauderdale, FL)
16 College/University California State University Fullerton (Fullerton, CA)
17        High School                        Buena Park HS (Buena Park, CA)
18 College/University                      Clemson University (Clemson, SC)
19 College/University                 University of Portland (Portland, OR)
20     Junior College       Pearl River Community College (Poplarville, MS)
21        High School                                 Vintage HS (Napa, CA)
22     Junior College                         Cypress College (Cypress, CA)
23 College/University                       Bradley University (Peoria, IL)
24 College/University            Texas A&M University (College Station, TX)
25        High School                              Norwalk HS (Norwalk, CT)
26        High School             St. Thomas Aquinas HS (Overland Park, KS)
27 College/University                    University of Iowa (Iowa City, IA)
28 College/University                  Faulkner University (Montgomery, AL)
29 College/University        North Carolina Central University (Durham, NC)
30 College/University                           Felician College (Lodi, NJ)
31 College/University                   Texas Tech University (Lubbock, TX)
32 College/University                    University of Arizona (Tucson, AZ)
33 College/University  University of Alabama in Huntsville (Huntsville, AL)
34 College/University                   Tulane University (New Orleans, LA)
35 College/University                Georgetown University (Washington, DC)
36 College/University                Wichita State University (Wichita, KS)
37 College/University                    Liberty University (Lynchburg, VA)
38        High School         Carlos Beltran Baseball Academy (Florida, PR)
39 College/University                 University of Portland (Portland, OR)
40 College/University             St. Thomas University (Miami Gardens, FL)
41 College/University               University of Tennessee (Knoxville, TN)

Solution

  • If I understand correctly you have 3 states: type, organization and team. Type is always the origin, team is the final destination and organization is at first a destination and then an origin.

    In the second SQL statement you use "Type" again as the origin, when the origin should be "Organization".

    Your SQL has to be modified to look like this:

    BrewersDraft <- sqldf("SELECT Type, Organization, COUNT(Name) AS PLAYERS 
                      FROM df  
                      GROUP BY 1,2
                      UNION ALL 
                      SELECT Organization, (Tm) AS MLB_TEAM, COUNT(Name) AS PLAYERS
                      FROM df 
                      GROUP BY 1,2")