The BUDA Social Graph

In Ultimate leagues, there are often two categories of league: hat (or rec), where you are placed on a team randomly from season to season, and club (or clique), where your team signs up as a group and can stay together from year to year. I thought this would make for a pretty interesting graph structure, especially since there are some players who play a LOT of disc.

Using the fantastic requests library, I collected every Boston Ultimate Disc Alliance roster all the way back to 1999. Then I turned all of that data into a graph network:

Each node (N ≈ 12000) is a player, sized by the number of distinct teammates they have had.

Each edge (N ≈ 330000) between two players is the number of times those two players have played together.

The players are colored according to the community they were assigned to in Gephi. The positions come from Gephi's OpenOrd layout, which considers community structure but not in exactly the same way as the community detection algorithm. They're all labeled too, but right now we are zoomed out too far. More on that in a bit.

Given that we didn't import any team information at all - we only have a list of players, and how many times each has played together - did we end up with useful communities?

Yes we did! I've labeled some of the groups that I could recognize while zoomed in, but I certainly don't know all of BUDA. There are also some unconnected components around the edges of the graph that I went ahead and looked up as well. There are at least three communities of primarily Hat players, and then groups containing a lot of the club league teams that stay together year to year.

I also collected some zoomed-in screenshots where you can see names, but I am going to have to link them since they are pretty large and there are quite a few:

I also computed some basic stats about the network:

Number of teammates people have had

This is a distribution of node degree - the raw number of links (ignoring weight) that connect to/from each player. Someone who has played just one season of Hat league should have around 15, as should the hypothetical person that played only club with the same 15 people for the past decade. Given that, the relative sizes of the peaks at 15, 30, and 45 are interesting - does BUDA have a lot of one-time league players?

On the other end of the spectrum (not shown on the graph), the highest node degree is 1084 - more than one thousand distinct teammates. There are another 10-15 players above 600.

Frequency of baggage

(Alternative title: histogram of edge weights)

The frequency that any two given players play together falls off pretty smoothly in log scale (so, rapidly overall). The rapid drop-off is not surprising to me given the popularity of Hat league, but the smoothness is - I thought we'd see some noticeable peaks from club teams, but I guess there is turnover there as well.

If you have played BUDA for a while, I bet you could guess a lot of the most-frequent teammates (usually, but not always, someone and their significant other).

Characteristics of the low-degree nodes

This is a more detailed look at node degree, to attempt to make some more sense of the players with few teammates. Each (translucent) dot is a player, plotted as the number of teammates they have had (Node degree) versus the average number of times they typically play with a player (Mean edge weight).

The low-teammate, low-recurrence group in the lower left is pretty dense compared to the rest of the plot, which makes it seem that a lot of players are just one-time league players. There are some players that have a low teammate count but high(er) recurrence, but not a significant portion of the low teammate population. I also like the smoothness with which the upper bound decays and the lower bound increases as you move up in node degree.

N degrees of me

In the spirit of six degrees of Kevin Bacon, I wanted to know how many degrees it took to connect me with anyone in BUDA. In graph terms, this is the maximum of the shortest paths between me and all the other players. We can also express it as how much of BUDA can be reached with K degrees of separation as well (or, the size of my ego network versus the size of the entire BUDA network).

  • 1 degree of me: I have played directly with only 1.25% of BUDA.
  • 2 degrees of me: Adding in teammates of my teammates gets me to 48% of BUDA. The most-connected players can get 85+% coverage with 2 hops.
  • 3 degrees of me: This connects me to 98% of players. Graph-wide, the average shortest path is around 3 so most players will see 90+% coverage at 3 degrees of separation.

Things taper off quickly after that, and since there are several unconnected components 100% coverage isn't possible.

I ran this for every recent BUDA player (anyone who has played in a league in 2013 or 2014). The plot below shows each recent player as a point, with the percentage of BUDA they have played with directly (i.e., their node degree) versus the percentage of BUDA covered if you include their teammates' teammates as well. It's a pretty clear relationship, albeit with some variation - who you play with matters, especially if you are connected to one of the very well-traveled players.

You can find the specific results for any recent BUDA player, via a search by name at the bottom of this page.

A Recommender System

Another common thing to play around with in networks is to find the "missing edges" - in this graph that would be players with similar teammates to you, but who you haven't yet played with. Inspired by this writeup by Edwin Chen, I defined similarity as the sum of weighted jaccard similarity and cosine similarity given two players' sets of edges. Jaccard similarity tends to favor nodes with similar absolute edge values, whereas cosine similarity cares only about the 'shape', and not necessarily the absolute scale. So this should still slightly favor players who play about as much as you, but also possibly include some players with the same teammates, but different volume.

In Python:

import numpy as np
def jaccard_similarity(vec_one, vec_two):
    '''
    weighted jaccard similarity between two vectors:
    jaccard similarity is: 
    len(intersection(x,y))/len(union(x,y))

    for weighted version against indexed
    vectors it's sum(min) / sum(max)
    '''
    vec_min = (min(i, j) for i, j in zip(vec_one, vec_two))
    vec_max = (max(i, j) for i, j in zip(vec_one, vec_two))
    return sum(vec_min) / max(1, sum(vec_max))


def cosine_similarity(vector_one, vector_two):
    '''
    cosine similarity between two vectors: 
    cosine of the "angle" between them

    all the div by 1 works here because the minimum
    edge weight is 1
    '''
    vec_one_norm = max(1, np.linalg.norm(vector_one))
    vec_two_norm = max(1, np.linalg.norm(vector_two))
    return (np.dot(vector_one, vector_two) / 
            max(1, vec_one_norm * vec_two_norm))

The results are search-able below, for every recent player in BUDA. For each player I saved the top ten most similar players as defined by the sum of the above two metrics, ranging from 0 to 2 (cosine similarity can't be negative between two vectors with purely positive values).

Anecdotally, the recommendations for me are pretty promising: due to not being listed on a few BUDA rosters that I actually played on, and due to my club team splitting into two for the Fall seasons, I know or have played with 9 out of the 10 recommendations made for me. A decent start, for such a simple system.

People you may know: BUDA edition

Enter a name below to see the most similar people they haven't played with, as well as the coverage of their ego networks for various degrees of separation.

Results for {{player.id}}:

Strongest missing edges:
Player
Score
{{sim.n}}
{{ sim.s }}
Percentage of players in K-degree ego network:
Degrees of separation
% of BUDA
{{ego.k}}
{{ ego.p }}%

As usual, all of this is on GitHub. You can also download a zipped-up .gexf file of the graph here, which you'll need Gephi to view.