This was my fifth season keeping a database of football-related data for network analysis. Over the years I have tried more sophisticated methods and settled into a fairly comprehensive, simple approach that codes every win and loss in division one using directed edges. For those of you not familiar with network analysis, I produce a list of edges that are anchored to nodes. The edges are the win-loss relationships; the nodes are the teams. Thus an Alabama win over Concordia College would be Source: Concordia College; Target: Alabama. Each game in the college football season is entered in this way. Only games involving at least one division 1-A team are coded.
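The source-target encoding can be sketched in a few lines of Python. This is a minimal illustration, not the actual database: the `results` list, the placeholder names "Team X" and "Team Y", and the helpers `to_edges` and `in_degree` are all assumptions made for the example. Only the Alabama win over Concordia College comes from the text.

```python
from collections import Counter

# Hypothetical (winner, loser) pairs; only the first game is taken from the text.
results = [
    ("Alabama", "Concordia College"),
    ("Alabama", "Team X"),
    ("Team X", "Team Y"),
]

def to_edges(results):
    """Turn (winner, loser) pairs into directed edges: source = loser, target = winner."""
    return [{"source": loser, "target": winner} for winner, loser in results]

def in_degree(edges):
    """Count inbound edges per node, which in this encoding is wins per team."""
    return Counter(edge["target"] for edge in edges)

edges = to_edges(results)
wins = in_degree(edges)
```

Because every edge points at the winner, a team's in-degree is its win total and its out-degree is its loss total.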
First, a general defense of my method. This method is optimal. It is comprehensive. I code every game of consequence. It compensates for teams playing weak competition: a game against a 1-AA team functionally doesn’t exist, as that node has only one edge. Worse, that conference will be saddled with a pendant node, diluting its network stats. Most important, this method is reflexive. As the season goes on, the network changes shape and weight. Does anyone give much weight to the strength of schedule calculations during week 3? Not really; the season is still unfolding. In my method, the relative power of any individual node is dependent on the nodes connected to it. I am never attached to any judgements from the media, polls, or temporally bounded strength of schedule evaluations. As my network is constantly changing, I have no reason to apply a coefficient to reduce the impact of past games. Ontologically, my network depends on the assumption that a game is a game: the only way of describing a relationship in this system is the source-target relationship. I truly believe that this is the best data structure for evaluating football.
There are a few measures of the network that are important. On the surface level is degree, specifically in-degree: how many inbound edges are headed toward any individual node, which in this network means wins. Currently, Alabama has an in-degree of 13. Betweenness (the number of shortest paths that run through a node) and closeness (the average shortest-path distance from a node to the rest of the network) are ways of evaluating the nodes. Eigenvector centrality is useful for measuring the flow control of any particular node. Finally, we can measure modularity, which detects communities in the data. Thus, no external identification of conference is required. We can read football as a network of wins and losses.
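For readers who want to see what an eigenvector score is doing, here is a rough sketch using plain power iteration. It is not the software behind the numbers in this post. It assumes score flows along each edge from loser to winner, adds each team's previous score back in so the iteration settles, and rescales so the top team sits at 1.0. The function name, the iteration count, and the four-team `toy_edges` graph are made up for illustration.

```python
def eigenvector_centrality(edges, iterations=100):
    """Power iteration over (loser, winner) edges, rescaled so the top score is 1.0."""
    nodes = sorted({node for edge in edges for node in edge})
    score = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Start from the previous scores (an identity shift) to avoid oscillation.
        updated = dict(score)
        for loser, winner in edges:
            # A win is worth as much as the beaten team's current score.
            updated[winner] += score[loser]
        top = max(updated.values())
        score = {node: value / top for node, value in updated.items()}
    return score

# Hypothetical round of games, written as (loser, winner).
toy_edges = [
    ("B", "A"), ("D", "A"),  # A beat B and D
    ("C", "B"), ("D", "B"),  # B beat C and D
    ("A", "C"),              # C beat A
    ("C", "D"),              # D beat C
]
toy_scores = eigenvector_centrality(toy_edges)
```

In this toy graph A and B both have two wins, but A ends up on top because one of its wins came against B, which is the sense in which a node's power depends on the nodes connected to it.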
Graphically, this is the world of college football.
Running through our possible measures: betweenness in a football network can reveal teams that are fairly good and that played important games out of conference. The top five teams by betweenness this season: Central Michigan, Pitt, Ok State, Wyoming, and Air Force. Closeness is more of an index of terrible football teams. Eigenvectors and modularities it is then. Also, degrees.
Here is a table of data that will help us understand what is actually happening in that great big chart up there:
I have rounded the Eigenvector numbers to three digits. The numbers after Wisconsin decline fairly quickly; anything outside the top 25 has an E score of less than .4, which is also my way of saying that I am not interested.
From a raw E score perspective, we should be looking at a playoff of Bama, Clemson, OhSt, and Penn State. Wasn’t this an article about why the playoffs are right? Yes, and raw E scores are important, but they measure the flow through the network; they don’t directly answer the question of which are the best four teams in college football.
Let’s re-render this to see individual sub-regions.
The Big Ten, Pac 12, MAC, and Big 12 appear as coherent conferences. The SEC splits in two and the top of the ACC leaves the rest. This gives us some insight into the E score based ranks. You can break this down into any number of modularities, but you likely shouldn’t. For the purposes of football analysis, a modularity method that finds 13 communities is optimal.
So, what is the process?
Assume that modularities, the clusters detected with the Louvain algorithm, map the conferences. This is a safe assumption. Instead of relying on contrived processes like the conference championship games, the top four mod leaders should be in the playoffs. So that gives us? Alabama, Clemson, Ohio State, and Washington. And we are done here.
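The selection step can be sketched as follows. This assumes the community labels and E scores have already been computed elsewhere, and it reads "top mod leaders" as: take the highest E score team in each community, then keep the four strongest of those leaders. The function name `playoff_field`, the placeholder teams T1 through T6, and their scores are hypothetical, not real data.

```python
def playoff_field(community, e_score, size=4):
    """Pick the top-scoring team per community, then the strongest `size` leaders."""
    leaders = {}
    for team, mod in community.items():
        best = leaders.get(mod)
        if best is None or e_score[team] > e_score[best]:
            leaders[mod] = team
    ranked = sorted(leaders.values(), key=lambda team: e_score[team], reverse=True)
    return ranked[:size]

# Hypothetical community labels and scores for six placeholder teams.
toy_community = {"T1": 0, "T2": 0, "T3": 1, "T4": 2, "T5": 3, "T6": 4}
toy_e_score = {"T1": 1.0, "T2": 0.9, "T3": 0.8, "T4": 0.7, "T5": 0.6, "T6": 0.3}
toy_field = playoff_field(toy_community, toy_e_score)
```

Note that T2 has the second-best raw score but misses the field because it shares a community with T1, the same way a raw E score ranking and a mod-leader ranking can disagree on the fourth team.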
You can also start with a record-based process that takes all teams in the top balance degree bands (in-degree minus out-degree) with an eigenvector centrality above .4. OK, so start with your 13: Alabama and Western Michigan; 10: Clemson, Ohio State, and Washington; 9: Penn State.
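A sketch of that record-based filter, under the assumption that edges are stored as (loser, winner) pairs and that E scores are supplied separately. The function name `balance_bands`, the placeholder teams, and the toy scores are illustrative only.

```python
from collections import Counter, defaultdict

def balance_bands(edges, e_score, threshold=0.4):
    """Group teams by balance degree (wins minus losses), keeping only E > threshold."""
    wins = Counter(winner for _, winner in edges)
    losses = Counter(loser for loser, _ in edges)
    bands = defaultdict(list)
    for team in set(wins) | set(losses):
        if e_score.get(team, 0.0) > threshold:
            bands[wins[team] - losses[team]].append(team)
    # Highest band first, teams alphabetical within a band.
    return {band: sorted(bands[band]) for band in sorted(bands, reverse=True)}

# Hypothetical games as (loser, winner) and made-up E scores.
toy_games = [("T2", "T1"), ("T3", "T1"), ("T3", "T2")]
toy_e = {"T1": 1.0, "T2": 0.5, "T3": 0.1}
toy_bands = balance_bands(toy_games, toy_e)
```

Reading the result from the top band down gives the candidate list in the same order as the bands quoted above.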
Yes, a program in Michigan has something to be mad about. Just not the Wolverines. Get Angry, Kalamazoo.
This would suggest that the formula is a little different: there was some method used to disqualify Western. Likely this was a conference SOS method, similar to the one used here. The problem for Western is that the Illinois win didn’t mean much and the MAC is a very weak football conference; the eastern MAC may be a 1-AA level group.
Connecting the modularities is important, as we can see that they are separate. It gets better: the Pac 12 really is disconnected from the rest of the football world. In the world of the SEC-ACC there are many links; the Pac is literally physically separated. A key goal of a playoff is to finish connecting the network, and this method is strong here.
This method:
Relies on few assumptions
Has a consistent ontology of basic units (games are games are games)
Is repeatable and verifiable
Facilitates the basic work of playoff systems
Now to pre-empt your questions…
What about Conference Championships?
Why should sweetheart TV deals gone wrong dictate our math? I am talking Longhorn Network here: the fallout in terms of the UT football schedule will be like an anchor on the Big 12 for years to come. The SEC West is almost always better than the SEC East; the Big Ten East is better than the Big Ten West. Conferences are weird. They are important, but we should be careful about overvaluing them.
Before you accuse me of dismissing conference championships, you need to know: my method incorporates conference championships. Any chance to win a game is powerful in my model. A conference championship game is another game. If your conference can’t provide you a championship adversary to battle at a level that catapults you into the final four, that is an argument against your conference. With the win over Wisconsin, Penn State moves ahead of Michigan into the #2 slot of the Big Ten. Without game 13, that just doesn’t happen. The future Big 12 championship is beautifully positioned. By taking the top two ranked teams they will amplify the 13th game. Unfortunately, this wouldn’t help them this season; another OK-OKState game won’t do much. I weigh conference championship games consistently. If you want to argue for double counting these games or not counting others, you can do that, but be clear about what you are actually arguing for.
Conference championships matter, and this method accurately evaluates those games as what they are: games.
What about TCU a few years back?
This is a free-rider problem: some folks in the Big 12 try to be the free rider who would sashay into the playoffs with a terrible non-conference schedule. This wasn’t TCU, but a signature win against free-rider Baylor didn’t exactly help. In short, any two-loss team is in danger, especially when the signature win is over the free-rider and you just took a loss to one of the other super-legit teams in your conference.
And not to be too mean here, but the Big 12 is not a power conference. The conference mean E score is .133. The American has a mean E of .146. For reference, at .26 the Big Ten is quite strong. The Big 12 is a mid-major football conference.
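The conference mean is nothing more exotic than an average of member E scores. A minimal sketch, assuming a team-to-conference mapping (or the detected community labels) and a table of E scores are on hand; `mean_e_by_group` and the placeholder values are hypothetical and are not the figures quoted above.

```python
from collections import defaultdict

def mean_e_by_group(group, e_score):
    """Average the E scores of the teams in each conference or community."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for team, label in group.items():
        totals[label] += e_score[team]
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}

# Made-up labels and scores for three placeholder teams.
toy_group = {"T1": "Conf A", "T2": "Conf A", "T3": "Conf B"}
toy_scores_by_team = {"T1": 0.5, "T2": 0.25, "T3": 0.125}
toy_means = mean_e_by_group(toy_group, toy_scores_by_team)
```

One design note: a plain mean lets a long tail of weak members drag a conference down, which is part of why a pendant node hurts.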
What about Penn State?
Two losses. Signature wins are Wisconsin in a great game and a high-variance game against Ohio State. Also, two losses. Let’s get mathy: Penn State is #10 in betweenness. They are powerful because they control access to the Ohio State node. As Ohio State has higher centrality even with that loss, it follows that they should be in the playoff. This is also circular, as Penn State lost to Michigan, who was beaten by Ohio State: the “head-to-head” is baked into this model.
Why Washington? Because they offer access to the fourth key cluster, connecting the most important parts of the network. Seriously.
Big Ten, Pac-12, SEC, ACC, American, Big 12.
*Pac and SEC are functionally tied.
If you want to complain and you plan on using conference championships as a warrant, be aware that you need to argue against Clemson, not Washington. Or you need to go straight ahead at Ohio State and argue that head-to-head and the conference title outweigh Ohio State’s boss schedule.
Shouldn’t a team control its own fate?
If you are a Western Michigan fan you should be annoyed, and I am cool with your umbrage. Executive summary: don’t lose two games, and the committee won’t really help you if you are in the worst 1-A conference. Some other folks’ stats indicate that the Missouri Valley, a 1-AA conference, is stronger than the MAC. They might be right.
Also, what does this even mean? There is no magical, reference-frame-free world of football; you will always be evaluated based on who you played. Let me reiterate: only Western Michigan fans have a right to be mad about this.
Why not use this for rankings in the early season?
Because my model is silly until about week seven. This model has no seed rankings. There is no way for this model to really know anything about anything until the games are played. Given the way that the SEC delays playing weak teams, their conference looks really important in this model, while the Big Ten, Pac 12, and ACC, by playing really weak teams to “warm up,” may not even appear as legit conferences in this model until after week 7.
This method makes the Fighting Irish look really important if they play well. Those dudes bring it every weekend. Human rankers/voters might not take kindly to a loss or three, but this model would just see nine legit wins over major programs.
The committee got it right. Big Ten rules crowned a conference champion that was not the most important team in the conference. Big Ten East is the most dominant sub-region in all of college football. By the end of the season, the Pac-12 was also very strong, despite Oregon losing the Chip Kelly recruiting halo.
After five years, I am comfortable saying that connecting the top eigenvector centrality teams from the first four modularities presented is the best way to determine the playoffs.