Celebrities are right to be concerned: in many cases there is no particularly good reason why they are famous. We have the “A-list” so that we can make some sense of who is important and why that matters. Becker’s Art Worlds suggests that ranking systems give us some leverage for making decisions with scarce resources, which in capital-intensive creative industries is quite a problem. Education is like other experience and post-experience goods: we are not especially good at telling quality levels apart, yet we are very willing to put intensely numerical measures on them. Even though audits are literally impossible, we need the numbers.
The Rankings Are Not Great
A former dean at Temple University has been charged with systematically submitting false information to U.S. News & World Report, causing the school’s program to appear to be the best of the best. It gets better: in the law school rankings, U.S. News had to make last-second changes because the data were embarrassingly wrong. Even something as small as a library’s opening hours could substantially affect the rankings, which makes little sense. Data quality is established only by the schools themselves.
The categories used in the rankings these days aren’t super great. The biggest factors, by percentage weight, are peer reputation (20), six-year graduation rate (17.6), how much money schools spend (10), a long-term rechecked graduation rate (8), class size (8), faculty pay adjusted for region but not institution type (7), and student quality (7). Eleven more factors, each weighted under 7%, round out the analysis; alumni giving, for example, counts for 3. Combine reputation with faculty pay, which is dramatically higher at ostensibly elite institutions, and alumni giving, and roughly a third of the ranking (20 + 7 + 3 = 30%) is driven by social class and reputation. Add the influence of socioeconomic status on degree completion, and the graduation measures don’t seem very descriptive either.
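Adding up the weights is a quick way to check the arithmetic above. This is a minimal sketch that uses only the percentages quoted in this post and assumes the full formula sums to 100; the dictionary keys and function names are my own labels, not U.S. News terminology.

```python
# Percent weights for the seven largest factors, as listed in the paragraph above.
MAJOR_WEIGHTS = {
    "peer reputation": 20.0,
    "six-year graduation": 17.6,
    "spending": 10.0,
    "long-term graduation": 8.0,
    "class size": 8.0,
    "faculty pay": 7.0,
    "student quality": 7.0,
}
ALUMNI_GIVING = 3.0  # one of the eleven smaller factors


def remaining_weight(major, total=100.0):
    """Weight left for the minor factors, ASSUMING all weights sum to `total`."""
    return total - sum(major.values())


def class_and_reputation_share(major, alumni_giving):
    """Reputation + faculty pay + alumni giving: the bundle the text ties to class."""
    return major["peer reputation"] + major["faculty pay"] + alumni_giving


if __name__ == "__main__":
    print(f"Seven largest factors: {sum(MAJOR_WEIGHTS.values()):.1f}%")
    print(f"Left for the eleven minor factors: {remaining_weight(MAJOR_WEIGHTS):.1f}%")
    print("Reputation + faculty pay + alumni giving: "
          f"{class_and_reputation_share(MAJOR_WEIGHTS, ALUMNI_GIVING):.1f}%")
```

The 30% bundle is where “roughly a third” comes from. If the formula really does sum to 100, the seven largest factors take 77.6% and the other ten minor factors share 19.4% after alumni giving.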
Do these rankings mean anything? They do a good job of indexing access to power, which may be exactly what you are looking for in choosing how to build a life. It is well established that an artist’s debut show says a lot about the career to come. If you enjoy feeling gross, take a look at figure 2 in the link above: only 0.2% of folks with high-end debuts end with a low-ranked final show. While artists who start at the bottom do sometimes break through, it only makes sense to start as high in the system as you can.
When I say black box I don’t mean a flight data recorder, the black box of so many eighties stand-up routines, but semi-opaque machine learning processes. I tend to think that machine learning processes like these are good. Parametric statistics can lure us into error with their easy assumptions of “normal” distributions. Black boxes with vanilla assumptions can provide some real leverage for thinking differently, but they also bring their own challenges.
I was working on a long-term machine learning project involving real data about relevant decisions. The first problem: data from a real problem is not clean; it is messy and heterogeneous. You might want an equal number of horses, unicorns, and alligators to train your animal classifier on, but if you have one unicorn in the data set, the classifier will never show it as a prediction. The solutions generally boil down to three: cut every class down to the size of the smallest one (undersampling), look at the one unicorn you have many times (oversampling), or use a unicorn simulator to cook up dozens of things that could be unicorns (synthetic data). For this project, I chose instead to allow an out-of-bag error to occur.
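The three fixes above correspond to undersampling, oversampling, and synthetic oversampling. Here is a minimal pure-Python sketch of each. The toy data points are hypothetical, and `synthesize` is a simplified SMOTE-style interpolation (real SMOTE interpolates toward one of the k nearest neighbors), not the method used in the project described here.

```python
import random
from collections import Counter

# Hypothetical toy data: two numeric features per animal.
# Five horses, four alligators, and a single unicorn.
DATA = [
    ((1.0, 2.0), "horse"), ((1.2, 1.8), "horse"), ((0.9, 2.1), "horse"),
    ((1.1, 2.2), "horse"), ((1.0, 1.9), "horse"),
    ((5.0, 0.5), "alligator"), ((5.2, 0.4), "alligator"),
    ((4.8, 0.6), "alligator"), ((5.1, 0.5), "alligator"),
    ((3.0, 4.0), "unicorn"),
]


def group_by_label(data):
    """Map each label to the list of feature tuples carrying that label."""
    groups = {}
    for features, label in data:
        groups.setdefault(label, []).append(features)
    return groups


def undersample(data, rng):
    """Cut every class down to the size of the smallest class."""
    groups = group_by_label(data)
    n = min(len(xs) for xs in groups.values())
    return [(x, label) for label, xs in groups.items() for x in rng.sample(xs, n)]


def oversample(data, rng):
    """Repeat randomly chosen examples until every class matches the largest."""
    groups = group_by_label(data)
    n = max(len(xs) for xs in groups.values())
    out = []
    for label, xs in groups.items():
        out += [(x, label) for x in xs]
        out += [(rng.choice(xs), label) for _ in range(n - len(xs))]
    return out


def synthesize(data, rng):
    """Simplified SMOTE-style oversampling: each synthetic point lies on the
    line between two randomly chosen members of the same class."""
    groups = group_by_label(data)
    n = max(len(xs) for xs in groups.values())
    out = []
    for label, xs in groups.items():
        out += [(x, label) for x in xs]
        for _ in range(n - len(xs)):
            a, b = rng.choice(xs), rng.choice(xs)
            t = rng.random()
            point = tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
            out.append((point, label))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    for name, strategy in [("undersample", undersample),
                           ("oversample", oversample),
                           ("synthesize", synthesize)]:
        print(name, Counter(label for _, label in strategy(DATA, rng)))
```

Notice what a single unicorn does to each strategy: undersampling shrinks every class to one example, and the “simulator” can only interpolate the unicorn with itself, so every synthetic unicorn is an exact copy of the original.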
Why allow an error? Because the error tells us about the totality of the data and the framing of the problem. Iteration requires that you report something to consider and reflect on with relevant partners and stakeholders. Bringing the models as they are, with known, explainable errors, offers a moment to deliberate. What if we want to segment our modeling project into three different models? Could the failure of model 2b help us understand our own questions better? The visual analysis of a scatter plot of model internals is incredibly rich; do we even need the automated predictor?
It would be so easy to be another ML miracle worker with a wonder machine that conceals so much computational effort and manipulation, stealing layers of deliberation from your publics. What is tricky is that so many methods cannot provide meaningful test statistics; they rely on the amazing first result, not the depth and quality of the explanation, to generate authority.
What We Can’t Audit
Americans like to find their own truths. Specifically, they want to discover their own facts, and vulnerability to conspiracy thinking follows. The problem with college rankings, more than fraud, is compression. The schools know the formula so well that their scores clump together, which allows less meaningful factors to decide the day. Everyone is already playing the game so hard and so well that a Nash equilibrium has been reached.
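Compression is easy to see in a toy example. In this sketch the reputation and graduation weights echo the percentages quoted earlier, but the library-hours weight, the school names, and all the scores are hypothetical. The point is only that once every school has pushed the heavily weighted factors to the same level, a nearly weightless factor decides the order.

```python
# Reputation and graduation weights echo the post; the library-hours weight
# and every school score below are hypothetical, for illustration only.
WEIGHTS = {"reputation": 20.0, "graduation": 17.6, "library_hours": 0.5}

# Every school has optimized the heavily weighted factors to the same level.
SCHOOLS = {
    "School A": {"reputation": 0.95, "graduation": 0.95, "library_hours": 0.2},
    "School B": {"reputation": 0.95, "graduation": 0.95, "library_hours": 0.9},
    "School C": {"reputation": 0.95, "graduation": 0.95, "library_hours": 0.5},
}


def score(metrics, weights):
    """Weighted sum over whichever factors appear in `weights`."""
    return sum(weight * metrics[factor] for factor, weight in weights.items())


def rank(schools, weights):
    """Order school names from highest to lowest weighted score."""
    return sorted(schools, key=lambda name: score(schools[name], weights), reverse=True)


if __name__ == "__main__":
    print(rank(SCHOOLS, WEIGHTS))
    big_only = {k: v for k, v in WEIGHTS.items() if k != "library_hours"}
    # With only the big factors, all three schools tie.
    print({name: score(m, big_only) for name, m in SCHOOLS.items()})
```

Drop the half-point library term and the three schools are indistinguishable; keep it and it alone sets the ranking.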
From machine learning to the obviously rigged streetwear market, we have an inability to audit. Stock markets have become stonk markets, where no new information is incorporated into the valuation of securities except which meme is at peak dank. It is a sure thing that there will be another hit documentary about a simple, brazen fraud. The most recent big collapse, even more recent than Melvin Capital being squeezed by a Reddit swarm, is Archegos. I tend to pronounce it “arch egos” for a reason. Index funds and altcoins are popular because they are so transparent: you know what you are getting. Increasingly, the big funds taking the big private swings are ostensibly family offices, which are not subject to auditing. Finally, the big egos are safe to work in secret again.
Power is maintained by these institutions making impossible promises, promises with a tinge of facticity that can’t be audited. Secret investment strategies, numbers that are simply to be believed, and stilted machine learning processes infiltrating life are all meant to be taken at face value and not manipulated, because manipulation, not opacity, is treated as the crime.