How do you tell the future? Build a model.

E = mc² is a model. Einstein used this model to describe the equivalence of mass and energy. It is perhaps the most famous model among those who don’t have to look at models on a daily basis. For those who do, the word “model” may take on a more nuanced meaning, but to put it as simply as possible, a model is a description of a pattern. This should be your “aha” moment: once a pattern has been described, you can see how a model might be used to predict the future.

It turns out that many products, and even whole industries, rely on models of this sort. If you aren’t convinced of their value, consider a few industries that depend on the predictions made by models: insurers rely on models of the occurrence of flooding, hedge funds on models of stock prices, and transportation companies on models of rush-hour traffic. These models, respectively, help determine the price of homeowner’s insurance, when to buy or sell equity, and the ideal time to start your commute or whether to take a train, plane, or automobile.

The value of models is undeniable, but they can be tricky to build and they come in a variety of forms. The form a model takes depends largely on the goals of the modeling effort and on how precisely we can describe the phenomenon. Many models are merely estimations, while others are intended to be governing theories.

Governing Theories

Many of the discoveries in science that later became laws are what we might call governing theories. They describe a natural phenomenon so accurately that you could say the model “governs” the behavior: by following the model, you can predict exactly how much of the behavior you will observe. For example, you probably built or saw a science fair volcano at some point in primary school. If you used vinegar, baking soda, and food coloring for lava, then you were relying on a very well-defined model of the chemical reaction between those materials. If, before pouring in the vinegar, you had put pen to paper, you could have made a very accurate prediction of how much carbon dioxide would be released and, with the help of a few other useful models, perhaps even how high your “lava” would shoot.
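The pen-and-paper prediction described above is ordinary stoichiometry. The sketch below assumes the textbook reaction NaHCO3 + CH3COOH → CH3COONa + H2O + CO2 (one mole of CO2 per mole of whichever reactant runs out first), a complete reaction, household vinegar that is about 5% acetic acid by mass with a density of about 1.0 g/mL, and ideal-gas behavior at 25 °C and 1 atm. The function `co2_from_volcano` and its parameters are hypothetical names chosen for illustration.

```python
# Sketch: predict the CO2 released by a baking-soda-and-vinegar "volcano".
# Assumed reaction: NaHCO3 + CH3COOH -> CH3COONa + H2O + CO2 (all 1:1).

M_NAHCO3 = 84.007          # g/mol, sodium bicarbonate (baking soda)
M_CH3COOH = 60.052         # g/mol, acetic acid
M_CO2 = 44.009             # g/mol, carbon dioxide
MOLAR_VOLUME_L = 24.465    # L/mol, ideal gas at 25 °C and 1 atm (assumption)


def co2_from_volcano(baking_soda_g, vinegar_ml, acid_fraction=0.05, vinegar_density=1.0):
    """Return the predicted CO2 yield, assuming the reaction runs to completion.

    acid_fraction: mass fraction of acetic acid in the vinegar (assumed ~5%).
    vinegar_density: g/mL (assumed ~1.0).
    """
    mol_soda = baking_soda_g / M_NAHCO3
    mol_acid = vinegar_ml * vinegar_density * acid_fraction / M_CH3COOH
    # One mole of CO2 per mole of whichever reactant runs out first.
    mol_co2 = min(mol_soda, mol_acid)
    limiting = "baking soda" if mol_soda < mol_acid else "acetic acid"
    return {
        "limiting_reagent": limiting,
        "co2_mol": mol_co2,
        "co2_g": mol_co2 * M_CO2,
        "co2_litres": mol_co2 * MOLAR_VOLUME_L,
    }


if __name__ == "__main__":
    prediction = co2_from_volcano(baking_soda_g=10, vinegar_ml=100)
    for name, value in prediction.items():
        print(name, round(value, 3) if isinstance(value, float) else value)
```

With 10 g of baking soda and 100 mL of vinegar, the acetic acid runs out first, so the sketch predicts roughly 3.7 g (about 2 L) of CO2. Predicting how high the “lava” shoots would need the “few other useful models” mentioned above.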

Governing theories usually involve extremely precise measurements and validation by several independent methods. In the case of vinegar and baking soda, a chemist would have captured the gaseous products of the reaction and measured them carefully, then combined those measurements with knowledge of the other reactions those substances take part in to work out the exact chemical makeup of the starting reactants. Pulling together all of the crucial information needed to establish a governing theory can be exhausting; for this reason, ideas more commonly become theories over decades of observation and experiment, with dozens of scientists contributing to their discovery.

Estimations, Approximations, and Simplifications

Estimation is less like guessing than it sounds. The fact that I placed estimation models and governing theories at opposite ends of the same spectrum may have given you the impression that estimations are not to be trusted. On the contrary, the insights gained by building these models can be priceless. Occasionally, I consult for a company whose product is an approximation of the liquidation value of heavy equipment. Its clients are major rental companies such as United, Hertz, and Sunbelt Rentals, as well as a host of private equity firms and banks scoping out new deals. At any time, each of these companies might own thousands of backhoe loaders, cranes, and dump trucks, and as the equipment ages they are very concerned with when, where, and for how much they should sell it. Before working with us, many of these companies relied on a process that looked more like a wizened man with a camera and a clipboard than a polished model. Using as much of their prior sales data and public auction data as we could get our hands on, we built a model of the ideal times and places to sell equipment. While several advanced statistical and mathematical methods go into an excellent value-recovery or pricing model for this equipment, simply plotting the average sale price for each month over the prior year, for each item or category of equipment, can give a much better idea of a machine’s real present value and of the depreciation to expect in the coming months.
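Here is a minimal sketch of the simple baseline described above: group sales by equipment category and calendar month, average each group, and fit a straight line through the monthly averages to estimate how quickly prices are falling. The sale records, category names, and the functions `monthly_averages` and `monthly_trend` are all hypothetical; this is not the pricing model the author’s team built, and the least-squares slope simply stands in for eyeballing the plot.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical auction records: (category, sale date, price in USD). Made-up numbers.
SALES = [
    ("backhoe loader", date(2023, 1, 14), 52_000),
    ("backhoe loader", date(2023, 1, 28), 50_000),
    ("backhoe loader", date(2023, 2, 11), 49_500),
    ("backhoe loader", date(2023, 3, 4), 48_000),
    ("backhoe loader", date(2023, 3, 25), 47_000),
    ("dump truck", date(2023, 1, 9), 81_000),
    ("dump truck", date(2023, 2, 20), 79_000),
    ("dump truck", date(2023, 3, 15), 76_500),
]


def monthly_averages(sales):
    """Average sale price per (category, 'YYYY-MM'), in chronological order."""
    buckets = defaultdict(list)
    for category, sold_on, price in sales:
        buckets[(category, f"{sold_on:%Y-%m}")].append(price)
    return {key: mean(prices) for key, prices in sorted(buckets.items())}


def monthly_trend(averages, category):
    """Least-squares slope of the monthly averages: estimated price change per month.

    Assumes the months present are consecutive (no gaps in the data).
    """
    points = [avg for (cat, _month), avg in averages.items() if cat == category]
    if len(points) < 2:
        raise ValueError("need at least two months of data to estimate a trend")
    xs = range(len(points))
    x_bar, y_bar = mean(xs), mean(points)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, points))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    averages = monthly_averages(SALES)
    for (category, month), avg in averages.items():
        print(f"{category:15} {month}  ${avg:,.0f}")
    for category in ("backhoe loader", "dump truck"):
        print(f"{category}: {monthly_trend(averages, category):+,.0f} USD per month")
```

In this made-up data, backhoe loader averages fall by about $1,750 per month and dump truck prices by about $2,250 per month. Because the slope treats the averages as consecutive months, a month with no sales would need to be handled explicitly.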

Silly Wand Waving

It would be irresponsible of me not to admit that some problems don’t lend themselves to modeling, and that some modeling practices lead to truly ridiculous conclusions. Pretty high on the ridiculous list is a practice best described as a “Fishing Expedition.” I could let you use your imagination as to what that means, but I’ll give you a truly heinous example so that we are entirely clear about what it means for a modeling effort. Suppose you are given a pair of dice. Further suppose that this is the first time you have ever seen dice and you want to figure out the probability of rolling “snake-eyes” (two ones). You decide to roll the dice ten times and count the snake-eyes; the probability you report is the number of times two ones came up divided by the total number of rolls. If you know nothing about the dice beforehand, there is nothing inherently wrong with this approach. In fact, it might be the most common approach to describing a natural phenomenon over which we can exert some control. Getting back to the dice example, let me ask you a question: given what you know about dice (six sides, etc.), how many times would I have to repeat this ten-roll test before you would expect to roll snake-eyes all ten times?

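  • Probability of rolling a one on a single die = 1/6
  • Probability of rolling snake-eyes = 1/6 × 1/6 = 1/36
  • Probability of rolling snake-eyes ten times consecutively = (1/36)^10
  • Expected number of ten-roll tests before one comes up all snake-eyes = 36^10, roughly 3.7 × 10^15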
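The arithmetic in the list can be checked exactly with Python’s standard `fractions` module, which avoids floating-point rounding. The expected number of ten-roll tests is the mean of a geometric distribution, 1/p, where p is the chance that a single test comes up all snake-eyes.

```python
from fractions import Fraction

p_one = Fraction(1, 6)                 # one fair die shows a one
p_snake_eyes = p_one * p_one           # two independent dice both show a one
p_ten_in_a_row = p_snake_eyes ** 10    # ten independent rolls, all snake-eyes
expected_tests = 1 / p_ten_in_a_row    # mean waiting time of a geometric distribution

print(p_snake_eyes)                    # 1/36
print(f"{float(p_ten_in_a_row):.3e}")  # 2.735e-16
print(expected_tests)                  # 3656158440062976
```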

(1/36)^10 is a very small number, but if I repeat the test enough times I will inevitably end up with ten snake-eyes. That tiny number is the probability that any single run of this test produces a wildly misleading result. If I were the tester and desperately wanted to be able to say that the odds of rolling snake-eyes were really high, I might conduct the test over and over, increasing my chances of eventually getting ten snake-eyes. Upon achieving my goal, I might even feel validated and run and tell the town. However, I would be guilty of having conducted a “Fishing Expedition.” Likewise, if we tested a series of different unlikely pairs (double ones, double twos, double threes, etc.), each additional test would make it more likely that at least one of them produced an unlikely outcome. The mere fact that we repeatedly ask questions of the same system (the dice) increases the likelihood that we accidentally reach an inaccurate conclusion.

In this example it is quite clear that we couldn’t trust the conclusion of such a clumsy experiment, but in real-world settings the temptation to go fishing is much stronger. One field of business intelligence is especially prone to this kind of bias: Big Data Analytics. You may have heard of the supposed phenomenon of “Big Data”; the truth is that there is nothing inherently new about it. As the name suggests, it is simply an analyst attempting to gain insight from a large pool of data with varying degrees of organization. The primary reason big data analyses are particularly prone to this sort of bias is that analysts typically don’t know in advance what they are looking for, so they tend to ask question after question until an answer that appears interesting pops out of the data.
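A small simulation makes the “Fishing Expedition” concrete. Each experiment rolls a pair of fair dice ten times and reports the fraction of snake-eyes; the fisher keeps repeating the experiment until one of them “shows” a probability of at least 0.3, more than ten times the true value of 1/36 (about 0.028). The names `ten_roll_estimate` and `fish_for_result`, the 0.3 target, and the seed are illustrative choices, not anything from the original text.

```python
import random

TRUE_P = 1 / 36  # probability of snake-eyes with two fair dice


def ten_roll_estimate(rng):
    """One experiment: roll a pair of dice ten times, return the fraction of snake-eyes."""
    hits = sum((rng.randint(1, 6), rng.randint(1, 6)) == (1, 1) for _ in range(10))
    return hits / 10


def fish_for_result(target, rng, max_attempts=1_000_000):
    """Repeat the experiment until one estimate reaches `target`.

    Returns (attempts, estimate). Reporting only that last estimate, as if it
    were the sole experiment, is the fishing expedition.
    """
    for attempt in range(1, max_attempts + 1):
        estimate = ten_roll_estimate(rng)
        if estimate >= target:
            return attempt, estimate
    raise RuntimeError(f"no estimate >= {target} in {max_attempts} attempts")


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run is repeatable
    attempts, estimate = fish_for_result(0.3, rng)
    print(f"true probability of snake-eyes: {TRUE_P:.3f}")
    print(f"experiment #{attempts} 'shows' a probability of {estimate:.1f}")
```

Three or more snake-eyes in ten rolls happens only about 0.2% of the time, so the loop typically runs a few hundred experiments before it succeeds; the final report hides all of them.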
The problem with this approach doesn’t become serious until the analyst trusts the resulting conclusion as if it were the only test they ran, when they should really adjust their level of confidence to account for having run multiple tests on the same data. Because they rely largely on legacy data that cannot be reproduced, big data and business analyses are especially prone to mistaking coincidence for causality.
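One standard way to adjust confidence for multiple tests is the Bonferroni correction, which divides the significance threshold by the number of tests run. The text does not name a specific method, so treat this as one illustrative (and fairly conservative) choice; the p-values below are made up. `family_wise_error_rate` shows why an adjustment matters: assuming 20 independent tests at a 5% threshold and no real effect anywhere, the chance of at least one false positive is about 64%.

```python
def family_wise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across num_tests independent tests
    when no real effect exists anywhere (each test run at significance level alpha)."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag which results survive a Bonferroni correction (threshold alpha / m)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    print(f"20 tests at alpha = 0.05: {family_wise_error_rate(0.05, 20):.0%} chance of a fluke")
    # Hypothetical p-values from four questions asked of the same data set.
    p_values = [0.04, 0.001, 0.2, 0.0004]
    print(bonferroni_significant(p_values))  # [False, True, False, True]
```

Note that the result with p = 0.04 would look “significant” on its own but does not survive once all four questions are accounted for.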

Conclusion: Modeling Tells the Future by Describing the Past

One of the things that gets me out of bed in the morning is the endless possibility each day holds. As humans, we are born with a certain degree of curiosity that keeps us interacting with the world around us. As a scientist, I am especially interested in things I don’t understand or have never seen before. As an engineer, I see value in understanding how things work. As we continue to develop our understanding of the world around us, fewer and fewer things surprise us: we learn to expect certain outcomes, and we develop rules to predict the likelihood of those outcomes. Modeling, in its many forms, helps us as we seek to improve our quality of life. Whether we are avoiding catastrophic events, curing diseases, or creating technology that lets us communicate across the globe in real time, models are at the heart of our future.

Add some meat to your social media feed…follow The Public Brain Journal on Twitter

Clayton S. Bingham is a Biomedical Engineer working at the Center for Neural Engineering at the University of Southern California. Under the direction of Drs. Theodore Berger and Dong Song, Clayton builds large-scale computational models of neurological systems. His current emphasis is on modeling hippocampal tissue in response to electrical stimulation, with the goal of optimizing the placement of stimulating electrodes in dysfunctional regions of the brain. These therapies can be applied to a broad range of pathologies, including Alzheimer’s disease, various motor disorders, depression, and epilepsy.

If you would like to hear more about the work Clayton and his colleagues do at the USC Center for Neural Engineering, he can be reached at csbingha-at-usc-dot-edu.