Where All The Children Are Above Average
Physics and paleontology are vastly different sciences. In physics, the phrase "if you've seen one, you've seen them all" often applies. Physics is about finding the rules that allow you to predict things you haven't seen. Once you've determined Kepler's laws of planetary motion, you'll find they apply to all planets. Physics enjoys rich data sets. There is no end to the physical phenomena that one can observe.
But if you are a paleontologist, the data can be quite sparse. What conclusions can you draw from skeletal fragments from a few specimens? If you find some teeth, you might be able to figure out what the animal ate, but you might not know if it is male or female, an adult or adolescent, typical or atypical. Imagine someone discovering the skeletons of Hulk Hogan and Tom Thumb. Both are men, but neither is typical. And yet, for lack of better data points, we assume that limited observations are representative of larger data sets. The statistical margin of error can be huge when the data is sparse.
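To put a rough number on "huge," here is a minimal sketch of how the margin of error on an average shrinks as specimens accumulate. It uses the textbook normal approximation (1.96 standard errors for roughly 95% confidence), which if anything understates the uncertainty for a handful of specimens. The function name `margin_of_error` and the 7 cm spread in heights are assumptions made up for illustration, not measurements of anything.

```python
import math


def margin_of_error(std_dev, n, z=1.96):
    """Approximate margin of error for a sample mean.

    Uses the normal approximation: z * std_dev / sqrt(n).
    z=1.96 corresponds to roughly 95% confidence.
    """
    if n < 1:
        raise ValueError("need at least one observation")
    return z * std_dev / math.sqrt(n)


# Hypothetical spread of adult heights in centimeters (an assumed value).
ASSUMED_STD_DEV_CM = 7.0

if __name__ == "__main__":
    for n in (2, 20, 200, 2000):
        moe = margin_of_error(ASSUMED_STD_DEV_CM, n)
        print(f"{n:>5} specimens: average height known to within about +/- {moe:.1f} cm")
```

Because the margin falls with the square root of the sample size, it takes a hundred times as many specimens to shrink it tenfold. Two skeletons tell you very little about the population they came from.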
To draw conclusions about sparse data, we need metadata that somehow relates the data to a larger data set. For example, suppose you read a movie review by me. Absent other reviews from other viewers, you might have to rely on whether people who know me find me credible. Likewise, reading this blog entry on the O'Reilly Network, you assume I have a certain minimum credibility. But although evaluating the credibility of the source can help establish the reliability of a datum, additional data points are needed to establish whether the datum is typical. I'll revisit the issue of metadata in future blog entries.
Read More Entries by Bruce A. Epstein.
