Stuff from 20 November, 2009

This is the archive of tumbledry happenings that occurred on 20 November, 2009.

Dan McKeown commenting on Prison

Just don’t cheat like the French.

When Information Overwhelms Facts

Friday, 20 November 2009

SquareTrade, a company that sells warranties for consumer electronics and appliances, recently published a summary of the failure rates of notebooks/netbooks (n=30,000). This study was then disseminated by large technology websites: Jeff Smykil at Ars Technica, Electronista Staff, Vladislav Savov at Engadget, and Danny Allen at Gizmodo.

This situation illustrates what is wrong with a large swath of online reporting.

In their “me-too” eagerness to promulgate any scrap of ersatz news, none of these tech websites pointed out that this SquareTrade study, in its current form, is completely worthless for drawing conclusions about laptop reliability, scientific or otherwise. The problem lies in the unscientific nature of SquareTrade’s research.

The SquareTrade study utterly fails to provide any meaningful statistical analysis of its numbers. It finds “the average total failure rate of laptops to be 31% over 3 years”. An average cannot stand on its own: to be of statistical use, it must be accompanied by at least a standard deviation of the data from which the average was taken. (A range would be helpful, too.) SquareTrade’s sloppiness becomes more evident with a closer reading.

“Apple is in fourth place in laptop reliability” was the headline used by many tech websites to push this SquareTrade study. But the graph upon which the headline ranking is based (on page 6 of the report) was created using a linear projection from TWO DATA POINTS. Here’s why this is a problem: fitting a curve that will have any prognostic capabilities requires more than two data points. More data points mean more complications. So, to handle the extra data, the researcher/underpaid intern at SquareTrade must understand curve fitting, specifically the coefficient of determination (R², calculated to check how well the curve fits). When you only have two data points to model, however, two things happen: (1) you can always draw a straight line that fits the data perfectly (R² = 1), and (2) your model is capable of predicting nothing, because that perfect fit is guaranteed no matter what the two values are. So, the SquareTrade authors have formed an inappropriate model based on sloppy data to make fallacious projections.
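A minimal sketch of the two-point problem, in plain Python. The failure rates below are hypothetical numbers chosen for illustration, not figures from the SquareTrade report, and the helper names fit_line and r_squared are my own. The point is only that a least-squares line through two points always scores R² = 1, whatever the second value happens to be, so the score says nothing about the 3-year projection.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# HYPOTHETICAL failure rates (percent) observed at year 1 and year 2.
years = [1.0, 2.0]
year_one_rate = 5.0

results = []
for year_two_rate in (6.0, 12.0, 20.0):
    rates = [year_one_rate, year_two_rate]
    slope, intercept = fit_line(years, rates)
    r2 = r_squared(years, rates, slope, intercept)
    projection_3y = slope * 3.0 + intercept
    results.append((year_two_rate, r2, projection_3y))
    print(f"year 2 = {year_two_rate}%: R^2 = {r2}, 3-year projection = {projection_3y}%")
```

Each run reports a perfect fit, yet the three projections differ wildly. With a third or fourth measured point, R² would drop below 1 whenever the data were not truly linear, and that drop is exactly the information a two-point fit cannot give you.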

Presumably, SquareTrade put this 3-year projection in their paper to make the reliability difference between the laptop companies they analyzed seem greater. Despite the statistically ignorant data massaging, the difference between the most reliable laptop maker and the fifth-most reliable is 2.7%. There is no information to tell us if this difference is statistically significant. If SquareTrade had an interest in scientifically rigorous data, they would report the standard deviation (commonly shown as error bars on graphs) and the range of their data. Instead, they provide colorful arrangements of numbers that illustrate little, prove nothing, and distract from sloppy, unscientific methodology.
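To see why the missing information matters, here is a rough sketch of a two-proportion z-test in plain Python. Everything numeric in it is an assumption: the two failure rates (16.0% and 18.7%) are hypothetical values chosen only to be 2.7 percentage points apart, I am assuming the 2.7% is a percentage-point gap, and the per-brand sample sizes are invented because the text above gives only the overall n. The function names are mine as well.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two observed proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p2 - p1) / std_err


def two_sided_p_value(z):
    """Two-sided p-value under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# HYPOTHETICAL failure rates, 2.7 percentage points apart.
rate_best = 0.160
rate_fifth = 0.187

# HYPOTHETICAL per-brand sample sizes: the same gap, judged two ways.
p_small = two_sided_p_value(two_proportion_z(rate_best, 100, rate_fifth, 100))
p_large = two_sided_p_value(two_proportion_z(rate_best, 5000, rate_fifth, 5000))

print(f"100 laptops per brand:  p = {p_small:.3f}")
print(f"5000 laptops per brand: p = {p_large:.5f}")
```

Under these made-up numbers, the same 2.7-point gap is indistinguishable from noise with 100 laptops per brand and clearly significant with 5000 per brand. Without per-brand counts or error bars, a reader has no way to tell which situation the report is in.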

What’s more, SquareTrade is a warranty company looking for customers to purchase insurance against exactly the topic of this study: laptops/portables breaking. This obnoxiously blatant conflict of interest is only mentioned as a tiny caveat in the Ars Technica piece. But the damage has already been done: the study has been broadcast as fact across a multitude of tech websites.

Dissemination of this type of information (too low in quality to be called factual) is a symptom of a larger problem on the internet: the quantity of information available continues to rise while the quantity of factual data grounded in scientific methods fails to keep pace.

As a result, the ratio of factual, useful research to generally worthless information continues to dwindle.

The “professional” blogs exacerbate the overload with their echo chamber effect.

The noise drowns out the signal.

