
When Information Overwhelms Facts

SquareTrade, a company that sells warranties for consumer electronics and appliances, recently published a summary of the failure rates of notebooks/netbooks (n = 30,000). The study was then disseminated by large technology websites: Jeff Smykil at Ars Technica, the staff at Electronista, Vladislav Savov at Engadget, and Danny Allen at Gizmodo.

This situation illustrates what is wrong with a large swath of online reporting.

In their “me-too” eagerness to promulgate any scrap of ersatz news, none of these tech websites pointed out that this SquareTrade study, in its current form, is completely worthless for drawing conclusions about laptop reliability, scientific or otherwise. The problem lies in the unscientific nature of SquareTrade’s research.

The SquareTrade study utterly fails to provide any meaningful statistical analysis of its numbers. It finds “the average total failure rate of laptops to be 31% over 3 years”. An average cannot stand on its own: to be of statistical use, it must be accompanied by at least the standard deviation of the data from which it was taken. (A range would be helpful, too.) SquareTrade’s sloppiness becomes more evident with a closer reading.
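To make the point concrete, here is a minimal sketch in Python using invented numbers (not SquareTrade’s data): two sets of brand failure rates share the report’s 31% average, yet describe completely different situations.

```python
import statistics

# Two hypothetical sets of per-brand failure rates (percent).
# Both average 31%, but they describe very different markets.
tight = [29, 30, 31, 32, 33]
wild = [5, 12, 31, 47, 60]

for name, rates in (("tight", tight), ("wild", wild)):
    print(name,
          "mean =", statistics.mean(rates),
          "stdev =", round(statistics.stdev(rates), 1),
          "range =", (min(rates), max(rates)))
```

The first set says every brand is about equally (un)reliable; the second says brand choice matters enormously. A bare average cannot distinguish the two.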

“Apple is in fourth place in laptop reliability” was the headline used by many tech websites to push this SquareTrade study. But the graph upon which the headline ranking is based (on page 6 of the report) was created using a linear projection from TWO DATA POINTS. Here’s why this is a problem: fitting a curve with any prognostic capability requires more than two data points. More data points mean more complications; to handle the extra data, the researcher/underpaid intern at SquareTrade must understand curve fitting, specifically the coefficient of determination, R², which is calculated to make sure the curve fits the data well. When you have only two data points to model, however, two things happen: (1) a straight line through them fits perfectly, so R² = 1 automatically; (2) that “model” is capable of predicting nothing. So, the SquareTrade authors have formed an inappropriate model based on sloppy data to make fallacious projections.
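A hedged sketch of the trap, again with invented numbers: a first-degree polynomial through two points fits them exactly by construction, so the usual goodness-of-fit check is meaningless.

```python
import numpy as np

# Two invented (year, cumulative failure %) points; not SquareTrade's data.
x = np.array([1.0, 2.0])
y = np.array([10.0, 16.0])

# A straight line through two points always passes through both exactly.
slope, intercept = np.polyfit(x, y, 1)
predicted = slope * x + intercept
ss_res = np.sum((y - predicted) ** 2)   # zero by construction
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot         # exactly 1, no matter the inputs

print(f"3-year 'projection': {slope * 3 + intercept:.1f}% (R² = {r_squared:.0f})")
```

Any two numbers whatsoever would produce the same perfect R², which is exactly why a two-point fit predicts nothing.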

Ostensibly, SquareTrade put this 3-year projection in their paper to make the reliability differences between the laptop companies they analyzed seem greater. Even after the statistically ignorant data massaging, the difference between the most reliable laptop maker and the fifth-most reliable is 2.7 percentage points, and there is no information to tell us whether this difference is statistically significant. If SquareTrade had an interest in scientifically rigorous data, they would report the standard deviation (commonly drawn as error bars on graphs) and the range of their data. Instead, they provide colorful arrangements of numbers that illustrate little, prove nothing, and distract from sloppy, unscientific methodology.
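To see what the missing numbers would settle, here is a hedged sketch of a two-proportion z-test with hypothetical inputs; the per-brand sample sizes, which the report never states, decide everything.

```python
from math import sqrt

# Hypothetical inputs: SquareTrade publishes no per-brand sample sizes.
p1, p2 = 0.256, 0.283         # a 2.7-point gap, like the report's rankings

for n in (500, 3000):         # assumed number of laptops per brand
    p_pool = (p1 + p2) / 2    # equal n, so the pooled rate is a simple average
    se = sqrt(p_pool * (1 - p_pool) * (2 / n))
    z = (p2 - p1) / se
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"n = {n:>4} per brand: z = {z:.2f} ({verdict} at the 95% level)")
```

With 500 units per brand, the 2.7-point gap is statistical noise; with 3,000 it clears the bar. Without the real sample sizes, readers cannot tell which world they are in.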

What’s more, SquareTrade is a warranty company looking for customers to purchase insurance against exactly what this study describes: laptops breaking. This obnoxiously blatant conflict of interest is mentioned only as a tiny caveat in the Ars Technica piece. But the damage has already been done: the study has been broadcast as fact across a multitude of tech websites.

Dissemination of this type of information (too low in quality to be called factual) is a symptom of a larger problem on the internet: the quantity of information available continues to rise while the quantity of factual data grounded in scientific methods fails to keep pace.

As a result, the ratio of factual, useful research to generally worthless information continues to dwindle.

The “professional” blogs exacerbate the overload with their echo chamber effect.

The noise drowns out the signal.


Comments

Adam Fields

This report is also laughably lacking in actual data. There aren’t any numbers about how many laptops were brought in from each brand, and whether that does actually represent a statistically significant number relative to the total number sold (picking 1000 out of a hat as that target is meaningless). They don’t differentiate between different kinds of malfunctions, or even different model lines within the same manufacturer. If HP has a particular model that fails a lot, that would completely skew these numbers, and not say anything at all about whether you should buy one of the hundred other models they make.

I’m ashamed of anyone who would report this as meaningful.

GadgetGav

Absolutely..! Well said.

The via-via-via linking & blind re-reporting of tech web sites is getting out of control. These bloggers just post stuff without reading it or offering any real analysis. I guess that’s what you get when you pay per post and the site makes money per page view. Much better to have an attention grabbing headline than an informative article. It’s especially helpful to the business model if it is something controversial: “Apple reliability worse than Asus!” or “Apple reliability better than HP”. The more a story gets spread around, the more fractions of a cent the site makes.

Something needs to change if blogging is going to be taken seriously as a form of journalism. This one was particularly bad, in my opinion, because of the number of ‘serious’ sites, like Ars, that jumped in with both feet into territory usually reserved for the likes of Gizmodo.

Tonio Loewald

I’m not sure the results published are quite as worthless as you imply. First of all, the “linear projection” is only for accidental damage, not reliability issues, for which they appear to have solid figures. Since accidental damage is usually not covered by extended warranties, they aren’t really incented to lie here, and I would guess that a linear projection would underestimate the real figures since people are more careless with old stuff.

Second, their graph implies that their data are continuous (accumulated failure rate over time — look at the lines, they are squiggly).

The real problem, as you point out, is that there are no error bars (which would probably render the differences between the first four contenders “statistically insignificant”). But this doesn’t make the information worthless. Even if I have only two values, x = 10 and y = 5, the odds are higher that x is bigger than y than the reverse; I just don’t know by how much.

Laptop failure rates of 20-30% over three years certainly seem pretty accurate based on my experience (and that’s how Ars Technica looked at it), so the “value for money” equation for extended warranties pretty much comes down to 0.15 × purchase cost (since your laptop has about a 15% chance of failing due to reliability issues after its standard warranty expires).
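A quick sketch of that arithmetic, with a hypothetical purchase price:

```python
# Break-even check for an extended warranty (purchase price is hypothetical).
purchase_price = 1200.00      # cost of the laptop
p_late_failure = 0.15         # ~15% chance of failure after the standard warranty
repair_cost = purchase_price  # pessimistic: a repair costs as much as replacement

expected_loss = p_late_failure * repair_cost
print(f"Expected uncovered loss: ${expected_loss:.2f}")
print("An extended warranty priced above that is, on average, a losing bet.")
```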

Adam Fields

Also, I don’t understand why the 3-year failure rates are projections. They’ve been doing this for more than 3 years - don’t they have actual numbers for that?

Engin Kurutepe

I have to agree with the previous commenter, Tonio Loewald, here. While you are right that the report lacks scientific rigor, I think that is a little beside the point, since this is not a scientific publication and they are not making any unexpected claims. It seems to be a case of stating the obvious.

Your assumption that their projection is based on two data points might be wrong. They should have data on when the failures occur based on their customer claims. At any rate, the curves are not straight and seem to be the cumulative distribution of failures over time. In any case, I am willing to give them the benefit of the doubt, because otherwise this wouldn’t be empty marketing drivel but outright tampering with data.

Rus

SquareTrade is very accustomed to conflicts of interest: they were formed initially to dispute eBay feedback, where one had to PAY to get a comment removed.

Matt Nolker

Excellent points about the SquareTrade study. That said, I would disagree with your lede: namely, “This situation illustrates what is wrong with a large swath of online reporting.” Online journalists have considerable company in their lack of numeracy. Having spent some time at a large daily paper, I can assure you that it’s also endemic among journalists who still spread ink on dead trees. It would be interesting to see if you could track down some actual data indicating that online journalists have a higher rate of… error? credulity? than their print brethren.

John Corbin

Sounds exactly like the statistics used by Congress to promote so-called health reform. Useless statistics are worse than lies, forgery and bribery: People might actually buy statistics.

room34

Excellent post! I had a vague sense of misgiving when I read the article on laptop failure rates the other day on Ars Technica but, reading it in passing, I couldn’t quite put my finger on why it seemed… well, just WRONG.

It bears repeating that you can draw ANY curve through two points, essentially using the feeble data to “prove” whatever you want. And, if anything, the conflict of interest is even MORE disturbing.

It’s probably too much to hope that being called out like this will lead news sites to be more careful before posting such drivel in the future, but maybe it will at least make readers more attentive and quicker to question what they’re being told.

Paul M. Watson

How effective is the echo chamber in a second wave of correction? Disseminating this article would seem a good thing to do, but is the damage already done? Is the echo chamber unable to correct previous waves?

Steve

In the source page for the SquareTrade report, the first citation is literally “We’re guessing on all this data.”

John

You do realize that your article is invalid and shows that you have no appreciation for statistics?

  1. The failure rate cannot have a negative mean, and the upside is presumably capped at 100%. For a mean of ~20%, this implies a likely maximum stdev of around 10%. Factoring in the possibility that the true mean is on the high side instead, the maximum stdev is around 40%. That means an assumed stdev of ~15%-25% is extremely liberal and can be used as a proxy for the maximum actual stdev (the real one is much more likely to be smaller).
  2. You have n = 30k.
  3. You can calculate the maximum stderr from the assumed stdev and n, which shows that all the numbers are actually statistically significant at the 95% level.

Do the calculation yourself.
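For reference, a sketch of that calculation under the assumptions above (stdev ≈ 25%, n = 30,000):

```python
from math import sqrt

# John's stated assumptions: a liberal stdev of ~25% and n = 30,000.
stdev = 0.25
n = 30_000

stderr = stdev / sqrt(n)
margin = 1.96 * stderr        # half-width of a 95% confidence interval
print(f"stderr ≈ {stderr:.4f}, 95% margin ≈ ±{margin * 100:.2f} points")
```

Note that this applies the full n = 30,000 to a single overall mean; comparing brands to one another would require the per-brand sample sizes, which the report does not give.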

Steven

All I can say is, my first reaction to this report was to wonder how many MacBook owners have actually purchased a SquareTrade warranty. I wonder if that number is enough to make this report statistically relevant with respect to Apple laptops.

And John, I’m not a statistician, but you used a lot of terms like “assumed,” and “likely,” and “infer,” and “around,” and “~.” It seems to me the author’s point was not to demonstrate his competency in statistics. If the amount of supposition you engaged in is required to make sense of the report, then perhaps there were not enough concrete numbers provided to support the conclusions which were reached.

Alexander Micek

Thank you for your comments, everyone. I appreciate your thoughtful reading of the piece and your critiques.

I noticed a few responses to my “two data points” conclusion. My concern is this: if SquareTrade did indeed have more than two data points, why did they fail to incorporate that data into a model more nuanced than a linear projection?

It is troubling that we, as evaluators of the study, are not supplied with rigorously supported facts.
