I'm no William Tell, but...
Zillow provides data about its accuracy on the website. Think of accuracy as an Archer trying to hit a target. In this case, a bullseye is hit when the Zestimate is the same as the Sales Price. The stats that Zillow provides are intended to inform the user how good they are at hitting the bullseye. But does this really give the consumer good information about Zillow’s accuracy? The table below is Zillow’s accuracy report for the Chicago area. Now I’m not a Statistician, but since I stayed at a Holiday Inn Express last night, I’m going to try to interpret the stats in this table.
The column on the right is basically the same as drawing a ring around the bullseye and determining how many times the Zestimate was within that 10% ring. In Cook County, the Zestimate was within that ring 58% of the time, and it did a little worse in Tazewell at only 44%. That’s not bad if your Zestimate was one of the fortunate ones inside the ring. But what if your Zestimate was one of the less fortunate that hit the target outside the 10% ring, or what if you happen to live in Tazewell (I’m not sure where that is, but it sounds far)? Wouldn’t you want to know by how much Zillow missed the bullseye? Does it miss just a little, or does it miss badly?
The casual user of Zillow might think that the 1st column provides that. This is the % error. But when you look at the heading it says “Median”. This is rather odd. The use of median as a measure of central tendency is normally for data that is an ordinal variable (follows a natural order).
For example, the average housing price is reported as a median. After you line up the data in order from lowest to highest, the mid point is found. This is usually a good measure of central tendency for housing price since some extremely expensive homes in a data set could skew the average. The expensive houses would have a greater impact (relative to the least expensive houses) due to the magnitude of the number. The median gives a more accurate and realistic value of the prices faced by most people.
A % Error figure is not an ordinal data set. For this reason, it would be better to use a “mean” as the measure of central tendency. Consider the following example….
Let’s assume for the moment that I recently decided to take up Archery and the following are the results of me launching an arrow at a target in my backyard. The percent figure indicates by how much I missed hitting the bullseye. We can call this “% Error”.
1%
2%
3%
4%
5%
8.1%
50%
60%
70%
80%
90%
The median % error for my shots was 8.1%. But the mean was a dismal 34% (rounded). This is a significant difference in the measurement of my accuracy. I got really close to the bullseye on the first five shots (within 5%), but on the last five shots I missed badly. In fact, the last shot was very close to missing the target altogether and hitting the window of the neighbor’s house! At that point, I decided I should put the bow down and go back inside the house before I hurt someone.
If you were my neighbor and asked me about my newfound Archery hobby and I told you, “Don’t worry, I am accurate to within 8.1% of the bullseye”, would that be fair to you or your kids running around in the backyard? As my neighbor, would you feel like you should have been told about the shots that were near the edge of the target?
Doesn’t the consumer deserve to know how often Zillow might badly miss the bullseye? Does the use of median really give the consumer this information? Is there some reason why Zillow is not reporting the Mean?
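For anyone who wants to check my arithmetic, here is a minimal sketch in Python using the eleven shots listed above. The variable names are my own invention, and this only illustrates the archery example; it says nothing about how Zillow actually computes its figures.

```python
from statistics import mean, median

# Percent-error values for the eleven archery shots listed above.
pct_errors = [1, 2, 3, 4, 5, 8.1, 50, 60, 70, 80, 90]

# The median is the middle value once the shots are sorted (the 6th of 11).
median_error = median(pct_errors)

# The mean adds up every miss and divides by the number of shots,
# so the five bad shots pull it far above the median.
mean_error = mean(pct_errors)

print(f"median % error: {median_error}")
print(f"mean % error:   {mean_error:.1f}")
```

The two numbers describe the same eleven shots, yet one sounds like a decent archer and the other sounds like someone who should stay indoors.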
Zillow would probably say that if it provided a mean, it would be misleading because the outliers would skew the results. Zillow would also probably say that when there is an outlier, it was not because the model is flawed, but because they were given bad information from the public record. In other words, they were given a “bad arrow”.
To apply this logic to my Archery hobby, I could justify not telling my neighbor about the bad shots by saying to myself, it was those lousy arrows I bought at the store down the street, “Bob’s Discount Arrows and Grenades”. The arrows I used for my 1st five shots were bought at “Dick’s Sporting Goods” and were very accurate. But doesn’t my neighbor deserve to know whether I am buying my arrows at Bob’s (they are cheaper than the ones at Dick’s) so he can hide his kids in the house when I emerge from my backdoor with my Bow & Arrow set in hand!
Suppose my wife gets the credit card bill and flips out about the money I am spending at Dick’s (on the “good” arrows). I could tell her, “Honey, I am really good. On average, I’m within 8.1% of the bullseye and getting better when I buy the good arrows from Dick’s. In fact, I’m getting so good, I think I’ve got a shot at the Olympic Archery Team”.
Now, if I want to mislead my wife and convince her that we should be taking a 2nd mortgage on the house to pay for more expensive arrows from Dick’s, that’s my business. But I think my neighbor deserves to know about my bad shots. If Zillow wants to internally talk about its median error margin, that is perfectly acceptable. But is this fair to the consumer?
Ask yourself, who benefits from reporting a median error instead of a mean? If your Zestimate is one of the 42% (or 56% in Tazewell) that hit outside the 10% ring, don’t you want to know just how bad that could be? Don’t you want to know if you should be getting your kids out of the backyard?
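To show the kind of number I have in mind, here is a rough sketch in Python that looks only at my archery shots that landed outside a 10% ring. The data is my backyard example, not Zillow’s, and the names (RING, outside, and so on) are just ones I made up for illustration.

```python
# Percent-error values for the eleven archery shots from the example.
pct_errors = [1, 2, 3, 4, 5, 8.1, 50, 60, 70, 80, 90]

RING = 10  # size of the ring around the bullseye, in percent

# Keep only the shots that landed outside the ring.
outside = [e for e in pct_errors if e > RING]

# How often did I miss the ring, and how bad were those misses?
share_outside = len(outside) / len(pct_errors)
avg_miss_outside = sum(outside) / len(outside)
worst_miss = max(pct_errors)

print(f"shots outside the ring: {share_outside:.0%}")
print(f"average miss outside the ring: {avg_miss_outside:.0f}%")
print(f"worst miss: {worst_miss}%")
```

A summary like this tells the neighbor what the median hides: how often the bad shots happen and how far off they land.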

