Tuesday, September 26, 2006

Cracking the Zillow Code

Recently, I spent a good portion of my morning debating with a homeowner why our recent appraisal of his home was less than Zillow's value. The mere fact that I was even having that conversation nearly had me banging my head against the nearest wall. The idea that a homeowner was putting so much stock in a free, instant valuation off the Internet was unbelievable and frustrating to me. I whole-heartedly expected the likes of Ashton Kutcher to jump out from behind my desk at any moment yelling "You've been Punked!"

As appraisers, we should understand how Zillow calculates its values so that we can intelligently discuss how our value differs from a Zestimate. On the Zillow website, under the heading "How do we come up with the Zestimate", Zillow states that it takes "zillions of data points" and enters them "into a formula". This formula is referred to as "a proprietary algorithm - a big word for secret formula". Well, let's see if we can decode this "secret formula".

In the following two examples, I reviewed several recent real estate sales in two different locations in Kane County (the Valley Creek subdivision in Elgin and the town of Hampshire) and compared these sales to Zillow's value estimates.

Example 1

The last seven sales in the Valley Creek subdivision were reviewed. All sales with the exception of Sale 3 appeared to be arm's-length transactions; Sale 3 was a foreclosure sale. Zillow could not find Sale 7 because the property was new and did not yet have an assessed value or any other assessment information. This is our first clue: Zillow can't calculate a value unless there is assessment data. Both Sale 3 and Sale 7 were eliminated from this analysis. Below is a comparison of the Assessed Value (AV) of each of the five remaining sales to its Zillow Value (ZV).



The correlation coefficient indicates how closely one series of data tracks another. A perfect positive correlation is 1, and any value close to 1 indicates a very high degree of correlation. The coefficient comparing Zillow's Values to the Assessor's Values is 0.95, which indicates that Zillow's values are highly correlated with the Assessor's values (Clue #2).
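To make the statistic concrete, here is a minimal Python sketch of the Pearson correlation coefficient, the statistic a spreadsheet's CORREL function reports. That this is how the 0.95 above was calculated is an assumption; the article does not say. The assessed and Zillow figures are hypothetical, made up for illustration, and are not the Valley Creek sales. On Python 3.10 and later, `statistics.correlation` computes the same value.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series (-1 to 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of the two series around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each series around its own mean.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical figures for illustration only (not the sales analyzed above).
assessed = [95_000, 102_000, 110_000, 98_000, 120_000]
zillow = [290_000, 305_000, 335_000, 300_000, 362_000]
print(round(pearson_r(assessed, zillow), 3))
```

Keep in mind that r only measures whether the two series move together: a Zillow value that is always about three times the assessed value correlates almost perfectly whether or not either one is close to the sale price. With only three to five sales, a single sale can also swing r considerably.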

Note: In this example, Zillow's Values are very close to the actual sales prices (within ±2%). This is Clue #3; I will address the reason for Zillow's accuracy later in this article.

Example 2

Our second example covers the last six sales in the small town of Hampshire. Again, Zillow could not find the new homes (Sales 3, 4, and 6). It did find the existing-home sales (Sales 1, 2, and 5), which are analyzed below.



The above comparison of Assessed Values to Zillow Values shows a perfect correlation coefficient of 1. How interesting! Notice how the Zillow/SP column (SP = sale price) mirrors the AV/SP column (i.e., a low Zillow/SP goes with a low AV/SP, and a high Zillow/SP with a high AV/SP). This is Clue #4.
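Clue #4 is really a claim about ordering: the sales with the lowest AV/SP ratio should also have the lowest Zillow/SP ratio. A minimal sketch of that check sorts the sales by each ratio and compares the two orderings. The function name `ratio_order` is hypothetical, and the three sales are invented for illustration; they are not the Hampshire data.

```python
def ratio_order(values, prices):
    """Indices of the sales sorted from lowest to highest value/price ratio."""
    ratios = [v / p for v, p in zip(values, prices)]
    return sorted(range(len(ratios)), key=lambda i: ratios[i])


# Hypothetical sales for illustration only (not the Hampshire data).
sale_prices = [250_000, 310_000, 280_000]
assessed = [80_000, 86_000, 70_000]
zillow = [240_000, 265_000, 215_000]

av_order = ratio_order(assessed, sale_prices)  # ordering by AV/SP
zv_order = ratio_order(zillow, sale_prices)    # ordering by Zillow/SP
print(av_order, zv_order, av_order == zv_order)
```

Matching orderings are what you would expect if the Zestimate were driven by the assessed value, though with three sales a match can also happen by chance.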

Note: In this sample, Zillow was less accurate. For Sale 1 it was off by only 4%, but for Sales 2 and 5 it was off by 10-15%.
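The percentages above express each Zestimate's miss relative to the sale price. A short sketch, assuming the error is measured as (estimate - sale price) / sale price; the article does not spell out its formula, and the dollar amounts below are hypothetical.

```python
def pct_error(estimate, sale_price):
    """Signed error of an estimate relative to the sale price, in percent.

    Negative means the estimate came in below the sale price.
    """
    return (estimate - sale_price) / sale_price * 100


# Hypothetical figures for illustration only.
print(round(pct_error(288_000, 300_000), 1))  # estimate 4% below the price
print(round(pct_error(255_000, 300_000), 1))  # estimate 15% below the price
```

Keeping the sign, rather than taking the absolute value, preserves the direction of each miss.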

Calculation of the Zestimate

The above examples give us some indication of how Zillow arrives at its value estimates (or Zestimates). Quite simply, the Zestimate relies on a calculated relationship of assessed value to sale price. Zillow merely takes selected transactions and calculates the ratio between their Assessed Values and Sales Prices. It then applies that ratio to the subject's assessed value (plus or minus some adjustments) and "whala", you have a Zestimate!
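The method hypothesized above can be sketched in a few lines. This reconstructs the article's theory only, not Zillow's actual algorithm, which is proprietary. It assumes the "relationship" is the median AV/SP ratio of the selected sales and omits the unspecified "plus or minus some adjustments". Because the ratio is AV/SP, applying it means dividing the subject's assessed value by it. The function name and all figures are hypothetical.

```python
from statistics import median


def ratio_estimate(subject_av, comp_avs, comp_prices):
    """Value the subject by dividing its assessed value by the comps' typical AV/SP ratio.

    Sketch of the article's hypothesis only. Using the median ratio is an
    assumption, and the article's unspecified adjustments are omitted.
    """
    if not comp_avs or len(comp_avs) != len(comp_prices):
        raise ValueError("need matching, non-empty comp data")
    ratios = [av / sp for av, sp in zip(comp_avs, comp_prices)]
    return subject_av / median(ratios)


# Hypothetical comparable sales and subject (illustration only).
comp_avs = [75_000, 80_000, 90_000]
comp_prices = [250_000, 250_000, 300_000]
print(round(ratio_estimate(84_000, comp_avs, comp_prices)))
```

Under this model, any error in the subject's assessed value carries straight through to the estimate, which is the point the article goes on to make.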

The above examples show that even when the Zestimate has a large margin of error (10-15%), it is still highly correlated with the Assessor's Values. We can conclude from this analysis that the Zestimate is a derivative of the Assessor's Values. Zillow may be slightly modifying the data with some weighting or a factor such as time or distance. That "tweaking" of the data could be the "secret" part of its formula, but clearly the Zestimate is based on the underlying Assessor's Values, as indicated by the high correlation coefficient.

Accuracy of the Zestimate

Notice how the Zestimates were quite accurate in Example 1 (within 2%), but in Example 2 they varied by as much as 10-15% from the actual sales prices. This raises the question: why was Zillow so much more accurate in Elgin?

The reason Zillow was so accurate in Elgin (Valley Creek) but missed the mark in Hampshire is directly related to the Assessor's accuracy. This is no reflection on the competency of the Assessor, but rather on the amount and quality of the data. In Valley Creek, the Assessor has an abundance of data to draw from: the subdivision has an active market with short marketing times and a high number of transactions of similar homes, and the local Assessor usually gets values right there. In Hampshire, the data is not nearly as plentiful. It is a smaller community with far less homogeneity among its homes and fewer transactions.

Also, notice that the direction of Zillow's variance matched the direction of the Assessor's variance from the Sales Price. Where the Assessor's Values were low (AV/SP ratios of, say, 22-26%), Zillow's values were low, and vice versa. This illustrates that when the Assessor over-values a property, so does Zillow. We can conclude from this that Zillow is only as accurate as the local Assessor. If the Assessor is wrong, so is Zillow. If the Assessor is right on...so is Zillow.
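The direction test can be sketched as well. It assumes a benchmark AV/SP ratio against which the Assessor's value counts as high or low; `target_ratio` below is a hypothetical parameter, set to 1/3 purely as an assumption. Zillow's value counts as high or low against the sale price. The function name and the sales are illustrative only.

```python
def variance_directions(assessed, zillow, prices, target_ratio):
    """Return, per sale, whether the assessor and Zillow erred in the same direction.

    Assessor "high" means AV/SP is above target_ratio (an assumed benchmark);
    Zillow "high" means the Zestimate is above the sale price.
    """
    agree = []
    for av, zv, sp in zip(assessed, zillow, prices):
        assessor_high = av / sp > target_ratio
        zillow_high = zv > sp
        agree.append(assessor_high == zillow_high)
    return agree


# Hypothetical sales for illustration only; target_ratio = 1/3 is an assumption.
prices = [250_000, 300_000, 280_000]
assessed = [95_000, 70_000, 100_000]
zillow = [270_000, 255_000, 260_000]
print(variance_directions(assessed, zillow, prices, target_ratio=1 / 3))
```

If the article's conclusion holds, real data should show agreement on nearly every sale; the third invented sale here shows what a disagreement looks like.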

Since the folks at Zillow know the accuracy of the Zestimate is directly related to how well the Assessor gets it right, what better way to improve their own accuracy than to encourage homeowners to add information about their homes to the Zillow site? Hopefully the Assessor will take advantage of this free information to improve the accuracy of its assessed valuations, thereby dramatically improving Zillow's accuracy. This is possibly the motivation behind Zillow's recent opening of its database to user input, in much the same way Wikipedia is an open database.

Conclusions

So if you choose to use Zillow, be aware of the flaws in its model. Understand that Zillow's accuracy is highly dependent upon the accuracy of the local Assessor. Even if you use Zillow's feature that lets you choose your own comps, this will not necessarily improve accuracy unless, by pure chance, you choose comps that the Assessor assessed more accurately.

If your appraisal is ever challenged on the basis of a Zestimate, hopefully you can deal with that situation more effectively than by banging your head on a wall and throwing Mr. Kutcher out of your office.

11 Comments:

Blogger David Gibbons said...

Hi Lee, It's David G from Zillow.com.

This is an interesting analysis, and your conclusion is understandable given your findings. If, however, you repeat this analysis elsewhere, you will see that significantly more data fields are used to calculate most Zestimates than just sales records and tax assessed values.

I'll explain this in a bit more detail but first, my apologies for the head-banging experience with the homeowner you encountered. Zestimates have become yet another negotiating tool - just like cherry-picked comps and listings always have been a favorite way to argue for more or less value. In this case, it should have been easy to convince a "reasonable man" that a local Zestimate should be questioned. I'm assuming the property was in either Elgin or Hampshire.

On to the detail ... the reason you measure a strong correlation between tax assessed values and Zestimates is not because our algorithm is simplistic but because the data we have is incomplete in your area. If you review most (all?) houses in Elgin and Hampshire, you will see that the only data we have are sales records and tax assessments - we're missing the other information we typically get from the assessor, like sq. ft. and # of beds and baths. In cases like this, no other data is considered in the Zestimate - not because the algorithm is flawed - but because we don't yet have the info.

When we do have more information, we use it. The last time I saw a correlation coefficient for all of our data fields, the most correlated field was actually "finished sq. ft.". Our algorithm mashes up multiple valuation approaches. This allows us to both produce fairly accurate Zestimates with little data but also to significantly tighten up Zestimates when there is more data to be considered.

As you've noted, even this data-starved approach can produce highly accurate Zestimates ... but accuracy will decrease in areas where homes are sparsely distributed (making comps difficult to choose), and where most houses are new (yielding little sales history with which to compute trends). All three data challenges seem to apply to Hampshire, so no matter how accurate their assessor is, our accuracy should be markedly better in Elgin than it is in Hampshire.

I agree that Real Estate professionals need to be equipped to discuss Zestimates with consumers. That's something I try to do on ZillowBlog, and your post has suggested a few more posts that I need to write. Thank you. I hope my comment helps to demystify Zestimates a bit further.

1:58 PM  
Blogger Kevin Boer said...

Makes sense that Zillow's approach would be different in different areas, depending on the quantity and quality of the information available.

Here in California, thanks to Proposition 13, homes are rarely re-assessed, so it's doubtful the official assessor's values would weigh as heavily in the Zestimate as they appear to in Lee's area.

5:22 PM  
Blogger David Gibbons said...

Kevin -

Good point; in CA our math gives very little importance to tax assessments for that reason. Issues like Prop. 13 are most often dealt with in our data intake processes. So, scrubbing data that should be ignored is an important first step to calculating Zestimates. We're getting better at identifying the "outliers" that should be scrubbed but if you ever see a house where bad data is clearly throwing off the estimate, please send it to us via our site feedback form.

6:19 PM  
Blogger conwayblue said...

My major problem with Zillow is that the company does not have to play on a level playing field with appraisers. Zillow is allowed to give free "value" estimates with no liability and no reporting or record-keeping requirements. Appraisers, on the other hand, are fully liable and have a huge burden of reporting and record-keeping requirements. Since Zillow is providing estimates of market value, it should be fully subject to USPAP regulations just like an appraiser is. If I had a website spitting out these estimates... In the end, they will be the death of the appraisal industry.

2:19 PM  
Blogger bowerymarc said...

I hope it does mean the death of the Appraisal industry, at least as it exists now. Here's an industry without consumer review and little oversight. How do you know if the guy you get is good, or even competent, and unbiased? Often they redline neighborhoods (see http://www.nhi.org/online/issues/93/redline.html ), which takes the heat off the banks, and they suffer far less oversight than banks, etc.
One should be suspicious of any trade where you have to pay (cash!) before you get the product.
Just happened to me, and I'm frustrated because there's nowhere to turn.
Zillow, Propertyshark, and sites like that are actually levelling the playing field for the players who really count - the homeowners.

3:55 PM  
Blogger Jim said...

bowerymarc,

Q: How do you know if the guy you get is good, or even competent, and unbiased?

A: Simple: appraisers have to go through anywhere from 1,000 to 2,000 hours of training and supervision just to become certified.

Comment: One should be suspicious of any trade where you have to pay (cash!) before you get the product.

A: If someone is unhappy with the truth, they stiff the appraiser. The problem is that if appraisers slant a report to suit your needs, they risk losing their certification and, in cases of fraud, time in federal prison.

Comment: Just happened to me, and I'm frustrated because there's nowhere to turn.

A: If you feel you've been a victim of an appraiser you can take it up with your lender or the bureau in your state that regulates appraisers. If you feel your "zestimate" on Zillow is harming your house value and you'd like it removed they'll tell you "Tough Luck"...

Jim

8:31 PM  
Blogger mark said...

Real Estate is like the elevator business, it has its ups and downs. This is a great real estate blog. I hope that you will keep it going. Hey, did you hear that Donald Trump is going to have all celebrities on his show next season. WOW!

God Bless!

Elmo
Real Estate Professional

7:07 AM  
Blogger Omar Cruz said...

I like this blog is fantastic, is really good written. Congratulation. Do you want to see something more? Read it...:Great investment opportunity in Costa Rica: beach real estate, condo, condos for sale. Visit us for more info at: http://www.jaco-bay.com

7:44 PM  
Blogger Mark said...

One thing I am curious about in the article is that the author used the sale price of each house as the comparison. In reality, the sale price and the value of a house can be two different things if the seller had to cover closing costs or other fees as part of the sale contract. In the author's examples, most of the Zillow values were less than the sale price. If only a few of those sales included the seller covering the closing costs, then the Zestimates may not have been that far off either.

Granted, as David G pointed out, the more house information zillow has, the more tools it can use to place a value on the house. My point is simply that even if zillow uses only the assessment information, it still appears to be reasonable for someone just looking for a ballpark figure on the value of their house.

9:14 AM  
Anonymous A.M. Harris said...

I read your article with interest until I got to "whala" which completely destroyed your credibility for me.

Why do Americans think it is cute to misspell foreign words we use on a regular basis?

Voila. My comment.
A.M.

9:21 AM  
