Comparing Zillow, Redfin, and Price Estimates in Time

A few months ago I built a time series of a house’s price estimate from Zillow and Redfin. But there were some problems:

When another nearby house put out a sign saying “coming soon”, I wrote a script to automate the scraping and collected a much denser time series from Zillow, Redfin, and a third site. Let’s see what we can learn with more complete data!

You can find the Jupyter notebook used to perform this analysis here (rendered on GitHub). The data can be found here.

Data Collection

I wrote a script to download the entire page for the specific house from each of the three sites. I ran it on my Raspberry Pi every day using cron. I parsed the HTML using Python and wrote the cleaned data to JSON. That parsing notebook can be found here (rendered on GitHub). I won’t include the raw data; you will have to collect some yourself.
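The download step could look roughly like the sketch below. This is not the actual script: the URLs, the `SITES` mapping, the `raw_pages` directory name, and the function names are all placeholders chosen for illustration, and some sites may need different request headers.

```python
import datetime
import pathlib
import urllib.request

# Hypothetical listing URLs; substitute the real page for each site.
SITES = {
    "zillow": "https://example.com/zillow-listing",
    "redfin": "https://example.com/redfin-listing",
}


def fetch(url):
    """Download a page and return its raw bytes."""
    # A browser-like User-Agent is set because some sites may reject the default one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def save_snapshots(sites, out_dir, today=None, fetcher=fetch):
    """Save one dated HTML file per site and return the paths written."""
    today = today or datetime.date.today()
    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, url in sites.items():
        # One file per site per day, e.g. zillow_2021-03-01.html
        path = out_dir / f"{name}_{today.isoformat()}.html"
        path.write_bytes(fetcher(url))
        written.append(path)
    return written
```

Calling `save_snapshots(SITES, "raw_pages")` at the bottom of the script and adding a crontab entry such as `0 6 * * * python3 /home/pi/collect.py` (the path is hypothetical) would run it once a day at 6 am. Keeping the whole page means the parsing can be redone later if a site changes its layout.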


Here is a plot comparing the estimates for the sales price from Zillow and Redfin in time:

A plot showing three time series: the estimated value of a house according to Zillow, the estimate for the same house from Redfin, and the estimate from a third site.

The daily price estimates from Redfin are shown using red circles, the estimates from Zillow using blue triangles, and the estimates from the third site using purple diamonds.


I wrote a script to ensure that I would get data for every day, but as you can see there are still many missing points. The third site is the most frustrating! As soon as the house was actually listed on the market, it stopped providing estimates! It starts estimating again only after the sale price is posted. This defeats the entire point! The estimate is most important when the house is actually for sale, and the site just punts completely. Embarrassing.
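One way to list the missing days is to compare the dates that have an estimate against the full calendar range. The sketch below uses made-up dates, and `missing_days` is a name introduced here for illustration, not a function from the analysis notebook.

```python
import datetime


def missing_days(observed, start, end):
    """Return every date in [start, end] that has no observation."""
    observed = set(observed)
    n_days = (end - start).days + 1
    all_days = (start + datetime.timedelta(days=i) for i in range(n_days))
    return [day for day in all_days if day not in observed]


# Made-up example: estimates exist for three of five days.
have = [
    datetime.date(2021, 3, 1),
    datetime.date(2021, 3, 2),
    datetime.date(2021, 3, 5),
]
gaps = missing_days(have, datetime.date(2021, 3, 1), datetime.date(2021, 3, 5))
print(gaps)  # the two dates with no estimate: March 3 and March 4
```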

Still, we can see that, when the house is not for sale, they update their estimate roughly every two weeks. Their initial estimates are not too bad, just about 6% below the final sale price.
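For reference, the percentages quoted in this post are errors relative to the final sale price. A small sketch of that calculation follows; the dollar amounts are invented for illustration and are not the actual prices of this house.

```python
def percent_error(estimate, sale_price):
    """Signed error of an estimate relative to the sale price, in percent.

    Negative means the estimate was below the sale price.
    """
    return 100.0 * (estimate - sale_price) / sale_price


# Invented numbers: an estimate of $470,000 against a $500,000 sale.
error = percent_error(470_000, 500_000)
print(error)  # -6.0, i.e. 6% low
```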


Zillow is similarly missing estimates for most of the time when the home is actually for sale. Their page shows “Zestimate: None” with an error explanation blaming county transactional data [1]. I am sure that’s true, but I am unimpressed. Dealing with missing data is a key part of building a robust machine learning model.

Zillow’s model updates more frequently than the third site’s. It slowly climbs until it abruptly stops estimating a few days after the listing is posted. This suggests that Zillow uses a separate model for homes currently on the market, and that this model requires more and different data than the off-the-market model.

The Zillow estimate does return at the end of the pending period but… I do not have anything nice to say about it. Just look at that variance!

Zillow underestimates the final price by about 10%.


Redfin is the only company that keeps posting estimates once the house is actually for sale! Their pre-listing estimate is almost exactly right, but once the house is listed their on-the-market model overestimates by about 10%.

Before the listing is posted, Redfin updates its estimate roughly weekly, and, like Zillow, it takes 5 days to switch to the on-the-market model. This suggests that both sites get their data indicating the house is for sale from the same source. After the listing, the model updates daily.
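Update frequency can be read off a daily series by finding the days on which the estimate changed and measuring the gaps between them. The sketch below assumes that any change in value counts as a model update; the function name and the data are made up for illustration.

```python
import datetime


def days_between_updates(series):
    """Given (date, estimate) pairs, return the gaps in days between changes.

    Assumption: an "update" is any day whose estimate differs from the
    previous observation.
    """
    series = sorted(series)
    change_dates = [
        day
        for (_, prev), (day, value) in zip(series, series[1:])
        if value != prev
    ]
    return [(b - a).days for a, b in zip(change_dates, change_dates[1:])]


# Made-up daily series: the value changes on day 8 and again on day 15.
start = datetime.date(2021, 3, 1)
values = [100] * 7 + [101] * 7 + [102]
daily = [(start + datetime.timedelta(days=i), v) for i, v in enumerate(values)]
update_gaps = days_between_updates(daily)
print(update_gaps)  # [7], a weekly update in this toy example
```

Missing days would make the measured gaps less reliable, so this kind of check is most meaningful over stretches where the series is complete.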

The on-the-market model trends upwards at first and then stabilizes after the house is pending, but because the time between listing and pending was so short, it is impossible to tell whether the stabilization was due to the listing going pending.


With the denser data and all three sites to compare, I conclude the following:

  1. The error message:

    Where’s the Zestimate?

    County transactional data for this home is insufficient so we cannot calculate a Zestimate. We are adding data all the time, so be sure to come back.