Explainable AI for Property Valuation
Asking our automated valuation model (AVM) what it was thinking
Predicting a home price is often not enough.
Our customers want to better understand how we arrived at a conclusion, and whether they can trust that conclusion. Creating tools to provide this explanation is not only good for business; it has also opened up new ways for us to understand housing markets.
Introduction
At Clear Capital we spend a lot of time thinking about houses. What is a house worth? What attributes make it valuable? What does the market look like for a particular home? In this article I’d like to talk about three questions:
- What kinds of tools can we use to predict home prices?
- How do we explain a price prediction?
- Can we use that explanation to more deeply understand housing markets?
Predicting home prices
For illustration, we will be working with the California Housing Prices dataset (housing.csv) from Kaggle [2]. It represents home prices from the 1990s combined with key home features and census data; more details can be found in [1] and [2]. We will follow the analysis in [1] to load and minimally clean this data:
- Load and clean the data.
- Do a basic train/test split.
- Fit and benchmark models.
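A minimal sketch of this prep in R follows. The specifics (median imputation for the missing total_bedrooms values, an mpid row id, an 80/20 split, and the seed) are assumptions chosen for illustration rather than an exact reproduction of [1].

library(ranger)

# load the raw data and make the categorical column a factor
housing <- read.csv("housing.csv")
housing$ocean_proximity <- factor(housing$ocean_proximity)

# minimal cleaning: impute missing total_bedrooms and add a property id (mpid)
# that we can exclude from the model formulas later
housing$total_bedrooms[is.na(housing$total_bedrooms)] <-
  median(housing$total_bedrooms, na.rm = TRUE)
housing$mpid <- seq_len(nrow(housing))

# basic 80/20 train/test split
set.seed(42)
train_idx <- sample(nrow(housing), size = floor(0.8 * nrow(housing)))
train <- housing[train_idx, ]
test  <- housing[-train_idx, ]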
We will fit and benchmark two models: a simple (and easy to understand) linear regression, and a more complex random forest.
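The calls that produce the summaries below look roughly like this (the object names lm_fit and rf_fit are ours):

lm_fit <- lm(median_house_value ~ . - mpid, data = train)
summary(lm_fit)

rf_fit <- ranger(median_house_value ~ . - mpid, data = train)
rf_fit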
Linear regression:
Call:
lm(formula = median_house_value ~ . - mpid, data = train)
Residuals:
Min 1Q Median 3Q Max
-548496 -42689 -10835 28790 752060
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.248e+06 9.774e+04 -22.999 < 2e-16 ***
longitude -2.659e+04 1.132e+03 -23.485 < 2e-16 ***
latitude -2.529e+04 1.115e+03 -22.673 < 2e-16 ***
housing_median_age 1.041e+03 4.904e+01 21.222 < 2e-16 ***
total_rooms -4.688e+00 8.648e-01 -5.420 6.03e-08 ***
total_bedrooms 6.788e+01 6.546e+00 10.370 < 2e-16 ***
population -3.741e+01 1.198e+00 -31.218 < 2e-16 ***
households 7.606e+01 7.322e+00 10.389 < 2e-16 ***
median_income 3.864e+04 3.729e+02 103.611 < 2e-16 ***
ocean_proximityINLAND -3.909e+04 1.944e+03 -20.105 < 2e-16 ***
ocean_proximityISLAND 1.738e+05 3.452e+04 5.036 4.81e-07 ***
ocean_proximityNEAR BAY -3.698e+03 2.134e+03 -1.733 0.08307 .
ocean_proximityNEAR OCEAN 5.587e+03 1.741e+03 3.209 0.00134 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 68950 on 16499 degrees of freedom
Multiple R-squared: 0.6428, Adjusted R-squared: 0.6426
F-statistic: 2475 on 12 and 16499 DF, p-value: < 2.2e-16
Random forest [3]:
Ranger result
Call:
ranger(median_house_value ~ . - mpid, data = train)
Type: Regression
Number of trees: 500
Sample size: 16512
Number of independent variables: 9
Mtry: 3
Target node size: 5
Variable importance mode: none
Splitrule: variance
OOB prediction error (MSE): 2366342760
R squared (OOB): 0.8220686
When benchmarking how well a model prices properties, the AVM industry uses a few common metrics. These tend to be based on percentage error rather than the raw (dollar) error from the model.
- MdAPE: The median absolute percentage error of the predictions vs. the actual property prices.
- PPE10: The percentage of predictions that are within 10% of the actual property price.
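As a quick sketch (reusing the rf_fit and test objects from above), both metrics take only a few lines of R; the same pattern works for the linear model via predict(lm_fit, newdata = test).

# absolute percentage error of the random forest on the held-out test set
rf_pred <- predict(rf_fit, data = test)$predictions
ape <- abs(rf_pred - test$median_house_value) / test$median_house_value

mdape <- median(ape)        # MdAPE
ppe10 <- mean(ape <= 0.10)  # PPE10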
Benchmarks:
The naive linear regression model gives a poor fit. Linear models often struggle with housing data: the non-linear way features relate to price makes a naive linear model ineffective. Methods that can easily handle categorical variables and partition on variable combinations (such as latitude and longitude) do a lot better with less customization.
For housing data, top published results tend to be dominated by nonlinear techniques such as XGBoost, random forests, neural networks, and so on. The problem with these more complex techniques is that their predictions are harder to explain. For example, the random forest model above uses a default of 500 decision trees to reach a value [3]. Visualizing and understanding a prediction based on 500 trees is not easy! There is a better alternative.
Explaining random forest predictions
To explain random forest predictions, the key insight is that a random forest regression is equivalent to a weighted k-nearest neighbor prediction [4]. When we predict a value for a given property, we can look in the relevant leaves of the forest to see which training data points are being used for the prediction, and with what weight. In real estate terms, these training data points are the “comparable” properties (or “comps” for short). Let’s see how this works.
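Below is a minimal sketch of this comp extraction with ranger, reusing the train and test objects from above. We refit the forest with keep.inbag = TRUE so each tree's bootstrap counts are available; a training point's weight in one tree is its in-bag count in the subject's leaf divided by the leaf's total in-bag count, averaged over all trees.

# refit the forest, keeping the in-bag (bootstrap) counts for each tree
rf_fit <- ranger(median_house_value ~ . - mpid, data = train, keep.inbag = TRUE)
inbag <- simplify2array(rf_fit$inbag.counts)   # n_train x num.trees matrix

# terminal node ids for the training data and for the first test subject
subject <- test[1, ]
train_nodes <- predict(rf_fit, data = train,   type = "terminalNodes")$predictions
subj_nodes  <- predict(rf_fit, data = subject, type = "terminalNodes")$predictions

# accumulate each training point's weight across the trees
weights <- numeric(nrow(train))
for (b in seq_len(rf_fit$num.trees)) {
  in_leaf <- train_nodes[, b] == subj_nodes[1, b]
  leaf_counts <- inbag[, b] * in_leaf
  weights <- weights + leaf_counts / sum(leaf_counts)
}
weights <- weights / rf_fit$num.trees

# the comps: training points ranked by weight; their weighted mean price
# reproduces the forest's prediction for the subject
comps <- head(train[order(weights, decreasing = TRUE), ], 10)
sum(weights * train$median_house_value)   # ~ predict(rf_fit, data = subject)$predictions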
We examine the first subject in our test data set and see which properties would be used to predict its value. In the grid below, the first row is the subject; the remaining rows are the comps, along with the weight each receives in the random forest. (Yes, in this data set these are actually aggregate values; for discussion, we are pretending they are individual sales.)
When we look at the list of comps for our subject, they make sense and help us understand how the model arrived at a value. The top-ranked properties are very close in geographic location, age, and livable area (total_rooms). These are some of the most important attributes in valuing a home, known as the “golden characteristics.” Interestingly, almost half of the weight is assigned to the top property, which is nearly identical to the subject. This tells us that the valuation may be too dependent on that one sale, which looks a little overpriced. In our commercial AVM, when we make a prediction, we return a confidence score. But these comps give us much deeper insight into how the model arrived at a conclusion. Is the prediction heavily dependent on a few property sales? Are there outlier comps that may be distorting the value? Was the model able to find comparable home sales, or did it have to pull in relatively dissimilar homes? These are questions we can answer with a more detailed breakdown.
Explaining other models
In fact, this explainability idea is not limited to random forests. Any model that implies a local similarity score can be treated the same way. For example:
- Linear regression with coefficients βᵢ. We can define a coefficient-weighted L¹ distance between a subject X and a training data point Y via d(X, Y) = Σᵢ |βᵢ| · |Xᵢ − Yᵢ|, and rank training points by that distance (see the sketch after this list).
- Neural networks for regression. In general, these define an embedding space in the last layer, followed by a set of weights that forms the prediction. We can proceed as with linear regression, using those weights over the embedding space.
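For the linear model, the sketch referenced above might look like the following, reusing lm_fit, train, and test from earlier; the categorical ocean_proximity feature is ignored here for simplicity.

# coefficient-weighted L1 distance from the subject to every training point
betas <- coef(lm_fit)
num_vars <- intersect(names(betas), names(train))   # numeric predictors only

subject <- test[1, ]
diffs <- sweep(as.matrix(train[, num_vars]), 2, unlist(subject[num_vars]))
dists <- as.vector(abs(diffs) %*% abs(betas[num_vars]))

# training points most similar to the subject under the model's own metric
head(train[order(dists), ])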
There are a number of deep ties between machine learning methods and similarity metrics; [5] explores this topic more thoroughly.
Further applications
Comp selection: Once we have computed comps for a given subject, there are several interesting applications. For one thing, these comps can be used directly as part of a valuation review or appraisal. We can give these comps to our appraisers as candidate properties to look at when building a report. Hence, our model can act as a kind of “search engine” for similar properties.
Comp adjustments: Another application is that when comps are used in a formal appraisal, the prices are usually “adjusted” to account for various feature differences. For example, if a comparable property has a larger living area we would apply a negative adjustment when matching it to the subject [6]. Can we estimate and explain adjustments automatically by using the above technology? Yes we can! The result looks very much like what appraisers do when they find paired sales to estimate the value of various feature differences. We plan to describe this in a future article.
Competing sales index: A third application is that using these comps, we can more deeply understand the market for a given property. Specifically, price trends in a general area (zip or county) may not match what is happening for a particular home. For example, during COVID-19 “home shoppers are looking for more space, quieter neighborhoods, home offices, newer kitchens and access to the outdoors...” [qtd. Ratiu in 7]. As a result, some types of properties have strongly appreciated over the last year, while others (with less space, worse kitchens, ...) have not. If we build a model over a longer time period we can pull the historical comps, using the above technique, to find relevant sales through time. These comparables then form a competing sales index for a particular home.
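As a rough, hypothetical sketch: suppose the longer-horizon model gives us a comps_history data frame with sale_date, sale_price, and weight columns (assumed fields, not part of the Kaggle data set). A simple competing sales index is then a weight-adjusted typical price per quarter.

# group the weighted historical comps by quarter and take a weighted mean price
comps_history$quarter <- cut(comps_history$sale_date, breaks = "quarter")
index <- sapply(split(comps_history, comps_history$quarter),
                function(q) weighted.mean(q$sale_price, w = q$weight))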
Conclusion
In our industry, there has been a lot of discussion about when you should use an AVM for pricing a home, and when a full appraisal is needed. We think this is a false dichotomy. In fact, an AVM can be more than just a single price. It can also be used as a tool for humans to more deeply understand the market for a home and help them reach a value conclusion. We think that instead of AVMs replacing appraisers, these tools can be used to help appraisers by gathering relevant data and explaining any conclusions in a transparent way.
Acknowledgments
I’d like to thank the Clear Capital research team and especially Matt McAnear for helpful comments and R code improvements.
References
- [1] Blythe, Janda, Pettibone. “Predicting Housing Prices — Data Analysis Project.”
- [2] Kaggle. “California Housing Prices.” http://www.kaggle.com/camnugent/california-housing-prices
- [3] Wright, Wagner, Probst. “A Fast Implementation of Random Forests.”
- [4] Hastie, Tibshirani, Friedman. “The Elements of Statistical Learning.” Second Edition, pp. 587–601.
- [5] Chen, Garcia, Gupta, et al. “Similarity-based Classification: Concepts and Algorithms.” Journal of Machine Learning Research 10 (2009): 747–776.
- [6] Cleveland Appraisal Blog. “How Appraisal Adjustments Work.”
- [7] Taylor. “COVID-19 Has Changed The Housing Market Forever. Here’s Where Americans Are Moving (And Why).” Forbes, Oct. 2020.