If we start from new loans to train the computer, we have no real test until lots of defaults happen. If, as mentioned above, only 2.5% of mortgages default in normal situations, it will take a long time to accumulate more observations than there are variables to look at. The machine can’t test itself until there are hundreds or thousands of defaults to compare, even assuming there is no special case like a financial crisis skewing the numbers. Our only real hope is that the defaults that do happen occur very early in the life of the mortgage, in the first 3-5 years or so, in which case we will probably have a good amount of data within a decade. I don’t know how long the average mortgage default takes to happen, so it might work, or it might not.
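To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The 2.5% default rate comes from the comment above; the variable count and cohort sizes are made up purely for illustration:

```python
# Rough illustration: how many loans must season before observed defaults
# outnumber the variables a model looks at? Numbers are assumptions, not data.
default_rate = 0.025      # assumed default rate in normal times (from the comment)
num_variables = 300       # assumed number of borrower/loan attributes (made up)

loans_needed = num_variables / default_rate
print(f"Loans needed for defaults to merely match the variable count: {loans_needed:,.0f}")

# Even then, most of those defaults only show up 3-5 years after origination,
# so a cohort originated today yields little usable signal for years.
for cohort_size in (10_000, 100_000, 1_000_000):
    expected_defaults = cohort_size * default_rate
    print(f"Cohort of {cohort_size:>9,} loans -> ~{expected_defaults:,.0f} expected defaults")
```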
Defaults do tend to occur early in the life of a mortgage. Over time, there is usually equity buildup due to paydown of principal and rising home values, so seasoned loans tend to perform well. This was true in 2007 and 2008: loans originated in 2003 or earlier were not prone to default. But the commenter’s points (read the whole thing) are still well taken.
With chess, a database of games is probably very representative of all of the circumstances that the computer is going to encounter. That is not true with mortgage lending.
I read recently that average credit scores are currently the highest they have ever been. Does that mean that making a loan right now is safer than it’s ever been? I doubt it. If conditions are unprecedented, then obviously they cannot be represented in the database.
Russ Roberts’ podcast with Rodney Brooks also elaborates on AI skepticism.
AI should be reporting the error bands on its discoveries. The error bands are priceable insurance indicators. So the mortgage bot gives its best guess and an insurance bot can auto-price the insurance.
What makes AI work is not sufficient data, but management of information flow under known, bounded uncertainty. It is mostly about quantifying what is missing from the data.
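As a loose illustration of that idea (none of this is the commenter’s actual proposal), here is a toy sketch in which a “mortgage bot” reports a default probability with a bootstrap error band and an “insurance bot” prices off the band. The data, the model, the loss-given-default figure, and the pricing rule are all invented for the example:

```python
# Toy sketch, not anyone's production system: report a default probability
# *with an error band*, then price insurance off the band, not the point guess.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic training data: two borrower features, roughly a 2.5% default rate.
X = rng.normal(size=(5_000, 2))
p_true = 1 / (1 + np.exp(-(-3.8 + 0.8 * X[:, 0] - 0.5 * X[:, 1])))
y = rng.binomial(1, p_true)

borrower = np.array([[0.5, -1.0]])  # the loan we want to price (made up)

# Bootstrap the fit to get an error band on the predicted default probability.
preds = []
for _ in range(200):
    idx = rng.integers(0, len(X), len(X))
    model = LogisticRegression().fit(X[idx], y[idx])
    preds.append(model.predict_proba(borrower)[0, 1])

lo, best, hi = np.percentile(preds, [5, 50, 95])
print(f"Mortgage bot: default probability ~{best:.3f} (90% band {lo:.3f}-{hi:.3f})")

# "Insurance bot": price off the pessimistic edge of the band.
loss_given_default = 0.4   # assumed fraction of the loan lost on default
loan_amount = 300_000      # assumed loan size
premium = hi * loss_given_default * loan_amount
print(f"Insurance bot: premium priced off the band's upper edge ~${premium:,.0f}")
```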
If we’re talking about neural networks under the banner of “AI”, their impressive recent achievements have come from one category of data: lots of examples, usually millions, each with a known outcome.
Example one, image recognition: millions of images, correctly classified by people, train the neural networks. The results are outstanding, but without the large volume of data and the “correct answers” this wouldn’t be possible.
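A tiny-scale sketch of the same point, using scikit-learn’s built-in 8x8 digit images as a stand-in for the millions of labeled photos real systems need:

```python
# The network only "learns" image recognition because every training example
# arrives paired with the correct label; remove the labels and there is
# nothing to fit to.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)          # images paired with correct answers
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
net.fit(X_train, y_train)                    # learning = fitting to labeled examples

print(f"Accuracy on held-out images: {net.score(X_test, y_test):.2%}")
```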
Example two, the game of Go: hundreds of thousands of high-level games were used to train AlphaGo, which then improved further by playing itself millions of times. Later DeepMind created AlphaZero, which wasn’t trained on any human games; it just played itself millions of times. But the rules are definite. The outcome is not in doubt at the end (count up the territory to determine the winner).
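A minimal sketch of why self-play works when the rules are definite. Tic-tac-toe stands in for Go here, and the “play” is random rather than learned, but the key feature is the same: the game itself supplies the correct answer at the end, so every position gets a label for free.

```python
# Self-play with definite rules: each finished game labels every position it
# passed through, with no human annotation required.
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return "draw" if " " not in board else None

def self_play_game():
    board, player, history = [" "] * 9, "X", []
    while winner(board) is None:
        move = random.choice([i for i, s in enumerate(board) if s == " "])
        board[move] = player
        history.append("".join(board))
        player = "O" if player == "X" else "X"
    return history, winner(board)   # definite outcome, read straight off the board

positions, result = self_play_game()
training_pairs = [(pos, result) for pos in positions]
print(f"Result: {result}; generated {len(training_pairs)} labeled positions")
```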
It’s hard to see how this translates into some major breakthrough in all areas of life and business. New work by DeepMind is impressive, this time for detecting cancers of different types. It beats top experts and can obviously lead to much cheaper medicine and better outcomes for humans. But it’s image recognition, trained on a large dataset with the correct answers attached. That’s how neural networks function.