Benefit-cost integrated assessment models (BC-IAMs) inform climate policy debates by quantifying the trade-offs between alternative greenhouse gas abatement options. They achieve this by coupling simplified models of the climate system to models of the global economy and the costs and benefits of climate policy. Although these models have provided valuable qualitative insights into the sensitivity of policy trade-offs to different ethical and empirical assumptions, they are increasingly being used to inform the selection of policies in the real world. To the extent that BC-IAMs are used as inputs to policy selection, our confidence in their quantitative outputs must depend on the empirical validity of their modeling assumptions. We have a degree of confidence in climate models both because they have been tested on historical data in hindcasting experiments and because the physical principles they are based on have been empirically confirmed in closely related applications. By contrast, the economic components of BC-IAMs often rely on untestable scenarios, or on structural models that are comparatively untested on relevant time scales. Where possible, an approach to model confirmation similar to that used in climate science could help to build confidence in the economic components of BC-IAMs, or focus attention on which components might need refinement for policy applications. We illustrate the potential benefits of model confirmation exercises by performing a long-run hindcasting experiment with one of the leading BC-IAMs. We show that its model of long-run economic growth, one of its most important economic components, had questionable predictive power over the 20th century.