All hail the null hypothesis

From a couple of years ago, by Jon Baron of the Arnold Foundation (no relation).

Business: Of 13,000 RCTs conducted by Google and Microsoft to evaluate new products or strategies in recent years, 80 to 90 percent have reportedly found no significant effects.[iv]

Medicine: Reviews in different fields of medicine have found that 50 to 80 percent of positive results in initial clinical studies are overturned in subsequent, more definitive RCTs.[v] Thus, even in cases where initial studies—such as comparison-group designs or small RCTs—show promise, the findings usually do not hold up in more rigorous testing.

Education: Of the 90 educational interventions evaluated in RCTs commissioned by the Institute of Education Sciences and reporting findings between 2002 and 2013, close to 90 percent were found to produce weak or no positive effects.[vi]

Employment/training: In Department of Labor-commissioned RCTs that reported findings between 1992 and 2013, about 75 percent of tested interventions were found to have weak or no positive effects.[vii]

Pointer from Michael Goldstein.

Re-framing David Cutler’s proposal

David Cutler writes,

Administrative costs in the health-care system are a classic public good. Payers and providers may together agree that standardizing billing codes and quality reporting would be valuable, but no single actor has an incentive to pay for standardization when others will benefit as well. For example, if insurer A chooses to harmonize its policies with insurer B, that lowers administrative costs across the board and thus fees that all insurers collectively need to pay. However, insurer A will not take these cost savings to other insurers into account. As a result, insurer A will be discouraged from investing in harmonization.

Pointer from Tyler Cowen.

As if there were no incentives anywhere for the private sector to solve this problem. But let me re-frame this from the perspective of an entrepreneur making a pitch to a venture capitalist.

“Doctors and hospitals have a big pain point in that their staff needs to fill out different claim forms for different insurance companies. CutlerMedForms has the solution. We will provide a software application that allows administrative staff to fill out a single, easy-to-understand on-line form. They simply check which insurance payer should receive the bill, and our software fills out that insurance company’s form with the proper insurance codes. We estimate that providers can save $X billion in administrative costs using our software, making this a large profit opportunity for CutlerMedForms.”

Someone reading this might be skeptical that the profit opportunity actually exists. By the same token, one should also be skeptical that the “classic public good” really exists.

Why not more women CEOs?

From an article in the WSJ:

Over the past year, 307 companies in the Russell 3000 Index appointed new CEOs, according to Equilar. Only 26 of those were women—and 17 female CEOs stepped down or were ousted during that time.

We are supposed to believe that the scarcity of female CEOs indicates that business discriminates against women, or at least is unfair to them. I doubt that this is the case.

1. I have an alternative hypothesis, which is that the CEO job in a large company requires, among other talents, facility in working with abstract systems. Feel free to re-read my essay on how corporations are managed.

the corporate CEO, operating with a limited information set that arrives indirectly, must use more abstract thinking. We may think of the CEO as trying to navigate in a confusing forest using only little scraps of a map. The CEO operates with a theory of the business and fits those little map scraps into the theory.

This sort of thinking is sometimes called “systemizing.” You find it more often in men than in women. Women tend to be better at empathizing, which is more helpful in small businesses or in functions like human resources. Of course, this does not mean that any given man is better suited to being a CEO than any given woman. But perhaps, just as in chess, you can have unequal gender outcomes at the top that are not the result of discrimination.

2. If women do face discrimination, then there ought to be a profit opportunity in selecting women to be CEOs. Just as the first baseball teams willing to break the color barrier were more successful than the laggards, so companies that shatter the glass ceiling should have an advantage.

I don’t know what the data say about comparative CEO performance. But the fact that almost as many women exited CEO jobs as entered CEO jobs suggests to me that corporations that hire women as CEOs don’t get an automatic windfall.
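
For what it is worth, the arithmetic behind that last sentence can be sketched in a few lines. This uses only the three figures in the WSJ excerpt above. The variable names are mine, and I am assuming that the 17 exits are drawn from all sitting female CEOs in the index, not just the newly appointed ones.

```python
# Figures quoted from the WSJ excerpt (Equilar, Russell 3000, past year).
new_ceos_total = 307      # all new CEO appointments
new_ceos_women = 26       # new appointments who were women
women_ceo_exits = 17      # female CEOs who stepped down or were ousted

# Share of new appointments that went to women.
share_women = new_ceos_women / new_ceos_total

# Net change in the number of female CEOs over the year
# (assumes exits and entries are counted over the same period).
net_change = new_ceos_women - women_ceo_exits

# Exits as a fraction of entries: how much of the inflow was offset.
exit_ratio = women_ceo_exits / new_ceos_women

print(f"Women as share of new CEOs: {share_women:.1%}")
print(f"Net change in female CEOs: {net_change:+d}")
print(f"Exits per entry: {exit_ratio:.2f}")
```

These numbers say nothing about how the women performed as CEOs. They only show that a large part of the inflow was offset by outflow.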

A sub-Dunbar business mindset

I enjoyed listening to Erik Torenberg interview entrepreneur Zack Kanter. I had a hard time following the software jargon, and I would welcome explanation, because it sounds as if there were some important ideas there. I was interested in the speculation that software applications that are easy to build have been built, and what remains are more challenging applications. I found that hypothesis difficult to evaluate.

But what struck me most is that Kanter thinks he has discovered some great insight: that he can run his company without meetings, product plans, or other formalities, and that a few great engineers are better than a lot of good engineers. But he is running a company with a number of workers way under the Dunbar number of 150. If you can build a functioning business with that small a team, then good for you. But some businesses require getting large organizations to cooperate or buy, and that means you need a sales force. Some businesses operate in or near highly regulated industries, and that means you need many lawyers and lobbyists.

Once your business requires more than about 150 people to operate, tell me how well your informal management methods are working.

Evaluating organizational effectiveness

Tyler Cowen writes,

The US funds more science research than any other country — about $35 billion per year on the NIH and $8 billion per year on the NSF. How exactly do these institutions work? How have they changed over time and have these changes been for good or bad? Based on what we now know, how might we better structure the NIH and NSF? What experiments should we run or what kind of studies should we perform?

This is the first in a long and varied list of areas he thinks are worthy of further study. One more example:

Indonesia is a large, populous middle-income country. It faces no major near-term security threats. It has a small manufacturing base and no major non-commodity export sectors. What is the best non-bureaucratic 10 page economic development briefing document and set of prescriptions that one could write for Indonesia’s president? For Indonesia, substitute Philippines, Chile, or Morocco.

Many of the topics in Tyler’s list involve attempts to improve or evaluate organizational effectiveness. I would say that in evaluating an organization, one should look for the common flaws listed below. Give high marks to organizations that are able to avoid these pitfalls.

1. A good mission statement will serve to narrow the purpose of an organization. It will remind everyone what the organization will not attempt to do. In badly-run organizations, the scope of the organization is unclear.

2. The organization should have a formal planning process. About once a year, or once every other year, the organization should evaluate past performance and set future goals. Middle management as well as top management should be involved in this planning process, in order to try to achieve alignment between strategic goals and departmental activities. In badly-run organizations, departments run on auto-pilot without any strategic direction.

3. Borrowing terminology from Morrisey et al., the planning process should include Key Results Areas and Indicators of Performance. For example, a city could have a Key Results Area that is reducing traffic congestion, and an Indicator of Performance that is the number of workers who are able to commute during rush hour in less than 30 minutes. Middle managers strongly resist KRAs and IOPs. Instead, they prefer to be measured on the basis of activities–how many traffic lights they installed, or how many potholes they filled. A grant-making organization that measures how many grants get approved rather than anything related to the results from making those grants is operating on auto-pilot. In badly-run organizations, departments do not articulate meaningful KRAs and IOPs.

4. Organizations need to periodically adjust their incentive systems. Top management wants maximum effort with minimum outlays. Employees and other recipients of funds want the opposite. Over time, the compensation system degrades, due to changes in organizational goals and due to recipients learning how to game the system. Badly-run organizations leave ineffective compensation systems in place.

5. Some departments or projects falter. Can the floundering projects or departments be put back on track at a reasonable cost? If not, then they probably should be shut down. Badly-run organizations are unwilling or unable to identify and deal with low-achieving activities.

6. Organizations need periodic adaptation, including restructuring. The environment changes–think of the effect of new computer and communications technologies on many areas. Badly-run organizations fail to adapt to changes.

My guess is that you could use this framework to evaluate many of the institutions mentioned in Tyler’s list. But in the case of government agencies or non-profits, will such evaluation make a difference?

Essay backup: Reid Hoffman and Patrick Collison, annotated

Below is the last of the essays from Medium that I need to back up. I have found Medium to be a major disappointment in every respect. My own essays there got zero promotion, as far as I can tell–they would have been as widely read if I had written them as blog posts.

The essays that are promoted to me by Medium (I am not clear on how they mix human curation with an algorithm) in their daily letter are almost invariably insipid. My impression is that 95 percent of the writers on the site cater only to the dogmatic and extreme left.

In the days when the blogosphere was the main form of self-published writing on the Internet, I think that the decentralized editorial curation process worked pretty well. With the demise of the blogosphere, what are we left with? Twitter? Ugh.

A few months ago, a commenter on this site predicted that the intellectual failure of Medium would be followed by financial failure. That possibility alarmed me because, unlike all of my other essays, my Medium essays were not originally written on my computer. So I have no backups should Medium suddenly shut down. Hence, these backups.

I hope that Medium continues and that my essays stay there, where they are properly formatted. But going forward, I will place my essays elsewhere, primarily on this blog.


On wicked problems and public policy

Following a trail from this comment, I got to a 1973 paper by Horst W. J. Rittel and Melvin M. Webber.

The problems that scientists and engineers have usually focused upon are mostly “tame” or “benign” ones. . . . the mission is clear. It is clear, in turn, whether or not the problems have been solved.

Wicked problems, in contrast, have neither of these clarifying traits; and they include nearly all public policy issues–whether the question concerns the location of a freeway, the adjustment of a tax rate, the modification of school curricula, or the confrontation of crime.

The paper is filled with insights, such as

In the sciences and in fields like mathematics, chess, puzzle-solving or mechanical engineering design, the problem-solver can try various runs without penalty. Whatever his outcome on these individual experimental runs, it doesn’t matter much to the subject-system or to the course of societal affairs. A lost chess game is seldom consequential for other chess games or for non-chess-players.

With wicked planning problems, however, every implemented solution is consequential. It leaves “traces” that cannot be undone. One cannot build a freeway to see how it works, and then easily correct it after unsatisfactory performance. Large public-works are effectively irreversible, and the consequences they generate have long half-lives.

The paper is also notable for the way in which it describes–in 1973–the fallibility of experts relative to technocratic expectations.