WHERE’S THE ROI?

IS THERE A DATA PARADOX?

In 1987, Robert Solow[1] famously quipped, “You can see the computer age everywhere but in the productivity statistics.”[2]  His observation was that, despite the immense investments in computer and communications technology, official productivity growth had slowed considerably.  More generally, and paraphrasing a bit, the investments were obvious, but not necessarily the return on those investments (ROI).  Now, more than 50 years into Moore’s law (loosely put, that the power of computers – more precisely, the number of components per integrated circuit – doubles every one to two years), you might think this “productivity paradox” had been put to rest.

It was resolved, at least for a time.  Productivity growth picked up in the late 1990s, and many people pointed out that major technological advances take time and learning before their benefits are realized.  Parallels with the diffusion of electricity and the steam engine appear to support this.  Further, measured productivity may not reflect many of the benefits of new technologies, as they permit new goods and services to be created.  More recent studies, and the sluggishness of productivity gains, have reinvigorated the debate.  For example, Acemoglu et al. published a study, “Return of the Solow Paradox? IT, Productivity and Employment in US Manufacturing.”[3]  This view has been popularized by Tyler Cowen in his book The Great Stagnation.[4]  Work at the Federal Reserve System specifically addresses measurement issues with productivity, concluding that “We find little evidence that the slowdown arises from growing mismeasurement of the gains from innovation in IT-related goods and services.”[5]

I will briefly reexamine the arguments in the context of my subject here:  is there a data paradox, in which we see big data everywhere but in the economic statistics?  In many ways, data technologies (big data, data science, analytics, artificial intelligence, etc.) are extensions of the technological changes associated with computers, IT, and communications.  Virtually every think tank and government agency has pronounced artificial intelligence (and associated data technologies) a primary engine for economic growth.[6]  For technologies with such far-reaching effects, shouldn’t we be seeing these effects in productivity or other economic statistics?[7]

There are several reasons why productivity data may not reflect fundamental changes – and these arguments were raised in response to the 1987 productivity paradox.  First, it takes time for people and organizations to learn how to utilize new technologies.  It doesn’t happen overnight.  Second, the way we measure productivity has a difficult time accounting for changes in the quality of goods and services, or the development of new products and services.  Think about using your smartphone when you are lost, rather than asking a stranger for directions.  How would this show up in the economic data?[8]  Third, as the economy continues to evolve from manufacturing to services, productivity gains become more difficult – this is the low-hanging fruit metaphor.  It is getting harder and harder to create productivity gains in sectors that are labor intensive, e.g., health services, education, personal fitness, etc.

These same considerations apply to data.  It takes time to develop access to data and then learn how to use it and benefit from it.  Entirely new uses are being developed (e.g., AI potentially will cause our transportation infrastructure to be reinvented with large scale development of autonomous vehicles).  Initial uses of data will target the low-hanging fruit (e.g., some financial services), leaving the harder areas (e.g., medical services) to experience the diminishing marginal returns to investments in data.  Experience with electronic health records (EHRs) is instructive.  The potential improvements in health care, and potential cost efficiencies, are undeniable.  But the experience to date is quite different.  Installation of EHRs, entry and maintenance of the data, and training of health care professionals have, arguably, increased the cost of health care services at this point in time.

Still, for all the hype around Big Data, we should be seeing some evidence in the productivity statistics.  I’ve looked for it and will show what I’ve found.  First, a short digression on how productivity is measured.  There is a large body of research into the measurement of productivity.  The core concept for any productivity measure is simple:  how much output is obtained for each unit of input used?  The details get complex quickly, since both outputs and inputs are more than just a physical count of the number of units produced or consumed.  Are all automobiles equal, so that we can just count how many are produced?  Are all workers the same, so that we only need to count how many hours of work were involved in producing those automobiles?

At the risk of oversimplification, there are two major types of measurements:  labor productivity, which measures the amount of output per unit of labor; and multifactor productivity, which measures the output per unit of combined inputs (usually labor, capital, and perhaps energy and information property).

Output is difficult to measure.  Services are harder to quantify than products, due to difficulties in measuring output quality.  Usually, output is measured in dollars of value rather than physical units, in order to partially address these issues.  Measuring output in dollars also requires that we account for changes in prices, so a price index is used to convert dollar values into a measure of real output.  Output is then measured for industries and sectors of the economy (the Bureau of Labor Statistics primarily uses NAICS codes, which come at various levels of disaggregation – “industries” such as automobiles or automobile parts).  Measuring outputs and prices, and accounting for quality changes, is inherently problematic.  This accounts for the many differing analyses regarding the (in)accuracy of official government statistics on productivity.
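
In stylized form (official statistics use chained index numbers and more elaborate quality adjustments, so this is only a sketch of the idea):

```latex
% Stylized measurement, ignoring chaining and quality adjustment
\text{Real output}_t \;=\; \frac{\text{Nominal (dollar) output}_t}{\text{Price index}_t}
\qquad
\text{Labor productivity}_t \;=\; \frac{\text{Real output}_t}{\text{Hours worked}_t}
```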

The problems continue with the measurement of inputs.  Labor is usually measured as the number of workers or the number of hours of work.  However, not all labor is equal, so labor productivity measures do not reflect whether the hours are worked by a minimum wage worker or the CEO (and I’m not saying which is more productive).  For multifactor productivity measurement, some adjustment is made for the education and experience levels of workers, rather than just using the number of hours worked.  Measuring capital has its own challenges, but generally the idea is to measure how much of the capital stock is actually used in production during a given time period (e.g., the rental price to use a crane for the 1,000 hours it was used during the year).  Then, for some measures, material inputs (e.g., commodities), energy, and services (e.g., legal services) are included.  The result is a measure of multifactor productivity.
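
In practice, multifactor productivity is usually reported in growth-rate form:  it is the growth in output left over after accounting for growth in the measured inputs, weighted by their cost shares.  A stylized version of that growth-accounting identity is:

```latex
% Stylized growth accounting: multifactor productivity (MFP) growth is the residual
\Delta \ln \text{MFP}_t \;=\; \Delta \ln Y_t \;-\; s_L \,\Delta \ln L_t \;-\; s_K \,\Delta \ln K_t \;-\; s_M \,\Delta \ln M_t
```

where Y is real output, L is labor input, K is capital services, M stands in for the other measured inputs (energy, materials, purchased services), and the s terms are cost shares.  Whatever output growth is not explained by measured inputs lands in the residual, which is one reason the measure is so sensitive to how outputs and inputs are defined.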

With that overly simplified introduction, what does the productivity data look like?  Here are broad measures of annual productivity changes (private nonfarm sector) over the past 20 years:

The Bureau of Labor Statistics provides more disaggregated industry productivity data, although only labor productivity is generally available.

A few industries are highlighted here:  Computer manufacturing and electronic shopping show large productivity gains in the 1990s, with wireless telecommunications following after that, while hospitals (blue) show little or no gain, and the relatively flat “all other industries” line (black) closely matches the previous graph.  In general, labor productivity growth is modest, but the few industries with large gains shown here make it tempting to conclude that data technologies are showing up in these IT-intensive industries.  However, I would note that accounting services and gambling do not show similar gains, closely matching the overall trend in the other industries.  Also, note the particularly slow to nonexistent gains in the hospital sector, where IT and data technologies have been invested in heavily (even Personal care services show larger labor productivity gains than hospitals).


Given all the measurement difficulties with productivity, perhaps these broad-brush pictures are not capable of revealing the impacts of data technologies.  So, let’s turn to some more targeted analysis – instead of productivity, let’s look at areas where the availability and analysis of data has been particularly advanced.  In financial services, predictive modeling is widely used, and we might expect that financial institutions would have a good idea of whom to lend money to and of the risks of default on various types of loans.

The Federal Reserve Board publishes charge-off rates (default rates) for various types of loans issued by commercial banks.  Here is the seasonally adjusted data going back to the mid-1980s:
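
(These series are publicly available through FRED, the St. Louis Fed’s data service.  A minimal sketch of pulling a couple of them with pandas_datareader follows; the series codes shown are assumptions and should be verified against the FRED catalog.)

```python
# Sketch: pull charge-off rates for commercial banks from FRED and plot them.
# The FRED series IDs below are assumptions -- verify them before relying on this.
import matplotlib.pyplot as plt
from pandas_datareader import data as pdr

series = {
    "CORCCACBS": "Credit cards",  # assumed ID: charge-off rate on credit card loans
    "CORALACBS": "All loans",     # assumed ID: charge-off rate on all loans
}

df = pdr.DataReader(list(series.keys()), "fred", start="1985-01-01")
df = df.rename(columns=series)

df.plot(title="Charge-off rates, all commercial banks (percent)")
plt.ylabel("Percent")
plt.tight_layout()
plt.show()
```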

Agricultural loans were very risky (high default rates) in the mid-1980s.  Consumer loans and credit cards are much riskier than the other types of loans and the spikes during the recessionary periods are apparent.  What is not apparent is any systematic reduction in default rates.  Despite the prevalent use of sophisticated predictive modeling, why haven’t default rates been reduced?

Several reasons are possible.  Perhaps more loans are being made to more people, so that the relatively static default rate reflects the increased risk associated with this increased loan volume.  It may be that interest rates have been increasing to reflect increased risk of default.  I looked at both loan volumes and interest rates but did not find any evidence of this.  Perhaps overall industry performance has not improved because the use of analytics has been primarily a competitive weapon:  every financial institution invests in data technology in order to perform better than its competitors, with little overall improvement for the whole industry.  Analytics may also have decreased costs in the industry – this would permit more loans (and even riskier loans) to be made without necessarily increasing interest rates.  So, the lack of declining default rates is not conclusive evidence of anything.

The difficulties of finding evidence of the value of data technologies for loan performance may be related to focusing on default rates, since default rates will naturally vary for different levels of risk.  One of the primary benefits of predictive modeling is supposed to be that it enables more precise evaluation of risk.  If so, interest rates should vary with risk more precisely due to more widespread and sophisticated data modeling.  Data on interest rates by risk class are hard to find – publicly, at least.  One commercial provider (creditcards.com) does show interest rate trends for different types of credit cards, from 2007 to the present.  The graph on the left shows the interest rates for people with bad credit (high credit risk) compared with the average interest rate on credit cards, and the graph on the right shows business credit cards (presumably low risk):

The differentiation of rates according to risk is clear.  However, the differential appears to be fairly constant since around 2010.  As big data has become more readily available, and as predictive modeling techniques have advanced, we might expect to see these differentials widen.  We do not, but perhaps more granular data on credit cards issued to particular individuals is needed to see this.  Unfortunately, that data is proprietary.[9]

Another area where we might expect to see tangible benefits of data technologies is recommendation systems.  These are data science applications that recommend products to consumers on the basis of past choices that they and others have made.  We are all familiar with these applications, from product recommendations to movie ratings.  Amazon provides data on ratings for a variety of products.  I looked at the largest category – books – which was Amazon’s original business line.  The data contain over 22 million book ratings spanning a 20-year period.  I would expect the average book rating to increase over time if these recommendation systems are working as intended – readers should be drawn towards books with higher ratings.  Here are the average book rating and the variability of ratings (measured by their standard deviation) over time:

Improved ratings are again not apparent.  The standard deviation of the book ratings by year does show a decrease over the last 10 years in the data.  This is consistent with ratings becoming more accurate over time so that new reviews don’t deviate as much from older reviews:[10]
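
(The yearly summaries behind these charts are straightforward to reproduce.  A minimal pandas sketch is shown below; the file name and column names are assumptions about the layout of the raw ratings file, not a description of the actual dataset.)

```python
# Sketch: average book rating and rating variability by year.
# The file name and column names are assumptions about the raw data layout.
import pandas as pd

ratings = pd.read_csv(
    "book_ratings.csv",
    names=["user_id", "book_id", "rating", "timestamp"],  # assumed columns
)
ratings["year"] = pd.to_datetime(ratings["timestamp"], unit="s").dt.year

by_year = ratings.groupby("year")["rating"].agg(["mean", "std", "count"])
print(by_year)
```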

But it is complicated to unravel the multitude of things that can influence ratings over time.  It is possible that many more people are using Amazon (the number of ratings has increased significantly) and that they are reading more widely, including books that are not recommended for them.  Further, the distribution of ratings is highly skewed:  there are over 8 million different books in this data, and almost half of them received only a single review (possibly written by the author?).  Perhaps a more meaningful analysis would focus on books with a large number of reviews spread over a number of years.  I selected the top 10% of books with the most reviews, and then looked only at books with at least 10 years of reviews.  The following heatmap shows these 37,848 books organized by the year of their first review and the average rating they received in each of those years:

Darker regions reflect higher average ratings.  The dark diagonal boxes at the bottom show that these books receive high ratings in their first year.  As you move upwards for any year, you see that ratings drop off but then increase in the last few years.  Perhaps this is an indication that the ratings are steering people towards books that they are likely to like.  This is hardly a definitive analysis, however, and it does not provide compelling evidence that this recommender technology is working as intended.
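
(A heatmap along these lines can be built by pivoting the ratings on the year of each book’s first review against the year of each review.  The sketch below reuses the assumed columns from the previous snippet and, for brevity, omits the top-10% and 10-years-of-reviews filters described above.)

```python
# Sketch: average rating by (year of a book's first review) x (year of review).
# Reuses the `ratings` DataFrame from the previous sketch (assumed columns);
# the top-10%-most-reviewed and at-least-10-years filters are omitted here.
import matplotlib.pyplot as plt

first_year = ratings.groupby("book_id")["year"].min().rename("first_year")
ratings = ratings.join(first_year, on="book_id")

heat = ratings.pivot_table(
    index="first_year", columns="year", values="rating", aggfunc="mean"
)

plt.imshow(heat, origin="lower", aspect="auto", cmap="Blues")
plt.xticks(range(len(heat.columns)), heat.columns, rotation=90)
plt.yticks(range(len(heat.index)), heat.index)
plt.xlabel("Year of review")
plt.ylabel("Year of first review")
plt.colorbar(label="Average rating")
plt.show()
```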

But the Amazon data is ratings data, not recommendations data.  In other words, Amazon shows how past readers rated these books, but these are not personalized recommendations.  A recommendation system uses information about past reviewers and each individual to make personalized predictions about how an item will be rated.  These are especially data-rich applications of big data technologies.  Unfortunately, most data on recommendation systems is proprietary, due to the potential commercial value embodied in such systems.  The best known publicly available data was the data released for the Netflix prize competition.  That million-dollar prize competition tasked competitors with achieving a 10% improvement in the accuracy of movie-rating predictions, where accuracy was measured by how close the predicted rating was to the actual rating the viewer gave the movie (specifically, by the root mean squared error).  The data and the task are enormously complex, and you can find detailed analyses elsewhere.[11]  The Netflix competition winners achieved an accuracy that, on average, missed the actual movie ratings by approximately 0.85 on a 5-point scale.  In other words, the system might have predicted a viewer would give a movie 4 stars when, in fact, the viewer rated it with 3 stars (0.85 means it was on average a bit better than that).  This is quite an achievement for such a complicated task, but the degree of accuracy does beg the question of how much such applications will move the needle on productivity measures.
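
(For the record, the root mean squared error used as the competition’s accuracy metric is simple to compute; the snippet below illustrates the calculation on made-up numbers.)

```python
# Sketch: the Netflix-prize style accuracy metric (RMSE), on made-up ratings.
import numpy as np

actual = np.array([3, 4, 5, 2, 4])               # hypothetical viewer ratings
predicted = np.array([3.6, 3.9, 4.1, 3.0, 4.4])  # hypothetical model predictions

rmse = np.sqrt(np.mean((predicted - actual) ** 2))
print(f"RMSE: {rmse:.2f}")
```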

Conclusions

So, is there a Data Paradox?  I would propose that there is, to an extent.  For all the investments that have been made, the evidence for a positive return on that investment is proving elusive.  We should expect to see more concrete improvements in productivity (however measured) for all the attention that big data has received.  Instead, there are plenty of anecdotes, too many of which are provided by the vendors of Big Data products.[12]

After 30 years, the productivity paradox has not been laid to rest.  Perhaps the data paradox will take less time.  We would still like to see the advances appear in the economic data.  Just as with other major technological advances, such as computing technologies (and electricity before that), the benefits may well show up given enough time.  I am not suggesting that data analysis is not productive.  What I want to point out is that reaping the benefits of data technologies will require more than just investing in them.  Much of the investment has been in the systems to collect and provide access to data.  What may be lacking is sufficient investment in the skills needed to produce actionable insights from data.  This includes evaluation and improvement of data quality, and relevant business knowledge combined with analytical skills.  This is what will resolve the Data Paradox.

DALE LEHMAN, DIRECTOR, EMBA IN BUSINESS ANALYTICS, LORAS COLLEGE


[1] Winner of the 1987 Nobel Prize in Economics – four of his PhD students went on to win Nobel Prizes as well.

[2] Robert Solow, “We’d better watch out,” New York Times Book Review, July 12, 1987, page 36.

[3] Acemoglu, Daron; Autor, David; Dorn, David; Hanson, Gordon; Price, Brendan (May 2014). “Return of the Solow Paradox? IT, Productivity, and Employment in US Manufacturing”. American Economic Review. 104 (5): 394–99.

[4] Tyler Cowen, The Great Stagnation:  How America Ate All the Low-Hanging Fruit of Modern History, Got Sick, and Will (Eventually) Feel Better, Dutton, 2011.

[5] Byrne, David M., John G. Fernald, and Marshall B. Reinsdorf (2016). “Does the United States Have a Productivity Slowdown or a Measurement Problem?” Finance and Economics Discussion Series 2016-017. Washington: Board of Governors of the Federal Reserve System, http://dx.doi.org/10.17016/FEDS.2016.017.

[6] For a good summary of official US and international governmental statements on the role of artificial intelligence, see “A perspective on Administration Activities in Artificial Intelligence,” by Jim Kurose, Assistant Director for Artificial Intelligence, Office of Science and Technology Policy, June 5, 2018, available at https://www.nist.gov/document/iii-bjkadminaipdf.

[7] A recent NBER study, “AI and the Economy,” by Furman and Seamans, 2018, NBER Working Paper No. 24689, documents the large impacts of AI on the economy – using data on the technological investments being made.  This begs the question of whether these large investments will witness correspondingly large productivity gains.  There are plenty of proponents who believe they will, as well as some vociferous skeptics.

[8] Proper measurement of changes in productivity due to cellular communications technologies would need to account for a variety of new services it enables, the replacement of marketed goods by non-market goods (e.g., maps being replaced by your phone), and the myriad ways in which these technologies impact the use of our time and energy (not all of which are positive – consider the “nonproductive” uses of smart phones).

[9] An industry insider has told me that their default risk models do a fairly good job of ranking individuals according to risk, but are far less successful at predicting overall risk probability.

[10] The decrease in the late 1990s remains unexplained, but I would note that relatively few reviews occurred in those early years.

[11] For a thorough analysis, see “Predicting Movie Ratings and Recommender Systems,” by Arkadiusz Paterek, one of the leading entrants in the Netflix prize competition, available at http://arek-paterek.com/book/.

[12] For a well-articulated and extremely skeptical view of vendor-generated hype, see Stephen Few, Big Data, Big Dupe, Analytics Press, 2018.