Checking in on the latest advancements, and the challenges that remain.

There’s been no shortage of hype about the relationship between cities and data, especially so-called big data. For many tech companies, cities, and a growing number of urbanists, data promises to solve all manner of urban problems, from fighting crime through predictive policing to improving traffic flow and promoting energy efficiency.

An even bigger potential role for new kinds of data lies in helping researchers and policy-makers better understand how cities and neighborhoods grow and evolve—but only if done right.

The legitimately exciting use of new data

A growing number of researchers are using data from internet sources such as Google, Twitter, and Yelp to develop new insights into cities and urban change. The sociologists Robert Sampson and Jackelyn Hwang have used Street View images to examine the role of race in the process of gentrification and neighborhood transformation. Similarly, a study from the U.K. Spatial Economics Research Centre used geo-tagged photos on Flickr to determine levels of urbanity in London and Berlin. Mobility data from Uber and Lyft—and even taxicabs—has also been used in several recent studies, which my CityLab colleague Laura Bliss and former colleague Eric Jaffe have chronicled. Data from real estate sites such as Zillow and Trulia is also being used to analyze housing price trends across neighborhoods, cities, and metro areas.

Other research has used reviewer data from Yelp to study gentrification and unequal urban consumption patterns. One study used Yelp reviews to shed light on the connection between gentrification and race in Brooklyn. Another study, from NBER, employed Yelp data to find out how ethnic and racial segregation affects consumption levels in New York City.

Twitter data has been used to chart regional preferences and patterns of behavior. A study from the Oxford Internet Institute mapped the flow of online content and ideas across cultures. The cartography blog Floating Sheep has used data from Twitter, Google, and Wikipedia to map everything from beer and pizza to weed, bowling, and strip clubs. And my own team has used data from MySpace to track the leading centers for popular music genres across the U.S. and the world.

....

While big data may ultimately advance both our observation of cities and our theories about them, a growing number of scholars urge caution in using it. A 2014 workshop, which brought together 40 or so leading urban social scientists and data users, identified six key issues surrounding big data, spanning data quality and compatibility, the use of new analytical techniques, and questions of privacy and security. As the workshop summary notes:

Developing theory to go with the new methods and data is critical, and is often sidelined. Engineering and control theory (or big data “without theory”) work well when there is a measurable outcome, a simple policy to correct for it, and fast enough reaction time that the correction can be implemented while it is still appropriate. In cities, this is the process used to optimize service delivery. But this theory does not work well for complex systems with long time horizons, like most social systems.

In other words, big data and new data analytics are only as good as the questions we pose and the theories we generate to better understand cities. No matter how powerful they may be, new data sources and analytic techniques are no real substitute for nuanced human reasoning about cities. The real power, of course, lies in using these new tools to test and deepen the insights of cutting-edge urban theory. My own hope is that we can eventually combine them in ways that advance our understanding of the underlying “urban genomics” of neighborhoods, cities, and urban areas.