How much of the Earth’s land surface would we need to cover in solar panels to provide the entire population’s energy needs? Not much, according to the Land Art Generator Initiative, which in 2009 calculated that we would need to tile approximately 496,805 square kilometres of the world’s landmass with solar panels to meet humanity’s projected 2030 energy needs. That sounds like a lot, but it amounts to only 0.35% of the Earth’s land surface.

That’s pretty incredible, and it seems like a completely achievable goal, though the calculation is a little simplistic. We can’t just put all those panels in one place, and many regions may be better suited to different clean power sources. In fact, in an update, the original article points out the need to consider the diversity and distribution of renewable energy production. Nevertheless, the thought that it could be so “easy” is tantalising, and the idea of 100% renewable energy is being seriously pushed by organisations such as The Solutions Project.

So how do we go about figuring out which types of renewables should go where? It’s actually an incredibly complex decision that requires us to think about geophysical factors, including weather, climate, land type, elevation, and slope, as well as human factors such as population density, infrastructure, grid characteristics, energy markets, and more. If you’re the person deciding where to put the next power plant, that’s a lot to consider. As a result, there’s a whole industry around this problem, and it relies on a great deal of human expertise.

This seemed to me like a perfect opportunity to see if we might be able to make better, faster decisions using machine learning. There are thousands of existing renewables sites that we could train a computer to learn from. Why not teach an algorithm to understand which characteristics made those locations good places to build renewables, apply the formula to the rest of our available land, and see where the best sites for energy production are across the planet?

Features of existing renewable energy sites.

At the University of Washington’s GeoHackWeek 2016, I pitched the idea and formed a project to test it out. Within the time constraints of a few hacking days, the aim was to take the locations of renewable energy sites across the US, along with a small number of geographical features, and use that information to train a computer to decide which types of renewables should be placed across the entire country. To make things a little easier, we limited ourselves to large-scale solar, wind, and geothermal power.

Our choice of machine learning method was the random forest classifier. A classifier is an algorithm that learns to predict a class (a choice from a limited number of options). The random forest technique does this by building a series of decision trees. In our case, we tell it the features of every location where there is a clean energy site, and which type of renewable is there. It then builds multiple different decision trees, each of which can be used to figure out which features lead to each renewable. This is our forest. When we give it a new data point with an unknown class - in our case, the features of a location where we want to know the most suitable renewable - it runs the point through all of the decision trees and takes the majority vote from their choices to decide which renewable is best for that location.
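To make that concrete, here’s a minimal sketch of the classification step in Python using scikit-learn (the hack itself used Earth Engine’s built-in implementation, described below). The file names and feature columns are placeholders rather than our actual data:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# One row per existing renewable energy site: geographical features
# plus the type of renewable actually built there. (Hypothetical file.)
sites = pd.read_csv("us_renewable_sites.csv")
features = ["elevation", "slope", "aspect", "solar_irradiance",
            "wind_speed", "thermal_gradient", "population"]
X = sites[features]
y = sites["renewable_type"]  # "solar", "wind" or "geothermal"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# A forest of decision trees; each new location is classified by
# majority vote across all the trees.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print("Held-out accuracy:", forest.score(X_test, y_test))

# Predict the best-suited renewable type for unseen grid cells.
grid = pd.read_csv("us_4km_grid.csv")  # hypothetical 4 km grid
predictions = forest.predict(grid[features])
```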

Single decision tree for site suitability prediction.

Data for the renewable energy locations came from the EIA, and we gathered elevation, slope, aspect, solar irradiance, wind speed, thermal gradient, and population data. Our platform of choice for this geospatial machine learning problem was Google Earth Engine. It takes advantage of Google’s hardware resources to complete huge geospatial computations at very high speeds, and it has built-in machine learning capabilities. We deployed its random forest algorithm to produce a map that predicts the suitability for wind, solar, and geothermal energy production at an incredible 4 kilometre resolution across the entire country. We also set certain areas as off limits, such as cities, forests, and bodies of water.
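For a flavour of what that workflow looks like, here’s a hedged sketch in Earth Engine’s Python API. The asset paths, band names, and mask are illustrative placeholders rather than the project’s actual ones (the real code is in the GitHub repository), and the classifier is called smileRandomForest in the current API; in 2016 we used the since-renamed randomForest:

```python
import ee
ee.Initialize()

# Existing renewable sites, each carrying a numeric 'type' property
# (e.g. 0 = solar, 1 = wind, 2 = geothermal). Hypothetical asset path.
sites = ee.FeatureCollection("users/example/us_renewable_sites")

# Stack the predictor layers into a single multi-band image. SRTM is a
# real public asset; the others stand in for our remaining layers.
dem = ee.Image("USGS/SRTMGL1_003")
predictors = ee.Image.cat([
    dem.rename("elevation"),
    ee.Terrain.slope(dem).rename("slope"),
    ee.Terrain.aspect(dem).rename("aspect"),
    ee.Image("users/example/solar_irradiance").rename("irradiance"),
    ee.Image("users/example/wind_speed").rename("wind_speed"),
])
bands = predictors.bandNames()

# Sample the predictors at each known site to build the training set.
training = predictors.sampleRegions(
    collection=sites, properties=["type"], scale=4000)

# Train the random forest on the labelled samples.
classifier = ee.Classifier.smileRandomForest(numberOfTrees=100).train(
    features=training, classProperty="type", inputProperties=bands)

# Classify every 4 km cell, then mask out off-limits areas such as
# cities, forests, and water (a hypothetical 0/1 mask image).
allowed = ee.Image("users/example/allowed_land_mask")
suitability = predictors.classify(classifier).updateMask(allowed)
```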

The final map shows both the existing renewables sites and the entire US colour coded for the best choice of renewable, out of the three considered, at each point on the 4 kilometre grid. Even considering just a few variables, we can see patterns that make sense: the Midwest is mostly classified as best for wind energy, the South has more regions suitable for solar, and the algorithm predicts that the optimal sites for geothermal are centred around mountains and volcanoes. From the magnified map section we can also see that, on a local level, there are high resolution distinctions between areas chosen for each type of energy production.

Map of the USA classified by suitability for wind, solar or geothermal energy.

This was just a prototype hack, but it was still pretty cool to see how it might be possible to determine renewable energy suitability for an entire country, at high spatial resolution, within a few clicks. It fits in nicely with the vision of coming up with a renewable energy plan for the whole world, while giving decision makers the power to create optimised energy strategies on a local level. In the real world, the situation is definitely more complicated. We didn’t include many relevant features, we didn’t take into account when the existing renewables sites were constructed, and we certainly didn’t consider complex factors such as economics, local policy, or power distribution.

As to whether a classification machine learning approach is the best method for this problem, I can think of some advantages and limitations. Firstly, any machine learning technique builds its understanding from past decisions about where to build power plants. On the one hand this is good, because we automatically build human expertise and concerns into our model without explicitly defining them. On the other hand, there’s no guarantee that previous decisions were always good ones, or that they are applicable to future scenarios. The modern grid, politics, energy markets, and renewable technologies themselves are all evolving, though perhaps with carefully selected features this can all be accounted for.

There’s also the issue that the end result of a classification model only tells us which renewable type, from a fixed selection, is best suited to a location. It tells us nothing about how the different types compare at each site, or whether one location is better for a particular type than another. It’s not all that useful to know which parts of your country are more suited to one type of energy than another without also knowing where the best possible sites to build your power plants are, or what mix of energy types you should use. We had a brief look at regression algorithms (those that yield a quantity rather than a simple choice) to achieve this, with the vision of creating a heatmap of relative site suitability, and it would definitely be interesting to explore that further. Another option might be to use formulae defined by human experts, and apply global optimisation methods to generate a map of relative site suitability.
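As a rough illustration of the regression idea, here’s a hypothetical sketch using scikit-learn: one regressor per renewable type, trained on some per-site measure of quality. The capacity_factor target and the data files are assumptions for the sketch; the hack never got as far as defining such a target:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

features = ["elevation", "slope", "aspect", "solar_irradiance",
            "wind_speed", "thermal_gradient", "population"]

sites = pd.read_csv("us_renewable_sites.csv")  # one row per existing site
grid = pd.read_csv("us_4km_grid.csv")          # one row per 4 km cell

scores = {}
for kind in ["solar", "wind", "geothermal"]:
    # Fit a separate regressor on the sites of this renewable type,
    # predicting a continuous quality score rather than a class.
    subset = sites[sites["renewable_type"] == kind]
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(subset[features], subset["capacity_factor"])
    # A continuous suitability surface for this type, which could be
    # rendered as a heatmap instead of a single best-choice map.
    scores[kind] = model.predict(grid[features])

heatmaps = pd.DataFrame(scores)  # relative suitability per cell and type
```

Because each cell then gets a score for every renewable type, the scores can be compared across both types and locations, which is exactly what the single-label classifier couldn’t give us.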

There are certainly interesting directions this project could take in the future, and hopefully I’ll find time to come back to it! More technical details on the hack week project, the data, and the code can be found in its GitHub repository. Many thanks to my fellow team members, Laura Hinkelman, Sam Hooper, Julia King, and Rachel White, for agreeing to pursue the idea, and to Catherine Kuhn for all her assistance. Also to Anthony Arendt and the rest of the GeoHackWeek team for putting on a brilliant event!

Here’s a list of other renewable mapping projects and organisations that served as inspiration for this project.