The prospect of increased arctic shipping has drawn much attention in recent years. Commercial interests anticipate a new transit route between the Atlantic and the Pacific. However, shipping in the Arctic is risky, and requires reliable forecasts. Here we describe an attempt to improve them.
The decline of sea ice in the Arctic may open up new shipping routes across the polar oceans, and make various raw materials in the Arctic more easily accessible. While this development has obvious commercial potential, it also raises concerns about safety and environmental risks, monitoring and governance of the traffic, infrastructure investments, search and rescue capacity and more. Forecasts for arctic shipping are therefore of interest to international bodies, individual states, local authorities, industry, and civil society.
Several analyses have been done on the development of shipping in the Arctic, most of them qualitative studies of possible drivers or scenario exercises, cost and profitability calculations based on engineering or natural sciences, or quantitative studies of ice conditions and the feasibility of sailing.
To the best of our knowledge, nobody has previously used statistical analysis of real data to forecast arctic shipping. This is likely because no long time series of actual data about vessels operating in arctic waters have been available until now. With enough data, we can use statistical methods to make short- to medium-term shipping forecasts.
These new data rely on the Automatic Identification System (AIS), a tracking system that most ships are required to use. Ships automatically report their position via satellite or land-based instruments, and the information is stored in databases. Our study is based on data available from the Havbase database (www.havbase.no/havbase_arktis), which is updated monthly and has covered all of the Arctic since 2013. The number of crossings over defined passing lines and the distance sailed in different regions are available for 13 different vessel types in 7 different size classes. The database also lists port calls. The main statistic our study uses is the distance sailed in each of 18 regions denoted Large Marine Ecosystems (LMEs) in Havbase.
With 18 LMEs, 13 vessel types and 7 size classes, there could potentially be 1638 time series, but just under 700 series currently include data. We aggregate data from different LMEs to create a “High Arctic” series, which we also split into one aggregated series along the Northwest Passage and one along the Northern Sea Route. The Central Arctic Ocean has no activity registered in Havbase (yet) and is left out. The Barents Sea LME is treated as a standalone area. Within these defined areas, we evaluate time series characteristics, such as the strength of trends and seasonality, to measure the forecastability of the series. We have noted high seasonality in the traffic pattern and increased forecastability over time as more data accumulate.
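The count of potential series and the aggregation step can be illustrated with a minimal Python sketch. The region names and values are hypothetical placeholders, and the function names are ours. The sketch also assumes that every series covers the same months in the same order. It is not the code used in the study.

```python
# Dimensions reported for the Havbase data.
N_LMES = 18
N_VESSEL_TYPES = 13
N_SIZE_CLASSES = 7


def potential_series_count(n_lmes, n_vessel_types, n_size_classes):
    """Number of possible (LME, vessel type, size class) combinations."""
    return n_lmes * n_vessel_types * n_size_classes


def aggregate_series(series_by_lme, member_lmes):
    """Sum monthly distance-sailed series over a group of LMEs.

    All series are assumed to cover the same months in the same order.
    """
    selected = [series_by_lme[name] for name in member_lmes]
    return [sum(month_values) for month_values in zip(*selected)]


# Hypothetical region names and values, for illustration only.
example = {
    "LME_A": [10.0, 12.0, 15.0],
    "LME_B": [1.0, 2.0, 3.0],
    "LME_C": [5.0, 5.0, 5.0],
}
combined = aggregate_series(example, ["LME_A", "LME_B"])
total_possible = potential_series_count(N_LMES, N_VESSEL_TYPES, N_SIZE_CLASSES)
```

Here `total_possible` reproduces the 18 × 13 × 7 count, and `combined` is the month-by-month sum of the two selected regions, which is how an aggregated series such as “High Arctic” could be built from its member LMEs.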
We are testing three different forecasting models and an ensemble forecast that combines the three. Traffic data for at least 36 months are used to generate a forecast for the 12 months ahead. We evaluate the models’ accuracy by comparing the forecasts with actual data for that period. We then repeat the procedure, each time extending the dataset by one month and shifting the forecast period by the same amount, until no more data are available for comparison. Because accuracy is then assessed over many forecast periods rather than just one, the resulting measures are more robust. The forecasts are also compared against a baseline model, a naïve forecast that repeats the previous 12 months’ observations.
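The evaluation scheme described above, often called rolling-origin evaluation, can be sketched in a few lines of Python together with the naïve baseline. This is a minimal illustration under our own assumptions: the function names are hypothetical, the series is a plain list of monthly values, and the three forecasting models themselves are not reproduced.

```python
def naive_forecast(history, horizon=12):
    """Baseline: repeat the last `horizon` observations as the forecast."""
    return list(history[-horizon:])


def rolling_origin_evaluation(series, forecaster, min_train=36, horizon=12):
    """Forecast `horizon` months ahead from successively longer histories.

    Starts with `min_train` months of data, then adds one month at a time
    until there is no longer a full `horizon` of actual data to compare with.
    Returns a list of (origin, forecast, actual) tuples.
    """
    results = []
    origin = min_train
    while origin + horizon <= len(series):
        history = series[:origin]
        actual = series[origin:origin + horizon]
        results.append((origin, forecaster(history, horizon), actual))
        origin += 1
    return results


# Hypothetical, perfectly seasonal series of 50 months, for illustration only.
monthly_traffic = [(month % 12) + 1 for month in range(50)]
folds = rolling_origin_evaluation(monthly_traffic, naive_forecast)
```

Any forecasting function with the same `(history, horizon)` signature could be passed in place of `naive_forecast`, so a candidate model and the baseline are scored on exactly the same forecast periods.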
For all the areas evaluated, our forecasting models outperform the naïve forecast. In addition, the ensemble forecasts perform best for three out of the four areas. The mean percentage error of all the forecasts is less than 10% (traffic is underestimated by 8% for the High Arctic, by 0.03% for the Barents Sea, and by less than 2% in the other areas). When the mean percentage error is calculated, over- and under-forecasts cancel each other out, so in absolute values, the errors are naturally higher. Nonetheless, we observe that automated forecasting models can give fairly accurate 12-month forecasts for the Arctic regions we have studied.
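The difference between the mean percentage error and its absolute counterpart can be shown with a small Python sketch. The sign convention (forecast minus actual, so that negative values mean underestimation) and the function names are our assumptions, and the numbers are invented for illustration.

```python
def mean_percentage_error(actual, forecast):
    """Signed error in percent; over- and under-forecasts can cancel out.

    Assumed convention: negative values mean traffic was underestimated.
    """
    errors = [(f - a) / a for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def mean_absolute_percentage_error(actual, forecast):
    """Unsigned error in percent; every miss counts, whatever its direction."""
    errors = [abs(f - a) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Invented values: one over-forecast and one equally large under-forecast.
actual_values = [100.0, 100.0]
forecast_values = [110.0, 90.0]
signed_error = mean_percentage_error(actual_values, forecast_values)
absolute_error = mean_absolute_percentage_error(actual_values, forecast_values)
```

In this example the two misses offset each other in the signed measure, while the absolute measure still reports the size of a typical miss. This is why a small mean percentage error mainly indicates low bias rather than small individual errors.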
Being able to forecast traffic in these areas is of interest in itself. In addition, the results suggest that a hierarchical statistical model can be applied to the entire dataset, which might make forecasts of low-level series more accurate than if they are forecast individually. For example, using aggregated traffic data within an entire LME (high-level) might allow more accurate prediction of traffic by vessel type and size (low-level). This is because low-level series contain less activity and are more prone to noise.
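One simple way to use such a hierarchy is a top-down approach: forecast the high-level series, then split that forecast among the low-level series according to their historical shares. The Python sketch below illustrates this idea only. It is not necessarily the hierarchical model we would adopt, and the vessel categories, values and function names are hypothetical.

```python
def historical_shares(low_level_history):
    """Each low-level series' share of the total historical activity."""
    totals = {name: sum(values) for name, values in low_level_history.items()}
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


def top_down_forecast(high_level_forecast, shares):
    """Split a high-level forecast among low-level series by fixed shares."""
    return {
        name: [share * value for value in high_level_forecast]
        for name, share in shares.items()
    }


# Hypothetical vessel categories and values, for illustration only.
history = {"tankers": [10.0, 30.0], "cargo": [20.0, 40.0]}
lme_forecast = [100.0, 200.0]
shares = historical_shares(history)
low_level = top_down_forecast(lme_forecast, shares)
```

By construction, the low-level forecasts add up to the high-level forecast in every month. The fixed shares are a strong simplification, since the mix of vessel types may itself change over time.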
The traffic data in Havbase may also allow us to identify and assess drivers that can further increase the accuracy of forecasts. This would be particularly useful for longer-term scenario-based predictions.