Guest Column | January 29, 2015

Solving Uncertainty: Using Big Data To Predict Urban Water Demand

bigdataslide

Data-rich utilities have the tools to forecast urban water demand, but some key considerations are often overlooked when creating models. Learn the pitfalls and how to resolve them.

By John Quilty

Urban water demand (UWD) fore­casting is an important task that can be utilized by water resources managers to help mitigate consequences stemming from fluctuations in demand and properly respond to water system dynamics at varying scales and lead times, ensuring appropriate operational, tactical, and strategic management of the water system. UWD forecasting can be applied to solve a number of issues faced by water supply system manage­ment such as: 1) understanding the dynamics and underlying factors that affect water use; 2) managing and opti­mizing the operation of pumps, wells, reservoirs, and mains, among other things; 3) developing effective water demand management programs; 4) set­ting meaningful water rate schedules; and 5) providing information regard­ing when peak water demand is likely to occur (Adamowski, et al., 2012).

Recently, House-Peters & Chang (2011) explored the UWD modeling literature from the past three decades and identi­fied four themes that are important. In this regard, we discuss two of these top­ics that are relevant to this article: (1) interactions within and across multiple temporal scales, and (2) acknowledge­ment and quantification of uncertainty. To understand the broad context of UWD forecasting applications at differ­ent scales, Donkor, et. al (2014) sum­marized their findings related to the planning level that the forecasts can support as the following three catego­ries: operational, tactical, and strategic (presented in Table 1). Most recently, it was identified through a survey of wa­ter utilities under the Water Research Foundation’s co-funded project on short-term UWD forecasting (Fullerton, Forthcoming) that water utilities’ most crucial need for short-term forecast­ing is tied to revenue and expenditure projections. Generally, the most preva­lent billing cycles occur at monthly, bimonthly, and quarterly periods.

Undoubtedly, we are operating water systems in the age of Big Data, where sensing technology is prevalent and the ability to store and process large amounts of data are advancing at expe­dited rates (Courtney, 2014). Further­more, with advances in the domain of machine learning, data-based forecasting is often used in water resources applica­tions to solve challenging problems with non-trivial mathematical expressions (Maier, et al., 2010). To merge the on­going advancements in the UWD mod­eling domain with the forecast require­ments of the water utility and the water industry trend of Big Data analytics, it is clear that we require a refined forecast­ing procedure that is not only accurate and reliable but suitable for Big Data applications. It is essential that this fore­casting procedure captures the changing dynamics of the water supply system at multiple scales, identifies sources of uncertainty in derived forecasts, and is amenable to Big Data scenarios (i.e., a method that requires little user interven­tion and rapid computing potential).

In this article we provide a tool named Multiscale-Bootstrap-Extreme Learning Machine (MBELM) that can be used to solve all of the above-mentioned is­sues plaguing water resources manag­ers and the water supply systems they manage. MBELM incorporates UWD interactions across multiple temporal scales, quantifies uncertainty in model forecasts, and is perfectly suited to Big Data scenarios. To show the benefits in accuracy and reliability of the MBELM, we apply this model to a real-world da­taset from the water supply system in London, Ontario, Canada. In this brief case study, we focus on one-month-ahead UWD forecasts, which can be used as accurate and reliable tools for monthly revenue projections because they align with the water utility’s bill­ing periods (City of London, 2013).

MBELM Background

The MBELM model is a refinement of the model Wavelet-Bootstrap-Artificial Neural Network (WBANN), introduced to the UWD forecasting literature by Ti­wari & Adamowski (2013). The authors utilize the discrete wavelet transform (DWT) (Mallat, 1989; Daubechies, 1990), which builds on the well-known Fourier transform, to decompose the original water demand record (time se­ries) into multiple components that each characterize different temporal scales. The DWT captures high-frequency time series components and isolates these components into what are called “detail series” representing the fast-changing characteristics of the original record. Successive “details” may be extracted from the original record. The remaining signal content is called the “approxima­tion” series and generally contains trends and slower-moving time series compo­nents. If one were to extract all possible details from the original record, the ap­proximation series would represent the mean of the original UWD time series.

After performing DWT analysis on the original UWD time series, the authors then use a nonlinear machine learning technique, Artificial Neural Networks, or simply Neural Networks (ANN) (Zurada, 1992), to calculate forecasts for each isolated signal (the detail[s] and the approximation series). Finally, the authors use the statistical bootstrap technique (Efron & Tibshirani, 1993) to derive confidence intervals address­ing uncertainty in the final forecast.

When partitioning the data we usually consider three separate sets: calibration, validation, and testing.

Our modification to the WBANN method replaces the ANN model with a new machine learn­ing technique called Extreme Learning Machines (ELM) (Huang, et al., 2006). The benefit of ELM when compared to ANN is that ELM can be built automati­cally without user in­tervention; the ANN requires signifi­cant user intervention with different user settings creating large variations in forecast performance. Another at­tractive feature of the ELM is the rapid computation time, which is on the or­der of 50 times faster than to the ANN in our simulations. Thus, the ELM is practical for Big Data applications.

Materials And Methods

First, we will describe the procedure for developing the MBELM, and after­wards we will briefly describe the dataset used in this article. We end this section explaining the data partitioning used to generate the forecasts and how the forecasts are evaluated. We defer theo­retical development for the reader to explore in the referenced publications.

A simple flow chart is provided in Figure 1 to illustrate the MBELM development procedure.

City Of London, Ontario UWD Time Series

In this article we choose a UWD time series from Hipel & McLeod (1994) to present the efficacy of the MBELM. The dataset represents average monthly water demand mea­sured in megalitres per day (ML/D) between January 1967 and December 1988 for the municipality of London, Ontario. This dataset can be used to provide one-month-ahead revenue projections for the water utility.

Generally, the accuracy of the forecast will depend upon the availability of historical records and the complexity of the UWD time series under study.

Data Partitioning

Data partitioning is a method used to build and improve the overall accuracy of the developed forecast model so that it can be used with high accuracy and reliability on new data as it becomes available. When partitioning the data we usually consider three separate sets: calibration, validation, and testing. Vali­dation and testing data are extracted from the original record and reserved to judge the suitability of the model parameters. For the dataset in this ar­ticle, observations from January 1967 to December 1986 are used to calibrate model parameters; data from January 1987 to December 1987 are used to select the best model parameters (validation); and the best model parameters are tested independently for model evaluation throughout the time­frame January 1988 to December 1988.

Model Evaluation

Using the Coefficient of Determina­tion (COD) and the Mean Absolute Error (MAE), one can evaluate the per­formance of the MBELM. The COD is a dimensionless quantity (square of Pearson’s correlation coefficient) that describes the amount of variance of the original time series explained by the MBELM prediction, while the MAE evaluates the mean error independent of sign, measured in ML/D. Gener­ally, the accuracy of the forecast will depend upon the availability of his­torical records and the complexity of the UWD time series under study.

Results and Discussion

The results for step 2 of the MBELM development procedure (Figure 1) are presented in Figure 2, while steps 3 to 5 are shown in Figure 3. Table 2 pro­vides Pearson’s correla­tion coefficient (CC) matrix examining sig­nificant correlation between each wavelet decomposed sub-series (WDSS) and the origi­nal time series at the 0.05 significance level.

It is clear from Figure 2 that there are cyclical components embedded within the origi­nal time series. This highlights the use­fulness of the DWT at extracting these important time series characteristics and exposing their periodic nature through time, which is not fully apparent in the original UWD record. Perhaps the most important characteristic revealed through the DWT decompositions for this time series is the extremely periodic nature of the approximation sub-series, which coincides with a 12-month pe­riodicity superimposed on an upward trend. The two details series exhibit random, and at times sharp, changes from their regular periodic patterns.

Examining the CC matrix (Table 2), one can see that there is significant cor­relation between the original time series and each WDSS. Likewise, there are also cases of significant CC between WDSS members. This nuance supports the exploration of cross-scale dependen­cies in the urban water supply system to better understand how these scales interact and their influence on UWD variation through time (which is im­plicitly integrated into the MBELM approach presented in this article) sup­porting the findings of House-Peters & Chang (2011). Searching for dependen­cies between temporal scales is a worth­while endeavor for a water manager because it can reveal system character­istics that may be exploitable for opti­mal pump scheduling, multi-objective reservoir operation, management and maintenance strategies, setting effective water-use rates, or determining when peak UWD events are likely to occur (among many other applications).

By deriving forecasts for each WDSS member using the ELM framework and aggregating their individual predictions, one is able to produce the “Forecast” presented in Figure 3: (a) for the com­plete time series record and (b) for the independent test-set evaluation. The un­certainty assessment through bootstrap­ping is then applied at the 95 percent confidence level and also presented in both (a) and (b) represent­ing the complete MBELM framework.

For the independent test set, the COD for the MBELM is 0.9679, mean­ing that the MBELM forecasts explain more than 96 percent of the variance in one-month-ahead observed UWD. The MAE for the same period is 3.8821 ML/D. Both scores indicate the highly accurate nature of the MBELM fore­casts. The confidence intervals also embody all observations within the independent test set, highlighting the reliability of using the MBELM for one-month-ahead revenue projections.

Conclusion

This article has introduced a new multi­scale UWD forecasting tool with uncer­tainty assessment, namely the MBELM, and tested its abilities on monthly UWD time series for one-month-ahead revenue forecasting in London, Ontario. The case study revealed the MBELM can be applied to generate accurate and reliable one-month UWD forecasts that can be used by water utilities for revenue projections. It is important to note that the MBELM method presented in this article is amendable to different fore­casting periods, such as hourly, weekly, quarterly, etc., extending its applicability to a host of other important water utility functions (some of which are mentioned in the Introduction and again in the Results and Discussion). Furthermore, the MBELM is not only accurate and reliable (through uncertainty assess­ ment) but also adapts to fluctuations in temporal scales embedded within the original UWD time series. Building on this last point, climate variables and eco­nomic indicators can be incorporated in the MBELM framework to study how UWD interacts with climate variability and economic momentum at different scales. This exciting factor allows water managers to incorporate climate and economic status into their water management decisions. The most en­ticing feature of the MBELM is its suitability to Big Data applications. Future Big Data research in the UWD forecasting domain should investigate the usefulness of the MBELM for op­erational forecasting of UWD time series extracted from Automated Me­tering Infrastructure (AMI) systems for automation and optimization of water utility system components.

About The Author


John Quilty (B.Eng., civil engineering with concentration in management, Carleton University) is an engineer with the City of Ottawa’s automated metering infrastructure project group and a researcher at McGill University, where he is pursuing his Ph.D. in bioresource engineering. John has contributed to water resources forecasting projects in Australia, Canada, Ethiopia, India, and Poland.

 


 

References

Adamowski, J., H. Fung Chan, S. O. Prasher, B. Ozga-Zielinski, and A. Sliusarieva (2012). Comparison of multiple linear and nonlinear regression, autoregressive integrated moving average, artificial neural network, and wavelet artificial neural network methods for urban water demand forecasting in Montreal, Canada, Water Resour. Res., 48, W01528, doi: 10.1029/2010WR009945.

City of London, 2013. Water By-law. [Online]
Available at: https://www.london.ca/residents/Water/water-bill/Pages/default.aspx
[Accessed 19 09 2014].

Courtney, M., 2014. How utilities are profiting from Big Data analytics. [Online]
Available at: http://eandt.theiet.org/magazine/2014/01/data-on-demand.cfm
[Accessed 03 09 2014].

Daubechies, I., 1990. The Wavelet Transform, Time-Frequency Localization and Signal Analysis. IEEE Transactions on Information Theory, 36(5), pp. 961-1005.

Donkor, E. A., Mazzuchi, T. A., Soyer, R. & Roberson, J. A., 2014. Urban Water Demand Forecasting: Review of Methods and Models. Journal of Water Resources Planning and Management, 140(2), pp. 146-159.

Efron, B. & Tibshirani, R. J., 1993. An Introduction to the Bootstrap. London, U.K.: Chapman and Hall.

Fullerton, T., Forthcoming. Analysis of the Effectiveness of Short-term Demand Forecasting and Recommendations for Improvement, Denver, CO: Water Research Foundation.

Hipel, K. W. & McLeod, A. I., 1994. Time Series Modelling of Water Resources and Environmental Systems. Amsterdam, The Netherlands: Elsevier.

House-Peters, L. A. & Chang, H., 2011. Urban water demand modeling: Review of concepts, methods,and organizing principles. Water Resources Research, 47(W05401), p. 10.1029/2010WR009624.

Huang, G.-B., Zhu, Q.-Y. & Siew, C.-K., 2006. Extreme learning machine: Theory and applications. Neurocomputing, 70(1-3), pp. 489-501.

Maier, H. R., Jain, A., Dandy, G. C. & Sudheer, K. P., 2010. Methods used for the development of neural networks for the prediction of water resource variables in river systems: Current status and future directions. Environmental Modelling and Software, 25(8), pp. 891-909.

Mallat, S. G., 1989. A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Transactions on Pattern Analysis and Machine Itelligence, 11(7), pp. 674-693.

Tiwari, M. K. & Adamowski, J., 2013. Urban water demand forecasting and uncertainty assessment using ensemble wavelet-bootstrap-neural network models. Water Resources Research, 49(doi:10.1002/wrcr.20517), pp. 6486-6507.

Zurada, J. M., 1992. Introduction to Artificial Neural Systems. St. Paul: West Publishing Company.