Reliable and accurate weather forecasts are essential for planning leisure activities and for the work of numerous companies and organizations. Mobile World Information Systems GmbH (MOWIS) provides weather data services, including operating the websites wetter.at and wetter-deutschland.com.
The central data management system for these services was developed by RISC Software GmbH on behalf of MOWIS and has been continuously maintained and extended as needed since 2011. Its central component is the NoSQL database HBase, which runs on a Hadoop cluster. Current weather data from various sources, such as the results of different weather forecast models and measurements from weather stations, is continuously fed into this database. This makes it possible to retrieve weather forecasts for locations worldwide interactively, up to fourteen days in advance. For each forecast, the system dynamically selects the most suitable data source and combines data from different sources. In addition, data that has already been retrieved is cached to speed up subsequent queries. An XML-based interface allows web-based services to retrieve the data. Furthermore, Apache Hive provides SQL access to the data, allowing flexible queries on the stored location data. These queries are not used for exporting data, however; MOWIS uses them for interactive data quality control.
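The selection of a data source per forecast and the caching of results can be illustrated with a small sketch. The model catalogue, the coverage boxes, the resolutions, and the rule "prefer the most detailed model that covers the location and lead time" are all hypothetical assumptions; the actual selection criteria of the system are not described. Python's `functools.lru_cache` stands in for the real result cache.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional, Tuple


@dataclass(frozen=True)
class ForecastModel:
    """Hypothetical description of one forecast model used as a data source."""
    name: str
    resolution_km: float            # grid spacing; smaller means more detailed
    lat_range: Tuple[float, float]
    lon_range: Tuple[float, float]
    horizon_h: int                  # how many hours ahead the model forecasts

    def covers(self, lat: float, lon: float, lead_h: int) -> bool:
        return (self.lat_range[0] <= lat <= self.lat_range[1]
                and self.lon_range[0] <= lon <= self.lon_range[1]
                and lead_h <= self.horizon_h)


# Hypothetical catalogue: a detailed regional model for Austria and a coarser
# global model that reaches 14 days (336 hours) ahead.
MODELS = (
    ForecastModel("regional_at", 2.5, (46.0, 49.5), (9.0, 17.5), 48),
    ForecastModel("global", 13.0, (-90.0, 90.0), (-180.0, 180.0), 336),
)


def best_model(lat: float, lon: float, lead_h: int) -> Optional[ForecastModel]:
    """Return the most detailed model that covers this point and lead time."""
    candidates = [m for m in MODELS if m.covers(lat, lon, lead_h)]
    return min(candidates, key=lambda m: m.resolution_km, default=None)


@lru_cache(maxsize=10_000)
def forecast_plan(lat: float, lon: float, days: int, step_h: int = 6) -> Tuple[Tuple[int, str], ...]:
    """Combine sources: for every forecast step, record which model supplies it.

    lru_cache stands in for the result cache: repeating a query for the same
    location is answered from memory instead of being recomputed.
    """
    plan = []
    for lead_h in range(0, days * 24, step_h):
        model = best_model(lat, lon, lead_h)
        if model is not None:
            plan.append((lead_h, model.name))
    return tuple(plan)
```

For a point in Vienna, `forecast_plan(48.21, 16.37, 3)` uses the regional model up to hour 48 and the global model afterwards, so a single forecast combines two sources. In a real system the cached function would also fetch the forecast values themselves, not only the source plan.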
Ongoing data import and export
The data in HBase is kept up to date by continuously importing forecast model results and measurement data from weather stations. For each model, the complete set of delivered weather data is converted into a structured text representation, which then allows Hadoop MapReduce and HBase bulk imports to be used. As a result, a high-resolution weather forecast model for Austria covering several hours can be imported within five minutes and is thus available for forecasts promptly. Similarly, a global weather model with forecast values for one day can be imported in fifteen minutes. A comparable import took several hours with the legacy SQL database that the new system replaced.
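The conversion of model output into a structured text representation might look roughly like the following sketch. The exact format used in the project is not described; the tab-separated layout (row key, column, value), the record tuple, and the key and column naming are hypothetical assumptions chosen so that a MapReduce bulk-import job could read one cell per line.

```python
from typing import Iterable, List, Tuple

# Hypothetical raw record: latitude, longitude, lead hour, parameter, value.
Record = Tuple[float, float, int, str, float]


def model_output_to_tsv(model: str, run: str, records: Iterable[Record]) -> List[str]:
    """Flatten raw model output into tab-separated lines: row key, column, value."""
    lines = []
    for lat, lon, lead_h, param, value in records:
        # Hypothetical row key: grid point first, then model and model run.
        key = f"{lat:.3f}|{lon:.3f}|{model}|{run}"
        # One column per parameter and forecast lead time, e.g. "t2m+006h".
        column = f"{param}+{lead_h:03d}h"
        lines.append(f"{key}\t{column}\t{value}")
    # Sorting only makes the output deterministic here; in a MapReduce job the
    # shuffle phase sorts the map output by key before it is written.
    lines.sort()
    return lines


def parse_tsv_line(line: str) -> Tuple[str, str, float]:
    """What a mapper of the bulk-import job would do with one input line."""
    key, column, value = line.split("\t")
    return key, column, float(value)
```

Because every line is independent, the text files can be split across many map tasks, which is what makes a parallel bulk import possible in the first place.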
Design of a suitable data model
To use a NoSQL database effectively, it is essential to design a data model that is optimized for the planned queries. To allow interactive queries, the data model was therefore tailored to them, enabling, for example, fast queries for a single location. At the beginning of the project, the planned queries were defined together with the domain experts at MOWIS. On this basis, the HBase data model was designed to allow fast queries on individual locations on the one hand and automated removal of data that is no longer required on the other. Numerous lookup tables were also implemented to enable efficient access via other attributes. Because a Big Data system is used, the data model can be flexibly adapted or extended when new queries are needed.
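A query-driven data model of this kind can be sketched with an in-memory stand-in for an HBase table. The row key layout (location first, then model, then valid time), the place-name lookup table, and the use of a time-to-live for automated removal are hypothetical assumptions; HBase does support a per-column-family TTL, but the source does not say how removal was implemented.

```python
import bisect
from typing import Dict, List, Tuple


def row_key(location_id: str, model: str, valid_time: str) -> str:
    # Location first: all rows of one location are stored next to each other,
    # so a single short range scan returns its complete forecast.
    return f"{location_id}#{model}#{valid_time}"


class ForecastTable:
    """In-memory stand-in for an HBase table: rows sorted by key, values with a TTL."""

    def __init__(self, ttl_s: int) -> None:
        self.ttl_s = ttl_s
        self._keys: List[str] = []                      # kept sorted, like HBase row keys
        self._cells: Dict[str, Tuple[float, int]] = {}  # key -> (value, write timestamp)

    def put(self, key: str, value: float, ts: int) -> None:
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = (value, ts)

    def scan_prefix(self, prefix: str, now: int) -> List[Tuple[str, float]]:
        """Range scan over all keys starting with prefix, skipping expired cells."""
        result = []
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            value, ts = self._cells[self._keys[i]]
            if now - ts < self.ttl_s:
                result.append((self._keys[i], value))
            i += 1
        return result

    def purge_expired(self, now: int) -> int:
        """Physically drop expired cells (HBase does this during compactions)."""
        expired = [k for k, (_, ts) in self._cells.items() if now - ts >= self.ttl_s]
        for k in expired:
            del self._cells[k]
        self._keys = [k for k in self._keys if k in self._cells]
        return len(expired)


# Hypothetical lookup table: secondary attribute (place name) -> location id.
PLACE_TO_LOCATION = {"wien": "AT00001", "linz": "AT00002"}


def forecast_for_place(table: ForecastTable, place: str, now: int) -> List[Tuple[str, float]]:
    """Resolve a place name via the lookup table, then scan that location's rows."""
    location_id = PLACE_TO_LOCATION[place.lower()]
    return table.scan_prefix(location_id + "#", now)
```

A query for "Wien" touches only the adjacent rows with the prefix `AT00001#`, regardless of how many other locations the table holds; the lookup table turns a query by another attribute into the same cheap range scan.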
Acceleration and cost savings compared to the legacy SQL database
Switching to a Hadoop-based NoSQL solution made it possible to introduce an additional forecast model for worldwide weather data and to accelerate both data imports and exports by a factor of seven. As a result, worldwide weather forecasts can be retrieved interactively, and current weather forecasts for all of Austria and Germany can be exported to the websites mentioned above. To this end, both the imports of the different weather models and the exports of data updates for wetter.at use Hadoop MapReduce jobs, so that the current weather forecasts for all of Austria and Germany are generated in parallel on the Hadoop cluster.
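The map, shuffle, and reduce phases of such an export job can be simulated in a single process. The input line format, the location ids, and the per-location summary (times, minimum and maximum temperature) are hypothetical assumptions; the project used real Hadoop MapReduce jobs on the cluster, and the thread pool here only hints at the parallelism of independent reducers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Mapped = Tuple[str, Tuple[str, float]]


def map_record(line: str) -> Mapped:
    """Map phase: turn one hypothetical input line into (location, (time, value))."""
    location_id, valid_time, temperature = line.split("\t")
    return location_id, (valid_time, float(temperature))


def shuffle(pairs: Iterable[Mapped]) -> Dict[str, List[Tuple[str, float]]]:
    """Shuffle phase: group all mapped values by location id."""
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_location(item: Tuple[str, List[Tuple[str, float]]]) -> Tuple[str, dict]:
    """Reduce phase: build the forecast update for one location."""
    location_id, values = item
    values = sorted(values)  # chronological order, since the timestamps share one ISO format
    temps = [t for _, t in values]
    return location_id, {"times": [vt for vt, _ in values], "min": min(temps), "max": max(temps)}


def run_export(lines: Iterable[str], workers: int = 4) -> Dict[str, dict]:
    groups = shuffle(map(map_record, lines))
    # Reducers for different locations are independent, so they can run in
    # parallel; on a Hadoop cluster they would be spread across compute nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(reduce_location, groups.items()))
```

Because each location is reduced independently, adding compute nodes lets more locations be processed at the same time, which is the property the export for all of Austria and Germany relies on.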
Using Hadoop offers the following advantages:
- As the amount of data grows, the system can be scaled simply and inexpensively by adding new compute nodes to the cluster.
- Using open-source technology saves significant licensing costs compared with classic commercial database products.