GIS, business intelligence, and other contemporary technologies dealing with geographical and business information utilize database tables.Information systems allow the use of tables: single flat tables, multidimensional tables, and related tables.The simplest single flat table is a two-dimensional array consisting of records (rows) and fields (columns).If a given column is split into a few field categories, another dimension is added to such a table.Let us imagine a table showing one thousand records and five fields.This is still a two-dimensional table.However, if each of these five fields is split into three categories (low, medium, and high), the resulting table will have five dimensions, since each categorized variable represents one dimension.Fully expanded multidimensional tables are difficult to display and manipulate.However, the number of dimensions in displaying multidimensional table can be reduced by adding layers to a table.As a result, the table structure can be defined by rows, columns, and layers.
What is OLAP?
OLAP is a buzzword today.The acronym stands for On-Line Analytical Processing.In fact, this term does not correspond very well to the meaning of OLAP tools.OLAP tools allow the user to query, browse, and summarize information in a very efficient, interactive, and dynamic way.OLAP tools represent a vital component of both the business intelligence and data mining technology.They provide an aggregated approach to analyzing large amounts of detailed data.
OLAP databases are referred often as "cubes" since they have a multidimensional nature.Each result of querying, browsing, and summarizing can be viewed and stored as a separate cube.A cube is a visual representation of a multidimensional table and has just three dimensions: rows, columns and layers.OLAP cubes are very flexible because they allow the user to move information between these three dimensions.OLAP cubes are easy to create and manipulate. Since they provide insight into various aspects of data, these tools also represent data mining technology.Users can have multiple cubes for their business data: one cube for customers, one for sales, one for production, one for geography, etc.This paper presents various examples of cubes for the same data set.
Brief history of OLAP cubes
The history of OLAP cubes goes back more than forty years. In 1962, the Ken Iverson's classical book, A Programming Language, was published.This book presented the APL language for multidimensional transformations.Although the practical implementation of APL models was constrained by computer hardware and software limitations, the main ideas of APL were incorporated into the first OLAP prototypes in the 1970s. One of the first prototype was Express (1970), acquired by Oracle in 1995.However, the first OLAP products were likely Comshare System W (1982) and Metaphor (1984, allied later with IBM).These systems were used primarily for marketing and financial applications.Some multidimensional functionality was also available some time ago for spreadsheets, including Compete, SuperCalc, MetaCube, Lotus, and finally Excel.There are more then ten Excel extensions created by third parties for Microsoft Analysis Services.Microsoft launched the OLE DB for OLAP in 1997 and OLAP Services in 1999 (known as Analysis Services since 2000).Today there are more than a dozen books on Microsoft OLAP. Four years ago, the first XML-based OLAP was implemented.Two years ago, Oracle released the functional Oracle 9i OLAP technology.Many vendors offer OLAP servers in addition to RDBMS technology.
Today, there are a few dozen OLAP products on the market, including products from vendors such as IBM, SAP, Hummingbird Communications, SAS Institute, Computer Associates, and SPSS.All business intelligence vendors offering OLAP products showed very significant revenues in 2004.The OLAP market is growing much faster than other software sectors.The whole OLAP market was estimated as 3.7 billion dollars in 2003.
Using specialized OLAP software packages vs.creating OLAP cubes with spreadsheets and statistical software packages
Users wanting to analyze large and detailed databases have at least two options: to purchase a specialized OLAP product or to use OLAP cube functionality available in other commonly used products such as spreadsheets or statistical software packages.Selecting the most suitable OLAP product is a very complex process.Since these products are rather very expensive, it has to be done using the more formal implementation process.The selection of an optimal OLAP product should follow the user needs analysis, system selection, cost/benefit analysis and the whole product life-cycle methodology elaborated for other IT products, including the implementation of GIS.
However, there is a low cost alternative.Excel users can create OLAP cubes using pivot tables and SPSS users can generate OLAP cubes with SPSS reports.Both Excel and SPSS users can easily create OLAP cubes and manipulate them in a very efficient way.In addition, each of these products offers some specific functionality.For example, in Excel, users can create graphs corresponding to OLAP cubes, whereas SPSS provides more statistics for analyzing data in OLAP cubes.However, without using additional technology, available in more specialized software packages, these users will find the following limitations:
- Multiple related tables cannot be used for creating OLAP cubes
- Active OLAP data cannot be stored in relational databases; commercially available products offer relational OLAP cubes (called ROLAP)
- Data warehousing and OLAP technologies are not fully integrated
- SQL queries cannot be directly used from within OLAP cubes
- Multidimensional server engines cannot be used for storing active OLAP data; commercially available products offer multidimensional cubes (called MOLAPs)
- OLAP cubes cannot be
available to other Web users.
OLAP cubes, like other data mining tools, allow users to reveal trends and structures in data sets.Similar to other data mining tools, OLAP cubes are not application based or scale based.The list of OLAP applications is very extensive.Probably, the most common use of OLAP cubes is in marketing, including database marketing and Web marketing.Other typical applications are retail sales and profitability analysis, quality control analysis, financial and budgeting applications, and management reporting.OLAP technology is a vital part of management and decision making in numerous banks (including the World Bank), airlines (British Airways), media and entertainment (Time Warner), information technology (Hewlett-Packard, Sun Microsystems), insurance (Thomas Cook), auto industry (Subaru), etc.
Integrating OLAP with GIS
Locational data represent an essential part of analysis and solutions in many applications.First of all, many other non-spatial attributes can be aggregated geographically.Then, additional statistics can be computed allowing the user to analyze every variable (dimension) in a geographical context.
So far, the integration of OLAP technology and geographical analysis has been very limited.The majority of integration took place within the building and using of so-called spatial warehouses for visualization of results.The methodology based on similarities between OLAP cubes and map cubes (Shekhar et al., 2001) is very promising.Whereas non-spatial warehouses utilize OLAP cubes mostly as summary tables or spreadsheets, the spatial warehouses provide map cubes (collections of maps).The different nature of OLAP cubes and map cubes also results from different types of aggregation operations.Such statistics as arithmetic mean, median, mode, standard deviation, minimum, maximum, count, or sum have their equivalents in so-called map algebra for raster data types.In addition, spatial queries utilizing geometric operators (area, perimeter, centroid) and topological operators (inside, within, intersect, contains, connects, borders), supplement significantly non-spatial SQL queries.
It is only a matter of time before a bridge between OLAP and GIS will be established allowing OLAP users, who deal with locational data, to display OLAP cubes as maps using GIS.The foundation for going in the opposite direction (from GIS to OLAP) should also be established.GIS users could create SQL queries with geometric and topological operators and then pass this information to OLAP cubes.The final step of the integration would be to allow the user to interactively browse OLAP cubes and to simultaneously view the results on maps.Also, the user should be able to query a map and view corresponding data within OLAP cubes.
OLAP cube examples
The examples provided in this paper represent the low-cost approach to using the OLAP cubes technology.All of them were created with the Base module of the statistical software package SPSS (Figure 1). Similar cubes can be also created with the Pivot Table and Pivot Chart Wizard in Excel (Figure 2).The data set used for illustrating the power of OLAP cubes consists of more than 1,500 polygons representing the basic census units for the province of Nova Scotia, Canada.The following Census variables were used:
- Total population
- Number of dwellings
- Area (in sq km)
- Population 65 years old and older
- Number of single-detached houses
- Number of households with six persons and above
- Population with French mother tongue
- Aboriginal population
- Unemployment rate
- Unemployment rate for females
- Employment in agriculture and related serviced industries
- Employment in fishing and trapping industries
- Employment in logging and forestry industries
- Population with a bachelor degree or higher
- Average household income
- Average number of rooms per dwelling
- Average value of dwelling
- County name
- Unemployment rate category (low, medium, high)
- Average household income category (low, medium, high)
- Average value of dwelling category (low, medium, high)
- Arithmetic mean
- Standard deviation
- Percent of total sum
The following are basic operations that can be performed with OLAP cubes.
- Drill down
Figure 1.Creating OLAP cubes in SPSS.Click image for larger view.
Figure 2.Designing the OLAP cube layout with the Pivot Table and Pivot Chart Wizard in Excel.Click image for larger view.
Figure 3.Initial pivot table created in SPSS.Click image for larger view.
Figure 4.Initial pivot table created in Excel.
When grouping variables are placed in layers, the full list of categories for each grouping variable is available.Figure 5 shows the partial list of categories for the variable County.
The slice operation is based on selecting one dimension and focusing on a portion of a cube.For example, Figure 6 presents seven statistics for one variable: Population 65 years old and above.Numbers refer to one selected county (Lunenburg), low unemployment rate, medium household income and high average values of dwelling.Figure 7 shows the values for another variable: Total population.
Figure 6.Slicing OLAP cubes for Population >= 65 years. Click image for larger view.
Figure 7 Slicing OLAP cubes for Total population.Click image for larger view.
The dice operation creates a sub-cube by focusing on two or more dimensions.Figure 8 presents seven statistics for two variables (dimensions): French mother tongue and aboriginal population.Numbers refer to the entire province (County = Total) and to areas with a high unemployment rate and both low household income and average values of dwelling.Figure 9 demonstrates results of the same settings for two other variables: the total population and numbers of dwellings.
Figure 8.Dicing for two dimensions: French mother tongue and Aboriginal population. Click image for larger view.
Figure 9 Dicing for two dimensions: Total population and Total dwellings.Click image for larger view.
Roll-up, also called aggregation or dimension reduction, allows the user to move to the higher aggregation level.For example, instead of aggregating data by county, the user can select the whole province level, as it was shown in figures 8 and 9.Figure 10 shows numbers for the entire province (County = Total), for all unemployment categories (Unemployment class = Total) and for all average values of dwelling categories (Average value of dwelling = Total), still holding the low household income class category (Household income = Low).
The drill-down operation is the reverse of a roll-up and represents the situation when the user moves down the hierarchy of aggregation, applying a more detailed grouping.For example, if the Unemployment rate category is changed from Total (Figure 10) to Medium, another sub-cube is created as it is shown on Figure 11.One statistic (Sum) is hidden because some variables refer to average values (for example, the average household income) and the sum of means can be confusing for such variables.
Pivoting, or rotation, changes the perspective in presenting the data to the user.The following figures (12-14) demonstrate various perspectives after pivoting the identical initial table.All three figures show multidimensional tables.All of them show the same information in a different way.For example, the arithmetic mean of the average values of dwelling for the areas with a low unemployment rate, high values of dwelling, and low household income in Annapolis County is the same: $98,985.00.
Figure 12.OLAP cube after rotation: the first perspective.Click image for larger view.
Figure 13.OLAP cube after rotation: the second perspective.Click image for larger view.
Figure 14.OLAP cube after rotation: the third perspective.Click image for larger view.
Shekhar S.et al., 2001.Map cube: a visualization tool for spatial data warehouses.[In:] Miller H.J.and J.Han (eds.) Geographic Data Mining and Knowledge Discovery.Taylor & Francis, pp.74-109.