Product Overview: WebQL - Searching the Web, Acquiring Insight

QL2 Software, Inc.
316 Occidental Ave. S.
Suite 410
Seattle, WA 98104-3859
Tel: 206.443.6836
Toll free: 800.750.8830
Fax: 206.269.0694
www.ql2.com

(Ed note: Some very sophisticated tools are used in the process of collecting data, turning it into information, then into knowledge and finally into understanding. WebQL is one of those tools; it can automate and expand the process of data collection and transformation. If you still think that performing a radius or buffer search in your desktop mapping program is a cool way to retrieve information, you really need to read further.)

WebQL technical specifications.

WebQL is a highly focused Web search application used in several industries; competitive intelligence, traditional intelligence and the travel industry are just a few examples. In the travel industry, for instance, the product is used to extract fares and schedules and to provide the comparisons that we have all come to expect from online travel websites. Searches are carried out by agents, which are directed by SQL statements or other programming interfaces, and the results come back in near real-time. Unlike searches on Google, Ask or Yahoo, the search criteria are targeted toward specific websites, and they can be tuned on the fly through the programming interfaces as discoveries are made during the search.
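To make the idea of SQL-directed fare comparison concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not WebQL syntax, which this overview does not show; the table layout, carrier names and prices are all made-up assumptions, and the rows stand in for data an agent might have extracted.

```python
import sqlite3

# Hypothetical rows an agent might have extracted from several airline sites.
# Carriers, routes and prices are made-up illustration data.
FARES = [
    ("AirA", "SEA", "JFK", 329.00),
    ("AirB", "SEA", "JFK", 298.50),
    ("AirC", "SEA", "JFK", 341.25),
    ("AirA", "SEA", "ORD", 210.00),
    ("AirB", "SEA", "ORD", 225.75),
]


def cheapest_per_route(rows):
    """Load extracted rows into an in-memory table and compare fares with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fares (carrier TEXT, origin TEXT, dest TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO fares VALUES (?, ?, ?, ?)", rows)
    # Join each fare against the minimum price found for its route.
    query = """
        SELECT f.origin, f.dest, f.carrier, f.price
        FROM fares AS f
        JOIN (SELECT origin, dest, MIN(price) AS low
              FROM fares GROUP BY origin, dest) AS m
          ON f.origin = m.origin AND f.dest = m.dest AND f.price = m.low
        ORDER BY f.origin, f.dest
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for origin, dest, carrier, price in cheapest_per_route(FARES):
        print(f"{origin}->{dest}: {carrier} at {price:.2f}")
```

Changing the WHERE or GROUP BY clauses is the kind of "tuning" a SQL-driven approach allows without touching the extraction step.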

WebQL is designed to locate data from publicly available sources, but does not traverse firewalls or seek unauthorized access to information that is located or linked within a public website.

Unlike Google, Ask or Yahoo, WebQL can extract information from free-form text, tables, PDF files, hyperlinks and many other file types, and it can even extract text from within graphic files such as .jpg, .tif and .gif. Because agents can be built so flexibly, the information retrieved can be geographic in nature. Agents can also be configured to fill out the forms that many websites require before they display the desired information. Multiple agents can run at the same time, too. There is an anonymous option that routes the search through a third party, leaving no trace back to the source and no “footprints in the sand” on the targeted websites.
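As a rough illustration of what extracting data from a table on a web page involves, the sketch below uses Python's standard html.parser module. It does not represent how WebQL works internally; the class name, function name and sample HTML are assumptions made for this example.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None outside <tr>
        self._cell = None   # text fragments of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table cells found in an HTML string as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page fragment.
SAMPLE = """
<table>
  <tr><th>Flight</th><th>Departs</th><th>Fare</th></tr>
  <tr><td>QX 101</td><td>08:15</td><td>$129</td></tr>
  <tr><td>QX 205</td><td>13:40</td><td>$99</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print(row)
```

Real pages are messier (nested tables, missing end tags, cells spanning columns), which is part of why a dedicated extraction tool is attractive.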

The product requires a fairly high skill level in writing SQL statements or using the other programming interfaces. However, there is a robust set of tools to aid in data extraction. In addition, if you need just a few specific searches, the company offers a service that creates the agents for you.

Examples of file types and data sources from which WebQL can extract data are listed below (this list is from the company’s literature).

PDF files
Image files
HTML
Databases
RSS feeds
PowerPoint files
Word documents
E-mail
Zip files

Examples of searches and of applications for the retrieved data, taken from QL2 literature, are listed below.

Data mining for customer and competitor information to support CRM applications.
Web mining for business activity monitoring (BAM), business process management (BPM) and executive dashboard applications.
Gathering large amounts of unstructured text for indexing, data mining and “fingerprinting” by advanced text analysis tools.
Aggregating content for enterprise information portals.
Harvesting online competitive pricing data for revenue management and price optimization systems.
Collecting and indexing data for enterprise search and metadata management.
Watching the competition, pricing news, promotions, patents, hiring trends, SEC filings, licenses, expansion plans, etc.
Automating research, locating, extracting and organizing information from multiple scientific information portals.
Performing primary research, monitoring public opinion from cyber forums, blogs and RSS feeds about products and competitors.
Internet monitoring of partners, resellers and the gray market for resale authorization, price accuracy, logo usage, logo positioning, links to and from partner sites, intellectual property violations, etc.

The User Interface
The user interface resembles that of other software used to write code. In the example below, you can see the code in the background and a block diagram of the query that has been constructed. This example includes a geographic search of countries. In the lower left you can see the country codes.

An example of the user interface.

Some of the tools available. The list has been split into two to make it easier to see all the available choices.

Another example of the user interface.

In geographical searches, WebQL can extract addresses, ZIP Codes and coordinates if they are part of the available source content. While WebQL does not include a geocoder, it can set the data up to be processed by one.
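The step of pulling ZIP Codes and coordinates out of page text and handing them to a separate geocoder can be sketched as follows. This is a simplified Python illustration, not WebQL's method: the regular expressions, function names and CSV column layout are assumptions, and the coordinates in the sample text are illustrative values rather than a surveyed location.

```python
import csv
import io
import re

# US ZIP or ZIP+4 that follows a two-letter state code, e.g. "WA 98104-3859".
ZIP_RE = re.compile(r"\b[A-Z]{2}\s+(\d{5}(?:-\d{4})?)\b")
# Decimal "latitude, longitude" pairs such as "47.6003, -122.3331".
COORD_RE = re.compile(r"(?<![\d.])(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")


def extract_geo(text):
    """Return (zip_codes, coordinates) found in free-form text."""
    zips = ZIP_RE.findall(text)
    coords = []
    for lat, lon in COORD_RE.findall(text):
        lat_f, lon_f = float(lat), float(lon)
        # Discard number pairs that cannot be valid coordinates.
        if -90 <= lat_f <= 90 and -180 <= lon_f <= 180:
            coords.append((lat_f, lon_f))
    return zips, coords


def to_geocoder_csv(records):
    """Write one CSV row per (source, text) record for a downstream geocoder."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "zip", "lat", "lon"])
    for source, text in records:
        zips, coords = extract_geo(text)
        lat, lon = coords[0] if coords else ("", "")
        writer.writerow([source, zips[0] if zips else "", lat, lon])
    return buf.getvalue()


# Sample text; the coordinates are illustrative only.
SAMPLE_TEXT = (
    "Visit us at 316 Occidental Ave. S., Suite 410, Seattle, WA 98104-3859. "
    "Map pin: 47.6003, -122.3331. Call 206.443.6836."
)

if __name__ == "__main__":
    print(to_geocoder_csv([("page-1", SAMPLE_TEXT)]))
```

Records with a ZIP Code but no coordinates are left with empty lat and lon columns, which is where a geocoder would fill in the gap.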

Creating an agent can take anywhere from ten minutes to several hours, depending on the complexity of the search. The agents run in real-time and can be tuned on the fly as data are retrieved and problems are encountered. For example, the agent may report, “I have found a couple of tables; do you want all the data or just some of it?” WebQL reads most database formats natively and can extract data from them even on the Web. The search process can be fairly fast, and large amounts of data can be retrieved quickly. For comparison, think about how quickly Orbitz or CheapTickets can find fare information from several airlines when you launch an agent.
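Part of the speed in the fare-comparison case comes from querying several sources at once. The Python sketch below shows that general pattern with a thread pool; it is only an analogy for running multiple agents at the same time. The site names are placeholders, the fares are made up, and run_agent returns canned data instead of contacting any website.

```python
from concurrent.futures import ThreadPoolExecutor

# Canned responses standing in for live lookups; all values are made up.
CANNED_FARES = {
    "airline-a.example": 329.00,
    "airline-b.example": 298.50,
    "airline-c.example": 341.25,
}


def run_agent(site):
    """Pretend to query one site and return (site, fare)."""
    return site, CANNED_FARES[site]


def run_all(sites, max_workers=4):
    """Run one agent per site concurrently and return results, cheapest first."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_agent, sites))
    return sorted(results, key=lambda pair: pair[1])


if __name__ == "__main__":
    for site, fare in run_all(list(CANNED_FARES)):
        print(f"{site}: {fare:.2f}")
```

With real network requests, the total wait is closer to the slowest single site than to the sum of all of them, which is why concurrent agents feel fast.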