June 28, 2006
QL2 Software, Inc.
316 Occidental Ave. S.
Suite 410
Seattle, WA 98104-3859
Tel: 206.443.6836
Toll free: 800.750.8830
Fax: 206.269.0694
www.ql2.com
(Ed note: Some very sophisticated tools are used in the process
of collecting data, turning it into information, then into knowledge
and finally into understanding. WebQL is one of those tools; it can
automate and expand the process of data collection and transformation.
If you still think that performing a radius or buffer search in your
desktop mapping program is a cool way to retrieve information, you
really need to read further.)
WebQL is a highly focused Web search application that is used in
several industries. Competitive intelligence, traditional intelligence
and the travel industry are just a few examples. In the travel
industry, for instance, this product is used to help extract fares and
schedules, and provide the comparisons that we have all come to expect
from online travel websites. The search functions are done via agents,
directed by SQL statements or other programming interfaces, and the
search results come back in near real-time. The search criteria,
targeted toward specific websites (unlike Google, Ask or Yahoo), can be
tuned on the fly via the programming interfaces as discoveries are made
during the search.
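
To make the idea of SQL-directed retrieval concrete, here is a minimal sketch in Python. It is not WebQL syntax, which this article does not show. It only assumes that records gathered from targeted websites end up as rows that can be filtered and sorted with ordinary SQL. The `fares` table, the sample rows and the `fares_for_route` function are all hypothetical.

```python
import sqlite3

# Hypothetical records standing in for rows an agent might pull from fare pages.
SAMPLE_FARES = [
    ("SEA", "JFK", "Airline A", 329.00),
    ("SEA", "JFK", "Airline B", 298.50),
    ("SEA", "LAX", "Airline A", 149.00),
]

def fares_for_route(records, origin, dest):
    """Load records into an in-memory table and list one route's fares, cheapest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fares (origin TEXT, dest TEXT, carrier TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO fares VALUES (?, ?, ?, ?)", records)
    result = conn.execute(
        "SELECT carrier, price FROM fares "
        "WHERE origin = ? AND dest = ? ORDER BY price",
        (origin, dest),
    ).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for carrier, price in fares_for_route(SAMPLE_FARES, "SEA", "JFK"):
        print(carrier, price)
```

Tuning the search "on the fly" would, in this sketch, amount to changing the `WHERE` clause or the parameters and running the query again.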

WebQL is designed to locate data from publicly available sources, but
does not traverse firewalls or seek unauthorized access to information
that is located or linked within a public website.
Unlike Google, Ask or Yahoo, WebQL can extract information from
free-form text, tables, PDF files, hyperlinks and many other file
types, and it can even extract text from within graphic files such as
.jpg, .tif and .gif. Because of the versatility in creating agents, the
information retrieved can be geographic in nature. The agents can be
configured to fill out the forms that many websites place in front of
the desired information. Multiple agents can run at the same time, too.
There is an anonymous option that does the search through a third
party, leaving no trace back to the source and no “footprints in the
sand” on the targeted websites.
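
As a rough illustration of what pulling hyperlinks and table data out of a page involves, the sketch below uses only Python's standard `html.parser` module. It is not how WebQL works internally, which the article does not describe. The `LinkAndTableExtractor` class, the `extract` function and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class LinkAndTableExtractor(HTMLParser):
    """Collect hyperlink targets and table rows from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []    # href values of <a> tags
        self.rows = []     # one list of cell strings per <tr>
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

# Hypothetical page fragment with one link and a small country-code table.
SAMPLE_PAGE = """
<p>See the <a href="/fares.pdf">fare sheet</a>.</p>
<table>
  <tr><th>Country</th><th>Code</th></tr>
  <tr><td>Canada</td><td>CA</td></tr>
  <tr><td>Mexico</td><td>MX</td></tr>
</table>
"""

def extract(html):
    """Return (links, rows) found in the given HTML text."""
    parser = LinkAndTableExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, parser.rows

if __name__ == "__main__":
    links, rows = extract(SAMPLE_PAGE)
    print(links)
    print(rows)
```

Text inside PDF or image files needs other tools (a PDF parser or optical character recognition), which this sketch does not attempt.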
The product requires a fairly high skill level in writing SQL
statements or using the other programming interfaces. But there is a
robust set of tools to aid in data extraction. In addition, if you need
just a few specific searches, the company offers a service that creates
the agents for you.
Examples of file types and data sources from which
WebQL can extract data are listed below (this list is from the
company’s literature).
- PDF files
- Image files
- HTML
- Databases
- RSS feeds
- PowerPoint files
- Word documents
- Zip files
Typical applications include:
- Data mining for customer and competitor information to support CRM applications.
- Web mining for business activity monitoring (BAM), business process management (BPM) and executive dashboard applications.
- Gathering large amounts of unstructured text for indexing, data mining and “fingerprinting” by advanced text analysis tools.
- Aggregating content for enterprise information portals.
- Harvesting online competitive pricing data for revenue management and price optimization systems.
- Collecting and indexing data for enterprise search and metadata management.
- Watching the competition, pricing news, promotions, patents, hiring trends, SEC filings, licenses, expansion plans, etc.
- Automating research, locating, extracting and organizing information from multiple scientific information portals.
- Performing primary research, monitoring public opinion from cyber forums, blogs and RSS feeds about products and competitors.
- Internet monitoring of partners, resellers and the gray market for resale authorization price accuracy, logo usage, logo positioning, links to and from partner sites, intellectual property violations, etc.
The user interface is not unlike other software that allows you to create code. In the example below, you can see the code in the background and a block diagram of the query that has been constructed. This example includes a geographic search of countries. In the lower left you can see the country codes.
[Screenshot: WebQL user interface showing the query code, a block diagram of the query and the country codes]
Creating an agent can take anywhere from ten minutes to several hours, depending on the complexity of the search. The agents run in real-time and can be tuned on the fly as data are retrieved and problems are encountered. For example, the agent may report, “I have found a couple of tables; do you want all the data or just some of it?” WebQL reads most database formats natively and can extract data from them even on the Web. The search process can be fairly fast, and large amounts of data can be retrieved quickly. For example, think about how quickly Orbitz or CheapTickets can find fare information from several airlines in response to your launching an agent.
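
The fare-comparison pattern, with several agents running at once and their results merged, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SOURCES` data, `run_agent` and `collect_fares` are hypothetical, and each "agent" here returns canned data rather than visiting a real website.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-source results; a real agent would fetch and parse a website.
SOURCES = {
    "Airline A": [("SEA-JFK", 329.00)],
    "Airline B": [("SEA-JFK", 298.50)],
    "Airline C": [("SEA-JFK", 341.25)],
}

def run_agent(source):
    """Stand-in for one agent: return (source, route, price) tuples."""
    return [(source, route, price) for route, price in SOURCES[source]]

def collect_fares(sources):
    """Run one agent per source concurrently and merge the results, cheapest first."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        batches = list(pool.map(run_agent, sources))
    merged = [fare for batch in batches for fare in batch]
    return sorted(merged, key=lambda fare: fare[2])

if __name__ == "__main__":
    for source, route, price in collect_fares(list(SOURCES)):
        print(source, route, price)
```

Threads suit this kind of work because each agent spends most of its time waiting on a remote site, so the total time is closer to that of the slowest source than to the sum of all of them.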
Your Comments
[SNIP] "...it can even extract text from within graphic files such as .jpg, .tif and .gif". [/SNIP] ...nice article, just a quick FYI: There is no text within images; parts of images just look like text and can therefore be optically recognized. --Pi http://www.spatiallink.org/gistools/discuss/weblogs/blogs/pi.php
There is indeed text in most images. Many image formats include metadata embedded in the file header that includes useful information. In the commercial photography realm that often includes copyrights, usage terms, what camera the photo was shot with, exposure settings, capture date, and even latitude and longitude if it was captured by a GPS-equipped camera. Google "EXIF metadata" for more info. -Mike












