Technical details:

Indexing using Linear Filtering Method

1. Overview of database indexing

Indexing is the most common and effective means of improving database performance.  Hash and B-tree data structures are used widely in standard database systems.

Indexing the items in a database greatly increases the speed of querying such items and other tasks. At the same time, it can lead to reduced speed because the index needs to be updated when one of those items of data is modified or updated. For this reason, it is common practice to index only a small number of items.

The design of such an index is extremely important to improving performance when developing a database. Decisions on which items to index need to be made by taking into consideration the balance with processing requirements. If the details of processing are changed later, then the index will need to be redesigned, requiring a lengthy process of redevelopment.

In addition, since new items resulting from database processing of existing items will not be indexed, they will need to be added to the index as needed after such processing.

2.  Indexing using Linear Filtering Method

The Linear Filtering Method employed by Zap-In/Zap-Over embeds the index in the data structure naturally, so that all items are indexed. This makes it possible to process all items at high speed. In addition, since there is no need for index design database development can be carried out efficiently.

Also, even if new items are generated as a result of data processing those new items are added to the index automatically. This enables subsequent processing to be conducted at high speed, with no need for redesigning the index.

3. Comparison of different methods

The table below compares speeds of database processing among the three methods of indexing using a hash data structure, indexing using a B-tree data structure, and linear filtering.

Component analysis delivers powerful advantages in each type of processing and has overwhelming advantages particularly when handling large volumes of data. In addition, it also enables effective multithreading using multicore CPUs.

Processing Hash B-Tree LFM (p: num. of threads)
Columns to add index Primary keys only Small number All items
Time to generate index Not short Long Initial only
Memory needed O(n) O(n*log(n)) None
Perfect match,one item only
m: number of hits
O(n/p): default
O(log(n)/p): high-speed
SEARCH (within subset)
s: size of subset
Perfect match only
O(s)+O(m*log(n)) O(s/p)
SORT ineffective O(n*log(n)) o(n/p)
SORT (within subset)
s: size of subset
ineffective O(n*log(n)) o(n/p)
n1, n2: size of table 
ineffective ineffective o((n1+n2)/p)
UNION (subset)
s1, s2: size of subset
ineffective ineffective o((s1+s2)/p)
n1, n2: size of table
s1, s2: size of subset
ineffective ineffective o((s1+s2)/p)
Totaling ineffective ineffective o(n/p)
Totaling (subset) ineffective ineffective o(n/p)
Updating index Successive Troublesome Unnecessary