Pocket OLAP in Javascript and IndexedDB performance

Hello, Habr! 3r33333.  
3r33333.  
Recently, I decided to test jаvascript performance using the example of creating a simple WEB application that can build pivot tables using weakly structured data as a source. I did not intend to repeat all the functionality of Excel or adult OLAP systems, but I wanted to test the performance of jаvascript in general and IndexedDB in particular on various desktop and mobile browsers. Looking ahead, I’m saying that having completed the first stage of the work - building a pivot table with a single-pass algorithm for storing facts (indexing frequently used sections and caching computed aggregates is postponed for the future) - I was disappointed with the reading performance of IndexedDB, surprised that mobile browsers were not are lagging behind the desktop, and puzzled by the epic failure of my beloved Firefox in one of the tests. There were a total of 2 tests with different variations:
 
3r33333.  
 
forming a pivot table, where the basis of the algorithm is a single cycle on the IndexedDB cursor, working with Objects, Array, Set, Map objects (key extraction, insertion, iteration), string concatenation and simple arithmetic;
 
decoding (drillthrough) rows of the pivot table with the output of the result in the DOM, where the basis of the algorithm is multiple (in a cycle) extracting one record from the IndexedDB by key, and the subsequent output of the results in the html table in groups of 100 rows using the insertAdjacentHTML ('beforeEnd', html) ).
 
3r33333.  
Testing was carried out on a JSON file containing 20 thousand facts, of which 9 records were a reference book of products, the rest represented purchase /sale operations. The plate with the results of testing on a netbook and phone (time in seconds), as well as implementation details and conclusions - under the cut. 3r33333.  
3r33333.  
3r3148.  
3r3r1616.  
3r33177. 3r3178.  
3r33177. 3r3178.  
3r33177. Firefox linux ???r3r3178.  
3r33177. Chromium linux ???.80
 
3r33177. Falkon linux QtWebEngine ???
 
3r33180.  
3r3r1616.  
3r33177. 1 3r3178.  
3r33177. Full calculation of 20 thousand facts - filter, grouping lines, calculating aggregates 3r3178.  
3r33177. 12.3? 10.2? ???r3178.  
3r33177. 4.7? 4.3? ???r3r3178.  
3r33177. 5.0? 4.7? ???r3r3178.  
3r33180.  
3r3r1616.  
3r33177. 2
 
3r33177. Without filter, calculated attributes and aggregates - only grouping of lines
 
3r33177. 8.0? 8.1? ???r3178.  
3r33177. 3.? 3.? ???r3r3178.  
3r33177. 3.3? 3.3? ???r3r3178.  
3r33180.  
3r3r1616.  
3r33177. 3 3r3178.  
3r33177. Without a filter, groupings and calculations - a “naked” cycle according to the IndexedDB
cursor.  
3r33177. 7.8? 7.7? ???r3r3178.  
3r33177. 3.3? 3.3? ???r3r3178.  
3r33177. 3.2? 3.1? ???r3178.  
3r33180.  
3r3r1616.  
3r33177. 4
 
3r33177. Decoding (drillthrough) with the output in the html table of 20 thousand lines
 
3r33177. 108
 
3r33177. ???r3178.  
3r33177. 100 3r3178.  
3r33180.  
3r3r1616.  
3r33177. 5
 
3r33177. Drillthrough without outputting to DOM is a “bare” loop over an array of keys, and extracting records one by one 3r3178.  
3r33177. 11.5? 14.7? ???r3r3178.  
3r33177. 18.6? 18.9? ???r3r3178.  
3r33177. 01/18/19/0? 18/11-3r3178.  
3r33180.  
3r3182. 3r33333.  
3r3148.  
3r3r1616.  
3r33177. 3r3178.  
3r33177. Firefox android ???
 
3r33177. Chrome android ???.98
 
3r33177. Edge android ???.2804
 
3r33180.  
3r3r1616.  
3r33177. Full calculation of 20 thousand facts on the phone 3r3178.  
3r33177. 13.5? 13.1? ???r3r1717.  
3r33177. 5.8? 5.8? ???r3r3178.  
3r33177. 6.4? 6.4? ???r3r3178.  
3r33180.  
3r3182. 3r33333.  

Baseline

3r33333.  
Since our database is NoSQL, the data schema and relationships between entities are not previously described. You can submit to the input any JSON file containing an array of objects where elements of directories, transactions, document lines, etc. can be mixed. Links are established at the time of grouping lines (or calculating formulas) based on matching attribute values, so you can say that human-readable text keys are used. For example, product information and buy /sell transactions will be presented in the fact database with the following records: 3r33333.  
3r33333.  
[
{"product": "milk-2.5", "EAN-code": "4770148229477-3"},
{"product": "petrol-95", "EAN-code": "74820123490016-3"},
{"product": "fish-pollock", "EAN-code": "4640014465660-3"},
{"operation": "purchase", "partner": "nalchik-moloko", "product": "milk-2.5", "price": 15.5, "quantity": 50},
{"operation": "purchase", "partner": "lukoil", "product": "petrol-95", "price": 35, "quantity": 200},
{"operation": "purchase", "partner": "china-fish", "product": "fish-pollock", "price": 90, "quantity": 100},
{"operation": "sale", "partner": "ivanov-petr", "product": "milk-2.5", "price": 20.30, "quantity": 3.5},
{"operation": "sale", "partner": "smith-john", "product": "petrol-95", "price": 42, "quantity": 30},
{"operation": "sale", "partner": "ivanov-petr", "product": "fish-pollock", "price": 120, "quantity": 2}
]
3r33333. 3r33333. 3r33333.  

The syntax of the formulas is

3r33333.  
jаvascript syntax is used, and 3 additional functions:
 
3r33333.  
 
fact (columnname) - returns the value of the attribute (including the calculated one) of the current fact;
 
row (columnname) - returns the current (intermediate) row value of the pivot table to which this fact falls;
 
selectlast (columnname, where) - returns the value of the attribute of the last fact from the set of facts that satisfy the where condition.
 
3r33333.  
Example of a formula for the filter:
 
3r33333.  
['purchase', 'sale'].includes (fact ('operation'))
3r33333. 3r33333. 3r33333.  
Example of a calculated fact attribute:
 
3r33333.  
"amount": "round (fact ('price') * fact ('quantity'), 2)"
3r33333. 3r33333. 3r33333.  
Examples of formulas for pivot table columns (aggregates, post-aggregate calculations, queries to reference books):
 
3r33333.  
{
"quantity-sum": "row ('quantity-sum') + fact ('quantity')",
"amount-sum": "round (row ('amount-sum') + fact ('amount'), 2)",
"quantity-avg": "round (row ('quantity-sum') /row ('count'), 2)",
"price-max": "fact ('price')> row ('price-max')? fact ('price'): row ('price-max')",
"price-min": "row ('price-min') == null || fact ('price') < row('price-min') ? fact('price') : row('price-min')",
" price-sum ":" row ('price-sum') + fact ('price') ",
" Price-avg ":" round (row ('price-sum') /row ('count'), 2) ",
" Price-avg-weight ":" round (row ('amount -sum ') /row (' quantity-sum '), 2) ",
" EAN-code ":" selectlast (' EAN-code ',' fact ("product") == row ("product") ') "
}
3r33333.
 

Features of the algorithm

3r33333.  
Since we have a single pass algorithm, at each moment of time we have one current fact and one row of the pivot table into which this fact falls. Thus, using the fact () and row () functions, it is possible to calculate simplest aggregates such as sum, min /max, average, etc. More complex statistical functions that require the storage of the entire array of values ​​are not yet ready. 3r33333.  
3r33333.  
It was more difficult to implement the left join (selectlast () function) for the purpose of extracting additional attributes from the directories (essentially from other facts) in a single cycle of storing facts. I made the assumption that the number of directory entries will always be orders of magnitude smaller than the amount of real-time data, and solved the problem as follows — simultaneously with grouping lines and calculating aggregates — all directories are selected into a separate sandbox (that is, the facts that have the desired attribute). After the formation of the rows of the pivot table - the second pass through the sandbox, we pull up the missing attributes in the corresponding columns of the pivot table. Thus, we have only one heavy cycle. 3r33333.  
3r33333.  
The last optimization was the conversion of all formulas in the JS function at the start, to avoid using eval () in a loop. 3r33333.  
3r33333.  

Performance testing

3r33333.  
To my surprise, the insertion of 20 thousand facts into the non-indexed ObjectStore took about a second, but there are significant problems with extracting data. 3r33333.  
3r33333.  
Poor performance in row 4 is understandable - output to the DOM, and even to the table - a predictably difficult operation, moreover, rather meaningless on the prod, since it’s impossible to work with such a table normally. 3r33333.  
3r33333.  
Of interest are lines 3 and ? representing the "bare" performance of the sample from IndexedDB. In these tests, I commented out the entire algorithm, leaving only the operations with the base - in line 3 one big cursor was used and iterated over it, in line ? vice versa - iteration over the key array (previously prepared), and extracting records one by one. The key is integer, auto-increment. 3r33333.  
3r33333.  
Here are the code snippets for tests 3 and ? respectively: 3r33333.  
3r33333.  
//single cycle on facts
db.transaction ('facts', 'readonly'). objectStore ('facts'). openCursor (undefined, 'prev'). onsuccess = (ev) => {
let cur = ev.target.result;
if (cur) {
cfact = cur.value;
;
//next fact
cur.continue ();
}
}
3r33333. 3r33333. 3r33333.  
rowout (0);
function rowout (i) {
if (i < ids.length) {
db.transaction ('facts', 'readonly'). objectStore ('facts'). get (ids[i]) .success = (ev) => {3r33370. cfact = ev.target.result;
;
Rowout (i + 1);
}
}
}
3r33333. 3r3341.  
Since the IndexedDB interface is asynchronous, in both cases we observe tail recursion, which must be optimized by the engine (otherwise it would have fallen from the stack overflow), but the reason for such epic brakes is incomprehensible to me. Moreover, Firefox, the loser in all tests, significantly won in the test for single requests, which gives us hope that it will definitely cope with queries on the index. 3r33333.  
3r33333.  

Conclusion

3r33333.  
I felt sad, and I'm not sure that I will continue to experiment. Of course, you can build indexes and abandon fullscan, but you still have to summarize the units in a cycle, and, as we see, a sample from the cursor is the most expensive operation. It may be worth refusing to use IndexedDB at all, store the data on disk in JSON format (good, parsing takes seconds), and rewrite the algorithm to wasm. Or wait for the implementation of Rust in browsers (just kidding). 3r33333.  
3r33333.  
The application is available
here 3r33359. , file with test data
here 3r33359. . 3r33333.  
3r33333.  
I would be grateful for the advice and just clever thoughts, since the jаvascript language is not my first language. 3r33333.
3r33333.
+ 0 -

Add comment