Commit ec20040

Author: Arseny Kositsyn
[PGPRO-12159] Added a description of rum_debug_funcs in README.md
Tags: rum
1 parent 61704b4 commit ec20040

1 file changed

+128
-0
lines changed

README.md

Lines changed: 128 additions & 0 deletions
@@ -302,6 +302,134 @@ For type: `anyarray`
This operator class stores `anyarray` elements with any field supported by the module.

## Functions for low-level inspection of RUM index pages

RUM provides several functions for low-level inspection of every type of index page:

### `rum_metapage_info(rel_name text, blk_num int4) returns record`

`rum_metapage_info` returns information about a RUM index metapage. For example:

```SQL
SELECT * FROM rum_metapage_info('rum_index', 0);
-[ RECORD 1 ]----+-----------
pending_head     | 4294967295
pending_tail     | 4294967295
tail_free_size   | 0
n_pending_pages  | 0
n_pending_tuples | 0
n_total_pages    | 87
n_entry_pages    | 80
n_data_pages     | 6
n_entries        | 1650
version          | 0xC0DE0002
```

### `rum_page_opaque_info(rel_name text, blk_num int4) returns record`

`rum_page_opaque_info` returns the contents of the opaque area of a RUM index page: the `leftlink` and `rightlink` sibling links, `maxoff` (the number of elements stored on the page; its exact meaning depends on the page type), `freespace` (the free space on the page), and the page `flags`.

For example:

```SQL
SELECT * FROM rum_page_opaque_info('rum_index', 10);
 leftlink | rightlink | maxoff | freespace | flags
----------+-----------+--------+-----------+--------
        6 |        11 |      0 |         0 | {leaf}
```

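To get an overview of a whole index, the call above can be repeated for every block. The following is a minimal sketch, not a tested recipe. It assumes that the index is named `rum_index` as in the examples, that block 0 is the metapage, and that `rum_page_opaque_info` accepts every other block. The block count is derived with the standard `pg_relation_size` function and the `block_size` setting:

```SQL
-- Sketch: list the opaque area of every page except the metapage (block 0).
-- Assumes the index is named rum_index and that each block can be inspected.
SELECT blk, o.*
FROM generate_series(
         1,
         pg_relation_size('rum_index') / current_setting('block_size')::bigint - 1
     ) AS blk,
     LATERAL rum_page_opaque_info('rum_index', blk::int4) AS o
ORDER BY blk;
```

The `flags` column of each row then shows which page-specific function is appropriate for that block.
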
### `rum_internal_entry_page_items(rel_name text, blk_num int4) returns setof record`

`rum_internal_entry_page_items` returns the information stored on internal pages of the entry tree (extracted from `IndexTuple` structures). For example:

```SQL
SELECT * FROM rum_internal_entry_page_items('rum_index', 1);
               key               | attrnum |     category     | down_link
---------------------------------+---------+------------------+-----------
 3d                              |       1 | RUM_CAT_NORM_KEY |         3
 6k                              |       1 | RUM_CAT_NORM_KEY |         2
 a8                              |       1 | RUM_CAT_NORM_KEY |         4
 ...
 Tue May 10 21:21:22.326724 2016 |       2 | RUM_CAT_NORM_KEY |        83
 Sat May 14 19:21:22.326724 2016 |       2 | RUM_CAT_NORM_KEY |        84
 Wed May 18 17:21:22.326724 2016 |       2 | RUM_CAT_NORM_KEY |        85
 +inf                            |         |                  |        86
(79 rows)
```

Like GIN, RUM packs the downlinks and keys on internal pages of the entry tree into pairs of the form `(P_n, K_{n+1})`. As a result, `P_0` has no key (it is assumed to be `-inf`), and there is no downlink for the last key `K_{n+1}` (no `P_{n+1}` follows it): this key is taken to be the largest key (the high key) of the subtree that `P_n` leads to. In the example above, the page is the rightmost one at its level of the tree, so the last row contains the key `+inf`.

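One way to check where the downlinks lead is to look up the opaque area of each child page. This is a hedged sketch that reuses only the functions and column names shown above. It assumes that `down_link` can be cast to `int4` and passed as `blk_num`:

```SQL
-- Sketch: show the flags of every child page referenced from internal page 1.
-- Assumes down_link holds a block number that rum_page_opaque_info accepts.
SELECT i.key, i.down_link, o.flags
FROM rum_internal_entry_page_items('rum_index', 1) AS i,
     LATERAL rum_page_opaque_info('rum_index', i.down_link::int4) AS o;
```
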
### `rum_leaf_entry_page_items(rel_name text, blk_num int4) returns setof record`

`rum_leaf_entry_page_items` returns the information stored on leaf pages of the entry tree (extracted from compressed posting lists). For example:

```SQL
SELECT * FROM rum_leaf_entry_page_items('rum_index', 10);
 key | attrnum |     category     | tuple_id | add_info_is_null | add_info | is_postring_tree | postring_tree_root
-----+---------+------------------+----------+------------------+----------+------------------+--------------------
 ay  |       1 | RUM_CAT_NORM_KEY | (0,16)   | t                |          | f                |
 ay  |       1 | RUM_CAT_NORM_KEY | (0,23)   | t                |          | f                |
 ay  |       1 | RUM_CAT_NORM_KEY | (2,1)    | t                |          | f                |
 ...
 az  |       1 | RUM_CAT_NORM_KEY | (0,15)   | t                |          | f                |
 az  |       1 | RUM_CAT_NORM_KEY | (0,22)   | t                |          | f                |
 az  |       1 | RUM_CAT_NORM_KEY | (1,4)    | t                |          | f                |
 ...
 b9  |       1 | RUM_CAT_NORM_KEY |          |                  |          | t                |                  7
 ...
(1602 rows)
```

Each posting list is an `IndexTuple` that stores the key value and a compressed list of `tids`. For convenience, `rum_leaf_entry_page_items` repeats the key value next to each `tid`, but on the page the key is stored only once.

If a key has too many `tids`, a posting tree is used for storage instead of a posting list. In the example above, a posting tree (whose keys are the `tids`) was created for the key `b9`. In this case the `IndexTuple` stores a magic number and the number of the posting tree's root page instead of the posting list.

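To find the keys on a leaf page that have been moved to posting trees, the output can be filtered on the columns shown above. This is a small sketch that assumes `is_postring_tree` is a boolean column, as its `t`/`f` values suggest:

```SQL
-- Sketch: list the keys on leaf page 10 that are stored as posting trees,
-- together with the root page of each tree.
SELECT key, postring_tree_root
FROM rum_leaf_entry_page_items('rum_index', 10)
WHERE is_postring_tree;
```

For the sample index, the result would include the `b9` row with root page 7.
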
### `rum_internal_data_page_items(rel_name text, blk_num int4) returns setof record`

`rum_internal_data_page_items` returns the information stored on internal pages of the posting tree (extracted from arrays of `PostingItem` structures). For example:

```SQL
SELECT * FROM rum_internal_data_page_items('rum_index', 7);
 is_high_key | block_number | tuple_id | add_info_is_null | add_info
-------------+--------------+----------+------------------+----------
 t           |              | (0,0)    | t                |
 f           |            9 | (138,79) | t                |
 f           |            8 | (0,0)    | t                |
(3 rows)
```

Each element on an internal page of the posting tree contains the high key (a `tid`) of a child page and a link to that child page, as well as additional information if it was added when the index was created.

Every internal page of the posting tree begins with the high key of that page. A value of `(0,0)` is equivalent to `+inf`, which is always the case for the rightmost page.

Currently, RUM does not support storing pass-by-reference data types as additional information on internal pages of the posting tree. Output like the following is therefore possible:

```SQL
 is_high_key | block_number | tuple_id | add_info_is_null |                    add_info
-------------+--------------+----------+------------------+------------------------------------------------
 ...
 f           |           23 | (39,43)  | f                | varlena types in posting tree is not supported
 f           |           22 | (74,9)   | f                | varlena types in posting tree is not supported
 ...
```

### `rum_leaf_data_page_items(rel_name text, blk_num int4) returns setof record`

`rum_leaf_data_page_items` returns the information stored on leaf pages of the posting tree (extracted from compressed posting lists). For example:

```SQL
SELECT * FROM rum_leaf_data_page_items('rum_index', 9);
 is_high_key | tuple_id  | add_info_is_null | add_info
-------------+-----------+------------------+----------
 t           | (138,79)  | t                |
 f           | (0,9)     | t                |
 f           | (1,23)    | t                |
 f           | (3,5)     | t                |
 f           | (3,22)    | t                |
```

Unlike entry tree leaf pages, posting tree leaf pages do not store compressed posting lists inside an `IndexTuple`. The high key is the largest key on the page.

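The internal and leaf functions for the posting tree can be combined to walk one level down from a root page. The following is a hedged sketch for the sample index. It assumes that every child of page 7 is a leaf page of the posting tree and that `block_number` can be cast to `int4`:

```SQL
-- Sketch: count the tids stored under each child of posting tree root page 7.
-- High key rows are excluded on both levels because they are not stored tids.
SELECT p.block_number, count(*) AS n_tids
FROM rum_internal_data_page_items('rum_index', 7) AS p,
     LATERAL rum_leaf_data_page_items('rum_index', p.block_number::int4) AS l
WHERE NOT p.is_high_key
  AND NOT l.is_high_key
GROUP BY p.block_number
ORDER BY p.block_number;
```

For deeper posting trees, the same step would have to be repeated for each internal level.
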
## Todo

- Allow multiple additional information (lexemes positions + timestamp).
