Commit 763916e

arseny114 (Arseny Kositsyn) authored and committed
[PGPRO-12159] Added functions for low-level inspect of the RUM index pages.
This commit adds six functions for low-level inspection of the RUM index pages:

1. rum_metapage_info() -- returns information about a RUM index metapage.
2. rum_page_opaque_info() -- returns information from the opaque area of a RUM index page.
3. rum_internal_entry_page_items() -- returns information that is stored on the internal pages of the entry tree.
4. rum_leaf_entry_page_items() -- returns information that is stored on the leaf pages of the entry tree.
5. rum_internal_data_page_items() -- returns information that is stored on the internal pages of the posting tree.
6. rum_leaf_data_page_items() -- returns information that is stored on the leaf pages of the posting tree.

All of these functions take the index name and the page number as arguments. They are described in more detail in README.md.

Tags: rum
1 parent d666401 commit 763916e

11 files changed: +2397 -12 lines changed

Makefile

Lines changed: 3 additions & 3 deletions
@@ -2,17 +2,17 @@

 MODULE_big = rum
 EXTENSION = rum
-EXTVERSION = 1.3
+EXTVERSION = 1.4
 PGFILEDESC = "RUM index access method"

 OBJS = src/rumsort.o src/rum_ts_utils.o src/rumtsquery.o \
 	src/rumbtree.o src/rumbulk.o src/rumdatapage.o \
 	src/rumentrypage.o src/rumget.o src/ruminsert.o \
 	src/rumscan.o src/rumutil.o src/rumvacuum.o src/rumvalidate.o \
-	src/btree_rum.o src/rum_arr_utils.o $(WIN32RES)
+	src/btree_rum.o src/rum_arr_utils.o src/rum_debug_funcs.o $(WIN32RES)

 DATA = rum--1.0--1.1.sql rum--1.1--1.2.sql \
-	rum--1.2--1.3.sql
+	rum--1.2--1.3.sql rum--1.3--1.4.sql

 DATA_built = $(EXTENSION)--$(EXTVERSION).sql

README.md

Lines changed: 128 additions & 0 deletions
@@ -306,6 +306,134 @@ For type: `anyarray`
 This operator class stores `anyarray` elements with any supported by module
 field.

+## Functions for low-level inspection of the RUM index pages
+
+The RUM index provides several functions for low-level inspection of all types of its pages:
+
+### `rum_metapage_info(rel_name text, blk_num int4) returns record`
+
+`rum_metapage_info` returns information about a RUM index metapage. For example:
+
+```SQL
+SELECT * FROM rum_metapage_info('rum_index', 0);
+-[ RECORD 1 ]----+-----------
+pending_head     | 4294967295
+pending_tail     | 4294967295
+tail_free_size   | 0
+n_pending_pages  | 0
+n_pending_tuples | 0
+n_total_pages    | 87
+n_entry_pages    | 80
+n_data_pages     | 6
+n_entries        | 1650
+version          | 0xC0DE0002
+```
+
+### `rum_page_opaque_info(rel_name text, blk_num int4) returns record`
+
+`rum_page_opaque_info` returns information from the opaque area of a RUM index page: the `leftlink` and `rightlink` links, `maxoff` (the number of elements stored on the page; this field is used differently for different types of pages), `freespace` (the free space on the page), and `flags`.
+
+For example:
+
+```SQL
+SELECT * FROM rum_page_opaque_info('rum_index', 10);
+ leftlink | rightlink | maxoff | freespace | flags
+----------+-----------+--------+-----------+--------
+        6 |        11 |      0 |         0 | {leaf}
+```
+
+### `rum_internal_entry_page_items(rel_name text, blk_num int4) returns set of record`
+
+`rum_internal_entry_page_items` returns information that is stored on the internal pages of the entry tree (it is extracted from `IndexTuples`). For example:
+
+```SQL
+SELECT * FROM rum_internal_entry_page_items('rum_index', 1);
+               key               | attrnum |     category     | down_link
+---------------------------------+---------+------------------+-----------
+ 3d                              |       1 | RUM_CAT_NORM_KEY |         3
+ 6k                              |       1 | RUM_CAT_NORM_KEY |         2
+ a8                              |       1 | RUM_CAT_NORM_KEY |         4
+ ...
+ Tue May 10 21:21:22.326724 2016 |       2 | RUM_CAT_NORM_KEY |        83
+ Sat May 14 19:21:22.326724 2016 |       2 | RUM_CAT_NORM_KEY |        84
+ Wed May 18 17:21:22.326724 2016 |       2 | RUM_CAT_NORM_KEY |        85
+ +inf                            |         |                  |        86
+(79 rows)
+```
+
+On the internal pages of the entry tree, RUM (like GIN) packs the downward link and the key in pairs of the form `(P_n, K_{n+1})`. As a result, there is no key for `P_0` (it is assumed to be equal to `-inf`), and the last key `K_{n+1}` has no downward link of its own (it is assumed to be the largest key, or high key, in the subtree to which the `P_n` link leads). In the example above, the page is the rightmost one at its level of the tree, so its last line contains the key `+inf` (this key does not have a downward link).
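The pairing described above can be illustrated with a small toy model. This is a sketch, not RUM's actual code. It assumes that each output row pairs a `down_link` with the upper bound of the keys in the subtree that link leads to, which is how the `(P_n, K_{n+1})` description reads. The four-row page, the `choose_down_link` name, and the use of plain Python string comparison in place of the operator class comparison are all hypothetical.

```python
# Toy model of one internal entry-tree page with a single indexed attribute.
# Each row is (key, down_link); None stands for the "+inf" row.
# The rows are a hypothetical page shaped like the sample output.
rows = [("3d", 3), ("6k", 2), ("a8", 4), (None, 86)]


def choose_down_link(rows, search_key):
    """Return the down_link of the first row whose key is >= search_key.

    Assumption: a row's key is the upper bound of the child page that the
    row's down_link points to, so the first matching row is the one to follow.
    """
    for key, down_link in rows:
        if key is None or search_key <= key:
            return down_link
    raise ValueError("no +inf row and search_key exceeds every key")


print(choose_down_link(rows, "5a"))  # follows the link stored next to "6k"
```

Under this assumption, a key smaller than every stored key follows the first link (the `-inf` side needs no stored key), and a key larger than every stored key follows the link next to `+inf`.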
+
+### `rum_leaf_entry_page_items(rel_name text, blk_num int4) returns set of record`
+
+`rum_leaf_entry_page_items` returns information that is stored on the entry tree leaf pages (it is extracted from compressed posting lists). For example:
+
+```SQL
+SELECT * FROM rum_leaf_entry_page_items('rum_index', 10);
+ key | attrnum |     category     | tuple_id | add_info_is_null | add_info | is_posting_tree | posting_tree_root
+-----+---------+------------------+----------+------------------+----------+-----------------+-------------------
+ ay  |       1 | RUM_CAT_NORM_KEY | (0,16)   | t                |          | f               |
+ ay  |       1 | RUM_CAT_NORM_KEY | (0,23)   | t                |          | f               |
+ ay  |       1 | RUM_CAT_NORM_KEY | (2,1)    | t                |          | f               |
+ ...
+ az  |       1 | RUM_CAT_NORM_KEY | (0,15)   | t                |          | f               |
+ az  |       1 | RUM_CAT_NORM_KEY | (0,22)   | t                |          | f               |
+ az  |       1 | RUM_CAT_NORM_KEY | (1,4)    | t                |          | f               |
+ ...
+ b9  |       1 | RUM_CAT_NORM_KEY |          |                  |          | t               |                 7
+ ...
+(1602 rows)
+```
+
+Each posting list is an `IndexTuple` that stores the key value and a compressed list of `tids`. In the output of `rum_leaf_entry_page_items()`, the key value is repeated for each `tid` for convenience, but on the page it is stored only once.
+
+If the number of `tids` is too large, a posting tree is used for storage instead of a posting list. In the example above, a posting tree was created for the key with the value `b9` (the key in the posting tree is the `tid`). In this case, the `IndexTuple` stores the magic number and the number of the page that is the root of the posting tree instead of the posting list.
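To make the one-key-per-tuple layout concrete, the sketch below regroups the sample rows by key, separating inline posting lists from posting-tree roots. It is a client-side illustration only. The rows are copied from the example output and reduced to four columns, and the `split_rows` helper is a hypothetical name, not part of RUM.

```python
from collections import defaultdict

# Rows from the sample output, reduced to
# (key, tuple_id, is_posting_tree, posting_tree_root).
rows = [
    ("ay", (0, 16), False, None),
    ("ay", (0, 23), False, None),
    ("ay", (2, 1), False, None),
    ("az", (0, 15), False, None),
    ("az", (0, 22), False, None),
    ("az", (1, 4), False, None),
    ("b9", None, True, 7),
]


def split_rows(rows):
    """Group tids by key and collect posting-tree roots separately."""
    posting_lists = defaultdict(list)
    posting_tree_roots = {}
    for key, tid, is_posting_tree, root in rows:
        if is_posting_tree:
            # The tuple holds only a pointer to the posting-tree root page.
            posting_tree_roots[key] = root
        else:
            # The function repeats the key for each tid; regroup them here.
            posting_lists[key].append(tid)
    return dict(posting_lists), posting_tree_roots


posting_lists, posting_tree_roots = split_rows(rows)
print(posting_lists)
print(posting_tree_roots)
```

The root page number collected for `b9` is the kind of value one would then pass to `rum_internal_data_page_items` or `rum_leaf_data_page_items`, depending on the type of that page.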
+
+### `rum_internal_data_page_items(rel_name text, blk_num int4) returns set of record`
+
+`rum_internal_data_page_items` returns information that is stored on the internal pages of the posting tree (it is extracted from arrays of `RumPostingItem` structures). For example:
+
+```SQL
+SELECT * FROM rum_internal_data_page_items('rum_index', 7);
+ is_high_key | block_number | tuple_id | add_info_is_null | add_info
+-------------+--------------+----------+------------------+----------
+ t           |              | (0,0)    | t                |
+ f           |            9 | (138,79) | t                |
+ f           |            8 | (0,0)    | t                |
+(3 rows)
+```
+
+Each element on the internal pages of the posting tree contains the high key (`tid`) value for the child page and a link to this child page (as well as additional information if it was added when creating the index).
+
+The high key of the page itself is always stored at the beginning of an internal page of the posting tree. A value of `(0,0)` is equivalent to `+inf`; this is always the case when the page is the rightmost one.
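The `(0,0)` convention can be sketched as a comparison helper. This is a toy model under stated assumptions, not RUM's implementation. The `tid_le` and `choose_child` names are hypothetical, `tids` are modeled as `(block, offset)` tuples compared lexicographically, and the descent rule (follow the first item whose high key is not less than the searched `tid`) is inferred from the description of the items as child high keys.

```python
PLUS_INF = (0, 0)  # a high key of (0,0) stands for +inf

# (block_number, high key of the child page), from the sample output above.
items = [(9, (138, 79)), (8, (0, 0))]


def tid_le(tid, high_key):
    """Return True if tid <= high_key, treating (0,0) as +inf."""
    return high_key == PLUS_INF or tid <= high_key


def choose_child(items, tid):
    """Return the block number of the first child whose high key covers tid."""
    for block_number, high_key in items:
        if tid_le(tid, high_key):
            return block_number
    return None  # not reachable on a rightmost page, whose last item is +inf


print(choose_child(items, (3, 5)))
print(choose_child(items, (200, 1)))
```

With the sample page, any `tid` up to `(138,79)` maps to block 9 and everything above it maps to block 8.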
+
+At the moment, RUM does not support storing pass-by-reference data types as additional information on the internal pages of the posting tree. Therefore, this output is possible:
+
+```SQL
+ is_high_key | block_number | tuple_id | add_info_is_null |                    add_info
+-------------+--------------+----------+------------------+------------------------------------------------
+ ...
+ f           |           23 | (39,43)  | f                | varlena types in posting tree is not supported
+ f           |           22 | (74,9)   | f                | varlena types in posting tree is not supported
+ ...
+```
+
+### `rum_leaf_data_page_items(rel_name text, blk_num int4) returns set of record`
+
+`rum_leaf_data_page_items` returns information that is stored on the leaf pages of the posting tree (it is extracted from compressed posting lists). For example:
+
+```SQL
+SELECT * FROM rum_leaf_data_page_items('rum_index', 9);
+ is_high_key | tuple_id  | add_info_is_null | add_info
+-------------+-----------+------------------+----------
+ t           | (138,79)  | t                |
+ f           | (0,9)     | t                |
+ f           | (1,23)    | t                |
+ f           | (3,5)     | t                |
+ f           | (3,22)    | t                |
+```
+
+Unlike entry tree leaf pages, on posting tree leaf pages, compressed posting lists are not stored in an `IndexTuple`. The high key is the largest key on the page.
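The high-key property of a posting-tree leaf page can be checked on the client side. The sketch below is illustrative only. The rows are the `(is_high_key, tuple_id)` columns from the sample output, `tids` are modeled as `(block, offset)` tuples, and `tids_within_high_key` is a hypothetical helper name. It does not apply the `(0,0)` convention, which the text states only for internal pages.

```python
# (is_high_key, tuple_id) pairs from the sample output above.
rows = [
    (True, (138, 79)),
    (False, (0, 9)),
    (False, (1, 23)),
    (False, (3, 5)),
    (False, (3, 22)),
]


def tids_within_high_key(rows):
    """Check that every tid on the page is <= the page's high key."""
    high_keys = [tid for is_high_key, tid in rows if is_high_key]
    tids = [tid for is_high_key, tid in rows if not is_high_key]
    if len(high_keys) != 1:
        raise ValueError("expected exactly one high-key row")
    # Tuples compare by block number first, then by offset.
    return all(tid <= high_keys[0] for tid in tids)


print(tids_within_high_key(rows))
```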
+
 ## Todo

 - Allow multiple additional information (lexemes positions + timestamp).

expected/security_1.out

Lines changed: 1 addition & 1 deletion
@@ -18,5 +18,5 @@ CONTEXT: SQL statement "CREATE FUNCTION rum_anyarray_similar(anyarray,anyarray)
 RETURNS bool
 AS '$libdir/rum'
 LANGUAGE C STRICT STABLE"
-extension script file "rum--1.3.sql", near line 1530
+extension script file "rum--1.4.sql", near line 1530
 DROP FUNCTION rum_anyarray_similar(anyarray,anyarray);

meson.build

Lines changed: 2 additions & 1 deletion
@@ -4,7 +4,7 @@
 # of the contrib source tree.

 extension = 'rum'
-extversion = '1.3'
+extversion = '1.4'

 rum_sources = files(
   'src/btree_rum.c',
@@ -49,6 +49,7 @@ install_data(
   'rum--1.0--1.1.sql',
   'rum--1.1--1.2.sql',
   'rum--1.2--1.3.sql',
+  'rum--1.3--1.4.sql',
   kwargs: contrib_data_args,
 )


rum--1.3--1.4.sql

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+/*
+ * RUM version 1.4
+ */
+
+/*--------------------RUM debug functions-----------------------*/
+
+CREATE FUNCTION rum_metapage_info(
+    IN rel_name text,
+    IN blk_num int4,
+    OUT pending_head bigint,
+    OUT pending_tail bigint,
+    OUT tail_free_size int4,
+    OUT n_pending_pages bigint,
+    OUT n_pending_tuples bigint,
+    OUT n_total_pages bigint,
+    OUT n_entry_pages bigint,
+    OUT n_data_pages bigint,
+    OUT n_entries bigint,
+    OUT version varchar)
+AS 'MODULE_PATHNAME', 'rum_metapage_info'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+CREATE FUNCTION rum_page_opaque_info(
+    IN rel_name text,
+    IN blk_num int4,
+    OUT leftlink bigint,
+    OUT rightlink bigint,
+    OUT maxoff int4,
+    OUT freespace int4,
+    OUT flags text[])
+AS 'MODULE_PATHNAME', 'rum_page_opaque_info'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+CREATE OR REPLACE FUNCTION
+rum_page_items_info(rel_name text, blk_num int4, page_type int4)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'rum_page_items_info'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION rum_leaf_data_page_items(
+    rel_name text,
+    blk_num int4
+)
+RETURNS TABLE(
+    is_high_key bool,
+    tuple_id tid,
+    add_info_is_null bool,
+    add_info varchar
+)
+AS $$
+    SELECT *
+    FROM rum_page_items_info(rel_name, blk_num, 0)
+        AS rum_page_items_info(
+            is_high_key bool,
+            tuple_id tid,
+            add_info_is_null bool,
+            add_info varchar
+        );
+$$ LANGUAGE sql;
+
+CREATE FUNCTION rum_internal_data_page_items(
+    rel_name text,
+    blk_num int4
+)
+RETURNS TABLE(
+    is_high_key bool,
+    block_number int4,
+    tuple_id tid,
+    add_info_is_null bool,
+    add_info varchar
+)
+AS $$
+    SELECT *
+    FROM rum_page_items_info(rel_name, blk_num, 1)
+        AS rum_page_items_info(
+            is_high_key bool,
+            block_number int4,
+            tuple_id tid,
+            add_info_is_null bool,
+            add_info varchar
+        );
+$$ LANGUAGE sql;
+
+CREATE FUNCTION rum_leaf_entry_page_items(
+    rel_name text,
+    blk_num int4
+)
+RETURNS TABLE(
+    key varchar,
+    attrnum int4,
+    category varchar,
+    tuple_id tid,
+    add_info_is_null bool,
+    add_info varchar,
+    is_posting_tree bool,
+    posting_tree_root int4
+)
+AS $$
+    SELECT *
+    FROM rum_page_items_info(rel_name, blk_num, 2)
+        AS rum_page_items_info(
+            key varchar,
+            attrnum int4,
+            category varchar,
+            tuple_id tid,
+            add_info_is_null bool,
+            add_info varchar,
+            is_posting_tree bool,
+            posting_tree_root int4
+        );
+$$ LANGUAGE sql;
+
+CREATE FUNCTION rum_internal_entry_page_items(
+    rel_name text,
+    blk_num int4
+)
+RETURNS TABLE(
+    key varchar,
+    attrnum int4,
+    category varchar,
+    down_link int4)
+AS $$
+    SELECT *
+    FROM rum_page_items_info(rel_name, blk_num, 3)
+        AS rum_page_items_info(
+            key varchar,
+            attrnum int4,
+            category varchar,
+            down_link int4
+        );
+$$ LANGUAGE sql;

rum.control

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # RUM extension
 comment = 'RUM index access method'
-default_version = '1.3'
+default_version = '1.4'
 module_pathname = '$libdir/rum'
 relocatable = true
