
Commit 97026d0

docs update (#237)
- etcd support removed
- Aggregation cache metrics
- not match(), not str_match
- new patterns for match_all()
1 parent b9714fa commit 97026d0


8 files changed (+267, -74 lines)

docs/operator-guide/etcd_restore.md

Lines changed: 4 additions & 1 deletion
@@ -3,7 +3,10 @@ description: >-
33
Restore a broken etcd cluster in OpenObserve by restarting pods, resetting
44
data, and rejoining members using CLI and updated Helm configs.
55
---
6-
# Etcd Cluster Restore (Deprecated)
6+
# Etcd Cluster Restore (Removed)
7+
8+
!!! warning "Removal notice"
9+
Etcd support has been removed. Use NATS instead.
710

811
Many users have run into a case where only one of the three pods in the etcd cluster works, while the other two keep restarting and never recover.
912

docs/sql_reference.md

Lines changed: 87 additions & 17 deletions
@@ -5,7 +5,7 @@ description: >-
55
---
66
This guide describes the custom SQL functions supported in OpenObserve for querying and processing logs and time series data. These functions extend the capabilities of standard SQL by enabling full-text search, array processing, and time-based aggregations.
77

8-
## Full-text Search Functions
8+
## Full-text search functions
99
These functions allow you to filter records based on keyword or pattern matches within one or more fields.
1010

1111
### `str_match(field, 'value')`
@@ -26,6 +26,31 @@ This query filters logs from the `default` stream where the `k8s_pod_name` field
2626

2727
![str_match](./images/sql-reference/str-match.png)
2828

29+
### `not str_match(field, 'value')`
30+
**Description**:<br>
31+
32+
- Filters logs where the specified field does NOT contain the exact string value.
33+
- The match is case-sensitive.
34+
- Only logs that do not include the exact characters and casing specified will be returned.
35+
- Can be combined with other conditions using AND/OR operators.
36+
37+
**Example**: <br>
38+
```sql
39+
SELECT * FROM "default" WHERE NOT str_match(k8s_app_instance, 'dev2')
40+
```
41+
![not str_match](./images/sql-reference/not-str-match.png)
42+
43+
**Combining multiple NOT conditions with AND:**
44+
```sql
45+
SELECT * FROM "default" WHERE (NOT str_match(k8s_app_instance, 'dev2')) AND (NOT str_match(k8s_cluster, 'dev2'))
46+
```
47+
![not str_match with AND operator](./images/sql-reference/not-str-match-with-and.png)
48+
49+
**Applying NOT to conditions combined with OR:**
50+
```sql
51+
SELECT * FROM "default" WHERE NOT ((str_match(k8s_app_instance, 'dev2') OR str_match(k8s_cluster, 'dev2')))
52+
```
53+
![not str_match with OR operator](./images/sql-reference/not-str-match-with-or.png)
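The AND example and the OR example return the same logs. The Python sketch below models both `WHERE` clauses to show why. It assumes `str_match` behaves like a case-sensitive substring test and that every record has both fields; the sample logs are made up, and this is an illustration rather than OpenObserve's implementation.

```python
def str_match(record: dict[str, str], field: str, value: str) -> bool:
    """Hypothetical stand-in for str_match: case-sensitive substring test."""
    return value in record[field]


# Made-up sample logs; every record carries both fields.
logs = [
    {"k8s_app_instance": "dev2-api", "k8s_cluster": "prod"},
    {"k8s_app_instance": "prod-api", "k8s_cluster": "dev2"},
    {"k8s_app_instance": "prod-api", "k8s_cluster": "prod"},
    {"k8s_app_instance": "DEV2-api", "k8s_cluster": "prod"},  # casing differs
]

# WHERE (NOT str_match(k8s_app_instance, 'dev2')) AND (NOT str_match(k8s_cluster, 'dev2'))
and_of_nots = [
    r for r in logs
    if not str_match(r, "k8s_app_instance", "dev2")
    and not str_match(r, "k8s_cluster", "dev2")
]

# WHERE NOT (str_match(k8s_app_instance, 'dev2') OR str_match(k8s_cluster, 'dev2'))
not_of_or = [
    r for r in logs
    if not (str_match(r, "k8s_app_instance", "dev2")
            or str_match(r, "k8s_cluster", "dev2"))
]

# De Morgan's law: both filters keep exactly the same records.
print(and_of_nots == not_of_or)  # True
print(len(and_of_nots))          # 2
```

The `DEV2-api` record survives both filters because the match is case-sensitive.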
2954
---
3055
### `str_match_ignore_case(field, 'value')`
3156
**Alias**: `match_field_ignore_case(field, 'value')` (Available in OpenObserve version 0.15.0 and later)<br>
@@ -65,6 +90,51 @@ This query returns all logs in the `default` stream where the keyword `openobser
6590

6691
![match_all](./images/sql-reference/match-all.png)
6792

93+
**More pattern support**
94+
The `match_all` function also supports the following patterns for flexible searching:
95+
96+
- **Prefix search**: Matches keywords that start with the specified prefix:
97+
```sql
98+
SELECT * FROM "default" WHERE match_all('ab*')
99+
```
100+
- **Postfix search**: Matches keywords that end with the specified suffix:
101+
```sql
102+
SELECT * FROM "default" WHERE match_all('*ab')
103+
```
104+
- **Contains search**: Matches keywords that contain the substring anywhere:
105+
```sql
106+
SELECT * FROM "default" WHERE match_all('*ab*')
107+
```
108+
- **Phrase prefix search**: Matches a phrase in which the last term uses prefix matching:
109+
```sql
110+
SELECT * FROM "default" WHERE match_all('key1 key2*')
111+
```
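The four patterns can be modelled as wildcard matching over individual terms. The Python sketch below is a hypothetical model, not OpenObserve's search engine. It assumes text is lowercased (because `match_all` is case-insensitive) and split into terms on non-alphanumeric characters, and that a phrase prefix pattern needs its terms to appear consecutively, with only the last one matched as a prefix.

```python
import fnmatch
import re


def terms(text: str) -> list[str]:
    """Assumed tokenizer: lowercase, split on runs of non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def match_all_model(pattern: str, text: str) -> bool:
    """Hypothetical model of match_all() pattern semantics for one text."""
    tokens = terms(text)
    parts = pattern.lower().split()
    if len(parts) == 1:
        # 'ab*' (prefix), '*ab' (suffix), '*ab*' (contains), or a plain term.
        return any(fnmatch.fnmatchcase(tok, parts[0]) for tok in tokens)
    # Phrase prefix: earlier terms match exactly and consecutively;
    # the last term matches as a prefix when it ends with '*'.
    *exact, last = parts
    for i in range(len(tokens) - len(parts) + 1):
        window = tokens[i:i + len(parts)]
        if window[:-1] != exact:
            continue
        if last.endswith("*"):
            if window[-1].startswith(last[:-1]):
                return True
        elif window[-1] == last:
            return True
    return False


print(match_all_model("ab*", "Abacus error"))           # True  (prefix)
print(match_all_model("*ab", "vocab loaded"))           # True  (suffix)
print(match_all_model("*ab*", "unstable node"))         # True  (contains)
print(match_all_model("key1 key2*", "key1 key2000 x"))  # True  (phrase prefix)
print(match_all_model("key1 key2*", "key2000 key1"))    # False (wrong order)
```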
112+
### `not match_all('value')`
113+
**Description**: <br>
114+
115+
- Filters logs by excluding records where the keyword appears in any field that has the Index Type set to Full Text Search in the stream settings.
116+
- This function is case-insensitive and excludes matches regardless of the keyword's casing.
117+
- **Important**: Only searches fields configured as Full Text Search fields. Other fields in the record are not evaluated.
118+
- Provides significant performance improvements when used with indexed fields.
119+
120+
**Example**:
121+
```sql
122+
SELECT * FROM "default" WHERE NOT match_all('foo')
123+
```
124+
This query returns all logs in the `default` stream where the keyword `foo` does NOT appear in any of the full-text indexed fields. Fields not configured for full-text search are ignored.
125+
126+
**Combining NOT match_all with NOT str_match**:
127+
```sql
128+
SELECT * FROM "default" WHERE (NOT str_match(f1, 'bar')) AND (NOT match_all('foo'))
129+
```
130+
This query returns logs where field `f1` does NOT contain `bar` AND no full-text indexed field contains `foo`. In other words, it excludes records that match either condition.
131+
132+
**Using NOT with OR conditions**:
133+
```sql
134+
SELECT * FROM "default" WHERE NOT (str_match(f1, 'bar') OR match_all('foo'))
135+
```
136+
This query returns the same logs as the previous example. By De Morgan's law, `NOT (A OR B)` is equivalent to `(NOT A) AND (NOT B)`, so a log is returned only when BOTH conditions are false: field `f1` does NOT contain `bar` AND no full-text indexed field contains `foo`.
137+
68138
---
69139
### `re_match(field, 'pattern')`
70140
**Description**: <br>
@@ -113,7 +183,7 @@ This query returns logs from the `default` stream where the `k8s_container_name`
113183

114184
---
115185

116-
## Array Functions
186+
## Array functions
117187
The array functions operate on fields that contain arrays. In OpenObserve, array fields are typically stored as stringified JSON arrays.
118188
<br>For example, in a stream named `default`, there may be a field named `emails` that contains the following value:
119189
`["jim@email.com", "john@doe.com", "jene@doe.com"]` <br>
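Because the value is stored as a string rather than a native array, its elements only become available after the string is parsed as JSON. A short Python illustration of that distinction, using the example value above:

```python
import json

# The stored field value is a string that happens to contain a JSON array.
emails_field = '["jim@email.com", "john@doe.com", "jene@doe.com"]'

# Parse the stringified array into a real list of elements.
emails = json.loads(emails_field)

print(type(emails_field).__name__)  # str
print(len(emails))                  # 3
print(emails[1])                    # john@doe.com
```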
@@ -302,7 +372,7 @@ In this query:
302372

303373
---
304374

305-
## Aggregate Functions
375+
## Aggregate functions
306376
Aggregate functions compute a single result from a set of input values. For usage of standard SQL aggregate functions such as `COUNT`, `SUM`, `AVG`, `MIN`, and `MAX`, refer to [PostgreSQL documentation](https://www.postgresql.org/docs/).
307377

308378
### `histogram(field, 'duration')`
@@ -324,7 +394,7 @@ FROM "default"
324394
GROUP BY key
325395
ORDER BY key
326396
```
327-
**Expected Output**: <br>
397+
**Expected output**: <br>
328398

329399
This query divides the log data into 30-second intervals.
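Conceptually, each log falls into the 30-second interval that contains its timestamp, and the query counts logs per interval. The Python sketch below shows that bucketing under one assumption the text does not state: intervals are aligned to the Unix epoch. The timestamps are made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime, width: timedelta) -> datetime:
    """Start of the epoch-aligned interval of length `width` containing ts."""
    return ts - (ts - EPOCH) % width


width = timedelta(seconds=30)
timestamps = [
    datetime(2024, 1, 1, 10, 0, 5, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 10, 0, 29, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 10, 0, 31, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 10, 1, 0, tzinfo=timezone.utc),
]

counts = Counter(bucket_start(ts, width) for ts in timestamps)
for key in sorted(counts):  # mirrors GROUP BY key ORDER BY key
    print(key.strftime("%H:%M:%S"), counts[key])
# 10:00:00 2
# 10:00:30 1
# 10:01:00 1
```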
330400
Each row in the result shows:
@@ -416,7 +486,7 @@ ORDER BY request_count DESC
416486
- Each core maintains hash tables during aggregation across all partitions
417487
- Memory usage: 3M entries × 60 cores × 60 partitions = 10.8 billion hash table entries
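The hash-table figure is the product of the three numbers in the bullet above. A quick check in Python:

```python
entries_per_core = 3_000_000  # hash table entries per core (from the scenario)
cores = 60
partitions = 60

total_entries = entries_per_core * cores * partitions
print(f"{total_entries:,}")  # 10,800,000,000 (10.8 billion)
```

Because the terms multiply, halving any one of them halves the total.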
418488

419-
**Typical Error Message:**
489+
**Typical error message:**
420490
```
421491
Resources exhausted: Failed to allocate additional 63232256 bytes for GroupedHashAggregateStream[20] with 0 bytes already allocated for this reservation - 51510301 bytes remain available for the total pool
422492
```
@@ -434,7 +504,7 @@ ORDER BY request_count DESC
434504
**Scenario** <br>
435505
Find the top 10 client IPs by request count from web server logs distributed across 3 follower query nodes.
436506

437-
**Raw Data Distribution** <br>
507+
**Raw data distribution** <br>
438508

439509
| Rank | Node 1 | Requests | Node 2 | Requests | Node 3 | Requests |
440510
|------|---------|----------|---------|----------|---------|----------|
@@ -450,7 +520,7 @@ ORDER BY request_count DESC
450520
| 10 | 192.168.1.150 | 440 | 192.168.1.150 | 520 | 192.168.1.150 | 450 |
451521

452522

453-
**Follower Query Nodes Process Data** <br>
523+
**Follower query nodes process data** <br>
454524

455525
Each follower node executes the query locally and returns only its top 10 results:
456526

@@ -467,7 +537,7 @@ ORDER BY request_count DESC
467537
| 9 | 203.0.113.80 | 460 | 10.0.0.25 | 560 | 172.16.0.30 | 490 |
468538
| 10 | 192.168.1.150 | 440 | 192.168.1.150 | 520 | 192.168.1.150 | 450 |
469539

470-
**Leader Query Node Aggregates Results** <br>
540+
**Leader query node aggregates results** <br>
471541

472542
| Client IP | Node 1 | Node 2 | Node 3 | Total Requests |
473543
|-----------|---------|---------|---------|----------------|
@@ -482,7 +552,7 @@ ORDER BY request_count DESC
482552
| 172.16.0.30 | 480 | 580 | 490 | **1,550** |
483553
| 192.168.1.150 | 440 | 520 | 450 | **1,410** |
484554

485-
**Final Top 10 Results:**
555+
**Final top 10 results:**
486556

487557
| Rank | Client IP | Total Requests |
488558
|------|-----------|----------------|
@@ -497,7 +567,7 @@ ORDER BY request_count DESC
497567
| 9 | 172.16.0.30 | 1,550 |
498568
| 10 | 192.168.1.150 | 1,410 |
499569

500-
**Why Results Are Approximate** <br>
570+
**Why results are approximate** <br>
501571

502572
The `approx_topk` function returns approximate results because each query node sends only its local top N entries to the leader, which combines these partial lists to produce the final result.
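A small simulation makes this failure mode concrete. In the hypothetical Python sketch below, three nodes each report only their local top K, and the leader sums those partial lists. Key `x` is third on every node, so no node ever reports it, even though it is the second most frequent key overall. The node counts are invented, and the sketch models the merge step described above rather than OpenObserve's code.

```python
from collections import Counter


def local_top(counts: dict[str, int], k: int) -> Counter:
    """What a follower node sends: only its k most frequent keys."""
    return Counter(dict(Counter(counts).most_common(k)))


def approx_top_k(nodes: list[dict[str, int]], k: int) -> list[tuple[str, int]]:
    """Leader merges the partial top-k lists by summing counts."""
    merged = Counter()
    for counts in nodes:
        merged.update(local_top(counts, k))
    return merged.most_common(k)


def exact_top_k(nodes: list[dict[str, int]], k: int) -> list[tuple[str, int]]:
    """Reference answer: sum every key from every node."""
    merged = Counter()
    for counts in nodes:
        merged.update(counts)
    return merged.most_common(k)


nodes = [
    {"a": 50, "y": 45, "x": 40},
    {"a": 50, "z": 44, "x": 40},
    {"a": 50, "w": 43, "x": 40},
]

print(exact_top_k(nodes, 2))   # [('a', 150), ('x', 120)]
print(approx_top_k(nodes, 2))  # [('a', 150), ('y', 45)]
```

Keys that are heavy on a single node are over-represented, while keys that sit consistently just below each node's cutoff disappear from the result.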
503573

@@ -599,7 +669,7 @@ ORDER BY distinct_count DESC
599669
- Memory usage for distinct counting: Potentially unlimited storage for tracking unique values.
600670
- Combined with grouping: Memory requirements multiply, because distinct values must be tracked separately for every group.
601671

602-
**Typical Error Message:**
672+
**Typical error message:**
603673
```
604674
Resources exhausted: Failed to allocate additional 63232256 bytes for GroupedHashAggregateStream[20] with 0 bytes already allocated for this reservation - 51510301 bytes remain available for the total pool
605675
```
@@ -610,7 +680,7 @@ ORDER BY distinct_count DESC
610680
SELECT approx_topk_distinct(clientip, clientas, 10) FROM default
611681
```
612682

613-
**Combined Approach:**
683+
**Combined approach:**
614684

615685
- **HyperLogLog**: Handles distinct counting using a fixed-size data structure of **16 kilobytes** per group.
616686
- **Space-Saving**: Limits the number of groups returned from each partition to top K.
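The HyperLogLog half of this approach can be sketched in a few dozen lines. The Python class below is a textbook HyperLogLog, not OpenObserve's implementation. It assumes 2^14 one-byte registers, which is one way to arrive at a fixed 16 KB per group, and uses SHA-1 only as a convenient 64-bit hash. Merging two sketches takes the register-wise maximum, which is what lets each node build a sketch locally and the leader combine them.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog with 2**p one-byte registers (p=14 -> 16 KiB)."""

    def __init__(self, p: int = 14) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = bytearray(self.m)
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, m >= 128

    def add(self, value: str) -> None:
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash
        idx = h >> (64 - self.p)                       # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of first 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> None:
        """Combine another sketch into this one (register-wise max)."""
        if other.m != self.m:
            raise ValueError("sketches must use the same precision")
        for i, r in enumerate(other.registers):
            if r > self.registers[i]:
                self.registers[i] = r

    def count(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)  # small-range correction
        return estimate


# Two nodes each see half of 100,000 distinct user agents (made-up values).
node1, node2 = HyperLogLog(), HyperLogLog()
for i in range(100_000):
    (node1 if i % 2 else node2).add(f"agent-{i}")
node1.merge(node2)

print(len(node1.registers))  # 16384 bytes of state, regardless of input size
print(round(node1.count()))  # close to 100000; typical error is about 1%
```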
@@ -619,7 +689,7 @@ ORDER BY distinct_count DESC
619689
**Example: Web server user agent analysis**
620690
Find the top 10 client IPs by unique user agent count from web server logs in the `default` stream.
621691

622-
**Raw Data Distribution**
692+
**Raw data distribution**
623693

624694
| Node 1 | Distinct User Agents | Node 2 | Distinct User Agents | Node 3 | Distinct User Agents |
625695
|---------|---------------------|---------|---------------------|---------|---------------------|
@@ -636,7 +706,7 @@ ORDER BY distinct_count DESC
636706

637707
**Note**: Each distinct count is computed using HyperLogLog's 16KB data structure per client IP.
638708

639-
**Follower Query Nodes Process Data**
709+
**Follower query nodes process data**
640710

641711
Each follower node executes the query locally and returns only its top 10 results:
642712

@@ -653,7 +723,7 @@ ORDER BY distinct_count DESC
653723
| 9 | 203.0.113.80 | 220 | 10.0.0.25 | 270 | 172.16.0.30 | 260 |
654724
| 10 | 192.168.1.150 | 200 | 192.168.1.150 | 250 | 192.168.1.150 | 240 |
655725

656-
**Leader Query Node Aggregates Results**
726+
**Leader query node aggregates results**
657727

658728
| Client IP | Node 1 | Node 2 | Node 3 | Total Distinct User Agents |
659729
|-----------|---------|---------|---------|---------------------------|
@@ -668,7 +738,7 @@ ORDER BY distinct_count DESC
668738
| 172.16.0.30 | 240 | 290 | 260 | **790** |
669739
| 192.168.1.150 | 200 | 250 | 240 | **690** |
670740

671-
**Final Top 10 Results:**
741+
**Final top 10 results:**
672742

673743
| Rank | Client IP | Total Distinct User Agents |
674744
|------|-----------|---------------------------|
@@ -684,7 +754,7 @@ ORDER BY distinct_count DESC
684754
| 10 | 192.168.1.150 | 690 |
685755

686756

687-
**Why Results Are Approximate**
757+
**Why results are approximate**
688758
Results are approximate due to two factors:
689759

690760
1. **HyperLogLog approximation:** Distinct counts are estimated, not exact.

docs/storage-management/storage.md

Lines changed: 5 additions & 1 deletion
@@ -191,7 +191,11 @@ OpenObserve supports multiple metadata store backends, configurable using the `Z
191191
- Recommended for production deployments due to reliability and scalability.
192192
- The default Helm chart (after February 23, 2024) uses [cloudnative-pg](https://cloudnative-pg.io/) to create a PostgreSQL cluster (primary + replica) which is used as the meta store. These instances provide high availability and backup support.
193193

194-
### etcd (Deprecated)
194+
### etcd (Removed)
195+
196+
!!! warning "Removal notice"
197+
Etcd support has been removed. Use NATS as the cluster coordinator and PostgreSQL as the metadata store instead.
198+
195199
- `ZO_META_STORE=etcd` is no longer supported.
196200
- Etcd served as the cluster coordinator and was also the default metadata store in Helm charts released before 23 February 2024. This configuration has been removed. Helm charts released after 23 February 2024 use PostgreSQL as the default metadata store.
197201

docs/user-guide/management/aggregation-cache.md

Lines changed: 8 additions & 0 deletions
@@ -150,7 +150,15 @@ This page explains what streaming aggregation is and shows how to use it to impr
150150
- [approx_topk_distinct](https://openobserve.ai/docs/sql-functions/approximate-aggregate/approx-topk-distinct/)
151151

152152
---
153+
## Aggregation cache metrics
154+
OpenObserve exposes Prometheus metrics to monitor aggregation cache performance and memory usage.
153155

156+
| Metric | Description |
157+
|--------|-------------|
158+
| `zo_query_aggregation_cache_items` | Items stored in the aggregation cache. Monitor it to gauge cache utilization and to confirm that streaming aggregation populates the cache as expected. |
159+
| `zo_query_aggregation_cache_bytes` | Memory used by the aggregation cache, in bytes. Monitor it to keep the cache within acceptable limits so that it does not exhaust system resources. |
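
The two metrics are most useful side by side, for example to track the average size of a cached item. The Python sketch below sums each metric from a scrape in the Prometheus text exposition format. The sample scrape, its `node` label, and its values are hypothetical, and how you obtain a scrape (endpoint, port, authentication) depends on your deployment.

```python
def metric_total(exposition: str, name: str) -> float:
    """Sum every sample of `name` in Prometheus text exposition format."""
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        if "{" in line:
            metric, _, rest = line.partition("{")
            rest = rest.rsplit("}", 1)[1]  # text after the label set
        else:
            metric, _, rest = line.partition(" ")
        if metric.strip() == name:
            total += float(rest.split()[0])  # first field is the sample value
    return total


scrape = """
# sample scrape from two query nodes (hypothetical values)
zo_query_aggregation_cache_items{node="querier-0"} 1200
zo_query_aggregation_cache_items{node="querier-1"} 800
zo_query_aggregation_cache_bytes{node="querier-0"} 52428800
zo_query_aggregation_cache_bytes{node="querier-1"} 31457280
"""

items = metric_total(scrape, "zo_query_aggregation_cache_items")
size = metric_total(scrape, "zo_query_aggregation_cache_bytes")
print(items, size / 2**20)  # 2000.0 80.0  (item count, MiB)
print(round(size / items))  # 41943 bytes per cached item on average
```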
160+
161+
---
154162
=== "How to use"
155163

156164
## How to use streaming aggregation
