index: add document for partial index #21903
base: master
Summary of Changes

Hello @YangKeao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds comprehensive documentation for the partial index feature. The changes provide users with a clear understanding of how to define, use, and optimize queries with partial indexes, along with their limitations and performance advantages. This update aims to improve user experience by making this database feature more accessible and understandable.
Code Review
This pull request adds documentation for the new partial index feature. The documentation is well-structured and provides good examples. I've made a few suggestions to improve clarity and address potential user confusion, particularly around when the optimizer chooses to use a partial index. I've also suggested minor wording improvements for better readability and a more formal tone, in line with the style guide.
| CREATE INDEX idx_pending_status ON users (created_at) WHERE status = 'pending'; | ||
| ``` | ||
| Then the following queries can use the partial index: | 
The current phrasing "Then the following queries can use the partial index:" is slightly misleading, as the last example (`SELECT * FROM users WHERE status = 'pending';`) does not use the partial index. To improve clarity, I suggest explaining that for a partial index to be used, the query must benefit from the indexed columns.
| Then the following queries can use the partial index: | |
| The following examples show how the optimizer uses (or does not use) partial indexes based on the query structure. For a partial index to be used, the query must not only match the `WHERE` condition of the index but also benefit from the indexed columns for filtering or ordering. | 
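
As a rough illustration of the distinction this suggestion draws, the two queries below can be compared with `EXPLAIN`. This is only a sketch: the `users` table definition and the date literal are assumptions made for the example, and only the `CREATE INDEX` statement is taken from the PR. The plan actually chosen depends on statistics and the TiDB version.

```sql
-- Hypothetical table; column types are assumed for illustration.
CREATE TABLE users (
    id BIGINT PRIMARY KEY,
    status VARCHAR(20),
    created_at DATETIME
);

-- Partial index from the PR: only rows with status = 'pending' are indexed.
CREATE INDEX idx_pending_status ON users (created_at) WHERE status = 'pending';

-- Matches the index condition AND filters on the indexed column,
-- so the partial index is a candidate access path.
EXPLAIN SELECT * FROM users
WHERE status = 'pending' AND created_at > '2024-01-01';

-- Matches the index condition but does not filter or order by created_at.
-- In the PR's example output, this shape of query used a full table scan.
EXPLAIN SELECT * FROM users WHERE status = 'pending';
```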
| | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:users | keep order:false, stats:pseudo | | ||
| +-------------------------+----------+-----------+---------------+----------------------------------+ | ||
| 3 rows in set (0.00 sec) | ||
| ``` | 
This example correctly shows that the optimizer might not choose the partial index. To help users understand why, I suggest adding a note explaining that since the query doesn't filter or order by `created_at`, a full table scan is more efficient.
| ``` | |
| > **Note:** | |
| > In this case, although the `WHERE status = 'pending'` condition matches the partial index `idx_pending_status (created_at) WHERE status = 'pending'`, the query does not filter or order by `created_at`. Therefore, the optimizer chooses a full table scan as it is more efficient. | 
| 3 rows in set (0.00 sec) | ||
| ``` | ||
| If the predicates in query don't meet the index definition, the index will not be chosen even with hint: | 
The phrase "don't meet the index definition" is a bit vague. For better clarity, I suggest rephrasing to explain that the query's `WHERE` clause must imply the condition of the partial index for it to be used.
| If the predicates in query don't meet the index definition, the index will not be chosen even with hint: | |
| If the query's `WHERE` clause does not imply the condition of the partial index, the index will not be used, even with a hint: | 
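
A minimal sketch of what "does not imply the condition" could look like in practice. The `users` table definition, the `'active'` value, and the date literal are assumptions for illustration; the `CREATE INDEX` statement is from the PR, and `USE_INDEX` is the standard TiDB optimizer hint.

```sql
-- Hypothetical table; column types are assumed for illustration.
CREATE TABLE users (
    id BIGINT PRIMARY KEY,
    status VARCHAR(20),
    created_at DATETIME
);

-- Partial index from the PR.
CREATE INDEX idx_pending_status ON users (created_at) WHERE status = 'pending';

-- status = 'active' does not imply status = 'pending': rows this query
-- needs are not in the index, so per the text above the hint is not honored.
EXPLAIN SELECT /*+ USE_INDEX(users, idx_pending_status) */ *
FROM users
WHERE status = 'active' AND created_at > '2024-01-01';

-- No status predicate at all: the WHERE clause again does not imply
-- the index condition, so the same restriction applies.
EXPLAIN SELECT /*+ USE_INDEX(users, idx_pending_status) */ *
FROM users
WHERE created_at > '2024-01-01';
```

The restriction follows from correctness rather than cost: an index that omits non-pending rows cannot return them, whatever the hint requests.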
| Partial indexes offer several advantages: | ||
| 1. **Reduced storage**: Only rows matching the predicate are indexed, saving storage space | ||
| 2. **Faster DML**: It'll be faster to maintain the index of a subset of data during INSERT, UPDATE, and DELETE operations | 
The use of "It'll" is informal. I suggest rephrasing for a more professional tone and improved clarity, and adding backticks to DML operations as per the style guide.
| 2. **Faster DML**: It'll be faster to maintain the index of a subset of data during INSERT, UPDATE, and DELETE operations | |
| 2. **Faster DML**: Index maintenance during `INSERT`, `UPDATE`, and `DELETE` operations is faster, as only a subset of data is indexed. | 
Signed-off-by: Yang Keao <yangkeao@chunibyo.icu>
de26acb to 355e3e4
First-time contributors' checklist
What is changed, added or deleted? (Required)
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions.
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?