-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Fix issue with internal git repo telemetry collection #1408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix issue with internal git repo telemetry collection #1408
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a critical issue with git repository telemetry collection where diff strings were empty despite files being correctly identified as changed. The root cause was a mismatch between the reference used to identify changed files (upstream commit) and the reference used to generate diffs (HEAD).
Key changes:
- Introduces a new method
getWorkingTreeDiffsFromRef()
to generate diffs against a specific reference - Adds status string mapping to convert git status enums to readable strings for telemetry
- Updates all telemetry collection to use the new diff method with the correct upstream reference
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
src/platform/git/common/gitDiffService.ts |
Adds interface method for diff generation against specific ref |
src/platform/git/common/nullGitDiffService.ts |
Implements null service method for new diff interface |
src/extension/prompt/vscode-node/gitDiffService.ts |
Implements core diff generation logic against specified ref |
src/extension/prompt/node/repoInfoTelemetry.ts |
Updates telemetry to use new diff method and adds status string mapping |
src/extension/prompt/node/test/repoInfoTelemetry.spec.ts |
Updates all test mocks to use new diff method and validates string status output |
let diff: string; | ||
if (change.status === 7 /* UNTRACKED */) { | ||
// For untracked files, generate a patch showing all content as additions | ||
diff = await this._getUntrackedChangePatch(repository, change.uri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were you able to test with changes added to the index? That seems to have special handling in the other function, so I just want to confirm it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbatten This was a good call to check. I had checked with committed and with unstaged in particular but not with index changes on the new function. There was a tiny bit of a rabbit hole here when I investigated, so I'll summarize.
I checked some staged scenarios with modified files and while I got the correct diff to recreate the file each time, I was a bit surprised that I didn't see INDEX_MODIFIED status in scenarios where I expected it. But in the git extension I saw that if we are using diffWith on a ref we'll generally just be getting a MODIFIED, since diffWith is not using --cached, so it's purposefully ignoring staged vs unstaged in the local repository versus the target ref. It's just going to return the diff from the target ref on modified files for everything between the current state and that ref and won't return an INDEX_MODIFIED.
To test I did the following rotation (just using shorthand here to summarize), which all looked good to me it always had MODIFIED and the full diff of all current changes.
fileA => changeA (unstaged) => diff MODIFIED (changeA) // So the diff showed MODIFIED status with changeA in the file diff
fileA => changeA (staged) => diff MODIFIED (changeA)
fileA => changeA (committed) => diff MODIFIED (changeA)
fileA => changeA (committed) + changeB (unstaged) => diff MODIFIED (changeA + changeB)
fileA => changeA (committed) + changeB (staged) => diff MODIFIED (changeA + changeB)
fileA => changeA (committed) + changeB (committed) => diff MODIFIED (changeA + changeB)
fileA => changeA (committed) + changeB (committed) + changeC (unstaged) => diff MODIFIED (changeA + changeB + changeC)
However your comment here did lead me to check and find a bug in one scenario. One of the places that we do currently see an INDEX_* status is when a new file is added, then it shows up as INDEX_ADDED. In this case it returns the new file content via the getWorkingTreeDiffsFromRef, which is correct. But in the case where we have a new file and that new file is in the repository and not ignored, but is untracked we were missing that file. That file won't show up at all in diffWith against the ref. Instead it will be in either untrackedChanges or workingTreeChanges with status UNTRACKED (which it ends up in can change based on config). So after the code where we get our set of file changes from diffWith I add in any changes (by file name) that are not already picked up by diffWith, this pulls in those untracked files. So tested this out with github and ADO repos.
When a new file is untracked is shows up with the file contents and status: UNTRACKED. Then when it's added with git add it shows up with the file contents as INDEX_ADDED. In both cases it has the correct file contents. If you modify the file after adding it without pushing the add to upstream, you get INDEX_ADDED still with the modified file contents.
const changeMap = new Map<string, Change>(); | ||
|
||
// Prority to the diffWith changes, then working tree changes, then untracked changes. | ||
for (const change of diffChanges) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This area has the change to pick up the missing UNTRACKED files. According to the git extension it looks like they could show up in either workingTreeChanges or untrackedChanges depending on config. Anything that shows up already via diffWith gets preferred as that's the same diffWith command we'll run to get the actual diff string.
assert.ok(diffs[0].renameUri); | ||
}); | ||
|
||
test('should include untracked files from both workingTreeChanges and untrackedChanges', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New test to check that we pick up the untracked files.
There was an issue that we ran into when looking at the git repo data that I added. In particular there were a lot of files that were listed as having diffs, but inside the diffJSON package the actual diff string was "".
I investigated and the issue was pretty clear. The files changed were calculated looking at the upstreamCommit ref that we calculate, but the actual individual diff string calculations in getChangeDiff were looking at a diff against HEAD, not against that computed ref. When testing this would work with staged, un-staged, or new files. But it fails if you have local commits that are not pushed upstream. In this case it correctly registers the files as changed, but doesn't pick up the diff from getDiffChanges.
Also per request from science team I added a mapping to convert the file operation (MODIFIED, DELETED, ect..) from the enum value to the readable string.
Scenarios that I tested manually with this (first scenario here is the easiest way to see the repo):