Empty metadata support for autotagger plugins #6065
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `test/plugins/test_musicbrainz.py:1035-1040` </location>
<code_context>
+ ("Artist", "Title", 1),
+ (None, "Title", 1),
+ ("Artist", None, 1),
+ (None, None, 0),
+ ],
+ )
+ def test_item_candidates(
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test case for empty strings as artist/title.
Please include test cases with empty strings ("", " ") for artist and/or title to verify this conversion logic is properly tested.
```suggestion
[
("Artist", "Title", 1),
(None, "Title", 1),
("Artist", None, 1),
(None, None, 0),
("", "Title", 1),
("Artist", "", 1),
("", "", 0),
(" ", "Title", 1),
("Artist", " ", 1),
(" ", " ", 0),
(None, "", 0),
("", None, 0),
(None, " ", 0),
(" ", None, 0),
],
```
</issue_to_address>
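The suggested cases can also be checked directly against the normalization step. The following is a minimal sketch, assuming a hypothetical helper `blank_to_none` that mirrors the "replace empty string with None" check in `tag_item`; the helper is not part of beets.

```python
from __future__ import annotations


def blank_to_none(value: str | None) -> str | None:
    """Return None for None, empty, or whitespace-only input (hypothetical helper)."""
    if value is None or not value.strip():
        return None
    return value


# (input, expected) pairs covering None, empty, blank, and real values.
CASES = [
    (None, None),
    ("", None),
    (" ", None),
    ("Artist", "Artist"),
    (" Title ", " Title "),  # non-blank values pass through unchanged
]

for raw, expected in CASES:
    assert blank_to_none(raw) == expected, (raw, expected)
```

Keeping the normalization in one small function would let the empty-string and whitespace cases be tested without mocking a metadata backend.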
### Comment 2
<location> `test/plugins/test_musicbrainz.py:1055` </location>
<code_context>
)
- candidates = list(mb.item_candidates(Item(), "hello", "there"))
+ candidates = list(mb.item_candidates(Item(), artist, title))
- assert len(candidates) == 1
</code_context>
<issue_to_address>
**suggestion (testing):** Missing test for error handling when plugin returns unexpected data.
Please add a test where the mocked plugin returns malformed or incomplete data to verify the function handles it without crashing.
</issue_to_address>
### Comment 3
<location> `test/plugins/test_musicbrainz.py:1058-1059` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.
Google's software engineering guidelines say:
"Clear tests are trivially correct upon inspection"
To reach that, avoid complex code in tests:
* loops
* conditionals
Some ways to fix this:
* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.
> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>
### Comment 4
<location> `beets/autotag/match.py:324` </location>
<code_context>
def tag_item(
    item: Item,
    search_artist: str | None = None,
    search_title: str | None = None,
    search_ids: list[str] | None = None,
) -> Proposal:
    """Find metadata for a single track. Return a `Proposal` consisting
    of `TrackMatch` objects.
    `search_artist` and `search_title` may be used to override the item
    metadata in the search query. `search_ids` may be used for restricting the
    search to a list of metadata backend IDs.
    """
    # Holds candidates found so far: keys are MBIDs; values are
    # (distance, TrackInfo) pairs.
    candidates = {}
    rec: Recommendation | None = None
    # First, try matching by the external source ID.
    trackids = search_ids or [t for t in [item.mb_trackid] if t]
    if trackids:
        for trackid in trackids:
            log.debug("Searching for track ID: {}", trackid)
            if info := metadata_plugins.track_for_id(trackid):
                dist = track_distance(item, info, incl_artist=True)
                candidates[info.track_id] = hooks.TrackMatch(dist, info)
                # If this is a good match, then don't keep searching.
                rec = _recommendation(_sort_candidates(candidates.values()))
                if (
                    rec == Recommendation.strong
                    and not config["import"]["timid"]
                ):
                    log.debug("Track ID match.")
                    return Proposal(_sort_candidates(candidates.values()), rec)
    # If we're searching by ID, don't proceed.
    if search_ids:
        if candidates:
            assert rec is not None
            return Proposal(_sort_candidates(candidates.values()), rec)
        else:
            return Proposal([], Recommendation.none)
    # Search terms.
    search_artist = search_artist or item.artist
    search_title = search_title or item.title or item.filepath.stem
    log.debug("Item search terms: {} - {}", search_artist, search_title)
    # Replace empty string with None
    if isinstance(search_artist, str) and search_artist.strip() == "":
        search_artist = None
    if isinstance(search_title, str) and search_title.strip() == "":
        search_title = None
    # Get and evaluate candidate metadata.
    for track_info in metadata_plugins.item_candidates(
        item, search_artist, search_title
    ):
        dist = track_distance(item, track_info, incl_artist=True)
        candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)
    # Sort by distance and return with recommendation.
    log.debug("Found {} candidates.", len(candidates))
    candidates_sorted = _sort_candidates(candidates.values())
    rec = _recommendation(candidates_sorted)
    return Proposal(candidates_sorted, rec)
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
- Low code quality found in tag\_item - 24% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>
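The `swap-if-else-branches` and `remove-unnecessary-else` hints appear to refer to the `if search_ids:` block. Below is a minimal sketch of the guard-clause shape, using a hypothetical function `finish_id_search` and plain lists and strings as stand-ins for the real beets `Proposal` and `Recommendation` types.

```python
from __future__ import annotations


def finish_id_search(
    candidates: list[str], rec: str | None
) -> tuple[list[str], str]:
    """Guard-clause form of the ID-search exit (stand-in for building a Proposal)."""
    # Handle the empty case first and return early ...
    if not candidates:
        return [], "none"
    # ... so the main path needs no `else` and no extra nesting.
    assert rec is not None
    return sorted(candidates), rec


print(finish_id_search([], None))
print(finish_id_search(["b", "a"], "strong"))
```

The behavior is unchanged; only the order of the branches and the nesting depth differ.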
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #6065 +/- ##
==========================================
- Coverage 66.98% 66.93% -0.06%
==========================================
Files 118 118
Lines 18189 18206 +17
Branches 3079 3084 +5
==========================================
+ Hits 12184 12186 +2
- Misses 5345 5359 +14
- Partials 660 661 +1
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `test/plugins/test_musicbrainz.py:1057-1058` </location>
<code_context>
-        assert len(candidates) == 1
-        assert candidates[0].track_id == self.RECORDING["id"]
+        assert len(candidates) == expected_count
+        if expected_count == 1:
+            assert candidates[0].track_id == self.RECORDING["id"]
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for empty string values ("") for artist and title.
Tests currently check for None but not for empty strings. Since empty strings are normalized to None, please add test cases for empty string inputs to verify this behavior.
</issue_to_address>
### Comment 2
<location> `test/plugins/test_musicbrainz.py:1058-1059` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.
Google's software engineering guidelines say:
"Clear tests are trivially correct upon inspection"
To reach that, avoid complex code in tests:
* loops
* conditionals
Some ways to fix this:
* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.
> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>
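One way to drop the `if expected_count == 1` branch is to parametrize the full expected result and compare lists. This is a sketch only: `fake_item_candidates` is a hypothetical stand-in for the mocked plugin call, and `RECORDING_ID` is a made-up value standing in for `self.RECORDING["id"]`.

```python
from __future__ import annotations

RECORDING_ID = "rec-1"  # made-up ID for illustration


def fake_item_candidates(artist: str | None, title: str | None) -> list[str]:
    """Stand-in: return one track ID unless both search terms are missing."""
    return [RECORDING_ID] if (artist or title) else []


# Each case carries the complete expected result, so the check is one assertion.
CASES = [
    ("Artist", "Title", [RECORDING_ID]),
    (None, "Title", [RECORDING_ID]),
    ("Artist", None, [RECORDING_ID]),
    (None, None, []),
]


def check_case(artist: str | None, title: str | None, expected_ids: list[str]) -> None:
    assert fake_item_candidates(artist, title) == expected_ids


for case in CASES:
    check_case(*case)
```

With pytest, `CASES` would feed `@pytest.mark.parametrize`, and the body of the test would be the single list comparison.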
### Comment 3
<location> `beets/autotag/match.py:324` </location>
<code_context>
def tag_item(
    item: Item,
    search_artist: str | None = None,
    search_title: str | None = None,
    search_ids: list[str] | None = None,
) -> Proposal:
    """Find metadata for a single track. Return a `Proposal` consisting
    of `TrackMatch` objects.
    `search_artist` and `search_title` may be used to override the item
    metadata in the search query. `search_ids` may be used for restricting the
    search to a list of metadata backend IDs.
    """
    # Holds candidates found so far: keys are MBIDs; values are
    # (distance, TrackInfo) pairs.
    candidates = {}
    rec: Recommendation | None = None
    # First, try matching by the external source ID.
    trackids = search_ids or [t for t in [item.mb_trackid] if t]
    if trackids:
        for trackid in trackids:
            log.debug("Searching for track ID: {}", trackid)
            if info := metadata_plugins.track_for_id(trackid):
                dist = track_distance(item, info, incl_artist=True)
                candidates[info.track_id] = hooks.TrackMatch(dist, info)
                # If this is a good match, then don't keep searching.
                rec = _recommendation(_sort_candidates(candidates.values()))
                if (
                    rec == Recommendation.strong
                    and not config["import"]["timid"]
                ):
                    log.debug("Track ID match.")
                    return Proposal(_sort_candidates(candidates.values()), rec)
    # If we're searching by ID, don't proceed.
    if search_ids:
        if candidates:
            assert rec is not None
            return Proposal(_sort_candidates(candidates.values()), rec)
        else:
            return Proposal([], Recommendation.none)
    # Search terms.
    search_artist = search_artist or item.artist
    search_title = search_title or item.title or item.filepath.stem
    log.debug("Item search terms: {} - {}", search_artist, search_title)
    # Replace empty string with None
    if isinstance(search_artist, str) and search_artist.strip() == "":
        search_artist = None
    if isinstance(search_title, str) and search_title.strip() == "":
        search_title = None
    # Get and evaluate candidate metadata.
    for track_info in metadata_plugins.item_candidates(
        item, search_artist, search_title
    ):
        dist = track_distance(item, track_info, incl_artist=True)
        candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)
    # Sort by distance and return with recommendation.
    log.debug("Found {} candidates.", len(candidates))
    candidates_sorted = _sort_candidates(candidates.values())
    rec = _recommendation(candidates_sorted)
    return Proposal(candidates_sorted, rec)
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
- Low code quality found in tag\_item - 24% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>
### Comment 4
<location> `beetsplug/discogs.py:190-193` </location>
<code_context>
def candidates(
    self,
    items: Sequence[Item],
    artist: str | None,
    album: str | None,
    va_likely: bool,
) -> Iterable[AlbumInfo]:
    query = ""
    if artist is not None:
        query += artist
    if album is not None:
        query += f" {album}"
    if va_likely:
        query = album or ""
    query = query.strip()
    if not query:
        return []
    return self.get_albums(query)
</code_context>
<issue_to_address>
**suggestion (code-quality):** We've found these issues:
- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
```suggestion
return [] if not query else self.get_albums(query)
```
</issue_to_address>
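To see which inputs end up as an empty query, the query construction from the `candidates` method can be pulled into a standalone function. `build_query` is a hypothetical name used here for illustration; the actual lookup via `get_albums` is left out.

```python
from __future__ import annotations


def build_query(artist: str | None, album: str | None, va_likely: bool) -> str:
    """Mirror the query construction shown in the `candidates` method."""
    query = ""
    if artist is not None:
        query += artist
    if album is not None:
        query += f" {album}"
    if va_likely:
        # For likely various-artists releases, search by album only.
        query = album or ""
    return query.strip()


print(repr(build_query("Artist", "Album", False)))  # 'Artist Album'
print(repr(build_query(None, "Album", False)))      # 'Album'
print(repr(build_query("Artist", None, True)))      # ''
print(repr(build_query(None, None, False)))         # ''
```

The last two calls are the cases the `if not query` guard is meant to catch: no album on a likely various-artists release, and no search terms at all.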
This comment was marked as outdated.
e2683ac to b525d4c
88450c7 to 4f0beba
Hey there - I've reviewed your changes - here's some feedback:
- In `beatport._get_releases` the early `return` produces None (not an iterable), which will break `yield from`; change it to return an empty list or generator to keep the expected iterable type.
- After changing the signature of `metadata_plugins.candidates`/`item_candidates`, double-check that all existing plugins updated their implementations (or add a compatibility shim) so they won’t error when receiving keyword args instead of positional.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In beatport._get_releases the early `return` produces None (not an iterable), which will break `yield from`—change it to return an empty list or generator to keep the expected iterable type.
- After changing the signature of `metadata_plugins.candidates`/`item_candidates`, double-check that all existing plugins updated their implementations (or add a compatibility shim) so they won’t error when receiving keyword args instead of positional.
## Individual Comments
### Comment 1
<location> `beetsplug/beatport.py:463-468` </location>
<code_context>
+
+        # query may be empty strings
+        # We want to skip the lookup in this case.
+        if not query.strip():
+            self._log.debug(
+                "Empty search query after preprocessing, skipping {.data_source}.",
+                self,
+            )
+            return
+
         for beatport_release in self.client.search(query, "release"):
</code_context>
<issue_to_address>
**issue (bug_risk):** Returning None instead of an empty iterable may cause issues for consumers expecting an iterable.
Returning None here may cause runtime errors if the caller uses iteration. Use 'return []' to ensure consistency and prevent such issues.
</issue_to_address>
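Whether the bare `return` is a problem depends on whether the function is a generator, which the snippet above does not show in full. The sketch below contrasts the two cases; all function names are hypothetical.

```python
from typing import Iterator, Optional


def releases_generator(query: str) -> Iterator[str]:
    """Generator: a bare `return` simply ends iteration."""
    if not query.strip():
        return
    yield f"release for {query}"


def releases_plain(query: str) -> Optional[list]:
    """Regular function: a bare `return` hands None to the caller."""
    if not query.strip():
        return
    return [f"release for {query}"]


def consume(source) -> list:
    """Collect results the way a `yield from` caller would."""

    def inner():
        yield from source

    return list(inner())


print(consume(releases_generator(" ")))  # []
try:
    consume(releases_plain(" "))
except TypeError as exc:
    print("TypeError:", exc)
```

If `_get_releases` contains a `yield` further down (the loop after the guard suggests it may), the bare `return` already behaves like an empty iterable; `return []` matters only for a regular function.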
### Comment 2
<location> `beets/autotag/match.py:246` </location>
<code_context>
def tag_album(
    items,
    search_artist: str | None = None,
    search_album: str | None = None,
    search_ids: list[str] = [],
) -> tuple[str, str, Proposal]:
    """Return a tuple of the current artist name, the current album
    name, and a `Proposal` containing `AlbumMatch` candidates.
    The artist and album are the most common values of these fields
    among `items`.
    The `AlbumMatch` objects are generated by searching the metadata
    backends. By default, the metadata of the items is used for the
    search. This can be customized by setting the parameters.
    `search_ids` is a list of metadata backend IDs: if specified,
    it will restrict the candidates to those IDs, ignoring
    `search_artist` and `search album`. The `mapping` field of the
    album has the matched `items` as keys.
    The recommendation is calculated from the match quality of the
    candidates.
    """
    # Get current metadata.
    likelies, consensus = get_most_common_tags(items)
    cur_artist: str = likelies["artist"]
    cur_album: str = likelies["album"]
    log.debug("Tagging {} - {}", cur_artist, cur_album)
    # The output result, keys are the MB album ID.
    candidates: dict[Any, AlbumMatch] = {}
    # Search by explicit ID.
    if search_ids:
        for search_id in search_ids:
            log.debug("Searching for album ID: {}", search_id)
            if info := metadata_plugins.album_for_id(search_id):
                _add_candidate(items, candidates, info)
    # Use existing metadata or text search.
    else:
        # Try search based on current ID.
        if info := match_by_id(items):
            _add_candidate(items, candidates, info)
            rec = _recommendation(list(candidates.values()))
            log.debug("Album ID match recommendation is {}", rec)
            if candidates and not config["import"]["timid"]:
                # If we have a very good MBID match, return immediately.
                # Otherwise, this match will compete against metadata-based
                # matches.
                if rec == Recommendation.strong:
                    log.debug("ID match.")
                    return (
                        cur_artist,
                        cur_album,
                        Proposal(list(candidates.values()), rec),
                    )
        # Search terms.
        _search_artist, _search_album = _parse_search_terms(
            (search_artist, cur_artist),
            (search_album, cur_album),
        )
        log.debug("Search terms: {} - {}", _search_artist, _search_album)
        # Is this album likely to be a "various artist" release?
        va_likely = (
            (not consensus["artist"])
            or (_search_artist.lower() in VA_ARTISTS)
            or any(item.comp for item in items)
        )
        log.debug("Album might be VA: {}", va_likely)
        # Get the results from the data sources.
        for matched_candidate in metadata_plugins.candidates(
            items, _search_artist, _search_album, va_likely
        ):
            _add_candidate(items, candidates, matched_candidate)
    log.debug("Evaluating {} candidates.", len(candidates))
    # Sort and get the recommendation.
    candidates_sorted = _sort_candidates(candidates.values())
    rec = _recommendation(candidates_sorted)
    return cur_artist, cur_album, Proposal(candidates_sorted, rec)
</code_context>
<issue_to_address>
**issue (code-quality):** Replace mutable default arguments with None ([`default-mutable-arg`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/default-mutable-arg/))
</issue_to_address>
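The rule exists because a default value is evaluated once, when the function is defined, so a mutable default is shared between calls. The following sketch uses hypothetical functions, not beets code.

```python
from __future__ import annotations


def collect_shared(value: str, bucket: list[str] = []) -> list[str]:
    """Pitfall: the default list is created once and shared across calls."""
    bucket.append(value)
    return bucket


def collect_fresh(value: str, bucket: list[str] | None = None) -> list[str]:
    """Fix: default to None and create a new list on each call."""
    if bucket is None:
        bucket = []
    bucket.append(value)
    return bucket


first = collect_shared("a")
second = collect_shared("b")
print(first is second, second)  # True ['a', 'b']
print(collect_fresh("a"), collect_fresh("b"))  # ['a'] ['b']
```

As shown above, `tag_album` only reads `search_ids`, so the shared default is not mutated there; switching to `None` mainly guards against later edits that append to it.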
### Comment 3
<location> `beets/autotag/match.py:334` </location>
<code_context>
def tag_item(
    item: Item,
    search_artist: str | None = None,
    search_title: str | None = None,
    search_ids: list[str] | None = None,
) -> Proposal:
    """Find metadata for a single track. Return a `Proposal` consisting
    of `TrackMatch` objects.
    `search_artist` and `search_title` may be used to override the item
    metadata in the search query. `search_ids` may be used for restricting the
    search to a list of metadata backend IDs.
    """
    # Holds candidates found so far: keys are MBIDs; values are
    # (distance, TrackInfo) pairs.
    candidates = {}
    rec: Recommendation | None = None
    # First, try matching by the external source ID.
    trackids = search_ids or [t for t in [item.mb_trackid] if t]
    if trackids:
        for trackid in trackids:
            log.debug("Searching for track ID: {}", trackid)
            if info := metadata_plugins.track_for_id(trackid):
                dist = track_distance(item, info, incl_artist=True)
                candidates[info.track_id] = hooks.TrackMatch(dist, info)
                # If this is a good match, then don't keep searching.
                rec = _recommendation(_sort_candidates(candidates.values()))
                if (
                    rec == Recommendation.strong
                    and not config["import"]["timid"]
                ):
                    log.debug("Track ID match.")
                    return Proposal(_sort_candidates(candidates.values()), rec)
    # If we're searching by ID, don't proceed.
    if search_ids:
        if candidates:
            assert rec is not None
            return Proposal(_sort_candidates(candidates.values()), rec)
        else:
            return Proposal([], Recommendation.none)
    # Search terms.
    _search_artist, _search_title = _parse_search_terms(
        (search_artist, item.artist),
        (search_title, item.title),
    )
    log.debug("Item search terms: {} - {}", _search_artist, _search_title)
    # Get and evaluate candidate metadata.
    for track_info in metadata_plugins.item_candidates(
        item,
        _search_artist,
        _search_title,
    ):
        dist = track_distance(item, track_info, incl_artist=True)
        candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)
    # Sort by distance and return with recommendation.
    log.debug("Found {} candidates.", len(candidates))
    candidates_sorted = _sort_candidates(candidates.values())
    rec = _recommendation(candidates_sorted)
    return Proposal(candidates_sorted, rec)
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
</issue_to_address>
# query may be empty strings
# We want to skip the lookup in this case.
if not query.strip():
I don't like this logic being duplicated in every data source. Can we not check this in `plugins.item_candidates`?
I think we can't move this logic easily. Some metadata plugins do not require artist or title/album at all. We can't move the check one layer up, as this would break plugins that depend only on the item.
Seems like this logic is present in all internal plugins. I also don't see a use case in `beetcamp` where both `artist` and `title` / `album` are missing - I think we can safely handle this one layer up and remove the duplication.
When does `fromfilename` plugin intercept these parameters and insert data from the filename?
Have a look at chroma.
Completely forgot about this one 😞
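The trade-off discussed in this thread can be sketched as follows. All class and function names are hypothetical and only illustrate the argument; they are not the beets plugin API. A text-search source needs a query, while a fingerprint-based source (chroma is the example named above) works from the item alone, so a shared check one layer up would skip it.

```python
from __future__ import annotations


class TextSearchSource:
    """Needs artist or title to build a query."""

    def item_candidates(
        self, item: str, artist: str | None, title: str | None
    ) -> list[str]:
        query = f"{artist or ''} {title or ''}".strip()
        return [f"text:{query}"] if query else []


class FingerprintSource:
    """Ignores artist/title and looks the item up directly."""

    def item_candidates(
        self, item: str, artist: str | None, title: str | None
    ) -> list[str]:
        return [f"fingerprint:{item}"]


def dispatch(sources, item, artist, title, check_upfront: bool) -> list[str]:
    """Collect candidates; optionally reject empty terms before calling sources."""
    if check_upfront and not (artist or title):
        return []  # fingerprint-based sources never get a chance to run
    results: list[str] = []
    for source in sources:
        results.extend(source.item_candidates(item, artist, title))
    return results


sources = [TextSearchSource(), FingerprintSource()]
print(dispatch(sources, "song.mp3", None, None, check_upfront=True))   # []
print(dispatch(sources, "song.mp3", None, None, check_upfront=False))  # ['fingerprint:song.mp3']
```

Under these assumptions, keeping the empty-query check inside each text-search source costs some duplication but leaves item-only sources unaffected.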
4f0beba to a280237
This reverts commit a27cf64.
Description
It is possible for a metadata lookup to be performed with an empty string for both `artist` and `title`/`album`. This PR adds handling for this edge case to the metadata lookup of `musicbrainz`, `spotify`, `discogs` and `beatport`.
It seems the issue was not caught earlier because the type hints were not propagated correctly in the `metadata_plugins.item_candidates` function.
closes #6060
#5965 might have helped here too