
@mohamedelabbas1996
Contributor

Summary

This PR implements Class Masking as part of the post-processing framework.

List of Changes

TBD

Related Issues

TBD

Detailed Description

TBD

How to Test the Changes

TBD

Screenshots

TBD

Deployment Notes

TBD

Checklist

  • I have tested these changes appropriately.
  • I have added and/or modified relevant tests.
  • I updated relevant documentation or comments.
  • I have verified that this PR follows the project's coding standards.
  • Any dependent changes have already been merged to main.

…en creating a terminal classification with the rolled up taxon
@netlify

netlify bot commented Oct 14, 2025

Deploy Preview for antenna-preview canceled.

🔨 Latest commit: 1b8700e
🔍 Latest deploy log: https://app.netlify.com/projects/antenna-preview/deploys/68f0aec81240f300089fbf9f

mohamedelabbas1996 force-pushed the feat/postprocessing-class-masking branch from 0b77504 to 88ffba8 on October 14, 2025 19:08
mihow requested a review from Copilot on October 15, 2025 03:20

Copilot AI left a comment


Pull Request Overview

Implements class masking as a post-processing task that recalculates classifications by masking out classes not present in a provided taxa list and updates occurrences accordingly.

  • Adds ClassMaskingTask to the post-processing framework and registers it.
  • Filters and recalculates logits/scores per taxa list, creates new terminal classifications, and updates occurrences.
  • Minor logging update in job runner.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

  • ami/ml/post_processing/class_masking.py: New class masking task and supporting functions to filter classifications by taxa list and recompute softmax.
  • ami/ml/post_processing/__init__.py: Registers the new class_masking task module.
  • ami/jobs/models.py: Improves log line to print only the task config for post-processing.


Comment on lines +151 to +156
top_index = scores.index(max(scores))
top_taxon = category_map_with_taxa[top_index][
    "taxon"
] # @TODO: This doesn't work if the taxon has never been classified
print("Top taxon: ", category_map_with_taxa[top_index]) # @TODO: REMOVE
print("Top index: ", top_index) # @TODO: REMOVE

Copilot AI Oct 15, 2025

Argmax is computed across all categories, so an excluded class can still be selected as the top taxon. If all categories are excluded, the current approach will select an arbitrary class. Restrict the selection to indices whose taxa are in taxa_in_list and handle the 'all-excluded' case gracefully (skip creating a new classification or mark appropriately). For example:

  • Build allowed_indices = [i for i, c in enumerate(category_map_with_taxa) if c['taxon'] in taxa_in_list]
  • Mask logits for non-allowed indices with -np.inf, recompute softmax over the allowed set, and if allowed_indices is empty, skip this classification.
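The masking described above can be sketched in plain Python so it runs without NumPy. This is a minimal illustration under stated assumptions, not the PR's code: masked_softmax and its parameters are hypothetical names, and it assumes each category index maps to exactly one taxon.

```python
import math
from typing import AbstractSet, Hashable, List, Optional, Sequence, Tuple


def masked_softmax(
    logits: Sequence[float],
    category_taxa: Sequence[Hashable],
    allowed_taxa: AbstractSet[Hashable],
) -> Tuple[List[float], Optional[int]]:
    """Recompute softmax over allowed categories only (hypothetical helper).

    category_taxa[i] is the taxon for logit i. Excluded categories get a score
    of 0.0, which is what masking their logits with -inf would produce.
    Returns (scores, top_index); top_index is None when every category is excluded.
    """
    allowed_indices = [i for i, taxon in enumerate(category_taxa) if taxon in allowed_taxa]
    scores = [0.0] * len(logits)
    if not allowed_indices:
        return scores, None  # caller should skip creating a new classification
    # Subtract the largest allowed logit before exponentiating for numerical stability.
    peak = max(logits[i] for i in allowed_indices)
    exps = {i: math.exp(logits[i] - peak) for i in allowed_indices}
    total = sum(exps.values())
    for i, e in exps.items():
        scores[i] = e / total
    # Argmax over allowed indices only, so an excluded class can never be selected.
    top_index = max(allowed_indices, key=lambda i: scores[i])
    return scores, top_index
```

With NumPy, the same effect comes from np.where(mask, logits, -np.inf) followed by a max-subtracted softmax. Returning None for the all-excluded case makes the caller decide to skip that classification instead of picking an arbitrary class or raising.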

Copilot uses AI. Check for mistakes.
Comment on lines +72 to +74
logger.info(f"Found {len(classifications)} terminal classifications with scores to update.")

if not classifications:

Copilot AI Oct 15, 2025

len(classifications) evaluates the whole QuerySet, fetching every matching row just to produce the number for the log line, and if not classifications then depends on that cached result. If you only need the count here, call count() once and reuse it for the zero check (or use exists() if you don't need the exact number), e.g., count = classifications.count(); if count == 0: ...

Suggested change
- logger.info(f"Found {len(classifications)} terminal classifications with scores to update.")
- if not classifications:
+ count = classifications.count()
+ logger.info(f"Found {count} terminal classifications with scores to update.")
+ if count == 0:

scores, logits = classification.scores, classification.logits
# Set scores and logits to zero if they are not in the filtered category indices

import numpy as np

Copilot AI Oct 15, 2025

Importing inside the processing loop adds overhead each iteration. Move these imports to the module top and prefer using np.exp and np.sum directly for consistency, e.g., import numpy as np at the top and use np.exp / np.sum.

Comment on lines +138 to +139
from numpy import exp
from numpy import sum as np_sum

Copilot AI Oct 15, 2025

Importing inside the processing loop adds overhead each iteration. Move these imports to the module top and prefer using np.exp and np.sum directly for consistency, e.g., import numpy as np at the top and use np.exp / np.sum.

"taxon"
] # @TODO: This doesn't work if the taxon has never been classified
print("Top taxon: ", category_map_with_taxa[top_index]) # @TODO: REMOVE
print("Top index: ", top_index) # @TODO: REMOVE

Copilot AI Oct 15, 2025

Avoid print statements in production code; use logger.debug(...) to keep logs consistent and configurable.

Suggested change
- print("Top index: ", top_index) # @TODO: REMOVE
+ logger.debug(f"Top index: {top_index}")
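A short sketch of the logger-based replacement, assuming a module-level logger created with logging.getLogger(__name__) (the usual convention; the PR's module may already define one). log_top_prediction is a hypothetical helper. Passing values as %-style arguments instead of an f-string defers the string formatting until a handler actually emits the DEBUG record.

```python
import logging

# Module-level logger, named after the module so output can be filtered per module.
logger = logging.getLogger(__name__)


def log_top_prediction(top_index: int, top_category: dict) -> None:
    # The %s placeholders are only interpolated if DEBUG is enabled for this logger.
    logger.debug("Top taxon: %s", top_category)
    logger.debug("Top index: %s", top_index)
```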

assert new_classification.detection.occurrence is not None
occurrences_to_update.add(new_classification.detection.occurrence)

logging.info(

Copilot AI Oct 15, 2025

This uses the root logging module instead of the module logger or the provided task_logger, making log output inconsistent. Replace with logger.info(...) or task_logger.info(...).

Suggested change
- logging.info(
+ task_logger.info(
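A sketch of routing messages through an injected logger, assuming the task receives a logging.Logger from the job runner. run_masking and its parameters are hypothetical names, not the PR's ClassMaskingTask interface.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def run_masking(occurrence_count: int, task_logger: Optional[logging.Logger] = None) -> None:
    # Prefer the logger supplied by the job runner so messages land in the job log;
    # fall back to the module logger rather than the root logging.info() function.
    log = task_logger if task_logger is not None else logger
    log.info("Updating the determinations for %d occurrences", occurrence_count)
```

Keeping the fallback on the module logger means the function still logs sensibly when called outside a job, while job runs keep all task output in one place.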

Comment on lines +21 to +27
# Get the classifications for the occurrence in the collection
classifications = Classification.objects.filter(
    detection__occurrence=occurrence,
    terminal=True,
    algorithm=algorithm,
    scores__isnull=False,
).distinct()

Copilot AI Oct 15, 2025

You validate that logits is a list later and raise if not, but the query doesn't exclude classifications with null logits. Add logits__isnull=False to avoid unnecessary processing failures.

terminal=True,
# algorithm__task_type="classification",
algorithm=algorithm,
scores__isnull=False,

Copilot AI Oct 15, 2025

Mirror the logits presence guard here as well to avoid raising later when logits is missing: add logits__isnull=False to the filter.

Suggested change
- scores__isnull=False,
+ scores__isnull=False,
+ logits__isnull=False,

updated_at=timestamp,
)
if new_classification.taxon is None:
    raise (ValueError("Classification isn't registered yet. Aborting")) # @TODO remove or fail gracefully

Copilot AI Oct 15, 2025

The error message is unclear for the actual failure mode. Clarify to something actionable, e.g., raise ValueError('Unable to determine top taxon after class masking (no allowed classes). Aborting.').

Suggested change
- raise (ValueError("Classification isn't registered yet. Aborting")) # @TODO remove or fail gracefully
+ raise ValueError("Unable to determine top taxon after class masking (no allowed classes). Aborting.")

Comment on lines +196 to +205
if classifications_to_update:
    logger.info(f"Bulk updating {len(classifications_to_update)} existing classifications")
    Classification.objects.bulk_update(classifications_to_update, ["terminal", "updated_at"])
    logger.info(f"Updated {len(classifications_to_update)} existing classifications")

if classifications_to_add:
    # Bulk create the new classifications
    logger.info(f"Bulk creating {len(classifications_to_add)} new classifications")
    Classification.objects.bulk_create(classifications_to_add)
    logger.info(f"Added {len(classifications_to_add)} new classifications")

Copilot AI Oct 15, 2025

Consider wrapping the bulk_update, the bulk_create, and the subsequent occurrence updates in a single transaction so the changes are atomic and an error partway through (e.g., during occurrence updates) does not leave partial state. For example: with transaction.atomic(): run bulk_update, then bulk_create, then the occurrence saves.

# Update the occurrence determinations
logger.info(f"Updating the determinations for {len(occurrences_to_update)} occurrences")
for occurrence in occurrences_to_update:
    occurrence.save(update_determination=True)

Collaborator

@mohamedelabbas1996 here is how I updated all of the determinations previously

Base automatically changed from feat/postprocessing-framework to main October 16, 2025 06:06

Labels

None yet

Projects

None yet


3 participants