96 changes: 96 additions & 0 deletions machine_learning/k_medoids.py
@@ -0,0 +1,96 @@
"""
README, Author - Rohit Kumar Bansal (mailto:rohitbansal.dev@gmail.com)

Requirements:
  - numpy
  - matplotlib, scikit-learn
Python:
  - 3.6+
Inputs:
  - X: 2D numpy array of features
  - k: number of clusters
Usage:
  1. Define k and X
  2. Create initial medoids:
        initial_medoids = get_initial_medoids(X, k, seed=0)
  3. Run kmedoids:
        medoids, cluster_assignment = kmedoids(
            X, k, initial_medoids, maxiter=100, verbose=True
        )
"""

import numpy as np
from matplotlib import pyplot as plt
from sklearn.metrics import pairwise_distances

Check failure on line 24 in machine_learning/k_medoids.py

GitHub Actions / ruff

Ruff (I001)

machine_learning/k_medoids.py:22:1: I001 Import block is un-sorted or un-formatted

def get_initial_medoids(data, k, seed=None):

As there is no test file in this pull request nor any test function or class in the file machine_learning/k_medoids.py, please provide doctest for the function get_initial_medoids

Please provide return type hint for the function: get_initial_medoids. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: data

Please provide descriptive name for the parameter: k

Please provide type hint for the parameter: k

Please provide type hint for the parameter: seed

    rng = np.random.default_rng(seed)
    n = data.shape[0]
    indices = rng.choice(n, k, replace=False)
    medoids = data[indices, :]
    return medoids
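
The review above asks for type hints, a descriptive name for `k`, and a doctest. One possible shape for that is sketched below. The parameter name `num_clusters` is an assumption made for illustration and is not part of the submitted code; the doctest checks only the output shape so that it does not depend on which rows the seed selects.

```python
from typing import Optional

import numpy as np


def get_initial_medoids(
    data: np.ndarray, num_clusters: int, seed: Optional[int] = None
) -> np.ndarray:
    """
    Pick num_clusters distinct rows of data as the starting medoids.

    >>> points = np.arange(12.0).reshape(6, 2)
    >>> get_initial_medoids(points, 3, seed=0).shape
    (3, 2)
    """
    rng = np.random.default_rng(seed)
    # replace=False guarantees that no row is chosen twice
    indices = rng.choice(data.shape[0], num_clusters, replace=False)
    return data[indices, :]
```

Passing the same `seed` twice returns the same rows, which keeps runs reproducible.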

def assign_clusters(data, medoids):

As there is no test file in this pull request nor any test function or class in the file machine_learning/k_medoids.py, please provide doctest for the function assign_clusters

Please provide return type hint for the function: assign_clusters. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: data

Please provide type hint for the parameter: medoids

    distances = pairwise_distances(data, medoids, metric='euclidean')
    cluster_assignment = np.argmin(distances, axis=1)
    return cluster_assignment
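
A typed and doctested variant of the assignment step could look like the sketch below. As an assumption, it swaps `sklearn.metrics.pairwise_distances` for plain NumPy broadcasting so that the example runs with NumPy alone; the submitted code uses scikit-learn for this step.

```python
import numpy as np


def assign_clusters(data: np.ndarray, medoids: np.ndarray) -> np.ndarray:
    """
    Return, for every row of data, the index of its nearest medoid.

    >>> data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
    >>> medoids = np.array([[0.0, 0.0], [10.0, 10.0]])
    >>> assign_clusters(data, medoids).tolist()
    [0, 0, 1]
    """
    # Broadcasting: (n, 1, d) - (1, k, d) gives an (n, k, d) difference array
    diffs = data[:, np.newaxis, :] - medoids[np.newaxis, :, :]
    # Euclidean norm over the feature axis gives an (n, k) distance matrix
    distances = np.linalg.norm(diffs, axis=2)
    return np.argmin(distances, axis=1)
```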

def revise_medoids(data, k, cluster_assignment):

As there is no test file in this pull request nor any test function or class in the file machine_learning/k_medoids.py, please provide doctest for the function revise_medoids

Please provide return type hint for the function: revise_medoids. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: data

Please provide descriptive name for the parameter: k

Please provide type hint for the parameter: k

Please provide type hint for the parameter: cluster_assignment

    new_medoids = []
    for i in range(k):
        members = data[cluster_assignment == i]
        if len(members) == 0:
            continue
        # Compute total distance from each point to all others in cluster
        total_distances = np.sum(pairwise_distances(members, members), axis=1)
        medoid_index = np.argmin(total_distances)
        new_medoids.append(members[medoid_index])
    return np.array(new_medoids)
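
Note that the loop above skips empty clusters, so the returned array can have fewer than `k` rows and later medoids shift to lower indices. One way to keep the indices stable is sketched below. It assumes an extra `previous_medoids` argument and the name `num_clusters`, both hypothetical and not in the submitted signature, and it uses NumPy broadcasting in place of `pairwise_distances`.

```python
import numpy as np


def revise_medoids(
    data: np.ndarray,
    num_clusters: int,
    cluster_assignment: np.ndarray,
    previous_medoids: np.ndarray,
) -> np.ndarray:
    """
    Recompute each medoid; an empty cluster keeps its previous medoid.

    >>> data = np.array([[0.0], [1.0], [2.0], [10.0]])
    >>> labels = np.array([0, 0, 0, 2])
    >>> old = np.array([[0.0], [5.0], [10.0]])
    >>> revise_medoids(data, 3, labels, old).tolist()
    [[1.0], [5.0], [10.0]]
    """
    new_medoids = previous_medoids.astype(float)  # astype returns a copy
    for i in range(num_clusters):
        members = data[cluster_assignment == i]
        if len(members) == 0:
            continue  # row i stays equal to the previous medoid
        # Total distance from each member to every other member of the cluster
        diffs = members[:, np.newaxis, :] - members[np.newaxis, :, :]
        total_distances = np.linalg.norm(diffs, axis=2).sum(axis=1)
        new_medoids[i] = members[np.argmin(total_distances)]
    return new_medoids
```

With this shape the result always has `num_clusters` rows, so `medoids[i]` in later steps refers to cluster `i`.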

def compute_heterogeneity(data, k, medoids, cluster_assignment):

As there is no test file in this pull request nor any test function or class in the file machine_learning/k_medoids.py, please provide doctest for the function compute_heterogeneity

Please provide return type hint for the function: compute_heterogeneity. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: data

Please provide descriptive name for the parameter: k

Please provide type hint for the parameter: k

Please provide type hint for the parameter: medoids

Please provide type hint for the parameter: cluster_assignment

    heterogeneity = 0.0
    for i in range(k):
        members = data[cluster_assignment == i]
        if len(members) == 0:
            continue
        distances = pairwise_distances(members, [medoids[i]])
        heterogeneity += np.sum(distances**2)
    return heterogeneity
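
The quantity computed above is the sum of squared Euclidean distances from each point to the medoid of its cluster. A vectorized sketch with type hints and a doctest follows. It assumes that `medoids` has one row per cluster index, and it drops the `k` parameter, which is a change from the submitted signature made only for illustration.

```python
import numpy as np


def compute_heterogeneity(
    data: np.ndarray, medoids: np.ndarray, cluster_assignment: np.ndarray
) -> float:
    """
    Sum of squared distances from each point to its assigned medoid.

    >>> data = np.array([[0.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
    >>> medoids = np.array([[0.0, 0.0], [5.0, 5.0]])
    >>> compute_heterogeneity(data, medoids, np.array([0, 0, 1]))
    4.0
    """
    # Indexing medoids by the label array lines up one medoid row per data row
    assigned_medoids = medoids[cluster_assignment]
    return float(np.sum((data - assigned_medoids) ** 2))
```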

def kmedoids(data, k, initial_medoids, maxiter=100, verbose=False):

As there is no test file in this pull request nor any test function or class in the file machine_learning/k_medoids.py, please provide doctest for the function kmedoids

Please provide return type hint for the function: kmedoids. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: data

Please provide descriptive name for the parameter: k

Please provide type hint for the parameter: k

Please provide type hint for the parameter: initial_medoids

Please provide type hint for the parameter: maxiter

Please provide type hint for the parameter: verbose

    medoids = initial_medoids.copy()
    prev_assignment = None
    for itr in range(maxiter):
        cluster_assignment = assign_clusters(data, medoids)
        medoids = revise_medoids(data, k, cluster_assignment)

        if prev_assignment is not None and (prev_assignment == cluster_assignment).all():

Check failure on line 67 in machine_learning/k_medoids.py

GitHub Actions / ruff

Ruff (E501)

machine_learning/k_medoids.py:67:89: E501 Line too long (89 > 88)
            break

        if verbose and prev_assignment is not None:
            changed = np.sum(prev_assignment != cluster_assignment)
            print(f"Iteration {itr}: {changed} points changed clusters")

        prev_assignment = cluster_assignment.copy()

    return medoids, cluster_assignment
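
For reference, the whole loop can be sketched as a single self-contained function that also stays under the 88-character limit flagged by E501. This is a minimal sketch under stated assumptions, not the submitted implementation: the helper `_distances` and the parameter name `max_iterations` are hypothetical, the `k` and `verbose` parameters are omitted, and empty clusters keep their previous medoid.

```python
from typing import Tuple

import numpy as np


def _distances(first: np.ndarray, second: np.ndarray) -> np.ndarray:
    """Euclidean distances between every row of first and every row of second."""
    diffs = first[:, np.newaxis, :] - second[np.newaxis, :, :]
    return np.linalg.norm(diffs, axis=2)


def kmedoids(
    data: np.ndarray, initial_medoids: np.ndarray, max_iterations: int = 100
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Alternate between revising medoids and reassigning points.

    >>> data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    ...                  [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
    >>> medoids, labels = kmedoids(data, data[[0, 3]])
    >>> medoids.tolist()
    [[0.0, 0.0], [10.0, 10.0]]
    >>> labels.tolist()
    [0, 0, 0, 1, 1, 1]
    """
    medoids = initial_medoids.astype(float)  # copy, so the input is untouched
    assignment = np.argmin(_distances(data, medoids), axis=1)
    for _ in range(max_iterations):
        for i in range(len(medoids)):
            members = data[assignment == i]
            if len(members) > 0:
                totals = _distances(members, members).sum(axis=1)
                medoids[i] = members[np.argmin(totals)]
        new_assignment = np.argmin(_distances(data, medoids), axis=1)
        if np.array_equal(new_assignment, assignment):
            break  # no point changed cluster, so the medoids are stable
        assignment = new_assignment
    return medoids, assignment
```

Computing the first assignment before the loop means the function returns a valid labelling even when `max_iterations` is 0.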

# Optional plotting
def plot_clusters(data, medoids, cluster_assignment):

As there is no test file in this pull request nor any test function or class in the file machine_learning/k_medoids.py, please provide doctest for the function plot_clusters

Please provide return type hint for the function: plot_clusters. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: data

Please provide type hint for the parameter: medoids

Please provide type hint for the parameter: cluster_assignment

    ax = plt.axes(projection='3d')
    ax.scatter(data[:,0], data[:,1], data[:,2], c=cluster_assignment, cmap='viridis')
    ax.scatter(medoids[:,0], medoids[:,1], medoids[:,2], c='red', s=100, marker='x')
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_zlabel("Z")
    ax.set_title("3D K-Medoids Clustering")
    plt.show()

# Optional test
if __name__ == "__main__":
    from sklearn import datasets
    X = datasets.load_iris()['data']
    k = 3
    medoids = get_initial_medoids(X, k, seed=0)
    medoids, clusters = kmedoids(X, k, medoids, maxiter=50, verbose=True)
    plot_clusters(X, medoids, clusters)