clustering

Functions

calc_perm_variance(pca, embeddings_df[, ...])

Calculates the variance explained by a PCA fitted to the permuted data.

get_optimal_n_components(embeddings[, ...])

Calculates the optimal number of principal components to keep when reducing dimensionality.
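
This page does not say how the optimal number of components is chosen. One common approach, sketched here purely as an assumption, compares the variance explained by each component of the real data with the variance explained after shuffling every column independently, and keeps the leading components that beat the shuffled baseline. The helper names, the 95th-percentile threshold, and the NumPy-only PCA are all illustrative and are not this module's API.

```python
import numpy as np


def explained_variance_ratio(data):
    """Fraction of total variance captured by each principal component."""
    centered = data - data.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variances along the principal axes.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


def n_components_above_chance(data, n_permutations=20, seed=0):
    """Hypothetical helper: count leading components that explain more
    variance than the 95th percentile of column-shuffled data."""
    rng = np.random.default_rng(seed)
    observed = explained_variance_ratio(data)
    # Shuffling each column independently destroys correlations between
    # features while preserving each feature's own distribution.
    permuted = np.array([
        explained_variance_ratio(rng.permuted(data, axis=0))
        for _ in range(n_permutations)
    ])
    threshold = np.percentile(permuted, 95, axis=0)
    above = observed > threshold
    if above.all():
        return len(above)
    # Index of the first component that fails to beat the baseline.
    return int(np.argmin(above))
```

Stopping at the first component that fails the threshold keeps the retained set contiguous, which is what a PCA truncation needs.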

hdbscan_clustering(reduced[, ...])

Uses HDBSCAN to calculate clusters from the reduced data.

kmeans_clustering(reduced[, num_clusters, ...])

Uses k-means to calculate clusters from the reduced vectors.
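
As a rough illustration of k-means on reduced vectors, the following sketch uses scikit-learn's KMeans. It assumes scikit-learn is the backend, which this page does not confirm. The function name, the `seed` argument, and the return values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_labels(reduced, num_clusters=3, seed=0):
    """Hypothetical helper: cluster reduced vectors with k-means."""
    model = KMeans(n_clusters=num_clusters, n_init=10, random_state=seed)
    # fit_predict returns one integer cluster label per row of `reduced`.
    labels = model.fit_predict(reduced)
    return labels, model.cluster_centers_
```

Fixing `random_state` makes the assignment reproducible, which helps when cluster labels are compared across runs.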

reduce_dimensions_pca(embeddings[, dimensions])

Reduces the number of dimensions using PCA.
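
A minimal sketch of PCA-based reduction, assuming scikit-learn's PCA is an acceptable stand-in for whatever this function uses internally. The name `pca_reduce` and its return values are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA


def pca_reduce(embeddings, dimensions=2):
    """Hypothetical helper: project embeddings onto the top principal components."""
    pca = PCA(n_components=dimensions)
    # fit_transform centers the data and returns the projected coordinates.
    reduced = pca.fit_transform(embeddings)
    return reduced, pca
```

Returning the fitted PCA object alongside the projection lets the caller inspect `explained_variance_ratio_` or transform new embeddings later.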

reduce_dimensions_umap(embeddings[, ...])

Uses UMAP to reduce the dimensionality of the embeddings.

shuffle(df)

Shuffles the data in a pandas DataFrame by column or by row.
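
One plausible reading of this description, stated here as an assumption, is that values are permuted independently within each column (or within each row), so that relationships between features are broken while each feature keeps its own set of values. The sketch below uses NumPy's `Generator.permuted`. The function name and the `axis` and `seed` parameters are hypothetical.

```python
import numpy as np
import pandas as pd


def shuffle_values(df, axis=0, seed=None):
    """Hypothetical helper: permute values independently along one axis.

    axis=0 shuffles the values within each column;
    axis=1 shuffles the values within each row.
    """
    rng = np.random.default_rng(seed)
    values = df.to_numpy(copy=True)
    # permuted() shuffles every slice along `axis` independently of the others.
    shuffled = rng.permuted(values, axis=axis)
    return pd.DataFrame(shuffled, index=df.index, columns=df.columns)
```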

single_sample_t_test(sample[, population_stat])

Runs a one-sample t-test to check whether the sample mean differs significantly from the population mean.
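
For orientation, a one-sample t-test can be sketched with SciPy's `ttest_1samp`. Whether this module wraps SciPy is an assumption. The name `one_sample_t_test` and the `population_mean` and `alpha` parameters are hypothetical and do not mirror the signature listed above.

```python
from scipy import stats


def one_sample_t_test(sample, population_mean=0.0, alpha=0.05):
    """Hypothetical helper: two-sided one-sample t-test against a known mean."""
    result = stats.ttest_1samp(sample, popmean=population_mean)
    statistic = float(result.statistic)
    p_value = float(result.pvalue)
    # The difference is called significant when the p-value falls below alpha.
    return statistic, p_value, p_value < alpha
```

The test statistic is the difference between the sample mean and the population mean, divided by the standard error of the sample mean.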