Module VITAE.utils
Functions
def reset_random_seeds(seed)-
Reset the random seeds for reproducibility.
def get_embedding(z, dimred='umap', **kwargs)-
Get low-dimensional embeddings for visualizations.
Parameters
z:np.array- [N, d] The latent variables.
dimred:str, optional- 'pca', 'tsne', or 'umap'.
**kwargs:- Extra key-value arguments for dimension reduction algorithms.
Returns
embed:np.array- [N, 2] The latent variables after dimension reduction.
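As an illustration, the 'pca' branch can be sketched with plain NumPy; `pca_embedding` below is a hypothetical stand-in, not the library function, which dispatches to full PCA, t-SNE, or UMAP implementations.

```python
import numpy as np

def pca_embedding(z, n_components=2):
    # Hypothetical stand-in for get_embedding(z, dimred='pca'):
    # project the centered latent variables onto the top principal
    # components obtained via SVD.
    z_centered = z - z.mean(axis=0)
    U, S, Vt = np.linalg.svd(z_centered, full_matrices=False)
    return z_centered @ Vt[:n_components].T   # [N, n_components]

rng = np.random.default_rng(0)
z = rng.normal(size=(100, 8))   # [N, d] latent variables
embed = pca_embedding(z)
print(embed.shape)              # (100, 2)
```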
def get_igraph(z, random_state=0)-
Get igraph for running Leidenalg clustering.
Parameters
z:np.array- [N, d] The latent variables.
random_state:int, optional- The random state.
Returns
g:igraph- The igraph object of connectivities.
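Conceptually, the connectivity graph is a nearest-neighbour graph over the latent variables. A NumPy-only sketch of that idea follows; the real function returns an igraph.Graph, and `knn_adjacency` with k=5 is purely illustrative.

```python
import numpy as np

def knn_adjacency(z, k=10):
    # Illustrative symmetric k-nearest-neighbour adjacency matrix;
    # get_igraph is presumed to build a comparable connectivity graph
    # as an igraph object for Leiden clustering.
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                         # exclude self-edges
    nbrs = np.argsort(d2, axis=1)[:, :k]                 # k nearest per point
    A = np.zeros((len(z), len(z)), dtype=bool)
    A[np.arange(len(z))[:, None], nbrs] = True
    return A | A.T                                       # symmetrize

rng = np.random.default_rng(1)
z = rng.normal(size=(50, 4))
A = knn_adjacency(z, k=5)
print((A.sum(axis=1) >= 5).all())   # True: every node keeps at least k neighbours
```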
def leidenalg_igraph(g, res, random_state=0)-
Leidenalg clustering on an igraph object.
Parameters
g:igraph- The igraph object of connectivities.
res:float- The resolution parameter for Leidenalg clustering.
random_state:int, optional- The random state.
Returns
labels:np.array- [N, ] The clustered labels.
def plot_clusters(embed_z, labels, plot_labels=False, path=None)-
Plot the clustering results.
Parameters
embed_z:np.array- [N, 2] The latent variables after dimension reduction.
labels:np.array- [N, ] The clustered labels.
plot_labels:boolean, optional- Whether to plot text of labels or not.
path:str, optional- The path to save the figure.
def plot_marker_gene(expression, gene_name: str, embed_z, path=None)-
Plot the marker gene.
Parameters
expression:np.array- [N, ] The expression of the marker gene.
gene_name:str- The name of the marker gene.
embed_z:np.array- [N, 2] The latent variables after dimension reduction.
path:str, optional- The path to save the figure.
def plot_uncertainty(uncertainty, embed_z, path=None)-
Plot the uncertainty for all selected cells.
Parameters
uncertainty:np.array- [N, ] The uncertainty of all cells.
embed_z:np.array- [N, 2] The latent variables after dimension reduction.
path:str, optional- The path to save the figure.
def DE_test(Y, X, gene_names, i_test, alpha: float = 0.05)-
Differential gene expression test.
Parameters
Y:numpy.array- [N, G] The expression matrix.
X:numpy.array- [N, 1+1+s] The design matrix: the constant term, the pseudotime, and the covariates.
gene_names:numpy.array- [G, ] The names of all genes.
i_test:numpy.array- The indices of covariates to be tested.
alpha:float, optional- The cutoff of p-values.
Returns
res_df:pandas.DataFrame- The test results of expressed genes, with two columns: the estimated coefficients and the adjusted p-values.
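The adjusted p-values in res_df imply a multiple-testing correction. A Benjamini-Hochberg adjustment is a common choice for this, and is assumed here for illustration; the exact procedure DE_test uses may differ.

```python
import numpy as np

def bh_adjust(pvals):
    # Benjamini-Hochberg adjusted p-values: multiply each sorted p-value
    # by n/rank, then enforce monotonicity from the largest down.
    n = len(pvals)
    order = np.argsort(pvals)
    ranked = pvals[order] * n / np.arange(1, n + 1)
    adj = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adj, 1.0)
    return out

p = np.array([0.001, 0.02, 0.03, 0.5])
adj = bh_adjust(p)
significant = adj < 0.05        # threshold at alpha, as DE_test presumably does
```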
def load_data(path, file_name, return_dict=False)-
Load HDF5 data.
Parameters
path:str- The path of the h5 files.
file_name:str- The dataset name.
return_dict:boolean, optional- Whether to return the dict of the dataset or not.
Returns
data:dict- The dict containing count, grouping, etc. of the dataset.
dd:anndata.AnnData- The AnnData object of the dataset.
def compute_kernel(x, y, kernel='rbf', **kwargs)-
Computes the kernel (default 'rbf') between x and y.
Parameters
x:Tensor- Tensor with shape [batch_size, z_dim]
y:Tensor- Tensor with shape [batch_size, z_dim]
kernel:str, optional- The kernel type; 'rbf' by default.
Returns
The computed kernel between x and y.
def squared_distance(x, y)-
Compute the pairwise Euclidean distance.
Parameters
x:Tensor- Tensor with shape [batch_size, z_dim]
y:Tensor- Tensor with shape [batch_size, z_dim]
Returns
The pairwise Euclidean distance between x and y.
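A NumPy sketch of the same computation (the library version operates on Tensors), using the expansion |x - y|^2 = |x|^2 - 2 x·y + |y|^2:

```python
import numpy as np

def squared_distance_np(x, y):
    # Pairwise squared Euclidean distances, shape [batch_x, batch_y].
    x2 = (x ** 2).sum(axis=1)[:, None]   # [batch_x, 1]
    y2 = (y ** 2).sum(axis=1)[None, :]   # [1, batch_y]
    # Clamp tiny negative values caused by floating-point round-off.
    return np.maximum(x2 - 2.0 * x @ y.T + y2, 0.0)

x = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([[1.0, 0.0]])
d = squared_distance_np(x, y)   # [[1.], [1.]]
```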
def compute_mmd(x, y, kernel, **kwargs)-
Computes the Maximum Mean Discrepancy (MMD) between x and y.
Parameters
x:Tensor- Tensor with shape [batch_size, z_dim]
y:Tensor- Tensor with shape [batch_size, z_dim]
kernel:str- The kernel type used in MMD. It can be 'rbf', 'multi-scale-rbf' or 'raphy'.
**kwargs:dict- The parameters used in kernel function.
Returns
The computed MMD between x and y.
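For the plain 'rbf' case, the (biased) MMD estimate can be sketched in NumPy as below; the library function works on Tensors and also supports the 'multi-scale-rbf' and 'raphy' kernels, which are not shown here.

```python
import numpy as np

def rbf_mmd(x, y, sigma=1.0):
    # Biased MMD^2 estimate with a single-bandwidth RBF kernel:
    # mean k(x,x) + mean k(y,y) - 2 mean k(x,y).
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y_same = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution as x
y_diff = rng.normal(3.0, 1.0, size=(200, 2))   # mean-shifted distribution
# Samples from a shifted distribution give a larger MMD.
m_same, m_diff = rbf_mmd(x, y_same), rbf_mmd(x, y_diff)
```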
def sample_z(args)-
Samples from the standard Normal distribution with shape [size, z_dim] and applies the re-parametrization trick; it effectively samples from the latent distribution N(mu, var) computed in the _encoder function.
Parameters
args:list- List of [mu, log_var] computed in the _encoder function.
Returns
The computed Tensor of samples with shape [size, z_dim].
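The re-parametrization trick itself can be sketched in NumPy (the library version does the same on Tensors so that gradients flow through mu and log_var):

```python
import numpy as np

def sample_z_np(args, rng=None):
    # z = mu + exp(log_var / 2) * eps with eps ~ N(0, I), i.e. a sample
    # from N(mu, var) expressed as a deterministic function of (mu, log_var)
    # plus independent noise.
    mu, log_var = args
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(log_var / 2.0) * eps

mu = np.zeros((4, 3))
log_var = np.zeros((4, 3))   # log_var = 0  ->  unit variance
z = sample_z_np([mu, log_var])
print(z.shape)               # (4, 3)
```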
Classes
class Early_Stopping (warmup=0, patience=10, tolerance=0.001, relative=False, is_minimize=True)-
The early-stopping monitor.
class Early_Stopping():
    '''The early-stopping monitor.'''
    def __init__(self, warmup=0, patience=10, tolerance=1e-3, relative=False, is_minimize=True):
        self.warmup = warmup
        self.patience = patience
        self.tolerance = tolerance
        self.is_minimize = is_minimize
        self.relative = relative
        self.step = -1
        self.best_step = -1
        self.best_metric = np.inf
        if not self.is_minimize:
            self.factor = -1.0
        else:
            self.factor = 1.0

    def __call__(self, metric):
        self.step += 1
        if self.step < self.warmup:
            return False
        elif (self.best_metric == np.inf) or \
            (self.relative and (self.best_metric - metric) / self.best_metric > self.tolerance) or \
            ((not self.relative) and self.factor * metric < self.factor * self.best_metric - self.tolerance):
            self.best_metric = metric
            self.best_step = self.step
            return False
        elif self.step - self.best_step > self.patience:
            print('Best Epoch: %d. Best Metric: %f.' % (self.best_step, self.best_metric))
            return True
        else:
            return False
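A usage sketch: call the monitor once per epoch with the validation metric and stop training when it returns True. The `Monitor` class below is a condensed illustration of the same stopping logic (absolute tolerance, minimization), not the library class itself.

```python
import numpy as np

class Monitor:
    # Condensed early-stopping logic: remember the best metric seen so far
    # and signal a stop once it has not improved by `tolerance` for more
    # than `patience` consecutive steps.
    def __init__(self, patience=3, tolerance=1e-3):
        self.patience, self.tolerance = patience, tolerance
        self.step = self.best_step = -1
        self.best = np.inf
    def __call__(self, metric):
        self.step += 1
        if metric < self.best - self.tolerance:
            self.best, self.best_step = metric, self.step
            return False
        return self.step - self.best_step > self.patience

stop = Monitor(patience=3)
losses = [1.0, 0.8, 0.7, 0.69, 0.7, 0.71, 0.7, 0.7]
# Last improvement is at epoch 3; patience runs out 4 epochs later.
stopped_at = next(i for i, l in enumerate(losses) if stop(l))
print(stopped_at)   # 7
```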