Module VITAE.utils
Source code
# -*- coding: utf-8 -*-
import sys
import os
import random
import numpy as np
import pandas as pd
from numba import jit, float32, int32
import scipy
from scipy import stats
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras import backend as K
import h5py
import anndata
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from umap.umap_ import nearest_neighbors, smooth_knn_dist
import umap
from scipy.sparse import coo_matrix
import igraph as ig
import leidenalg
import matplotlib.pyplot as plt
import matplotlib
#------------------------------------------------------------------------------
# Early stopping
#------------------------------------------------------------------------------
class Early_Stopping():
'''
The early-stopping monitor.
'''
def __init__(self, warmup=0, patience=10, tolerance=1e-3,
relative=False, is_minimize=True):
self.warmup = warmup
self.patience = patience
self.tolerance = tolerance
self.is_minimize = is_minimize
self.relative = relative
self.step = -1
self.best_step = -1
self.best_metric = np.inf
if not self.is_minimize:
self.factor = -1.0
else:
self.factor = 1.0
def __call__(self, metric):
self.step += 1
if self.step < self.warmup:
return False
elif (self.best_metric==np.inf) or \
(self.relative and (self.best_metric-metric)/self.best_metric > self.tolerance) or \
((not self.relative) and self.factor*metric<self.factor*self.best_metric-self.tolerance):
self.best_metric = metric
self.best_step = self.step
return False
elif self.step - self.best_step>self.patience:
print('Best Epoch: %d. Best Metric: %f.'%(self.best_step, self.best_metric))
return True
else:
return False
#------------------------------------------------------------------------------
# Utils functions
#------------------------------------------------------------------------------
def reset_random_seeds(seed):
os.environ['PYTHONHASHSEED']=str(seed)
tf.keras.backend.clear_session()
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)
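# _comp_dist: pairwise distances between the per-class means of x, scaled by
# the inverse of the summed within-class covariances (a symmetrized
# Mahalanobis-type distance between the classes given by y).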
def _comp_dist(x, y, mu=None, S=None):
uni_y = np.unique(y)
n_uni_y = len(uni_y)
d = x.shape[1]
if mu is None:
mu = np.zeros((n_uni_y, d))
for i,l in enumerate(uni_y):
mu[i, :] = np.mean(x[y==l], axis=0)
if S is None:
S = np.zeros((n_uni_y, d, d))
for i,l in enumerate(uni_y):
S[i, :, :] = np.cov(x[y==l], rowvar=False)
dist = np.zeros((n_uni_y, n_uni_y))
for i,li in enumerate(uni_y):
for j,lj in enumerate(uni_y):
if i<j:
dist[i,j] = (mu[i:i+1,:]-mu[j:j+1,:]) @ np.linalg.inv(S[i, :, :] + S[j, :, :]) @ (mu[i:i+1,:]-mu[j:j+1,:]).T
dist = dist + dist.T
return dist
def get_embedding(z, dimred='umap', **kwargs):
'''Get low-dimensional embeddings for visualizations.
Parameters
----------
z : np.array
\([N, d]\) The latent variables.
dimred : str, optional
'pca', 'tsne', or 'umap'.
**kwargs :
Extra key-value arguments for dimension reduction algorithms.
Returns
----------
embed : np.array
\([N, 2]\) The latent variables after dimension reduction.
'''
if dimred=='umap':
if 'random_state' in kwargs:
kwargs['random_state'] = np.random.RandomState(kwargs['random_state'])
# UMAP may modify the input matrix in place during transform, so fit on a copy
mapper = umap.UMAP(**kwargs).fit(z.copy())
embed = mapper.embedding_
elif dimred=='pca':
kwargs['n_components'] = 2
embed = PCA(**kwargs).fit_transform(z)
elif dimred=='tsne':
embed = TSNE(**kwargs).fit_transform(z)
else:
raise ValueError("Dimension reduction method can only be 'umap', 'pca' or 'tsne'!")
return embed
def _compute_membership_strengths(knn_indices, knn_dists, sigmas, rhos, return_dists=False, bipartite=False):
'''
Reimplement the UMAP `compute_membership_strengths` function to allow computation with float64.
'''
n_samples = knn_indices.shape[0]
n_neighbors = knn_indices.shape[1]
rows = np.zeros(knn_indices.size, dtype=np.int32)
cols = np.zeros(knn_indices.size, dtype=np.int32)
vals = np.zeros(knn_indices.size, dtype=np.float64)
if return_dists:
dists = np.zeros(knn_indices.size, dtype=np.float64)
else:
dists = None
for i in range(n_samples):
for j in range(n_neighbors):
if knn_indices[i, j] == -1:
continue # We didn't get the full knn for i
# If applied to an adjacency matrix points shouldn't be similar to themselves.
# If applied to an incidence matrix (or bipartite) then the row and column indices are different.
if (bipartite == False) & (knn_indices[i, j] == i):
val = 0.0
elif knn_dists[i, j] - rhos[i] <= 0.0 or sigmas[i] == 0.0:
val = 1.0
else:
val = np.exp(-((knn_dists[i, j] - rhos[i]) / (sigmas[i])))
rows[i * n_neighbors + j] = i
cols[i * n_neighbors + j] = knn_indices[i, j]
vals[i * n_neighbors + j] = val
if return_dists:
dists[i * n_neighbors + j] = knn_dists[i, j]
return rows, cols, vals, dists
def _fuzzy_simplicial_set(X, n_neighbors, random_state,
metric, metric_kwds={}, knn_indices=None, knn_dists=None, angular=False,
set_op_mix_ratio=1.0, local_connectivity=1.0, apply_set_operations=True,
verbose=False, return_dists=None):
'''
Reimplement the UMAP `fuzzy_simplicial_set` function to allow computation with float64.
'''
if knn_indices is None or knn_dists is None:
knn_indices, knn_dists, _ = nearest_neighbors(
X, n_neighbors, metric, metric_kwds, angular, random_state, verbose=verbose,
)
sigmas, rhos = smooth_knn_dist(
knn_dists, float(n_neighbors), local_connectivity=float(local_connectivity),
)
rows, cols, vals, dists = _compute_membership_strengths(
knn_indices, knn_dists, sigmas, rhos, return_dists
)
result = scipy.sparse.coo_matrix(
(vals, (rows, cols)), shape=(X.shape[0], X.shape[0])
)
result.eliminate_zeros()
if apply_set_operations:
transpose = result.transpose()
prod_matrix = result.multiply(transpose)
result = (
set_op_mix_ratio * (result + transpose - prod_matrix)
+ (1.0 - set_op_mix_ratio) * prod_matrix
)
result.eliminate_zeros()
if return_dists is None:
return result, sigmas, rhos
else:
if return_dists:
dmat = scipy.sparse.coo_matrix(
(dists, (rows, cols)), shape=(X.shape[0], X.shape[0])
)
dists = dmat.maximum(dmat.transpose()).todok()
else:
dists = None
return result, sigmas, rhos, dists
def get_igraph(z, random_state=0):
'''Get an igraph object for running Leiden clustering.
Parameters
----------
z : np.array
\([N, d]\) The latent variables.
random_state : int, optional
The random state.
Returns
----------
g : igraph.Graph
The graph of connectivities.
'''
# Find knn
n_neighbors = 15
knn_indices, knn_dists, forest = nearest_neighbors(
z, n_neighbors,
random_state=np.random.RandomState(random_state),
metric='euclidean', metric_kwds={},
angular=False, verbose=False,
)
# Build graph
n_obs = z.shape[0]
X = coo_matrix(([], ([], [])), shape=(n_obs, 1))
connectivities = _fuzzy_simplicial_set(
X,
n_neighbors,
random_state=np.random.RandomState(random_state),
metric=None,
knn_indices=knn_indices,
knn_dists=knn_dists,
set_op_mix_ratio=1.0,
local_connectivity=1.0,
)[0].tocsr()
# Get igraph graph from adjacency matrix
sources, targets = connectivities.nonzero()
weights = connectivities[sources, targets].A1
g = ig.Graph(directed=None)
g.add_vertices(connectivities.shape[0])
g.add_edges(list(zip(sources, targets)))
g.es['weight'] = weights
return g
def leidenalg_igraph(g, res, random_state=0):
'''Leidenalg clustering on an igraph object.
Parameters
----------
g : igraph
The igraph object of connectivities.
res : float
The resolution parameter for Leidenalg clustering.
random_state : int, optional
The random state.
Returns
----------
labels : np.array
\([N, ]\) The clustered labels.
'''
partition_kwargs = {}
partition_type = leidenalg.RBConfigurationVertexPartition
partition_kwargs["resolution_parameter"] = res
partition_kwargs["seed"] = random_state
part = leidenalg.find_partition(
g, partition_type,
**partition_kwargs,
)
labels = np.array(part.membership)
return labels
def plot_clusters(embed_z, labels, plot_labels=False, path=None):
'''Plot the clustering results.
Parameters
----------
embed_z : np.array
\([N, 2]\) The latent variables after dimension reduction.
labels : np.array
\([N, ]\) The clustered labels.
plot_labels : boolean, optional
Whether to plot text of labels or not.
path : str, optional
The path to save the figure.
'''
n_labels = len(np.unique(labels))
colors = [plt.cm.jet(float(i)/n_labels) for i in range(n_labels)]
fig, ax = plt.subplots(1,1, figsize=(20, 10))
for i,l in enumerate(np.unique(labels)):
ax.scatter(*embed_z[labels==l].T,
c=[colors[i]], label=str(l),
s=16, alpha=0.4)
if plot_labels:
ax.text(np.mean(embed_z[labels==l,0]), np.mean(embed_z[labels==l,1]), str(l), fontsize=16)
plt.setp(ax, xticks=[], yticks=[])
box = ax.get_position()
ax.set_position([box.x0, box.y0 + box.height * 0.1,
box.width, box.height * 0.9])
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05),
fancybox=True, shadow=True, markerscale=3, ncol=5)
ax.set_title('Clustering')
if path is not None:
plt.savefig(path, dpi=300)
plt.show()
def plot_marker_gene(expression, gene_name: str, embed_z, path=None):
'''Plot the marker gene.
Parameters
----------
expression : np.array
\([N, ]\) The expression of the marker gene.
gene_name : str
The name of the marker gene.
embed_z : np.array
\([N, 2]\) The latent variables after dimension reduction.
path : str, optional
The path to save the figure.
'''
fig, ax = plt.subplots(1,1, figsize=(20, 10))
cmap = matplotlib.cm.get_cmap('Reds')
sc = ax.scatter(*embed_z.T, c='yellow', s=15, alpha=0.1)
sc = ax.scatter(*embed_z.T, cmap=cmap, c=expression, s=10, alpha=0.5)
plt.colorbar(sc, ax=[ax], location='right')
plt.setp(ax, xticks=[], yticks=[])
ax.set_title('Normalized expression of {}'.format(gene_name))
if path is not None:
plt.savefig(path, dpi=300)
plt.show()
return None
def plot_uncertainty(uncertainty, embed_z, path=None):
'''Plot the uncertainty for all selected cells.
Parameters
----------
uncertainty : np.array
\([N, ]\) The uncertainty of all cells.
embed_z : np.array
\([N, 2]\) The latent variables after dimension reduction.
path : str, optional
The path to save the figure.
'''
fig, ax = plt.subplots(1,1, figsize=(20, 10))
cmap = matplotlib.cm.get_cmap('RdBu_r')
sc = ax.scatter(*embed_z.T, cmap=cmap, c=uncertainty, s=10, alpha=1.0)
plt.colorbar(sc, ax=[ax], location='right')
plt.setp(ax, xticks=[], yticks=[])
ax.set_title("Cells' Uncertainty")
if path is not None:
plt.savefig(path, dpi=300)
plt.show()
return None
def _polyfit_with_fixed_points(n, x, y, xf, yf):
'''
Fit a polynomial of degree n that passes through the
fixed points (xf_j, yf_j), using least squares with
equality constraints (Lagrange multipliers).
'''
mat = np.empty((n + 1 + len(xf),) * 2)
vec = np.empty((n + 1 + len(xf),))
x_n = x**np.arange(2 * n + 1)[:, None]
yx_n = np.sum(x_n[:n + 1] * y, axis=1)
x_n = np.sum(x_n, axis=1)
idx = np.arange(n + 1) + np.arange(n + 1)[:, None]
mat[:n + 1, :n + 1] = np.take(x_n, idx)
xf_n = xf**np.arange(n + 1)[:, None]
mat[:n + 1, n + 1:] = xf_n / 2
mat[n + 1:, :n + 1] = xf_n.T
mat[n + 1:, n + 1:] = 0
vec[:n + 1] = yx_n
vec[n + 1:] = yf
params = np.linalg.solve(mat, vec)
return params[:n + 1]
def _get_smooth_curve(xy, xy_fixed, y_range):
xy = np.r_[xy, xy_fixed]
_, idx = np.unique(xy[:,0], return_index=True)
xy = xy[idx,:]
order = 3
while order>0:
params = _polyfit_with_fixed_points(
order,
xy[:,0], xy[:,1],
xy_fixed[:,0], xy_fixed[:,1]
)
poly = np.polynomial.Polynomial(params)
xx = np.linspace(xy_fixed[0,0], xy_fixed[-1,0], 100)
yy = poly(xx)
if np.max(yy)>y_range[1] or np.min(yy)<y_range[0] :
order -= 1
else:
break
return xx, yy
def _pinv_extended(x, rcond=1e-15):
"""
Return the pinv of an array X as well as the singular values
used in computation.
Code adapted from numpy.
"""
x = np.asarray(x)
x = x.conjugate()
u, s, vt = np.linalg.svd(x, False)
s_orig = np.copy(s)
m = u.shape[0]
n = vt.shape[1]
cutoff = rcond * np.maximum.reduce(s)
for i in range(min(n, m)):
if s[i] > cutoff:
s[i] = 1./s[i]
else:
s[i] = 0.
res = np.dot(np.transpose(vt), np.multiply(s[:, np.newaxis],
np.transpose(u)))
return res, s_orig
def _cov_hc3(h, pinv_wexog, resid):
het_scale = (resid/(1-h))**2
# sandwich with pinv(x) * diag(scale) * pinv(x).T
# where pinv(x) = (X'X)^(-1) X and scale is (nobs,)
cov_hc3_ = np.dot(pinv_wexog, het_scale[:,None]*pinv_wexog.T)
return cov_hc3_
def _p_adjust_bh(p):
"""Benjamini-Hochberg p-value correction for multiple hypothesis testing."""
n = len(p)
nna = ~np.isnan(p)
lp = np.sum(nna)
p0 = np.empty_like(p)
p0[~nna] = np.nan
p = p[nna]
by_descend = p.argsort()[::-1]
by_orig = by_descend.argsort()
steps = float(lp) / np.arange(lp, 0, -1)
p0[nna] = np.minimum(1, np.minimum.accumulate(steps * p[by_descend]))[by_orig]
return p0
def DE_test(Y, X, gene_names, i_test, alpha: float = 0.05):
'''Differential gene expression test.
Parameters
----------
Y : numpy.array
\([N, G]\) The expression matrix (cells by genes).
X : numpy.array
\([N, 1+1+s]\) The design matrix: the constant term, the pseudotime, and the covariates.
gene_names : numpy.array
\([G, ]\) The names of all genes.
i_test : numpy.array
The indices of covariates to be tested.
alpha : float, optional
The cutoff of adjusted p-values.
Returns
----------
res_df : pandas.DataFrame
The test results for expressed genes, with the estimated coefficient,
the p-value, and the adjusted p-value for each tested covariate.
'''
pinv_wexog, singular_values = _pinv_extended(X)
normalized_cov = np.dot(
pinv_wexog, np.transpose(pinv_wexog))
h = np.diag(np.dot(X,
np.dot(normalized_cov,X.T)))
def _DE_test(wendog,pinv_wexog,h):
if np.any(np.isnan(wendog)):
return np.empty(2*len(i_test))*np.nan  # NaNs matching the length of np.r_[beta[i_test], t] below
else:
beta = np.dot(pinv_wexog, wendog)
resid = wendog - X @ beta
cov = _cov_hc3(h, pinv_wexog, resid)
t = np.array([])
for j in i_test:
if np.diag(cov)[j] == 0:
_t = float("nan")
else:
_t = beta[j]/(np.sqrt(np.diag(cov)[j])+1e-6)
t = np.append(t, _t)
return np.r_[beta[i_test], t]
res = np.apply_along_axis(lambda y: _DE_test(wendog=y, pinv_wexog=pinv_wexog, h=h),
0,
Y).T
res_df = pd.DataFrame()
for i,j in enumerate(i_test):
if 'median_abs_deviation' in dir(stats):
sigma = stats.median_abs_deviation(res[:,len(i_test)+i], nan_policy='omit')
else:
sigma = stats.median_absolute_deviation(res[:,len(i_test)+i], nan_policy='omit')
pdt_new_pval = np.array([stats.norm.sf(x)*2 for x in np.abs(res[:,len(i_test)+i]/sigma)])
new_adj_pval = _p_adjust_bh(pdt_new_pval/len(i_test))
_res_df = pd.DataFrame(np.c_[res[:,i], pdt_new_pval, new_adj_pval],
index=gene_names,
columns=['beta_{}'.format(j),
'pvalue_{}'.format(j),
'pvalue_adjusted_{}'.format(j)])
res_df = pd.concat([res_df, _res_df], axis=1)
res_df = res_df[
(np.sum(
res_df[
res_df.columns[
np.char.startswith(
np.array(res_df.columns, dtype=str),
'pvalue_adjusted')]
] < alpha, axis=1
)>0) & np.any(~np.isnan(Y), axis=0)]
# res_df = res_df.iloc[np.argsort(res_df.pvalue_adjusted).tolist(), :]
return res_df
#------------------------------------------------------------------------------
# Data loader
#------------------------------------------------------------------------------
type_dict = {
# real data / dyno
'dentate_withdays':'UMI',
'dentate':'UMI',
'immune':'UMI',
'neonatal':'UMI',
'mouse_brain':'UMI',
'mouse_brain_miller':'UMI',
'mouse_brain_merged':'UMI',
'planaria_full':'UMI',
'planaria_muscle':'UMI',
'aging':'non-UMI',
'cell_cycle':'non-UMI',
'fibroblast':'non-UMI',
'germline':'non-UMI',
'human_embryos':'non-UMI',
'mesoderm':'non-UMI',
'human_hematopoiesis_scATAC':'UMI',
'human_hematopoiesis_scRNA':'UMI',
# dyngen
"linear_1":'non-UMI',
"linear_2":'non-UMI',
"linear_3":'non-UMI',
'bifurcating_1':'non-UMI',
'bifurcating_2':'non-UMI',
"bifurcating_3":'non-UMI',
"cycle_1":'non-UMI',
"cycle_2":'non-UMI',
"cycle_3":'non-UMI',
"trifurcating_1":'non-UMI',
"trifurcating_2":'non-UMI',
"converging_1":'non-UMI',
# our model
'linear':'UMI',
'bifurcation':'UMI',
'multifurcating':'UMI',
'tree':'UMI',
}
def load_data(path, file_name, return_dict=False):
'''Load HDF5 data.
Parameters
----------
path : str
The path of the h5 files.
file_name : str
The dataset name.
return_dict : boolean, optional
Whether to return the raw dict in addition to the AnnData object.
Returns
----------
dd : anndata.AnnData
The AnnData object containing counts, grouping, etc.
data : dict, optional
The raw dict of the dataset; only returned when return_dict is True.
'''
data = {}
with h5py.File(os.path.join(path, file_name+'.h5'), 'r') as f:
data['count'] = np.array(f['count'], dtype=np.float32)
dd = anndata.AnnData(X=data["count"])
dd.layers["count"] = data["count"].copy()
data['grouping'] = np.array(f['grouping']).astype(str)
dd.obs["grouping"] = data["grouping"]
dd.obs["grouping"] = dd.obs["grouping"].astype("category")
if 'gene_names' in f:
data['gene_names'] = np.array(f['gene_names']).astype(str)
dd.var.index = data["gene_names"]
else:
data['gene_names'] = None
if 'cell_ids' in f:
data['cell_ids'] = np.array(f['cell_ids']).astype(str)
dd.obs.index = data["cell_ids"]
else:
data['cell_ids'] = None
if 'days' in f:
data['days'] = np.array(f['days']).astype(str)
if 'milestone_network' in f:
if file_name in ['linear','bifurcation','multifurcating','tree',
"cycle_1", "cycle_2", "cycle_3",
"linear_1", "linear_2", "linear_3",
"trifurcating_1", "trifurcating_2",
"bifurcating_1", 'bifurcating_2', "bifurcating_3",
"converging_1"]:
data['milestone_network'] = pd.DataFrame(
np.array(np.array(list(f['milestone_network'])).tolist(), dtype=str),
columns=['from','to','w']
).astype({'w':np.float32})
else:
data['milestone_network'] = pd.DataFrame(
np.array(np.array(list(f['milestone_network'])).tolist(), dtype=str),
columns=['from','to']
)
data['root_milestone_id'] = np.array(f['root_milestone_id']).astype(str)[0]
else:
data['milestone_network'] = None
data['root_milestone_id'] = None
if file_name in ['mouse_brain', 'mouse_brain_miller']:
data['grouping'] = np.array(['%02d'%int(i) for i in data['grouping']], dtype=object)
data['root_milestone_id'] = dict(zip(['mouse_brain', 'mouse_brain_miller'], ['06', '05']))[file_name]
data['covariates'] = np.array(np.array(list(f['covariates'])).tolist(), dtype=np.float32)
if file_name in ['mouse_brain_merged']:
data['grouping'] = np.array(data['grouping'], dtype=object)
data['root_milestone_id'] = np.array(f['root_milestone_id']).astype(str)[0]
data['covariates'] = np.array(np.array(list(f['covariates'])).tolist(), dtype=np.float32)
if file_name == 'dentate_withdays':
data['covariates'] = np.array([item.decode('utf-8').replace('*', '') for item in f['days']], dtype=object)
data['covariates'] = data['covariates'].astype(float).reshape(-1, 1)
if file_name.startswith('human_hematopoiesis'):
data['covariates'] = np.array(np.array(list(f['covariates'])[0], dtype=str).tolist()).reshape((-1,1))
data['type'] = type_dict[file_name]
if data['type']=='non-UMI':
scale_factor = np.sum(data['count'],axis=1, keepdims=True)/1e6
data['count'] = data['count']/scale_factor
if data.get("covariates") is not None:
cov = data.get("covariates")
cov_name = ["covariate_" + str(i) for i in range(cov.shape[1])]
dd.obs[cov_name] = cov
if return_dict:
return data,dd
else:
return dd
# Below are some functions used in calculating MMD loss
def compute_kernel(x, y, kernel='rbf', **kwargs):
"""Computes RBF kernel between x and y.
Parameters
----------
x: Tensor
Tensor with shape [batch_size, z_dim]
y: Tensor
Tensor with shape [batch_size, z_dim]
Returns
----------
The computed RBF kernel between x and y
"""
scales = kwargs.get("scales", [])
if kernel == "rbf":
x_size = K.shape(x)[0]
y_size = K.shape(y)[0]
dim = K.shape(x)[1]
tiled_x = K.tile(K.reshape(x, K.stack([x_size, 1, dim])), K.stack([1, y_size, 1]))
tiled_y = K.tile(K.reshape(y, K.stack([1, y_size, dim])), K.stack([x_size, 1, 1]))
return K.exp(-K.mean(K.square(tiled_x - tiled_y), axis=2) / K.cast(dim, tf.float32))
elif kernel == 'raphy':
scales = K.variable(value=np.asarray(scales))
squared_dist = K.expand_dims(squared_distance(x, y), 0)
scales = K.expand_dims(K.expand_dims(scales, -1), -1)
weights = K.eval(K.shape(scales)[0])
weights = K.variable(value=np.asarray(weights))
weights = K.expand_dims(K.expand_dims(weights, -1), -1)
return K.sum(weights * K.exp(-squared_dist / (K.pow(scales, 2))), 0)
elif kernel == "multi-scale-rbf":
sigmas = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 5, 10, 15, 20, 25, 30, 35, 100, 1e3, 1e4, 1e5, 1e6]
beta = 1. / (2. * (K.expand_dims(sigmas, 1)))
distances = squared_distance(x, y)
s = K.dot(beta, K.reshape(distances, (1, -1)))
return K.reshape(tf.reduce_sum(input_tensor=tf.exp(-s), axis=0), K.shape(distances)) / len(sigmas)
def squared_distance(x, y):
'''Compute the pairwise squared Euclidean distance.
Parameters
----------
x: Tensor
Tensor with shape [batch_size, z_dim]
y: Tensor
Tensor with shape [batch_size, z_dim]
Returns
----------
The pairwise squared Euclidean distance between x and y.
'''
r = K.expand_dims(x, axis=1)
return K.sum(K.square(r - y), axis=-1)
def compute_mmd(x, y, kernel, **kwargs):
"""Computes Maximum Mean Discrepancy(MMD) between x and y.
Parameters
----------
x: Tensor
Tensor with shape [batch_size, z_dim]
y: Tensor
Tensor with shape [batch_size, z_dim]
kernel: str
The kernel type used in MMD. It can be 'rbf', 'multi-scale-rbf' or 'raphy'.
**kwargs: dict
The parameters used in kernel function.
Returns
----------
The computed MMD between x and y
"""
x_kernel = compute_kernel(x, x, kernel=kernel, **kwargs)
y_kernel = compute_kernel(y, y, kernel=kernel, **kwargs)
xy_kernel = compute_kernel(x, y, kernel=kernel, **kwargs)
return K.mean(x_kernel) + K.mean(y_kernel) - 2 * K.mean(xy_kernel)
def sample_z(args):
"""Samples from standard Normal distribution with shape [size, z_dim] and
applies re-parametrization trick. It is actually sampling from latent
space distributions with N(mu, var) computed in `_encoder` function.
Parameters
----------
args: list
List of [mu, log_var] computed in `_encoder` function.
Returns
----------
The computed Tensor of samples with shape [batch_size, z_dim].
"""
mu, log_var = args
batch_size = K.shape(mu)[0]
z_dim = K.int_shape(mu)[1]
eps = K.random_normal(shape=[batch_size, z_dim])
return mu + K.exp(log_var / 2) * eps
def _nan2zero(x):
return tf.where(tf.math.is_nan(x), tf.zeros_like(x), x)
def _nan2inf(x):
return tf.where(tf.math.is_nan(x), tf.zeros_like(x) + np.inf, x)
def _nelem(x):
nelem = tf.reduce_sum(input_tensor=tf.cast(~tf.math.is_nan(x), tf.float32))
return tf.cast(tf.compat.v1.where(tf.equal(nelem, 0.), 1., nelem), x.dtype)
def _reduce_mean(x):
nelem = _nelem(x)
x = _nan2zero(x)
return tf.divide(tf.reduce_sum(input_tensor=x), nelem)
Functions
def reset_random_seeds(seed)
Reset the random seeds of Python, NumPy, and TensorFlow, and clear the Keras session.
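A minimal usage sketch: call it once, before building the model, so that Python, NumPy, and TensorFlow are all seeded consistently.
from VITAE.utils import reset_random_seeds
reset_random_seeds(0)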
def get_embedding(z, dimred='umap', **kwargs)
Get low-dimensional embeddings for visualizations.
Parameters
z : np.array
    [N, d] The latent variables.
dimred : str, optional
    'pca', 'tsne', or 'umap'.
**kwargs
    Extra keyword arguments for the dimension reduction algorithm.
Returns
embed : np.array
    [N, 2] The latent variables after dimension reduction.
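A minimal usage sketch (the latent matrix z below is synthetic; extra keyword arguments such as min_dist are forwarded to the underlying UMAP object):
import numpy as np
from VITAE.utils import get_embedding

z = np.random.randn(500, 16).astype(np.float32)   # stand-in for the latent variables
embed = get_embedding(z, dimred='umap', min_dist=0.3, random_state=0)
print(embed.shape)                                # (500, 2)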
def get_igraph(z, random_state=0)
Get an igraph object for running Leiden clustering.
Parameters
z : np.array
    [N, d] The latent variables.
random_state : int, optional
    The random state.
Returns
g : igraph.Graph
    The graph of connectivities.
def leidenalg_igraph(g, res, random_state=0)
Leiden clustering on an igraph object.
Parameters
g : igraph.Graph
    The graph of connectivities.
res : float
    The resolution parameter for Leiden clustering.
random_state : int, optional
    The random state.
Returns
labels : np.array
    [N, ] The cluster labels.
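A sketch of the intended clustering pipeline, with a synthetic latent matrix standing in for the model's latent variables:
import numpy as np
from VITAE.utils import get_igraph, leidenalg_igraph

z = np.random.randn(500, 16)                   # stand-in for the latent variables
g = get_igraph(z, random_state=0)              # kNN-based connectivity graph
labels = leidenalg_igraph(g, res=1.0, random_state=0)
print(len(np.unique(labels)), 'clusters')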
def plot_clusters(embed_z, labels, plot_labels=False, path=None)
Plot the clustering results.
Parameters
embed_z : np.array
    [N, 2] The latent variables after dimension reduction.
labels : np.array
    [N, ] The cluster labels.
plot_labels : boolean, optional
    Whether to plot the text labels or not.
path : str, optional
    The path to save the figure.
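Continuing the clustering sketch above, the embedding and the Leiden labels can be combined for plotting (the path argument is optional):
from VITAE.utils import get_embedding, plot_clusters

embed_z = get_embedding(z, dimred='umap')
plot_clusters(embed_z, labels, plot_labels=True, path='clusters.png')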
def plot_marker_gene(expression, gene_name: str, embed_z, path=None)
Plot the marker gene.
Parameters
expression : np.array
    [N, ] The expression of the marker gene.
gene_name : str
    The name of the marker gene.
embed_z : np.array
    [N, 2] The latent variables after dimension reduction.
path : str, optional
    The path to save the figure.
def plot_uncertainty(uncertainty, embed_z, path=None)
Plot the uncertainty for all selected cells.
Parameters
uncertainty : np.array
    [N, ] The uncertainty of all cells.
embed_z : np.array
    [N, 2] The latent variables after dimension reduction.
path : str, optional
    The path to save the figure.
def DE_test(Y, X, gene_names, i_test, alpha: float = 0.05)
Differential gene expression test.
Parameters
Y : numpy.array
    [N, G] The expression matrix (cells by genes).
X : numpy.array
    [N, 1+1+s] The design matrix: the constant term, the pseudotime, and the covariates.
gene_names : numpy.array
    [G, ] The names of all genes.
i_test : numpy.array
    The indices of covariates to be tested.
alpha : float, optional
    The cutoff of adjusted p-values.
Returns
res_df : pandas.DataFrame
    The test results for expressed genes, with the estimated coefficient,
    the p-value, and the adjusted p-value for each tested covariate.
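A hedged sketch with synthetic data (with random inputs the filtered result may well be empty; here i_test=[1] tests the pseudotime coefficient of this design):
import numpy as np
from VITAE.utils import DE_test

N, G = 200, 50
Y = np.random.poisson(2.0, size=(N, G)).astype(float)   # synthetic expression matrix
X = np.c_[np.ones(N), np.random.rand(N)]                # constant term + pseudotime
gene_names = np.array(['gene_%d' % i for i in range(G)])
res_df = DE_test(Y, X, gene_names, i_test=[1], alpha=0.05)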
def load_data(path, file_name, return_dict=False)
Load HDF5 data.
Parameters
path : str
    The path of the h5 files.
file_name : str
    The dataset name.
return_dict : boolean, optional
    Whether to return the raw dict in addition to the AnnData object.
Returns
dd : anndata.AnnData
    The AnnData object containing counts, grouping, etc.
data : dict, optional
    The raw dict of the dataset; only returned when return_dict is True.
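A usage sketch, assuming an HDF5 file data/linear.h5 prepared in the expected layout:
from VITAE.utils import load_data

adata = load_data('data', 'linear')                           # AnnData only
data, adata = load_data('data', 'linear', return_dict=True)   # raw dict as well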
def compute_kernel(x, y, kernel='rbf', **kwargs)
Compute a kernel matrix between x and y.
Parameters
x : Tensor
    Tensor with shape [batch_size, z_dim]
y : Tensor
    Tensor with shape [batch_size, z_dim]
kernel : str
    The kernel type: 'rbf', 'raphy', or 'multi-scale-rbf'.
Returns
The computed kernel matrix between x and y.
def squared_distance(x, y)
Compute the pairwise squared Euclidean distance.
Parameters
x : Tensor
    Tensor with shape [batch_size, z_dim]
y : Tensor
    Tensor with shape [batch_size, z_dim]
Returns
The pairwise squared Euclidean distance between x and y.
def compute_mmd(x, y, kernel, **kwargs)
Computes the Maximum Mean Discrepancy (MMD) between x and y.
Parameters
x : Tensor
    Tensor with shape [batch_size, z_dim]
y : Tensor
    Tensor with shape [batch_size, z_dim]
kernel : str
    The kernel type used in MMD: 'rbf', 'multi-scale-rbf', or 'raphy'.
**kwargs : dict
    The parameters used in the kernel function.
Returns
The computed MMD between x and y.
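A minimal sketch with random tensors (the MMD of two samples drawn from the same distribution should be close to zero):
import tensorflow as tf
from VITAE.utils import compute_mmd

x = tf.random.normal([64, 8])
y = tf.random.normal([64, 8])
mmd = compute_mmd(x, y, kernel='rbf')
print(float(mmd))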
def sample_z(args)
Samples from the latent distribution N(mu, var) via the re-parameterization
trick: draws eps from a standard Normal with shape [batch_size, z_dim] and
returns mu + exp(log_var / 2) * eps, where mu and log_var are computed in the
`_encoder` function.
Parameters
args : list
    List of [mu, log_var] computed in the `_encoder` function.
Returns
The computed Tensor of samples with shape [batch_size, z_dim].
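A minimal sketch: with mu = 0 and log_var = 0 this reduces to sampling from a standard Normal.
import tensorflow as tf
from VITAE.utils import sample_z

mu = tf.zeros([32, 8])
log_var = tf.zeros([32, 8])
z = sample_z([mu, log_var])   # shape [32, 8]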
Classes
class Early_Stopping (warmup=0, patience=10, tolerance=0.001, relative=False, is_minimize=True)
The early-stopping monitor: tracks the best metric seen so far and signals
stopping once no sufficient improvement (by `tolerance`, relative or absolute)
has occurred within `patience` steps after the `warmup` period.
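A usage sketch inside a training loop (train_one_epoch is a hypothetical function returning the validation metric):
from VITAE.utils import Early_Stopping

early_stopping = Early_Stopping(warmup=5, patience=10, tolerance=1e-3)
for epoch in range(100):
    val_loss = train_one_epoch()   # hypothetical: one epoch of training, returns validation loss
    if early_stopping(val_loss):
        break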