Module VITAE.utils

Functions

def reset_random_seeds(seed)

def get_embedding(z, dimred='umap', **kwargs)

Get low-dimensional embeddings for visualizations.

Parameters

z : np.array
[N, d] The latent variables.
dimred : str, optional
'pca', 'tsne', or 'umap'.
**kwargs : 
Extra key-value arguments for dimension reduction algorithms.

Returns

embed : np.array
[N, 2] The latent variables after dimension reduction.

def get_igraph(z, random_state=0)

Get igraph for running Leidenalg clustering.

Parameters

z : np.array
[N, d] The latent variables.
random_state : int, optional
The random state.

Returns

g : igraph
The igraph object of connectivities.

def leidenalg_igraph(g, res, random_state=0)

Leidenalg clustering on an igraph object.

Parameters

g : igraph
The igraph object of connectivities.
res : float
The resolution parameter for Leidenalg clustering.
random_state : int, optional
The random state.

Returns

labels : np.array
[N, ] The clustered labels.

def plot_clusters(embed_z, labels, plot_labels=False, path=None)

Plot the clustering results.

Parameters

embed_z : np.array
[N, 2] The latent variables after dimension reduction.
labels : np.array
[N, ] The clustered labels.
plot_labels : boolean, optional
Whether to plot text of labels or not.
path : str, optional
The path to save the figure.

def plot_marker_gene(expression, gene_name: str, embed_z, path=None)

Plot the marker gene.

Parameters

expression : np.array
[N, ] The expression of the marker gene.
gene_name : str
The name of the marker gene.
embed_z : np.array
[N, 2] The latent variables after dimension reduction.
path : str, optional
The path to save the figure.

def plot_uncertainty(uncertainty, embed_z, path=None)

Plot the uncertainty for all selected cells.

Parameters

uncertainty : np.array
[N, ] The uncertainty of all cells.
embed_z : np.array
[N, 2] The latent variables after dimension reduction.
path : str, optional
The path to save the figure.

def DE_test(Y, X, gene_names, i_test, alpha: float = 0.05)

Differential gene expression test.

Parameters

Y : numpy.array
[n, ] The expression matrix.
X : numpy.array
[n, 1+1+s] The constant term, the pseudotime and the covariates.
gene_names : numpy.array
[n, ] The names of all genes.
i_test : numpy.array
The indices of covariates to be tested.
alpha : float, optional
The cutoff of p-values.

Returns

res_df : pandas.DataFrame
The test results of expressed genes with two columns, the estimated coefficients and the adjusted p-values.
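
The relationship between adjusted p-values and the alpha cutoff can be sketched as follows. This is a minimal illustration that assumes a Benjamini-Hochberg adjustment; the text above does not say which correction DE_test applies, and the helper name benjamini_hochberg and the strict "<" comparison are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
p_values = [0.001, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(p_values)

alpha = 0.05  # same default as the alpha parameter documented above
significant = [i for i, q in enumerate(adjusted) if q < alpha]
```

Note that genes 1 and 2 have raw p-values below 0.05 but are no longer selected after adjustment, which is why the cutoff is best read against the adjusted column.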

def load_data(path, file_name, return_dict=False)

Load HDF5 (.h5) data.

Parameters

path : str
The path of the h5 files.
file_name : str
The dataset name.
return_dict : boolean, optional
Whether to return the dict of the dataset or not.

Returns

data : dict
The dict containing count, grouping, etc. of the dataset.
dd : anndata.AnnData
The AnnData object of the dataset.

def compute_kernel(x, y, kernel='rbf', **kwargs)

Computes RBF kernel between x and y.

Parameters

x : Tensor
Tensor with shape [batch_size, z_dim]
y : Tensor
Tensor with shape [batch_size, z_dim]

Returns

The computed RBF kernel between x and y
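
As a rough illustration of what an RBF kernel between two batches looks like, here is a dependency-free sketch on nested lists rather than Tensors. The parameterization exp(-||x - y||^2 / (2 * sigma^2)) and the names rbf_kernel and sigma are assumptions; the library may set the bandwidth differently through **kwargs.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Return K with K[i][j] = exp(-||x[i] - y[j]||^2 / (2 * sigma^2))."""
    kernel = []
    for xi in x:
        row = []
        for yj in y:
            sq_dist = sum((a - b) ** 2 for a, b in zip(xi, yj))
            row.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
        kernel.append(row)
    return kernel


# Two batches with batch_size = 2 and z_dim = 2.
x = [[0.0, 0.0], [1.0, 0.0]]
y = [[0.0, 0.0], [0.0, 2.0]]
K = rbf_kernel(x, y)
```

Identical points give a kernel value of 1, and the value decays towards 0 as the points move apart.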

def squared_distance(x, y)

Compute the pairwise squared Euclidean distance.

Parameters

x : Tensor
Tensor with shape [batch_size, z_dim]
y : Tensor
Tensor with shape [batch_size, z_dim]

Returns

The pairwise squared Euclidean distance between x and y.
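
A small sketch of the pairwise computation, written on plain lists instead of Tensors. The name pairwise_squared_distance is hypothetical, and the result layout (one row per row of x, one column per row of y) is an assumption about the output shape.

```python
def pairwise_squared_distance(x, y):
    """Return D with D[i][j] = sum_k (x[i][k] - y[j][k]) ** 2."""
    return [
        [sum((a - b) ** 2 for a, b in zip(xi, yj)) for yj in y]
        for xi in x
    ]


x = [[0.0, 0.0], [3.0, 4.0]]
y = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
D = pairwise_squared_distance(x, y)
# D == [[0.0, 9.0, 16.0], [25.0, 16.0, 9.0]]
```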

def compute_mmd(x, y, kernel, **kwargs)

Computes Maximum Mean Discrepancy (MMD) between x and y.

Parameters

x : Tensor
Tensor with shape [batch_size, z_dim]
y : Tensor
Tensor with shape [batch_size, z_dim]
kernel : str
The kernel type used in MMD. It can be 'rbf', 'multi-scale-rbf' or 'raphy'.
**kwargs : dict
The parameters used in kernel function.

Returns

The computed MMD between x and y
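
For intuition, the biased MMD^2 estimate with a single RBF kernel can be sketched as mean k(x, x) + mean k(y, y) - 2 * mean k(x, y). This is a plain-Python illustration, not the library's implementation: the function names, the sigma bandwidth, and the choice of the biased estimator are assumptions, and the 'multi-scale-rbf' and 'raphy' kernels are not covered.

```python
import math


def rbf(a, b, sigma=1.0):
    """RBF kernel value between two vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(x, y, sigma=1.0):
    """Average kernel value over all pairs (x[i], y[j])."""
    total = sum(rbf(xi, yj, sigma) for xi in x for yj in y)
    return total / (len(x) * len(y))


def mmd_squared(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return (mean_kernel(x, x, sigma)
            + mean_kernel(y, y, sigma)
            - 2.0 * mean_kernel(x, y, sigma))


x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
y_shifted = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]

same = mmd_squared(x, x)          # identical samples
different = mmd_squared(x, y_shifted)  # clearly separated samples
```

The estimate is zero when both samples coincide and grows as the two batches separate, which is what makes it usable as a penalty for aligning latent distributions.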
 
def sample_z(args)

Draws standard normal noise with shape [size, z_dim] and applies the reparameterization trick, so that the result is a sample from the latent distribution N(mu, var) whose parameters are computed in the _encoder function.

Parameters

args : list
List of [mu, log_var] computed in _encoder function.

Returns

The computed Tensor of samples with shape [size, z_dim].
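
The reparameterization trick mentioned above can be sketched as z = mu + exp(0.5 * log_var) * eps with eps drawn from N(0, 1). The version below uses nested lists and the standard library instead of Tensors; the function name reparameterize and the explicit rng argument are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps elementwise, with sigma = exp(0.5 * log_var)."""
    z = []
    for mu_row, log_var_row in zip(mu, log_var):
        row = []
        for m, lv in zip(mu_row, log_var_row):
            eps = rng.gauss(0.0, 1.0)  # standard normal noise
            row.append(m + math.exp(0.5 * lv) * eps)
        z.append(row)
    return z


rng = random.Random(0)  # seeded generator for reproducibility
mu = [[0.0, 1.0], [2.0, -1.0]]
log_var = [[0.0, -2.0], [-50.0, 0.0]]  # -50 gives a near-zero variance
z = reparameterize(mu, log_var, rng)
```

Because the randomness lives entirely in eps, z stays a differentiable function of mu and log_var, which is the point of the trick. The entry with log_var = -50 ends up essentially equal to its mean.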

Classes

class Early_Stopping (warmup=0, patience=10, tolerance=0.001, relative=False, is_minimize=True)

The early-stopping monitor.

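import numpy as np

class Early_Stopping():
    '''
    The early-stopping monitor.
    '''
    def __init__(self, warmup=0, patience=10, tolerance=1e-3, 
            relative=False, is_minimize=True):
        self.warmup = warmup            # number of initial calls that are never checked
        self.patience = patience        # calls allowed without improvement before stopping
        self.tolerance = tolerance      # minimum improvement that counts as progress
        self.is_minimize = is_minimize  # True if a smaller metric is better
        self.relative = relative        # compare relative instead of absolute improvement

        self.step = -1
        self.best_step = -1
        self.best_metric = np.inf

        # Flip the sign so that maximizing a metric is treated as minimizing its negative.
        if not self.is_minimize:
            self.factor = -1.0
        else:
            self.factor = 1.0

    def __call__(self, metric):
        # Returns True when training should stop, False otherwise.
        self.step += 1
        
        if self.step < self.warmup:
            # Still warming up: neither track the metric nor stop.
            return False
        # Improvement: first tracked value, or a gain larger than the tolerance.
        # Note that the relative branch does not apply self.factor, so it
        # assumes a positive metric that is being minimized.
        elif (self.best_metric==np.inf) or \
                (self.relative and (self.best_metric-metric)/self.best_metric > self.tolerance) or \
                ((not self.relative) and self.factor*metric<self.factor*self.best_metric-self.tolerance):
            self.best_metric = metric
            self.best_step = self.step
            return False
        elif self.step - self.best_step>self.patience:
            # No improvement for more than `patience` calls: signal a stop.
            print('Best Epoch: %d. Best Metric: %f.'%(self.best_step, self.best_metric))
            return True
        else:
            return False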
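
To show how such a monitor is typically driven from a training loop, here is a self-contained sketch. PatienceMonitor is a hypothetical stand-in that mirrors only the absolute-tolerance, minimizing case of the class above (no warmup, no relative mode), and the loss values are made up for illustration.

```python
import math


class PatienceMonitor:
    """Hypothetical stand-in for the absolute-tolerance, minimizing case."""

    def __init__(self, patience=2, tolerance=1e-3):
        self.patience = patience
        self.tolerance = tolerance
        self.step = -1
        self.best_step = -1
        self.best_metric = math.inf

    def __call__(self, metric):
        self.step += 1
        # Improvement must exceed the tolerance to reset the patience counter.
        if metric < self.best_metric - self.tolerance:
            self.best_metric = metric
            self.best_step = self.step
            return False
        # Stop once more than `patience` calls passed without improvement.
        return self.step - self.best_step > self.patience


# Made-up validation losses, one per epoch.
losses = [1.0, 0.8, 0.7, 0.7, 0.7005, 0.6995, 0.71, 0.5]

monitor = PatienceMonitor(patience=2, tolerance=1e-3)
stopped_at = None
for epoch, loss in enumerate(losses):
    if monitor(loss):
        stopped_at = epoch
        break
```

With these values the loop stops at epoch 5: the best loss 0.7 was recorded at epoch 2, and the later drop to 0.6995 is smaller than the tolerance, so it does not reset the counter. The final 0.5 is never seen, which illustrates the usual trade-off of a short patience.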