PyDimRed.utils package

Submodules

PyDimRed.utils.data module

Basic utility module to load CSV files into different formats (pandas DataFrame or numpy array) and to save numpy arrays to CSV files.

PyDimRed.utils.data.load_data_df(path) → DataFrame

Load the CSV file at the given path and return it as a pandas DataFrame.

Args:

path (file | str | pathlib.Path): path to file

Returns:

data (pd.DataFrame)
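For illustration, a minimal sketch of equivalent behaviour, assuming `load_data_df` is a thin wrapper around `pandas.read_csv` with its default options (comma delimiter, first row used as the header). The actual implementation may treat headers or index columns differently, and the name `load_data_df_sketch` is hypothetical.

```python
import io
import pathlib
from typing import IO, Union

import pandas as pd


def load_data_df_sketch(path: Union[str, pathlib.Path, IO[str]]) -> pd.DataFrame:
    # Hypothetical stand-in for PyDimRed.utils.data.load_data_df.
    # pandas.read_csv accepts a path string, a pathlib.Path, or an open file object.
    return pd.read_csv(path)


# Usage with an in-memory CSV; the first row is treated as the header.
csv_text = "x,y,label\n0.1,0.2,0\n0.3,0.4,1\n"
df = load_data_df_sketch(io.StringIO(csv_text))
print(df.shape)  # (2, 3)
```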

PyDimRed.utils.data.load_data_np(path) → array

Load the CSV file at the given path and return it as a numpy array.

Args:

path (file | str | pathlib.Path): path to file

Returns:

data (np.ndarray)
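A comparable sketch for the numpy variant, assuming a purely numeric, comma-separated file with no header row (if the real files carry a header, `skiprows=1` or a pandas-based loader would be needed instead). `load_data_np_sketch` is a hypothetical name, not the library's implementation.

```python
import io
import pathlib
from typing import IO, Union

import numpy as np


def load_data_np_sketch(path: Union[str, pathlib.Path, IO[str]]) -> np.ndarray:
    # Hypothetical stand-in for PyDimRed.utils.data.load_data_np.
    # Assumes numeric, comma-separated values with no header row;
    # ndmin=2 keeps the result two-dimensional even for a single-row file.
    return np.loadtxt(path, delimiter=",", ndmin=2)


arr = load_data_np_sketch(io.StringIO("0.1,0.2,0\n0.3,0.4,1\n"))
print(arr.shape)  # (2, 3)
```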

PyDimRed.utils.data.save_data(path, arr: array) → None

Save a numpy array to a .csv file

Args:

path (file | str | pathlib.Path): path to file

arr (np.ndarray): array to save

Returns:

None
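A minimal round-trip sketch, assuming `save_data` writes the array with `numpy.savetxt` using a comma delimiter and no header; `save_data_sketch` is hypothetical and the real function may choose a different number format.

```python
import pathlib
import tempfile

import numpy as np


def save_data_sketch(path, arr: np.ndarray) -> None:
    # Hypothetical stand-in for PyDimRed.utils.data.save_data:
    # writes one comma-separated line per array row, without a header.
    np.savetxt(path, arr, delimiter=",")


# Round trip through a temporary directory.
data = np.array([[0.1, 0.2], [0.3, 0.4]])
with tempfile.TemporaryDirectory() as tmp:
    out = pathlib.Path(tmp) / "reduced.csv"
    save_data_sketch(out, data)
    restored = np.loadtxt(out, delimiter=",", ndmin=2)
```

Because no header row is written, a file saved this way can be read back with the headerless numpy loader shown above.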

PyDimRed.utils.dr_utils module

Dimensionality reduction utility functions

PyDimRed.utils.dr_utils.path_to_names(paths: list[str]) → list[str]

Small utility function that extracts each file name, without its extension, from a list of paths.

Args:

paths (list[str]): list of paths

Returns:

list[str]: file names without their extensions
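One plausible way to get this behaviour is `pathlib.Path.stem`, sketched below under the assumption that only the last extension is stripped; `path_to_names_sketch` is hypothetical, and the real function may handle multi-part extensions such as `.tar.gz` differently.

```python
from pathlib import Path


def path_to_names_sketch(paths: list[str]) -> list[str]:
    # Hypothetical stand-in for PyDimRed.utils.dr_utils.path_to_names.
    # Path.stem drops the directory and only the final suffix,
    # so "archive.tar.gz" becomes "archive.tar".
    return [Path(p).stem for p in paths]


names = path_to_names_sketch(["data/iris.csv", "/tmp/results/TSNE_n_nbrs_20.npy"])
print(names)  # ['iris', 'TSNE_n_nbrs_20']
```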

PyDimRed.utils.dr_utils.reduce_data_with_params(X: array, y: array, *method_params: dict, save: bool = False, base_save_folder: Path = None, n_jobs: int = 1)

Given a data set (X, y) and one or more dicts mapping parameter names to lists of values, compute the reduced data set for every resulting parameter combination.

Args:

X (np.array): N x D dimensional data array

y (np.array): N dimensional labels array

*method_params (dict): one or more parameter dictionaries (set_params input) defining the DR models used to reduce the data. Within a dictionary, the parameter sets are the cross product of all listed values; across dictionaries, they are the union of the resulting grids. See sklearn.model_selection.ParameterGrid for more information.

save (bool): if True, save the reduced data to disk

base_save_folder (pathlib.Path): directory where data will be saved if save is True

n_jobs (int): number of jobs for the joblib backend. Default = 1. Setting n_jobs = -1 uses all processors available on the system. Note that for small data sets or few parameter combinations, n_jobs = 1 can be faster.

Returns:

tuple: a list of reduced data sets and a list of all paths (if save is False, a list of all names is returned instead of paths)

Examples:

Examples of values for method_params:

Single parameter dictionary

>>> param = {"method" : ["TSNE", "UMAP"] , "n_nbrs" : [20,60,80]}
>>> reduce_data_with_params(X,y,param)

Multiple parameter dictionaries

>>> params = [{"method" : ["TRIMAP"] , "n_nbrs" : [20,60,80] , "n_outliers" : [10,20,30]} , {"method" = ["PCA"]} ]
>>> reduce_data_with_params(X,y,*params)
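To make the cross-product/union rule concrete, the stdlib sketch below enumerates parameter sets the way sklearn.model_selection.ParameterGrid does (keys sorted within each dictionary, dictionaries visited in order). `expand_params` is a hypothetical helper, not part of PyDimRed; it only shows how many reduced data sets each example above would produce.

```python
from itertools import product


def expand_params(*method_params: dict) -> list[dict]:
    # Union over dictionaries, cross product within each dictionary.
    grid = []
    for params in method_params:
        items = sorted(params.items())  # ParameterGrid also sorts keys
        if not items:
            grid.append({})
            continue
        keys, values = zip(*items)
        for combo in product(*values):
            grid.append(dict(zip(keys, combo)))
    return grid


param = {"method": ["TSNE", "UMAP"], "n_nbrs": [20, 60, 80]}
params = [
    {"method": ["TRIMAP"], "n_nbrs": [20, 60, 80], "n_outliers": [10, 20, 30]},
    {"method": ["PCA"]},
]
print(len(expand_params(param)))    # 6  (2 methods x 3 n_nbrs values)
print(len(expand_params(*params)))  # 10 (1 x 3 x 3 TRIMAP sets + 1 PCA set)
```

So the single-dictionary example yields six reduced data sets, while the two-dictionary example yields nine TRIMAP variants plus one PCA run.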

Module contents