PyDimRed.utils package
Submodules
PyDimRed.utils.data module
Basic utility module to load and save CSV files in different formats.
- PyDimRed.utils.data.load_data_df(path) → DataFrame
Load a CSV file from the given path and format it as a pandas DataFrame.
Args:
path (file | str | pathlib.Path): path to file
Returns:
data (pd.DataFrame)
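A minimal sketch of what loading a CSV into a DataFrame looks like, assuming load_data_df is essentially a thin wrapper around pandas.read_csv. The body below is an illustration, not the package's actual implementation, which may apply extra formatting. An in-memory CSV stands in for a real file.

```python
import io

import pandas as pd


def load_data_df(path) -> pd.DataFrame:
    """Hypothetical sketch: read a CSV file into a DataFrame with pandas.

    Assumption: the real function may do additional formatting.
    """
    return pd.read_csv(path)


# pd.read_csv accepts a path string, a pathlib.Path, or an open file-like object,
# which matches the (file | str | pathlib.Path) argument types listed above.
csv_text = "x1,x2,label\n0.1,0.2,a\n0.3,0.4,b\n"
df = load_data_df(io.StringIO(csv_text))
print(df.shape)  # (2, 3)
```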
PyDimRed.utils.dr_utils module
Dimensionality reduction utility functions
- PyDimRed.utils.dr_utils.path_to_names(paths: list[str]) → list[str]
Small utility method to extract file names, without their file extension, from a list of paths.
Args:
paths (list[str]): list of paths
Returns:
list[str]: file names without extension
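The behaviour described above can be sketched with pathlib, assuming "without file extension" means dropping only the last suffix. This is an illustrative reimplementation, not necessarily how the package does it.

```python
from pathlib import Path


def path_to_names(paths: list[str]) -> list[str]:
    """Sketch: return each file name with its last extension removed."""
    # Path.stem is the final path component minus its last suffix.
    return [Path(p).stem for p in paths]


print(path_to_names(["data/iris.csv", "/tmp/results/tsne_30.npy"]))
# ['iris', 'tsne_30']
```

Note that Path.stem strips only the final suffix, so "archive.tar.gz" becomes "archive.tar". A multi-suffix name would need extra handling if that matters.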
- PyDimRed.utils.dr_utils.reduce_data_with_params(X: array, y: array, *method_params: dict, save: bool = False, base_save_folder: Path = None, n_jobs: int = 1)
Given a data set (X, y) and one or more dicts mapping parameter names to lists of values, compute the reduced data set for every parameter combination.
Args:
X (np.array): N x D dimensional data array
y (np.array): N dimensional labels array
*method_params (dict): one or more parameter dictionaries (in the form accepted by set_params) that define the DR models used to reduce the data. Within a dictionary, every combination of parameter values is used (cross product); separate dictionaries are expanded independently and their results combined (union). See sklearn.model_selection.ParameterGrid for more information.
save (bool): if True, save the reduced data to disk
base_save_folder (pathlib.Path): directory where data will be saved if save is True
n_jobs (int): number of jobs for the joblib backend. Default = 1. Setting n_jobs = -1 uses as many jobs as the system can handle. Note that for small data sets or few parameter combinations, n_jobs = 1 can be faster.
Returns:
tuple: list of reduced data sets and list of all paths (if save is False, a list of all names is returned instead)
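The expand-then-dispatch pattern described above can be sketched as follows: expand every parameter combination with ParameterGrid, then reduce each one in a joblib worker. This is a hypothetical sketch, not the package's code. The helpers _reduce_one and reduce_all are invented for illustration. The sketch only handles the "PCA" method, via sklearn's PCA with an assumed n_components key. It omits y (PCA is unsupervised) and saving.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.decomposition import PCA
from sklearn.model_selection import ParameterGrid


def _reduce_one(X: np.ndarray, params: dict) -> np.ndarray:
    """Hypothetical stand-in for building and fitting one DR model.

    Only "PCA" is handled here; the real module supports other methods.
    """
    if params["method"] != "PCA":
        raise ValueError(f"method {params['method']!r} not handled in this sketch")
    return PCA(n_components=params.get("n_components", 2)).fit_transform(X)


def reduce_all(X: np.ndarray, *method_params: dict, n_jobs: int = 1):
    """Sketch: expand every parameter combination, then reduce in parallel."""
    # ParameterGrid takes the cross product within each dict and
    # concatenates the results across dicts.
    grid = list(ParameterGrid(list(method_params)))
    # One joblib task per combination; n_jobs=-1 would use all available CPUs.
    reduced = Parallel(n_jobs=n_jobs)(delayed(_reduce_one)(X, p) for p in grid)
    return reduced, grid


X = np.random.default_rng(0).normal(size=(100, 5))
reduced, grid = reduce_all(X, {"method": ["PCA"], "n_components": [2, 3]}, n_jobs=1)
print([r.shape for r in reduced])  # [(100, 2), (100, 3)]
```

Each parameter combination is independent, so the work parallelizes trivially. Process start-up and data transfer overhead is why n_jobs = 1 can win for small inputs.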
Examples:
Examples of values for method_params:
Single parameter dictionary
>>> param = {"method": ["TSNE", "UMAP"], "n_nbrs": [20, 60, 80]}
>>> reduce_data_with_params(X, y, param)
Multiple parameter dictionaries
>>> params = [{"method": ["TRIMAP"], "n_nbrs": [20, 60, 80], "n_outliers": [10, 20, 30]}, {"method": ["PCA"]}]
>>> reduce_data_with_params(X, y, *params)
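The cross-product and union rule for method_params can be checked directly with sklearn's ParameterGrid, which the documentation points to. The snippet below only expands the two example inputs above to count how many reductions each would trigger. It does not call reduce_data_with_params itself.

```python
from sklearn.model_selection import ParameterGrid

# Single dictionary: cross product of all value lists (2 methods x 3 n_nbrs).
param = {"method": ["TSNE", "UMAP"], "n_nbrs": [20, 60, 80]}
single = list(ParameterGrid(param))
print(len(single))  # 6

# Several dictionaries: each is expanded separately, then the results are joined.
params = [
    {"method": ["TRIMAP"], "n_nbrs": [20, 60, 80], "n_outliers": [10, 20, 30]},
    {"method": ["PCA"]},
]
multi = list(ParameterGrid(params))
print(len(multi))  # 3 * 3 + 1 = 10
print(multi[-1])  # {'method': 'PCA'}
```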