Data

This module contains wrappers that present pandas.DataFrame data to estimators.

Storage

This is a wrapper for pandas.DataFrame that lets you define a dataset (data, labels/values, sample weights) for an estimator in a simple way.

class rep.data.storage.LabeledDataStorage(data, target=None, sample_weight=None, random_state=None, shuffle=False)[source]

Bases: object

This class implements the data interface used for training estimators. It holds the data, labels/values, and weights: all the information needed to train a model.

Parameters:
  • data (pandas.DataFrame) – features, array-like of shape [n_samples, n_features]
  • target (None or numbers.Number or array-like) – labels/values for classification/regression (leave as None when the storage is used only for prediction)
  • sample_weight (None or numbers.Number or array-like) – sample weights (leave as None when the storage is used only for prediction)
  • random_state (None or int or RandomState) – seed or state for the pseudo-random generator
  • shuffle (bool) – whether to shuffle the data
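
The constructor arguments above can be sketched as follows. This is a minimal illustration, not part of the library: the feature names `pt` and `eta`, the toy values, and the helper names `make_inputs` and `build_storage` are assumptions made for the example. Only the `LabeledDataStorage` signature is taken from this page, and the import is kept inside a function so the input-building part runs even where REP is not installed.

```python
import numpy as np
import pandas as pd


def make_inputs(n_samples=6, seed=0):
    """Build toy features, alternating 0/1 labels, and unit weights."""
    rng = np.random.RandomState(seed)
    data = pd.DataFrame({
        "pt": rng.uniform(0.0, 10.0, size=n_samples),  # hypothetical feature
        "eta": rng.normal(size=n_samples),             # hypothetical feature
    })
    target = np.arange(n_samples) % 2       # labels for classification
    sample_weight = np.ones(n_samples)      # one weight per sample
    return data, target, sample_weight


def build_storage(data, target, sample_weight):
    """Wrap the inputs in a LabeledDataStorage (requires REP to be installed)."""
    from rep.data.storage import LabeledDataStorage
    return LabeledDataStorage(data, target=target, sample_weight=sample_weight,
                              random_state=42, shuffle=False)


data, target, sample_weight = make_inputs()
```

With REP available, `storage = build_storage(data, target, sample_weight)` would produce an object whose `get_data()`, `get_targets()` and `get_weights()` methods, documented below, read the three pieces back.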
col(index)[source]

Return a column (or several columns) from the data.

Parameters: index (None or str or list(str)) – column name(s)
Return type: pandas.Series or pandas.DataFrame
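
The two return types match what plain pandas selection gives for a single name versus a list of names. The sketch below shows that analogy only; it does not call REP, the column names are made up, and it makes no claim about how `col` treats `index=None`.

```python
import pandas as pd

# Toy frame with hypothetical feature names.
df = pd.DataFrame({"pt": [1.0, 2.0], "eta": [0.1, -0.3], "phi": [0.5, 1.5]})

single = df["pt"]            # one name selects a pandas.Series
several = df[["pt", "eta"]]  # a list of names selects a pandas.DataFrame
```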
eval_column(expression)[source]

Evaluate an expression to obtain the required columns from the data.

Return type:numpy.array or str or
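
As an illustration of computing a column from an expression string, the sketch below uses `pandas.DataFrame.eval` as a stand-in. It is an assumption that the idea carries over: this page does not specify the expression syntax accepted by `eval_column`, and the column names `px` and `py` are invented for the example.

```python
import pandas as pd

# Toy frame with hypothetical momentum components.
df = pd.DataFrame({"px": [3.0, 6.0], "py": [4.0, 8.0]})

# Evaluate a string expression over the frame's columns.
pt_squared = df.eval("px ** 2 + py ** 2")
```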
get_data(features=None)[source]

Return data.

Parameters: features (None or list[str]) – feature names to return (if None, all features in the data storage are used)
Return type: pandas.DataFrame
get_indices()[source]

Return data indices.

Return type: numpy.array
get_targets()[source]

Return the sample targets (labels or values).

Return type: numpy.array
get_weights(allow_nones=False)[source]

Return sample weights.

Return type: numpy.array