Data

This module contains wrappers around pandas.DataFrame that supply data to estimators.

Storage

This is a wrapper for pandas.DataFrame that lets you define a dataset (data, labels/values, sample weights) for an estimator in a simple way.

class rep.data.storage.LabeledDataStorage(data, target=None, sample_weight=None, random_state=None, shuffle=False)

Bases: object

This class implements the data interface used for training estimators. It holds the data, the labels/values and the sample weights: all the information needed to train a model.

Parameters:

data (pandas.DataFrame) – features, array-like of shape [n_samples, n_features]
target (None or numbers.Number or array-like) – labels/values for classification/regression (set None for predictive methods)
sample_weight (None or numbers.Number or array-like) – sample weights (set None for predictive methods)
random_state (None or int or RandomState) – seed or state for the pseudorandom number generator
shuffle (bool) – whether to shuffle the data
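
The parameter types above imply two things: target and sample_weight may each be a single number or an array, and shuffling has to reorder the data, targets and weights together. The sketch below illustrates both ideas in plain pandas/NumPy. It is not the library's implementation; the helpers broadcast and shuffle_together and the column names pt and eta are hypothetical.

```python
import numbers

import numpy as np
import pandas as pd


def broadcast(value, n_samples):
    """Hypothetical helper: turn None, a number, or an array-like into a 1-D array (or None)."""
    if value is None:
        return None
    if isinstance(value, numbers.Number):
        # A single number is repeated for every sample.
        return np.full(n_samples, value, dtype=float)
    arr = np.asarray(value)
    if arr.shape != (n_samples,):
        raise ValueError("expected %d values, got shape %r" % (n_samples, arr.shape))
    return arr


def shuffle_together(data, target, sample_weight, random_state=None):
    """Hypothetical helper: apply one seeded permutation to rows, targets and weights."""
    rng = np.random.RandomState(random_state)
    order = rng.permutation(len(data))
    data = data.iloc[order]
    target = None if target is None else target[order]
    sample_weight = None if sample_weight is None else sample_weight[order]
    return data, target, sample_weight


data = pd.DataFrame({"pt": [1.0, 2.0, 3.0, 4.0], "eta": [0.1, 0.2, 0.3, 0.4]})
target = broadcast([0, 1, 0, 1], len(data))
weights = broadcast(2.0, len(data))  # one number becomes one weight per row
shuffled, y, w = shuffle_together(data, target, weights, random_state=42)
```

Using a single permutation for all three pieces keeps each row attached to its own label and weight, and a fixed random_state makes the order reproducible.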

col(index)

Return one or more columns from the data.

Parameters:	index (None or str or list(str)) – column name or list of column names
Return type:	pandas.Series or pandas.DataFrame

eval_column(expression)

Evaluate an expression to compute the required columns from the data.

Return type:	numpy.array or str or
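
The source does not state which expression syntax eval_column accepts. To illustrate the general idea of deriving a column from an expression string, here is a sketch using pandas' own DataFrame.eval; the column names px and py are hypothetical, and treating eval_column as behaving the same way is an assumption.

```python
import pandas as pd

data = pd.DataFrame({"px": [3.0, 6.0], "py": [4.0, 8.0]})

# Compute a derived column from existing ones without adding it to the frame.
# engine="python" avoids depending on the optional numexpr package.
pt = data.eval("(px ** 2 + py ** 2) ** 0.5", engine="python")
```

The result is a pandas.Series aligned with the original index; the source DataFrame is left unchanged.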

get_data(features=None)

Return data.

Parameters:	features (None or list[str]) – set of feature names (if None then use all features in data storage)
Return type:	pandas.DataFrame
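
The contract described for features (None means all columns, otherwise only the named ones) can be sketched in plain pandas as follows. select_features is a hypothetical stand-in written for illustration, not the library's code.

```python
import pandas as pd


def select_features(data, features=None):
    """Hypothetical helper mirroring the documented contract of get_data."""
    if features is None:
        # No restriction requested: return every column.
        return data
    # Keep only the requested columns, in the requested order.
    return data[list(features)]


frame = pd.DataFrame({"pt": [1.0, 2.0], "eta": [0.1, 0.2], "phi": [0.5, 0.6]})
subset = select_features(frame, ["eta", "pt"])
everything = select_features(frame)
```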

get_indices()

Return data indices.

Return type:	numpy.array

get_targets()

Return the sample targets (labels or values).

Return type:	numpy.array

get_weights(allow_nones=False)

Return sample weights.

Return type:	numpy.array
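
The source does not describe what allow_nones does. One plausible reading, stated here as an assumption rather than documented behaviour, is that missing weights are replaced by uniform weights unless None is explicitly allowed. The hypothetical helper weights_or_default sketches that reading.

```python
import numpy as np


def weights_or_default(sample_weight, n_samples, allow_nones=False):
    """Hypothetical helper: assumed handling of missing sample weights."""
    if sample_weight is None:
        # Assumption: fall back to a weight of 1 per sample unless None is allowed.
        return None if allow_nones else np.ones(n_samples)
    return np.asarray(sample_weight, dtype=float)


uniform = weights_or_default(None, 3)
missing = weights_or_default(None, 3, allow_nones=True)
given = weights_or_default([0.5, 2], 2)
```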