Python SDK Technical Reference

This reference provides technical specifications for each method and function in the TruEra Python SDK. Each API call is listed alphabetically, organized by class, under Table of contents in the navigator panel on the right.

`TrueraWorkspace (BaseTrueraWorkspace)`

Workspace for TruEra computations.

`__init__(self, connection_string, authentication, log_level=20, workspace_name='', **kwargs)` `special`

Construct a new TruEra workspace.

Parameters:

Name	Type	Description	Default
`connection_string`	`str`	URL of the TruEra deployment. Defaults to None.	required
`authentication`	`TrueraAuthentication`	Credentials to connect to TruEra deployment. Defaults to None.	required
`log_level`	`int`	Log level (defaults to `logging.INFO`).	`20`
`**verify_cert`	`bool\|str`	When set to `False` certificate verification failures will be ignored (not recommended). A path to certificate file or directory can also be provided. If `verify_cert` is set to a path to a directory, the directory must have been processed using the c_rehash utility supplied with OpenSSL.	required

Exceptions:

Type	Description
`ValueError`	Raised if exactly one of connection_string and authentication is None.

`activate_client_setting(self, setting_name)` `inherited`

Activates a setting for client side behavior.

Parameters:

Name	Type	Description	Default
`setting_name`	`str`	Client setting to activate.	required

`add_credential(self, name, secret, identity=None)`

[Alpha] Add a new credential to the TruEra product. The credential is stored securely and is used to authenticate with the data source in order to perform operations such as read, filter, and sample.

Parameters:

Name	Description	Default
`name`	Friendly name of the credential.	required
`secret`	The secret to be stored.	required
`identity`	Identity portion of the secret. Not needed in all cases. Defaults to None.	`None`

Returns:

Type	Description
`Credential`	Returns an object with the credential name and id. The secret is not stored in this object.

`add_data(self, data, *, data_split_name, column_spec, model_output_context=None, is_production_data=False, **kwargs)`

Add data by either creating a new split or appending to an existing split. The split will be set in the current context.

ColumnSpec/NLPColumnSpec and ModelOutputContext classes can be imported from truera.client.ingestion. Alternatively column_spec and model_output_context can be specified as Python dictionaries.

Parameters:

Name	Type	Description	Default
`data`	`Union[pd.DataFrame, 'Table']`	A pd.DataFrame or Table containing the data to be added.	required
`data_split_name`	`str`	The name of the split to be created or appended to.	required
`column_spec`	`Union[ColumnSpec, NLPColumnSpec, Mapping[str, Union[str, Sequence[str]]]]`	The ColumnSpec or NLPColumnSpec mapping column names in the data to corresponding data kind. Parameters include: id_col_name, timestamp_col_name, pre_data_col_names, post_data_col_names, prediction_col_names, label_col_names, extra_data_col_names, feature_influence_col_names, token_influence_col_names, tags_col_name, token_col_name, sentence_embeddings_col_name	required
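A minimal sketch of the dictionary form of `column_spec`, assuming the dict keys are the same parameter names that ColumnSpec accepts (listed above). The DataFrame, its column names and the `referenced_columns` helper are hypothetical; the helper only checks locally that every referenced column exists before the (commented-out) `tru.add_data` call.

```python
import pandas as pd

# Hypothetical split data: an id column, two features and a label column.
df = pd.DataFrame(
    {
        "id": [0, 1, 2],
        "age": [34, 51, 27],
        "income": [52000.0, 87000.0, 31000.0],
        "label": [0, 1, 0],
    }
)

# column_spec as a plain dict; keys assumed to match the ColumnSpec parameter names.
column_spec = {
    "id_col_name": "id",
    "pre_data_col_names": ["age", "income"],
    "label_col_names": ["label"],
}


def referenced_columns(spec):
    """Return every column name the spec refers to, in order."""
    cols = []
    for value in spec.values():
        # Single-column entries are strings; multi-column entries are sequences.
        cols.extend([value] if isinstance(value, str) else list(value))
    return cols


# Catch typos in column names locally before uploading anything.
missing = set(referenced_columns(column_spec)) - set(df.columns)
if missing:
    raise ValueError(f"columns not found in data: {sorted(missing)}")

# Requires a connected workspace, so it is left commented out:
# tru.add_data(df, data_split_name="train", column_spec=column_spec)
```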
`model_output_context`	`Optional[Union[ModelOutputContext, dict]]`	Contextual information about data involving a model, such as the model name and score type. This argument can be omitted in most cases, as the workspace infers the appropriate values from the context.	`None`

`add_data_collection(self, data_collection_name, pre_to_post_feature_map=None, provide_transform_with_model=None)`

Creates a data collection and sets it as the current data collection for all operations in the workspace. A feature mapping must be specified if the data is transformed before it is passed to the model.

Parameters:

Name	Type	Description	Default
`data_collection_name`	`str`	Name of the data_collection.	required
`pre_to_post_feature_map`	`Optional[Mapping[str, Sequence[str]]]`	One-to-many mapping from a single pre-processed human-readable feature name to post-processed model-readable feature names. Ignored if post-processed data is not provided for the data collection.	`None`
`provide_transform_with_model`	`Optional[bool]`	Specifies whether a transform function will be provided with each model that uses this data collection. If `True`, feature mapping is done via a `transform` function; if `False`, separate pre- and post-transform data is expected in all splits. Defaults to `None`. Must be set if a feature map is provided for the data collection.	`None`

Exceptions:

Type	Description
`ValueError`	Raised if no project is associated with the current workspace. Use set_project to set the correct project.

Examples:

>>> tru.set_project("Project Name")
>>> feature_map = {"numeric_feature": ["numeric_feature"],
...                "bucketized_feature": ["bucket_1", "bucket_2"],
...                "categorical feature": ["categorical_onehot_1", "categorical_onehot_2"]}
>>> tru.add_data_collection("Data Collection Name", pre_to_post_feature_map=feature_map,
...                         provide_transform_with_model=False)
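When the post-processed data is produced by one-hot encoding, `pre_to_post_feature_map` can be derived from the encoded column names instead of being typed by hand. A minimal sketch, assuming the encoding is done with `pandas.get_dummies` using its `<feature>_<category>` column naming; the data and the `build_feature_map` helper are hypothetical and not part of the SDK.

```python
import pandas as pd

# Hypothetical pre-processed (human-readable) data.
pre = pd.DataFrame({"age": [34, 51, 27], "color": ["red", "blue", "red"]})

# Post-processed (model-readable) data: "color" is one-hot encoded.
post = pd.get_dummies(pre, columns=["color"], prefix_sep="_")


def build_feature_map(pre_cols, post_cols, sep="_"):
    """Map each pre-feature to the post-features derived from it."""
    feature_map = {}
    for pre_col in pre_cols:
        if pre_col in post_cols:
            # The feature passes through the transform unchanged.
            feature_map[pre_col] = [pre_col]
        else:
            # One-hot columns are named "<pre_col><sep><category>".
            # Prefix matching is ambiguous if one feature name is a prefix
            # of another (for example "color" and "color_code").
            feature_map[pre_col] = [c for c in post_cols if c.startswith(pre_col + sep)]
    return feature_map


feature_map = build_feature_map(list(pre.columns), list(post.columns))
# feature_map == {"age": ["age"], "color": ["color_blue", "color_red"]}

# tru.add_data_collection("Data Collection Name",
#                         pre_to_post_feature_map=feature_map,
#                         provide_transform_with_model=False)
```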

`add_data_source(self, name, uri, credential=None, **kwargs)`

Add a new data source in the system.

Parameters:

Name	Type	Description	Default
`name`	`str`	Friendly name of the data source.	required
`uri`	`str`	URI describing the location of the data source. For local files, this can be `file:///path/to/my/file` or `/path/to/my/file`. For files stored in Azure Storage Blobs, the expected path is `wasb://container@account.blob.core.windows.net/blob`.	required
`credential`	`Credential`	Provide the credential object if the data source requires authentication to read from it. Defaults to None.	`None`
`**format`	`str`	The format in which the file (local) or blob (AWS S3, Azure WASB, etc.) is stored.	required
`**first_row_is_header`	`bool`	For text-based delimited files (csv, tsv, etc.), indicates whether the first row provides header information. Defaults to True.	`True`
`**column_delimiter`	`str`	For text-based delimited files (csv, tsv, etc.), the delimiter that separates column values. Defaults to ','.	`','`
`**quote_character`	`str`	For text-based delimited files (csv, tsv, etc.), the quote character, if quotes are used. Defaults to '"'.	`'"'`
`**null_value`	`str`	For text-based delimited files (csv, tsv, etc.), the string that signifies a null value. Defaults to 'null'.	`'null'`
`**empty_value`	`str`	For text-based delimited files (csv, tsv, etc.), the string that signifies an empty value. Defaults to '""'.	`'""'`
`**date_format`	`str`	For text-based delimited files (csv, tsv, etc.), the format string for any column containing date-time values. Defaults to 'yyyy-MM-dd-HHssZZ'.	`'yyyy-MM-dd-HHssZZ'`
`**account_key`	`str`	For reading from Azure Storage Blob (WASB), the account key used to read the blob. Not required if a `credential` object is provided.	required
`**database_name`	`str`	For reading from SQL databases, the database name to use. Required for MySQL or Hive data sources.	required
`**table_name`	`str`	For reading from SQL databases, the table name to use. Required for MySQL or Hive data sources.	required

Exceptions:

Type	Description
`ValueError`	Raised if the current project is a local project.

Returns:

Type	Description
`Table`	Returns a Table object which allows interaction with the attached data.
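A minimal sketch of building the two URI forms described for `uri`, with the documented defaults for delimited text files collected into a dict that could be unpacked as keyword arguments. The helper names are hypothetical, and because the accepted values for `format` are not listed above, `format` is deliberately left out.

```python
def local_file_uri(path):
    """Build a file:// URI from an absolute POSIX path."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute POSIX path")
    return "file://" + path


def wasb_uri(container, account, blob):
    """Build an Azure Storage Blob URI in the documented wasb:// layout."""
    return f"wasb://{container}@{account}.blob.core.windows.net/{blob}"


# The documented defaults for text-based delimited files, spelled out explicitly.
csv_options = {
    "first_row_is_header": True,
    "column_delimiter": ",",
    "quote_character": '"',
    "null_value": "null",
}

# Requires a connected remote workspace, so it is left commented out:
# table = tru.add_data_source("my_source", local_file_uri("/data/train.csv"), **csv_options)
```

Spelling out the defaults is redundant, but it documents the parsing assumptions next to the call.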

`add_extra_data(self, extra_data, extras_col_names, id_col_name, **kwargs)`

[Alpha] Add extra data to an existing data split.

Parameters:

Name	Type	Description	Default
`extra_data`	`Union[Table, str]`	A Table or URI of a file containing the extra data. Table: use `add_data_source` or `get_data_source` to get a Table; you can optionally further filter the Table by applying `filter`. URI: used to load an external data source into the system. This is similar to ingesting via `add_data_source` and Table, without the filters. The data in the external data source is sub-sampled (random or first N) and ingested into the system. By default, up to 5000 rows are ingested as the split; you can override this by specifying `sample_count=X`. Sampling either takes the first N rows or samples rows randomly (without replacement). Defaults to random; you can override this by specifying `sample_kind="first"`.	required
`extras_col_names`	`Union[str, Sequence[str]]`	Column name(s) for the extra data columns to be ingested.	required
`id_col_name`	`str`	Column name for the unique row identifier in the provided `extra_data`. Used to match the extra data with corresponding data points.	required
`**credential`	`Credential`	Provide the credential object if the data source requires authentication to read from it. Defaults to None.	`None`
`**format`	`str`	The format in which the file (local) or blob (AWS S3, Azure WASB, etc.) is stored.	required
`**first_row_is_header`	`bool`	For text-based delimited files (csv, tsv, etc.), indicates whether the first row provides header information. Defaults to True.	`True`
`**column_delimiter`	`str`	For text-based delimited files (csv, tsv, etc.), the delimiter that separates column values. Defaults to ','.	`','`
`**quote_character`	`str`	For text-based delimited files (csv, tsv, etc.), the quote character, if quotes are used. Defaults to '"'.	`'"'`
`**null_value`	`str`	For text-based delimited files (csv, tsv, etc.), the string that signifies a null value. Defaults to 'null'.	`'null'`
`**empty_value`	`str`	For text-based delimited files (csv, tsv, etc.), the string that signifies an empty value. Defaults to '""'.	`'""'`
`**date_format`	`str`	For text-based delimited files (csv, tsv, etc.), the format string for any column containing date-time values. Defaults to 'yyyy-MM-dd HHssZZ'.	`'yyyy-MM-dd HHssZZ'`
`**account_key`	`str`	For reading from Azure Storage Blob (WASB), the account key used to read the blob. Not required if a `credential` object is provided.	required
`**access_key_id`	`str`	For reading from an S3 bucket, the access key ID used to read the blob. Not required if a `credential` object is provided or the underlying deployment runs in a role with access to the bucket.	required
`**secret_access_key`	`str`	For reading from an S3 bucket, the secret access key used to read the blob. Not required if a `credential` object is provided or the underlying deployment runs in a role with access to the bucket.	required
`**database_name`	`str`	For reading from a MySQL database, the database name to use. Required for MySQL data sources.	required
`**table_name`	`str`	For reading from a MySQL database, the table name to use. Required for MySQL data sources.	required
`**sample_count`	`int`	Maximum number of rows to use when creating the split. Defaults to 5000.	`5000`
`**sample_kind`	`str`	The strategy to use when sub-sampling rows. Defaults to "random".	`"random"`
`**timeout_seconds`	`int`	Number of seconds to wait for the data source. Defaults to 300.	`300`
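The sub-sampling controlled by `sample_count` and `sample_kind` can be previewed locally with pandas before ingesting. This sketch only approximates the documented behavior (the first N rows, or N rows drawn randomly without replacement); it is not the server-side implementation, and `preview_sample` is a hypothetical helper.

```python
import pandas as pd


def preview_sample(df, sample_count=5000, sample_kind="random", seed=None):
    """Approximate the documented sub-sampling on a local DataFrame."""
    n = min(sample_count, len(df))
    if sample_kind == "first":
        # Take the first N rows in their current order.
        return df.head(n)
    if sample_kind == "random":
        # Draw N distinct rows (sampling without replacement).
        return df.sample(n=n, replace=False, random_state=seed)
    raise ValueError(f"unknown sample_kind: {sample_kind!r}")


data = pd.DataFrame({"id": range(10), "extra": list("abcdefghij")})
first_three = preview_sample(data, sample_count=3, sample_kind="first")
random_three = preview_sample(data, sample_count=3, seed=0)
```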

`add_feature_metadata(self, feature_description_map=None, group_to_feature_map=None, missing_values=None, force_update=False)`

Upload metadata describing features and feature groupings to the server.

Parameters:

Name	Type	Description	Default
`feature_description_map`	`Optional[Mapping[str, str]]`	Map from pre-processed feature name, as provided in the data, to the description of the feature.	`None`
`group_to_feature_map`	`Optional[Mapping[str, Sequence[str]]]`	Grouping of pre-features for analysis purposes. Each key of the map names the collection of pre-features it maps to. If given, every pre-feature must appear in exactly one of the map's value lists.	`None`
`missing_values`	`Optional[Sequence[str]]`	List of strings to be registered as missing values when reading split data.	`None`
`force_update`	`bool`	Overwrite any existing feature metadata.	`False`
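The rule that every pre-feature must appear in exactly one of the `group_to_feature_map` value lists can be checked locally before calling `add_feature_metadata`. A minimal sketch; `validate_group_map` and the feature and group names are hypothetical.

```python
from collections import Counter


def validate_group_map(pre_features, group_to_feature_map):
    """Raise ValueError unless each pre-feature is in exactly one group."""
    # Count how many groups list each feature.
    counts = Counter(f for members in group_to_feature_map.values() for f in members)
    unknown = sorted(set(counts) - set(pre_features))
    if unknown:
        raise ValueError(f"groups reference unknown features: {unknown}")
    bad = sorted(f for f in pre_features if counts[f] != 1)
    if bad:
        raise ValueError(f"features not in exactly one group: {bad}")


groups = {"demographics": ["age", "sex"], "finances": ["income", "debt"]}
validate_group_map(["age", "sex", "income", "debt"], groups)  # passes silently

# tru.add_feature_metadata(group_to_feature_map=groups)
```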

`add_labels(self, label_data, label_col_name, id_col_name, **kwargs)`

[Alpha] Add labels to an existing data split.

Parameters:

Name	Type	Description	Default
`label_data`	`Union[Table, str]`	A Table or URI of a file containing the label data. Table: use `add_data_source` or `get_data_source` to get a Table; you can optionally further filter the Table by applying `filter`. URI: used to load an external data source into the system. This is similar to ingesting via `add_data_source` and Table, without the filters. The data in the external data source is sub-sampled (random or first N) and ingested into the system. By default, up to 5000 rows are ingested as the split; you can override this by specifying `sample_count=X`. Sampling either takes the first N rows or samples rows randomly (without replacement). Defaults to random; you can override this by specifying `sample_kind="first"`.	required
`label_col_name`	`str`	Column name for the labels or ground truth in the provided `label_data`.	required
`id_col_name`	`str`	Column name for the unique row identifier in the provided `label_data`. Used to match labels with corresponding data points.	required
`**credential`	`Credential`	Provide the credential object if the data source requires authentication to read from it. Defaults to None.	`None`
`**format`	`str`	The format in which the file (local) or blob (AWS S3, Azure WASB, etc.) is stored.	required
`**first_row_is_header`	`bool`	For text-based delimited files (csv, tsv, etc.), indicates whether the first row provides header information. Defaults to True.	`True`
`**column_delimiter`	`str`	For text-based delimited files (csv, tsv, etc.), the delimiter that separates column values. Defaults to ','.	`','`
`**quote_character`	`str`	For text-based delimited files (csv, tsv, etc.), the quote character, if quotes are used. Defaults to '"'.	`'"'`
`**null_value`	`str`	For text-based delimited files (csv, tsv, etc.), the string that signifies a null value. Defaults to 'null'.	`'null'`
`**empty_value`	`str`	For text-based delimited files (csv, tsv, etc.), the string that signifies an empty value. Defaults to '""'.	`'""'`
`**date_format`	`str`	For text-based delimited files (csv, tsv, etc.), the format string for any column containing date-time values. Defaults to 'yyyy-MM-dd HHssZZ'.	`'yyyy-MM-dd HHssZZ'`
`**account_key`	`str`	For reading from Azure Storage Blob (WASB), the account key used to read the blob. Not required if a `credential` object is provided.	required
`**access_key_id`	`str`	For reading from an S3 bucket, the access key ID used to read the blob. Not required if a `credential` object is provided or the underlying deployment runs in a role with access to the bucket.	required
`**secret_access_key`	`str`	For reading from an S3 bucket, the secret access key used to read the blob. Not required if a `credential` object is provided or the underlying deployment runs in a role with access to the bucket.	required
`**database_name`	`str`	For reading from a MySQL database, the database name to use. Required for MySQL data sources.	required
`**table_name`	`str`	For reading from a MySQL database, the table name to use. Required for MySQL data sources.	required
`**sample_count`	`int`	Maximum number of rows to use when creating the split. Defaults to 5000.	`5000`
`**sample_kind`	`str`	The strategy to use when sub-sampling rows. Defaults to "random".	`"random"`
`**timeout_seconds`	`int`	Number of seconds to wait for the data source. Defaults to 300.	`300`
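Because labels are matched to existing data points through `id_col_name`, it can help to check locally which label ids have no counterpart in the split. A minimal sketch with pandas, assuming both the split data and the label data are available locally as DataFrames; `unmatched_label_ids` is a hypothetical helper.

```python
import pandas as pd


def unmatched_label_ids(split_df, label_df, id_col_name):
    """Return ids present in the label data but absent from the split."""
    merged = label_df.merge(
        split_df[[id_col_name]].drop_duplicates(),
        on=id_col_name,
        how="left",
        indicator=True,  # adds a "_merge" column: "both" or "left_only"
    )
    return merged.loc[merged["_merge"] == "left_only", id_col_name].tolist()


split = pd.DataFrame({"id": [1, 2, 3], "x": [0.1, 0.2, 0.3]})
labels = pd.DataFrame({"id": [2, 3, 4], "label": [0, 1, 1]})
orphans = unmatched_label_ids(split, labels, "id")  # [4]
```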

`add_model(self, model_name, train_split_name=None, train_parameters=None)`

Registers and adds a new model in TruEra. By default, the model is "virtual" in that it does not have an executable model object attached. To add the model object itself, see add_python_model().

Parameters:

Name	Type	Description	Default
`model_name`		Name of the model to create.	required
`train_split_name`	`Optional[str]`	The name of the train split of the model.	`None`
`train_parameters`	`Optional[Mapping[str, Any]]`	Train parameters of the model. Ex. {"n_estimators": 10}	`None`

`add_model_error_influences(self, error_influence_data, score_type=None, *, data_split_name=None, background_split_name=None, id_col_name=None, timestamp_col_name=None, influence_type=None)`

Adds error influence of given score type for the current model and split.

Parameters:

Name	Type	Description	Default
`error_influence_data`	`pd.DataFrame`	Error influence data to add. Must be aligned with the pre-processed data of the given split.	required
`score_type`	`Optional[str]`	Score type of the influences, either `mean_absolute_error_for_classification` or `mean_absolute_error_for_regression`, depending on project type. Inferred if None. Defaults to None.	`None`
`id_col_name`	`Optional[str]`	Column name for the unique identifier of each data point.	`None`
`data_split_name`	`Optional[str]`	Data split that influences are associated with. If None, defaults to split set in the current context.	`None`
`background_split_name`	`Optional[str]`	Background data split that influences are computed against. If None, defaults to the base split of the data collection (if this is not explicitly set, it is an ingested split of type "all" or "train").	`None`
`influence_type`	`Optional[str]`	Influence algorithm used to generate influences. If influence type of project is set to "truera-qii", assumes that explanations are generated using truera-qii. If influence type of project is set to "shap", then `influence_type` must be passed in as one of ["tree-shap-tree-path-dependent", "tree-shap-interventional", "kernel-shap"].	`None`
`timestamp_col_name`	`Optional[str]`	Column name for the timestamp of each data point. Must be a column of type string or pd.DateTime. Defaults to None.	`None`

`add_model_feature_influences(self, feature_influence_data, *, id_col_name, data_split_name=None, background_split_name=None, timestamp_col_name=None, influence_type=None, score_type=None)`

Adds feature influence for the current model and split. Assumes influences are calculated for the default score type of the project.

Parameters:

Name	Type	Description	Default
`feature_influence_data`	`pd.DataFrame`	Feature influence data to add. Must be aligned with the pre-processed data of the given split.	required
`id_col_name`	`str`	Column name for the unique identifier of each data point.	required
`data_split_name`	`Optional[str]`	Data split that influences are associated with. If None, defaults to split set in the current context.	`None`
`background_split_name`	`Optional[str]`	Background data split that influences are computed against. If None, defaults to the base split of the data collection (if this is not explicitly set, it is an ingested split of type "all" or "train").	`None`
`timestamp_col_name`	`Optional[str]`	Column name for the timestamp of each data point. Must be a column of type string or pd.DateTime. Defaults to None.	`None`
`influence_type`	`Optional[str]`	Influence algorithm used to generate influences. If influence type of project is set to "truera-qii", assumes that explanations are generated using truera-qii. If influence type of project is set to "shap", then `influence_type` must be passed in as one of ["tree-shap-tree-path-dependent", "tree-shap-interventional", "kernel-shap"].	`None`
`score_type`	`Optional[str]`	The score type to use when computing influences. If None, uses default score type of project. Defaults to None. For a list of valid score types, see `list_valid_score_types`.	`None`
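Influence data must be aligned with the pre-processed data of the split. One way to check this locally, assuming "aligned" means the same feature columns and the same set of ids in `id_col_name`, is sketched below; `check_influence_alignment` is a hypothetical helper and does not compute influences itself.

```python
import pandas as pd


def check_influence_alignment(pre_df, influence_df, id_col_name):
    """Raise ValueError if features or ids differ between the two frames."""
    pre_features = set(pre_df.columns) - {id_col_name}
    infl_features = set(influence_df.columns) - {id_col_name}
    if pre_features != infl_features:
        # The symmetric difference lists features present in only one frame.
        raise ValueError(f"feature mismatch: {sorted(pre_features ^ infl_features)}")
    if set(pre_df[id_col_name]) != set(influence_df[id_col_name]):
        raise ValueError("influence ids do not match the split ids")


pre = pd.DataFrame({"id": [1, 2], "age": [34, 51], "income": [52e3, 87e3]})
infl = pd.DataFrame({"id": [1, 2], "age": [0.12, -0.05], "income": [0.30, 0.07]})
check_influence_alignment(pre, infl, "id")  # passes silently

# tru.add_model_feature_influences(infl, id_col_name="id")
```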

`add_model_metadata(self, train_split_name=None, train_parameters=None, overwrite=False)`

Add or update metadata for the current model in context.

Parameters:

Name	Type	Description	Default
`train_split_name`	`Optional[str]`	The name of the train split of the model	`None`
`train_parameters`	`Optional[Mapping[str, Any]]`	Train parameters of the model. Ex. {"n_estimators": 10}	`None`
`overwrite`	`bool`	Overwrite existing values, if any.	`False`

`add_model_predictions(self, prediction_data, id_col_name=None, *, prediction_col_name=None, data_split_name=None, ranking_group_id_column_name=None, ranking_item_id_column_name=None, score_type=None)`

Adds prediction data for the current model. Assumes predictions are calculated for the default score type of the project.

Parameters:

Name	Type	Description	Default
`prediction_data`	`Union[pd.DataFrame, Table]`	Prediction data to add.	required
`id_col_name`	`str`	Column name for the unique row identifier in the provided `prediction_data`. Used to match predictions with corresponding split data.	`None`
`prediction_col_name`	`Optional[str]`	Column name from which to pull prediction data. Only required if `prediction_data` is a Table or URI.	`None`
`data_split_name`	`Optional[str]`	Data split that predictions are associated with. If None, defaults to split set in the current context.	`None`
`ranking_group_id_column_name`	`Optional[str]`	Column name for group id for ranking projects.	`None`
`ranking_item_id_column_name`	`Optional[str]`	Column name for item id for ranking projects.	`None`
`score_type`	`Optional[str]`	Specifies the score type for prediction data, if provided. Defaults to None, in which case the score type of the project is used.	`None`

`add_nn_data_split(self, data_split_name, truera_wrappers, split_type='all', *, pre_data=None, label_data=None, label_col_name=None, id_col_name=None, extra_data_df=None)`

[Alpha] Upload NN data split to TruEra server.

Examples:

# During NN ingestion, create the wrappers needed to add a split
>>> from truera.client.nn.wrappers.autowrap import autowrap
>>> truera_wrappers = autowrap(...)  # Use the appropriate NN Diagnostics Ingestion to create this

# Add the data split to the TruEra workspace
>>> tru.add_nn_data_split(
...     data_split_name="<split_name>",
...     truera_wrappers=truera_wrappers,
...     split_type="<split_type_train_or_test>"
... )

Parameters:

Name	Type	Description	Default
`data_split_name`	`str`	Name of the split to be uploaded.	required
`truera_wrappers`	`base.WrapperCollection`	A Base.WrapperCollection housing a base.Wrappers.SplitLoadWrapper that loads the split's data from files. This must be implemented by subclassing truera.client.nn.wrappers.Base.SplitLoadWrapper.	required
`split_type`	`Optional[str]`	The type of the data split. Options are ["all", "train", "test", "validate", "oot", "prod", "custom"]. Defaults to "all".	`'all'`
`pre_data`	`Optional[Union[np.ndarray, pd.DataFrame]]`	Data.	`None`
`label_data`	`Optional[pd.DataFrame]`	Label data.	`None`
`label_col_name`	`Optional[str]`	The column name in `pre_data` containing label data.	`None`
`id_col_name`	`Optional[str]`	The column name in `pre_data` containing record ID data	`None`
`extra_data_df`	`Optional[pd.DataFrame]`	Extra columns which are not used / consumed by the model, but could be used for other analysis like defining segments.	`None`

`add_nn_model(self, model_name, truera_wrappers, attribution_config, model=None, train_split_name=None, train_parameters=None, **kwargs)`

[Alpha] Upload a NN model to TruEra server. The model is also attached to the workspace as the current model.

Examples:

# During NN ingestion you will create two objects
>>> from truera.client.nn.client_configs import NLPAttributionConfiguration
>>> attr_config = NLPAttributionConfiguration(...)

>>> from truera.client.nn.wrappers.autowrap import autowrap
>>> truera_wrappers = autowrap(...)  # Use the appropriate NN Diagnostics Ingestion to create this

# Add the model to the TruEra workspace
>>> tru.add_nn_model(
...     model_name="<model_name>",
...     truera_wrappers=truera_wrappers,
...     attribution_config=attr_config
... )

Parameters:

Name	Type	Description	Default
`model_name`	`str`	Name assigned to the model.	required
`truera_wrappers`	`base.WrapperCollection`	A set of wrappers that lets TruEra run your model. The NN ingestion tutorial explains how to create them.	required
`attribution_config`	`dict`	An attribution config containing attribution run parameters.	required
`model`	`Optional[Any]`	Your model object.	`None`
`train_split_name`	`Optional[str]`	The name of the train split of the model.	`None`
`train_parameters`	`Optional[Mapping[str, Any]]`	Train parameters of the model. Ex. {"n_estimators": 10}	`None`

Exceptions:

Type	Description
`ValueError`	Raised if no project is associated with the current workspace. Use set_project to set the correct project.
`ValueError`	Raised if workspace is not attached to a data_collection. Either attach workspace to a data-collection or provide the data_collection_name.
`ValueError`	Raised if the provided data_collection_name does not exist in the current project.

`add_packaged_python_model(self, model_name, model_dir, *, data_collection_name=None, train_split_name=None, train_parameters=None, verify_model=True, compute_predictions=None, compute_feature_influences=False, compute_for_all_splits=False)`

Registers and adds a new model, along with a pre-serialized and packaged executable Python model object. The model is also attached to the workspace as the current model.

Parameters:

Name	Type	Description	Default
`model_name`	`str`	Name assigned to the model.	required
`model_dir`	`str`	Directory where packaged model is located.	required
`data_collection_name`	`Optional[str]`	Data collection to attach to the model, by default the data collection attached to the workspace will be used. Defaults to None.	`None`
`train_split_name`	`Optional[str]`	The name of the train split of the model	`None`
`train_parameters`	`Optional[Mapping[str, Any]]`	Train parameters of the model. Ex. {"n_estimators": 10}	`None`
`verify_model`	`bool`	Locally verify the model is packaged properly and can operate on existing split data. Defaults to True.	`True`
`compute_predictions`	`Optional[bool]`	Trigger computations of model predictions on the base/default split of the model's data collection, if such a split exists. Ignored for local models. Defaults to True when using `local_execution` for remote workspace.	`None`
`compute_feature_influences`	`bool`	Trigger computations of model feature influences on the base/default split of the model's data collection, if such a split exists. Ignored for local models.	`False`
`compute_for_all_splits`	`bool`	If `compute_predictions` and/or `compute_feature_influences`, triggers computations for all existing data splits (not solely the base/default split). Ignored for local models.	`False`

`add_production_data(self, data, *, column_spec, model_output_context=None, **kwargs)`

Add production data.

ColumnSpec and ModelOutputContext classes can be imported from truera.client.ingestion. Alternatively column_spec and model_output_context can be specified as Python dictionaries.

Parameters:

Name	Type	Description	Default
`data`	`Union[pd.DataFrame, 'Table']`	A pd.DataFrame or Table containing the data to be added.	required
`column_spec`	`Union[ColumnSpec, NLPColumnSpec, Mapping[str, Union[str, Sequence[str]]]]`	The ColumnSpec mapping column names in the data to corresponding data kind. Parameters include: id_col_name, timestamp_col_name, pre_data_col_names, post_data_col_names, prediction_col_names, label_col_names, extra_data_col_names, feature_influence_col_names, token_influence_col_names, tags_col_name, token_col_name, sentence_embeddings_col_name	required
`model_output_context`	`Optional[Union[ModelOutputContext, dict]]`	Contextual information about data involving a model, such as the model name and score type. This argument can be omitted in most cases, as the workspace infers the appropriate values from the context.	`None`

`add_project(self, project, score_type, input_type='tabular', num_default_influences=None)`

Adds a project and sets it as the project to use for the current workspace environment. This unsets the rest of the context (data collection, data split, model, etc.) if it was set previously.

Parameters:

Name	Type	Description	Default
`project`	`str`	Name of the project.	required
`score_type`	`str`	Scorer type configuration for the project. Options are ["logits", "probits", "classification", "regression"].	required
`input_type`	`Optional[str]`	Input data type for the project. Must be one of ["tabular", "time_series_tabular"]. Defaults to "tabular".	`'tabular'`
`num_default_influences`	`Optional[int]`	Number of influences used by default for most influence-requiring graphs, computations, etc. Note that the first `num_default_influences` rows of the data split are used, so shuffling data splits before ingestion is generally advised. If left as None when creating a project, it is set to 1000.	`None`

Examples:

# Create a probits project
>>> tru.add_project("Project Name", score_type = "probits")
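Since default influence computations take the first rows of a split, shuffling the data before ingestion keeps those rows from reflecting however the source file happens to be ordered. A minimal sketch with pandas; the date-sorted DataFrame is hypothetical, and `random_state` is fixed only to make the shuffle reproducible.

```python
import pandas as pd

# Hypothetical split sorted by date, so its first rows are all the oldest.
df = pd.DataFrame({"id": range(8), "date": pd.date_range("2023-01-01", periods=8)})

# frac=1 returns every row in random order; reset_index discards the old order.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Ingest `shuffled` instead of `df`, for example with tru.add_data(...).
```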

`add_python_model(self, model_name, model, transformer=None, *, additional_pip_dependencies=None, additional_modules=None, classification_threshold=None, train_split_name=None, train_parameters=None, verify_model=True, compute_predictions=None, compute_feature_influences=False, compute_for_all_splits=False, **kwargs)`

Registers and adds a new model, including the provided executable model object. This method deduces the model framework in order to serialize the model object appropriately and upload it to the TruEra server. Models of supported frameworks can be passed directly. Supported Model Frameworks: sklearn, xgboost, catboost, lightgbm, pyspark (tree models only). If you cannot ingest your model via this function due to custom logic, feature transforms, etc., see create_packaged_python_model().

[ALPHA] For frameworks that are not yet supported, or for custom model implementations, the prediction function for the model can be provided as the model. For binary classifiers, the prediction function should accept a pandas DataFrame as input and produce a pandas DataFrame as output with the class probabilities and [0, 1] as the column header. For regression models, the prediction function should accept a pandas DataFrame as input and produce the result as a pandas DataFrame with "Result" as the column header. All dependencies required to execute the prediction function should be provided via additional_pip_dependencies.
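A minimal sketch of such a prediction function for a binary classifier. The scoring rule and the `age` feature are hypothetical; the output columns are assumed to be the integer labels 0 and 1, and `additional_pip_dependencies` is assumed to accept a list of pip requirement strings.

```python
import numpy as np
import pandas as pd


def predict_fn(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical custom binary classifier: a logistic curve on "age"."""
    p1 = 1.0 / (1.0 + np.exp(-(X["age"] - 40.0) / 10.0))
    # One column per class, headed 0 and 1, holding the class probabilities.
    return pd.DataFrame({0: 1.0 - p1, 1: p1}, index=X.index)


# tru.add_python_model("custom_model", predict_fn,
#                      additional_pip_dependencies=["numpy", "pandas"])
```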

`add_segment_group(self, name, segment_definitions)`

[Alpha] Create a segment group in which each segment is defined by an SQL expression.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the segment group.	required
`segment_definitions`	`Mapping[str, str]`	A dictionary mapping the name of each segment to an SQL expression that defines the segment. The supported expressions and special keywords are listed below.	required

Supported expressions:

`=` : filters for equality, ex: amount = 1000
`!=` : filters for inequality, ex: amount != 1000
`<` : filters for less-than, ex: amount < 1000
`<=` : filters for less-than-or-equal, ex: amount <= 1000
`>` : filters for greater-than, ex: amount > 1000
`>=` : filters for greater-than-or-equal, ex: amount >= 1000
`NOT` : filters records if the inner condition is not true, ex: NOT(amount >= 1000)
`AND` : filters records if both conditions are true, ex: (amount >= 1000) AND (state = 'WA')
`OR` : filters records if either of the two conditions is true, ex: (amount >= 1000) OR (state = 'WA')

String literals should be enclosed in single quotes (''); numeric literals should not be quoted. The left side of a binary expression should be either a column name (without quotes) or one of the special keywords listed below. The right side of an expression should be a literal. For example, (amount < salary) is not a valid expression, because both sides of the expression are column names.

Supported special keywords:

`_DATA_GROUND_TRUTH` : filters using the data label, ex: _DATA_GROUND_TRUTH == 1
`_MODEL_<score_type>` : filters using the <score_type> of the model, where <score_type> is "REGRESSION" for regression models or one of ["PROBITS", "LOGITS", "CLASSIFICATION"] for classification models, ex: _MODEL_REGRESSION >= 5
`_MODEL_<score_type>$<model_name>` : filters using the <score_type> of the model named <model_name>, where <score_type> is "REGRESSION" for regression models or one of ["PROBITS", "LOGITS", "CLASSIFICATION"] for classification models. For example, to filter by the "REGRESSION" score of "model_a": _MODEL_REGRESSION$model_a >= 5
`_RANKING_GROUP_ID` : filters using the ranking group id, ex: _RANKING_GROUP_ID = "group_id_1"

Examples:

>>> tru.set_project("Project Name")
>>> tru.set_data_collection("Data Collection Name")

# Add a segment group for Sex
>>> tru.add_segment_group(name = "Sex", segment_definitions = {"Male": "Sex == 'Male'", "Female": "Sex == 'Female'"})

# Add a segment group for Language at Home
>>> tru.add_segment_group("Language at Home", {"English": "LANX == 1", "Not English": "LANX == 2"})
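When a segment group has many segments, writing each expression by hand is error-prone. The sketch below builds `segment_definitions` dictionaries programmatically while following the quoting rules above (single-quoted strings, unquoted numbers, column name on the left, literal on the right). The helpers `sql_literal`, `equality_segments`, and `range_segments` are hypothetical and not part of the TruEra SDK; they use `==` for equality to match the examples above, and they refuse strings containing a single quote because the escaping rules are not documented here.

```python
from typing import Dict, Sequence, Union

LiteralValue = Union[int, float, str]


def sql_literal(value: LiteralValue) -> str:
    """Render a right-hand-side literal: strings in single quotes, numbers unquoted."""
    if isinstance(value, str):
        if "'" in value:
            # Escaping of embedded quotes is not documented, so refuse rather than guess.
            raise ValueError(f"cannot safely quote {value!r}")
        return f"'{value}'"
    return repr(value)


def equality_segments(column: str, values: Sequence[LiteralValue]) -> Dict[str, str]:
    """One segment per value, named after the value, e.g. {"Male": "Sex == 'Male'"}."""
    return {str(v): f"{column} == {sql_literal(v)}" for v in values}


def range_segments(column: str, edges: Sequence[float]) -> Dict[str, str]:
    """Bins below the first edge, between consecutive edges, and at/above the last edge."""
    if not edges:
        raise ValueError("edges must not be empty")
    defs = {f"{column} < {edges[0]}": f"{column} < {edges[0]}"}
    for lo, hi in zip(edges, edges[1:]):
        defs[f"{lo} <= {column} < {hi}"] = f"({column} >= {lo}) AND ({column} < {hi})"
    defs[f"{column} >= {edges[-1]}"] = f"{column} >= {edges[-1]}"
    return defs


# Hypothetical usage with the workspace from the examples above:
# tru.add_segment_group("Amount", range_segments("amount", [1000, 5000]))
```

The range bins are half-open, so every non-null value of the column falls into exactly one segment.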

`attach_packaged_python_model_object(self, model_object_dir, verify_model=True)` ¶

Attaches a pre-serialized and packaged executable model object to the current model, which must be virtual. This effectively "converts" the virtual model to a non-virtual one, as the system can now call the model to generate predictions.

Parameters:

Name	Type	Description	Default
`model_object_dir`	`str`	Directory where packaged model object is located.	required
`verify_model`	`bool`	Locally verify the model is packaged properly and can operate on existing split data. Defaults to True.	`True`

`attach_python_model_object(self, model_object, additional_pip_dependencies=None, verify_model=True)` ¶

Attaches the provided executable model object to the current model, which must be virtual. This effectively "converts" the virtual model to a non-virtual one, as the system can now call the model to generate predictions. Supported Model Frameworks: sklearn, xgboost, catboost, lightgbm, pyspark (tree models only). If you cannot ingest your model object via this function due to custom logic, feature transforms, etc., see attach_packaged_python_model_object().

[ALPHA] For frameworks that are not yet supported, or for custom model implementations, the prediction
function for the model can be provided as the model.
For binary classifiers, the prediction function should accept a pandas DataFrame as input and produce
a pandas DataFrame as output containing the class probabilities, with [0, 1] as the column header.
For regression models, the prediction function should accept a pandas DataFrame as input and produce
the result as a pandas DataFrame with "Result" as the column header.
All dependencies required to execute the prediction function should be provided in additional_pip_dependencies.
For example:
```python
import pandas as pd

# `my_model` is an already-trained xgboost binary classifier.
def predict(df):
    return pd.DataFrame(my_model.predict_proba(df, validate_features=False), columns=[0, 1])

tru.attach_python_model_object(predict, additional_pip_dependencies=["xgboost==1.3.1", "pandas==1.1.1"])
```

Parameters:

Name	Type	Description	Default
`model_object`	`Any`	The Python model object or the prediction function to attach. Supported frameworks are catboost, lightgbm, sklearn, xgboost, and tree-based PySpark models. For prediction functions, see the description above.	required
`additional_pip_dependencies`	`Optional[Sequence[str]]`	List of pip dependencies required to execute the model object. If the model object is from a supported framework, the pip dependency for that framework is automatically inferred. If a prediction function is provided as the model, additional pip dependencies are not automatically inferred and must be explicitly provided. Defaults to None. Example: ["pandas==1.1.1", "numpy==1.20.1"]	`None`
`verify_model`	`bool`	[Alpha] Locally verify the model is packaged properly and can operate on existing split data. Defaults to True.	`True`
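Because dependencies are not inferred for prediction functions, one way to fill `additional_pip_dependencies` is to pin the versions installed in the environment where the model was built. The helper below is a hypothetical sketch, not part of the SDK; it relies on `importlib.metadata.version` from the Python standard library (3.8+), which expects distribution names (for example `scikit-learn`, not `sklearn`).

```python
from importlib import metadata
from typing import Callable, List, Sequence


def pinned_requirements(
    packages: Sequence[str],
    version_of: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return ["pkg==X.Y.Z", ...] using the versions installed in this environment.

    `version_of` is injectable so the function can be exercised without the packages installed.
    Raises importlib.metadata.PackageNotFoundError for packages that are not installed.
    """
    return [f"{name}=={version_of(name)}" for name in packages]


# Hypothetical usage:
# tru.attach_python_model_object(predict, additional_pip_dependencies=pinned_requirements(["xgboost", "pandas"]))
```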

`cancel_scheduled_ingestion(self, workflow_id)` ¶

[Alpha] Cancel a scheduled ingestion.

Parameters:

Name	Type	Description	Default
`workflow_id`	`str`	The id of the scheduled ingestion workflow.	required

Returns:

Type	Description
`str`	Returns an object containing the canceled_on timestamp of a workflow.

`compute_all(self, computations=['predictions', 'feature influences', 'error influences'], models=None, data_splits=None, data_collection=None, num_influences=None)` `inherited` ¶

Executes and ingests all specified computations over models and data-splits in a data collection.

Parameters:

Name	Type	Description	Default
`computations`	`Optional[Sequence[str]]`	Types of computations. Defaults to ["predictions", "feature influences", "error influences"].	`['predictions', 'feature influences', 'error influences']`
`models`	`Optional[Sequence[str]]`	Models to compute with. All models in data collection if None. Defaults to None.	`None`
`data_splits`	`Optional[Sequence[str]]`	Data-splits to compute with. All data-splits in data collection if None. Defaults to None.	`None`
`data_collection`	`Optional[str]`	Data collection to compute over. Uses data collection in context if None. Defaults to None.	`None`
`num_influences`	`Optional[int]`	Number of influences to calculate for influence computations. Uses project settings if None. Defaults to None.	`None`

`compute_error_influences(self, start=0, stop=None, score_type=None, system_data=False, by_group=False, num_per_group=None, wait=True, ingest=True)` ¶

Compute the error QIIs/shapley-values associated with the current data-split.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`score_type`	`Optional[str]`	The score type to use when computing error influences. If None, infers error score type based on project configuration. Defaults to None.	`None`
`system_data`	`bool`	Include system data (e.g. timestamps) if available in response. Defaults to False.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`
`wait`	`bool`	Whether to wait for the job to finish. Defaults to True.	`True`
`ingest`	`bool`	Whether to ingest computed error influences. Defaults to True.	`True`

Returns:

Type	Description
`pd.DataFrame`	The QIIs/shapley-values associated with the current data-split.

`compute_feature_influences(self, start=0, stop=None, score_type=None, system_data=False, by_group=False, num_per_group=None, wait=True, ingest=True)` ¶

Compute the QIIs/shapley-values associated with the current data-split.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`score_type`	`Optional[str]`	The score type to use when computing influences. If None, uses default score type of project. Defaults to None. For a list of valid score types, see `list_valid_score_types`.	`None`
`system_data`	`bool`	Include system data (e.g. timestamps) if available in response. Defaults to False.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`
`wait`	`bool`	Whether to wait for the job to finish. Defaults to True.	`True`
`ingest`	`bool`	Whether to ingest computed feature influences. Defaults to True.	`True`

Returns:

Type	Description
`pd.DataFrame`	The QIIs/shapley-values associated with the current data-split.

`compute_predictions(self, start=0, stop=None, system_data=False, wait=True, ingest=True)` ¶

Compute predictions over the current data-split.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of rows.	`None`
`system_data`	`bool`	Include system data (e.g. timestamps) if available in response. Defaults to False.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`
`wait`	`bool`	Whether to wait for the job to finish. Defaults to True.	`True`
`ingest`	`bool`	Whether to ingest predictions. Defaults to True.	`True`

Returns:

Type	Description
	The predictions for the current data-split.

`create_packaged_python_model(self, output_dir, model_obj=None, additional_pip_dependencies=None, additional_modules=None, model_path=None, model_code_files=None, **kwargs)` ¶

Creates a template of a packaged Python model object to be edited and then uploaded to the TruEra server. Can be used to package either Python model objects or serialized model data. This workflow should only be used for custom models or to debug model ingestion. To upload a model of a known framework, the add_python_model() function is recommended.

Parameters:

Name	Type	Description	Default
`output_dir`	`str`	Path to the directory to create template. Cannot be an existing directory.	required
`model_obj`	`Optional[Any]`	The Python model object to package. Supported frameworks are catboost, lightgbm, sklearn, xgboost, and tree-based PySpark models.	`None`
`additional_pip_dependencies`	`Optional[Sequence[str]]`	List of pip dependencies required to execute the model. When model object is from a supported framework, pip dependency for that framework is automatically inferred. If a prediction function is provided as the model, additional pip dependencies are not automatically inferred and must be explicitly provided. Defaults to None. Example: ["pandas==1.1.1", "numpy==1.20.1"].	`None`
`additional_modules`	`Optional[Sequence[Any]]`	List of modules not available as pip packages required for the model. These must already be imported. Defaults to None.	`None`
`model_path`	`Optional[str]`	Path to a model file or directory. Can be a serialized model or a directory containing multiple files of serialized data. Ignored if `model_obj` is passed in.	`None`
`model_code_files`	`Optional[Sequence[str]]`	List of paths to additional files to be packaged with the model. Ignored if `model_obj` is passed in.	`None`

`deactivate_client_setting(self, setting_name)` `inherited` ¶

Deactivates a setting for client side behavior.

Parameters:

Name	Type	Description	Default
`setting_name`	`str`	Client setting to deactivate.	required

`delete_credential(self, name)` ¶

[Alpha] Removes a credential from the TruEra product.

Parameters:

Name	Type	Description	Default
`name`		Friendly name of the credential.	required

`delete_data_collection(self, data_collection_name=None, *, recursive=False)` ¶

Delete a data collection from the current TruEra workspace. This will only delete artifacts within the current location context (either local or remote). Note: a data collection can be deleted only after all of its data splits have been deleted, unless `recursive` is set to True.

Parameters:

Name	Type	Description	Default
`data_collection_name`	`Optional[str]`	Name of the data collection to be deleted. By default the currently set data collection will be deleted.	`None`
`recursive`	`bool`	Whether to also delete the data splits in the data collection. Defaults to False.	`False`

`delete_data_source(self, name)` ¶

Delete a data source that was already created in the system.

Parameters:

Name	Type	Description	Default
`name`	`str`	The friendly name of the data source.	required

`delete_data_split(self, data_split_name=None, *, recursive=False)` ¶

Delete a data split from the current TruEra workspace. This will only delete artifacts within the current location context (either local or remote).

Parameters:

Name	Type	Description	Default
`data_split_name`	`Optional[str]`	Name of the data split to be deleted. By default the currently set data split will be deleted.	`None`
`recursive`	`bool`	Whether to delete any model tests associated with the data split. Defaults to False.	`False`

`delete_model(self, model_name=None, *, recursive=False)` ¶

Delete a model from the current TruEra workspace. This will only delete artifacts within the current location context (either local or remote).

Parameters:

Name	Type	Description	Default
`model_name`	`Optional[str]`	Name of the model to be deleted. By default the currently set model will be deleted.	`None`
`recursive`	`bool`	Whether to delete any model tests associated with the model. Defaults to False.	`False`

`delete_model_metadata(self)` ¶

Unset train_split_name and train_parameters for the current model in context.

`delete_project(self, project_name=None)` ¶

Delete a project from the current TruEra workspace. This will only delete artifacts within the current location context (either local or remote).

Parameters:

Name	Type	Description	Default
`project_name`	`Optional[str]`	Name of the project to be deleted. By default the currently set project will be deleted.	`None`

`delete_segment_group(self, name)` ¶

[Alpha] Delete a segment group.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the segment group.	required

`get_client_setting_value(self, setting_name)` `inherited` ¶

Gets current value of a setting for client side behavior.

`get_credential_metadata(self, name)` ¶

[Alpha] Get metadata about a credential in the TruEra product. The secret itself is not returned.

Parameters:

Name	Type	Description	Default
`name`		Friendly name of the credential.	required

Returns:

Type	Description
`Credential`	Returns an object with the credential name and id.

`get_data_collections(self)` ¶

Get all data-collections in the connected project.

Exceptions:

Type	Description
`ValueError`	Raised if the workspace isn't connected to any project.

Returns:

Type	Description
`Sequence[str]`	Names of data-collections in the project.

`get_data_source(self, name)` ¶

Get a data source that was already created in the system.

Parameters:

Name	Type	Description	Default
`name`	`str`	The friendly name of the data source.	required

Exceptions:

Type	Description
`ValueError`	Raised if the current project is a local project.

Returns:

Type	Description
`Table`	Returns a Table object which allows interaction with the attached data.

`get_data_sources(self)` ¶

Get the list of data sources attached to the current project.

Exceptions:

Type	Description
`ValueError`	Raised if the current project is a local project.

`get_data_splits(self)` ¶

Get all data-splits in the connected data-collection.

Exceptions:

Type	Description
`ValueError`	Raised if the workspace isn't connected to any project.
`ValueError`	Raised if the workspace isn't connected to any data-collection.

Returns:

Type	Description
`Sequence[str]`	Names of data-splits in the data-collection.

`get_default_performance_metrics(self)` ¶

Get the default performance metrics of the currently set project.

Returns:

Type	Description
`Sequence[str]`	Default performance metrics.

`get_error_influences(self, start=0, stop=None, score_type=None, system_data=False, by_group=False, num_per_group=None)` ¶

Get the error QIIs/shapley-values associated with the current data-split. Note that if you set `start` and `stop`, the number of records returned will not necessarily match the number requested exactly, but will be approximately within the `start` and `stop` limits provided.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`score_type`	`Optional[str]`	The score type of error influences to retrieve. If None, infers error score type based on project configuration. Defaults to None.	`None`
`system_data`	`bool`	Include system data (e.g. timestamps) if available in response. Defaults to False.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`

Returns:

Type	Description
`pd.DataFrame`	The QIIs/shapley-values associated with the current data-split.

`get_explainer(self, base_data_split=None, comparison_data_splits=None)` ¶

Get the explainer associated with the TruEra workspace.

Parameters:

Name	Type	Description	Default
`base_data_split`	`Optional[str]`	The name of the data split to set as the base split for explainer operations.	`None`
`comparison_data_splits`	`Optional[Sequence[str]]`	The name(s) of the data splits to set as the comparison splits for explainer operations.	`None`

Returns:

Type	Description
`Explainer`	Explainer for current context of the TruEra workspace.

`get_feature_influences(self, start=0, stop=None, score_type=None, system_data=False, by_group=False, num_per_group=None)` ¶

Get the QIIs/shapley-values associated with the current data-split. Note that if you set `start` and `stop`, the number of records returned will not necessarily match the number requested exactly, but will be approximately within the `start` and `stop` limits provided.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the project setting for "Number of default influences".	`None`
`score_type`	`Optional[str]`	The score type to use when computing influences. If None, uses default score type of project. Defaults to None. For a list of valid score types, see `list_valid_score_types`.	`None`
`system_data`	`bool`	Include system data (e.g. timestamps) if available in response. Defaults to False.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`

Returns:

Type	Description
`pd.DataFrame`	The QIIs/shapley-values associated with the current data-split.

`get_feature_names(self)` `inherited` ¶

Get the feature names associated with the current data-collection.

Returns:

Type	Description
`Sequence[str]`	Feature names.

`get_influence_type(self)` ¶

Get the influence algorithm type of the currently set project.

`get_influences_background_data_split(self, data_collection_name=None)` ¶

Get the background data split used for computing feature influences.

Parameters:

Name	Type	Description	Default
`data_collection_name`	`Optional[str]`	Name of the data collection. Defaults to the current data collection in context.	`None`

Returns:

Type	Description
`str`	Name of the background data split.

`get_ingestion_client(self)` ¶

[Alpha] Get the data ingestion client associated with the TruEra workspace. Valid only for remote workspaces. The ingestion client can be used to pull data from different data sources into TruEra to perform analytics.

Exceptions:

Type	Description
`ValueError`	Raised if no project is associated with the current workspace. Use set_project to set the correct project.
`ValueError`	Raised if no data collection is associated with the current workspace. Use set_data_collection to set the correct data collection.
`ValueError`	Raised if the current project is a local project.

Returns:

Type	Description
`IngestionClient`	IngestionClient for current context of the TruEra workspace.

`get_maximum_model_runner_failure_rate(self)` ¶

Get the maximum model runner failure rate (fraction of points on which the model can fail for a model run to be considered successful) for the current project.
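Since the failure rate is a fraction of points, the success check reduces to a single comparison. The sketch below assumes the bound is inclusive (a failed fraction exactly equal to the maximum still counts as success), which this reference does not state; `model_run_succeeded` is a hypothetical helper. Under that assumption, a maximum rate of 0.05 on 1,000 points tolerates up to 50 failed points.

```python
def model_run_succeeded(num_failed: int, num_points: int, max_failure_rate: float) -> bool:
    """Return True if the failed fraction stays within the maximum (inclusive bound assumed)."""
    if num_points <= 0:
        raise ValueError("num_points must be positive")
    return num_failed / num_points <= max_failure_rate


# Hypothetical usage against the current project's setting:
# ok = model_run_succeeded(failed, total, tru.get_maximum_model_runner_failure_rate())
```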

`get_model_metadata(self)` ¶

Get model metadata for the current model in context.

`get_model_threshold(self)` ¶

Gets the model threshold for the currently set model and score type in the TruEra workspace.

Returns:

Type	Description
`Optional[float]`	The model threshold.
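For classification score types, a threshold is typically used to turn scores into class labels. The sketch below assumes that scores at or above the threshold are labeled positive; the tie rule TruEra applies is not documented in this section, and `apply_threshold` is a hypothetical helper. It also handles the `None` value that the `Optional[float]` return type allows.

```python
from typing import List, Optional, Sequence


def apply_threshold(scores: Sequence[float], threshold: Optional[float]) -> List[int]:
    """Label each score 1 if it is at or above the threshold, else 0 (tie rule assumed)."""
    if threshold is None:
        raise ValueError("no threshold is set for the current model and score type")
    return [1 if score >= threshold else 0 for score in scores]


# Hypothetical usage, with `scores` taken from the model's predictions:
# labels = apply_threshold(scores, tru.get_model_threshold())
```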

`get_models(self)` ¶

Get all models in the connected project.

Exceptions:

Type	Description
`ValueError`	Raised if the workspace isn't connected to any project.

Returns:

Type	Description
`Sequence[str]`	Names of models in the project.

`get_nn_user_configs(self)` ¶

Get NN user configurations for project and model set in the current context.

Examples:

>>> from truera.client.nn.client_configs import NLPAttributionConfiguration
>>> attr_config = NLPAttributionConfiguration(
...     token_embeddings_layer=token_embeddings_layer_name,
...     token_embeddings_anchor=token_embeddings_layer_tensor_anchor,
...     n_output_neurons=n_output_neurons,
...     n_metrics_records=n_metrics_records,
...     rebatch_size=rebatch_size)

# View the ingested config after add_nn_model in NN Ingestion...
>>> tru.get_nn_user_configs()

Returns:

Type	Description
`Union[AttributionConfiguration, RNNUserInterfaceConfiguration]`	NN user config.

`get_num_default_influences(self)` ¶

Get the number of influences computed by default for the currently set project.

Returns:

Type	Description
`int`	Number of default influences.

`get_num_internal_qii_samples(self)` ¶

Get the number of samples used internally in influence computations for the currently set project.

Returns:

Type	Description
`int`	Number of samples to be used internally for influence computations.

`get_predictions(self, start=0, stop=None, system_data=False, by_group=False, num_per_group=None)` ¶

Get the model predictions associated with the current data-split. Note that if you set `start` and `stop`, the number of records returned will not necessarily match the number requested exactly, but will be approximately within the `start` and `stop` limits provided.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`system_data`	`bool`	Include system data (e.g. timestamps) if available in response. Defaults to False.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`

Returns:

Type	Description
`pd.DataFrame`	The model predictions associated with the current data-split.

`get_projects(self)` ¶

Get all projects accessible by current user for the current workspace environment.

`get_ranking_k(self)` ¶

Gets the ranking k for the current project.

Returns:

Type	Description
`int`	Ranking k.

`get_scheduled_ingestion(self, workflow_id)` ¶

[Alpha] Get metadata about a scheduled ingestion, given its workflow_id.

Parameters:

Name	Type	Description	Default
`workflow_id`	`str`	The id of the scheduled ingestion workflow.	required

Returns:

Type	Description
`GetScheduleResponse`	Returns an object containing the request_template, schedule, and run_results of a workflow.

`get_segment_groups(self)` ¶

[Alpha] Get all segment groups associated with the current TruEra workspace, along with their respective segments.

Returns:

Type	Description
`Mapping[str, Mapping[str, str]]`	Mapping of segment group names to the corresponding segment group definition.

`get_xs(self, start=0, stop=None, extra_data=False, system_data=False, by_group=False, num_per_group=None)` ¶

Get the inputs/data/x-values associated with the current data-split. Note that if you set `start` and `stop`, the number of records returned will not necessarily match the number requested exactly, but will be approximately within the `start` and `stop` limits provided.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`extra_data`	`bool`	Include extra data columns in the response. Defaults to False.	`False`
`system_data`	`bool`	Include system data (e.g. timestamps) if available in response. Defaults to False.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`

Returns:

Type	Description
`pd.DataFrame`	The inputs/data/x-values associated with the current data-split.
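Because `start` is inclusive and `stop` is exclusive, consecutive `(start, stop)` pairs tile a split without gaps, in the same way as Python slices. The hypothetical helper below generates such pairs for fetching a large split in chunks. Since the number of records returned for a range is only approximate, you may want to de-duplicate the combined result on its index rather than assume the chunks are disjoint.

```python
from typing import List, Tuple


def index_ranges(total: int, chunk_size: int, start: int = 0) -> List[Tuple[int, int]]:
    """Split [start, total) into consecutive (start, stop) pairs; each stop is exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(lo, min(lo + chunk_size, total)) for lo in range(start, total, chunk_size)]


# Hypothetical usage: `total_rows` is a row count you already know for the split.
# chunks = [tru.get_xs(start=lo, stop=hi) for lo, hi in index_ranges(total_rows, 1000)]
```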

`get_ys(self, start=0, stop=None, system_data=False, by_group=False, num_per_group=None)` ¶

Get the targets/y-values associated with the current data-split. Note that if you set `start` and `stop`, the number of records returned will not necessarily match the number requested exactly, but will be approximately within the `start` and `stop` limits provided.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`system_data`	`bool`	Include system data (e.g. timestamps) if available in response. Defaults to False.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`

Returns:

Type	Description
`pd.DataFrame`	The targets/y-values associated with the current data-split.

`get_ys_pred(self, start=0, stop=None, system_data=False, by_group=False, num_per_group=None, wait=True)` ¶

Get the model predictions associated with the current data-split. Note that if you set `start` and `stop`, the number of records returned will not necessarily match the number requested exactly, but will be approximately within the `start` and `stop` limits provided.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`system_data`	`bool`	Include system data (e.g. timestamps) if available in response. Defaults to False.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`
`wait`	`bool`	Whether to wait for the job to finish. Defaults to True.	`True`

Returns:

Type	Description
`pd.DataFrame`	The model predictions associated with the current data-split.

`list_monitoring_tables(self)` ¶

Lists the monitoring tables relevant to the current project.

Returns:

Type	Description
`str`	Returns a JSON string of objects describing the monitoring tables for the project.

`list_performance_metrics(self)` ¶

Lists the available metrics that can be used to compute performance or set as the project default.

Returns:

Type	Description
`Sequence[str]`	Available metrics.

`list_scheduled_ingestions(self, last_key=None, limit=50)` ¶

[Alpha] List scheduled ingestion workflows.

Parameters:

Name	Type	Description	Default
`last_key`	`str`	The workflow id after which to start fetching workflows. Defaults to None.	`None`
`limit`	`int`	The number of workflows to fetch. Defaults to 50.	`50`

Returns:

Type	Description
`str`	Returns a list of objects containing the workflow_id and active state of all workflows.

`list_valid_score_types(self)` `inherited` ¶

List the valid score types that can be set for the currently set project.

Returns:

Type	Description
`Sequence[str]`	Valid score types.

`register_schema(self, schema)` ¶

Register a schema for streaming.

Exceptions:

Type	Description
`ValueError`	Raised on validation error.

`schedule_existing_data_split(self, split_name, cron_schedule, override_split_name=None, append=True)` ¶

[Alpha] Create a new scheduled ingestion based on an existing split.

Parameters:

Name	Type	Description	Default
`split_name`	`str`	The name of an already materialized split.	required
`cron_schedule`	`str`	The schedule for the periodic ingestion, in Unix cron format with five space-separated fields: minute (0-59), hour (0-23), day of the month (1-31), month (1-12), and day of the week (0-6, Sunday to Saturday; 7 is also Sunday on some systems). For example, to run at midnight on the first of every month: "0 0 1 * *"	required
`override_split_name`	`str`	The name of the new splits to be created. For information on templating, see the schedule_ingestion function.	`None`

Returns:

Type	Description
`str`	A workflow_id for looking up the workflow.
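A malformed `cron_schedule` is easy to miss until the first scheduled run. The sketch below is a local pre-check of the plain five-field format described above (minute, hour, day of month, month, day of week); it is not TruEra's parser, and the scheduler may accept syntax this check rejects (such as month or weekday names) or vice versa. `validate_cron` is a hypothetical helper.

```python
from typing import Tuple

# (field name, minimum, maximum) for the five cron fields described above.
CRON_FIELDS: Tuple[Tuple[str, int, int], ...] = (
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 7),  # 7 is also Sunday on some systems
)


def _check_number(text: str, name: str, lo: int, hi: int) -> int:
    if not text.isdigit() or not lo <= int(text) <= hi:
        raise ValueError(f"invalid {name} value {text!r}; expected {lo}-{hi}")
    return int(text)


def validate_cron(expr: str) -> None:
    """Raise ValueError unless expr is a plain five-field cron expression.

    Accepts '*', numbers, ranges 'a-b', lists 'a,b' and steps such as '*/15' or '1-10/2'.
    Month/weekday names and macros such as '@daily' are intentionally rejected.
    """
    fields = expr.split()
    if len(fields) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    for text, (name, lo, hi) in zip(fields, CRON_FIELDS):
        for part in text.split(","):
            base, slash, step = part.partition("/")
            if slash and (not step.isdigit() or int(step) == 0):
                raise ValueError(f"invalid step in {name}: {part!r}")
            if base == "*":
                continue
            first, dash, last = base.partition("-")
            start = _check_number(first, name, lo, hi)
            if dash and _check_number(last, name, lo, hi) < start:
                raise ValueError(f"reversed range in {name}: {part!r}")
```

Calling `validate_cron("0 0 1 * *")` before `schedule_existing_data_split` returns silently; an out-of-range value such as `"60 0 1 * *"` raises `ValueError`.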

`schedule_ingestion(self, raw_json, cron_schedule)` ¶

[Alpha] Create a new scheduled ingestion based on a JSON request tree.

Templating

Templating is supported for URIs, split names, and filter expressions for scheduled ingestion. When evaluating a supported field, the scheduler provides several variables and functions, which can be accessed using the syntax `${...}`.

For example, to add the run date to the split name, add the following suffix when submitting a split name to scheduled ingestion: "split_name_${formatDate("yyyy-MM-dd", now)}"

Supported variables:
- now: The Date of the run.
- last: The Date of the last run.
- lastSuccess: The Date of the last successful run.

Supported functions:
- uuid(): Generate a random v4 UUID.
- formatDate(format, date): Format the given Date as a string. The format follows Java SimpleDateFormat (https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html).
- epochMillis(date): The epoch milliseconds of the given Date.
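To check what a templated split name will look like before scheduling it, the placeholders can be approximated locally. The sketch below is hypothetical and covers only a subset: `uuid()`, `formatDate(format, now)` with the SimpleDateFormat letters `yyyy`, `MM`, `dd`, `HH`, `mm`, and `ss`, and `epochMillis(now)`. The scheduler evaluates templates server-side, so treat the output as a preview rather than a guarantee.

```python
import re
import uuid
from datetime import datetime, timezone
from typing import Optional

# Partial, assumed mapping from Java SimpleDateFormat letters to strftime codes.
_JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}
_JAVA_TOKEN = re.compile("|".join(sorted(_JAVA_TO_STRFTIME, key=len, reverse=True)))
_PLACEHOLDER = re.compile(r"\$\{(.*?)\}")
_FORMAT_DATE = re.compile(r'formatDate\(\s*"([^"]*)"\s*,\s*now\s*\)')


def preview_template(template: str, now: Optional[datetime] = None) -> str:
    """Expand ${...} placeholders that use uuid(), formatDate(..., now) or epochMillis(now)."""
    when = now if now is not None else datetime.now(timezone.utc)

    def expand(match: "re.Match[str]") -> str:
        expr = match.group(1).strip()
        fmt = _FORMAT_DATE.fullmatch(expr)
        if fmt:
            # Translate the Java pattern into a strftime pattern, then format the run date.
            return when.strftime(_JAVA_TOKEN.sub(lambda m: _JAVA_TO_STRFTIME[m.group(0)], fmt.group(1)))
        if expr == "uuid()":
            return str(uuid.uuid4())
        if expr == "epochMillis(now)":
            return str(int(when.timestamp() * 1000))
        raise ValueError(f"placeholder not covered by this sketch: {expr}")

    return _PLACEHOLDER.sub(expand, template)
```

For example, `preview_template('split_name_${formatDate("yyyy-MM-dd", now)}', datetime(2023, 5, 1, tzinfo=timezone.utc))` gives `split_name_2023-05-01`. Pass a timezone-aware `now`, because `epochMillis` computed from a naive datetime depends on the local timezone.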

Parameters:

Name	Type	Description	Default
`raw_json`	`str`	The JSON string representation of the request tree. To build this JSON string, use the serialize_split function.	required
`cron_schedule`	`str`	The schedule for the periodic ingestion, in Unix cron format with five space-separated fields: minute (0-59), hour (0-23), day of the month (1-31), month (1-12), and day of the week (0-6, Sunday to Saturday; 7 is also Sunday on some systems). For example, to run at midnight on the first of every month: "0 0 1 * *"	required

Returns:

Type	Description
`str`	A workflow_id for looking up the workflow.

`serialize_split(self, split_name, override_split_name=None)` ¶

[Alpha] Build a request tree represented as a JSON string.

Templating

Templating is supported for uris, split names, and filter expressions for scheduled ingestion. The scheduler passes in several variables and functions when evaluating the supported field, which can be accessed by using the syntax: ${}.

For example, to add the run date to the split name you can add the following suffix when submitting a split name to scheduled ingestion: "split_name_${formatDate("yyyy-MM-dd", now)}"

Supported variables:
- now: The Date of the run.
- last: The Date of the last run.
- lastSuccess: The Date of the last successful run.

Supported functions:
- uuid(): Generate a random v4 UUID.
- formatDate(format, date): Format the given Date as a string. The format follows Java SimpleDateFormat (https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html).
- epochMillis(date): The epoch milliseconds of the given Date.

Parameters:

Name	Type	Description	Default
`split_name`	`str`	The name of an already materialized split.	required
`override_split_name`	`str`	The optional name to replace the original split name. For information on templating, see the schedule_ingestion function.	`None`

Returns:

Type	Description
`str`	A JSON string representing the request tree that can be used for scheduled ingestion.

`set_as_protected_segment(self, segment_group_name, segment_name)` ¶

Sets the provided segment as a "protected" segment. This enables fairness analysis for this segment.

Parameters:

Name	Type	Description	Default
`segment_group_name`	`str`	Name of segment group.	required
`segment_name`	`str`	Name of segment in provided segment group.	required

Examples:

>>> tru.set_project("Project Name")
>>> tru.set_data_collection("Data Collection Name")

# Add a segment group for Sex
>>> tru.add_segment_group(name = "Sex", segment_definitions = {"Male": "Sex == 'Male'", "Female": "Sex == 'Female'"})

# Set the Female Segment as a Protected Segment
>>> tru.set_as_protected_segment(segment_group_name = "Sex", segment_name = "Female")
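For simple equality comparisons like the segment definitions in the example above, pandas' `DataFrame.query` syntax reads the same way, so a local preview can catch a misspelled column name or value before calling `add_segment_group`. This is a minimal sketch on hypothetical data; TruEra's filter grammar is not necessarily identical to pandas', so a clean local result does not guarantee the server accepts the expression.

```python
import pandas as pd

# Hypothetical sample of the split's data.
df = pd.DataFrame({"Sex": ["Male", "Female", "Female", "Male"], "Age": [34, 29, 51, 42]})

# The same definitions passed to add_segment_group in the example above.
segment_definitions = {"Male": "Sex == 'Male'", "Female": "Sex == 'Female'"}

# Evaluate each expression with pandas' query syntax and count the matching rows.
segment_sizes = {name: len(df.query(expr)) for name, expr in segment_definitions.items()}
print(segment_sizes)  # {'Male': 2, 'Female': 2}
```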

`set_data_collection(self, data_collection_name)` ¶

Set the current data collection to use for all operations in the workspace. This will also unset the current model if it is not associated with the provided data collection.

Parameters:

Name	Type	Description	Default
`data_collection_name`	`str`	Name of the data_collection. If None, will unset the data collection.	required

Exceptions:

Type	Description
`ValueError`	Raised if no project is associated with the current workspace. Use set_project to set the correct project.
`ValueError`	Raised if there is no such data_collection in the project.

`set_data_split(self, data_split_name)` ¶

Set the current data split to use for all operations in the current workspace.

Parameters:

Name	Type	Description	Default
`data_split_name`	`str`	Name of the data_split. If None, will unset the data split.	required

Exceptions:

Type	Description
`ValueError`	Raised if no project is associated with the current workspace. Use set_project to set the correct project.
`ValueError`	Raised if no data_collection is associated with the current workspace. Use set_data_collection to set the correct data_collection.
`ValueError`	Raised if there is no such data_split in the data_collection.

`set_default_performance_metrics(self, performance_metrics)` ¶

Set the default performance metrics of the currently set project.

Parameters:

Name	Type	Description	Default
`performance_metrics`	`Sequence[str]`	Performance metrics to use by default.	required

`set_influence_type(self, algorithm)` ¶

Set the influence algorithm type of the currently set project.

Parameters:

Name	Type	Description	Default
`algorithm`	`str`	Algorithm type. Options are ["truera-qii", "shap"]. Defaults to "truera-qii" if available, and otherwise "shap".	required

`set_influences_background_data_split(self, data_split_name, data_collection_name=None)` ¶

Set the background data split used for computing feature influences.

Parameters:

Name	Type	Description	Default
`data_split_name`	`str`	Name of the data split.	required
`data_collection_name`	`Optional[str]`	(Optional) Name of the data collection. Defaults to the current data collection in context.	`None`

`set_maximum_model_runner_failure_rate(self, maximum_model_runner_failure_rate)` ¶

Sets the maximum model runner failure rate (fraction of points on which the model can fail for a model run to be considered successful) for the current project.

Parameters:

Name	Type	Description	Default
`maximum_model_runner_failure_rate`	`float`	Maximum failure rate. Must be in [0, 1). By default, it is set to 0.	required
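The failure-rate rule above can be stated as a one-line comparison. The sketch below is illustrative only: `model_run_succeeded` is a hypothetical helper, and treating a failure fraction exactly equal to the limit as a success (`<=`) is an assumption, chosen so that the default of 0 means "no failures allowed".

```python
def model_run_succeeded(num_failed, num_points, max_failure_rate):
    """Illustrate the documented rule: a run succeeds if the failed fraction stays within the limit."""
    if not 0 <= max_failure_rate < 1:
        raise ValueError("max_failure_rate must be in [0, 1)")
    if num_points <= 0:
        raise ValueError("num_points must be positive")
    # Assumed inclusive boundary: failing on exactly max_failure_rate of the points still succeeds.
    return num_failed / num_points <= max_failure_rate


print(model_run_succeeded(0, 100, 0.0))   # True: default rate, no failures
print(model_run_succeeded(1, 100, 0.0))   # False: any failure exceeds a rate of 0
print(model_run_succeeded(5, 100, 0.05))  # True: 5% failed, limit is 5%
```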

`set_model(self, model_name)` ¶

Set the current model to use for all operations in the current workspace. If the model belongs to a different data collection than the one currently set, the current data collection is also changed to the model's data collection.

Parameters:

Name	Type	Description	Default
`model_name`	`str`	Name of the model. If None, will unset the model.	required

Exceptions:

Type	Description
`ValueError`	Raised if no project is associated with the current workspace. Use set_project to set the correct project.
`ValueError`	Raised if there is no such model in the project.

`set_model_execution(self, environment)` ¶

Set the environment (either local or remote) to execute models in.

Parameters:

Name	Type	Description	Default
`environment`	`str`	Environment to execute models in. Either "local" or "remote".	required

`set_num_default_influences(self, num_default_influences)` ¶

Set the number of influences computed by default for the currently set project.

Parameters:

Name	Type	Description	Default
`num_default_influences`	`int`	Number of influences used by default for most influence-requiring graphs, computations, etc. Influences are computed on the first `num_default_influences` points of the data split, so shuffling data splits before ingestion is generally advised.	required
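Because default influences come from the first rows of a split, a split ordered by time or by label would yield a biased sample. A minimal pandas sketch of shuffling a DataFrame before ingesting it; the column names and the window size of 4 are hypothetical.

```python
import pandas as pd

# Hypothetical split sorted by date, so its first rows all come from the same period.
split = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=10, freq="D"),
    "feature": range(10),
})

num_default_influences = 4

# Shuffle all rows with a fixed seed (for reproducibility) and renumber the index.
shuffled = split.sample(frac=1, random_state=0).reset_index(drop=True)

# These rows would fall inside the default influence window after ingestion.
print(shuffled.head(num_default_influences))
```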

`set_num_internal_qii_samples(self, num_samples)` ¶

Set the number of samples used internally in influence computations for the currently set project.

Parameters:

Name	Type	Description	Default
`num_samples`	`int`	Number of samples to be used internally for influence computations.	required

`set_project(self, project)` ¶

Set the current project to use for the current workspace environment. This will unset the rest of the context (data collection, data split, model, etc.) if set prior.

Parameters:

Name	Type	Description	Default
`project`	`str`	Name of the project.	required

Exceptions:

Type	Description
`ValueError`	Raised if the project does not exist.

`set_ranking_k(self, ranking_k)` ¶

Sets the ranking k for the current project.

Parameters:

Name	Type	Description	Default
`ranking_k`	`int`	The ranking k. Must be >= 0.	required

`set_score_type(self, score_type)` ¶

Set the score type of the currently set project.

Parameters:

Name	Type	Description	Default
`score_type`	`str`	Scorer type configuration. Options are ["logits", "probits", "classification", "regression", None]. Defaults to None.	required

`update_model_threshold(self, classification_threshold)` ¶

Update the classification threshold for the model associated with the TruEra workspace. A model score (probits, logits) that is greater than or equal to the threshold is assigned a positive classification outcome.

Parameters:

Name	Type	Description	Default
`classification_threshold`	`float`	New threshold to update. Ignored for regression models.	required
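The thresholding rule above (score >= threshold means a positive outcome) is sketched below. The helper names are hypothetical. The second half assumes the model's probits are the sigmoid of its logits, as is standard for binary classifiers; under that assumption, a probit threshold `p` corresponds to the logit threshold `log(p / (1 - p))`, which matters when choosing a threshold for one score type or the other.

```python
import math


def classify(scores, threshold):
    """Positive (1) when score >= threshold, as documented for update_model_threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def probit_threshold_to_logit(p):
    """Logit threshold equivalent to probability threshold p, assuming a sigmoid link."""
    if not 0 < p < 1:
        raise ValueError("p must be in (0, 1)")
    return math.log(p / (1 - p))


probits = [0.2, 0.5, 0.7]
print(classify(probits, 0.5))  # [0, 1, 1]

# With a sigmoid link, probit >= 0.5 is the same decision as logit >= 0.0.
logits = [math.log(p / (1 - p)) for p in probits]
print(classify(logits, probit_threshold_to_logit(0.5)))  # [0, 1, 1]
```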

`update_nn_user_config(self, config)` ¶

Update NN user configurations for the project and model set in the current context.

Examples:

# If you need to make changes to the attributions config after you have run add_nn_model in NN Ingestion...
>>> from truera.client.nn.client_configs import NLPAttributionConfiguration
>>> updated_attr_config = NLPAttributionConfiguration(
...     token_embeddings_layer=token_embeddings_layer_name,
...     token_embeddings_anchor=token_embeddings_layer_tensor_anchor,
...     n_output_neurons=n_output_neurons,
...     n_metrics_records=n_metrics_records,
...     rebatch_size=rebatch_size,
... )

>>> tru.update_nn_user_config(updated_attr_config)
>>> tru.get_nn_user_configs() # Will return the updated attr_config

Parameters:

Name	Type	Description	Default
`config`	`Union[AttributionConfiguration, RNNUserInterfaceConfiguration]`	Config to set.	required

`verify_nn_wrappers(self, *, clf, attr_config=None, truera_wrappers)` ¶

Validates that all wrappers and methods are well formed.

Examples:

# During NN Ingestion you will create two objects
>>> from truera.client.nn.client_configs import NLPAttributionConfiguration
>>> attr_config = NLPAttributionConfiguration(...)

>>> from truera.client.nn.wrappers.autowrap import autowrap
>>> truera_wrappers = autowrap(...) # Use the appropriate NN Diagnostics Ingestion to create this

# Check if ingestion is set up correctly
>>> tru.verify_nn_wrappers(
...     clf=model,
...     attr_config=attr_config,
...     truera_wrappers=truera_wrappers
... )

Parameters:

Name	Type	Description	Default
`clf`	`NNBackend.Model`	The model object.	required
`truera_wrappers`	`Optional[base.WrapperCollection]`	A collection of wrappers	required

`verify_packaged_model(self, model_path)` ¶

Locally verifies a packaged Python model by loading the model and, if available, running it on split data ingested into the TruEra system. The model must already be packaged, e.g. via create_packaged_python_model(). The project and data collection for the model must also be set in the current workspace context. This function assumes that it is running in an environment with all model dependencies installed.

Parameters:

Name	Type	Description	Default
`model_path`	`str`	Path to packaged model directory.	required

`BasicAuthentication (TrueraAuthentication)` ¶

Basic authentication for BaseTrueraWorkspace.

`__init__(self, username, password)` `special` ¶

Construct BasicAuth for BaseTrueraWorkspace.

Parameters:

Name	Type	Description	Default
`username`	`str`	Username for the basic auth credentials.	required
`password`	`str`	Password for the basic auth credentials.	required

Note: Most users will use TokenAuthentication rather than BasicAuthentication.

Examples:

# Import BasicAuthentication and TrueraWorkspace
>>> from truera.client.truera_authentication import BasicAuthentication
>>> from truera.client.truera_workspace import TrueraWorkspace

# Create authentication object
>>> auth = BasicAuthentication(username="My Username", password="My Password")

# Create TruEra Workspace
>>> tru = TrueraWorkspace(connection_string="https://myconnectionstring", authentication=auth)

`ServiceAccountAuthentication (TrueraAuthentication)` ¶

Service Account Authentication for BaseTrueraWorkspace.

`__init__(self, client_id, client_secret, *, token_endpoint=None, additional_payload=None, verify_cert=True)` `special` ¶

Construct ServiceAccountAuth for BaseTrueraWorkspace.

Parameters:

Name	Type	Description	Default
`client_id`	`str`	Client ID from service account credentials.	required
`client_secret`	`str`	Client secret from service account credentials.	required
`token_endpoint`	`Optional[str]`	Optional override of the endpoint to retrieve token from.	`None`
`additional_payload`	`Optional[Mapping[str, str]]`	Optional payload to include in request to retrieve token.	`None`

`set_token_endpoint(self, endpoint, append_path=True, overwrite=False)` ¶

Set token endpoint. Appends '/oauth/token' to endpoint if append_path is True. Does nothing if token_endpoint is already defined unless overwrite is True.
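The interaction between `append_path` and `overwrite` described above is easiest to see in code. This sketch uses a hypothetical `EndpointConfig` class, not the SDK's implementation; it treats "already defined" as "not None" and appends the path by plain string concatenation, both of which are assumptions.

```python
from typing import Optional


class EndpointConfig:
    """Hypothetical stand-in that mirrors the documented set_token_endpoint rules."""

    def __init__(self, token_endpoint: Optional[str] = None):
        self.token_endpoint = token_endpoint

    def set_token_endpoint(self, endpoint: str, append_path: bool = True, overwrite: bool = False) -> None:
        if self.token_endpoint is not None and not overwrite:
            return  # already defined and overwrite is False: keep the existing value
        if append_path:
            endpoint = endpoint + "/oauth/token"
        self.token_endpoint = endpoint


config = EndpointConfig()
config.set_token_endpoint("https://example.com")
config.set_token_endpoint("https://other.example.com")  # ignored: already defined
print(config.token_endpoint)  # https://example.com/oauth/token
```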

`Token` ¶

`__str__(self)` `special` ¶

Support implicit usage of this class in str(Token) or f"{Token}".
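Python's `str()` and f-string interpolation both end up calling a class's `__str__` method, which is what lets a token object be used directly where a string is expected. A minimal illustration with a hypothetical `DemoToken` class (not the SDK's `Token`):

```python
class DemoToken:
    """Hypothetical token wrapper showing how __str__ enables str(token) and f"{token}"."""

    def __init__(self, value: str):
        self._value = value

    def __str__(self) -> str:
        # str(token) calls this directly; f"{token}" calls format(token, ""),
        # which falls back to str(token) for classes without a custom __format__.
        return self._value


token = DemoToken("abc123")
print(f"{token}")  # abc123
```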

`TokenAuthentication (TrueraAuthentication)` ¶

Token authentication for BaseTrueraWorkspace.

`__init__(self, token)` `special` ¶

Construct TokenAuth for BaseTrueraWorkspace.

Parameters:

Name	Type	Description	Default
`token`	`str`	Token containing credentials to authenticate with the services.	required

Note: Most users will use TokenAuthentication rather than BasicAuthentication.

Examples:

# Import TokenAuthentication and TrueraWorkspace
>>> from truera.client.truera_authentication import TokenAuthentication
>>> from truera.client.truera_workspace import TrueraWorkspace

# Create authentication object
>>> auth = TokenAuthentication(token="My Token From the TruEra Web App")

# Create TruEra Workspace
>>> tru = TrueraWorkspace(connection_string="https://myconnectionstring", authentication=auth)

`Explainer (ABC)` ¶

`clear_segment(self)` ¶

Clears any set segments from all explainer operations.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> explainer = tru.get_explainer("split1")
>>> explainer.set_segment("segment_group1", "segment1_in_segment_group1")

# This will return only the xs in "split1" filtered to the segment "segment1_in_segment_group1".
>>> explainer.get_xs()

>>> explainer.clear_segment()

# This will return all the xs in "split1".
>>> explainer.get_xs()

`compute_feature_influences(self, start=0, stop=None, score_type=None, system_data=False, by_group=False, num_per_group=None, wait=True)` ¶

Compute the QIIs/shapley-values for this explainer's currently set data split.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the project setting for "Number of default influences".	`None`
`score_type`	`Optional[str]`	The score type to use when computing influences. If None, defaults to score type of project. Defaults to None. For a list of valid score types, see `tru.list_valid_score_types`.	`None`
`system_data`	`bool`	Include system data (e.g. timestamps) if available in response. Defaults to False.	`False`
`wait`	`bool`	Whether to wait for the job to finish. Defaults to True.	`True`

Returns:

Type	Description
`pd.DataFrame`	The QIIs/shapley-values.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> tru.set_model("model1")
>>> explainer = tru.get_explainer("split1")

# This will return "model1"'s feature influences on "split1".
>>> explainer.compute_feature_influences()

`compute_feature_influences_for_data(self, pre_data, post_data=None, ys=None, score_type=None, comparison_post_data=None, num_internal_qii_samples=1000, algorithm='truera-qii')` ¶

Compute the QIIs/shapley-values for the provided data.

Parameters:

Name	Type	Description	Default
`pre_data`	`pd.DataFrame`	A pandas DataFrame containing the human-readable data for which to compute influences. If `post_data` is not specified, `pre_data` is assumed to be both human- and model-readable.	required
`post_data`	`Optional[pd.DataFrame]`	A pandas DataFrame containing the model-readable post-processed data that is aligned with the pre-processed data. Can be ignored if model-readable pre-processed data is provided. If providing different pre- and post-processed data, be sure the mapping between them adheres to the feature map of the data collection specified during the data collection's creation.	`None`
`ys`	`Optional[Union[np.ndarray, pd.Series]]`	Labels for which to compute influences if required by the provided `score_type`. Defaults to None.	`None`
`score_type`	`Optional[str]`	The score type to use when computing influences. If None, defaults to score type of project. Defaults to None. For a list of valid score types, see `tru.list_valid_score_types`.	`None`
`comparison_post_data`	`Optional[pd.DataFrame]`	The comparison data to use when computing influences. If None, defaults to a data split of type "all" or "train" in the data collection, falling back to the base split currently set in this explainer. Defaults to None.	`None`
`num_internal_qii_samples`	`int`	Number of samples used internally in influence computations.	`1000`
`algorithm`	`str`	Algorithm to use during computation. Must be one of ["truera-qii", "tree-shap-tree-path-dependent", "tree-shap-interventional", "kernel-shap"]. Defaults to "truera-qii".	`'truera-qii'`

Returns:

Type	Description
`pd.DataFrame`	The QIIs/shapley-values.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> tru.set_model("model1")
>>> explainer = tru.get_explainer("split1")
>>> pre_data = ...
>>> post_data = ...

# This will compute feature influences on `pre_data`/`post_data`. Note that the feature mapping from
# `pre_data` to `post_data` must be the same as that expected by "model1".
>>> explainer.compute_feature_influences_for_data(
...     pre_data=pre_data,
...     post_data=post_data,
...     score_type="probits",
...     comparison_post_data=post_data
... )
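The returned values are described as QIIs/Shapley values, in which features absent from a coalition are filled in from comparison data. To show what such numbers satisfy, the sketch below computes exact Shapley values for a hypothetical three-feature model against a single baseline point by enumerating every coalition. It is exponential in the number of features and is not how TruEra computes influences; it only illustrates the definition, including the property that the values sum to f(x) - f(baseline).

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(point)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(z):
    # Hypothetical toy model with an interaction between features 0 and 2.
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


x, baseline = [1.0, 1.0, 2.0], [0.0, 0.0, 0.0]
phi = exact_shapley(model, x, baseline)
print([round(v, 6) for v in phi])  # [3.0, 3.0, 1.0]; sums to model(x) - model(baseline) = 7
```

The interaction term `z[0] * z[2]` contributes 2 at `x` and is split evenly between features 0 and 2, which is why feature 2 receives 1 even though it has no standalone term.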

`compute_performance(self)` ¶

Computes performance metrics from labels and predictions. To see the list of available metrics, use list_performance_metrics.

Examples:

# Set your project, data collection and model
>>> tru.set_project("Project Name")
>>> tru.set_data_collection("data collection name")
>>> tru.set_model("model v1")

# Get the explainer for the base split
>>> explainer = tru.get_explainer("train split name")

# Compute performance for the base split using the explainer object
>>> explainer.compute_performance(metric_type = "MAE", plot = False)

`get_base_data_split(self)` ¶

Get the base data split used by explainer.

Returns:

Type	Description
`str`	The name of the base data split.

`get_comparison_data_splits(self)` ¶

Gets the comparison data splits used by the explainer.

Returns:

Type	Description
`Sequence[str]`	The names of the comparison data splits.

`get_data_collection(self)` ¶

Get the data collection name used by explainer.

`get_feature_names(self)` ¶

Get the feature names associated with the current data-collection.

Returns:

Type	Description
`Sequence[str]`	Feature names.

`get_spline_fitter(self, start=0, stop=None)` ¶

Get the spline-fitter using the provided range of points to fit splines.

Parameters:

Name	Type	Description	Default
`start`	`int`	The lower bound (inclusive) of the index of points to use during spline fitting. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to use during spline fitting. Defaults to the project's "Number of default influences" setting.	`None`

Returns:

Type	Description
`SplineFitter`	Spline-fitter.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> tru.set_model("model1")
>>> explainer = tru.get_explainer("split1")

# Construct a spline fitter on the default range of points and use it to construct a GAM.
>>> sf = explainer.get_spline_fitter()
>>> gam = sf.construct_gam(
...     n_splines=10,
...     spline_orders=3,
...     constraints={"feature2": "monotonic_inc", "feature5": "monotonic_dec"}
... )

# Add the GAM model to TruEra.
>>> tru.add_python_model("GAM from model1", gam)

`get_xs(self, start=0, stop=None, extra_data=False, system_data=False, by_group=False, num_per_group=None)` ¶

Get the inputs/data/x-values.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`extra_data`	`bool`	Include extra data columns in the response.	`False`
`system_data`	`bool`	Include system data columns (unique ID) in the response.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`

Returns:

Type	Description
`pd.DataFrame`	The inputs/data/x-values.
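The `start`/`stop` rules above amount to a half-open slice whose default end depends on the project type. A minimal pandas sketch of that rule; `select_points` and its `num_default_influences` default of 1000 are hypothetical, and for remote projects the SDK retrieves the data server-side rather than slicing a local DataFrame.

```python
from typing import Optional

import pandas as pd


def select_points(df: pd.DataFrame, start: int = 0, stop: Optional[int] = None,
                  is_remote: bool = False, num_default_influences: int = 1000) -> pd.DataFrame:
    """Mirror the documented start/stop rules: start is inclusive, stop is exclusive."""
    if stop is None:
        # Local projects: all points. Remote projects: the "Number of default influences" setting.
        stop = num_default_influences if is_remote else len(df)
    return df.iloc[start:stop]


data = pd.DataFrame({"a": range(10)})
print(select_points(data, start=2, stop=5)["a"].tolist())  # [2, 3, 4]
```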

`get_ys(self, start=0, stop=None, system_data=False, by_group=False, num_per_group=None)` ¶

Get the targets/y-values.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`system_data`	`bool`	Include system data columns (unique ID) in the response.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`

Returns:

Type	Description
`pd.DataFrame`	The targets/y-values as a pd.DataFrame.

`get_ys_pred(self, start=0, stop=None, system_data=False, by_group=False, num_per_group=None, wait=True)` ¶

Get the model predictions.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`system_data`	`bool`	Include system data (e.g. timestamps) if available in response. Defaults to False.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`
`wait`	`bool`	Whether to wait for the job to finish. Defaults to True.	`True`

Returns:

Type	Description
`pd.DataFrame`	The model predictions as a pd.DataFrame.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> tru.set_model("model1")
>>> explainer = tru.get_explainer("split1")

# This will return "model1"'s predictions on "split1".
>>> explainer.get_ys_pred()

`list_performance_metrics(self)` ¶

Lists the available metrics that can be supplied to compute_performance.

`set_base_data_split(self, data_split_name=None)` ¶

Set the base data split to use for all operations in the explainer.

Parameters:

Name	Type	Description	Default
`data_split_name`	`Optional[str]`	Name of the data split.	`None`

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> explainer = tru.get_explainer("split1")

# Gets labels for data-split "split1".
>>> explainer.get_ys()

>>> explainer.set_base_data_split("split2")

# Gets labels for data-split "split2".
>>> explainer.get_ys()

`set_comparison_data_splits(self, comparison_data_splits=None, use_all_data_splits=False)` ¶

Sets comparison data splits to use for all operations in the explainer.

Parameters:

Name	Type	Description	Default
`comparison_data_splits`	`Optional[Sequence[str]]`	List of data split names for comparison. This is ignored if `use_all_data_splits` is set to True. (Optional)	`None`
`use_all_data_splits`	`bool`	If set to True, set comparison data splits as all of the data splits in the data collection except the base data split. (Optional)	`False`

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> explainer = tru.get_explainer("split1")
>>> explainer.set_comparison_data_splits(["split2", "split3"])

# This will compute AUC metrics for "split1" and compare to "split2" and "split3".
>>> explainer.compute_performance("AUC")
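The precedence between the two parameters of `set_comparison_data_splits` can be summarized in a few lines. `resolve_comparison_splits` is a hypothetical helper for illustration, not the SDK's implementation.

```python
from typing import Optional, Sequence


def resolve_comparison_splits(base_split: str,
                              all_splits: Sequence[str],
                              comparison_data_splits: Optional[Sequence[str]] = None,
                              use_all_data_splits: bool = False) -> list:
    """Mirror the documented precedence of use_all_data_splits over comparison_data_splits."""
    if use_all_data_splits:
        # The explicit list is ignored; every split except the base split is used.
        return [s for s in all_splits if s != base_split]
    return list(comparison_data_splits or [])


splits = ["split1", "split2", "split3"]
print(resolve_comparison_splits("split1", splits, ["split2"], use_all_data_splits=True))
# ['split2', 'split3']
```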

`set_segment(self, segment_group_name, segment_name)` ¶

Sets and applies a segment filter to all explainer operations.

Parameters:

Name	Type	Description	Default
`segment_group_name`	`str`	Name of segment group under which the segment is defined.	required
`segment_name`	`str`	Name of the segment.	required

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> explainer = tru.get_explainer("split1")

# This will return all the xs in "split1".
>>> explainer.get_xs()

>>> explainer.set_segment("segment_group1", "segment1_in_segment_group1")

# This will return only the xs in "split1" filtered to the segment "segment1_in_segment_group1".
>>> explainer.get_xs()

`NonTabularExplainer (Explainer)` ¶

`clear_segment(self)` ¶

Not Available for NonTabularExplainer

`compute_feature_influences(self, start=0, stop=None, system_data=False, by_group=False, num_per_group=None, wait=True)` ¶

Gets the input influences.

Parameters:

Name	Type	Description	Default
`start`	`int`	The starting offset of the data records.	`0`
`stop`	`Optional[int]`	The stopping offset of the data records.	`None`

Examples:

# During NN Ingestion you will create two objects
>>> from truera.client.nn.client_configs import NLPAttributionConfiguration
>>> attr_config = NLPAttributionConfiguration(...)

>>> from truera.client.nn.wrappers.autowrap import autowrap
>>> truera_wrappers = autowrap(...) # Use the appropriate NN Diagnostics Ingestion to create this

# Check if ingestion is set up correctly
>>> tru.verify_nn_wrappers(
...     clf=model,
...     attr_config=attr_config,
...     truera_wrappers=truera_wrappers
... )

# Add the model and data to the TruEra workspace
>>> tru.add_nn_model(
...     "<model_name>",
...     truera_wrappers,
...     attr_config
... )
>>> tru.add_nn_data_split(
...     "<split_name>",
...     truera_wrappers,
...     split_type="<split_type_train_or_test>"
... )

# Compute influences with an explainer for the ingested split
>>> explainer = tru.get_explainer("<split_name>")
>>> explainer.compute_feature_influences()

`compute_feature_influences_for_data(self)` ¶

Not Available for NonTabularExplainer

`compute_performance(self)` `inherited` ¶

Computes performance metrics from labels and predictions. To see the list of available metrics, use list_performance_metrics.

Examples:

# Set your project, data collection and model
>>> tru.set_project("Project Name")
>>> tru.set_data_collection("data collection name")
>>> tru.set_model("model v1")

# Get the explainer for the base split
>>> explainer = tru.get_explainer("train split name")

# Compute performance for the base split using the explainer object
>>> explainer.compute_performance(metric_type = "MAE", plot = False)

`get_base_data_split(self)` ¶

Not Available for NonTabularExplainer

`get_comparison_data_splits(self)` ¶

Not Available for NonTabularExplainer

`get_data_collection(self)` `inherited` ¶

Get the data collection name used by explainer.

`get_feature_names(self)` `inherited` ¶

Get the feature names associated with the current data-collection.

Returns:

Type	Description
`Sequence[str]`	Feature names.

`get_spline_fitter(self, start=0, stop=None)` ¶

Not Available for NonTabularExplainer

`get_xs(self, start=0, stop=None, extra_data=False, system_data=False, by_group=False, num_per_group=None)` `inherited` ¶

Get the inputs/data/x-values.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`extra_data`	`bool`	Include extra data columns in the response.	`False`
`system_data`	`bool`	Include system data columns (unique ID) in the response.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`

Returns:

Type	Description
`pd.DataFrame`	The inputs/data/x-values.

`get_ys(self, start=0, stop=None, system_data=False, by_group=False, num_per_group=None)` `inherited` ¶

Get the targets/y-values.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`system_data`	`bool`	Include system data columns (unique ID) in the response.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`

Returns:

Type	Description
`pd.DataFrame`	The targets/y-values as a pd.DataFrame.

`get_ys_pred(self, start=0, stop=None, system_data=False, by_group=False, num_per_group=None, wait=True)` `inherited` ¶

Get the model predictions.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`system_data`	`bool`	Include system data (e.g. timestamps) if available in response. Defaults to False.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`
`wait`	`bool`	Whether to wait for the job to finish. Defaults to True.	`True`

Returns:

Type	Description
`pd.DataFrame`	The model predictions as a pd.DataFrame.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> tru.set_model("model1")
>>> explainer = tru.get_explainer("split1")

# This will return "model1"'s predictions on "split1".
>>> explainer.get_ys_pred()

`list_performance_metrics(self)` `inherited` ¶

Lists the available metrics that can be supplied to compute_performance.

`set_base_data_split(self, data_split_name=None)` ¶

Not Available for NonTabularExplainer

`set_comparison_data_splits(self)` ¶

Not Available for NonTabularExplainer

`set_segment(self, segment_group_name=None, segment_name=None)` ¶

Not Available for NonTabularExplainer

`TabularExplainer (Explainer)` ¶

Contains methods to provide explanations for tabular models.

Examples:

# Assuming `tru` is a `TrueraWorkspace` with a tabular project.

# Set your project, data collection and model
>>> tru.set_project("Project Name")
>>> tru.set_data_collection("data collection name")
>>> tru.set_model("model v1")

# Get the explainer for the base split
>>> explainer = tru.get_explainer("train split name")

`clear_segment(self)` `inherited` ¶

Clears any set segments from all explainer operations.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> explainer = tru.get_explainer("split1")
>>> explainer.set_segment("segment_group1", "segment1_in_segment_group1")

# This will return only the xs in "split1" filtered to the segment "segment1_in_segment_group1".
>>> explainer.get_xs()

>>> explainer.clear_segment()

# This will return all the xs in "split1".
>>> explainer.get_xs()

`compute_fairness(self, segment_group, segment1, segment2=None, fairness_type='DISPARATE_IMPACT_RATIO', threshold=None, threshold_score_type=None)` ¶

Compares the fairness of outcomes for two segments within a segment group using the provided fairness type.

Parameters:

Name	Type	Description	Default
`segment_group`	`str`	Name of segment group that the two segments are defined under.	required
`segment1`	`str`	Name of first segment (must belong to provided segment group).	required
`segment2`	`Optional[str]`	Name of second segment (must belong to provided segment group). If None, then uses the complement of `segment1`.	`None`
`fairness_type`	`Optional[str]`	Name of fairness metric. Must be one of the options returned by `list_fairness_metrics`.	`'DISPARATE_IMPACT_RATIO'`
`threshold`	`Optional[float]`	Optional model threshold for classification models. If None, defaults to pre-configured threshold for the model. Ignored for regression models.	`None`
`threshold_score_type`	`Optional[str]`	If `threshold` is provided, the score type to apply the threshold to (`probits` or `logits`). If None, defaults to pre-configured score type for the model. Ignored for regression models.	`None`

Returns:

Type	Description
`BiasResult`	Computed fairness metric along with information about which group is favored.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.

# Setup the project
>>> import numpy as np
>>> import pandas as pd
>>> project_name = 'test_explainer_compute_fairness'
>>> tru.add_project(project_name, score_type='classification', input_type='tabular')
>>> tru.add_data_collection('dc')

# Create training data in which membership group 1 is treated unfairly:
# group 0 is always approved, while group 1 is approved at random about 25% of the time.
>>> wealth_max = 5000
>>> rng = np.random.default_rng(seed=42)
>>> wealth = rng.uniform(0.0, wealth_max, size=100)
>>> random_approvals = rng.binomial(1, 0.25, size=50)
>>> xs_train = pd.DataFrame({
        "membership": [0]*50 + [1]*50,
        "wealth": wealth,
        "approved": np.concatenate([[1]*50, random_approvals])
    }).astype({
        "membership": "int",
        "wealth": "float",
        "approved": "bool",
        })

# Create data split. 
>>> xs = xs_train.drop(['approved'], axis=1)
>>> labels = xs_train['approved']
>>> tru.add_data_split('train', pre_data = xs, label_data = labels, split_type = "train")

# Create xgb model
>>> import xgboost as xgb
>>> params = {"model_type": "xgb.XGBClassifier", "eta": 0.2, "max_depth": 4}
>>> xgb_clf = xgb.XGBClassifier(eta = params['eta'], max_depth = params['max_depth'])
>>> xgb_clf.fit(xs, labels)

# Add model to project and set model
>>> tru.add_python_model("xgb", xgb_clf, train_split_name="train", train_parameters=params)
>>> tru.set_model("xgb")

# Add a segment group on which to compute fairness
>>> tru.add_segment_group("membership", {"zero": 'membership == 0', 'one': 'membership == 1'})

# Compute fairness between the two membership segments and display the results.
>>> explainer = tru.get_explainer("train")
>>> explainer.compute_fairness("membership", "one", "zero")
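The default `DISPARATE_IMPACT_RATIO` metric compares how often each segment receives the favorable outcome. The sketch below is a plain NumPy illustration, not the SDK's implementation: it assumes a binary classifier whose probit scores are turned into approvals with a single threshold (0.5 here, standing in for the model's pre-configured threshold), and it assumes the ratio is segment1's approval rate divided by segment2's. The direction of the ratio and the edge-case handling inside `compute_fairness` may differ.

```python
import numpy as np

def disparate_impact_ratio(scores_1, scores_2, threshold=0.5):
    """P(approved | segment1) / P(approved | segment2).

    Scores are assumed to be probits; a point counts as approved when its
    score is at or above `threshold` (an assumption made for this sketch).
    """
    rate_1 = np.mean(np.asarray(scores_1) >= threshold)
    rate_2 = np.mean(np.asarray(scores_2) >= threshold)
    if rate_2 == 0:
        raise ValueError("segment2 has no favorable outcomes; the ratio is undefined")
    return rate_1 / rate_2

# Hypothetical probit scores for the "one" and "zero" membership segments.
scores_one = np.array([0.9, 0.2, 0.1, 0.7, 0.3, 0.4, 0.2, 0.1])    # 2 of 8 approved
scores_zero = np.array([0.9, 0.8, 0.6, 0.7, 0.95, 0.55, 0.8, 0.3])  # 7 of 8 approved

ratio = disparate_impact_ratio(scores_one, scores_zero)
if ratio > 1:
    favored = "one"
elif ratio < 1:
    favored = "zero"
else:
    favored = "neither"
print(f"ratio={ratio:.3f}, favored segment: {favored}")  # ratio=0.286, favored segment: zero
```

A ratio below 1 means segment1 is approved less often than segment2; a ratio of exactly 1 means both segments are approved at the same rate.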

`compute_feature_contributors_to_instability(self, score_type=None, use_difference_of_means=False, wait=True)` ¶

Compute feature contributors to model instability from the base split to all comparison splits that are set in the current context. By default, instability is measured using Wasserstein Distance.

Parameters:

Name	Type	Description	Default
`score_type`	`Optional[str]`	The score type to use when computing instability. If None, uses default score type of project. Defaults to None. For a list of valid score types, see `tru.list_valid_score_types`.	`None`
`use_difference_of_means`	`bool`	If True, measures instability with Difference of Means. Defaults to False.	`False`
`wait`	`bool`	Whether to wait for the job to finish. Defaults to True.	`True`

Returns:

Type	Description
`pd.DataFrame`	Table of contributions per feature and comparison split.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> tru.set_model("model1")
>>> explainer = tru.get_explainer("split1")

# Add one or more comparison data split(s) to the current context.
>>> explainer.set_comparison_data_splits(["split2"])

# Show the feature contributors to model instability for each comparison split.
>>> explainer.compute_feature_contributors_to_instability()

# Add multiple comparison data splits and re-run to see more output rows
>>> explainer.set_comparison_data_splits(["split2", "split3"])
>>> explainer.compute_feature_contributors_to_instability()

`compute_feature_influences(self, start=0, stop=None, score_type=None, system_data=False, by_group=False, num_per_group=None, wait=True)` `inherited` ¶

Compute the QIIs/Shapley values for this explainer's currently set data split.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the project setting for "Number of default influences".	`None`
`score_type`	`Optional[str]`	The score type to use when computing influences. If None, defaults to score type of project. Defaults to None. For a list of valid score types, see `tru.list_valid_score_types`.	`None`
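`system_data`	`bool`	Include system data (e.g. timestamps) in the response, if available. Defaults to False.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`
`wait`	`bool`	Whether to wait for the job to finish. Defaults to True.	`True`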

Returns:

Type	Description
`pd.DataFrame`	The QIIs/Shapley values.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> tru.set_model("model1")
>>> explainer = tru.get_explainer("split1")

# This will return "model1"'s feature influences on "split1".
>>> explainer.compute_feature_influences()
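To make the quantity returned above concrete, here is a brute-force sketch of exact Shapley values for a tiny model, assuming that "absent" features are replaced by a single fixed baseline point. It illustrates the Shapley definition only; it is not TruEra's QII algorithm, which uses sampling and a comparison data set rather than one baseline. Its cost grows exponentially with the number of features, so it is only practical for a handful of features.

```python
import itertools
import math

import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to `baseline`."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = x.size
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                z = baseline.copy()
                z[list(subset)] = x[list(subset)]  # present features take their real values
                without_i = f(z)
                z[i] = x[i]                        # add feature i to the coalition
                with_i = f(z)
                phi[i] += weight * (with_i - without_i)
    return phi

# A toy linear model: each contribution is coefficient * (x - baseline).
linear = lambda z: 2 * z[0] + 3 * z[1] + 1
phi = exact_shapley(linear, x=[1.0, 2.0], baseline=[0.0, 0.0])
print(phi)  # [2. 6.]
```

The values always sum to `f(x) - f(baseline)` (here 9 - 1 = 8), which is the "efficiency" property that makes Shapley values a complete attribution of the prediction.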

`compute_feature_influences_for_data(self, pre_data, post_data=None, ys=None, score_type=None, comparison_post_data=None, num_internal_qii_samples=1000, algorithm='truera-qii')` `inherited` ¶

Compute the QIIs/Shapley values for the provided data.

Parameters:

Name	Type	Description	Default
`pre_data`	`pd.DataFrame`	A pandas DataFrame containing the human-readable data for which to compute influences. If `post_data` is not specified, `pre_data` is assumed to be both human- and model-readable.	required
`post_data`	`Optional[pd.DataFrame]`	A pandas DataFrame containing the model-readable post-processed data that is aligned with the pre-processed data. Can be ignored if model-readable pre-processed data is provided. If providing different pre- and post-processed data, be sure the mapping between them adheres to the feature map of the data collection specified during the data collection's creation.	`None`
`ys`	`Optional[Union[np.ndarray, pd.Series]]`	Labels for which to compute influences if required by the provided `score_type`. Defaults to None.	`None`
`score_type`	`Optional[str]`	The score type to use when computing influences. If None, defaults to score type of project. Defaults to None. For a list of valid score types, see `tru.list_valid_score_types`.	`None`
`comparison_post_data`	`Optional[pd.DataFrame]`	The comparison data to use when computing influences. If None, defaults to a data split of type "all" or "train" in the data collection and, failing that, to the base split currently set in this explainer. Defaults to None.	`None`
`num_internal_qii_samples`	`int`	Number of samples used internally in influence computations.	`1000`
`algorithm`	`str`	Algorithm to use during computation. Must be one of ["truera-qii", "tree-shap-tree-path-dependent", "tree-shap-interventional", "kernel-shap"]. Defaults to "truera-qii".	`'truera-qii'`

Returns:

Type	Description
`pd.DataFrame`	The QIIs/Shapley values.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> tru.set_model("model1")
>>> explainer = tru.get_explainer("split1")
>>> pre_data = ...
>>> post_data = ...

# This will compute feature influences on `pre_data`/`post_data`. Note that the feature mapping from
# `pre_data` to `post_data` must be the same as that expected by "model1".
>>> explainer.compute_feature_influences_for_data(
        pre_data=pre_data,
        post_data=post_data,
        score_type="probits",
        comparison_post_data=post_data
    )
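When `pre_data` and `post_data` differ, each post-processed column must trace back to a pre-processed feature through the data collection's feature map. The sketch below assumes the simplest common case, one-hot encoding with pandas, and represents the feature map as a plain dict from pre-feature name to its post-feature columns. That dict layout is a hypothetical illustration, not the SDK's feature-map format.

```python
import pandas as pd

# Human-readable (pre-processed) data.
pre_data = pd.DataFrame({
    "size": [1.0, 2.5, 3.0],
    "color": ["red", "blue", "red"],
})

# Model-readable (post-processed) data: one-hot encode the categorical column.
# Rows stay aligned with pre_data because get_dummies preserves the index.
post_data = pd.get_dummies(pre_data, columns=["color"], dtype=int)

# Hypothetical feature map: pre-feature name -> post-feature columns.
feature_map = {
    "size": ["size"],
    "color": [c for c in post_data.columns if c.startswith("color_")],
}

# Sanity checks: every post column is covered exactly once, and indices align.
mapped = [col for cols in feature_map.values() for col in cols]
assert sorted(mapped) == sorted(post_data.columns)
assert post_data.index.equals(pre_data.index)
print(list(post_data.columns))  # ['size', 'color_blue', 'color_red']
```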

`compute_model_score_instability(self, score_type=None, use_difference_of_means=False, plot=False)` ¶

Compute model score instability from the base split to all comparison splits that are set in the current context. By default, instability is measured using Wasserstein Distance.

Parameters:

Name	Type	Description	Default
`score_type`	`Optional[str]`	The score type to use when computing instability. If None, defaults to score type of project. Defaults to None. For a list of valid score types, see `tru.list_valid_score_types`.	`None`
`use_difference_of_means`	`bool`	If True, measures instability with Difference of Means. Defaults to False.	`False`
`plot`	`bool`	If True, plots performances for all base and comparison splits in the current context.	`False`

Returns:

Type	Description
`Union[float, pd.DataFrame]`	The model score instability. If comparison data splits are set, a pd.DataFrame of all splits and their respective score instabilities. Otherwise, a single float metric is returned.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.

# Setup the project
>>> import numpy as np
>>> import pandas as pd
>>> project_name = 'test_explainer'
>>> tru.add_project(project_name, score_type='regression', input_type='tabular')
>>> tru.add_data_collection('dc')

# Create train data. y = 2*x + 1 over range 'x <= 49.0 AND x >= -50.0'
>>> xs_train = pd.DataFrame({
        "x": range(-50, 50)
    })
>>> ys_train = 2 * xs_train["x"] + 1

# Create test data. Add (seeded) random noise to segment 'x <= 49.0 AND x >= 0.0'.
>>> rng = np.random.default_rng(seed=42)
>>> noise = rng.random(50)
>>> xs_test = xs_train.copy()
>>> ys_test = ys_train.copy() + np.concatenate([np.zeros(50), noise])

# Create another split that will produce high instability. y = -2*x + 1
>>> xs_invert = xs_train.copy()
>>> ys_invert = -2 * xs_train["x"] + 1

# Add data splits to project
>>> tru.add_data_split('train', pre_data = xs_train, label_data = ys_train, split_type = "train")
>>> tru.add_data_split('test', pre_data = xs_test, label_data = ys_test, split_type = "test")
>>> tru.add_data_split('invert', pre_data = xs_invert, label_data = ys_invert, split_type = "validate")

# create xgb model
>>> import xgboost as xgb
>>> params = {"model_type": "xgb.XGBRegressor", "eta": 0.2, "max_depth": 4}
>>> xgb_reg = xgb.XGBRegressor(eta = params['eta'], max_depth = params['max_depth'])
>>> xgb_reg.fit(xs_train, ys_train)

# add model to project
>>> tru.add_python_model("xgb", xgb_reg, train_split_name="train", train_parameters=params)

# Create an explainer and set the comparison splits
>>> explainer = tru.get_explainer("train")
>>> explainer.set_comparison_data_splits(["test", "invert"])

# Denote the score_type and call the method
>>> score_type = "mean_absolute_error_for_regression"
>>> explainer.compute_model_score_instability(score_type=score_type)
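By default, instability is measured with the Wasserstein distance between score distributions, and `use_difference_of_means=True` switches to the difference of means. The NumPy sketch below computes the 1-D Wasserstein-1 distance from the two empirical CDFs (the same approach `scipy.stats.wasserstein_distance` takes) and shows why the two measures can disagree: a pure shift moves both, but a change in spread with an unchanged mean is invisible to the difference of means. It is not the SDK's implementation, and the sign convention used for the difference of means is an assumption.

```python
import numpy as np

def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two 1-D empirical distributions."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    all_values = np.sort(np.concatenate([u, v]))
    deltas = np.diff(all_values)
    # Empirical CDFs of u and v on each interval between consecutive merged points.
    u_cdf = np.searchsorted(u, all_values[:-1], side="right") / u.size
    v_cdf = np.searchsorted(v, all_values[:-1], side="right") / v.size
    return float(np.sum(np.abs(u_cdf - v_cdf) * deltas))

def difference_of_means(base, comparison):
    """Assumed sign convention: comparison mean minus base mean."""
    return float(np.mean(comparison) - np.mean(base))

base = np.array([0.0, 1.0, 2.0, 3.0])
shifted = base + 5.0                # same shape, shifted by 5
narrow = np.array([0.0, 0.0])       # mean 0, no spread
spread = np.array([-1.0, 1.0])      # mean 0, wider spread

print(wasserstein_1d(base, shifted), difference_of_means(base, shifted))    # 5.0 5.0
print(wasserstein_1d(narrow, spread), difference_of_means(narrow, spread))  # 1.0 0.0
```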

`compute_partial_dependencies(self, wait=True)` ¶

Get the partial dependencies for all features. Partial dependencies capture the marginal effect of a feature's value on the predicted outcome of the model.

Parameters:

Name	Type	Description	Default
`wait`	`bool`	Whether to wait for the job to finish. Defaults to True.	`True`

Returns:

Type	Description
`Tuple[Sequence[str], Mapping[str, Sequence], Mapping[str, Sequence]]`	The partial dependencies described in a 3-tuple: A list of the features, a mapping from feature to the x-values in a PDP, and a mapping from feature to the y-values in a PDP.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.

# Setup the project
>>> import numpy as np
>>> import pandas as pd
>>> project_name = 'test_explainer_pdp'
>>> tru.add_project(project_name, score_type='regression', input_type='tabular')
>>> tru.add_data_collection('dc')

# Create train data. y = 2*x + 1 over range 'x <= 49.0 AND x >= -50.0'
>>> xs_train = pd.DataFrame({
    "x": range(-50, 50),
    "cat": [0]*50 + [1]*50
})

# For ys, inject noise when x >= 0
>>> noise_gain = 5
>>> rng = np.random.default_rng(seed=42)
>>> noise = noise_gain*rng.random(50)
>>> ys_train = 2 * xs_train["x"] + 1 
>>> ys_train = ys_train + np.concatenate([np.zeros(50), noise])

# Create data splits.
>>> tru.add_data_split('train', pre_data = xs_train, label_data = ys_train, split_type = "train")

# create xgb model
>>> import xgboost as xgb
>>> params = {"model_type": "xgb.XGBRegressor", "eta": 0.2, "max_depth": 4}
>>> xgb_reg = xgb.XGBRegressor(eta = params['eta'], max_depth = params['max_depth'])
>>> xgb_reg.fit(xs_train, ys_train)

# add model to project
>>> tru.add_python_model("xgb", xgb_reg, train_split_name="train", train_parameters=params)

# create explainer, compute partial dependencies (PDs)
>>> explainer = tru.get_explainer("train")
>>> pds = explainer.compute_partial_dependencies()
>>> features, xs, ys = pds

# Plot the PDs manually, accounting for numerical vs. categorical features
>>> import matplotlib.pyplot as plt
>>> for feature in features:
...     plt.figure()
...     if feature == "cat":  # categorical feature: bar chart
...         plt.bar(xs[feature], ys[feature])
...     else:  # numerical feature (x): line plot
...         plt.plot(xs[feature], ys[feature])
...     plt.title(feature)
>>> plt.show()
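Conceptually, each PDP y-value is the average model output when one feature is fixed to a grid value for every row. A NumPy sketch of that textbook definition follows, using a hypothetical `predict` function in place of the model. How TruEra chooses grid points and rows for `compute_partial_dependencies` is not reproduced here.

```python
import numpy as np

def partial_dependence(predict, X, feature_idx, grid):
    """Average prediction when column `feature_idx` is forced to each grid value."""
    X = np.asarray(X, dtype=float)
    ys = []
    for value in grid:
        X_fixed = X.copy()
        X_fixed[:, feature_idx] = value  # same value for every row
        ys.append(predict(X_fixed).mean())
    return np.asarray(grid, dtype=float), np.asarray(ys)

# Hypothetical model: y = 2*x + 1 + 3*cat, with columns [x, cat].
predict = lambda X: 2 * X[:, 0] + 1 + 3 * X[:, 1]
X = np.array([[-2.0, 0.0], [-1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])

grid_x, pdp_x = partial_dependence(predict, X, feature_idx=0, grid=[-1.0, 0.0, 1.0])
grid_cat, pdp_cat = partial_dependence(predict, X, feature_idx=1, grid=[0.0, 1.0])
print(pdp_x)    # [0.5 2.5 4.5]
print(pdp_cat)  # [1. 4.]
```

For a linear model the PDP is a straight line with the feature's coefficient as its slope (2 for `x`, 3 between the two `cat` values), which makes this a convenient sanity check.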

`compute_performance(self, metric_type, plot=False)` ¶

Compute a performance metric on the explainer's base data split and, if set, its comparison data splits.

Parameters:

Name	Type	Description	Default
`metric_type`	`str`	Name of performance metric. Must be one of the options returned by `list_performance_metrics`.	required
`plot`	`bool`	If True, plots performances for all base and comparison splits in the current context.	`False`

Returns:

Type	Description
`Union[float, pd.DataFrame]`	The performance metric. If comparison data splits are set, a pd.DataFrame of all splits and their respective performance. Otherwise, a single float metric is returned.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> tru.set_model("model1")
>>> explainer = tru.get_explainer("split1")

# This will compute AUC metrics for "split1".
>>> explainer.compute_performance("AUC")
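Metrics such as `"AUC"` are computed by the platform. For intuition, ROC AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting as half. A small NumPy sketch of that pairwise definition, independent of the SDK (it builds every positive/negative pair, so it is only suited to small inputs):

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    diff = pos[:, None] - neg[None, :]  # every (positive, negative) pair
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```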

`find_hotspots(self, num_features=1, max_num_responses=3, num_samples=100, metric_of_interest=None, metrics_to_show=None, minimum_size=50, minimum_metric_of_interest_threshold=0, size_exponent=0.25, comparison_data_split_name=None, bootstrapping_fraction=1, random_state=0, show_what_if_performance=False, use_labels=True)` ¶

Suggests high error segments for the model for the currently set data split.

Parameters:

Name	Type	Description	Default
`num_features`	`int`	Number of features to use to describe a high error segment. Defaults to 1.	`1`
`max_num_responses`	`int`	Maximum number of high error segments to return. Defaults to 3.	`3`
`num_samples`	`int`	Number of samples to use while searching for high error segments. More samples generally yield better segments but make the computation slower. Defaults to 100.	`100`
`metric_of_interest`	`Optional[str]`	Name of the metric used to choose segments. When None, defaults internally to 'SEGMENT_GENERALIZED_AUC' for classification or 'MAE' for regression. Defaults to None.	`None`
`metrics_to_show`	`Optional[Union[str, Sequence[str]]]`	Name of performance metric or list of them to include. Must be one of the options returned by `list_performance_metrics`. Defaults to None.	`None`
`minimum_size`	`int`	Minimum size of a segment. Defaults to 50.	`50`
`minimum_metric_of_interest_threshold`	`float`	Minimum difference in the metric of interest between the segment and its comparison: the entire split when `comparison_data_split_name` is not given, otherwise the same segment on the `comparison_data_split_name` data split. Defaults to 0.	`0`
`size_exponent`	`float`	Exponent applied to the segment size. Should be in [0, 1]. A value of zero means the segment size has no effect. Defaults to 0.25.	`0.25`
`comparison_data_split_name`	`Optional[str]`	Comparison data-split to use (e.g. train split for overfitting analysis). If set, we look for segments that are far more problematic in the explainer's data split than the comparison one supplied here.	`None`
`bootstrapping_fraction`	`float`	Random fraction of points to use for analysis. Should be in (0, 1]. Defaults to 1.	`1`
`random_state`	`int`	Random seed for two random processes: 1) selecting the features to analyze and 2) choosing points in bootstrapping. If `bootstrapping_fraction` < 1, then changing this parameter will introduce more 'randomness' (i.e., change the segment's values for a given feature/set of features). Otherwise, the method will always return the same values for a given feature/set of features. Defaults to 0.	`0`
`show_what_if_performance`	`bool`	Whether to show the "what if" performance of the segment, defined as what the overall accuracy on the split would be if the segment's performance were brought up to the accuracy on the whole split. The "what if" version of a metric can only be defined if the metric can be defined per-point and averaged over. Defaults to False.	`False`
`use_labels`	`bool`	Whether to use the labels as a feature for segmentation. Defaults to True.	`True`

Returns:

Type	Description
`pd.DataFrame`	DataFrame describing high error segments, one row per suggested segment, with the following columns: (1) `segment_definition`: the segment definition, which can be ingested via the workspace `add_segment_group` function; (2) `size`: the total number of points in the segment in the base data split; (3) `size (%)`: the percentage of points in the base data split that fall in the segment. When a comparison split is given, the `size` and `size (%)` column names also include ({base_data_split_name}). Additional columns: (A) the metric of interest, plus one column for each metric in `metrics_to_show`; (B) the "what if" version of the metric of interest (if viable) and of each viable metric in `metrics_to_show`, shown only when `show_what_if_performance` is True; (C) `size ({comparison_data_split_name})` and `size (%) ({comparison_data_split_name})`: the same size columns for the comparison data split, shown only when `comparison_data_split_name` is provided; (D) `size diff (%)`: the absolute difference in `size (%)` between the base and comparison data splits, shown only when `metric_of_interest` is UNDER_OR_OVERSAMPLING.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.

# Setup the project
>>> import numpy as np
>>> import pandas as pd
>>> project_name = 'test_explainer'
>>> tru.add_project(project_name, score_type='regression', input_type='tabular')
>>> tru.add_data_collection('dc')

# Create train data. y = 2*x + 1 over range 'x <= 49.0 AND x >= -50.0'
>>> xs_train = pd.DataFrame({
        "x": range(-50, 50)
    })
>>> ys_train = 2 * xs_train["x"] + 1

# Create test data. Add (seeded) random noise to segment 'x <= 49.0 AND x >= 0.0'.
>>> rng = np.random.default_rng(seed=42)
>>> noise = rng.random(50)
>>> xs_test = xs_train.copy()
>>> ys_test = ys_train.copy() + np.concatenate([np.zeros(50), noise])

# Create data splits.
>>> tru.add_data_split('train', pre_data = xs_train, label_data = ys_train, split_type = "train")
>>> tru.add_data_split('test', pre_data = xs_test, label_data = ys_test, split_type = "test")

# create xgb model
>>> import xgboost as xgb
>>> params = {"model_type": "xgb.XGBRegressor", "eta": 0.2, "max_depth": 4}
>>> xgb_reg = xgb.XGBRegressor(eta = params['eta'], max_depth = params['max_depth'])
>>> xgb_reg.fit(xs_train, ys_train)

# add model to project
>>> tru.add_python_model("xgb", xgb_reg, train_split_name="train", train_parameters=params)

# Create an explainer and return high error segments
>>> explainer = tru.get_explainer("test")
>>> explainer.find_hotspots(
        metric_of_interest="MSE"
    )

# Return high error segments without using labels as a segmentation feature
>>> explainer.find_hotspots(
        metric_of_interest="MSE",
        use_labels=False
    )

# Return high error segments, also showing MAE for each segment
>>> explainer.find_hotspots(
        metric_of_interest="MSE",
        metrics_to_show=["MAE"],
        use_labels=False
    )

# Return segments that are far more problematic on "test" than on the "train" comparison split
>>> explainer.find_hotspots(
        metric_of_interest="MSE",
        use_labels=False,
        comparison_data_split_name="train"
    )
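What `find_hotspots` reports can be illustrated by hand. The sketch below scores two hand-picked segments of a toy test split by MSE and compares each with the MSE of the whole split, mirroring the `segment_definition`, `size`, and `size (%)` columns described in the Returns table. The automatic segment search, `size_exponent` weighting, and bootstrapping are not reproduced, and `segment_mse_table` is a hypothetical helper, not an SDK function.

```python
import numpy as np
import pandas as pd

COLUMNS = ["segment_definition", "size", "size (%)", "MSE", "MSE minus split MSE"]

def segment_mse_table(df, segments, label_col, pred_col, minimum_size=1):
    """Per-segment MSE compared with the MSE over the whole split (hypothetical helper)."""
    overall_mse = np.mean((df[label_col] - df[pred_col]) ** 2)
    rows = []
    for definition, mask in segments.items():
        seg = df[mask]
        if len(seg) < minimum_size:
            continue  # skip segments below the minimum size
        mse = np.mean((seg[label_col] - seg[pred_col]) ** 2)
        rows.append({
            "segment_definition": definition,
            "size": len(seg),
            "size (%)": 100.0 * len(seg) / len(df),
            "MSE": mse,
            "MSE minus split MSE": mse - overall_mse,
        })
    # Explicit columns keep the result well-formed even when every segment is skipped.
    return pd.DataFrame(rows, columns=COLUMNS).sort_values("MSE", ascending=False, ignore_index=True)

# Toy split: predictions are exact for x < 0 and off by 1 or 3 for x >= 0.
df = pd.DataFrame({"x": [-2, -1, 0, 1]})
df["label"] = 2 * df["x"] + 1
df["pred"] = df["label"] + np.array([0, 0, 1, 3])

segments = {"x < 0": df["x"] < 0, "x >= 0": df["x"] >= 0}
table = segment_mse_table(df, segments, "label", "pred")
print(table)
```

Here the split MSE is 2.5, the `x >= 0` segment has MSE 5.0, and the `x < 0` segment has MSE 0.0, so `x >= 0` is ranked first, much as the noisy segment in the example above would be.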

`get_base_data_split(self)` `inherited` ¶

Get the base data split used by the explainer.

Returns:

Type	Description
`str`	The name of the base data split.

`get_comparison_data_splits(self)` `inherited` ¶

Gets the comparison data splits used by the explainer.

Returns:

Type	Description
`Sequence[str]`	The names of the comparison data splits.

`get_data_collection(self)` `inherited` ¶

Get the name of the data collection used by the explainer.

`get_feature_names(self)` `inherited` ¶

Get the feature names associated with the current data collection.

Returns:

Type	Description
`Sequence[str]`	Feature names.

`get_global_feature_importances(self, score_type=None, wait=True)` ¶

Get the global feature importances (as measured by QIIs) for this explainer's currently set data split.

Parameters:

Name	Type	Description	Default
`score_type`	`Optional[str]`	The score type to use when computing influences. If None, defaults to score type of project. Defaults to None. For a list of valid score types, see `tru.list_valid_score_types`.	`None`
`wait`	`bool`	Whether to wait for the job to finish.	`True`

Returns:

Type	Description
`pd.DataFrame`	The normalized global importances for each feature.
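One common way to turn per-point influences (such as the output of `compute_feature_influences`) into normalized global importances is to average the absolute influence of each feature and rescale so the values sum to 1. The pandas sketch below assumes that aggregation; the exact normalization applied by `get_global_feature_importances` is not specified here and may differ.

```python
import pandas as pd

def global_importances(influences: pd.DataFrame) -> pd.Series:
    """Mean absolute influence per feature, normalized to sum to 1 (assumed scheme)."""
    mean_abs = influences.abs().mean(axis=0)
    total = mean_abs.sum()
    if total == 0:
        raise ValueError("all influences are zero; importances are undefined")
    return (mean_abs / total).sort_values(ascending=False)

# Hypothetical per-point influences for two features.
influences = pd.DataFrame({
    "wealth": [1.0, -1.0, 1.0, -1.0],
    "membership": [3.0, -3.0, 3.0, 3.0],
})
print(global_importances(influences))  # membership 0.75, wealth 0.25
```

Taking absolute values first matters: `wealth` has a mean signed influence of zero here, yet it still moves individual predictions.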

`get_spline_fitter(self, start=0, stop=None)` `inherited` ¶

Get the spline-fitter using the provided range of points to fit splines.

Parameters:

Name	Type	Description	Default
`start`	`int`	The lower bound (inclusive) of the index of points to use during spline fitting. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to use during spline fitting. Defaults to None, which is interpreted as the project setting for "Number of default influences".	`None`

Returns:

Type	Description
`SplineFitter`	Spline-fitter.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> tru.set_model("model1")
>>> explainer = tru.get_explainer("split1")

# Optionally restrict spline fitting to a segment.
>>> explainer.set_segment("segment_group1", "segment1_in_segment_group1")

# Construct a spline fitter over the default range of points and use it to construct a GAM.
>>> sf = explainer.get_spline_fitter()
>>> gam = sf.construct_gam(
        n_splines=10,
        spline_orders=3,
        constraints={"feature2": "monotonic_inc", "feature5": "monotonic_dec"}
    )

# Add the GAM model to TruEra.
>>> tru.add_python_model("GAM from model1", gam)

`get_xs(self, start=0, stop=None, extra_data=False, system_data=False, by_group=False, num_per_group=None)` `inherited` ¶

Get the inputs/data/x-values.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`extra_data`	`bool`	Include extra data columns in the response.	`False`
`system_data`	`bool`	Include system data columns (unique ID) in the response.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`

Returns:

Type	Description
`pd.DataFrame`	The inputs/data/x-values.

`get_ys(self, start=0, stop=None, system_data=False, by_group=False, num_per_group=None)` `inherited` ¶

Get the targets/y-values.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`system_data`	`bool`	Include system data columns (unique ID) in the response.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`

Returns:

Type	Description
`pd.DataFrame`	The targets/y-values as a pd.DataFrame.

`get_ys_pred(self, start=0, stop=None, system_data=False, by_group=False, num_per_group=None, wait=True)` `inherited` ¶

Get the model predictions.

Parameters:

Name	Type	Description	Default
`start`	`Optional[int]`	The lower bound (inclusive) of the index of points to include. Defaults to 0.	`0`
`stop`	`Optional[int]`	The upper bound (exclusive) of the index of points to include. Defaults to None which is interpreted as the total number of points for local projects and the setting for "Number of default influences" for remote projects.	`None`
`system_data`	`bool`	Include system data (e.g. timestamps) if available in response. Defaults to False.	`False`
`by_group`	`bool`	For ranking projects, whether to retrieve data by group or not. Ignored for non-ranking projects. Defaults to False.	`False`
`num_per_group`	`Optional[int]`	For ranking projects and when `by_group` is True, the number of points per group to return.	`None`
`wait`	`bool`	Whether to wait for the job to finish. Defaults to True.	`True`

Returns:

Type	Description
`pd.DataFrame`	The model predictions as a pd.DataFrame.

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> tru.set_model("model1")
>>> explainer = tru.get_explainer("split1")

# This will return "model1"'s predictions on "split1".
>>> explainer.get_ys_pred()

`list_performance_metrics(self)` ¶

Lists all available performance metrics.

Returns:

Type	Description
`Sequence[str]`	List of performance metric names, which can be provided to `compute_performance`.

`plot_isp(self, feature, num=None, figsize=(700, 500), xlim=None)` ¶

Plot the influence sensitivity plot (ISP) of a specific feature.

Parameters:

Name	Type	Description	Default
`feature`	`str`	Feature to plot the ISP of.	required
`num`	`Optional[int]`	Number of points to plot. Defaults to None, which is equivalent to a standard number of points used for calculations.	`None`
`figsize`	`Optional[Tuple[int, int]]`	Size for plot in pixels. Defaults to (700, 500).	`(700, 500)`
`xlim`	`Optional[Tuple[int, int]]`	Range for x-axis. Defaults to None, which scales to the size of the data.	`None`

`plot_isps(self, features=None, num=None, figsize=(700, 500))` ¶

Plot the influence sensitivity plot (ISP) of a set of features.

Parameters:

Name	Type	Description	Default
`features`	`Optional[Sequence[str]]`	Features to plot the ISP of. Defaults to None, which is all features.	`None`
`num`	`Optional[int]`	Number of points to plot. Defaults to None, which is equivalent to a standard number of points used for calculations.	`None`
`figsize`	`Optional[Tuple[int, int]]`	Size for plot in pixels. Defaults to (700, 500).	`(700, 500)`

`plot_pdp(self, feature, figsize=(700, 500), xlim=None)` ¶

DEPRECATED: Plot the partial dependence plot (PDP) of a specific feature.

Parameters:

Name	Type	Description	Default
`feature`	`str`	Feature to plot the PDP of.	required
`figsize`	`Optional[Tuple[int, int]]`	Size for plot in pixels. Defaults to (700, 500).	`(700, 500)`
`xlim`	`Optional[Tuple[int, int]]`	Range for x-axis. Defaults to None, which scales to the size of the data.	`None`

`plot_pdps(self, features=None, figsize=(700, 500))` ¶

DEPRECATED: Plot the partial dependence plot (PDP) of a set of features.

Parameters:

Name	Type	Description	Default
`features`	`Optional[Sequence[str]]`	Features to plot the PDP of. Defaults to None, which is all features.	`None`
`figsize`	`Optional[Tuple[int, int]]`	Size for plot in pixels. Defaults to (700, 500).	`(700, 500)`

`rank_performance(self, metric_type, ascending=False)` ¶

Rank the performance of all models in the data collection on the explainer's base data split. If comparison data splits are set, the performance of the models on them is also shown.

Parameters:

Name	Type	Description	Default
`metric_type`	`str`	Name of performance metric. Must be one of the options returned by `list_performance_metrics`.	required
`ascending`	`bool`	If True, sort the results in ascending order. Defaults to False.	`False`

Returns:

Type	Description
`pd.DataFrame`	The performance score of all models in the data collection.

`set_base_data_split(self, data_split_name=None)` `inherited` ¶

Set the base data split to use for all operations in the explainer.

Parameters:

Name	Type	Description	Default
`data_split_name`	`Optional[str]`	Name of the data split.	`None`

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> explainer = tru.get_explainer("split1")

# Gets labels for data-split "split1".
>>> explainer.get_ys()

>>> explainer.set_base_data_split("split2")

# Gets labels for data-split "split2".
>>> explainer.get_ys()

`set_comparison_data_splits(self, comparison_data_splits=None, use_all_data_splits=False)` `inherited` ¶

Sets comparison data splits to use for all operations in the explainer.

Parameters:

Name	Type	Description	Default
`comparison_data_splits`	`Optional[Sequence[str]]`	List of data split names for comparison. Ignored if `use_all_data_splits` is set to True.	`None`
`use_all_data_splits`	`bool`	If set to True, uses all data splits in the data collection except the base data split as comparison data splits.	`False`

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> explainer = tru.get_explainer("split1")
>>> explainer.set_comparison_data_splits(["split2", "split3"])

# This will compute AUC metrics for "split1" and compare to "split2" and "split3".
>>> explainer.compute_performance("AUC")

`set_segment(self, segment_group_name, segment_name)` `inherited` ¶

Sets and applies a segment filter to all explainer operations.

Parameters:

Name	Type	Description	Default
`segment_group_name`	`str`	Name of segment group under which the segment is defined.	required
`segment_name`	`str`	Name of the segment.	required

Examples:

# Assuming `tru` is a `TrueraWorkspace`.
>>> explainer = tru.get_explainer("split1")

# This will return all the xs in "split1".
>>> explainer.get_xs()

>>> explainer.set_segment("segment_group1", "segment1_in_segment_group1")

# This will return only the xs in "split1" filtered to the segment "segment1_in_segment_group1".
>>> explainer.get_xs()
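Segments are defined by filter expressions such as the `'membership == 0'` definitions passed to `add_segment_group` in the `compute_fairness` example. Locally, a similar filter can be approximated with `pandas.DataFrame.query`, as sketched below. This assumes the segment expressions are also valid pandas query syntax, which may not hold for every TruEra segment definition, and the data is hypothetical.

```python
import pandas as pd

xs = pd.DataFrame({
    "membership": [0, 1, 1, 0, 1],
    "wealth": [120.0, 4500.0, 800.0, 3000.0, 50.0],
})

# Hypothetical segment group, mirroring the definitions used with add_segment_group.
segment_group = {"zero": "membership == 0", "one": "membership == 1"}

# Roughly the rows that set_segment("membership", "one") followed by get_xs() narrows to.
segment_xs = xs.query(segment_group["one"])
print(segment_xs)

# clear_segment() corresponds to going back to the full split.
print(len(xs), len(segment_xs))  # 5 3
```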

`IngestionClient` ¶

Client for ingesting data from a variety of sources into the TruEra product.

`add_credential(self, name, secret, identity=None, is_aws_iam_role=False)` ¶

Add a new credential to the TruEra product. The credential is stored securely and is used to authenticate with the data source in order to perform various operations (read, filter, sample, etc.).

Parameters:

Name	Type	Description	Default
`name`	`str`	Friendly name of the credential.	required
`secret`	`str`	The secret to be stored.	required
`identity`	`str`	Identity portion of the secret. Not needed in all cases. Defaults to None.	`None`

Returns:

Type	Description
`Credential`	Returns an object with the credential name and id. The secret is not stored in this object.

Examples:

>>> ACCESS_KEY = "access_key"
>>> SECRET_KEY = "asdf1234asdf1234"
>>> ingestion_client.add_credential(
        name="credential_1",
        secret=SECRET_KEY,
        identity=ACCESS_KEY
    )

`add_data_source(self, name, uri, credentials=None, **kwargs)` ¶

Add a new data source to the system.

Parameters:

Name	Type	Description	Default
`name`	`str`	Friendly name of the data source.	required
`uri`	`str`	URI describing the location of the data source. For local files, this can be file:///path/to/my/file or /path/to/my/file. For files stored in Azure Storage Blobs, the expected path is wasb://container@account.blob.core.windows.net/blob. For files stored in S3 buckets, the expected path is s3://bucket-name/file. For MySQL connections, the expected path is mysql://database-endpoint.com:port. For Hive connections, the expected path is hive2://database-endpoint.com:port. For JDBC connections, the expected path is jdbc:://database-endpoint.com:port. For BigQuery connections, pass the pseudo-URI 'bigquery:'; the connection details are embedded in the JSON service account key.	required
`credentials`	`Credential`	Provide the credential object if the data source requires authentication to read from it. Defaults to None.	`None`
`**format`	`str`	The format in which the file (local) or blob (AWS S3, Azure WASB, etc.) is stored. Supported formats: CSV and Parquet.	required
`**column_schema`	`Union[str, List[Tuple[str, str]]]`	A schema that the data source should respect. This can be provided either as a path to a JSON/YAML file containing the schema, or as a list of columns, each represented as a tuple.	required
`**first_row_is_header`	`bool`	For text based delimited files (csv, tsv etc.), indicates if the first row provides header information. Defaults to True.	required
`**column_delimiter`	`str`	For text based delimited files (csv, tsv etc.), provides the delimiter to separate column values. Defaults to ','.	required
`**quote_character`	`str`	For text based delimited files (csv, tsv etc.), if quotes are used provide the quote character. Defaults to '"'.	required
`**null_value`	`str`	For text based delimited files (csv, tsv etc.), the string that signifies null value. Defaults to 'null'.	required
`**empty_value`	`str`	For text based delimited files (csv, tsv etc.), the string that signifies empty value. Defaults to '""'.	required
`**date_format`	`str`	For text-based delimited files (CSV, TSV, etc.), the format string for any column that contains date-time values. Defaults to 'yyyy-MM-dd HH:mm:ssZZ'.	`'yyyy-MM-dd HH:mm:ssZZ'`
`**account_key`	`str`	For reading from Azure Storage Blob (WASB), the account key used to read the blob. Not required if a `credentials` object is provided.	required
`**access_key_id`	`str`	For reading from an S3 bucket, the access key ID used to read the object. Not required if a `credentials` object is provided or if the underlying deployment runs in a role with access to the bucket.	required
`**secret_access_key`	`str`	For reading from an S3 bucket, the secret access key used to read the object. Not required if a `credentials` object is provided or if the underlying deployment runs in a role with access to the bucket.	required
`**database_name`	`str`	For reading from a MySQL database, the name of the database to use. Required for MySQL data sources.	required
`**table_name`	`str`	For reading from a MySQL database, the name of the table to use. Required for MySQL data sources.	required

Returns:

Type	Description
`Table`	Returns a Table object which allows interaction with the attached data.

Examples:

# Adding a local file
>>> table = ingestion_client.add_data_source(
        name="local_data_1",
        uri="path/to/data.parquet"
    )

# Adding a data source from S3
>>> credentials = ingestion_client.add_credential(
        name="s3_credential", secret="...", identity="..."
    )
>>> table = ingestion_client.add_data_source(
        name="s3_data_1",
        uri="s3://some-data-bucket/data.parquet",
        credentials=credentials
    )
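
The `uri` conventions listed for `add_data_source` amount to a lookup on the URI scheme. The sketch below uses a hypothetical helper, `describe_data_source_uri`, which is not part of the TruEra SDK. It reports which kind of source a URI refers to under those conventions and assumes that a URI without a scheme is a local path.

```python
from urllib.parse import urlparse

# Scheme -> source kind, following the URI conventions documented for
# add_data_source. This mapping is an illustration, not SDK code.
_SCHEMES = {
    "file": "local file",
    "wasb": "Azure Storage Blob",
    "s3": "S3 bucket",
    "mysql": "MySQL database",
    "hive2": "Hive database",
    "jdbc": "JDBC connection",
    "bigquery": "BigQuery",
}


def describe_data_source_uri(uri: str) -> str:
    """Return the kind of source a data source URI points to (hypothetical helper)."""
    scheme = urlparse(uri).scheme.lower()
    if scheme == "":
        # Assumption: a bare path such as /path/to/my/file is a local file.
        return "local file"
    try:
        return _SCHEMES[scheme]
    except KeyError:
        raise ValueError(f"Unsupported URI scheme: {scheme!r}") from None


print(describe_data_source_uri("s3://some-data-bucket/data.parquet"))
```

Checking the scheme locally like this can catch a typo such as `s4://` before any request reaches the service.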

`delete_credential(self, name)` ¶

Delete a credential in the TruEra product.

Parameters:

Name	Type	Description	Default
`name`	`str`	Friendly name of the credential.	required

Examples:

>>> ingestion_client.delete_credential("credential_1")

`get_credential(self, name)` ¶

Get metadata about a credential in the TruEra product. Response does not contain the credential itself.

Parameters:

Name	Type	Description	Default
`name`	`str`	Friendly name of the credential.	required

Returns:

Type	Description
`dict`	Dictionary containing metadata describing that credential.

Examples:

>>> credential_metadata = ingestion_client.get_credential("credential_1")

`get_data_source(self, name)` ¶

Get a data source that was already created in the system.

Parameters:

Name	Type	Description	Default
`name`	`str`	The friendly name of the data source.	required

Returns:

Type	Description
`Table`	Returns a Table object which allows interaction with the attached data.

Examples:

>>> table = ingestion_client.get_data_source("table1")

`update_credential(self, name, secret, identity=None)` ¶

Update the identity and/or secret of an existing credential.

Parameters:

Name	Type	Description	Default
`name`	`str`	Friendly name of the credential.	required
`secret`	`str`	The secret to be stored.	required
`identity`	`str`	Identity portion of the secret. Not needed in all cases. Defaults to None.	`None`

Returns:

Type	Description
`Credential`	Returns an object with the credential name and id. The secret is not stored in this object.

Examples:

>>> ACCESS_KEY = "access_key"
>>> SECRET_KEY = "asdf1234asdf1234"
>>> ingestion_client.add_credential(
        name="credential_1",
        secret=SECRET_KEY,
        identity=ACCESS_KEY
    )
>>> ingestion_client.update_credential(
        name="credential_1",
        secret="new_secret_1234",
        identity="new_identity"
    )

`Table` ¶

`add_data_split(self, data_split_name, data_split_type, label_col_name=None, id_col_name=None, sample_count=5000, sample_kind='random', *, seed=None, prediction_col_name=None, pre_data_additional_skip_cols=None, model_name=None, wait=True, timeout_seconds=300, timestamp_col_name=None, score_type=None, train_baseline_model=False, **kwargs)` ¶

Ingest the Table as a split in TruEra to use in analytics.

Parameters:

Name	Type	Description	Default
`data_split_name`	`str`	Name of the data split.	required
`data_split_type`	`str`	Type of the data split, options are ['all', 'train', 'test', 'validate', 'oot', 'custom']	required
`label_col_name`	`str`	Name of the label/ground truth/target column in the table.	`None`
`sample_count`	`int`	Maximum rows to use when creating the split. Defaults to 5000.	`5000`
`sample_kind`	`str`	Specifies the strategy to use while sub-sampling the rows. Defaults to "random".	`'random'`
`wait`	`bool`	Whether to wait for the TruEra service to complete creating the data split. Defaults to True.	`True`
`timeout_seconds`	`int`	Timeout used when `wait` is set to True. Defaults to 300.	`300`

Returns:

Type	Description
`Mapping[str, str]`	Returns a dictionary with `data_split_name`, `operation_id` and `status` of the operation.

`add_extra_data(self, data_split_name, extras_col_names, id_col_name, sample_count=5000, sample_kind='random', *, timestamp_col_name=None, seed=None, wait=True, timeout_seconds=300, **kwargs)` ¶

Upload extra_data from Table to an existing split in TruEra.

Parameters:

Name	Type	Description	Default
`data_split_name`	`str`	Name of the existing data split.	required
`extras_col_names`	`Union[str, Sequence[str]]`	Name(s) of the extra data column(s) in the table.	required
`id_col_name`	`str`	Name of the id column used to match extra data with the corresponding data points.	required
`sample_count`	`int`	Maximum rows to use when creating the split. Defaults to 5000.	`5000`
`sample_kind`	`str`	Specifies the strategy to use while sub-sampling the rows. One of ["random", "first"].	`'random'`
`timestamp_col_name`	`Optional[str]`	Name of the timestamp column of the extra data.	`None`
`wait`	`bool`	Whether to wait for the TruEra service to complete creating the data split. Defaults to True.	`True`
`timeout_seconds`	`int`	Timeout used when `wait` is set to True. Defaults to 300.	`300`

Returns:

Type	Description
`Mapping[str, str]`	Returns a dictionary with `data_split_name`, `operation_id` and `status` of the operation.

`add_feature_influences(self, data_split_name, feature_influence_col_names, id_col_name, model_name, *, background_split_name=None, timestamp_col_name=None, score_type=None, wait=True, timeout_seconds=300, influence_type=None, **kwargs)` ¶

Upload feature influences from Table to an existing split/model in TruEra.

Parameters:

Name	Type	Description	Default
`data_split_name`	`str`	Name of the existing data split.	required
`feature_influence_col_names`	`Sequence[str]`	Names of the feature influence columns in the table.	required
`id_col_name`	`str`	Name of the id column used to match feature influences with the corresponding data points.	required
`model_name`	`str`	Name of the model for which feature influences are computed.	required
`background_split_name`	`Optional[str]`	Name of the split that contains the background data for the feature influence computation. If not provided, defaults to the default background data split of the given data collection.	`None`
`timestamp_col_name`	`Optional[str]`	Name of the timestamp column of the feature influences.	`None`
`score_type`	`Optional[str]`	String name of score type (QoI) for prediction column.	`None`
`wait`	`bool`	Whether to wait for the TruEra service to complete creating the data split. Defaults to True.	`True`
`timeout_seconds`	`int`	Timeout used when `wait` is set to True. Defaults to 300.	`300`
`influence_type`	`Optional[str]`	Influence algorithm used to generate influences. If influence type of project is set to "truera-qii", assumes that explanations are generated using truera-qii. If influence type of project is set to "shap", then `influence_type` must be passed in as one of ["tree-shap-tree-path-dependent", "tree-shap-interventional", "kernel-shap"].	`None`

Returns:

Type	Description
`Mapping[str, str]`	Returns a dictionary with `data_split_name`, `operation_id` and `status` of the operation.
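
The `influence_type` rule above can be written as a small validation step. The following client-side check is hypothetical: the function name and the way the project's influence type is passed in are assumptions and not SDK API. It mirrors the documented behavior. A "truera-qii" project needs no `influence_type`, and a "shap" project must name one of the three SHAP variants.

```python
from typing import Optional

# Influence algorithms accepted when the project's influence type is "shap",
# as listed in the add_feature_influences documentation.
SHAP_INFLUENCE_TYPES = (
    "tree-shap-tree-path-dependent",
    "tree-shap-interventional",
    "kernel-shap",
)


def resolve_influence_type(project_influence_type: str, influence_type: Optional[str]) -> str:
    """Return the influence algorithm to associate with uploaded influences (hypothetical helper)."""
    if project_influence_type == "truera-qii":
        # Documented behavior: explanations are assumed to come from truera-qii.
        return "truera-qii"
    if project_influence_type == "shap":
        if influence_type not in SHAP_INFLUENCE_TYPES:
            raise ValueError(
                f"influence_type must be one of {list(SHAP_INFLUENCE_TYPES)} "
                f"for a 'shap' project, got {influence_type!r}"
            )
        return influence_type
    raise ValueError(f"Unknown project influence type: {project_influence_type!r}")


print(resolve_influence_type("shap", "kernel-shap"))
```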

`add_labels(self, data_split_name, label_col_name, id_col_name, sample_count=5000, sample_kind='random', *, timestamp_col_name=None, seed=None, wait=True, timeout_seconds=300, model_name=None, **kwargs)` ¶

Upload labels from Table to an existing split in TruEra.

Parameters:

Name	Type	Description	Default
`data_split_name`	`str`	Name of the existing data split.	required
`label_col_name`	`str`	Name of the label/ground truth/target column in the table.	required
`id_col_name`	`str`	Name of the id column used to match labels with the corresponding data points.	required
`sample_count`	`int`	Maximum rows to use when creating the split. Defaults to 5000.	`5000`
`sample_kind`	`str`	Specifies the strategy to use while sub-sampling the rows. One of ["random", "first"].	`'random'`
`timestamp_col_name`	`Optional[str]`	Name of the timestamp column of the labels.	`None`
`wait`	`bool`	Whether to wait for the TruEra service to complete creating the data split. Defaults to True.	`True`
`timeout_seconds`	`int`	Timeout used when `wait` is set to True. Defaults to 300.	`300`

Returns:

Type	Description
`Mapping[str, str]`	Returns a dictionary with `data_split_name`, `operation_id` and `status` of the operation.

`add_predictions(self, data_split_name, prediction_col_name, id_col_name, model_name, *, timestamp_col_name=None, score_type=None, wait=True, timeout_seconds=300, **kwargs)` ¶

Upload predictions from Table to an existing split/model in TruEra.

Parameters:

Name	Type	Description	Default
`data_split_name`	`str`	Name of the existing data split.	required
`prediction_col_name`	`str`	Name of the prediction column in the table.	required
`id_col_name`	`str`	Name of the id column used to match predictions with the corresponding data points.	required
`timestamp_col_name`	`Optional[str]`	Name of the timestamp column of the predictions.	`None`
`score_type`	`Optional[str]`	String name of score type (QoI) for prediction column.	`None`
`wait`	`bool`	Whether to wait for the TruEra service to complete creating the data split. Defaults to True.	`True`
`timeout_seconds`	`int`	Timeout used when `wait` is set to True. Defaults to 300.	`300`

Returns:

Type	Description
`Mapping[str, str]`	Returns a dictionary with `data_split_name`, `operation_id` and `status` of the operation.

`append_to_data_split(self, data_split_name, id_col_name, *, sample_count=5000, sample_kind='random', seed=None, label_col_name=None, prediction_col_name=None, wait=True, timeout_seconds=300, timestamp_col_name=None, model_name=None, score_type=None, **kwargs)` ¶

Ingest the Table into an existing split in TruEra to use in analytics. All columns of the table are ingested into pre_data, except for the label and prediction columns if they are specified.

Parameters:

Name	Type	Description	Default
`data_split_name`	`str`	Name of the data split.	required
`id_col_name`	`str`	Name of the column to use for unique ID.	required
`sample_count`	`int`	Maximum rows to use when creating the split. Defaults to 5000.	`5000`
`sample_kind`	`str`	Specifies the strategy to use while sub-sampling the rows. Options are "random" and "first". It is not recommended to use "first" as it may result in a non-uniform sampling. Defaults to "random".	`'random'`
`label_col_name`	`str`	Name of the label/ground truth/target column in the table.	`None`
`prediction_col_name`	`str`	Name of the prediction column in the table.	`None`
`timestamp_col_name`	`str`	Name of the timestamp column (if using).	`None`
`seed`	`int`	Seed for reproducing the same ingestion - defaults to a random seed.	`None`
`wait`	`bool`	Whether to wait for the TruEra service to complete creating the data split. Defaults to True.	`True`
`timeout_seconds`	`int`	Timeout used when `wait` is set to True. Defaults to 300.	`300`

Returns:

Type	Description
`Mapping[str, str]`	Returns a dictionary with `data_split_name`, `operation_id` and `status` of the operation.
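
The following stdlib-only sketch shows what `sample_kind` and `seed` control when sub-sampling row IDs. It is not TruEra's server-side sampler, and `subsample_ids` is a hypothetical name. It shows two things: a fixed `seed` makes "random" sampling reproducible, and "first" can be unrepresentative when rows are ordered, for example by time.

```python
import random
from typing import List, Optional, Sequence


def subsample_ids(
    ids: Sequence[str],
    sample_count: int = 5000,
    sample_kind: str = "random",
    seed: Optional[int] = None,
) -> List[str]:
    """Pick at most sample_count IDs (illustrative sketch, not the SDK implementation)."""
    if len(ids) <= sample_count:
        return list(ids)
    if sample_kind == "first":
        # Takes a prefix, which is biased if the table is sorted by time, label, etc.
        return list(ids[:sample_count])
    if sample_kind == "random":
        # A fixed seed yields the same sample on every run; None seeds from system entropy.
        rng = random.Random(seed)
        return rng.sample(list(ids), sample_count)
    raise ValueError('sample_kind must be "random" or "first"')


row_ids = [f"row-{i}" for i in range(100)]
print(subsample_ids(row_ids, sample_count=5, seed=42) == subsample_ids(row_ids, sample_count=5, seed=42))
```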

`filter(self, expression)` ¶

[Alpha] Filter a table by providing a boolean expression.

Parameters:

Name	Type	Description	Default
`expression`	`str`	The expression used to filter the table. Simple SQL expressions are supported: `=` filters for equality, e.g. amount = 1000; `!=` filters for inequality, e.g. amount != 1000; `<` filters for less-than, e.g. amount < 1000; `<=` filters for less-than-or-equal, e.g. amount <= 1000; `>` filters for greater-than, e.g. amount > 1000; `>=` filters for greater-than-or-equal, e.g. amount >= 1000; `NOT` keeps records for which the inner condition is false, e.g. NOT(amount >= 1000); `AND` keeps records for which both conditions are true, e.g. (amount >= 1000) AND (state = 'WA'); `OR` keeps records for which either condition is true, e.g. (amount >= 1000) OR (state = 'WA'). String literals must be enclosed in single quotes (''); numeric literals must not be quoted. The left side of a binary expression must be a column name (without quotes) and the right side must be a literal. For example, (amount < salary) is not a valid expression, because both sides are column names.	required

Exceptions:

Type	Description
`ValueError`	Raised if provided expression is `None` or empty.

Returns:

Type	Description
`Table`	Returns a table which points to the filtered rows.
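
The single-comparison rules above can be mirrored locally on plain Python rows, for example to sanity-check an expression before sending it. The rules are: a column name on the left, and a quoted string or unquoted number on the right. The sketch below is hypothetical (`filter_rows` is not an SDK function). It covers only one comparison and does not implement `NOT`, `AND` or `OR`. It is not how the TruEra service evaluates filters.

```python
import operator
import re
from typing import Any, Callable, Dict, List

# Comparison operators documented for Table.filter, mapped to Python equivalents.
_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# Column name, operator, then either a single-quoted string or an unquoted number.
_COMPARISON = re.compile(
    r"^\s*(?P<col>[A-Za-z_]\w*)\s*(?P<op><=|>=|!=|=|<|>)\s*"
    r"(?:'(?P<str>[^']*)'|(?P<num>-?\d+(?:\.\d+)?))\s*$"
)


def filter_rows(rows: List[Dict[str, Any]], expression: str) -> List[Dict[str, Any]]:
    """Keep rows matching a single `column OP literal` comparison (illustrative sketch)."""
    if not expression or not expression.strip():
        raise ValueError("expression must not be None or empty")
    match = _COMPARISON.match(expression)
    if match is None:
        # Covers column-vs-column comparisons such as "amount < salary".
        raise ValueError(f"Unsupported expression: {expression!r}")
    if match.group("str") is not None:
        literal: Any = match.group("str")
    else:
        literal = float(match.group("num"))
    compare = _OPS[match.group("op")]
    return [row for row in rows if compare(row[match.group("col")], literal)]


rows = [{"amount": 500, "state": "WA"}, {"amount": 1500, "state": "CA"}]
print(filter_rows(rows, "state = 'WA'"))
```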

`get_sample_rows(self, count=10, *, wait=True, timeout_seconds=300)` ¶

Get sampled rows from the table.

Parameters:

Name	Type	Description	Default
`count`	`int`	The number of rows to sample. Maximum allowed is 2000. Defaults to 10.	`10`
`wait`	`bool`	If set to true, the client will wait until timeout to get data from the service. This is useful when the data is still being fetched or filtered. Defaults to True.	`True`
`timeout_seconds`	`int`	Timeout used when `wait` is set to True. Defaults to 300.	`300`

Returns:

Type	Description
`pd.DataFrame`	Returns a pandas DataFrame containing the sampled rows.

`merge_dataframes_and_create_column_spec(id_col_name, timestamp_col_name=None, pre_data=None, post_data=None, predictions=None, labels=None, extra_data=None, feature_influences=None, feature_influence_suffix='_truera-qii_influence')` ¶

Helper function that merges multiple DataFrames into one and generates the corresponding ColumnSpec.

Parameters:

Name	Type	Description	Default
`id_col_name`	`str`	Id column name.	required
`timestamp_col_name`	`Optional[str]`	Timestamp column name. Defaults to None.	`None`
`pre_data`	`Optional[pd.DataFrame]`	DataFrame corresponding to pre data. Defaults to None.	`None`
`post_data`	`Optional[pd.DataFrame]`	DataFrame corresponding to post data. Defaults to None.	`None`
`predictions`	`Optional[pd.DataFrame]`	DataFrame corresponding to predictions. Defaults to None.	`None`
`labels`	`Optional[pd.DataFrame]`	DataFrame corresponding to labels. Defaults to None.	`None`
`extra_data`	`Optional[pd.DataFrame]`	DataFrame corresponding to extra data. Defaults to None.	`None`
`feature_influences`	`Optional[pd.DataFrame]`	DataFrame corresponding to feature influences. Defaults to None.	`None`
`feature_influence_suffix`	`Optional[str]`	Suffix appended to the column names of feature influences to prevent duplicate column names. Defaults to FEATURE_INFLUENCE_SUFFIX_TRUERA_QII.	`'_truera-qii_influence'`

Returns:

Type	Description
`Tuple[pd.DataFrame, ColumnSpec]`	A tuple consisting of the merged DataFrame and corresponding ColumnSpec
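
Feature influence columns usually share their names with the pre-data feature columns, so merging both into one DataFrame would create duplicate columns. `feature_influence_suffix` exists to prevent this. The sketch below only illustrates the renaming idea and does not reproduce the helper's merge or ColumnSpec construction. `suffix_influence_columns` is hypothetical, and keeping the id column unsuffixed (so that it can serve as the join key) is an assumption.

```python
from typing import Dict, List

FEATURE_INFLUENCE_SUFFIX = "_truera-qii_influence"  # documented default suffix


def suffix_influence_columns(
    influence_columns: List[str],
    id_col_name: str,
    suffix: str = FEATURE_INFLUENCE_SUFFIX,
) -> Dict[str, str]:
    """Map each influence column to a suffixed name, leaving the id column as-is (hypothetical)."""
    return {
        col: col if col == id_col_name else f"{col}{suffix}"
        for col in influence_columns
    }


print(suffix_influence_columns(["id", "age", "income"], id_col_name="id"))
```

With pandas, a mapping like this could be applied with `DataFrame.rename(columns=mapping)` before merging on the id column.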

`Segment` ¶

`apply(self, data)` ¶

Applies the filter associated with this segment to the provided data.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Data to apply segment on.	required

Returns:

Type	Description
`np.ndarray`	Boolean array of the same length as `data`, indicating whether each row meets the filter requirements.

`pretty_print(self)` ¶

Print out the filter associated with this segment.

`SegmentGroup` ¶

`get_segments(self)` ¶

Returns all segments present in the given segment group.

Returns:

Type	Description
`Mapping[str, Segment]`	Map from each segment name to its corresponding Segment object.

`pretty_print(self)` ¶

Print out this segmentation and its associated segments.

`SplineFitter` ¶

`init(self, xs, ys, qiis)` `special` ¶

Construct a spline fitter.

Parameters:

Name	Type	Description	Default
`xs`	`pd.DataFrame`	x-values.	required
`ys`	`Union[pd.DataFrame, np.ndarray]`	y-values.	required
`qiis`	`pd.DataFrame`	QII/influences/shapley-values.	required

`construct_gam(self, n_splines=10, spline_orders=3, constraints=None, outer_model=LogisticRegression())` ¶

Construct a GAM based off QII splines.

Parameters:

Name	Type	Description	Default
`n_splines`	`int`	Number of splines per feature. Defaults to 10.	`10`
`spline_orders`	`int`	Order of splines. Defaults to 3.	`3`
`constraints`	`Union[str, Mapping[str, str]]`	If a single str, then the constraints for all features. Otherwise a dict from feature to constraints. All constraints must be one of [None, "monotonic_inc", "monotonic_dec"]. Defaults to None.	`None`
`outer_model`	`Any`	Model to combine splines. This must work with sklearn.pipeline.Pipeline. Defaults to LogisticRegression().	`LogisticRegression()`

Returns:

Type	Description
`Pipeline`	GAM model.
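
The `constraints` argument of `construct_gam` accepts either a single string that applies to all features or a per-feature mapping. A normalization step that turns either form into one constraint per feature, and rejects values outside the documented set, might look like the following. This is a hypothetical sketch and is not taken from the SplineFitter source. Treating features that are absent from the mapping as unconstrained is an assumption.

```python
from typing import Dict, Mapping, Optional, Sequence, Union

# Allowed values, as documented for construct_gam and fit_spline.
VALID_CONSTRAINTS = (None, "monotonic_inc", "monotonic_dec")


def expand_constraints(
    features: Sequence[str],
    constraints: Union[None, str, Mapping[str, Optional[str]]] = None,
) -> Dict[str, Optional[str]]:
    """Return one constraint per feature (hypothetical helper, not SplineFitter code)."""
    if constraints is None or isinstance(constraints, str):
        # A single value (or None) applies to every feature.
        per_feature = {feature: constraints for feature in features}
    else:
        # Assumption: features missing from the mapping are left unconstrained.
        per_feature = {feature: constraints.get(feature) for feature in features}
    for feature, constraint in per_feature.items():
        if constraint not in VALID_CONSTRAINTS:
            raise ValueError(f"Invalid constraint {constraint!r} for feature {feature!r}")
    return per_feature


print(expand_constraints(["age", "income"], {"age": "monotonic_inc"}))
```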

`fit_spline(self, feature, n_splines=10, spline_order=1, constraints=None)` ¶

Compute a spline for a single feature.

Parameters:

Name	Type	Description	Default
`feature`	`str`	Feature to compute splines for.	required
`n_splines`	`int`	Number of splines per feature. Defaults to 10.	`10`
`spline_order`	`int`	Order of splines. Defaults to 1.	`1`
`constraints`	`str`	The constraints for the spline --- must be one of [None, "monotonic_inc", "monotonic_dec"]. Defaults to None.	`None`

Returns:

Type	Description
`Callable[[Union[pd.Series, np.ndarray]], np.ndarray]`	Spline.

`plot_isp(self, feature, figsize=(21, 6))` ¶

Plot the influence sensitivity plot (ISP) of a specific feature along with the associated spline.

Parameters:

Name	Type	Description	Default
`feature`	`str`	Feature to plot the ISP of.	required
`figsize`	`Tuple[int, int]`	Size for plot. Defaults to (21, 6).	`(21, 6)`

`plot_isps(self, features=None, figsize=(21, 6))` ¶

Plot the influence sensitivity plot (ISP) of a set of features along with the associated spline.

Parameters:

Name	Type	Description	Default
`features`	`Sequence[str]`	Features to plot the ISP of. Defaults to None, which is all features.	`None`
`figsize`	`Tuple[int, int]`	Size for plot. Defaults to (21, 6).	`(21, 6)`

`Tester (ABC)` ¶

`add_fairness_test(self, test_name, metric='DISPARATE_IMPACT_RATIO', *, data_split_names=None, data_split_name_regex=None, all_protected_segments=False, protected_segments=None, comparison_segments=None, data_collection_names=None, all_data_collections=False, warn_if_less_than=None, warn_if_greater_than=None, warn_if_within=None, warn_if_outside=None, fail_if_less_than=None, fail_if_greater_than=None, fail_if_within=None, fail_if_outside=None, description=None, overwrite=False)` ¶

Add a fairness test to the current data collection in context. To set a warning condition, provide one of [warn_if_less_than, warn_if_greater_than, warn_if_within, warn_if_outside]. Similarly, to set a fail condition, provide one of [fail_if_less_than, fail_if_greater_than, fail_if_within, fail_if_outside].

Parameters:

Name	Type	Description	Default
`test_name`	`str`	The name of the test.	required
`metric`	`str`	Fairness metric for the test. Must be one of the options returned by `list_fairness_metrics`.	`'DISPARATE_IMPACT_RATIO'`
`data_split_names`	`Optional[Sequence[str]]`	List of the data splits that we want to use for the test.	`None`
`all_protected_segments`	`bool`	If set to `True`, the test will apply to all protected segments (including future ones). Defaults to `False` which means the test will only apply to the protected segments specified in `protected_segments`.	`False`
`protected_segments`	`Optional[Sequence[Tuple[str, str]]]`	List of `(segment_group_name, segment_name)` tuples to be used as protected segments.	`None`
`comparison_segments`	`Optional[Sequence[Tuple[str, str]]]`	List of `(segment_group_name, segment_name)` tuples to be used as comparison segments. Defaults to `None`, which means the complement of the protected segment is used.	`None`
`data_collection_names`	`Optional[Sequence[str]]`	List of the data collections for the test. If not specified, the test will only apply to the current data collection in context.	`None`
`all_data_collections`	`bool`	If set to `True`, the test will apply to all data collections (including future ones). Defaults to `False` which means the test will only apply to the current data collection in context (if `data_collection_names` is not specified).	`False`
`warn_if_less_than`	`Optional[float]`	Warn if score is less than the value specified in this argument.	`None`
`warn_if_greater_than`	`Optional[float]`	Warn if score is greater than the value specified in this argument.	`None`
`warn_if_within`	`Optional[Tuple[float, float]]`	Warn if `value[0] < score < value[1]`.	`None`
`warn_if_outside`	`Optional[Tuple[float, float]]`	Warn if `score < value[0] OR score > value[1]`.	`None`
`fail_if_less_than`	`Optional[float]`	Fail if score is less than the value specified in this argument.	`None`
`fail_if_greater_than`	`Optional[float]`	Fail if score is greater than the value specified in this argument.	`None`
`fail_if_within`	`Optional[Tuple[float, float]]`	Fail if `value[0] < score < value[1]`.	`None`
`fail_if_outside`	`Optional[Tuple[float, float]]`	Fail if `score < value[0] OR score > value[1]`.	`None`
`description`	`Optional[str]`	Text description of the test.	`None`
`overwrite`	`bool`	If set to `True`, will overwrite the thresholds for existing test specified under the given data_split_name, segment, and metric. Defaults to `False`.	`False`

Examples:

# Explicitly specifying the comparison segment
>>> tru.tester.add_fairness_test(
    test_name="Fairness Test",
    data_split_names=["split1_name", "split2_name"],
    protected_segments=[("segment_group_name", "protected_segment_name")],
    comparison_segments=[("segment_group_name", "comparison_segment_name")],
    metric="DISPARATE_IMPACT_RATIO",
    warn_if_outside=(0.8, 1.25),
    fail_if_outside=(0.5, 2)
)

# If no comparison segment is specified, the complement of the protected
# segment is used as the comparison segment
>>> tru.tester.add_fairness_test(
    test_name="Fairness Test",
    data_split_names=["split1_name", "split2_name"],
    protected_segments=[("segment_group_name", "protected_segment_name")],
    metric="DISPARATE_IMPACT_RATIO",
    warn_if_outside=(0.9, 1.15),
    fail_if_outside=(0.8, 1.25)
)
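
The `*_if_outside` conditions and the disparate impact ratio can be checked by hand. The sketch below makes two assumptions that this reference does not confirm. First, it uses the common definition of the disparate impact ratio: the positive-prediction rate of the protected segment divided by that of the comparison segment. Second, a failure takes precedence over a warning. None of these functions are part of the SDK.

```python
from typing import Optional, Sequence, Tuple


def disparate_impact_ratio(protected_preds: Sequence[int], comparison_preds: Sequence[int]) -> float:
    """Positive rate of the protected segment / positive rate of the comparison segment.

    Assumption: the common textbook definition; TruEra's computation may differ.
    """
    protected_rate = sum(protected_preds) / len(protected_preds)
    comparison_rate = sum(comparison_preds) / len(comparison_preds)
    if comparison_rate == 0:
        raise ValueError("comparison segment has no positive predictions")
    return protected_rate / comparison_rate


def is_outside(score: float, bounds: Optional[Tuple[float, float]]) -> bool:
    """True when score < bounds[0] or score > bounds[1], as documented for *_if_outside."""
    return bounds is not None and (score < bounds[0] or score > bounds[1])


def fairness_outcome(
    score: float,
    warn_if_outside: Optional[Tuple[float, float]] = None,
    fail_if_outside: Optional[Tuple[float, float]] = None,
) -> str:
    """Classify a score; assumption: the fail condition is checked before the warning."""
    if is_outside(score, fail_if_outside):
        return "FAIL"
    if is_outside(score, warn_if_outside):
        return "WARN"
    return "PASS"


# Binary predictions (1 = positive outcome) for each segment.
ratio = disparate_impact_ratio([1, 0, 0, 0, 1], [1, 1, 0, 1, 0])  # 0.4 / 0.6
print(fairness_outcome(ratio, warn_if_outside=(0.8, 1.25), fail_if_outside=(0.5, 2)))  # WARN
```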

`add_feature_importance_test(self, test_name, *, data_split_names=None, data_split_name_regex=None, min_importance_value, background_split_name=None, score_type=None, segments=None, data_collection_names=None, warn_if_greater_than=None, fail_if_greater_than=None, description=None, overwrite=False)` ¶

Add a feature importance test to the current data collection in context. To set warning condition, please provide warn_if_greater_than. Similarly, to set fail condition please provide fail_if_greater_than.

Parameters:

Name	Type	Description	Default
`test_name`	`str`	The name of the test.	required
`data_split_names`	`Optional[Sequence[str]]`	List of the data splits that we want to use for the test.	`None`
`data_split_name_regex`	`Optional[str]`	Regex of the data split name that we want to use for the test (future data splits that match this naming pattern will automatically be included in the test).	`None`
`min_importance_value`	`float`	Minimum global importance value of a feature.	required
`background_split_name`	`Optional[str]`	The name of the data split to be used as background data for computing feature influences. If None, this value will be inferred from the project settings. Defaults to None.	`None`
`score_type`	`Optional[str]`	The score type to use when computing influences. If None, this value will be inferred from the project settings. Defaults to None. For a list of valid score types, see `list_valid_score_types`.	`None`
`segments`	`Optional[Tuple[str, str]]`	List of `(segment_group_name, segment_name)` tuples to be used as filters. A test will be defined on each of the given segment filters.	`None`
`warn_if_greater_than`	`Optional[float]`	Warn if more than this number of features have global importance values lower than `min_importance_value`.	`None`
`fail_if_greater_than`	`Optional[float]`	Fail if more than this number of features have global importance values lower than `min_importance_value`.	`None`
`description`	`Optional[str]`	Text description of the test.	`None`
`overwrite`	`bool`	If set to `True`, will overwrite the thresholds for existing test specified under the given `data_split_name` and segment. Defaults to `False`.	`False`

Examples:

>>> tru.tester.add_feature_importance_test(
    test_name="Feature Importance Test",
    data_split_names=["split1_name", "split2_name"],
    min_importance_value=0.01,
    background_split_name="background_split_name",
    score_type="logits",  # e.g. "regression", or "logits"/"probits" for a classification project
    warn_if_greater_than=5, # warn if number of features with global importance values lower than `min_importance_value` is > 5
    fail_if_greater_than=10
)
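
The feature importance test counts the features whose global importance falls below `min_importance_value` and compares that count with the thresholds. The following hypothetical helper applies that counting rule to a plain mapping of global importances. It assumes, as in the other tests, that a failure takes precedence over a warning. It does not compute influences and is not SDK code.

```python
from typing import Mapping, Optional


def feature_importance_outcome(
    global_importances: Mapping[str, float],
    min_importance_value: float,
    warn_if_greater_than: Optional[float] = None,
    fail_if_greater_than: Optional[float] = None,
) -> str:
    """Count low-importance features and compare the count with the thresholds (hypothetical)."""
    num_low = sum(1 for value in global_importances.values() if value < min_importance_value)
    if fail_if_greater_than is not None and num_low > fail_if_greater_than:
        return "FAIL"
    if warn_if_greater_than is not None and num_low > warn_if_greater_than:
        return "WARN"
    return "PASS"


importances = {"age": 0.2, "income": 0.15, "zip": 0.005, "id": 0.0, "color": 0.009}
# Three features fall below 0.01: more than 2 (warn) but not more than 5 (fail).
print(feature_importance_outcome(importances, 0.01, warn_if_greater_than=2, fail_if_greater_than=5))  # WARN
```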

`add_performance_test(self, test_name, metric, *, data_split_names=None, data_split_name_regex=None, data_collection_names=None, all_data_collections=False, segments=None, warn_if_less_than=None, warn_if_greater_than=None, warn_if_within=None, warn_if_outside=None, warn_threshold_type='ABSOLUTE', fail_if_less_than=None, fail_if_greater_than=None, fail_if_within=None, fail_if_outside=None, fail_threshold_type='ABSOLUTE', reference_split_name=None, reference_model_name=None, description=None, overwrite=False)` ¶

Add a performance test group to the current data collection in context. To set a warning condition, provide one of [warn_if_less_than, warn_if_greater_than, warn_if_within, warn_if_outside]. Similarly, to set a fail condition, provide one of [fail_if_less_than, fail_if_greater_than, fail_if_within, fail_if_outside].

Parameters:

Name	Type	Description	Default
`test_name`	`str`	The name of the test.	required
`metric`	`str`	Performance metric for the test. Must be one of the options returned by `list_performance_metrics`.	required
`data_split_names`	`Optional[Sequence[str]]`	List of the data splits that we want to use for the test.	`None`
`data_split_name_regex`	`Optional[str]`	Regex of the data split name that we want to use for the test (future data splits that match this naming pattern will automatically be included in the test).	`None`
`data_collection_names`	`Optional[Sequence[str]]`	List of the data collections for the test. If not specified, the test will only apply to the current data collection in context.	`None`
`all_data_collections`	`bool`	If set to `True`, the test will apply to all data collections (including future ones). Defaults to `False` which means the test will only apply to the current data collection in context (if `data_collection_names` is not specified).	`False`
`segments`	`Optional[Sequence[Tuple[str, str]]]`	List of `(segment_group_name, segment_name)` tuples to be used as filters. A test will be defined on each of the given segment filters.	`None`
`warn_if_less_than`	`Optional[float]`	Warn if score is less than the value specified in this argument.	`None`
`warn_if_greater_than`	`Optional[float]`	Warn if score is greater than the value specified in this argument.	`None`
`warn_if_within`	`Optional[Tuple[float, float]]`	Warn if `value[0] < score < value[1]`.	`None`
`warn_if_outside`	`Optional[Tuple[float, float]]`	Warn if `score < value[0] OR score > value[1]`.	`None`
`warn_threshold_type`	`str`	Must be one of ["ABSOLUTE", "RELATIVE"]. Describes whether the warning threshold is an absolute value or is relative to the split in `reference_split_name`. If relative, the effective threshold is `score_of_reference_split * (1 + value)`. Defaults to "ABSOLUTE".	`'ABSOLUTE'`
`fail_if_less_than`	`Optional[float]`	Fail if score is less than the value specified in this argument.	`None`
`fail_if_greater_than`	`Optional[float]`	Fail if score is greater than the value specified in this argument.	`None`
`fail_if_within`	`Optional[Tuple[float, float]]`	Fail if `value[0] < score < value[1]`.	`None`
`fail_if_outside`	`Optional[Tuple[float, float]]`	Fail if `score < value[0] OR score > value[1]`.	`None`
`fail_threshold_type`	`str`	Must be one of ["ABSOLUTE", "RELATIVE"]. Describes whether the fail threshold is an absolute value or is relative to the split in `reference_split_name`. If relative, the effective threshold is `score_of_reference_split * (1 + value)`. Defaults to "ABSOLUTE".	`'ABSOLUTE'`
`reference_split_name`	`Optional[str]`	Name of the reference split used for the "RELATIVE" threshold type. If neither this nor `reference_model_name` is provided, the relative threshold is calculated with respect to each model's train split; models without a specified train split are treated as if no thresholds were specified.	`None`
`reference_model_name`	`Optional[str]`	Name of the reference model used for the "RELATIVE" threshold type.	`None`
`description`	`Optional[str]`	Text description of the test.	`None`
`overwrite`	`bool`	If set to `True`, will overwrite the thresholds for existing test specified under the given data_split_name, segment, and metric. Defaults to `False`.	`False`

Examples of adding performance tests with absolute thresholds:

# Performance test on multiple data splits with a single value threshold
>>> tru.tester.add_performance_test(
    test_name="Accuracy Test 1",
    data_split_names=["split1_name", "split2_name"],
    metric="CLASSIFICATION_ACCURACY",
    warn_if_less_than=0.85,
    fail_if_less_than=0.82
)

# Alternatively, the data splits can be specified with a regex
>>> tru.tester.add_performance_test(
    test_name="Accuracy Test 2",
    data_split_name_regex=".*-California", # this test will be run on all data splits where the name contains "-California"
    all_data_collections=True, # this test will be applicable to all data collections
    metric="CLASSIFICATION_ACCURACY",
    warn_if_less_than=0.85,
    fail_if_less_than=0.82
)

# Performance test using a segment with a single value threshold
>>> tru.tester.add_performance_test(
    test_name="Accuracy Test 3",
    data_split_names=["split1_name", "split2_name"],
    segments=[("segment_group_name", "segment_name")],
    metric="FALSE_POSITIVE_RATE",
    warn_if_greater_than=0.05,
    fail_if_greater_than=0.1
)

# Performance test with a range threshold
>>> tru.tester.add_performance_test(
    test_name="Accuracy Test 4",
    data_split_names=["split1_name", "split2_name"],
    metric="FALSE_NEGATIVE_RATE",
    warn_if_outside=(0.05, 0.1),
    fail_if_outside=(0.02, 0.15)
)

Examples of adding performance tests with relative thresholds:

# Explicitly specifying the reference split of a RELATIVE threshold
>>> tru.tester.add_performance_test(
    test_name="Accuracy Test 5",
    data_split_names=["split1_name", "split2_name"],
    metric="CLASSIFICATION_ACCURACY",
    warn_if_less_than=-0.05, # warn if accuracy of split < (1 + -0.05) * accuracy of reference split
    warn_threshold_type="RELATIVE",
    fail_if_less_than=-0.08,
    fail_threshold_type="RELATIVE",
    reference_split_name="reference_split_name"
)

# If no reference split is specified for a RELATIVE threshold, each model's
# train split is used as the reference
>>> tru.tester.add_performance_test(
    test_name="Accuracy Test 6",
    data_split_names=["split1_name", "split2_name"],
    metric="FALSE_POSITIVE_RATE",
    warn_if_greater_than=0.02,
    warn_threshold_type="RELATIVE",
    fail_if_greater_than=0.021,
    fail_threshold_type="RELATIVE"
)

# RELATIVE test using reference model instead of reference split
>>> tru.tester.add_performance_test(
    test_name="Accuracy Test 7",
    data_split_names=["split1_name", "split2_name"],
    metric="CLASSIFICATION_ACCURACY",
    warn_if_less_than=0,
    warn_threshold_type="RELATIVE",
    fail_if_less_than=-0.01,
    fail_threshold_type="RELATIVE",
    reference_model_name="reference_model_name"
)

# RELATIVE test using both reference model and reference split
>>> tru.tester.add_performance_test(
    test_name="Accuracy Test 8",
    data_split_names=["split1_name", "split2_name"],
    metric="CLASSIFICATION_ACCURACY",
    warn_if_less_than=0,
    warn_threshold_type="RELATIVE",
    fail_if_less_than=-0.01,
    fail_threshold_type="RELATIVE",
    reference_model_name="reference_model_name",
    reference_split_name="reference_split_name"
)
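The comment in "Accuracy Test 5" states the rule for a RELATIVE threshold: the split's score is compared with `(1 + threshold) * reference score`, and when no reference split is given, the reference is the model's train split. The sketch below illustrates only those two statements; it assumes the same multiplier applies to `greater_than` thresholds, does not model the reference-model variants (Tests 7 and 8), and uses hypothetical helper names and made-up metric values.

```python
from typing import Dict, Optional


def reference_score(
    split_scores: Dict[str, float],
    train_split_name: str,
    reference_split_name: Optional[str] = None,
) -> float:
    """Hypothetical helper: baseline score, defaulting to the model's train split."""
    name = reference_split_name if reference_split_name is not None else train_split_name
    return split_scores[name]


def relative_threshold(value: float, reference: float) -> float:
    """Turn a RELATIVE threshold into an absolute cutoff: (1 + value) * reference."""
    return (1 + value) * reference


# Made-up CLASSIFICATION_ACCURACY values for one model
accuracy = {"train": 0.90, "reference_split_name": 0.88, "split1_name": 0.82}

# "Accuracy Test 5": explicit reference split
ref = reference_score(accuracy, "train", "reference_split_name")
warn_cutoff = relative_threshold(-0.05, ref)  # about 0.95 * 0.88 = 0.836
fail_cutoff = relative_threshold(-0.08, ref)  # about 0.92 * 0.88 = 0.8096
print(accuracy["split1_name"] < warn_cutoff)  # True  -> warning
print(accuracy["split1_name"] < fail_cutoff)  # False -> no failure

# "Accuracy Test 6": no reference split, so the train split is the baseline
fpr = {"train": 0.10, "split1_name": 0.10205}
train_ref = reference_score(fpr, "train")
print(fpr["split1_name"] > relative_threshold(0.02, train_ref))   # True  -> warning
print(fpr["split1_name"] > relative_threshold(0.021, train_ref))  # False -> no failure
```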

`add_stability_test(self, test_name, metric='DIFFERENCE_OF_MEAN', *, comparison_data_split_names=None, comparison_data_split_name_regex=None, base_data_split_name=None, data_collection_names=None, all_data_collections=False, segments=None, warn_if_less_than=None, warn_if_greater_than=None, warn_if_within=None, warn_if_outside=None, fail_if_less_than=None, fail_if_greater_than=None, fail_if_within=None, fail_if_outside=None, description=None, overwrite=False)` ¶

Add a stability test to the current data collection in context. To set a warning condition, provide one of [warn_if_less_than, warn_if_greater_than, warn_if_within, warn_if_outside]. Similarly, to set a fail condition, provide one of [fail_if_less_than, fail_if_greater_than, fail_if_within, fail_if_outside].
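Because each condition group takes a single option, a caller can check its keyword arguments before sending them. The sketch below is a hypothetical client-side check, not part of the SDK: it assumes that supplying more than one option from the same group is an error and that omitting a group entirely is allowed.

```python
from typing import Any, Dict, List

WARN_KEYS = ["warn_if_less_than", "warn_if_greater_than", "warn_if_within", "warn_if_outside"]
FAIL_KEYS = ["fail_if_less_than", "fail_if_greater_than", "fail_if_within", "fail_if_outside"]


def check_conditions(**kwargs: Any) -> Dict[str, List[str]]:
    """Hypothetical check: at most one warn condition and at most one fail condition."""
    chosen: Dict[str, List[str]] = {}
    for group, keys in (("warn", WARN_KEYS), ("fail", FAIL_KEYS)):
        given = [key for key in keys if kwargs.get(key) is not None]
        if len(given) > 1:
            raise ValueError(f"Provide only one {group} condition, got {given}")
        chosen[group] = given
    return chosen


print(check_conditions(warn_if_outside=(-1, 1), fail_if_outside=(-2, 2)))
```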

Parameters:

Name	Type	Description	Default
`test_name`	`str`	The name of the test.	required
`metric`	`str`	Stability metric for the test. Must be one of ["WASSERSTEIN", "DIFFERENCE_OF_MEAN", "POPULATION_STABILITY_INDEX"].	`'DIFFERENCE_OF_MEAN'`
`comparison_data_split_names`	`Optional[Sequence[str]]`	List of the data splits that we want to use for the test.	`None`
`comparison_data_split_name_regex`	`Optional[str]`	Regex of the data split name that we want to use for the test (future data splits that match this naming pattern will automatically be included in the test).	`None`
`base_data_split_name`	`Optional[str]`	Name of the reference data split to use as the comparison baseline for the test. If `None`, will be the model's train split.	`None`
`data_collection_names`	`Optional[Sequence[str]]`	List of the data collections for the test. If not specified, the test will only apply to the current data collection in context.	`None`
`all_data_collections`	`bool`	If set to `True`, the test will apply to all data collections (including future ones). Defaults to `False` which means the test will only apply to the current data collection in context (if `data_collection_names` is not specified).	`False`
`segments`	`Optional[Sequence[Tuple[str, str]]]`	List of `(segment_group_name, segment_name)` tuples to be used as filters. A separate test will be defined on each of the given segment filters.	`None`
`warn_if_less_than`	`Optional[float]`	Warn if score is less than the value specified in this argument.	`None`
`warn_if_greater_than`	`Optional[float]`	Warn if score is greater than the value specified in this argument.	`None`
`warn_if_within`	`Optional[Tuple[float, float]]`	Warn if `value[0] < score < value[1]`.	`None`
`warn_if_outside`	`Optional[Tuple[float, float]]`	Warn if `score < value[0] OR score > value[1]`.	`None`
`fail_if_less_than`	`Optional[float]`	Fail if score is less than the value specified in this argument.	`None`
`fail_if_greater_than`	`Optional[float]`	Fail if score is greater than the value specified in this argument.	`None`
`fail_if_within`	`Optional[Tuple[float, float]]`	Fail if `value[0] < score < value[1]`.	`None`
`fail_if_outside`	`Optional[Tuple[float, float]]`	Fail if `score < value[0] OR score > value[1]`.	`None`
`description`	`Optional[str]`	Text description of the test.	`None`
`overwrite`	`bool`	If set to `True`, overwrites the thresholds of the existing test identified by the given comparison data split name, segment, and metric. Defaults to `False`.	`False`
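The parameters above describe how one call can fan out into several tests: comparison splits are chosen by name or by regex (future matching splits are included automatically), one test is defined per segment filter, and `overwrite` targets a test identified by split, segment, and metric. The sketch below shows that fan-out with hypothetical helpers `select_splits` and `expand_tests`. It uses `re.search`, matching the "contains" reading in the performance-test regex example; whether the service anchors the pattern is an assumption.

```python
import re
from itertools import product
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[str, str]  # (segment_group_name, segment_name)


def select_splits(
    all_split_names: Sequence[str],
    names: Optional[Sequence[str]] = None,
    name_regex: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: resolve comparison splits from explicit names or a regex."""
    if names is not None:
        return list(names)
    if name_regex is not None:
        pattern = re.compile(name_regex)
        return [name for name in all_split_names if pattern.search(name)]
    return []


def expand_tests(
    splits: Sequence[str], segments: Optional[Sequence[Segment]], metric: str
) -> List[Tuple[str, Optional[Segment], str]]:
    """One (split, segment, metric) key per test; no segments means one unsegmented test per split."""
    segs: List[Optional[Segment]] = list(segments) if segments else [None]
    return [(split, seg, metric) for split, seg in product(splits, segs)]


existing = ["2023-01-California", "2023-02-California", "2023-02-Texas"]
splits = select_splits(existing, name_regex=".*-California")
for key in expand_tests(splits, [("region", "urban"), ("region", "rural")], "DIFFERENCE_OF_MEAN"):
    print(key)
```

Re-running `select_splits` after a new split such as `2023-03-California` is added picks it up, which mirrors the documented behavior of `comparison_data_split_name_regex`.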

Examples:

>>> tru.tester.add_stability_test(
    test_name="Stability Test",
    comparison_data_split_names=["split1_name", "split2_name"],
    base_data_split_name="reference_split_name",
    metric="DIFFERENCE_OF_MEAN",
    warn_if_outside=(-1, 1),
    fail_if_outside=(-2, 2)
)
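For intuition about the three stability metrics, the sketch below computes textbook versions on plain lists of numbers. These are standard definitions, not TruEra's implementation: which distribution the service compares (for example, model scores), the sign convention of `DIFFERENCE_OF_MEAN` (comparison minus base is assumed here), and how the service bins data for PSI are all assumptions. The last lines apply the example thresholds `(-1, 1)` and `(-2, 2)` to a made-up difference of means.

```python
import math
from statistics import mean
from typing import Sequence


def difference_of_mean(base: Sequence[float], comparison: Sequence[float]) -> float:
    """Mean of the comparison sample minus mean of the base sample (sign convention assumed)."""
    return mean(comparison) - mean(base)


def wasserstein_1(base: Sequence[float], comparison: Sequence[float]) -> float:
    """1-D Wasserstein distance for equal-size samples: mean absolute gap between sorted values."""
    if len(base) != len(comparison):
        raise ValueError("this simple version needs equal-size samples")
    return mean(abs(a - b) for a, b in zip(sorted(base), sorted(comparison)))


def population_stability_index(base_props: Sequence[float], comp_props: Sequence[float]) -> float:
    """PSI over pre-binned proportions: sum((c - b) * ln(c / b)); every bin must be non-empty."""
    return sum((c - b) * math.log(c / b) for b, c in zip(base_props, comp_props))


base = [0.2, 0.4, 0.5, 0.7]        # made-up base split values
comparison = [1.5, 1.7, 1.9, 2.1]  # made-up comparison split values

score = difference_of_mean(base, comparison)  # about 1.35
warn = score < -1 or score > 1   # warn_if_outside=(-1, 1)
fail = score < -2 or score > 2   # fail_if_outside=(-2, 2)
print(warn, fail)  # True False
print(population_stability_index([0.25, 0.25, 0.5], [0.2, 0.3, 0.5]))
```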

`delete_tests(self, test_name=None, test_type=None, data_split_name=None, segment_group_name=None, segment_name=None, metric=None)` ¶

Delete tests.

Parameters:

Name	Type	Description	Default
`test_name`	`Optional[str]`	Only delete tests with the given name.	`None`
`test_type`	`Optional[str]`	Only delete tests of this type. Must be one of ["performance", "stability", "fairness"] or None. If None, delete all test types. Defaults to None.	`None`
`data_split_name`	`Optional[str]`	Only delete tests associated with this data split. Defaults to None.	`None`
`segment_group_name`	`Optional[str]`	Only delete tests associated with this segment group. Defaults to None.	`None`
`segment_name`	`Optional[str]`	Only delete tests associated with this segment. Defaults to None.	`None`
`metric`	`Optional[str]`	Only delete tests associated with this metric. Defaults to None.	`None`
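Each argument of `delete_tests` narrows the selection, and an argument left as `None` places no restriction on that field. The sketch below models that filtering over an in-memory list. The `StoredTest` record and the `select_for_deletion` helper are hypothetical stand-ins; the real SDK stores tests server-side.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class StoredTest:
    """Hypothetical stand-in for a stored test definition."""
    test_name: str
    test_type: str  # "performance", "stability" or "fairness"
    data_split_name: str
    segment_group_name: Optional[str]
    segment_name: Optional[str]
    metric: str


def select_for_deletion(
    tests: Sequence[StoredTest],
    test_name: Optional[str] = None,
    test_type: Optional[str] = None,
    data_split_name: Optional[str] = None,
    segment_group_name: Optional[str] = None,
    segment_name: Optional[str] = None,
    metric: Optional[str] = None,
) -> List[StoredTest]:
    """Return the tests matching every filter that is not None."""
    filters = {
        "test_name": test_name,
        "test_type": test_type,
        "data_split_name": data_split_name,
        "segment_group_name": segment_group_name,
        "segment_name": segment_name,
        "metric": metric,
    }
    active = {field: value for field, value in filters.items() if value is not None}
    return [t for t in tests if all(getattr(t, f) == v for f, v in active.items())]


tests = [
    StoredTest("Accuracy Test 1", "performance", "split1_name", None, None, "CLASSIFICATION_ACCURACY"),
    StoredTest("Stability Test", "stability", "split1_name", None, None, "DIFFERENCE_OF_MEAN"),
    StoredTest("Accuracy Test 3", "performance", "split2_name", "segment_group_name", "segment_name", "FALSE_POSITIVE_RATE"),
]
print([t.test_name for t in select_for_deletion(tests, test_type="performance", data_split_name="split1_name")])
```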

`get_model_leaderboard(self, sort_by='performance', wait=True)` ¶

Get the summary of test outcomes for all models in the data collection.

Parameters:

Name	Type	Description	Default
`sort_by`	`str`	Rank models according to the test type specified in this arg (models with the fewest test failures will be at the top). Must be one of ["performance", "stability", "fairness"]. Defaults to "performance".	`'performance'`
`wait`	`bool`	Whether to wait for test results to finish computing. Defaults to True.	`True`

Returns:

Type	Description
`ModelTestLeaderboard`	A `ModelTestLeaderboard` object containing the summary of test outcomes for all models in the data collection. In Jupyter notebooks, this object is displayed as a formatted HTML table. It also provides `pretty_print`, `as_json`, and `as_dict` representations.
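The `sort_by` ordering (fewest failures of the chosen test type first) can be pictured with a plain sort over per-model failure counts. This is a sketch with made-up counts and a hypothetical `rank_models` helper; counting a missing test type as zero failures and keeping ties in input order are assumptions, not documented behavior.

```python
from typing import Dict, List

VALID_SORT_KEYS = ("performance", "stability", "fairness")


def rank_models(failures: Dict[str, Dict[str, int]], sort_by: str = "performance") -> List[str]:
    """Order model names by failure count for `sort_by`, fewest first (stable for ties)."""
    if sort_by not in VALID_SORT_KEYS:
        raise ValueError(f"sort_by must be one of {VALID_SORT_KEYS}")
    return sorted(failures, key=lambda model: failures[model].get(sort_by, 0))


failures = {
    "model_a": {"performance": 2, "stability": 0},
    "model_b": {"performance": 0, "stability": 3},
    "model_c": {"performance": 1, "stability": 1},
}
print(rank_models(failures))               # ['model_b', 'model_c', 'model_a']
print(rank_models(failures, "stability"))  # ['model_a', 'model_c', 'model_b']
```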

`get_model_test_results(self, data_split_name=None, comparison_models=None, test_types=None, wait=True)` ¶

Get the test results for the model in context.

Parameters:

Name	Type	Description	Default
`data_split_name`	`Optional[str]`	If provided, filters to the tests associated with this split.	`None`
`comparison_models`	`Optional[Sequence[str]]`	If provided, compare the test results against this list of models.	`None`
`test_types`	`Optional[Sequence[str]]`	If provided, filter to only the given test types. Must be a subset of ["performance", "stability", "fairness"] or None (which includes all test types). Defaults to None.	`None`
`wait`	`bool`	Whether to wait for test results to finish computing.	`True`

Returns:

Type	Description
`ModelTestResults`	A `ModelTestResults` object containing the test results for the model in context. In Jupyter notebooks, this object is displayed as a formatted HTML table. It also provides `pretty_print`, `as_json`, and `as_dict` representations.
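The `test_types` argument accepts either `None` (all test types) or a subset of the three documented types. A caller could normalize it as in the sketch below; `normalize_test_types` is a hypothetical helper, and returning the types in a fixed canonical order is a design choice of this sketch, not documented SDK behavior.

```python
from typing import List, Optional, Sequence

VALID_TEST_TYPES = ("performance", "stability", "fairness")


def normalize_test_types(test_types: Optional[Sequence[str]] = None) -> List[str]:
    """None means every test type; otherwise the values must be a subset of the valid types."""
    if test_types is None:
        return list(VALID_TEST_TYPES)
    unknown = set(test_types) - set(VALID_TEST_TYPES)
    if unknown:
        raise ValueError(f"Unknown test types: {sorted(unknown)}")
    # Keep a fixed canonical order regardless of the caller's order
    return [t for t in VALID_TEST_TYPES if t in test_types]


print(normalize_test_types(["fairness", "performance"]))  # ['performance', 'fairness']
```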

`get_model_tests(self, data_split_name=None)` ¶

Get the details of all the model tests in the current data collection or the model tests associated with the given data split.

Parameters:

Name	Type	Description	Default
`data_split_name`	`Optional[str]`	If provided, filters to the tests associated with this split.	`None`

Returns:

Type	Description
`ModelTestDetails`	A `ModelTestDetails` object containing the details for each test that has been created. In Jupyter notebooks, this object is displayed as a formatted HTML table. It also provides `pretty_print`, `as_json`, and `as_dict` representations.

`TrueraWorkspace (BaseTrueraWorkspace)` ¶

`__init__(self, connection_string, authentication, log_level=20, workspace_name='', **kwargs)` `special` ¶

`activate_client_setting(self, setting_name)` `inherited` ¶

`add_credential(self, name, secret, identity=None)` ¶

`add_data(self, data, *, data_split_name, column_spec, model_output_context=None, is_production_data=False, **kwargs)` ¶

`add_data_collection(self, data_collection_name, pre_to_post_feature_map=None, provide_transform_with_model=None)` ¶

`add_data_source(self, name, uri, credential=None, **kwargs)` ¶

`add_extra_data(self, extra_data, extras_col_names, id_col_name, **kwargs)` ¶

`add_feature_metadata(self, feature_description_map=None, group_to_feature_map=None, missing_values=None, force_update=False)` ¶

`add_labels(self, label_data, label_col_name, id_col_name, **kwargs)` ¶

`add_model(self, model_name, train_split_name=None, train_parameters=None)` ¶

`add_model_error_influences(self, error_influence_data, score_type=None, *, data_split_name=None, background_split_name=None, id_col_name=None, timestamp_col_name=None, influence_type=None)` ¶

`add_model_feature_influences(self, feature_influence_data, *, id_col_name, data_split_name=None, background_split_name=None, timestamp_col_name=None, influence_type=None, score_type=None)` ¶

`add_model_metadata(self, train_split_name=None, train_parameters=None, overwrite=False)` ¶

`add_model_predictions(self, prediction_data, id_col_name=None, *, prediction_col_name=None, data_split_name=None, ranking_group_id_column_name=None, ranking_item_id_column_name=None, score_type=None)` ¶

`add_nn_data_split(self, data_split_name, truera_wrappers, split_type='all', *, pre_data=None, label_data=None, label_col_name=None, id_col_name=None, extra_data_df=None)` ¶

`add_nn_model(self, model_name, truera_wrappers, attribution_config, model=None, train_split_name=None, train_parameters=None, **kwargs)` ¶

`add_packaged_python_model(self, model_name, model_dir, *, data_collection_name=None, train_split_name=None, train_parameters=None, verify_model=True, compute_predictions=None, compute_feature_influences=False, compute_for_all_splits=False)` ¶

`add_production_data(self, data, *, column_spec, model_output_context=None, **kwargs)` ¶

`add_project(self, project, score_type, input_type='tabular', num_default_influences=None)` ¶

`add_segment_group(self, name, segment_definitions)` ¶

`attach_packaged_python_model_object(self, model_object_dir, verify_model=True)` ¶

`attach_python_model_object(self, model_object, additional_pip_dependencies=None, verify_model=True)` ¶

`cancel_scheduled_ingestion(self, workflow_id)` ¶

`compute_all(self, computations=['predictions', 'feature influences', 'error influences'], models=None, data_splits=None, data_collection=None, num_influences=None)` `inherited` ¶

`compute_error_influences(self, start=0, stop=None, score_type=None, system_data=False, by_group=False, num_per_group=None, wait=True, ingest=True)` ¶

`compute_feature_influences(self, start=0, stop=None, score_type=None, system_data=False, by_group=False, num_per_group=None, wait=True, ingest=True)` ¶

`compute_predictions(self, start=0, stop=None, system_data=False, wait=True, ingest=True)` ¶

`create_packaged_python_model(self, output_dir, model_obj=None, additional_pip_dependencies=None, additional_modules=None, model_path=None, model_code_files=None, **kwargs)` ¶

`deactivate_client_setting(self, setting_name)` `inherited` ¶

`delete_credential(self, name)` ¶

`delete_data_collection(self, data_collection_name=None, *, recursive=False)` ¶

`delete_data_source(self, name)` ¶

`delete_data_split(self, data_split_name=None, *, recursive=False)` ¶

`delete_model(self, model_name=None, *, recursive=False)` ¶

`delete_model_metadata(self)` ¶

`delete_project(self, project_name=None)` ¶

`delete_segment_group(self, name)` ¶

`get_client_setting_value(self, setting_name)` `inherited` ¶

`get_credential_metadata(self, name)` ¶

`get_data_collections(self)` ¶

`get_data_source(self, name)` ¶

`get_data_sources(self)` ¶

`get_data_splits(self)` ¶

`get_default_performance_metrics(self)` ¶

`get_error_influences(self, start=0, stop=None, score_type=None, system_data=False, by_group=False, num_per_group=None)` ¶

`get_explainer(self, base_data_split=None, comparison_data_splits=None)` ¶

`get_feature_influences(self, start=0, stop=None, score_type=None, system_data=False, by_group=False, num_per_group=None)` ¶

`get_feature_names(self)` `inherited` ¶

`get_influence_type(self)` ¶

`get_influences_background_data_split(self, data_collection_name=None)` ¶

`get_ingestion_client(self)` ¶

`get_maximum_model_runner_failure_rate(self)` ¶

`get_model_metadata(self)` ¶

`get_model_threshold(self)` ¶

`get_models(self)` ¶

`get_nn_user_configs(self)` ¶

`get_num_default_influences(self)` ¶

`get_num_internal_qii_samples(self)` ¶

`get_predictions(self, start=0, stop=None, system_data=False, by_group=False, num_per_group=None)` ¶

`get_projects(self)` ¶

`get_ranking_k(self)` ¶

`get_scheduled_ingestion(self, workflow_id)` ¶

`get_segment_groups(self)` ¶

`get_xs(self, start=0, stop=None, extra_data=False, system_data=False, by_group=False, num_per_group=None)` ¶

`get_ys(self, start=0, stop=None, system_data=False, by_group=False, num_per_group=None)` ¶

`get_ys_pred(self, start=0, stop=None, system_data=False, by_group=False, num_per_group=None, wait=True)` ¶

`list_monitoring_tables(self)` ¶

`list_performance_metrics(self)` ¶

`list_scheduled_ingestions(self, last_key=None, limit=50)` ¶

`list_valid_score_types(self)` `inherited` ¶

`register_schema(self, schema)` ¶

`schedule_existing_data_split(self, split_name, cron_schedule, override_split_name=None, append=True)` ¶

`schedule_ingestion(self, raw_json, cron_schedule)` ¶

`serialize_split(self, split_name, override_split_name=None)` ¶

`set_as_protected_segment(self, segment_group_name, segment_name)` ¶

`set_data_collection(self, data_collection_name)` ¶

`set_data_split(self, data_split_name)` ¶

`set_default_performance_metrics(self, performance_metrics)` ¶

`set_influence_type(self, algorithm)` ¶