Organizing a TruEra Project
A TruEra project organizes your work into a collection of related data and models.
The general project structure looks like this:
Reflected in the diagram above and active throughout the TruEra ecosystem, the essential concepts supporting TruEra project organization include:
Project – a collection of machine learning experiments intended to solve a defined business problem characterized by discrete requirements and specific KPIs. Each project must contain at least one data collection and at least one model.
Model – a trained machine learning classification or regression model, packaged to enable calculation of various model outcomes or results. Models receive inputs in a specified form known as the input data schema and then output a calculated prediction.
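The idea of an input data schema can be illustrated with a minimal sketch in plain Python. This does not use the TruEra SDK; the schema, the feature names (`age`, `income`), and the functions `validate_row` and `predict` are hypothetical and exist only to show a model rejecting inputs that do not match its expected form.

```python
from typing import Dict, List

# Hypothetical input data schema: feature name -> expected Python type.
INPUT_SCHEMA: Dict[str, type] = {"age": int, "income": float}


def validate_row(row: Dict[str, object], schema: Dict[str, type]) -> None:
    """Raise ValueError if the row does not match the schema exactly."""
    if set(row) != set(schema):
        raise ValueError(f"expected columns {sorted(schema)}, got {sorted(row)}")
    for name, expected in schema.items():
        if not isinstance(row[name], expected):
            raise ValueError(f"column {name!r} must be {expected.__name__}")


def predict(rows: List[Dict[str, object]]) -> List[float]:
    """Toy stand-in for a trained regression model: a fixed linear formula."""
    for row in rows:
        validate_row(row, INPUT_SCHEMA)
    return [0.5 * row["age"] + 0.001 * row["income"] for row in rows]


# One input row in the expected form yields one calculated prediction.
predictions = predict([{"age": 30, "income": 50000.0}])
```

Checking the schema before scoring keeps a mismatch between data and model visible at the boundary instead of surfacing later as a confusing prediction error.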
Data Collection – an organized inventory of data used within a particular project. A data collection consists of values corresponding to features, labels, and extra metadata arranged according to a common schema. The data itself can be provided to TruEra (ingested) in a number of ways — from flat files (.csv), pandas.DataFrame objects, and even via Data Lake connectors like Amazon S3 buckets and Windows Azure Storage Blobs (wasb), among others.
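As a rough illustration of values arranged according to a common schema, the sketch below parses a small flat file with Python's standard `csv` module and groups each row's values into features, labels, and extra data. It is not the TruEra ingestion API; the file contents, the column names, the `SCHEMA` mapping, and the `load_collection` helper are all assumptions made for the example.

```python
import csv
import io

# Hypothetical flat file; in practice this would be a .csv file on disk.
RAW_CSV = """id,age,income,defaulted,region
1,30,50000,0,north
2,45,82000,1,south
"""

# Hypothetical common schema: every column is assigned exactly one role.
SCHEMA = {
    "features": ["age", "income"],
    "labels": ["defaulted"],
    "extra": ["region"],
}


def load_collection(text, schema, id_column="id"):
    """Group each row's values by role, keyed by the row's id."""
    rows_by_id = {}
    for row in csv.DictReader(io.StringIO(text)):
        rows_by_id[row[id_column]] = {
            role: {col: row[col] for col in cols}
            for role, cols in schema.items()
        }
    return rows_by_id


rows_by_id = load_collection(RAW_CSV, SCHEMA)
```

Values are kept as strings here because `csv.DictReader` does not infer types; a real pipeline would convert them according to the schema.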
Data Split – a specific subset of the project's data collection, made up of the following:
- Input data ready to be fed into the model (i.e., feature values).
- Ground truth labels corresponding to the input data, if available.
- Extra data used for deeper analysis such as creating segments.
Splits are denoted by their type or purpose. Valid types include:
all – default split type used to indicate "standard" data fed into models
train – training data used for models
test – test data used for the models during training
validate – validation data used to evaluate models
oot – out-of-time (OOT) or out-of-sample (OOS) data
custom – user-customized split type
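The relationship between a data collection, its splits, and the split types listed above can be sketched with a pair of small dataclasses. This is an illustrative model only, not TruEra's implementation; the class names `DataSplit` and `DataCollection`, their fields, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The split types listed above.
VALID_SPLIT_TYPES = {"all", "train", "test", "validate", "oot", "custom"}


@dataclass
class DataSplit:
    name: str
    split_type: str
    features: List[dict]                              # input data for the model
    labels: List[int] = field(default_factory=list)   # ground truth, if available
    extra: List[dict] = field(default_factory=list)   # extra data, e.g. for segments

    def __post_init__(self):
        if self.split_type not in VALID_SPLIT_TYPES:
            raise ValueError(f"unknown split type: {self.split_type!r}")
        if self.labels and len(self.labels) != len(self.features):
            raise ValueError("labels must align one-to-one with features")


@dataclass
class DataCollection:
    name: str
    splits: Dict[str, DataSplit] = field(default_factory=dict)

    def add_split(self, split: DataSplit) -> None:
        """Register a split under its name, replacing any split with the same name."""
        self.splits[split.name] = split


loans = DataCollection(name="loans")
loans.add_split(DataSplit("train", "train", [{"age": 30}, {"age": 45}], labels=[0, 1]))
loans.add_split(DataSplit("holdout", "oot", [{"age": 52}]))  # no labels available yet
```

Keeping the split type separate from the split name lets several splits of the same type (for example, multiple out-of-time periods) live in one collection.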
Features are continually expanded over the course of a project, so it is not uncommon to build upon many different splits and models.
See Important Concepts for more on TruEra fundamentals.