Organizing a TruEra Project

A TruEra project organizes your work into a collection of related data and models.

The general project structure looks like this:

[Figure: project structure]

The essential concepts behind TruEra project organization, reflected in the diagram above and used throughout the TruEra ecosystem, are:

Project – a collection of machine learning experiments intended to solve a defined business problem characterized by discrete requirements and specific KPIs. Each project must contain at least one data collection and model.

Model – a trained machine learning classification or regression model, packaged to enable calculation of various model outcomes or results. Models receive inputs in a specified form known as the input data schema and then output a calculated prediction.

Data Collection – an organized inventory of data used within a particular project. A data collection consists of values corresponding to features, labels, and extra metadata arranged according to a common schema. The data itself can be provided to TruEra (ingested) in a number of ways, including flat files (.csv), pandas.DataFrame objects, and Data Lake connectors such as Amazon S3 buckets and Windows Azure Storage Blobs (wasb), among others.

Data Split – a specific subset of the project's data collection, made up of the following components:

  • Input data ready to be fed into the model (i.e., feature values).
  • Ground truth labels corresponding to the input data, if available.
  • Extra data used for deeper analysis such as creating segments.
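As a rough illustration of these three components, the sketch below separates a small table of rows into feature values, optional ground truth labels, and extra data. It is plain Python, not the TruEra SDK: the `DataSplit` class, the `build_split` helper, and the column names are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class DataSplit:
    """Hypothetical container for illustration; not a TruEra SDK class."""
    name: str
    features: List[Dict[str, Any]]                 # input data fed into the model
    labels: Optional[List[Any]] = None             # ground truth, if available
    extra: List[Dict[str, Any]] = field(default_factory=list)  # e.g. for segments


def build_split(
    name: str,
    rows: Sequence[Dict[str, Any]],
    feature_cols: Sequence[str],
    label_col: Optional[str] = None,
    extra_cols: Sequence[str] = (),
) -> DataSplit:
    """Partition each row's columns into features, label, and extra data."""
    features = [{c: r[c] for c in feature_cols} for r in rows]
    labels = [r[label_col] for r in rows] if label_col is not None else None
    extra = [{c: r[c] for c in extra_cols} for r in rows]
    return DataSplit(name, features, labels, extra)


# Made-up rows; the column names are assumptions for this sketch.
rows = [
    {"age": 34, "income": 52000, "defaulted": 0, "region": "west"},
    {"age": 51, "income": 87000, "defaulted": 1, "region": "east"},
]
split = build_split(
    "train_2023", rows,
    feature_cols=["age", "income"],
    label_col="defaulted",
    extra_cols=["region"],
)
```

Here `region` is never passed to the model; it is kept alongside the features so that it can later be used to define segments.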

Splits are denoted by their type or purpose. Valid types include:

  • all – default split type used to indicate "standard" data fed into models
  • train – training data used for models
  • test – test data used for the models during training
  • validate – validation data used to evaluate models
  • oot – out-of-time (OOT) or out-of-sample (OOS) data
  • custom – user-customized split type.
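To make the relationship between projects, data collections, splits, and split types concrete, here is a minimal sketch in plain Python. It is not the TruEra SDK: the `SplitType` enum, the `Project` class, and its methods are hypothetical names chosen for this example, and the only facts taken from the text above are the six split type names and the rule that a project needs at least one data collection and one model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class SplitType(Enum):
    """The split types listed above (hypothetical enum, not a TruEra class)."""
    ALL = "all"
    TRAIN = "train"
    TEST = "test"
    VALIDATE = "validate"
    OOT = "oot"
    CUSTOM = "custom"


@dataclass
class Project:
    """Hypothetical model of a project's structure, for illustration only."""
    name: str
    # data collection name -> {split name -> split type}
    data_collections: Dict[str, Dict[str, SplitType]] = field(default_factory=dict)
    models: List[str] = field(default_factory=list)

    def add_split(self, collection: str, split_name: str, split_type: str = "all") -> None:
        # SplitType(...) raises ValueError for a type that is not in the list.
        validated = SplitType(split_type)
        self.data_collections.setdefault(collection, {})[split_name] = validated

    def is_complete(self) -> bool:
        # A project must contain at least one data collection and one model.
        return bool(self.data_collections) and bool(self.models)


project = Project("credit_risk")
project.add_split("loans", "2023_train", "train")
project.add_split("loans", "2024_q1", "oot")
project.models.append("gbm_v1")
```

Calling `project.add_split("loans", "holdout_set", "holdout")` would raise `ValueError`, because `holdout` is not one of the listed types; in this sketch such a split would be registered with the `custom` type instead.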

Because features are often expanded over the course of a project, it is not uncommon for a single project to accumulate many different splits and models.

See Important Concepts for more on TruEra fundamentals.
