
Organizing a TruEra Project

A TruEra project organizes your work into a collection of related data and models.

The general project structure looks like this:

[Figure: project structure]

The essential concepts behind TruEra project organization, reflected in the diagram above and used throughout the TruEra ecosystem, are:

Project – a collection of machine learning experiments intended to solve a defined business problem characterized by discrete requirements and specific Key Performance Indicators (achievable targets that help measure progress against strategic objectives). Each project must contain at least one data collection and one model.

Model – a trained machine learning classification or regression model, packaged to enable calculation of various model outcomes or results. Models receive inputs in a specified form known as the input data schema and then output a calculated prediction.

Data Collection – an organized inventory of data used within a particular project. A data collection consists of values corresponding to features, labels, and extra metadata arranged according to a common schema. The data itself can be provided to TruEra (ingested) in a number of ways: from flat files (.csv), from pandas.DataFrame objects, and even via data lake connectors such as Amazon S3 buckets and Windows Azure Storage Blobs (WASB), among others. A data lake is a centralized repository that stores structured and unstructured data at any scale.

Data Split – a specific subset of the project's data collection, made up of the following:

  • Input data [ready to be] fed into the model (i.e., feature values)
  • Ground truth labels corresponding to the input data, if available. Ground truth is information known to be real or true, provided by direct observation and measurement (i.e., empirical evidence), as opposed to information provided by inference.
  • Extra data used for deeper analysis, such as creating segments
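As a rough illustration of the three parts listed above, the sketch below separates a small table into feature values, ground truth labels, and extra data. It uses only the Python standard library and does not call the TruEra SDK. The column names, the sample rows, and the `split_parts` helper are all hypothetical and chosen purely for illustration.

```python
import csv
import io

# Hypothetical sample data; the columns are illustrative, not a TruEra schema.
RAW_CSV = """age,income,approved,region
34,52000,1,west
51,87000,0,east
29,43000,1,west
"""

FEATURE_COLUMNS = ["age", "income"]  # input data fed into the model
LABEL_COLUMN = "approved"            # ground truth label, if available
EXTRA_COLUMNS = ["region"]           # extra data, e.g. for creating segments


def split_parts(csv_text):
    """Separate each row into feature values, a label, and extra data."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    features = [{c: float(r[c]) for c in FEATURE_COLUMNS} for r in rows]
    labels = [int(r[LABEL_COLUMN]) for r in rows]
    extra = [{c: r[c] for c in EXTRA_COLUMNS} for r in rows]
    return features, labels, extra


features, labels, extra = split_parts(RAW_CSV)

# The extra data is not a model input, but it can define a segment:
# here, the row indices belonging to the "west" region.
west_rows = [i for i, e in enumerate(extra) if e["region"] == "west"]
```

Keeping the extra columns apart from the features means they can drive segment analysis without changing what the model receives as input.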

Features are often expanded over the course of a project, so it is not uncommon to build upon many different splits and models.
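The relationships between these concepts can be sketched as plain Python data classes. This is only a mental model under stated assumptions: the class names, fields, and the `is_complete` check are hypothetical and are not part of the TruEra SDK. The one rule taken from the text above is that a project needs at least one data collection and one model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSplit:
    name: str  # e.g. "train" or "test" (illustrative names)


@dataclass
class DataCollection:
    name: str
    splits: List[DataSplit] = field(default_factory=list)


@dataclass
class Model:
    name: str


@dataclass
class Project:
    name: str
    data_collections: List[DataCollection] = field(default_factory=list)
    models: List[Model] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True when the project has at least one data collection and one model."""
        return bool(self.data_collections) and bool(self.models)


# Hypothetical usage: a project that grows to several splits and models.
project = Project(name="credit-risk")
incomplete = project.is_complete()  # no data collection or model yet

collection = DataCollection(name="applications")
collection.splits.append(DataSplit(name="train"))
collection.splits.append(DataSplit(name="test"))
project.data_collections.append(collection)
project.models.append(Model(name="baseline"))
project.models.append(Model(name="with-extra-features"))
complete = project.is_complete()
```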

See Important Concepts for more on TruEra fundamentals.
