Quick Start Guide

Installation

H1st runs on Python 3.8 or above. Install via pip:

pip install --upgrade pip
pip3 install h1st

It is recommended to use a virtual environment to install H1st and manage its required dependencies. For users who have to run H1st on the operating system's own Python installation:

* For Windows, please use the 64-bit version of Python and install VS Build Tools before installing H1st.
* For Ubuntu, please ensure that the “python3-dev” and “python3-distutils” apt packages are installed before installing H1st.
* For MacOS, please use the official Python releases. Installing Python with Homebrew is NOT recommended, especially Python 3.10, due to incompatibility issues.
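Before installing, it can help to confirm that the interpreter meets the requirements above. The following is a minimal sketch using only the standard library; the helper names (python_is_supported, is_64bit, in_virtualenv) are illustrative and are not part of H1st.

```python
import sys


def python_is_supported(version_info=sys.version_info) -> bool:
    """H1st runs on Python 3.8 or above."""
    return tuple(version_info[:2]) >= (3, 8)


def is_64bit() -> bool:
    """True when the interpreter is a 64-bit build."""
    return sys.maxsize > 2**32


def in_virtualenv() -> bool:
    """True when running inside a venv/virtualenv environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Python 3.8+:", python_is_supported())
print("64-bit build:", is_64bit())
print("Virtual environment:", in_virtualenv())
```

If the last line reports False, consider creating a virtual environment (for example with the standard venv module) before running pip.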

Model Repository

When using the H1st framework, the storage, loading and versioning of machine-learning models are handled by the framework, but you still need to designate where the models will be stored.

This is done in one of two ways:

* Set the H1ST_MODEL_REPO_PATH environment variable. This can point either to a local storage repository or to an S3 bucket.
* Create a config.py file in the same directory as your model module, and define MODEL_REPO_PATH = '/path/to/model/repo/'
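As a rough illustration of the first option, the environment variable can also be set from Python at the top of a script. This is only a sketch: the directory name h1st_model_repo is a hypothetical choice, and since exactly when H1st reads the variable is not specified here, it is safest to set it before any model is persisted or loaded.

```python
import os
import tempfile

# Hypothetical location: any writable local directory would do.
repo_path = os.path.join(tempfile.gettempdir(), "h1st_model_repo")
os.makedirs(repo_path, exist_ok=True)

# Point the framework at the repository for the rest of this process.
os.environ["H1ST_MODEL_REPO_PATH"] = repo_path
print("Model repository:", os.environ["H1ST_MODEL_REPO_PATH"])
```

Setting the variable this way only affects the current process; for a permanent setup, export it in your shell profile or use the config.py option instead.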

H1st Graph

An H1st Graph object encapsulates an execution flow chart that allows Models and Actions to be smoothly tied together. This structure enables both ML and Human-knowledge models to be incorporated into a seamless data processing and inference pipeline.

This is an example of a very simple graph which prints hello for each even number x in the input stream, using a conditional RuleBasedModel. The rule-based model extends the h1st Model class, implying that it processes the incoming data to produce some prediction, transformation or analysis on that data. The HelloPrinter class extends the h1st Action class, implying that it performs a simple action in response to some input.

In this case the model determines whether or not the input data is an even number. The Action HelloPrinter is only passed the even-number data where the model returned True.

The H1st graph itself is created by adding nodes incrementally.

from h1st.h1flow.h1flow import Graph
from h1st.h1flow.h1step import Decision, NoOp

g = Graph()
g.start()

# In the first Node, the data is passed to the RuleBasedModel's "process"
# method and a Decision is rendered from model output.
g.add(Decision(RuleBasedModel(),
               result_field="predictions",
               decision_field="prediction"))

# In the second Node, the True or "yes" data is passed to HelloPrinter's
# "call" function, and the False or "no" data is passed to NoOp's "call"
# function. NoOp is the default null Action, so nothing happens.
g.add(yes=HelloPrinter(), no=NoOp())
g.end()

# Now that the graph is built, we pass input data through it
results = g.predict({"values": range(6)})

Note that the first Node is an h1st Decision which redirects the data flow into the later yes and no nodes based on the RuleBasedModel’s predictions. Inference is done in batch-mode so first the RuleBasedModel predicts for all input values, then the “yes” or True values are passed to the HelloPrinter action while the “no” or False values go to the NoOp null Action.

In terms of data flow, the input to the Graph predict method is passed to the RuleBasedModel’s process function which in turn produces a dict. The Decision object forwards a dict containing the result_field “predictions” to the HelloPrinter’s call function if the Decision is True and does nothing (NoOp) if the decision is False. In this way, the Graph simplifies the construction of complex action and information relay flows.

Hello world 0!
Hello world 2!
Hello world 4!
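The batch routing described above can be imitated in plain Python to make the data flow concrete. This is a conceptual sketch only, not H1st's implementation: the functions is_even, route and hello_lines are hypothetical stand-ins for the RuleBasedModel, the Decision node and the HelloPrinter action.

```python
def is_even(values):
    """Stand-in for the rule-based model: one boolean per input value."""
    return [x % 2 == 0 for x in values]


def route(values, predictions):
    """Stand-in for the Decision node: split a batch on the predictions."""
    yes = [v for v, p in zip(values, predictions) if p]
    no = [v for v, p in zip(values, predictions) if not p]
    return yes, no


def hello_lines(values):
    """Stand-in for the HelloPrinter action: one greeting per value."""
    return [f"Hello world {v}!" for v in values]


values = list(range(6))
predictions = is_even(values)                        # whole batch is predicted first
yes_values, no_values = route(values, predictions)   # then the batch is split
lines = hello_lines(yes_values)                      # "yes" branch acts
for line in lines:                                   # "no" branch does nothing
    print(line)
```

The sketch prints the same three lines as the graph example, because only 0, 2 and 4 reach the "yes" branch.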

MLModeler and MLModel

In h1st, we explicitly split the machine learning activities into two categories and assign them to MLModeler and MLModel. MLModeler is responsible for data loading, data exploration, data preparation and model training/building while MLModel generates predictions, persists and loads model parameters.

The easiest way to understand H1st Model is as a standardized format for writing information processing nodes. Furthermore, the root H1st Model class already handles most of the functionality needed to manage the full life-cycle of the model from persisting, loading, and version control. For machine-learning models or any model that requires the fitting of parameters or various processes for model creation, the H1st system highly recommends the creation of an accompanying H1st Modeler. This is because Model creation lies outside of the operational cycle. In this way, to implement an H1st Model all you really need is to implement the process function. A Modeler will implement activities such as model training/building/evaluation, data loading, data preparation, and data exploration.

The MLModel class extends the base Model with a predict method that aliases the process method, to support traditional ML design flows. Additionally, MLModel helps clarify which components of a complex system are powered by machine learning and which are not, since many types of Models in an H1st AI system can act on data through rules, logic or analysis.
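The aliasing idea can be pictured with a few lines of plain Python. The classes below (SketchModel, SketchMLModel, Doubler) are hypothetical illustrations of the pattern and are not the H1st source code.

```python
class SketchModel:
    """Illustrative base class: subclasses implement process()."""

    def process(self, input_data: dict) -> dict:
        raise NotImplementedError


class SketchMLModel(SketchModel):
    """Illustrative ML flavour: predict() simply forwards to process()."""

    def predict(self, input_data: dict) -> dict:
        return self.process(input_data)


class Doubler(SketchMLModel):
    """Toy model that only implements process()."""

    def process(self, input_data: dict) -> dict:
        return {"predictions": [2 * x for x in input_data["X"]]}


result = Doubler().predict({"X": [1, 2, 3]})
print(result)  # {'predictions': [2, 4, 6]}
```

Because predict forwards to process, a subclass written this way can be called through either name and behaves the same.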

Below is an example of an H1st MLModel and MLModeler that use an underlying scikit-learn RandomForestClassifier model. Note that while MLModels have a predict function, it is simply an alias for the Model.process function, so only process needs to be implemented.

Custom MLModeler and MLModel

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

from h1st.model.ml_model import MLModel
from h1st.model.ml_modeler import MLModeler

class MyMLModel(MLModel):
    def process(self, data: dict) -> dict:
        """Run model inference on incoming data and return predictions in a
           dictionary.
           Input data should have key 'X' with data values
        """
        predictions = self.base_model.predict(data['X'])
        return {'predictions': predictions}


class MyMLModeler(MLModeler):
    def __init__(self):
        super().__init__()
        self.model_class = MyMLModel

    def train_base_model(self, prepared_data):
        """trains and returns the base ML model that will be wrapped by the
           H1st MyMLModel
        """
        X, y = prepared_data['x_train'], prepared_data['y_train']
        model = RandomForestClassifier(random_state=0)
        model.fit(X, y)
        return model

    def load_data(self):
        """Implementing this function is optional, alternatively data can
           be passed directly to the build_model function. If implemented,
           the build_model function can be run without any input.
        """
        pass

    def evaluate_model(self, data: dict, ml_model: MLModel) -> dict:
        """Optional, if implemented then metrics will be attached to the
           trained model created by the build_model method, and can be
           persisted along with the model
        """
        x_test = {'X': data['x_test']}
        y_test = data['y_test']
        y_pred = ml_model.predict(x_test)['predictions']
        accuracy = accuracy_score(y_test, y_pred)
        return {'accuracy_score': accuracy}

By calling MLModeler’s build_model method, you get an instance of MLModel and are able to run inference on new data and evaluate the model’s accuracy.

Model training and prediction

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
prepared_data = {'x_train': X_train, 'y_train': y_train,
                 'x_test': X_test, 'y_test': y_test}
my_modeler = MyMLModeler()
my_model = my_modeler.build_model(prepared_data)
accuracy = my_model.metrics['accuracy_score'] * 100
print(f"Accuracy (test): {accuracy:.1f}%")

When you are satisfied with the model, you can persist its parameters for later usage such as model serving.

Model persistence and loading

# This saves the model in the model repository, auto-generating the
# latest version number
version = my_model.persist()
# Alternatively, a specific version can be specified with my_model.persist(<version>)

# This loads the latest version of the model from the model repository
my_model_2 = MyMLModel().load_params()
# Alternatively, a specific version can be loaded with:
# my_model_2 = MyMLModel().load_params(<version>)

# Now you can run your predictions
y_pred = my_model_2.predict({'X': X_test})['predictions']
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy (test): {accuracy * 100:.1f}%")

# Additionally, any stats and metrics set during model creation were also
# persisted
accuracy = my_model_2.metrics['accuracy_score'] * 100
print(f"Stored Accuracy (test): {accuracy:.1f}%")

So each of your model classes only needs to implement the process method and extend the appropriate H1st model. Each of your modelers only needs the model_class attribute defined (either in __init__ or as a class variable) and the train_base_model method implemented. Optionally, you can implement the load_data and evaluate_model methods for enhanced functionality; everything else is taken care of by the framework.

To automate model persistence and loading, you only need to set the environment variable H1ST_MODEL_REPO_PATH or define MODEL_REPO_PATH in your config.py file (see the Installation section). The framework uses this to automate either local storage or storage in an S3 bucket. Currently, the H1st framework supports easy persistence and loading of any Model to a model repository as long as the base_model is serializable.
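If you are unsure whether a given base_model is serializable, a quick round-trip test can tell you before you try to persist it. The sketch below assumes pickle-style serialization, which is an assumption: the mechanism H1st actually uses is not stated here, and is_serializable is a hypothetical helper, not part of the framework.

```python
import pickle


def is_serializable(obj) -> bool:
    """Return True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Plain data structures round-trip without trouble.
print(is_serializable({"weights": [0.1, 0.2]}))  # True

# A lambda cannot be pickled, so a model holding one would not persist.
print(is_serializable(lambda x: x))  # False
```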

Pretty simple, isn’t it? Enjoy building your machine learning models!

Rule-Based Model

A rule-based model is an example of applying human knowledge to solve a problem. You could use boolean logic, fuzzy logic, decisions based on statistics, or any of the myriad other ways that humans solve problems.

A rule-based model is very useful for solving the cold start problem, where data is not available.

In the H1st framework, a human rule model can be implemented by sub-classing the h1st Model class or PredictiveModel class and implementing only the process() function. Basically, it’s just a model with no training (though training is not forbidden and is sometimes useful for human models too).

If for some reason your rule model does require training or parameter tuning, then you can implement a Modeler in the same way as the MLModeler, except that it extends the H1st Modeler class.

This particular simple model “predicts” if each given value in a stream is an even number or not.

from h1st.model.model import Model

class RuleBasedModel(Model):
    """
    Simple rule-based model that "predicts" if a given value is an even number
    """
    def process(self, input_data: dict) -> dict:
        predictions = [x % 2 == 0 for x in input_data["values"]]
        return {"predictions": predictions}

m = RuleBasedModel()
xs = list(range(6))
results = m.process({"values": xs})
predictions = results["predictions"]
print(f"RuleBasedModel's predictions for {xs} are {predictions}")

RuleBasedModel's predictions for [0, 1, 2, 3, 4, 5] are [True, False, True, False, True, False]

Oracle

The Oracle is a key component of the H1st framework. It automates the creation of K1st Predictive Nodes from just a Teacher model containing human knowledge and unlabeled data. Below is an example of how to use the Oracle class to create a full predictive ensemble (rule-based + ML) for predicting setosa irises with the iris dataset.

import pandas as pd
from sklearn.datasets import load_iris

from h1st.model.predictive_model import PredictiveModel
from h1st.model.oracle import Oracle

class IrisRuleModel(PredictiveModel):
    sepal_length_max: float = 6.0
    sepal_length_min: float = 4.0
    sepal_width_min: float = 3.0
    sepal_width_max: float = 4.6

    def process(self, data: dict) -> dict:
        """define a process method to take the input data and output a
           'prediction'
        """
        df = data['X']
        return {
            'predictions': pd.Series(map(
                self.predict_setosa, df['sepal_length'], df['sepal_width']
            ))}


    def predict_setosa(self, sepal_length, sepal_width):
        """Just a helper function"""
        # 0 = setosa, 1 = not setosa
        is_setosa = (
            self.sepal_length_min <= sepal_length <= self.sepal_length_max
            and self.sepal_width_min <= sepal_width <= self.sepal_width_max
        )
        return 0 if is_setosa else 1

# Load data
df_X, y = load_iris(as_frame=True, return_X_y=True)
df_X.columns = ['sepal_length','sepal_width','petal_length','petal_width']

# Build the Oracle
oracle = Oracle(teacher=IrisRuleModel())
oracle.build(
    data={'X': df_X},
    features=['sepal_length','sepal_width']
)
# This seems simple, but behind the scenes this system has used your
# IrisRuleModel to generate data labels and train an ML model which
# generalizes the knowledge laid out in your Rule Model

# Now your trained Oracle can be used for inference
# Behind the scenes prediction is being done by both the Teacher (rule
# model) and Student (ML model) and both of these predictions are being
# taken into consideration for final oracle prediction output
oracle.predict({'X': df_X[['sepal_length','sepal_width']]})

# If you've set up a path for the model repo (see Installation), then
# you can persist this built oracle for use later
oracle.persist('iris_oracle_v1')


# Finally, you can load this created oracle for use in inference later:
oracle_2 = Oracle(teacher=IrisRuleModel()).load('iris_oracle_v1')
oracle_2.predict({'X': df_X[['sepal_length','sepal_width']]})
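The teacher/student idea behind the Oracle can be sketched in plain Python. Everything here is an illustrative assumption rather than the Oracle's actual implementation: the data points are hand-written (sepal_length, sepal_width) pairs, a 1-nearest-neighbour lookup stands in for the ML student, and the "both must agree" ensemble rule is hypothetical.

```python
# Teacher: the same thresholds as IrisRuleModel (0 = setosa, 1 = not setosa).
def teacher(sepal_length, sepal_width):
    in_range = 4.0 <= sepal_length <= 6.0 and 3.0 <= sepal_width <= 4.6
    return 0 if in_range else 1


# Illustrative, hand-written (sepal_length, sepal_width) pairs with no labels.
unlabeled = [(5.1, 3.5), (4.9, 3.0), (7.0, 3.2), (6.4, 3.2),
             (5.0, 3.6), (6.3, 3.3), (5.8, 2.7)]

# Step 1: the teacher generates labels for the unlabeled data.
teacher_labels = [teacher(sl, sw) for sl, sw in unlabeled]


# Step 2: a student learns from those labels. A 1-nearest-neighbour lookup
# stands in for the trained ML model here.
def student(sepal_length, sepal_width):
    def sq_dist(point):
        return (point[0] - sepal_length) ** 2 + (point[1] - sepal_width) ** 2

    nearest = min(range(len(unlabeled)), key=lambda i: sq_dist(unlabeled[i]))
    return teacher_labels[nearest]


# Step 3: hypothetical ensemble rule: predict setosa only if both agree.
def ensemble(sepal_length, sepal_width):
    both_setosa = (teacher(sepal_length, sepal_width) == 0
                   and student(sepal_length, sepal_width) == 0)
    return 0 if both_setosa else 1


print(ensemble(5.2, 3.4))  # 0
print(ensemble(6.5, 3.1))  # 1
```

The point of the sketch is the order of operations: no ground-truth labels are needed, because the student is trained entirely on what the teacher says.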

Enjoy developing your AI systems!