STRIPPING

Pipelines to speed up your experimentation

With stripping, you can easily separate your code into steps and execute them in order. During execution, each step is cached to let you iterate faster. All of that with code that doesn't get in your way and requires minimal setup.
from os.path import join

import pandas as pd

from stripping import setup_stripping

st, c = setup_stripping('.stripping')

@st.step()
def load_dataset():
    c.ds_file = join("datasets", "dataset.csv")
    c.ds = pd.read_csv(c.ds_file)
In the code above, every individual task needed to train your model is separated into steps. Each step has access to a global Context object called c, which makes available all variables that you'd like to share across steps.
You'll notice that in the step named load_dataset, the Context object c stores the two variables ds_file and ds, which are later used inside the step named split_dataset. In turn, split_dataset stores the attributes X and y in the context object, which are then used inside the following step, named train_model.
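The sharing mechanism can be pictured with a plain namespace standing in for the Context object (an illustrative sketch with toy data, not the library's actual implementation):

```python
from types import SimpleNamespace

# Stand-in for the shared Context: an attribute set in one step is
# visible to every step that runs after it.
c = SimpleNamespace()

def load_dataset():
    c.ds = [[1.0, 10.0], [2.0, 20.0]]  # pretend rows read from the CSV

def split_dataset():
    c.X = [row[0] for row in c.ds]  # features come from the earlier step
    c.y = [row[1] for row in c.ds]

for current_step in (load_dataset, split_dataset):
    current_step()
```

Because every step reads from and writes to the same object, no step needs to know which step produced the values it consumes.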
Separating tasks into steps around a shared context lets Stripping cache the context after every step, so the data-manipulation steps don't have to run again each time you execute the training script. Stripping is smart enough to detect changes in any step and execute it again.
from sklearn.ensemble import RandomForestRegressor

@st.step(skip_cache=True)
def train_model():
    c.regressor = RandomForestRegressor(n_estimators=1000,
                                        random_state=0)
    c.regressor.fit(c.X, c.y)

st.execute()
You can also prevent caching of any given step, and stripping will execute that step on every run, even if its code has not changed from one execution to the next.
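One way to picture this behaviour is a decorator that caches a step's result under a hash of its compiled code, so an edited step reruns while an unchanged one is served from cache; skip_cache bypasses the cache entirely. This is a simplified sketch of the idea, not Stripping's actual implementation:

```python
import hashlib

_cache = {}  # step name -> (code hash, cached result)
calls = []   # records which step bodies actually ran

def step(skip_cache=False):
    """Cache a step's result, keyed on a hash of its compiled code."""
    def decorate(func):
        def run():
            if skip_cache:
                return func()  # always execute; never consult the cache
            # Hash the step's bytecode so edits invalidate the cache entry
            digest = hashlib.sha256(func.__code__.co_code).hexdigest()
            cached = _cache.get(func.__name__)
            if cached and cached[0] == digest:
                return cached[1]  # code unchanged: reuse the stored result
            result = func()       # first run or code changed: execute
            _cache[func.__name__] = (digest, result)
            return result
        return run
    return decorate

@step()
def load_dataset():
    calls.append("load_dataset")
    return [1, 2, 3]

@step(skip_cache=True)
def train_model():
    calls.append("train_model")

for _ in range(2):  # simulate running the script twice
    load_dataset()
    train_model()
```

After the two simulated runs, load_dataset's body has executed only once, while train_model, which opted out of caching, executed both times.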
from os.path import join

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

from stripping import setup_stripping

st, c = setup_stripping('.stripping')

@st.chain()
def load_dataset():
    ds_file = join("datasets", "dataset.csv")
    ds = pd.read_csv(ds_file)

    return ds_file, ds

@st.chain()
def split_dataset(ds_file, ds):
    X = ds.iloc[:, 0:6].values
    y = ds.iloc[:, 9].values

    return X, y

@st.chain()
def train_model(X, y):
    regressor = RandomForestRegressor(n_estimators=1000,
                                      random_state=0)
    regressor.fit(X, y)

st.execute()
By using the chain feature, it is also possible to feed the return values of one step into the next and skip the context entirely.
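Conceptually, chaining just threads each step's return values into the next step's parameters. A minimal illustration of that idea with toy data (hypothetical helper, not Stripping's actual execute() logic):

```python
def execute_chain(*steps):
    """Run steps in order, feeding each return value into the next step."""
    args = ()
    for current_step in steps:
        result = current_step(*args)
        # Normalise to a tuple so multiple return values unpack cleanly
        args = result if isinstance(result, tuple) else (result,)
    return args

def load_dataset():
    return "dataset.csv", [[1.0, 10.0], [2.0, 20.0]]

def split_dataset(ds_file, ds):
    X = [row[0] for row in ds]
    y = [row[1] for row in ds]
    return X, y

# split_dataset receives exactly what load_dataset returned
X, y = execute_chain(load_dataset, split_dataset)
```

Each step's signature documents its inputs, which keeps the steps easy to test in isolation.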
There is a lot more you can do with Stripping. Visit our GitHub page, download it, and give it a try!