Open Source Community

Our Open Source community is focused on developing tools that make Machine Learning work faster, easier, and more reproducible. Join our effort to democratize AI and create tools that empower engineers to disrupt industries. Check us out on Github!
Aurum
Github
Automatic Experiment Tracking and Versioning

Part of your job as a Data Scientist or Machine Learning engineer is to diligently keep track of all the experiments you run. The problem is that doing so is not always easy with the tools and processes we tend to use.

Aurum keeps track of your experiments without requiring a change in your development workflow. After each execution of a new experiment, Aurum automatically collects all the parameters, code, metrics, and metadata generated in the process and saves the experiment inside git.

With Aurum you can also execute multiple instances of the same code with different parameters in parallel, and still record the performance of every experiment separately in git.

Parameters registered with Aurum can be overridden at execution time: the experiment runs either with the default values or with the values provided during the call. That not only makes for easy experimentation, it also makes hyperparameter tuning easier while keeping every run tracked, which is exactly why you should go to the project's Github page right now, download it, and give it a try. The sketch below shows what registering parameters might look like, followed by an example of how values would be passed to your script during the call.
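A minimal sketch of parameter registration, assuming an import aurum as au entry point and an au.parameters helper; the function name, the hyperparameter names, and the experiment.py script are illustrative assumptions, not the library's confirmed API:

    # Illustrative sketch only; check the Aurum docs for the exact API.
    import aurum as au

    # Register the hyperparameters this experiment accepts, with their defaults.
    # Aurum records whichever values end up being used for each run.
    au.parameters(batch_size=32, learning_rate=0.001, epochs=10)

    # ... load data, build and train the model using the registered values ...

And the call itself, assuming each registered parameter becomes a command-line flag with the same name:

    python experiment.py -batch_size 64 -learning_rate 0.01 -epochs 20

Any parameter left out of the call keeps its registered default, and the values actually used are saved in git alongside the code, metrics, and metadata for that run.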
Read more

Stripping
Github
Pipelines to speed up your experimentation

With Stripping, you can easily separate your code into steps and execute them in order. During execution, each step is cached so you can iterate faster. All of that with code that doesn't get in your way and requires minimal setup.

In the sketch at the end of this section, every individual task needed to train your model is separated into steps. Each step has access to a global Context object called c, which exposes all the variables you'd like to share across steps.

You'll notice that in the step named load_dataset, the Context object c stores the two variables ds_file and ds, which are later used inside the step named split_dataset.

In turn, split_dataset stores the attributes X and y on the context object, which are then used inside the following step, named train_model.

Separating tasks into steps with a shared context allows the context of every step to be cached, which avoids re-executing all of the data manipulation steps every time you run the training script. Stripping is smart enough to detect changes in any step and execute it again.

You can also prevent caching of any given step, and Stripping will then execute that step on every run even if its code has not changed from one execution to the next.

It is also possible to feed the return value of one step into the next and skip the context entirely by using the chain feature.

There is a lot more you can do with Stripping. Visit our Github page, download it, and give it a try!
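Putting those pieces together, a minimal sketch of such a pipeline might look like the following. The setup_stripping entry point, the st.step decorator, and the st.execute call are assumptions drawn from the description above, and the dataset file, target column, and model are purely illustrative:

    # Illustrative sketch; the exact Stripping API may differ.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from stripping import setup_stripping   # assumed entry point

    st, c = setup_stripping('.stripping')    # st runs the steps, c is the shared Context

    @st.step()
    def load_dataset():
        c.ds_file = 'dataset.csv'            # hypothetical dataset file
        c.ds = pd.read_csv(c.ds_file)

    @st.step()
    def split_dataset():
        c.X = c.ds.drop(columns=['target'])  # 'target' column is an assumption
        c.y = c.ds['target']

    @st.step()
    def train_model():
        X_train, X_test, y_train, y_test = train_test_split(c.X, c.y)
        model = LogisticRegression().fit(X_train, y_train)
        print('test accuracy:', model.score(X_test, y_test))

    st.execute()

On the first run all three steps execute; on later runs, steps whose code has not changed, such as load_dataset and split_dataset, are restored from the cache and only the steps you modified are executed again.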
Read more