Artificial Intelligence

Managing versioned machine learning datasets in DVC, and easily sharing ML projects with colleagues

(July 23, 2019) DVC is a powerful set of tools for managing the data files associated with data science or machine learning projects. It works hand-in-hand with a Git repository to track both the code and the datasets in an ML project. A core feature is dataset versioning: DVC ties each version of the dataset to a Git commit, so the data can be restored exactly as it existed at any commit. With a DVC "remote cache", it is easy to share a project with colleagues or to copy the dataset to a remote machine.

Introduction to using DVC to manage machine learning project datasets

(July 7, 2019) DVC is a powerful set of tools for managing data files associated with data science or machine learning projects. The code for such a project is committed to a Git repository, and DVC manages the data files in parallel to that repository.