Data Version Control (DVC)

Managing versioned machine learning datasets in DVC, and easily sharing ML projects with colleagues

(July 23, 2019) DVC is a powerful set of tools for managing the data files of data science and machine learning projects. It works hand-in-hand with a Git repository to track both the code and the datasets in an ML project. A core feature is dataset versioning: DVC ties the dataset to each Git commit, so the data in the workspace can be made to match exactly what existed at that commit. With a DVC "remote cache", it is easy to share a project with colleagues or to copy the dataset to a remote machine.
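The idea of tying a dataset to a Git commit can be illustrated with a toy sketch. This is not DVC's implementation or API: the function names (`track`, `restore`), the `.ptr` pointer-file suffix, the JSON layout, and the choice of SHA-256 are all assumptions made for illustration. The sketch stores each version of a data file in a cache directory under its content hash and writes a small pointer file, which is the kind of file one would commit to Git in place of the data itself.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into the cache under its hash and write a pointer file.

    The pointer file is small text, so it (not the data) would go into Git.
    """
    checksum = file_digest(data_file)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / checksum
    if not cached.exists():
        shutil.copy2(data_file, cached)
    # Hypothetical pointer format: a JSON file next to the data file.
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(json.dumps({"sha256": checksum, "path": data_file.name}))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file that a pointer file refers to."""
    meta = json.loads(pointer.read_text())
    target = pointer.with_name(meta["path"])
    shutil.copy2(cache_dir / meta["sha256"], target)
    return target


def demo():
    """Track two versions of a file, then go back to the first one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        data = root / "train.csv"

        # Version 1 of the dataset; the saved pointer text stands in for
        # what an earlier Git commit would contain.
        data.write_bytes(b"id,label\n1,cat\n")
        pointer = track(data, cache)
        pointer_v1 = pointer.read_text()

        # Version 2 of the dataset overwrites the file in the workspace.
        data.write_bytes(b"id,label\n1,cat\n2,dog\n")
        track(data, cache)
        latest = data.read_bytes()

        # "Check out" the earlier commit: put back the old pointer, then
        # rebuild the data file from the cache.
        pointer.write_text(pointer_v1)
        restore(pointer, cache)
        return data.read_bytes(), latest


if __name__ == "__main__":
    old, new = demo()
    print(old)
    print(new)
```

In this sketch, copying the cache directory to another machine plays a role loosely analogous to the remote cache mentioned above: anyone who has the pointer files and the cache can rebuild the matching data.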

Introduction to using DVC to manage machine learning project datasets

(July 7, 2019) DVC is a powerful set of tools for managing data files associated with data science or machine learning projects. The code for such a project is committed to a Git repository, and DVC manages the data files in parallel to that repository.