This page documents the functions and classes that are used extensively in this library. The Colab utils are useful when working in a Google Colab notebook: they let you authenticate and download a dataset from your Drive easily. The remaining functions are provided in case they are useful to someone.

Colab utils


Ask the user to connect to their Google account to access Google Drive and Google Storage.


google drive object

decavision.utils.colab_utils.download_dataset(file_id, save_path, drive)

Download compressed dataset (zip or 7z format) from google drive and extract it.

  • file_id (str) – id of the file to download

  • save_path (str) – location where data is extracted

  • drive (google drive object) – return of authenticate_colab function

Training utils

class decavision.utils.training_utils.CheckpointDownloader(checkpoint_path)

Class to save the current state to Google Drive after each iteration of hyperparameter optimization. Intended to be used as a callback for a scikit-optimize routine. Files are saved in a folder called Checkpoints.


checkpoint_path (str) – location where the checkpoint files are stored


If working on colab, upload checkpoint file to google drive.


res (scipy object) – the optimization result, as an OptimizeResult object
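As a rough illustration of the callback pattern described above, the sketch below defines a callable that an optimization routine could invoke with the current result after each iteration. The class name CheckpointCopier, the backup_dir argument, and the local file copy (standing in for an actual Google Drive upload) are assumptions made for this example, not the library's implementation.

```python
import os
import shutil
import tempfile


class CheckpointCopier:
    """Hypothetical callback: copies a checkpoint file after each iteration."""

    def __init__(self, checkpoint_path, backup_dir):
        self.checkpoint_path = checkpoint_path
        self.backup_dir = backup_dir
        self.calls = 0

    def __call__(self, res):
        # The optimizer passes its current result as `res`; it is unused here.
        self.calls += 1
        os.makedirs(self.backup_dir, exist_ok=True)
        if os.path.isfile(self.checkpoint_path):
            shutil.copy(self.checkpoint_path, self.backup_dir)


# Simulate one optimizer iteration with a dummy checkpoint file.
workdir = tempfile.mkdtemp()
checkpoint_file = os.path.join(workdir, "checkpoint.pkl")
with open(checkpoint_file, "wb") as f:
    f.write(b"state")

backup_dir = os.path.join(workdir, "Checkpoints")
callback = CheckpointCopier(checkpoint_file, backup_dir)
callback(res=None)
```

With scikit-optimize, a callable like this would typically be passed through the callback argument of the minimization routine.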

decavision.utils.training_utils.custom_loss(old_logits, new_logits, old_classes, L=5, temp=5)

Distillation loss used when updating a model with new classes. It forces the model to remember what it learned about the old classes. Increasing the parameter L forces the model to remember more. High temperature puts more importance on the dominant classes and low temperature focuses on everything.

  • old_logits (keras tensor) – classification layer of the old model, without the activation

  • new_logits (keras tensor) – classification layer of the new model, without the activation

  • old_classes (int) – number of classes in the old model

  • L (float) – parameter that controls how much information to remember

  • temp (float) – parameter that controls how many classes are important


loss function to be used during training

Return type

keras loss
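To make the role of L and temp more concrete, here is a minimal pure-Python sketch of a distillation-style penalty. It assumes the common convention of dividing logits by the temperature before a softmax and of adding the penalty, weighted by L, to a base loss. The helper names (softmax, distillation_term, total_loss) are hypothetical, and the library's actual formula may differ.

```python
import math


def softmax(logits, temp=1.0):
    """Softmax over logits divided by a temperature (assumed convention)."""
    scaled = [x / temp for x in logits]
    largest = max(scaled)
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_term(old_logits, new_logits, old_classes, temp=5.0):
    """Cross-entropy between old and new outputs, restricted to the old classes."""
    old_probs = softmax(old_logits[:old_classes], temp)
    new_probs = softmax(new_logits[:old_classes], temp)
    return -sum(p * math.log(q) for p, q in zip(old_probs, new_probs))


def total_loss(base_loss, old_logits, new_logits, old_classes, L=5.0, temp=5.0):
    """Base loss plus the distillation penalty weighted by L."""
    return base_loss + L * distillation_term(old_logits, new_logits, old_classes, temp)


old_logits = [2.0, 1.0, 0.1]           # old model: 3 classes
new_same = [2.0, 1.0, 0.1, 0.5]        # new model agrees on the old classes
new_drifted = [0.1, 1.0, 2.0, 0.5]     # new model disagrees on the old classes

matching = distillation_term(old_logits, new_same, old_classes=3)
drifted = distillation_term(old_logits, new_drifted, old_classes=3)
```

In this sketch the penalty is smallest when the new model reproduces the old model's outputs on the old classes, and a larger L makes any disagreement more costly.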

decavision.utils.training_utils.f1_score(y_true, y_pred)

Compute the F1-score metric.

  • y_true (tensor) – True labels

  • y_pred (tensor) – Predicted labels
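For reference, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for plain Python lists of binary labels. The function name f1_from_labels is hypothetical, and the library version operates on tensors rather than lists.

```python
def f1_from_labels(y_true, y_pred):
    """F1-score for binary labels given as lists of 0s and 1s."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    if true_pos == 0:
        return 0.0  # avoids division by zero when nothing is correctly predicted
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# 2 true positives, 1 false positive, 1 false negative
score = f1_from_labels([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```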

Data utils

decavision.utils.data_utils.check_RGB(path, target_size=None)

Convert all images in a folder into RGB format and resize them if desired. Images that can’t be opened are deleted. Folder must contain a subfolder for each category.

  • path (str) – path to the image directory

  • target_size (tuple(int,int)) – if specified, images are resized to this size


Check if directory exists and create it if it does not.


path (str) – path to directory to create
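The behaviour described above can be approximated with the standard library. This is only a sketch: the function name ensure_dir is hypothetical, and the library may implement the check differently.

```python
import os
import tempfile


def ensure_dir(path):
    """Create the directory (and any missing parents) if it does not exist."""
    os.makedirs(path, exist_ok=True)


base = tempfile.mkdtemp()
target = os.path.join(base, "data", "image_dataset")
ensure_dir(target)
ensure_dir(target)  # calling it again on an existing directory is harmless
```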

decavision.utils.data_utils.download_dataset(download_dir='data/', url='http://data.csail.mit.edu/places/places365/places365standard_easyformat.tar')

Download a dataset in format .zip, .tar, .tar.gz or .tgz and extract the data. Inspired by: https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/download.py

  • download_dir (str) – folder to store the data

  • url (str) – location of the dataset on the internet
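The extraction step can be sketched with the standard zipfile and tarfile modules, choosing the handler from the file extension. The function name extract_archive is hypothetical, and the download itself is left out so that the example runs offline.

```python
import os
import tarfile
import tempfile
import zipfile


def extract_archive(archive_path, target_dir):
    """Extract a .zip, .tar, .tar.gz or .tgz archive into target_dir."""
    if archive_path.endswith(".zip"):
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(target_dir)
    elif archive_path.endswith((".tar", ".tar.gz", ".tgz")):
        # "r:*" lets tarfile detect the compression on its own
        with tarfile.open(archive_path, "r:*") as archive:
            archive.extractall(target_dir)
    else:
        raise ValueError("Unsupported archive format: " + archive_path)


# Build a small zip archive and extract it.
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "sample.zip")
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr("images/readme.txt", "hello")

extract_dir = os.path.join(workdir, "data")
extract_archive(archive_path, extract_dir)
```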

decavision.utils.data_utils.prepare_image(image_path, target_size, rescaling=255)

Load and convert image to numpy array to feed it to a neural network. Image is resized, converted to RGB and its pixels are normalized if required. An extra dimension is added to the array.

  • image_path (str) – path to image to be converted

  • target_size (tuple(int,int)) – desired size for the image

  • rescaling (int) – divide all the pixels of the image by this number


processed image, with shape (1,target_size,3)

Return type

numpy array

decavision.utils.data_utils.print_download_progress(count, block_size, total_size)

Function used for printing the download progress. Inspired by: https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/download.py
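The three arguments match what urllib.request.urlretrieve passes to its reporthook: the number of blocks transferred so far, the block size in bytes, and the total file size. A minimal sketch of turning them into a progress message follows. The function name format_progress and the message format are assumptions for this example.

```python
def format_progress(count, block_size, total_size):
    """Return a progress message from block count, block size and total size."""
    if total_size > 0:
        # The last block may overshoot the total, so cap the fraction at 100%.
        fraction = min(1.0, count * block_size / total_size)
    else:
        fraction = 0.0  # total size unknown
    return "- Download progress: {0:.1%}".format(fraction)


halfway = format_progress(5, 1024, 10240)
finished = format_progress(20, 1024, 10240)
```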

decavision.utils.data_utils.split_train(path='data/image_dataset', split=0.1, multilabel=False, with_test=False)

Separate images randomly into a training, a validation and potentially a test dataset. Images must be located in a folder called train, which contains a subfolder per category. Val and potentially test folders will be created and images moved into them from train.

  • path (str) – path to the image_dataset directory

  • split (float) – fraction of each category that we move to the validation (val) subdirectory

  • multilabel (bool) – whether we are in a multilabel setting, in which case there are no labels in the image names

  • with_test (bool) – whether one image of each category is moved to the test dataset
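The validation split can be illustrated with a short standard-library sketch. It assumes the train/<category> layout described above and only handles the val folder. The function name split_into_val and the seed argument are hypothetical and are not part of the library.

```python
import os
import random
import shutil
import tempfile


def split_into_val(dataset_dir, split=0.1, seed=0):
    """Move a random fraction of each category from train/ to val/."""
    rng = random.Random(seed)
    train_dir = os.path.join(dataset_dir, "train")
    val_dir = os.path.join(dataset_dir, "val")
    for category in sorted(os.listdir(train_dir)):
        src = os.path.join(train_dir, category)
        if not os.path.isdir(src):
            continue
        dst = os.path.join(val_dir, category)
        os.makedirs(dst, exist_ok=True)
        images = sorted(os.listdir(src))
        n_val = int(len(images) * split)
        for name in rng.sample(images, n_val):
            shutil.move(os.path.join(src, name), os.path.join(dst, name))


# Build a toy dataset: 2 categories with 10 empty "images" each.
dataset_dir = tempfile.mkdtemp()
for category in ("cats", "dogs"):
    folder = os.path.join(dataset_dir, "train", category)
    os.makedirs(folder)
    for i in range(10):
        open(os.path.join(folder, "img_{}.jpg".format(i)), "w").close()

split_into_val(dataset_dir, split=0.2)
```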



Check if machine is running on TPU, GPU or CPU.


whether or not the machine runs on a TPU, and whether or not the machine runs on a GPU

Return type

bool


Check if the OS is Windows or anything else to return the right separator.


‘\’ for Windows and ‘/’ for others.

Return type

str
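A check of this kind can be sketched with the standard platform module. The function name path_separator and its optional system_name argument are hypothetical; the argument exists only so the example can be exercised on any machine.

```python
import platform


def path_separator(system_name=None):
    """Return the backslash on Windows and the forward slash elsewhere."""
    if system_name is None:
        system_name = platform.system()  # "Windows", "Linux", "Darwin", ...
    return "\\" if system_name == "Windows" else "/"


current = path_separator()
```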



Delete all files in a given folder. Deletion is first attempted locally and, if that fails, in Google Cloud Storage. If the folder is in GCS, the path must include the gs:// prefix, and the folder itself will be deleted as well.


folder (str) – path of folder where files are located
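The local half of this behaviour can be sketched as follows. The function name empty_folder is hypothetical, and the Google Cloud Storage fallback is omitted because it requires a storage client and credentials.

```python
import os
import tempfile


def empty_folder(folder):
    """Delete every regular file directly inside a local folder."""
    removed = 0
    for name in os.listdir(folder):
        file_path = os.path.join(folder, name)
        if os.path.isfile(file_path):
            os.remove(file_path)
            removed += 1
    return removed


# Create a folder with three files, then empty it.
folder = tempfile.mkdtemp()
for i in range(3):
    open(os.path.join(folder, "file_{}.txt".format(i)), "w").close()

removed_count = empty_folder(folder)
```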


Create a bucket object to be used to access files in a Google Cloud Storage bucket.


folder (str) – name of the google storage folder, must be of the form gs://bucketname/prefix


google storage object, and a string with the prefix of the bucket, to be used to access files

Return type

bucket object
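The gs://bucketname/prefix form mentioned above can be split into its two parts with plain string handling, as in the sketch below. The function name split_gcs_path is hypothetical. Creating the actual bucket object would additionally require a Google Cloud Storage client, which is not shown here.

```python
def split_gcs_path(folder):
    """Split 'gs://bucketname/prefix' into (bucket name, prefix)."""
    scheme = "gs://"
    if not folder.startswith(scheme):
        raise ValueError("Expected a path of the form gs://bucketname/prefix")
    bucket_name, _, prefix = folder[len(scheme):].partition("/")
    return bucket_name, prefix


bucket_name, prefix = split_gcs_path("gs://bucketname/prefix")
nested = split_gcs_path("gs://my-bucket/data/train")
```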


Check if path is to a google cloud storage directory or a local one. Determined from the presence of ‘gs’ at the beginning of the path.


path (str) – path to assess


True if path is on gcs and False if local

Return type

bool


decavision.utils.utils.load_classes(path, name='classes')

Open csv and create list from its content. If csv is on google cloud storage, the file is downloaded into working directory and then opened.

  • path (str) – location of the csv file

  • name (str) – name of csv file to open, without extension


first line of the csv file

Return type

list
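For a local file, reading the first line of a csv into a list can be sketched with the standard csv module. The function name read_first_row is hypothetical, and the Google Cloud Storage download step is left out.

```python
import csv
import os
import tempfile


def read_first_row(path, name="classes"):
    """Return the first line of <path>/<name>.csv as a list of strings."""
    with open(os.path.join(path, name + ".csv"), newline="") as f:
        return next(csv.reader(f))


# Write a small classes.csv and read it back.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "classes.csv"), "w", newline="") as f:
    csv.writer(f).writerow(["cat", "dog", "bird"])

classes = read_first_row(folder)
```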


decavision.utils.utils.load_model_clear(path, include_top=True)

Clear tensorflow session and load keras .h5 model.

  • path (str) – location of the model, including its name

  • include_top (bool) – whether or not to include the last layer of the model


loaded model

Return type

tf.keras model

decavision.utils.utils.upload_file_gcs(gcp_path, file_path)

Copy local file to a folder in google cloud storage and call it classes.csv.

  • gcp_path (str) – full path to gcp folder

  • file_path (str) – location of the local file to upload