Utils

This section documents the functions and classes used throughout this library. The Colab utils are useful when working in a Google Colab notebook: they make it easy to authenticate and to download a dataset from your Google Drive. The remaining functions are provided in case they are useful to others.

Colab utils

decavision.utils.colab_utils.authenticate_colab()

Ask the user to sign in to their Google account to access Google Drive and Google Cloud Storage.

Returns

Google Drive object

decavision.utils.colab_utils.download_dataset(file_id, save_path, drive)

Download a compressed dataset (zip or 7z format) from Google Drive and extract it.

Parameters
  • file_id (str) – ID of the Google Drive file to download

  • save_path (str) – directory where the data is extracted

  • drive (google drive object) – object returned by the authenticate_colab function

Training utils

class decavision.utils.training_utils.CheckpointDownloader(checkpoint_path)

Class that uploads the current state to Google Drive after each iteration of hyperparameter optimization. It is meant to be used as a callback for a scikit-optimize routine. Files are saved in a folder called Checkpoints.

Parameters

checkpoint_path (str) – location of the checkpoint files

__call__(res)

If running on Colab, upload the checkpoint file to Google Drive.

Parameters

res (scipy object) – the optimization result, as an OptimizeResult object

decavision.utils.training_utils.custom_loss(old_logits, new_logits, old_classes, L=5, temp=5)

Distillation loss used when updating a model with new classes. It forces the model to remember what it learned about the old classes; increasing the parameter L makes it remember more. A high temperature puts more importance on the dominant classes, while a low temperature spreads the focus over all classes.

Parameters
  • old_logits (keras tensor) – output of the old model's classification layer, before the activation

  • new_logits (keras tensor) – output of the new model's classification layer, before the activation

  • old_classes (int) – number of classes in the old model

  • L (float) – parameter that controls how much information to remember

  • temp (float) – parameter that controls how many classes are important

Returns

loss function to be used during training

Return type

keras loss
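
To make the roles of L and temp concrete, here is a minimal pure-Python sketch of one common distillation loss, computed for a single sample. It assumes the usual convention of dividing the logits by temp before the softmax (the library may scale them differently) and works on plain lists instead of Keras tensors; softmax, cross_entropy and distillation_loss are hypothetical names, not part of decavision.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temp: float = 1.0) -> List[float]:
    """Softmax of the logits divided by a temperature (hypothetical helper)."""
    scaled = [z / temp for z in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: Sequence[float], pred: Sequence[float]) -> float:
    """Cross-entropy between a target distribution and a predicted one."""
    eps = 1e-12  # avoid log(0)
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))


def distillation_loss(y_true, old_logits, new_logits, old_classes, L=5.0, temp=5.0):
    """Loss for one sample: ordinary cross-entropy + L * distillation term."""
    # Usual classification loss over all (old + new) classes
    ce = cross_entropy(y_true, softmax(new_logits))
    # The new model should reproduce the old model's temperature-scaled
    # distribution over the old classes only
    old_soft = softmax(old_logits[:old_classes], temp)
    new_soft = softmax(new_logits[:old_classes], temp)
    return ce + L * cross_entropy(old_soft, new_soft)
```

With L=0 the sketch reduces to plain cross-entropy; larger values of L penalize the new model more for drifting away from the old model's predictions on the old classes.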

decavision.utils.training_utils.f1_score(y_true, y_pred)

Compute the F1-score metric.

Parameters
  • y_true (tensor) – True labels

  • y_pred (tensor) – Predicted labels
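
The F1-score is the harmonic mean of precision P and recall R: F1 = 2·P·R / (P + R). The library function works on tensors; the pure-Python sketch below (hypothetical name f1_from_labels) only illustrates the arithmetic on binary labels.

```python
from typing import Sequence


def f1_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for binary labels (1 = positive); illustrative sketch only."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0  # no true positives at all
    return 2 * precision * recall / (precision + recall)


# 2 true positives, 1 false positive, 1 false negative: P = R = 2/3
print(f1_from_labels([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # ≈ 0.667
```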

Data utils

decavision.utils.data_utils.check_RGB(path, target_size=None)

Convert all images in a folder to RGB format and resize them if desired. Images that can’t be opened are deleted. The folder must contain one subfolder per category.

Parameters
  • path (str) – path to the image directory

  • target_size (tuple(int,int)) – if specified, images are resized to this size

decavision.utils.data_utils.create_dir(path)

Check whether a directory exists and create it if it does not.

Parameters

path (str) – path of the directory to create
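
A minimal standard-library sketch of this behaviour (not necessarily the library's implementation); unlike a bare os.mkdir, it also creates any missing parent directories.

```python
import os


def create_dir(path: str) -> None:
    """Create `path` (and any missing parents) unless it already exists."""
    # exist_ok=True makes the call a no-op when the directory is already there
    os.makedirs(path, exist_ok=True)
```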

decavision.utils.data_utils.download_dataset(download_dir='data/', url='http://data.csail.mit.edu/places/places365/places365standard_easyformat.tar')

Download a dataset in .zip, .tar, .tar.gz or .tgz format and extract it. Inspired by: https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/download.py

Parameters
  • download_dir (str) – folder to store the data

  • url (str) – URL of the dataset
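
The download-then-extract flow can be sketched with the standard library alone: urllib.request.urlretrieve fetches the archive and shutil.unpack_archive picks the extraction format (.zip, .tar, .tar.gz, .tgz) from the file extension. This is a sketch with a hypothetical name, download_and_extract, not the library's code; it assumes the archive is named after the last component of the URL and skips the download when that file already exists.

```python
import os
import shutil
import urllib.request


def download_and_extract(url: str, download_dir: str = "data/") -> str:
    """Download an archive into download_dir (if not already there) and extract it."""
    os.makedirs(download_dir, exist_ok=True)
    # Name the local file after the last component of the URL
    file_path = os.path.join(download_dir, url.split("/")[-1])
    if not os.path.exists(file_path):
        urllib.request.urlretrieve(url, file_path)
    # unpack_archive infers zip / tar / gztar from the file extension
    shutil.unpack_archive(file_path, download_dir)
    return file_path
```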

decavision.utils.data_utils.prepare_image(image_path, target_size, rescaling=255)

Load an image and convert it to a NumPy array that can be fed to a neural network. The image is resized and converted to RGB, and its pixels are normalized if required. An extra (batch) dimension is added at the front of the array.

Parameters
  • image_path (str) – path to image to be converted

  • target_size (tuple(int,int)) – desired size for the image

  • rescaling (int) – divide all the pixels of the image by this number

Returns

processed image, with shape (1, target_size[0], target_size[1], 3)

Return type

NumPy array

decavision.utils.data_utils.print_download_progress(count, block_size, total_size)

Print the download progress. Inspired by: https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/download.py
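
The signature (count, block_size, total_size) matches the reporthook callback accepted by urllib.request.urlretrieve, which is presumably how the function is used. A minimal sketch of such a hook, assuming it prints a percentage on a single updating line:

```python
import sys


def print_download_progress(count: int, block_size: int, total_size: int) -> None:
    """Print the fraction downloaded so far on a single, updating line."""
    if total_size <= 0:  # urlretrieve may pass -1 when the size is unknown
        return
    # count blocks of block_size bytes have been received so far
    pct_complete = min(1.0, count * block_size / total_size)
    # '\r' moves back to the start of the line so the value updates in place
    sys.stdout.write("\r- Download progress: {0:.1%}".format(pct_complete))
    sys.stdout.flush()


# Typical use as a reporthook (url and filename are placeholders):
# urllib.request.urlretrieve(url, filename, reporthook=print_download_progress)
```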

decavision.utils.data_utils.split_train(path='data/image_dataset', split=0.1, multilabel=False, with_test=False)

Randomly split images into a training set, a validation set and, optionally, a test set. Images must be located in a folder called train, which contains one subfolder per category. The val folder (and, if requested, the test folder) is created and images are moved into it from train.

Parameters
  • path (str) – path to the image_dataset directory

  • split (float) – fraction of each category that we move to the validation (val) subdirectory

  • multilabel (bool) – whether this is a multilabel setting, in which case the image names do not contain labels

  • with_test (bool) – whether one image of each category is moved to the test dataset
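
For the single-label case, the moving logic can be sketched as follows, assuming the train/<category>/ layout described above. The function name split_train_sketch and the seed parameter are hypothetical, the validation count is rounded down, and the multilabel case is not handled.

```python
import os
import random
import shutil


def split_train_sketch(path="data/image_dataset", split=0.1, with_test=False, seed=None):
    """Move a random fraction of each category from train/ to val/ and,
    optionally, one image per category to test/. Hypothetical sketch."""
    rng = random.Random(seed)
    train_dir = os.path.join(path, "train")
    for category in sorted(os.listdir(train_dir)):
        cat_dir = os.path.join(train_dir, category)
        if not os.path.isdir(cat_dir):
            continue
        images = sorted(os.listdir(cat_dir))
        rng.shuffle(images)
        n_val = int(len(images) * split)  # assumption: round down
        moves = [("val", img) for img in images[:n_val]]
        if with_test and len(images) > n_val:
            moves.append(("test", images[n_val]))
        for subset, img in moves:
            # e.g. data/image_dataset/val/<category>/
            dest = os.path.join(path, subset, category)
            os.makedirs(dest, exist_ok=True)
            shutil.move(os.path.join(cat_dir, img), os.path.join(dest, img))
```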

Utils

decavision.utils.utils.check_PU()

Check whether the machine is running on a TPU, a GPU or a CPU.

Returns

whether or not the machine runs on a TPU, and whether or not the machine runs on a GPU

Return type

tuple(bool, bool)

decavision.utils.utils.check_sep()

Check whether the OS is Windows or something else, and return the matching path separator.

Returns

‘\’ for Windows and ‘/’ for other systems.

Return type

str
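
A sketch of the same check with the standard library. In practice os.sep already holds this value (and os.path.join inserts it automatically), so it can usually be used directly.

```python
import os
import platform


def check_sep() -> str:
    """Return the path separator: a backslash on Windows, '/' elsewhere."""
    return "\\" if platform.system() == "Windows" else "/"


print(check_sep() == os.sep)  # True on Windows, Linux and macOS
```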

decavision.utils.utils.empty_folder(folder)

Delete all files in a given folder. The function first tries to delete them locally and, if that fails, tries to delete them in Google Cloud Storage. If the folder is in GCS, the path must include the gs:// prefix, and the folder itself is deleted as well.

Parameters

folder (str) – path of folder where files are located
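
The local branch of this behaviour can be sketched with os alone. The name empty_local_folder is hypothetical, the GCS fallback is not reproduced, and leaving subdirectories in place is an assumption.

```python
import os


def empty_local_folder(folder: str) -> None:
    """Delete every file directly inside `folder`; the folder itself is kept."""
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        # Assumption: subdirectories are left untouched, only files are removed
        if os.path.isfile(full_path) or os.path.islink(full_path):
            os.remove(full_path)
```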

decavision.utils.utils.gcs_bucket(folder)

Create a bucket object used to access files in a Google Cloud Storage bucket.

Parameters

folder (str) – name of the Google Cloud Storage folder, of the form gs://bucketname/prefix

Returns

the bucket object (google storage object) and the prefix of the bucket (str), to be used to access files

Return type

tuple(bucket object, str)
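
The string-handling part of this function can be sketched as splitting gs://bucketname/prefix into a bucket name and a prefix. The helper name split_gcs_path is hypothetical; with the google-cloud-storage client, the bucket object would then typically come from storage.Client().bucket(name), shown only as a comment because it needs credentials.

```python
from typing import Tuple


def split_gcs_path(folder: str) -> Tuple[str, str]:
    """Split 'gs://bucketname/prefix' into ('bucketname', 'prefix')."""
    if not folder.startswith("gs://"):
        raise ValueError("expected a path of the form gs://bucketname/prefix")
    bucket_name, _, prefix = folder[len("gs://"):].partition("/")
    return bucket_name, prefix


# With credentials available, the bucket object could then be created with:
# from google.cloud import storage
# bucket = storage.Client().bucket(bucket_name)
```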

decavision.utils.utils.is_gcs(path)

Check whether a path points to a Google Cloud Storage directory or a local one, based on whether the path starts with ‘gs’.

Parameters

path (str) – path to assess

Returns

True if path is on gcs and False if local

Return type

bool
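
A sketch of the prefix check, with one deliberate change: it looks for the full gs:// scheme rather than just ‘gs’, so that a local folder whose name happens to begin with "gs" is not mistaken for a bucket. The name is_gcs_path is hypothetical.

```python
def is_gcs_path(path: str) -> bool:
    """True if `path` looks like a Google Cloud Storage location."""
    # Stricter than a bare 'gs' check: require the full URI scheme
    return path.startswith("gs://")
```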

decavision.utils.utils.load_classes(path, name='classes')

Open a csv file and create a list from its content. If the file is on Google Cloud Storage, it is first downloaded into the working directory and then opened.

Parameters
  • path (str) – location of the csv file

  • name (str) – name of csv file to open, without extension

Returns

first line of the csv file

Return type

list
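
For a local file, reading the first row can be sketched with the csv module. The helper name load_classes_local is hypothetical, and the GCS download step is left out.

```python
import csv
import os
from typing import List


def load_classes_local(path: str, name: str = "classes") -> List[str]:
    """Return the first row of <path>/<name>.csv as a list of strings."""
    with open(os.path.join(path, name + ".csv"), newline="") as f:
        return next(csv.reader(f))
```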

decavision.utils.utils.load_model_clear(path, include_top=True)

Clear the TensorFlow session and load a Keras .h5 model.

Parameters
  • path (str) – location of the model, including its name

  • include_top (bool) – whether or not to include the last layer of the model

Returns

loaded model

Return type

tf.keras model

decavision.utils.utils.upload_file_gcs(gcp_path, file_path)

Copy a local file to a folder in Google Cloud Storage, where it is saved as classes.csv.

Parameters
  • gcp_path (str) – full path to the Google Cloud Storage folder

  • file_path (str) – location of the file to upload