Utils
This page documents the functions and classes that are used extensively in this library. The Colab utils are useful when working in a Google Colab notebook: they let you authenticate and easily download a dataset from your Drive. The remaining functions are documented in case they are useful to someone.
Colab utils
- decavision.utils.colab_utils.authenticate_colab()
Ask the user to connect to their Google account to access Google Drive and Google Storage.
- Returns
google drive object
- decavision.utils.colab_utils.download_dataset(file_id, save_path, drive)
Download compressed dataset (zip or 7z format) from google drive and extract it.
- Parameters
file_id (str) – id of the file to download
save_path (str) – location where data is extracted
drive (google drive object) – return of authenticate_colab function
Training utils
- class decavision.utils.training_utils.CheckpointDownloader(checkpoint_path)
Class to upload the current state to Google Drive after each iteration of hyperparameter optimization. To be used as a callback for a scikit-optimize routine. Files are saved in a folder called Checkpoints.
- Parameters
checkpoint_path (str) – location where the checkpoint files are
- __call__(res)
If working on colab, upload checkpoint file to google drive.
- Parameters
res (scipy object) – the optimization result, as an OptimizeResult object
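The callback pattern described above can be sketched with the standard library only. This is not decavision's implementation: `LocalCheckpointCopier` and its `backup_dir` argument are hypothetical names, and the Google Drive upload is replaced here by a local copy into a Checkpoints folder. scikit-optimize calls each callback with the current result object after every evaluation, which is why the class only needs a `__call__(res)` method.

```python
import os
import shutil


class LocalCheckpointCopier:
    """Hypothetical stand-in for a checkpoint callback.

    Copies the checkpoint file into a backup folder each time it is called.
    """

    def __init__(self, checkpoint_path, backup_dir="Checkpoints"):
        self.checkpoint_path = checkpoint_path
        self.backup_dir = backup_dir

    def __call__(self, res):
        # `res` is the result object passed by the optimizer; it is unused here
        # because the checkpoint file is assumed to be written by another callback.
        if not os.path.isfile(self.checkpoint_path):
            return None
        os.makedirs(self.backup_dir, exist_ok=True)
        # shutil.copy2 returns the path of the newly created copy.
        return shutil.copy2(self.checkpoint_path, self.backup_dir)
```

An instance would be passed in the list of callbacks of the optimization routine, after whichever callback writes the checkpoint file.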
- decavision.utils.training_utils.custom_loss(old_logits, new_logits, old_classes, L=5, temp=5)
Distilling loss used when updating a model with new classes. It forces the model to remember what it learned about the old classes. Increasing the parameter L forces the model to remember more. High temperature puts more importance on the dominant classes and low temperature focuses on everything.
- Parameters
old_logits (keras tensor) – classification layer of the old model, without the activation
new_logits (keras tensor) – classification layer of the new model, without the activation
old_classes (int) – number of classes in the old model
L (float) – parameter that controls how much information to remember
temp (float) – parameter that controls how many classes are important
- Returns
loss function to be used during training
- Return type
keras loss
- decavision.utils.training_utils.f1_score(y_true, y_pred)
Compute the F1-score metric.
- Parameters
y_true (tensor) – True labels
y_pred (tensor) – Predicted labels
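For reference, the F1-score is the harmonic mean of precision and recall. The plain-Python sketch below illustrates the arithmetic on binary 0/1 labels; `f1_from_labels` is a hypothetical name, and the library function operates on tensors rather than lists, so this is an illustration of the metric and not of its implementation.

```python
def f1_from_labels(y_true, y_pred):
    """F1-score for binary labels given as sequences of 0/1 values."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```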
Data utils
- decavision.utils.data_utils.check_RGB(path, target_size=None)
Convert all images in a folder into RGB format and resize them if desired. Images that can’t be opened are deleted. Folder must contain a subfolder for each category.
- Parameters
path (str) – path to the image directory
target_size (tuple(int,int)) – if specified, images are resized to this size
- decavision.utils.data_utils.create_dir(path)
Check if directory exists and create it if it does not.
- Parameters
path (str) – path to directory to create
- decavision.utils.data_utils.download_dataset(download_dir='data/', url='http://data.csail.mit.edu/places/places365/places365standard_easyformat.tar')
Download a dataset in format .zip, .tar, .tar.gz or .tgz and extract the data. Inspired by: https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/download.py
- Parameters
download_dir (str) – folder to store the data
url (str) – location of the dataset on the internet
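The extraction step for the supported formats can be sketched as follows. This is a minimal illustration using only the standard library, not the library's own code: `extract_archive` is a hypothetical helper, and it assumes the archive has already been downloaded and comes from a trusted source.

```python
import os
import tarfile
import zipfile


def extract_archive(file_path, download_dir):
    """Extract a .zip, .tar, .tar.gz or .tgz archive into download_dir."""
    os.makedirs(download_dir, exist_ok=True)
    if file_path.endswith(".zip"):
        with zipfile.ZipFile(file_path, mode="r") as archive:
            archive.extractall(download_dir)
    elif file_path.endswith((".tar", ".tar.gz", ".tgz")):
        # Mode "r:*" lets tarfile detect the compression automatically.
        with tarfile.open(file_path, mode="r:*") as archive:
            archive.extractall(download_dir)
    else:
        raise ValueError("Unsupported archive format: " + file_path)
```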
- decavision.utils.data_utils.prepare_image(image_path, target_size, rescaling=255)
Load and convert image to numpy array to feed it to a neural network. Image is resized, converted to RGB and its pixels are normalized if required. An extra dimension is added to the array.
- Parameters
image_path (str) – path to image to be converted
target_size (tuple(int,int)) – desired size for the image
rescaling (int) – divide all the pixels of the image by this number
- Returns
processed image, with shape (1,target_size,3)
- Return type
numpy array
- decavision.utils.data_utils.print_download_progress(count, block_size, total_size)
Function used for printing the download progress. Inspired by: https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/download.py
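The three arguments match the reporthook signature of `urllib.request.urlretrieve`, which calls the hook with the number of blocks transferred so far, the block size in bytes and the total size of the file. A minimal sketch of such a hook is given below; the function names and the exact message format are assumptions and may differ from what the library prints.

```python
import sys


def format_download_progress(count, block_size, total_size):
    """Return a progress message for `count` blocks of `block_size` bytes."""
    if total_size <= 0:
        # urlretrieve reports -1 when the server does not send the file size.
        return "- Download progress: unknown size"
    # Cap at 100% because the last block may overshoot the total size.
    fraction = min(1.0, count * block_size / total_size)
    return "- Download progress: {0:.1%}".format(fraction)


def report_progress(count, block_size, total_size):
    """Reporthook that rewrites a single status line in place."""
    # "\r" returns to the start of the line so the message is overwritten.
    sys.stdout.write("\r" + format_download_progress(count, block_size, total_size))
    sys.stdout.flush()
```

`report_progress` could then be passed as the `reporthook` argument of `urllib.request.urlretrieve`.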
- decavision.utils.data_utils.split_train(path='data/image_dataset', split=0.1, multilabel=False, with_test=False)
Separate images randomly into a training, a validation and optionally a test dataset. Images must be located in a folder called train, which contains a subfolder per category. The val folder, and the test folder if requested, are created and images are moved into them from train.
- Parameters
path (str) – path to the image_dataset directory
split (float) – fraction of each category that we move to the validation (val) subdirectory
multilabel (bool) – whether we are in a multilabel setting, in which case there are no labels in the image names
with_test (bool) – determine if one image of each category is moved to test dataset
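The single-label case without a test set can be sketched as follows. This is an illustrative reimplementation, not the library's code: `split_into_val` and its `seed` argument are hypothetical, the number of moved images is rounded down (an assumption), and the `multilabel` and `with_test` options are not covered.

```python
import os
import random
import shutil


def split_into_val(path, split=0.1, seed=None):
    """Move a random fraction of each category from path/train to path/val."""
    rng = random.Random(seed)
    train_dir = os.path.join(path, "train")
    val_dir = os.path.join(path, "val")
    moved = {}
    for category in sorted(os.listdir(train_dir)):
        src_dir = os.path.join(train_dir, category)
        if not os.path.isdir(src_dir):
            continue  # skip stray files next to the category folders
        images = sorted(os.listdir(src_dir))
        n_val = int(len(images) * split)  # rounded down (assumption)
        dst_dir = os.path.join(val_dir, category)
        os.makedirs(dst_dir, exist_ok=True)
        # Sample without replacement, then move the chosen files.
        for name in rng.sample(images, n_val):
            shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))
        moved[category] = n_val
    return moved
```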
Utils
- decavision.utils.utils.check_PU()
Check if machine is running on TPU, GPU or CPU.
- Returns
whether or not the machine runs on a TPU, followed by whether or not the machine runs on a GPU
- Return type
bool
- decavision.utils.utils.check_sep()
Check if the OS is windows or anything else to return the right separator.
- Returns
‘\’ for Windows and ‘/’ for others.
- Return type
str
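A check of this kind can be written in a couple of lines. The sketch below is an assumption about how such a test might look (`path_separator` is a hypothetical name); note that the standard library already exposes the same value as `os.sep`.

```python
import os


def path_separator():
    """Return the path separator for the current operating system."""
    # os.name is "nt" on Windows and "posix" on Linux and macOS.
    return "\\" if os.name == "nt" else "/"
```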
- decavision.utils.utils.empty_folder(folder)
Delete all files in a given folder. First try to delete locally and if it fails try to delete in google cloud storage. If folder is in GCS, the link must include the gs:// part and the folder will be deleted as well.
- Parameters
folder (str) – path of folder where files are located
- decavision.utils.utils.gcs_bucket(folder)
Create a bucket object to be used to access files in a Google Cloud Storage bucket.
- Parameters
folder (str) – name of the google storage folder, must be of the form gs://bucketname/prefix
- Returns
google storage object, followed by the prefix of the bucket (str), to be used to access files
- Return type
bucket object
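The expected `gs://bucketname/prefix` form can be split into its two parts with plain string handling. The sketch below only covers that parsing step; `split_gcs_path` is a hypothetical helper, and creating the actual bucket object from the bucket name with a storage client is not shown.

```python
def split_gcs_path(folder):
    """Split 'gs://bucketname/prefix' into ('bucketname', 'prefix')."""
    scheme = "gs://"
    if not folder.startswith(scheme):
        raise ValueError("Expected a path of the form gs://bucketname/prefix")
    # Everything up to the first "/" is the bucket name, the rest is the prefix.
    bucket_name, _, prefix = folder[len(scheme):].partition("/")
    return bucket_name, prefix
```

For example, `split_gcs_path("gs://my-bucket/data/train")` returns `("my-bucket", "data/train")`, where `my-bucket` is a made-up bucket name.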
- decavision.utils.utils.is_gcs(path)
Check if path is to a google cloud storage directory or a local one. Determined from the presence of ‘gs’ at the beginning of the path.
- Parameters
path (str) – path to assess
- Returns
True if path is on gcs and False if local
- Return type
bool
- decavision.utils.utils.load_classes(path, name='classes')
Open csv and create list from its content. If csv is on google cloud storage, the file is downloaded into working directory and then opened.
- Parameters
path (str) – location of the csv file
name (str) – name of csv file to open, without extension
- Returns
first line of the csv file
- Return type
list
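For a local file, reading the first line of a csv into a list can be sketched as below. `read_first_row` is a hypothetical name, the sketch assumes that `path` is the directory containing the file and that `name` is the file name without extension, and the Google Cloud Storage download step is left out.

```python
import csv
import os


def read_first_row(path, name="classes"):
    """Return the first row of path/name.csv as a list of strings."""
    with open(os.path.join(path, name + ".csv"), newline="") as f:
        # next() with a default returns an empty list for an empty file.
        return next(csv.reader(f), [])
```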
- decavision.utils.utils.load_model_clear(path, include_top=True)
Clear tensorflow session and load keras .h5 model.
- Parameters
path (str) – location of the model, including its name
include_top (bool) – whether or not to include the last layer of the model
- Returns
loaded model
- Return type
tf.keras model
- decavision.utils.utils.upload_file_gcs(gcp_path, file_path)
Copy local file to a folder in google cloud storage and call it classes.csv.
- Parameters
gcp_path (str) – full path to gcp folder
file_path (str) – location of the file to upload