Classification
- A logistic regression model generates the probability of an event occurring as a decimal value. To map this probability to a binary category (the event will or will not occur), you must define a classification threshold
o The classification threshold decides which probabilities are mapped to one (positive class) and which are mapped to zero (negative class)
- You don’t always need a classification threshold. If you want to predict the probability of rain, you can just return the prediction as the decimal likelihood of rain.
- If, however, you want to predict if someone will or will not need a raincoat, you’ll need to decide which rain likelihoods should correspond to “raincoat” and which correspond to “no raincoat”
o Threshold tuning considers the consequences of different types of incorrect predictions (see the sketch after this list)
- True positive means the model correctly predicts the positive class
- True negative means the model correctly predicts the negative class
- False positive means the model predicts positive when the actual outcome is negative
- False negative means the model predicts negative when the actual outcome is positive
- False positives and false negatives have different consequences
- It is better to bring a raincoat and not need it (false positive) than to get caught out in the rain without one (false negative)
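- A minimal sketch of how a threshold maps predicted probabilities to binary predictions and how the four outcome counts are tallied; the probabilities, labels, and the 0.5 threshold below are made-up illustration values:

```python
import numpy as np

# Hypothetical predicted rain probabilities and actual outcomes (1 = rain, 0 = no rain).
probs = np.array([0.10, 0.45, 0.62, 0.80, 0.30, 0.95])
actual = np.array([0, 1, 1, 1, 0, 1])

threshold = 0.5  # classification threshold; tune it based on the cost of each error type
predicted = (probs >= threshold).astype(int)

tp = np.sum((predicted == 1) & (actual == 1))  # predicted rain, it rained
tn = np.sum((predicted == 0) & (actual == 0))  # predicted no rain, it stayed dry
fp = np.sum((predicted == 1) & (actual == 0))  # brought a raincoat for nothing
fn = np.sum((predicted == 0) & (actual == 1))  # got caught in the rain
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")

# Lowering the threshold predicts "raincoat" more often: fewer false negatives
# at the cost of more false positives.
```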
Prediction Bias
- Prediction bias occurs when the average of all predictions made by a model is not equal (or approximately equal) to the average of the labels in the dataset
o Ex: if 10% of all dogs in a shelter are labs, the model should predict that a dog is a lab ~10% of the time
o If avg. prediction minus avg. observation is not zero, the model is biased (see the sketch after this list)
- Adding a calibration layer to correct prediction bias is a bad idea
o It is possible to correct bias with brute force (for an 8% bias, add a layer that brings down the prediction mean by 8%)
o This isn’t good because prediction bias signals that something is going wrong inside your model; it’s important to identify the actual problem instead of painting over it
- Overly strong regularization, a biased training sample, or very noisy training data are common causes of prediction bias
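- A quick check is to compare the mean prediction with the mean label; the shelter numbers below are made-up illustration values chosen so the bias comes out to the 8% mentioned above:

```python
import numpy as np

# Hypothetical shelter data: labels mark whether each dog is a lab (1) or not (0).
labels = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])            # 10% labs in the data
predictions = np.array([0.25, 0.15, 0.20, 0.10, 0.22,         # model's predicted
                        0.16, 0.18, 0.12, 0.14, 0.28])        # probabilities of "lab"

prediction_bias = predictions.mean() - labels.mean()
print(f"avg prediction  = {predictions.mean():.2f}")
print(f"avg observation = {labels.mean():.2f}")
print(f"prediction bias = {prediction_bias:+.2f}")  # nonzero => the model is biased
```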
Regularization
- When sparse features are crossed, the model size becomes needlessly large, slowing down operations and using up memory
o Extraneous weights should be made equal to 0 to save space and improve speed
- L1 regularization penalizes the sum of the absolute values of all weights in the model
o It causes some weights to be zeroed out completely, unlike L2 regularization which works to keep weights near zero but does not eliminate them entirely
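- A small sketch of this difference, using scikit-learn’s Lasso (L1 penalty) and Ridge (L2 penalty) on randomly generated data where only a few features actually matter; the data and alpha values are illustration choices:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Sparse setup: 100 features, only the first 5 carry any signal.
X = rng.normal(size=(200, 100))
true_w = np.zeros(100)
true_w[:5] = [3.0, -2.0, 1.5, 4.0, -1.0]
y = X @ true_w + rng.normal(scale=0.1, size=200)

l1_model = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: sum of |w|
l2_model = Ridge(alpha=0.1).fit(X, y)   # L2 penalty: sum of w^2

# L1 typically drives most of the irrelevant weights to exactly 0;
# L2 only shrinks them toward 0, so they stay small but nonzero.
print("L1 zero weights:", np.sum(l1_model.coef_ == 0))
print("L2 zero weights:", np.sum(l2_model.coef_ == 0))
```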