Classification
- A logistic regression model generates the probability of an event occurring as a decimal value. To map this probability to a binary category (the event will or will not occur), you must define a classification threshold
o The classification threshold decides which probabilities are mapped to one (positive class) and which are mapped to zero (negative class)
- You don’t always need a classification threshold. If you want to predict the probability of rain, you can just return the prediction as the decimal likelihood of rain.
- If, however, you want to predict if someone will or will not need a raincoat, you’ll need to decide which rain likelihoods should correspond to “raincoat” and which correspond to “no raincoat”
o Threshold tuning considers the consequences of different types of incorrect predictions (see the sketch after this list)
- True positive means the model correctly predicts the positive class
- True negative means the model correctly predicts the negative class
- False positive means the model predicts positive when the actual outcome is negative
- False negative means the model predicts negative when the actual outcome is positive
- False positives and false negatives have different consequences
- It is better to bring a raincoat and not need it (false positive) than to get caught out in the rain without one (false negative)
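- A minimal sketch of how a threshold maps predicted probabilities to binary predictions and how the four outcome counts are tallied; the probabilities, labels, and the 0.5 threshold below are made-up illustration values:

```python
import numpy as np

# Hypothetical predicted rain probabilities and actual outcomes (1 = rain, 0 = no rain).
probs = np.array([0.10, 0.45, 0.62, 0.80, 0.30, 0.95])
actual = np.array([0, 1, 1, 1, 0, 1])

threshold = 0.5  # classification threshold; tune it based on the cost of each error type
predicted = (probs >= threshold).astype(int)

tp = np.sum((predicted == 1) & (actual == 1))  # predicted rain, it rained
tn = np.sum((predicted == 0) & (actual == 0))  # predicted no rain, it stayed dry
fp = np.sum((predicted == 1) & (actual == 0))  # brought a raincoat for nothing
fn = np.sum((predicted == 0) & (actual == 1))  # got caught in the rain
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")

# Lowering the threshold predicts "raincoat" more often: fewer false negatives
# at the cost of more false positives.
```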
Prediction Bias
- Prediction bias occurs when the average of all predictions made by a model is not equal (or approximately equal) to the average of the labels in the dataset
o Ex: if 10% of all dogs in a shelter are labs, the model should predict that a dog is a lab ~10% of the time
o If avg. prediction minus avg. observation is not zero, the model is biased (see the sketch after this list)
- Adding a calibration layer to correct prediction bias is a bad idea
o It is possible to correct bias with brute force (for an 8% bias, add a layer that brings down the prediction mean by 8%)
o This isn’t good because prediction bias signals that something is going wrong inside your model; it’s important to identify the actual problem instead of painting over it
- Overly strong regularization, a biased training sample, or very noisy training data are common causes of prediction bias
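- A quick check is to compare the mean prediction with the mean label; the shelter numbers below are made-up illustration values chosen so the bias comes out to the 8% mentioned above:

```python
import numpy as np

# Hypothetical shelter data: labels mark whether each dog is a lab (1) or not (0).
labels = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])            # 10% labs in the data
predictions = np.array([0.25, 0.15, 0.20, 0.10, 0.22,         # model's predicted
                        0.16, 0.18, 0.12, 0.14, 0.28])        # probabilities of "lab"

prediction_bias = predictions.mean() - labels.mean()
print(f"avg prediction  = {predictions.mean():.2f}")
print(f"avg observation = {labels.mean():.2f}")
print(f"prediction bias = {prediction_bias:+.2f}")  # nonzero => the model is biased
```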
Regularization
- When sparse features are crossed, the model size becomes needlessly large, slowing down operations and using up memory
o Extraneous weights should be made equal to 0 to save space and improve speed
- L1 regularization penalizes the sum of the absolute values of all weights in the model
o It causes some weights to be zeroed out completely, unlike L2 regularization which works to keep weights near zero but does not eliminate them entirely
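- A small sketch of this difference, using scikit-learn’s Lasso (L1 penalty) and Ridge (L2 penalty) on randomly generated data where only a few features actually matter; the data and alpha values are illustration choices:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Sparse setup: 100 features, only the first 5 carry any signal.
X = rng.normal(size=(200, 100))
true_w = np.zeros(100)
true_w[:5] = [3.0, -2.0, 1.5, 4.0, -1.0]
y = X @ true_w + rng.normal(scale=0.1, size=200)

l1_model = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: sum of |w|
l2_model = Ridge(alpha=0.1).fit(X, y)   # L2 penalty: sum of w^2

# L1 typically drives most of the irrelevant weights to exactly 0;
# L2 only shrinks them toward 0, so they stay small but nonzero.
print("L1 zero weights:", np.sum(l1_model.coef_ == 0))
print("L2 zero weights:", np.sum(l2_model.coef_ == 0))
```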