Machine Learning Notes

Classification

  • A logistic regression model generates the decimal probability of an event occurring. To map this probability to a binary category (the event will or will not occur), you must define a classification threshold

o   The classification threshold decides which probabilities are associated with one and which are associated with zero

  • You don’t always need a classification threshold. If you want to predict the probability of rain, you can just return the prediction as the decimal likelihood of rain.
  • If, however, you want to predict whether someone will or will not need a raincoat, you’ll need to decide which rain likelihoods correspond to “raincoat” and which to “no raincoat” (see the sketch after this list)

o   Threshold tuning considers the consequences of different types of incorrect predictions

  • True positive means the model makes a correct positive prediction
  • True negative means the model makes a correct negative prediction
  • False positive means the model incorrectly predicts the outcome as positive
  • False negative means the model incorrectly predicts the outcome as negative
  • False positives and false negatives have different consequences
  • It is better to bring a raincoat and not need it (false positive) than to get caught out in the rain without one (false negative)
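
A minimal sketch (Python with NumPy, using made-up rain probabilities and outcomes) of applying a classification threshold and then counting the four outcome types:

    import numpy as np

    # Hypothetical predicted probabilities of rain and observed outcomes (1 = it rained)
    probs  = np.array([0.10, 0.40, 0.65, 0.80, 0.95, 0.05])
    actual = np.array([0,    1,    0,    1,    1,    0])

    threshold = 0.5                          # a tunable choice, not a fixed rule
    pred = (probs >= threshold).astype(int)  # map each probability to 1 (raincoat) or 0 (no raincoat)

    tp = int(np.sum((pred == 1) & (actual == 1)))  # predicted rain, it rained
    tn = int(np.sum((pred == 0) & (actual == 0)))  # predicted dry, it stayed dry
    fp = int(np.sum((pred == 1) & (actual == 0)))  # brought a raincoat for nothing
    fn = int(np.sum((pred == 0) & (actual == 1)))  # caught in the rain without one

    print(tp, tn, fp, fn)  # 2 2 1 1 with these made-up numbers

Raising the threshold trades false positives for false negatives (and vice versa), which is why the relative costs of the two error types drive threshold tuning.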

Prediction Bias

  • Prediction bias occurs when the average of all predictions made by a model is not equal (or approximately equal) to the average of the observed labels in the dataset

o   Ex: if 10% of all dogs in a shelter are labs, the model should predict that a dog is a lab ~10% of the time

o   If avg. prediction – avg. observation is not approximately zero, the model is biased (see the sketch after this list)

  • Calibration layers to correct bias are a bad idea

o   It is possible to correct bias with brute force (for an 8% bias, add a layer that brings down the prediction mean by 8%)

o   This isn’t good because prediction biases signal that something is going wrong inside your model – it’s important to identify the actual problem instead of painting over it

  • Over-regularizing the model, training on a biased sample, or training on very noisy data are common causes of prediction bias
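
A minimal sketch (Python with NumPy, made-up numbers) of the bias check described above, comparing the mean prediction to the mean observed label:

    import numpy as np

    # Hypothetical: predicted probability that each shelter dog is a lab,
    # plus the observed labels (1 = lab). About 10% of these dogs are labs.
    predictions = np.array([0.15, 0.22, 0.08, 0.30, 0.12, 0.18, 0.25, 0.10, 0.09, 0.31])
    labels      = np.array([0,    0,    0,    1,    0,    0,    0,    0,    0,    0])

    prediction_bias = predictions.mean() - labels.mean()
    print(prediction_bias)  # about +0.08 here: on average the model says "lab" too often

A nonzero result like this flags a problem to investigate in the data or the model, not something to paper over with a calibration layer.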

Regularization

  • When sparse features are crossed, the model size becomes needlessly large, slowing down operations and using up memory

o   Extraneous weights should be made equal to 0 to save space and improve speed

  • L1 regularization penalizes the sum of the absolute values of all weights in the model

o   It causes some weights to be zeroed out completely, unlike L2 regularization, which keeps weights near zero but does not eliminate them entirely (see the sketch after this list)
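
A minimal sketch (Python with NumPy, made-up weights and a hypothetical regularization strength) contrasting the two penalty terms and showing how an L1-style update can zero out a small weight while an L2-style update only shrinks it:

    import numpy as np

    weights = np.array([0.8, -0.05, 0.0, 1.2, -0.3])
    lam = 0.1  # hypothetical regularization strength

    l1_penalty = lam * np.sum(np.abs(weights))  # sum of absolute values of the weights
    l2_penalty = lam * np.sum(weights ** 2)     # sum of squared weights

    # L1's gradient has constant magnitude, so a soft-threshold update can push
    # small weights to exactly 0; L2 shrinks each weight proportionally, so
    # nonzero weights get smaller but stay nonzero.
    l1_update = np.sign(weights) * np.maximum(np.abs(weights) - lam, 0.0)
    l2_update = weights * (1 - 2 * lam)

    print(l1_update)  # the -0.05 weight becomes exactly 0.0
    print(l2_update)  # every nonzero weight shrinks but remains nonzero

Zeroing extraneous weights this way is what keeps models built on crossed sparse features small and fast.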

 

From Google’s Machine Learning Crash Course