3 Comments

"Using LEACE, the researchers were able to largely remove the outdated correlation between ‘nurses’ and ‘female’ from a large language model." Can you explain why you characterize this correlation as "outdated"? Over 86% of nurses are female in 2023.

Hi Matthew. Great article and thanks for digging in. Curious whether you think concept erasure will work in the long haul when AI algorithms are trained to mimic and otherwise learn from human behavior and statistics (as is the case with nurses being correlated with women). In other words, even if the initial erasure “treatment” shows a large improvement, won’t such an algorithm re-learn the bias as an innate function of how it is coded and trained? And if the algorithm instead weights its input parameters to correct for bias, would it still be trustworthy (or would that depend on the weighting and the coding)? With so much to consider, the ongoing evolution of AI/ML will be quite the ride.

- Bob C (your old neighbor)