Robust Bayesian Modeling
Yau Group meeting
Tammo Rukat
February 2, 2016
What is a Bayesian Model?
- A joint distribution of parameters β and data x.
- We usually think of exchangeable models: $p(\beta, x) = \underbrace{p(\beta \mid \alpha)}_{\text{prior}} \underbrace{\prod_{i=1}^{n} p(x_i \mid \beta)}_{\text{likelihood}}$

- Generalise this to accommodate most common models:
- conditional models: $p(x_i \mid \beta) \to p(x_i \mid y_i, \beta)$, e.g. $= \mathcal{N}(x_i \mid w^T y_i, \sigma)$
- or latent variable models: $p(x_i \mid \beta) = \sum_{z_i} p(x_i, z_i \mid \beta)$, e.g. $= \sum_k \pi_k \mathcal{N}(x_i \mid \mu_k, \sigma_k)$
Interlude: What is the difference between parameters and latent variables?
- Joint distribution: $p(\beta, z, x) = \underbrace{p(\beta \mid \alpha)}_{\text{parameters}} \prod_{i=1}^{n} \underbrace{p(z_i \mid \gamma)}_{\text{latent variables}}\, p(x_i \mid \beta, z_i)$
- Practical notion:
- Number of latent variables grows with number of data points.
- Number of parameters stays fixed.
- There is no real difference
A useful distinction: Local vs global variables
$p(\beta, z, x) = \underbrace{p(\beta \mid \alpha)}_{\text{global}} \prod_{i=1}^{n} \underbrace{p(z_i \mid \gamma)}_{\text{local}}\, p(x_i \mid \beta, z_i)$
- The distinction is determined by conditional dependencies: $p(x_i, z_i \mid x_{-i}, z_{-i}, \beta) = p(x_i, z_i \mid \beta)$
Example: Gaussian mixture model
- Which are the local and which are the global variables?
$p(x \mid z) = \prod_k \mathcal{N}(x \mid \mu_k, \sigma_k)^{z_k}; \quad p(z) = \prod_k \pi_k^{z_k}$
- global: means μk, standard deviations σk, mixture proportions πk;
- local: cluster assignments $z_i$ (one per data point).
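The local/global split above can be made concrete by sampling from the mixture; the parameter values below are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Global variables: a fixed number of parameters shared by all data points
# (illustrative values, assumed for this sketch).
pi = np.array([0.5, 0.3, 0.2])     # mixture proportions pi_k
mu = np.array([-2.0, 0.0, 3.0])    # component means mu_k
sigma = np.array([0.5, 1.0, 0.7])  # component std deviations sigma_k

# Local variables: one cluster assignment z_i per data point,
# so their number grows with n while the globals stay fixed.
n = 1000
z = rng.choice(len(pi), size=n, p=pi)

# Observations: x_i | z_i ~ N(mu_{z_i}, sigma_{z_i})
x = rng.normal(mu[z], sigma[z])
```

However large n grows, there are still only three means, three standard deviations, and three mixture weights, but n assignments.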
Motivation
- Wang and Blei, 2015 – A General Method for Robust Bayesian Modeling
… all models are wrong …
- Robustness: Inference should be insensitive to small deviations from the model assumptions.
- Wang and Blei introduce a general framework for robust Bayesian modeling.
Key idea: Localisation of global parameters
- Classical model: $p(\beta, x) = p(\beta \mid \alpha) \prod_{i=1}^{n} p(x_i \mid \beta)$
- All data points are drawn from the same parameter.
- The hyperparameter α is usually fixed.
- Robust model: $p(\beta, x) = \prod_{i=1}^{n} p(\beta_i \mid \alpha)\, p(x_i \mid \beta_i)$
- Every data point is assumed to be drawn from its own individual realisation of the parameter, which is drawn from the prior.
- Outliers are explained by variation in the parameters.
Graphical Model for Localisation
- We now need to fit the hyperparameter α.
- Fixing α would make the data points independent.
Example: Normal observation model
- Localise the variance parameter and use its conjugate (inverse-gamma) prior:
$p(x_i \mid \alpha) = \int p(x_i \mid \beta_i)\, p(\beta_i \mid \alpha)\, d\beta_i = \int \mathcal{N}(x_i \mid \mu, \sigma_i)\, \text{Gam}^{-1}(\sigma_i \mid \alpha)\, d\sigma_i$
$p(x_i \mid \alpha) = \text{Student-t}(x_i \mid \mu, (\lambda, \nu) = f(\alpha))$
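The Student-t marginal can be checked numerically. A minimal sketch, assuming the inverse-gamma prior is placed on the variance with hypothetical hyperparameters $(a, b)$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical hyperparameters alpha = (a, b) of the inverse-gamma prior
# (illustrative values, not from the paper).
a, b, mu = 3.0, 2.0, 0.0
n = 200_000

# Localisation: each x_i gets its own variance sigma_i^2 ~ Inv-Gamma(a, b) ...
sigma2 = stats.invgamma.rvs(a, scale=b, size=n, random_state=rng)
# ... and is then drawn from its own normal observation model.
x = rng.normal(mu, np.sqrt(sigma2))

# Standard result: the marginal p(x_i | alpha) is Student-t with
# nu = 2a degrees of freedom and scale sqrt(b / a).
nu, scale = 2 * a, np.sqrt(b / a)
q_emp = np.quantile(x, 0.95)
q_theory = stats.t.ppf(0.95, df=nu, loc=mu, scale=scale)
print(q_emp, q_theory)
```

The empirical quantiles of the scale mixture match the Student-t quantiles, illustrating how variation in the localised parameters produces the heavy tails that absorb outliers.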
2nd key idea: Empirical Bayes
- Estimate hyperparameters via maximum marginal likelihood: $\hat{\alpha} = \arg\max_\alpha \sum_{i=1}^{n} \log \int p(x_i \mid \beta_i)\, p(\beta_i \mid \alpha)\, d\beta_i$
- a.k.a. the evidence approximation: $\text{evidence} = p(x_i \mid \alpha) = \int p(x_i \mid \beta_i)\, p(\beta_i \mid \alpha)\, d\beta_i$
- Here we use the data to determine the prior; is that legitimate?
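As an illustration of this empirical Bayes step on synthetic data: for the localised normal model the evidence is Student-t in closed form, so $\hat{\alpha}$ can be found by direct optimisation (the hyperparameter values and fixed $\mu = 0$ are simplifying assumptions):

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(2)

# Synthetic data from the localised normal model with known hyperparameters;
# (a_true, b_true) and mu = 0 are illustrative assumptions.
a_true, b_true, mu = 3.0, 2.0, 0.0
sigma2 = stats.invgamma.rvs(a_true, scale=b_true, size=5000, random_state=rng)
x = rng.normal(mu, np.sqrt(sigma2))

def neg_log_evidence(log_ab):
    # The evidence p(x_i | a, b) is Student-t with df = 2a and scale
    # sqrt(b / a), so the marginal likelihood is available in closed form.
    a, b = np.exp(log_ab)  # optimise on the log scale to keep a, b positive
    return -stats.t.logpdf(x, df=2 * a, loc=mu, scale=np.sqrt(b / a)).sum()

res = optimize.minimize(neg_log_evidence, x0=np.log([1.0, 1.0]))
a_hat, b_hat = np.exp(res.x)
print(a_hat, b_hat)
```

The recovered $(\hat{a}, \hat{b})$ land near the generating values, which is exactly the sense in which the data "determine the prior" here.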
Linear Regression
- Training data: $y_i \mid x_i \sim \mathcal{N}(\omega^T x_i + b,\ \sigma_i + 0.02)$ with $\sigma_i \sim \text{Gamma}(k, 1)$
- Test data: $y_i \mid x_i \sim \mathcal{N}(\omega^T x_i + b,\ 0.02)$
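A sketch of this train/test setup, with illustrative values for the unspecified $\omega$, $b$, $k$ and inputs, and treating the second argument of $\mathcal{N}(\cdot,\cdot)$ as a standard deviation (an assumption the slide leaves open):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative choices for the unspecified weight, bias, Gamma shape, and
# number of points; the noise parameter is treated as a standard deviation.
w, b, k, n = 2.0, -1.0, 1.0, 500
x_train = rng.uniform(-1, 1, size=n)
x_test = rng.uniform(-1, 1, size=n)

# Training data: heteroscedastic noise, sigma_i ~ Gamma(k, 1) per point.
sigma = rng.gamma(shape=k, scale=1.0, size=n)
y_train = rng.normal(w * x_train + b, sigma + 0.02)

# Test data: small, homoscedastic noise around the same line.
y_test = rng.normal(w * x_test + b, 0.02)
```

The training set is contaminated by occasional large-noise points (effectively outliers), while the test set lies tightly on the regression line; this mismatch is what the robust model is meant to handle.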
Logistic Regression
$y_i \mid x_i \sim \text{Bernoulli}(\sigma(\omega^T x_i))$
The posterior predictive
- Classical Bayesian model: $p(x_i \mid x, \alpha) = \int p(x_i \mid \beta)\, p(\beta \mid x, \alpha)\, d\beta$
- Gives the correct predictive distribution only if the data actually come from the model.
- Robust Bayesian model: $p(x_i \mid \hat{\alpha}) = \int p(x_i \mid \beta_i)\, p(\beta_i \mid \hat{\alpha})\, d\beta_i$
- Gives the correct predictive distribution even under model mismatch.
- If we want to make predictions under the model, which one should we choose?
References
- Wang and Blei (2015), "A General Method for Robust Bayesian Modeling"
- Gelman et al. (2014), "Bayesian Data Analysis", 3rd edition
- Murphy (2012), "Machine Learning: A Probabilistic Perspective"
- Carlin and Louis (2000), "Empirical Bayes: Past, Present and Future"