Conditional belief networks introduce stochastic binary variables in neural networks. Contrary to a classical neural network, a belief network can predict more than the expected value of the output Y given the input X. It can predict a distribution of outputs Y which is useful when an input can admit multiple outputs whose average is not necessarily a valid answer. Such networks are particularly relevant to inverse problems such as image prediction for denoising, or text to speech. However, traditional sigmoid belief networks are hard to train and are not suited to continuous problems. This work introduces a new family of networks called linearizing belief nets or LBNs. A LBN decomposes into a deep linear network where each linear unit can be turned on or off by non-deterministic binary latent units. It is a universal approximator of real-valued conditional distributions and can be trained using gradient descent. Moreover, the linear pathways efficiently propagate continuous information and they act as multiplicative skip-connections that help optimization by removing gradient diffusion. This yields a model which trains efficiently and improves the state-of-the-art on image denoising and facial expression generation with the Toronto faces dataset.