**Implementing BN** — Given some intermediate values (z₁, z₂, …, zₘ) of an arbitrary hidden layer, we apply the algorithm:

μ_B = (1/m) Σᵢ zᵢ

μ_B is the empirical mean, calculated as you do in 4th grade: add up all the values (z₁, z₂, …, zₘ) and divide by the number of values you counted (m).

σ²_B = (1/m) Σᵢ (zᵢ − μ_B)²

σ²_B is calculated by summing each data point's difference from the mean, after having squared it to avoid negative numbers, then dividing by the number of values you counted (m).

x̂ᵢ = (zᵢ − μ_B) / √(σ²_B + ε)

x̂ᵢ normalizes, with the constant ε added to prevent dividing by zero or by too small a value. This more or less gives us the standard mean = 0 and unit variance = 1.

But we don't always want the distribution of our hidden nodes (z₁, z₂, …, zₘ) to have mean 0 and variance 1, so we adjust the values in the activation layer with a scaler gamma (γ) and a shifter (offset) beta (β), where γ and β are learnable parameters of the model, hence the name BN_{γ,β}:

yᵢ = γx̂ᵢ + β

This allows us to set the mean (via β) and variance (via γ) of yᵢ to be whatever we want them to be. If it's decided that γ = 1 and β = 0, yᵢ is no different from x̂ᵢ. Otherwise put, we can always rearrange this final equation to get the previous value.

When is this "scale and shift" helpful? Well, for example, if we have a sigmoid activation function, we might want a larger (wider) variance or a mean other than zero to take advantage of the non-linearity of the sigmoid function. Rather than centering the cluster of all our values in the linear region of the sigmoid (seen in graph a), the scaling and shifting parameters γ and β let the machine learning algorithm center the values wherever it wants, with whatever variance it wants (b).

Each mini-batch B of size m (z₁, z₂, …, zₘ) is scaled by the mean and variance computed on just that one mini-batch. This allows the standardized mean and variance of our hidden layers to be whatever the algorithm thinks is appropriate, which can even be mean = 0 and variance = 1.
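The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the forward pass only (no running statistics for inference, no backprop); the function name `batch_norm_forward` and its argument names are my own, not from the post.

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Batch-normalize a mini-batch z of shape (m, features).

    gamma and beta are the learnable scale and shift parameters;
    eps is the small constant that guards against dividing by zero.
    """
    mu = z.mean(axis=0)                    # empirical mean mu_B over the mini-batch
    var = z.var(axis=0)                    # empirical variance sigma^2_B
    z_hat = (z - mu) / np.sqrt(var + eps)  # normalize: roughly mean 0, variance 1
    return gamma * z_hat + beta            # scale and shift: y_i = gamma * x_hat_i + beta

# With gamma = 1 and beta = 0, the output is just the normalized values x_hat.
z = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = batch_norm_forward(z, gamma=1.0, beta=0.0)
```

In a real network γ and β would be updated by gradient descent along with the weights, so the model itself chooses the mean and variance each hidden layer ends up with.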