**Part 1: Normal distribution and its standard deviation**

The normal distribution is one of the most widely used distributions in statistics (there are others, but we don't really need to discuss them). A picture is worth a thousand words, so here is our culprit:

You might say "gah, it's just some line, so what?". But it's not just any line; this is a line that for whatever magical reason describes all kinds of random distributions in the world around you. Intuitively you already know this from experience; think of people's height, for example. Most guys you know are probably around 5'7"-6'0", give or take, right? You also might know some guys that are shorter or taller than that, but not quite as many. And then of course there are the extremely tall (think NBA players) or short guys, but they are few and far between. Researchers have collected heights for very large groups of people, and that data is described very nicely by a normal distribution:

But hey, who cares about height. Let's discuss something that's much more relatable to battleship dispersion in this game: artillery fire! There is a handy USMC Field Manual (FM 6-40) out there, called Tactics, Techniques, and Procedures for Field Artillery Manual Cannon Gunnery. Below is an illustration from Section 3-5, Causes of Dispersion, that shows the schematic distribution of shell bursts of "*ammunition of the same caliber, lot, and charge that are fired from the same position with identical settings used for deflection and quadrant elevation*":

As you can see, the rounds will not all impact at a single point but will fall in a scattered pattern, with most bursts close to the point of aim and few of them further away. This scattering of bursts is caused by all kinds of stuff: minor variations in the weight of the projectile, form of the rotating band, moisture content and temperature of the propellant grains, differences in the rate of ignition of the propellant, variations in the temperature of the bore from round to round, conditions of the carriage, variations in air resistance due to wind, etc. But the cool thing is that the combination of these random factors still produces a burst pattern around the mean point of impact that -- you guessed it! -- is described very well by a normal distribution curve. Well, to be more precise, by two curves. For each shell burst we can measure the perpendicular distance to the mean range line (Y axis), and to the line of fire (X axis). This way, we can describe all the bursts as two columns of values, one for perpendicular distance to X, and one for perpendicular distance to Y:
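For those who like to poke at this themselves, here is a rough sketch of the "two columns of values" idea. It fakes a batch of bursts by drawing two normal errors per shell. The sigma values (100 m and 20 m), the function name `simulate_bursts`, and the assumption that the two axes are independent are all mine, not something taken from the Field Manual.

```python
import random
import statistics

def simulate_bursts(n, sigma_range_m, sigma_deflection_m, seed=0):
    """Draw n bursts as (range_error, deflection_error) pairs, in meters.

    Each error is measured from the mean point of impact, so both
    distributions are centered on zero. Independence of the two axes
    is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [
        (rng.gauss(0.0, sigma_range_m), rng.gauss(0.0, sigma_deflection_m))
        for _ in range(n)
    ]

# Hypothetical sigmas: 100 m along the line of fire, 20 m across it.
bursts = simulate_bursts(10_000, sigma_range_m=100.0, sigma_deflection_m=20.0)

# The "two columns of values" from the text.
range_errors = [r for r, _ in bursts]
deflection_errors = [d for _, d in bursts]

# Mean and standard deviation of each column.
for name, column in (("range", range_errors), ("deflection", deflection_errors)):
    print(name, round(statistics.mean(column), 1), round(statistics.pstdev(column), 1))
```

With enough simulated shells, each column's mean should sit near zero and its standard deviation near the sigma that was fed in.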

Here is another illustration of how that works:

I must stress, this is a __probability__ curve -- if we start shooting actual shells, and then plot the percent of shells from total shells fired vs. burst distance from the aim point for a given distance, the result won't match the normal distribution curve exactly. Theoretically, if we fired an infinite number of shells it would match perfectly, but who's got time for that! Anyways, for all intents and purposes the theoretical normal distribution curve describes the actual real-life shell distribution quite well, which is why it's in this Field Manual to begin with.
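Since nobody has an infinite number of shells lying around, a quick simulation can stand in. The sketch below counts what share of simulated shells lands within one sigma of the center and compares it with the theoretical value. The 100 m sigma and the helper name `share_within_one_sigma` are made up for illustration.

```python
import math
import random

SIGMA_M = 100.0  # hypothetical standard deviation, in meters
THEORY = math.erf(1 / math.sqrt(2))  # theoretical share within one sigma

def share_within_one_sigma(n_shells, seed=1):
    """Fire n_shells simulated shells; return the share landing within one sigma."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_shells) if abs(rng.gauss(0.0, SIGMA_M)) <= SIGMA_M)
    return hits / n_shells

# Small salvos wander around the theoretical value; large ones close in on it.
for n in (10, 100, 10_000, 100_000):
    observed = share_within_one_sigma(n)
    print(f"{n:>7} shells: observed {observed:.4f}, theory {THEORY:.4f}")
```

The small runs can land noticeably off the theoretical share, while the large runs should settle close to it, which is the whole point of calling it a probability curve.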

Now, let's talk about the normal distribution curve itself in more detail. It's got a few interesting properties that must be kept in mind. For one, the total area under it, that is, the sum of the probabilities of all possible outcomes, must add up to one. A simple illustration of this concept (though not directly related to normal curves) is this: if you flip a coin a bunch of times, making sure it lands flat every time, the fraction of flips that came up heads plus the fraction that came up tails will always add up to one.

A second important detail: this curve is symmetrical. If you draw a line down the middle, the shape of the curve to the left and to the right of that line will be exactly the same.
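Both properties are easy to check numerically. The sketch below uses the standard normal density formula and a plain trapezoid sum over ten sigmas either side of the center. The center and sigma values, and the function names, are just picked for the example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(mu, sigma, steps=20_000):
    """Trapezoid-rule area between mu - 10*sigma and mu + 10*sigma."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    width = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    total += sum(normal_pdf(lo + i * width, mu, sigma) for i in range(1, steps))
    return total * width

mu, sigma = 0.0, 10.0  # hypothetical center and standard deviation

# Property 1: the total area is one (up to numerical rounding).
print(round(area_under_curve(mu, sigma), 6))

# Property 2: the curve is equally tall at equal distances left and right of the center.
print(math.isclose(normal_pdf(mu - 7.0, mu, sigma), normal_pdf(mu + 7.0, mu, sigma)))
```

Changing `mu` or `sigma` should leave both results unchanged, since neither property depends on where the curve sits or how wide it is.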

Another much more nuanced detail is the exact *shape* of the curve. Here is a handy illustration:

The character μ simply denotes the average of all data points (for example, the average value for all shell burst distances from either X or Y axis that we've computed above). The character σ is sigma, the standard deviation of the normal curve. It's not all that important how it is computed (link for the curious). What's important is the fact that this value *quantitatively* describes the expected distribution of the observations, e.g. shell bursts. Let's say that we accurately recorded all our bursts, and measured the perpendicular distance to the mean range line for each one. Based on those measurements we've calculated sigma. For the sake of argument, let's say it's one hundred meters. The graph above tells us that we can expect about 68% of shell bursts to be within 100 meters of the mean range line, about 95% of shell bursts within 200 meters of that line, and about 99.7% of shell bursts within 300 meters of the line. This is an extremely powerful concept: once we compute our sigma (standard deviation), we can mathematically predict the approximate probability of a shell landing within any given distance from the aim line! And if we have sigma for both the X and Y axes, then we will know the probability of landing a shell at any given distance away from the aim point. You can imagine how important this is for artillery fire.
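The percentages above don't have to be read off a graph. For a normal distribution, the probability of landing within a given distance of the center can be computed with the error function from Python's standard library. Here is a minimal sketch using the 100 m example sigma. The second sigma (20 m) and the assumption that the two axes are independent are mine, added only to show the two-axis case.

```python
import math

def prob_within(distance_m, sigma_m):
    """Probability that a normally distributed error is within +/- distance_m of the mean."""
    return math.erf(distance_m / (sigma_m * math.sqrt(2)))

sigma_m = 100.0  # the example sigma from the text, in meters

for distance_m in (100.0, 200.0, 300.0):
    print(f"within {distance_m:.0f} m: {prob_within(distance_m, sigma_m):.4f}")

# Assumption: errors along the two axes are independent, so the chance of
# landing inside a box is the product of the two one-axis probabilities.
sigma_deflection_m = 20.0  # hypothetical sigma for the other axis
p_box = prob_within(100.0, sigma_m) * prob_within(20.0, sigma_deflection_m)
print(f"inside a 200 m x 40 m box: {p_box:.3f}")
```

The three one-axis results come out at 0.6827, 0.9545, and 0.9973, which is where the rounded 68%, 95%, and 99.7% figures come from.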

It should be noted that this sigma we've computed must be a physical value; it must have units of distance. If you tell me "68% of shells will land within 100 of the mean range line", that tells me nothing. One hundred what? Meters? Inches? Football fields?

Okay, so now we know that the normal curve has a very distinctive shape, and that the sigma (standard deviation) value gives us a quantitative way of estimating how far observations fall from the center of the curve. What is the effect of changing sigma? How will that change the way our distribution looks? In the very first graph of this post, the sigma is equal to ten. Let's see how the graph will look if we change sigma to 5 and to 20, leaving all else equal:

*Increasing* sigma (standard deviation) makes our distribution wider and shorter. Shells, on average, will be landing further away from the aim point.

*Decreasing* sigma (standard deviation) makes our distribution narrower and taller. Shells, on average, will be landing closer to the aim point.

Keep in mind that the area under the curve is still one for all these examples, which is why the height and width of the curves change in unison. Also, keep in mind that regardless of the exact shape of the normal distribution, sigma will still give us a quantitative way to measure the probability of some observation falling within a certain distance of the center of the distribution (about 68%, 95%, and 99.7% for one, two, and three sigmas respectively).
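To put numbers on the "wider and shorter" versus "narrower and taller" behavior, here is a small sketch for the three sigmas used above (5, 10, and 20). It relies on the standard result that the peak of a normal curve is 1 / (sigma * sqrt(2*pi)); the function names are my own.

```python
import math

def peak_height(sigma):
    """Height of the normal curve at its center, 1 / (sigma * sqrt(2*pi))."""
    return 1.0 / (sigma * math.sqrt(2 * math.pi))

def share_within(k, sigma):
    """Share of observations within k standard deviations of the center."""
    distance = k * sigma
    return math.erf(distance / (sigma * math.sqrt(2)))

# Doubling sigma halves the peak, but the one/two/three sigma shares stay put.
for sigma in (5.0, 10.0, 20.0):
    shares = ", ".join(f"{share_within(k, sigma):.4f}" for k in (1, 2, 3))
    print(f"sigma={sigma:>4}: peak {peak_height(sigma):.4f}, shares {shares}")
```

The peak halves every time sigma doubles, while the shares within one, two, and three sigmas stay the same for every curve. Only the physical distance that "one sigma" stands for changes.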

Incidentally, this means that the WG "sigma value" is not the same as the sigma (standard deviation) in statistics that I've described above. From patch notes we know that increasing that "sigma value" makes guns more accurate. That means that the shell distribution around the aim point is narrower. That means that sigma (standard deviation) has decreased! I have a few hypotheses as to what exactly that "sigma value" is and how it's related to the actual sigma (standard deviation), but I'll share them in a different thread a bit later.

Well, this concludes the first part of Vak's Fun Times With Stats installment! Hopefully this was useful. Stay tuned for the second part about WoWS statistical fallacies!