Using Normal Distributions for Continuous Probability Distributions

The Normal Distribution

The normal distribution is a data distribution that can be used to describe many types of measurements in engineering. Basically, a normal distribution is a bell-shaped curve. The role of the normal distribution in statistics has been stated to be analogous to the role of the straight line in geometry. Figure 1 illustrates a bell curve superimposed over a histogram of PCC compressive strength data. Such a distribution is very convenient to use because it is fully characterized by its mean (μ for a population, x̄ for a sample) and standard deviation (σ for a population, s for a sample).

As Figure 1 shows, most of the strength measurements cluster around the mean (x̄ = 4,824 psi), while fewer measurements are near the lowest (3,875 psi) and highest (5,975 psi) strength values. The theoretical normal distribution extends indefinitely in both directions (minus infinity to plus infinity) and never quite reaches the horizontal axis, so it encompasses all of the results that can occur. The total area under the curve must therefore equal unity (i.e., 1.000 or 100 percent of the data values). For practical purposes, however, most of the data values (99.73 percent) occur within 3σ of the mean. A far more important result follows from the same fact: because the total area is 100 percent, the probability of finding a data value between any two values of x is equal to the area under the normal distribution between those values.

The Normal Distribution Equation

The height of a normal distribution (y) at a given value of x (refer to Figure 2) is defined by the following equation:

$$ y = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

where:	y	=	vertical height of a point on the normal distribution
	x	=	distance along the horizontal axis
	σ	=	standard deviation of the data distribution
	μ	=	mean of the data distribution
	e	=	exponential constant = 2.71828…
	π	=	pi = 3.14159…
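As a minimal sketch, the height of the curve, y = 1/(σ√(2π)) · e^(−(x−μ)²/(2σ²)), can be evaluated directly in Python. The function name normal_height and the sample values of μ and σ are assumptions made for illustration and are not taken from the figures. The result is compared with the pdf method of Python's statistics.NormalDist as a sanity check.

```python
import math
from statistics import NormalDist


def normal_height(x, mu, sigma):
    """Height y of the normal curve at x, for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Hypothetical parameters, chosen only to exercise the formula.
mu, sigma = 110.0, 5.0

y_peak = normal_height(mu, mu, sigma)          # height at the mean (the peak)
y_one_sigma = normal_height(mu + sigma, mu, sigma)
y_library = NormalDist(mu, sigma).pdf(mu)      # library value at the same point

print(y_peak, y_one_sigma, y_library)
```

The peak height depends only on σ, which is why a larger standard deviation gives a flatter, wider bell.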

Example

Based on hypothetical density data results, calculate the area under the normal curve between 105 and 115 lb/ft³ for a standard deviation of 5 lb/ft³. These calculations are:

The approximate area under the curve is about 0.45 (or 45 percent; see Figure 2), which is close to the “theoretical” value of 48 percent (refer to the sketch in Figure 3). The significance of this value is that the probability of a density measurement falling within the range of 105 to 115 lb/ft³ is about 0.48 (using the “theoretical” value).

Figure 2: Determination of Approximate Area Under the Normal Distribution
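The strip-by-strip approximation of the area can be sketched in Python as follows. The mean is not stated in the example text, so the value of 105 lb/ft³ used here is an assumption (it places the 105 to 115 lb/ft³ range between the mean and two standard deviations above it). The function names are hypothetical, and the trapezoid rule is one reasonable way to sum the strips, not necessarily the method used for Figure 2.

```python
import math


def normal_height(x, mu, sigma):
    """Height of the normal curve at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def trapezoid_area(a, b, mu, sigma, strips):
    """Approximate the area under the normal curve between a and b with equal-width trapezoids."""
    width = (b - a) / strips
    total = 0.0
    for i in range(strips):
        left = a + i * width
        right = left + width
        total += 0.5 * (normal_height(left, mu, sigma) + normal_height(right, mu, sigma)) * width
    return total


MU_ASSUMED = 105.0   # assumption: the mean is not given in the text
SIGMA = 5.0          # standard deviation from the example, lb/ft^3

coarse = trapezoid_area(105.0, 115.0, MU_ASSUMED, SIGMA, strips=2)
fine = trapezoid_area(105.0, 115.0, MU_ASSUMED, SIGMA, strips=1000)

print(f"2 strips: {coarse:.3f}   1000 strips: {fine:.4f}")
```

With only a few wide strips the estimate falls a little short of the theoretical area; with many narrow strips it settles at about 0.477.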

Determining such probabilities in this manner is tedious and time-consuming. There is an easier way than computing and tabulating y’s for various means and standard deviations: convert the normal distribution to a standard normal distribution by defining a variable “z,” which is:

$$ z = \frac{x - \mu}{\sigma} $$

If you substitute z into the normal distribution equation, the following relationship results:

$$ y_z = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{z^2}{2}} $$

where:	y_z	=	vertical height on the standard normal distribution
	z	=	as defined above

Refer to Figure 3, which illustrates this important transformation. From it, you can see that the probability of having a density test between 105 and 115 lb/ft³ (pcf) is about 47.7 percent (or 34.1 + 13.6 percent).

Figure 3: Example Proctor Density Distributions
(Normal and Standard Normal)

Fortunately, the “z-statistic” has been published in tables to allow for easy computation. Such a table is shown as Table 1. You can see that:

mean ± 1 standard deviation ≈ 68.2% of area
mean ± 2 standard deviations ≈ 95.4% of area
mean ± 3 standard deviations ≈ 99.7% of area

Recall that all of the area under a normal distribution is 100% and the values above correspond to the z-statistic.

Table 1: Normal Distribution Table (from Ulberg, 1987)

Source:

http://classes.engr.oregonstate.edu/cce/winter2012/ce492/Modules/08_specifications_qa/normal_distribution.htm

Continuous Probability Distributions

1). Normal Distribution Curve (Lind 2005)

Here is a link to University of Leicester and a good explanation of Normally Distributed Populations. http://www.le.ac.uk/biology/gat/virtualfc/Stats/normal.htm

The normal curve is bell-shaped and has a single peak at the center of the distribution.
The arithmetic mean, median, and mode of the distribution are equal and located at the peak. Thus half the area under the curve is above the mean and half is below it.
The normal probability distribution is symmetrical about its mean.
The normal probability distribution is asymptotic. That is, the curve gets closer and closer to the X-axis but never actually touches it.
Theoretically, the curve extends to infinity in both directions.
The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.
It is also called the z distribution.

A z-value is the distance between a selected value, designated X, and the population mean, μ, divided by the population standard deviation, σ.

Z-Value

Forest Edward Thompson III (UoPhx 2008)

A normal distribution can be transformed into a standard normal distribution by computing the z-value. The z-value is determined by subtracting the mean from the (X) value and dividing the result by the standard deviation.

The z-value is also known as:

z scores

standard normal values

normal deviate

standard normal deviate

z statistics

The Standard Normal Value formula is:

$$ z = \frac{X - \mu}{\sigma} $$

X is the value of any observation, measurement, or number
μ (mu) is the mean of the distribution
σ (sigma) is the standard deviation of the distribution

The Standard Normal Value formula is used to find the z-value. The z-value is then used to determine the probability for the standard normal probability distribution. The probability is found by looking up the z-value in the statistical table of “Areas under the Normal Curve,” also called the “Standard Normal Probabilities.”

This is an example of the “Areas under the Normal Curve” or the “Standard Normal Probabilities” chart:

Lind, D. A. & Marchal, W. G. & Wathen, S. A. (2004). Statistical techniques in business and economics, 12e: Appendix D, pg. 720: Areas Under the Normal Curve. New York: The McGraw-Hill Companies

To understand how to use the chart:

Use the Standard Normal Value formula to discover the z-value

Once the z-value is discovered refer to your chart to understand its probability

Locate the z-value number on the left side of the chart going vertically (using this chart as an example the z-value numbers are from point 0.0 to 3.0 in the vertical column)

Once the z-value number has been discovered in the vertical column, then use the horizontal row to discover the second part of the z-value (using this chart as an example the z-value numbers are from point 0.00 to 0.09 in the horizontal row)

Example: the z-value is 2.62

Using the chart, find 2.6 in the vertical column and 0.02 in the horizontal row. If z = 2.62, then P (0 to z) =0.4956
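The same lookup can be checked without a printed chart. This sketch assumes Python's statistics.NormalDist, whose cdf method gives the area to the left of z. Subtracting 0.5 gives the area from the mean (z = 0) to z, which is what this style of chart tabulates. The function name area_mean_to_z is hypothetical.

```python
from statistics import NormalDist


def area_mean_to_z(z):
    """Area under the standard normal curve between the mean (z = 0) and z."""
    return NormalDist().cdf(abs(z)) - 0.5


p = area_mean_to_z(2.62)
print(f"P(0 to 2.62) = {p:.4f}")
```

Because the curve is symmetrical, the function takes the absolute value of z, so z = −2.62 gives the same area as z = 2.62.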

These are internet links that will help with more detailed information on normal probability distribution, bell-shape curve, z-value, charts, and statistical calculators:

http://www.cas.buffalo.edu/classes/psy/segal/2072001/z-dist&corr/zdist.htm

http://www.math.com/tables/stat/distributions/z-dist.htm

http://stattrek.com/Lesson2/Normal.aspx

http://davidmlane.com/hyperstat/z_table.html

http://www.math.csusb.edu/faculty/stanton/m262/normal_distribution/normal_distribution.html

http://faculty.uncfsu.edu/dwallace/sz-score.html

Reference

Lind, D. A. & Marchal, W. G. & Wathen, S. A. (2004). Statistical techniques in business and economics, 12e: Chapter 7: Continuous Probability Distributions. New York: The McGraw-Hill Companies

2). The Empirical Rule for A Normally Distributed Population

68% of measurements are within 1 standard deviation of the mean
95% of measurements are within 2 standard deviations of the mean
99.7% of measurements are within 3 standard deviations of the mean
Nearly every measurement is within ±3 standard deviations of the mean, except outliers. And that is only 3 sigma… think about 6 sigma.
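The percentages in the Empirical Rule can be reproduced from the standard normal distribution. This is a minimal sketch using Python's statistics.NormalDist. The helper name fraction_within is an assumption, and the 6 sigma line is added only to follow up on the remark above: it reports the plain area within ±6 standard deviations of an ideal normal curve.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1


def fraction_within(k):
    """Fraction of a normal population lying within k standard deviations of the mean."""
    return std.cdf(k) - std.cdf(-k)


coverage = {k: fraction_within(k) for k in (1, 2, 3, 6)}

for k, share in coverage.items():
    print(f"within ±{k} standard deviations: {share:.10f}")
```

The rounded values 68%, 95% and 99.7% come from the first three results. At 6 sigma the fraction outside the limits is on the order of a few parts per billion.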

3). The Normal Approximation to the Binomial

The normal distribution (a continuous distribution) yields a good approximation of the binomial distribution (a discrete distribution) for large values of n.
The normal probability distribution is generally a good approximation to the binomial probability distribution when nπ and n(1 − π) are both greater than 5, where π is the probability of success on each trial.

With the binomial experiment:

There are only two mutually exclusive outcomes (success or failure) on each trial.
A binomial distribution results from counting the number of successes.
Each trial is independent.
The probability is fixed from trial to trial, and the number of trials n is also fixed.

The continuity correction factor is the value 0.5 that is subtracted or added, depending on the problem, to a selected value when a binomial probability distribution (a discrete probability distribution) is being approximated by a continuous probability distribution (the normal distribution).
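The approximation and the 0.5 correction can be illustrated with a short Python sketch. The values n = 100, success probability 0.5 and cutoff 55 are hypothetical, as are the function names. The code uses p for the probability of success (written π in the text) to avoid confusion with math.pi.

```python
import math
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for a binomial distribution with n trials and success probability p."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), adding 0.5 as the continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)


n, p, k = 100, 0.5, 55                      # hypothetical example values
rule_satisfied = n * p > 5 and n * (1.0 - p) > 5

exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)

print(rule_satisfied, round(exact, 4), round(approx, 4))
```

For P(X ≤ k) the correction is added (k + 0.5) so that the whole bar for the value k is included; for P(X ≥ k) it would be subtracted instead.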

Source:

http://www.westbrookstevens.com/continuous.htm

Using Normal Distributions


If you missed the stimulating introduction to Continuous Variables and Normal Distributions, it might be worth taking a few minutes to read it.

Z Value

The functional form for a normal distribution is a bit complicated. It can also be difficult to compare two variables if their means and/or standard deviations are different, for example heights in centimeters and weights in kilograms, even if both variables can be described by a normal distribution. To get around both of these difficulties we can define a new variable:

(1)

$$ z = \frac{x - \mu}{\sigma} $$

This variable gives a measure of how far the variable is from the mean (x−μ) then “normalizes” it by dividing by the standard deviation (σ). This new variable gives us a way of comparing different variables. The z-value tells us how many standard deviations or “how many sigmas” the variable is from its respective mean.
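As a small illustration of comparing different variables, the sketch below converts a height and a weight to z-values. All of the numbers (the population means, standard deviations and the two measurements) are hypothetical, as is the function name z_value.

```python
def z_value(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma


# Hypothetical populations: heights in cm and weights in kg.
height_z = z_value(197.0, mu=187.5, sigma=9.5)
weight_z = z_value(92.0, mu=80.0, sigma=8.0)

print(f"height: {height_z:+.2f} sigmas, weight: {weight_z:+.2f} sigmas")
```

Although centimeters and kilograms cannot be compared directly, the z-values can: in this made-up case the weight is further from its mean, in sigmas, than the height is.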

When the distribution function is expressed in terms of the z-value it is sometimes called the “standard normal distribution.” You’ve got to love the creativity!

(2)

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}z^2} $$

Areas Under the Curve

To calculate the probability that a variable is within a range we have to find the area under the curve… Hooray, calculus! It turns out that the function has no indefinite integral that can be written in terms of elementary functions! However, smart folks have figured out how to evaluate the definite integrals numerically. Those calculations are a bit complex, so the folks who have to work with normal distributions rely on tables, which you have in your formula booklet, or calculators.

An example of the table, Standard Normal Curve Areas, is referenced below. It’s as close as I could find to the one the IB gives you…

[Table: Standard Normal Curve Areas]

These tables can be a bit scary, but you simply need to know how to read them.

The leftmost column tells you how many sigmas above the mean, to 1 decimal place.
The top row gives the second decimal place.
The intersection of a row and column gives the probability.

For example, if we want to know the probability that a variable is no more than 0.51 sigmas above the mean, we select the 6th row down (corresponding to 0.5) and the 2nd column (corresponding to 0.01). The intersection of the 6th row and 2nd column is 0.6950, which tells us that there is a 69.50% chance that a variable is less than 0.51 sigmas above the mean…

Notice that for 0.00 sigmas the probability is 0.5000. Thus showing that there is equal probability of being above or below the mean! So nice when stuff makes sense.
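To see how such a table is laid out, the first few rows can be regenerated with Python's statistics.NormalDist. This is only a sketch: the function name table_entry and the choice to print six rows are assumptions, and the IB booklet's own table may be formatted differently.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1


def table_entry(row, col):
    """Cumulative probability P(Z <= row + col), rounded to 4 places as in a printed table."""
    return round(std.cdf(row + col), 4)


rows = [i / 10 for i in range(6)]      # 0.0, 0.1, ..., 0.5 (first decimal place)
cols = [j / 100 for j in range(10)]    # 0.00, 0.01, ..., 0.09 (second decimal place)

print("  z  " + " ".join(f"{c:.2f}  " for c in cols))
for r in rows:
    print(f"{r:.1f}  " + " ".join(f"{table_entry(r, c):.4f}" for c in cols))
```

The entry in the row for 0.5 and the column for 0.01 is the 0.6950 used in the example, and the entry for 0.0 and 0.00 is the 0.5000 at the mean.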

Using TI-84 to Find Areas Under the Curve

A 6 minute video showing how to get area under the normal distribution given a range of z-values. It also covers the inverse, that is going from area to z-values. This is something you need to know how to do!


“Simple” Examples

Example 1

Find P(Z≤1.5)


This problem essentially asks what is the probability that a variable is less than 1.5 sigmas above the mean. On the table of values, find the row that corresponds to 1.5 and the column that corresponds to 0.00, which gives a probability of 0.933.

Graphically this problem can be represented as such:

[Figure: standard normal curve with the area for P(Z≤1.5) shaded]

Example 2

Find P(Z≥1.17)


This problem essentially asks what is the probability that a variable is MORE than 1.17 sigmas above the mean. On the table of values, find the row that corresponds to 1.1 and the column that corresponds to 0.07, which gives a probability of 0.8790. However, this is the probability that the value is less than 1.17 sigmas above the mean. Since all the probabilities must sum to 1:

(3)

$$ P(Z > 1.17) = 1 - P(Z < 1.17) = 0.121 $$

Graphically this problem can be represented as such:

[Figure: standard normal curve with the area for P(Z≥1.17) shaded]

Example 3

Find P(−1.16≤Z≤1.32)


This example is a bit tougher… Graphically this problem can be represented as such:

[Figure: standard normal curve with the area for P(−1.16≤Z≤1.32) shaded]

This problem can be rewritten in the form below.

(4)

$$ P(-1.16 \le Z \le 1.32) = P(Z \le 1.32) - P(Z \le -1.16) $$

The difficulty is that our table of values does not allow us to directly look up P(Z≤−1.16). However, we can use the symmetry of the distribution.

(5)

$$ P(Z \le -1.16) = 1 - P(Z \le 1.16) = 0.1230 $$

So we can say:

(6)

$$ P(-1.16 \le Z \le 1.32) = 0.9066 - 0.1230 = 0.7836 $$
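The three “simple” examples can be checked in a few lines of Python. This sketch assumes statistics.NormalDist as a stand-in for the table or GDC, and the variable names are hypothetical.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

# Example 1: area to the left of z = 1.5
p_example_1 = std.cdf(1.5)

# Example 2: area to the right of z = 1.17 (complement of the table value)
p_example_2 = 1.0 - std.cdf(1.17)

# Example 3: area between z = -1.16 and z = 1.32
p_example_3 = std.cdf(1.32) - std.cdf(-1.16)

for label, p in [("P(Z <= 1.5)", p_example_1),
                 ("P(Z >= 1.17)", p_example_2),
                 ("P(-1.16 <= Z <= 1.32)", p_example_3)]:
    print(f"{label} = {p:.4f}")
```

Unlike the printed table, cdf accepts negative z-values directly, so the symmetry step in Example 3 is not needed here, though it is still worth understanding.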


IB Style Examples – In Progress

Example 4


a) First we have to find the z-value that corresponds to 197 cm.

(7)

$$ z = \frac{197 - 187.5}{9.5} = 1 $$

Then, using a GDC or table, P(Z>1)=1−P(Z<1)=0.159

So 15.9% of the adults are taller than 197 cm.

b) First we have to find the height of the 99th percentile. Using a GDC (or table) we need an inverse normal distribution to get the z-value that corresponds to 99%.

(8)

$$ z = 2.326 $$

Then we can find the height:

(9)

$$ 2.326 = \frac{h - 187.5}{9.5} $$

(10)

$$ h = 209.6 $$

Then, since the doorway needs to be 17 cm taller than this, the door needs to be 227 cm (3 S.F.).
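Both parts of this solution can be reproduced with Python's statistics.NormalDist, used here as a stand-in for the GDC. The problem statement itself is not shown, so the mean of 187.5 cm, the standard deviation of 9.5 cm and the 17 cm clearance are taken from the worked solution; the variable names are hypothetical.

```python
from statistics import NormalDist

# Parameters as they appear in the worked solution (heights in cm).
heights = NormalDist(mu=187.5, sigma=9.5)

# a) share of adults taller than 197 cm
share_taller_than_197 = 1.0 - heights.cdf(197.0)

# b) height of the 99th percentile (inverse normal), then the door height
height_99th = heights.inv_cdf(0.99)
door_height = height_99th + 17.0

print(f"a) {share_taller_than_197:.3f}")
print(f"b) 99th percentile {height_99th:.1f} cm, door {door_height:.0f} cm")
```

Calling inv_cdf on the height distribution itself skips the intermediate z-value; it is equivalent to finding z from the standard normal and then solving for h.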


Example 5


a) First we need to find the z-value that corresponds to P(X>153)=0.705. To do this we use the inverse normal (GDC or table).

This gives us a z-value of magnitude 0.539. Since the probability is greater than 0.5, the z-value must be negative, i.e. 153 is less than the mean. Think about it. Draw a picture.

So we can say:

(11)

$$ -0.539 = \frac{153 - \mu}{5} $$

From this, μ = 155.7 cm
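Part a) works backwards from a probability to the mean, which can be sketched with the inverse normal in Python. The values 153, 0.705 and the standard deviation of 5 are taken from the worked solution, since the problem statement is not shown; the variable names are hypothetical.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

sigma = 5.0
p_above = 0.705            # P(X > 153) from the solution

# inv_cdf works with the area to the LEFT, so use 1 - P(X > 153).
z = std.inv_cdf(1.0 - p_above)

# Rearranging z = (153 - mu) / sigma for the mean.
mu = 153.0 - z * sigma

print(f"z = {z:.3f}, mu = {mu:.1f} cm")
```

The sign takes care of itself here: because the left-hand area is below 0.5, inv_cdf returns a negative z, which matches the reasoning that 153 lies below the mean.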

b) Same old, same old… Find the z-value.

(12)

$$ z = \frac{156 - 153}{5} = 0.6 $$

Use a GDC or table… P(Z>0.6)=1−P(Z<0.6)=0.274


Example 6


a) We need to find P(X<72). To do that we need the z-value.

(13)

$$ z = \frac{72 - 80}{8} = -1 $$

(14)

P(Z<−1)=0.159

b) For this we need P(88>X>72), which is the same as P(1>Z>−1) because 88 is one standard deviation above the mean, so its z-value is 1. You can use your GDC to quickly get an answer or…

(15)

$$ P(-1 < Z < 1) = P(Z < 1) - P(Z < -1) = 0.841 - 0.159 = 0.682 $$

I’ll let you guys do the shading. It looks very similar to the shading in example 3 above.

c) We need to find the z-value that corresponds to P=0.04. Use your GDC or table to find the inverse normal distribution. This is z=−1.75

Then using the definition of a z-value:

(16)

$$ -1.75 = \frac{x - 80}{8} $$

From this x=66.


Source:

http://ibmathstuff.wikidot.com/usingnormaldistributions