StatTalk.Org: On probability distribution

The probability distribution is central to almost everything in statistics. Many statistical terms are hard to understand the first time you hear them. As time passes, you may pretend to have understood those mysterious words, yet they still may not resonate with you. I can reasonably assume that many people cannot visualize such things on the first attempt. Probability distribution is one of those terms that you will hear almost all the time but will probably find somewhat hard to grasp.

I had this difficulty at one time and figured out that, to convince yourself of what this term means, you need to think outside statistics. That is, you need to be able to explain it without any statistical jargon. I can imagine that a good number of students still wonder, even at the end of their first-year courses, what a probability distribution is all about. Therefore, I am trying to illustrate it in a somewhat informal way. This article should not be taken as a rigorous treatment; it is only meant to help you convince yourself of something that is not very easy to understand if you are thinking statistically.

Probability

Before going on to the concept of probability distribution, it is useful to define probability first. The concept of probability is intended to provide a numerical measure of how likely an event is to occur. Probability is measured on a scale from 0 to 1. At the extremes of this scale, a probability of zero indicates that the event will not occur, whereas a probability of one indicates that the event is certain to occur. In probability theory, an event is not just something that "happens"; it is tied to an experiment whose outcome cannot be predicted with certainty, although all of its possible outcomes can be listed in advance. An example will clarify these notions. If you roll a die (which has 6 sides), you can list all the possible outcomes (it can show 1, 2, 3, 4, 5 or 6), but you cannot say with certainty that a 4 will show up. Here we are assuming that the die is "fair", so that each face is equally likely. If the die is "unfair", meaning that it tends to show, say, 4 more often than the other numbers, then you can bet on 4 with more confidence.
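The die example can be written down as a small table of outcomes and their probabilities. The sketch below is illustrative only: the weights of the loaded die are made-up numbers, chosen so that 4 shows up more often, and do not describe any real die.

```python
from fractions import Fraction

# Fair die: each of the six faces is equally likely.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

# Hypothetical loaded die: face 4 gets half of the probability,
# the other five faces share the rest equally (assumed weights).
loaded_die = {face: Fraction(1, 10) for face in range(1, 7)}
loaded_die[4] = Fraction(1, 2)

for name, die in (("fair", fair_die), ("loaded", loaded_die)):
    # Every probability lies between 0 and 1, and together they sum to 1.
    assert all(0 <= p <= 1 for p in die.values())
    assert sum(die.values()) == 1
    print(name, "P(4) =", die[4])
```

Exact fractions are used instead of floating-point numbers so that the check that the probabilities add up to 1 holds exactly.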

Let's take another example. Suppose Mohammad Ashraful and Ricky Ponting are out in the middle of the cricket ground for the toss. The match referee has a biased coin that tends to show Head (tiger head, of course) more often than Tail. Fortunately, Ashraful happens to know about this bias in advance. Now, if it is Ashraful's call, he will certainly say "Head" rather than "Tail", simply because a Head is more likely to occur. That is to say, a Head has a higher probability of occurring than a Tail.

The above two examples (rolling a die and tossing a coin), if performed repeatedly, constitute random experiments. Sometimes these experiments are called "conceptual experiments" because we do not perform them in reality. However, many real-life situations can be described or understood better using the same mechanism that underlies the coin-tossing and die-throwing experiments.

Think outside statistics

So, having said all of the above, we would like to know the following:

1. what is the probability that 70% of the students will score more than 90% in the final exam?
2. what is the probability that a train will be late by more than 30 minutes?
3. what is the probability that a faulty motorcycle will start after the 4th kick?
4. what is the probability that there will be 10 people in the queue at a given time when you arrive at the bank counter?

Certainly, I am not going to answer the above questions. They are raised in order to understand the concept of probability distribution better. Let's take the 3rd one, the faulty motorcycle example, and consider the following:

1. there is a probability that the motorcycle will start on the first attempt
2. there is a probability that the motorcycle will start on the second attempt
3. there is a probability that the motorcycle will start on the third attempt, and so on...
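One simple way to put numbers on this list is to assume that every kick is independent and starts the motorcycle with the same probability p. Both that assumption and the value p = 0.3 below are mine, purely for illustration; under them, the chance of a first start on attempt k is (1 - p)^(k - 1) * p.

```python
def prob_first_start_on(k: int, p: float) -> float:
    """Probability that the first successful start happens on attempt k,
    assuming independent attempts that each succeed with probability p."""
    if k < 1:
        raise ValueError("attempt number must be 1 or greater")
    return (1 - p) ** (k - 1) * p

p = 0.3  # hypothetical chance that a single kick starts the engine

for k in range(1, 6):
    print(f"attempt {k}: {prob_first_start_on(k, p):.4f}")

# Probability that more than 4 kicks are needed (no start in the first 4).
prob_after_fourth = (1 - p) ** 4
print(f"needs more than 4 kicks: {prob_after_fourth:.4f}")
```

A real faulty motorcycle may well behave differently (for instance, the engine may warm up with each kick), in which case this simple rule would not be a good description.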

Here comes the concept of probability distribution. By distribution, we mean "a set of numbers and their frequency of occurrence collected from measurements over a statistical population" (see answers.com). However, this definition is not very intuitive. So let's forget about statistics for the time being and think about the following scenario.

A child is playing in her play area. She has all her toys in a box. Suppose there are 100 toys in the box. Let us assume that the child has no preference for any particular toy over the others. She picks one toy at a time, plays with it for some time and tosses it aside. When she has finished playing with all her toys in this way, there will be none left in the box. All the toys will be scattered around the room. If you attach a "probability" to each toy, you can think of the toys as "probabilities", as if probability were a physical thing. The way the toys end up at different points (places) in the room can then be thought of as the distribution of toys in the room or, in other words, the distribution of "probabilities" in the room.

In light of the above example, we can say that a probability distribution describes how the probabilities are scattered over a region or area. In the child's example, the distribution of the toys (or the distribution of the "probabilities") is how the toys are scattered around the room.

Models, reality, replication

Now, how do we calculate the probabilities in the questions above? We can do it by performing the same experiment over and over again, recording the outcomes, and calculating the proportion of times each outcome occurs. Indeed, this is the relative-frequency definition of probability, which is beyond the scope of this article. Needless to say, while some experiments can be repeated, others cannot be repeated over and over again. Refer to question 1 above and notice that it is not feasible to repeat this experiment in practice.
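For an experiment that can be repeated, the repeat-and-count idea is easy to try on a computer. The sketch below simulates rolls of a fair die with Python's random module and reports how often a 4 appears; the seed and the numbers of rolls are arbitrary choices made for illustration.

```python
import random

def relative_frequency_of_four(n_rolls: int, rng: random.Random) -> float:
    """Roll a simulated fair die n_rolls times and return the share of 4s."""
    fours = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 4)
    return fours / n_rolls

rng = random.Random(2024)  # fixed seed so repeated runs give the same numbers

for n_rolls in (60, 600, 60_000):
    estimate = relative_frequency_of_four(n_rolls, rng)
    print(f"{n_rolls:>6} rolls: estimated P(4) = {estimate:.4f}")
```

With more rolls the estimate tends to settle near 1/6 (about 0.1667), although any single run can still be off by a little.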

Therefore, we need mathematical equations that can mimic a real-life situation as closely as possible. We call such equations "models". They are so called because they are not an exact representation of the real situation; rather, they only mimic the characteristics of a real situation under certain restrictions (or conditions).

The distribution of toys in the child's example is very likely to be different from the distribution of students' marks in the final examination. Likewise, the distribution of probabilities in the faulty motorcycle example will differ from that of the queueing problem. Therefore, different equations are needed to model (or mimic) different real-life situations, because different situations have different types of distributions for the probabilities. Hence we have different probability models. Bernoulli, binomial, Poisson, normal and gamma are some of the well-known probability models. Each mimics the real-life situations for which it is a good fit.
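As a rough illustration of what one of these models looks like in practice, the sketch below applies a Poisson model to the bank queue question. It assumes, hypothetically, that the number of people in the queue follows a Poisson distribution with a mean of 6; neither the choice of model nor the mean comes from real bank data.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of observing exactly k under a Poisson model with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

mean_queue_length = 6.0  # assumed average number of people in the queue

print(f"P(exactly 10 in queue) = {poisson_pmf(10, mean_queue_length):.4f}")

# The probabilities over all possible queue lengths add up to 1
# (summing the first 100 terms is enough at this mean).
total = sum(poisson_pmf(k, mean_queue_length) for k in range(100))
print(f"sum of probabilities  = {total:.6f}")
```

Changing the assumed mean, or swapping in a different model altogether, would give a different answer to the same question, which is why choosing a model that fits the situation matters.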
