iSoul In the beginning is reality

# Approaching the unknown

We have some knowledge, but it is not complete, nor even arguably close to complete. So what should we do about the areas where knowledge is lacking? We should certainly continue to investigate. But what do we say in the meantime? What can we justify saying about the unknown side of partial knowledge?

There are three basic approaches to the unknown: (a) assume as little as possible about the unknown and project that onto the unknown; (b) assume the unknown is exactly like the known and project the known onto the unknown; or (c) assume the unknown is like what is common or typical of the known and project that onto the unknown.

Approach (a) uses the principle of indifference, maximum entropy, and a modest estimate of how much the known tells us about the unknown. It takes a very cautious, anything-can-happen approach as the safest way to go.

Approach (b) uses the principle of the uniformity of nature, minimum entropy, and a confident estimate of how much the known tells us about the unknown. It takes an intrepid, assertive approach as the most knowledgeable way to go.

Approach (c) uses the law of large numbers, the central limit theorem, the normal distribution, averages, and a moderate estimate of how much the known tells us about the unknown. It takes a middle way between overly cautious and overly confident approaches as the best way to go.

The three approaches are not mutually exclusive. All three may use the law of large numbers, the normal distribution, and averages. They all may sometimes use the principle of indifference or the uniformity of nature. So calling these three different approaches is a generalization about the direction that each one takes, knowing that their paths may cross or converge on occasion.

It is also more accurate to say there is a spectrum of approaches, with approaches (a) and (b) at the extremes and approach (c) in the middle. This corresponds to a spectrum of distributions with extremes of low and high variability and the normal distribution in the middle.

This suggests there is a statistic of a distribution that varies from, say, -1 at the extreme of low variability to +1 at the extreme of high variability, and that is zero for the normal distribution. So it would be a measure of normality, too. The inverse of the variability or standard deviation might do.

Compare the three approaches with an input-output exercise. In each step, answers 1, 2, and 3 correspond to approaches (a), (b), and (c):

1. Given input 0 with output 10, what is the output for input 1?
   1. It could be anything.
   2. The same as for input 0, namely 10.
   3. The mean of the outputs, namely 10.
2. Also given input 1 with output 12, what is the output for input 2?
   1. It still could be anything.
   2. The linear extrapolation of the two points (10 + 2n), namely 14.
   3. The mean of the outputs, namely 11.
3. Also given input 2 with output 18, what is the output for input 3?
   1. It still could be anything.
   2. The quadratic extrapolation of the three points (10 + 2n^2), namely 28.
   3. The mean of the outputs, namely 40/3.
4. Now start over, but with the additional information that the outputs are integers from 1 to 100.
   1. The values 1 to 100 are equally likely.
   2. The values 1 to 100 are equally likely.
   3. The values 1 to 100 are equally likely.
5. Given input 0 with output 0, what is the output for input 1?
   1. Bayesian updating.
   2. The same as for input 0, namely 0.
   3. The mean of the outputs, namely 0.
6. Also given input 1 with output 5, what is the output for input 2?
   1. Bayesian updating.
   2. The linear extrapolation of the two points (5n), namely 10.
   3. The mean of the outputs, namely 2.5, so 2 and 3 are equally likely.
7. Also given input 2 with output 9, what is the output for input 3?
   1. Bayesian updating.
   2. Since there are limits, extrapolate a logistic curve (-15 + 30 * 2^n / (1 + 2^n)), namely 35/3, rounded to 12.
   3. The mean of the outputs, namely 14/3, rounded to 5.
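The arithmetic behind approaches (b) and (c) in the exercise can be checked with a short Python sketch. The function names here are illustrative, not part of the original exercise, and the sketch assumes that "extrapolation" in approach (b) means the lowest-degree polynomial passing exactly through all the points given so far. Under that assumption, the quadratic through (0, 10), (1, 12), and (2, 18) is 10 + 2n^2, which gives 28 at input 3.

```python
from fractions import Fraction

def polynomial_extrapolate(points, x):
    """Evaluate at x the unique polynomial of degree len(points) - 1
    that passes through every (input, output) pair (Lagrange form)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

def mean_of_outputs(points):
    """Approach (c): predict the average of the outputs seen so far."""
    return Fraction(sum(y for _, y in points), len(points))

def logistic(n):
    """The bounded curve used in step 7: -15 + 30 * 2^n / (1 + 2^n)."""
    return -15 + Fraction(30 * 2**n, 1 + 2**n)

# Steps 1 to 3: reveal one data point at a time and predict the next output.
data = [(0, 10), (1, 12), (2, 18)]
for k in range(1, len(data) + 1):
    known = data[:k]
    b = polynomial_extrapolate(known, k)  # approach (b)
    c = mean_of_outputs(known)            # approach (c)
    print(f"input {k}: (b) predicts {b}, (c) predicts {c}")

# Step 7, approach (b): the logistic curve fits the three bounded points
# (0, 0), (1, 5), (2, 9) and is then evaluated at input 3.
print("logistic at 3:", logistic(3), "rounded to", round(logistic(3)))
```

Approach (a) is left out on purpose: "could be anything" makes no point prediction to compute.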

2008

# Uniqueness and uniformity

If all things were completely unique, we would have no way of identifying what they are. If all things were completely identical, or uniform, we would have no way of distinguishing them. We conclude that the world is somewhere in between: everything is a combination of the unique and the uniform.

If all events were completely independent, or unrelated, we would have no way of identifying what they are. If all events were completely identical, we would have no way of distinguishing them. We conclude that all events are a combination of the independent and the identical.

So it is not possible to have two completely unique or identical individuals. Nor is it possible to have two completely unrelated or identical events.

In statistics, we assume the least about events we don’t know about: we assume they are independent and make the least possible inference. We assume we know nothing other than what we are given in the data. We take multiple trials and use the law of large numbers to infer safe conclusions. Or we adopt a maximum entropy prior distribution as a minimal assumption.

In natural science, we assume the most about things we don’t know about. This is based on an assumption of the uniformity of nature. The natural world that we don’t know is like the natural world we do know about. We assume that what we don’t know about is the same as what we do know about. That is, we assume everything we know is all we need to know – until we know more. Then we revise and make the same assumption.

If we begin natural science with no prior knowledge and pick up a rock, we conclude that everything is rock. If we then step in a puddle, we conclude that everything is a rock or a puddle. If we let go of the rock and it falls to the ground, we conclude that all rocks fall to the ground just like that rock.

In history, the less we assume about events we don’t know, the better. Events are assumed to be unique though somehow related to other events. Through historical study we infer the relation of events. So history is more like statistics than natural science.

Natural history takes the approach of natural science toward studying the past. It assumes that all events in the past are like events in the present. So the past and the present are alike and history is the repetition of similar events. This is an anti-historical approach to history because it ignores or downplays the uniqueness of events.

2008

# Science and statistics

Here is a story about two statisticians and two scientists. They are given a problem: what are the frequencies of the letters in English texts? The junior statistician has no research budget whereas the senior statistician has a modest research budget. Similarly, the junior scientist has no research budget but the senior scientist has a large research budget.

The junior statistician has no budget to collect frequency data and, being a careful statistician, makes no assumptions about what is unknown. So the conclusion is that each letter has a frequency of 1/26. A note is added that if funds were available, a better estimate could be produced.

The senior statistician has a modest budget and so arranges to collect a random sample of English texts. Since English is an international language, a sample of countries is selected at random. Written English goes back about 500 years so a sample of years is selected at random. A list of genres is made and a sample of genres is selected at random. Then libraries and other sources are contacted to collect sample texts. They are scanned and analyzed for their letter frequencies. The letter frequencies of the sample are used as the unbiased estimate of the population frequencies. Statistical measures of uncertainty are also presented.
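The difference between the two statisticians can be sketched in a few lines of Python. This is only an illustration: the function name, the sample sentence, and the choice of a binomial standard error are assumptions made here, and the standard error treats every letter as an independent draw, which real text sampled by country, year, and genre would not satisfy.

```python
from collections import Counter
from math import sqrt
import string

def letter_frequencies(text):
    """Return {letter: (relative frequency, standard error)} for a to z.

    The standard error uses the simple binomial formula sqrt(p(1-p)/n),
    a simplifying assumption that ignores clustering within texts.
    """
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    n = len(letters)
    if n == 0:
        raise ValueError("text contains no letters a-z")
    counts = Counter(letters)
    result = {}
    for letter in string.ascii_lowercase:
        p = counts[letter] / n
        result[letter] = (p, sqrt(p * (1 - p) / n))
    return result

# Junior statistician: no data, so every letter gets the same frequency.
uniform_estimate = {letter: 1 / 26 for letter in string.ascii_lowercase}

# Senior statistician: estimate from a sample (here a toy stand-in text).
sample_text = "The quick brown fox jumps over the lazy dog"
estimates = letter_frequencies(sample_text)

for letter in "eo":
    p, se = estimates[letter]
    print(f"{letter}: uniform {uniform_estimate[letter]:.4f}, "
          f"sample {p:.4f} +/- {se:.4f}")
```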

The junior scientist has no budget to collect data but happens to own a CD with classic scientific texts. With a little programming, the letter frequencies on the CD are determined. These frequencies are presented as the frequencies of all English texts. No measures of uncertainty are included. It is simply assumed that English texts are uniform so any sample is as good as another. However, a caveat is made that the conclusion is subject to revision based on future data collection.

The senior scientist has a large research budget from a government grant. Arrangements are made to collect a massive amount of data from electronic sources such as the Internet and several large libraries. The written texts are scanned and combined with the electronic sources into a large database and then the letter frequencies are determined. These frequencies are announced as the letter frequencies for all English texts. No measure of uncertainty is included. It is not mentioned that future data collection could lead to a revised conclusion.

The senior scientist collects a prize for successfully completing the project. The others are forgotten.

Who had the best approach? Why aren’t scientists and statisticians more alike?

2008

# Evidence of Absence

Evidence of Absence: Completeness of Evidential Datasets

Elliott Sober presents a likelihood argument about the motto “Absence of evidence is not evidence of absence” (Sober 2009). He states the Law of Likelihood this way:

The Law of Likelihood. Evidence E favors hypothesis H1 over hypothesis H2 precisely when Pr(E | H1) > Pr(E | H2). And the degree to which E favors H1 over H2 is measured by the likelihood ratio Pr(E | H1)/Pr(E | H2).
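As a minimal numerical illustration of the law, consider a hypothetical coin-tossing case that is not taken from Sober: the evidence E is 7 heads in 10 tosses, H1 says the probability of heads is 0.7, and H2 says it is 0.5. The function names and numbers below are assumptions chosen only to show how the ratio is computed.

```python
from math import comb

def binomial_likelihood(heads, tosses, p):
    """Pr(E | H) where E is `heads` successes in `tosses` trials
    and hypothesis H fixes the success probability at p."""
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

def likelihood_ratio(heads, tosses, p1, p2):
    """Pr(E | H1) / Pr(E | H2) for two hypothesized success probabilities."""
    return (binomial_likelihood(heads, tosses, p1)
            / binomial_likelihood(heads, tosses, p2))

# Hypothetical evidence and hypotheses (assumptions for illustration only).
ratio = likelihood_ratio(heads=7, tosses=10, p1=0.7, p2=0.5)

if ratio > 1:
    print(f"E favors H1 over H2, likelihood ratio {ratio:.3f}")
elif ratio < 1:
    print(f"E favors H2 over H1, likelihood ratio {ratio:.3f}")
else:
    print("E does not discriminate between H1 and H2")
```

A ratio above 1 means the evidence favors H1, a ratio below 1 means it favors H2, and a ratio of exactly 1 means the evidence does not discriminate between the two.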