Mutual Information, and Density in Thingspace

  • In Empty Labels and Replace The Symbol With The Substance, we learned that it’s often more useful to replace words with the concepts that they represent
  • So why do we bother with defining words at all?
  • It’s useful to have short words for commonly used concepts
  • As in an efficient code, the length of a word should correspond to the probability of the concept it represents: frequent concepts get short words
    • In an efficient code, the code for the combination Y1Z2 will be just as long as the sum of the separate codes for Y1 and Z2, unless P(Y1Z2) > P(Y1)P(Z2), i.e. unless the two properties carry mutual information (see the sketch after this list)
    • This corresponds to the case where we can make an inference about some other property of an object by knowing a particular property
  • We don’t replace the word “human” with the definition for “human” because the definition is huge
  • However, we’re only able to do this because all of the attributes of humanity are correlated with one another
    • “Human” is a relatively dense cluster in thingspace
  • The act of defining a word is an implicit promise that the word will help you make inferences
  • If you define a word to describe a combination of things that are no more likely than chance to be found together, then the word you’ve defined is a false promise, since it doesn’t help you make any inferences
  • Use words to describe clusters of characteristics that are correlated with one another
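  • A minimal sketch of the coding argument above, in Python (the joint distribution and the choice of properties are invented for illustration): in an efficient code a concept’s word length is about −log2 of its probability, so the combined word for Y1Z2 is shorter than the two separate words exactly when P(Y1Z2) > P(Y1)P(Z2)

        import math

        # Hypothetical joint distribution over two binary properties Y and Z.
        # P(Y=1, Z=1) is deliberately higher than P(Y=1) * P(Z=1),
        # so the two properties are positively correlated.
        joint = {
            (1, 1): 0.40, (1, 0): 0.10,
            (0, 1): 0.10, (0, 0): 0.40,
        }

        p_y1 = sum(p for (y, z), p in joint.items() if y == 1)  # marginal P(Y=1)
        p_z1 = sum(p for (y, z), p in joint.items() if z == 1)  # marginal P(Z=1)
        p_y1z1 = joint[(1, 1)]                                  # joint P(Y=1, Z=1)

        def code_len(p):
            """Ideal code length, in bits, for an event with probability p."""
            return -math.log2(p)

        separate = code_len(p_y1) + code_len(p_z1)  # two separate words
        combined = code_len(p_y1z1)                 # one combined word

        print(f"P(Y1)P(Z1) = {p_y1 * p_z1:.2f}, P(Y1Z1) = {p_y1z1:.2f}")
        print(f"separate: {separate:.2f} bits, combined: {combined:.2f} bits")
        print(f"bits saved = {separate - combined:.2f}")  # the pointwise mutual information

  • For this made-up distribution the combined word saves about 0.68 bits, which is exactly log2(P(Y1Z1) / (P(Y1)P(Z1))); if the properties were independent, the savings would be zero and the combined word would earn no place in the language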

Superexponential Conceptspace and Simple Words

  • As vast as thingspace is, it’s still much smaller than conceptspace
  • What is conceptspace?
    • A concept is a rule that includes or excludes examples
    • For example, if you have a rule that goes: {2: yes, 3: no, 14: yes, 23: no, 8: yes, 9: no}, you might guess that the rule is “even numbers: yes”
  • The number of concepts grows faster than exponentially (combinatorially) with the number of variables describing the objects you’re interested in
  • So, in order to build rules that we can draw inferences from in a reasonable amount of time with a reasonable amount of data, we need to draw simple boundaries around clusters in thingspace
  • These boundaries are expressed as intensional definitions
  • Singling out any particular concept for consideration is something that has to be done carefully, given the superexponential size of conceptspace
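  • To make “faster than exponentially” concrete, a small sketch (framing objects as vectors of binary attributes is my illustration, not the essay’s): with n binary attributes there are 2^n possible objects, and a concept is any rule that says yes or no to each object, so there are 2^(2^n) possible concepts

        # Number of distinct concepts over n binary attributes:
        # each of the 2**n possible objects can independently be
        # included in or excluded from a concept, giving 2**(2**n) concepts.
        for n in range(1, 6):
            objects = 2 ** n
            concepts = 2 ** objects
            print(f"n={n}: {objects} objects, {concepts} concepts")

  • n=5 already yields 2^32 (about 4.3 billion) concepts, and by n=40 the count dwarfs the number of atoms in the observable universe; no learner can sift that space without a strong prior toward simple boundaries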

Conditional Independence and Naive Bayes

  • A neural network wired the right way (one whose structure assumes the features are conditionally independent given the category) makes exactly the same inferences as naive Bayesian reasoning (see the sketch after this list)
  • Algorithms that are “scruffy” or have “emergent properties” often map to variants of Bayesian reasoning
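  • A minimal sketch of that inference, in Python (the three features and all probabilities are made up for illustration): assuming the features are conditionally independent given the category, evidence accumulates as a sum of log-likelihood ratios, the same additive structure a simple one-layer network uses when it sums weighted inputs

        import math

        p_blegg = 0.5                          # prior P(blegg); P(rube) = 1 - p_blegg
        p_feat_given_blegg = [0.9, 0.8, 0.7]   # P(feature_i = 1 | blegg)
        p_feat_given_rube = [0.1, 0.3, 0.2]    # P(feature_i = 1 | rube)

        def blegg_log_odds(features):
            """Log-odds of blegg vs. rube, assuming each feature is
            conditionally independent of the others given the category."""
            log_odds = math.log(p_blegg / (1 - p_blegg))  # prior log-odds
            for f, pb, pr in zip(features, p_feat_given_blegg, p_feat_given_rube):
                # Each observed feature contributes its log-likelihood ratio.
                log_odds += math.log(pb / pr) if f else math.log((1 - pb) / (1 - pr))
            return log_odds

        odds = blegg_log_odds([1, 1, 0])
        p = 1 / (1 + math.exp(-odds))  # squash log-odds into a probability
        print(f"P(blegg | features) = {p:.3f}")  # ~0.90 for this input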

Words As Mental Paintbrush Handles

  • Part of the reason that people get into trouble using words is that they don’t understand how much complexity words hide
  • Words are merely tags that are assigned to enormously complex concepts
  • Comparing a word to a concept is like comparing the handle of a paintbrush to the painting created by that brush

Variable Question Fallacies

  • When we use words, we’re often sneaking in a reference to ourselves as part of the concepts we’re referring to with those words
  • For example, if a person says that the grocery store is on the left side of the street, they’re implicitly indicating which direction they’ll be traveling
  • This phenomenon is known as “speaker deixis”
  • If we encounter two concepts for which we have the same name (perhaps because of speaker deixis), it appears to us that reality is changeable – that changing the definition of words changes what we see
  • However, what we’re actually encountering is a question with a hidden variable: the meaning of the question depends on which concept our words point to
  • When we change which concept a word points to by changing its definition, the answer to the question appears to shift, when in reality we’ve asked a different question