- In “Empty Labels” and “Replace the Symbol with the Substance”, we learned that it’s often more useful to replace words with the concepts that they represent
- So why do we bother with defining words at all?
- It’s useful to have short words for commonly defined concepts
- In an efficient code, the length of a word tracks the probability of the concept it represents: the more frequent the concept, the shorter its word
- The codeword for the combination Y1Z2 will be just as long as the codewords for Y1 and Z2 put together, unless P(Y1Z2) > P(Y1)P(Z2), as in the sketch below
- That inequality holds exactly when knowing one property of an object lets us infer something about another property
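- A minimal sketch of that code-length arithmetic, with made-up probabilities (per Shannon, an event with probability p gets a codeword about -log2(p) bits long):

```python
import math

def code_length(p: float) -> float:
    """Optimal codeword length, in bits, for an event with probability p."""
    return -math.log2(p)

p_y1, p_z2 = 0.25, 0.25

# Independent attributes: P(Y1Z2) = P(Y1) * P(Z2), so the joint codeword
# is exactly as long as the two separate codewords combined.
print(code_length(p_y1 * p_z2))               # 4.0 bits
print(code_length(p_y1) + code_length(p_z2))  # 2.0 + 2.0 = 4.0 bits

# Correlated attributes: P(Y1Z2) > P(Y1) * P(Z2), so the combination earns
# a codeword shorter than the two separate codewords; this is the only
# case where coining a short word for the pair pays off.
p_joint_correlated = 0.2  # > 0.25 * 0.25 = 0.0625
print(code_length(p_joint_correlated))        # ~2.32 bits
```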
- We don’t replace the word “human” with the definition for “human” because the definition is huge
- However, we’re only able to do this because all of the attributes of humanity are correlated with one another
- “Human” is a relatively dense cluster in thingspace
- The act of defining a word is an implicit promise that the word will help you make inferences
- If you define a word to describe a combination of things that are no more likely than chance to be found together, then the word you’ve defined is a false promise, since it doesn’t help you make any inferences
- Use words to describe clusters of characteristics that are correlated with one another
- As vast as thingspace is, it’s still much smaller than conceptspace
- What is conceptspace?
- A concept is a rule that includes or excludes examples
- For example, given the labeled examples {2: yes, 3: no, 14: yes, 23: no, 8: yes, 9: no}, you might guess that the rule is “even numbers: yes”, as in the sketch below
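- A concept can be written down literally as a predicate; here is a minimal sketch checking the guessed rule against those labeled examples:

```python
# The labeled examples from the bullet above; True = "yes", False = "no".
labeled_examples = {2: True, 3: False, 14: True, 23: False, 8: True, 9: False}

def guessed_concept(n: int) -> bool:
    """The guessed rule: even numbers are in the concept."""
    return n % 2 == 0

# The guess is consistent with every labeled example, though many other
# rules (e.g. "even numbers below 100") would also fit these six points.
print(all(guessed_concept(n) == label for n, label in labeled_examples.items()))  # True
```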
- The number of possible concepts grows superexponentially with the number of variables describing the objects you’re interested in: n binary attributes give 2^n possible objects, and therefore 2^(2^n) possible concepts (worked out below)
- So, in order to build rules that we can draw inferences from in a reasonable amount of time with a reasonable amount of data, we need to draw simple boundaries around clusters in thingspace
- These boundaries are expressed as intensional definitions
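- To make that growth rate concrete, a quick back-of-the-envelope computation (a concept here is any subset of the possible objects):

```python
# With n binary attributes there are 2**n distinguishable objects, and a
# concept (a rule that includes or excludes objects) is any subset of
# those objects, so there are 2**(2**n) possible concepts.
for n in range(1, 6):
    objects = 2 ** n
    concepts = 2 ** objects
    print(f"n={n}: {objects:>2} objects, {concepts:,} concepts")

# n=5 already yields 2**32 = 4,294,967,296 concepts; by n=9 the count
# (2**512) dwarfs the number of atoms in the observable universe.
```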
- Singling out any particular concept for consideration is something that has to be done carefully, given the superexponential size of conceptspace
- If you use the right kind of neural network, its inferences are exactly those of naive Bayesian reasoning (sketched below)
- Algorithms that are “scruffy” or have “emergent properties” often map to variants of Bayesian reasoning
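- A minimal sketch of that naive Bayes computation (the features, likelihoods, and prior are all invented for illustration, and this is not the exact network from the original post): evidence for a class accumulates additively in log-odds, which is the same additive form a simple activation-summing network computes

```python
import math

# Evidence for a class accumulates additively in log-odds:
#   log-odds(C | features) = log-odds(C) + sum_i log[ P(f_i | C) / P(f_i | not C) ]
# This additive evidence-summing is what makes the correspondence to a
# simple activation-spreading network possible.
prior_log_odds = math.log(0.5 / 0.5)  # even prior: log(1) = 0

# Hypothetical likelihoods: P(feature | human) vs P(feature | not human).
likelihoods = {
    "speaks_language": (0.95, 0.01),
    "two_legs":        (0.99, 0.05),
    "mortal":          (0.999, 0.90),
}

log_odds = prior_log_odds
for feature, (p_if_human, p_if_not) in likelihoods.items():
    log_odds += math.log(p_if_human / p_if_not)

posterior = 1 / (1 + math.exp(-log_odds))  # sigmoid converts log-odds back
print(f"P(human | observed features) = {posterior:.4f}")
```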
- Part of the reason that people get into trouble using words is that they don’t understand how much complexity words hide
- Words are merely tags that are assigned to enormously complex concepts
- Comparing a word to a concept is like comparing the handle of a paintbrush to the painting created by that brush
- When we use words, we’re often sneaking in a reference to ourselves as part of the concepts we’re referring to with those words
- For example, if a person says that the grocery store is on the left side of the street, they’re implicitly indicating which direction they’ll be traveling
- This phenomenon is known as “speaker deixis”
- If we encounter two concepts for which we have the same name (perhaps because of speaker deixis), it appears to us that reality is changeable – that changing the definition of words changes what we see
- However, what we’re actually encountering is a question with a hidden variable: the meaning of the question depends on which concepts our words point to
- When we change which concepts our words point to, by changing our words’ definitions, it appears that the answer to the question shifts, when in reality, we’ve asked a different question