Wittgenstein's Tractatus Logico-Philosophicus stands as one of the most structurally distinctive works in philosophy. Through a comprehensive computational linguistic analysis (the full methodology is described at the end), this study examines how the work's structural and linguistic features align with its philosophical content, and identifies patterns that support and exemplify Wittgenstein's arguments about language and logic.
I have written this essay as the final project for the Linguistics course of the Humanities Bachelor's Degree, taught by M. J. Castellà, to help explain why the content of Wittgenstein's Tractatus is so closely related to its form (its "container").
Instructions for the project:
- The work must be prepared using Claude or ChatGPT (free access version) as an assistant, must address some of the topics of the course, and must be an essay or, at least, argumentative in nature. It must answer a question formulated by the student, which is then refined through a series of dialogues with ChatGPT and the consultation of classic bibliographic resources, until the resulting text sufficiently satisfies the student's curiosity. The final text may incorporate passages produced by the AI, passages written by the student, and properly cited passages from other works.
- From here on, the passages written by the LLM are set in italics; my own contributions are in regular type.
Research question
Is there any relation between the content of Wittgenstein's work and the syntactic formulation of the statements that compose it? What tendencies does Wittgenstein show in his own use of language, and how do they correlate with the content?
The analysis employs computational tools to examine both lexical and grammatical patterns across the text's hierarchical proposition levels. This includes assessment of lemma distribution, type-token ratios, syntactic complexity, and parts of speech distribution, providing quantitative insights into the text's linguistic architecture.
Before starting, it is worth briefly defining these terms. A lemma is the base or canonical form of a word, the form under which it appears in a dictionary: it carries no verbal, plural, or gender inflection.
A token is each individual occurrence of a word in a text, regardless of its form. For example, the phrase "the cat and the black cats" contains six tokens, because every word counts as one, but only four lemmas, because "the" is repeated and "cats" reduces to "cat":
- Tokens (6): the, cat, and, the, black, cats
- Lemmas (4): the, cat, and, black
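The difference between tokens, types (distinct word forms) and lemmas, together with the type-token ratio mentioned earlier, can be illustrated with a short Python sketch. It is not the tool used in this analysis: a real study would rely on a trained lemmatizer, whereas here the hypothetical `TOY_LEMMAS` table only knows that "cats" reduces to "cat", and tokenization is a naive split on whitespace.

```python
# Hypothetical toy lemma table standing in for a real lemmatizer.
TOY_LEMMAS = {"cats": "cat"}

def lemmatize(token: str) -> str:
    """Return the lemma of a token, falling back to the lowercased token itself."""
    token = token.lower()
    return TOY_LEMMAS.get(token, token)

def lexical_counts(text: str) -> dict:
    """Count tokens, types and lemmas, and compute the type-token ratio."""
    tokens = text.lower().split()              # naive whitespace tokenization
    types = set(tokens)                        # distinct word forms
    lemmas = {lemmatize(t) for t in tokens}    # distinct base forms
    return {
        "tokens": len(tokens),
        "types": len(types),
        "lemmas": len(lemmas),
        "type_token_ratio": len(types) / len(tokens),
    }

print(lexical_counts("the cat and the black cats"))
```

For this phrase the sketch reports 6 tokens, 5 types (the, cat, and, black, cats) and 4 lemmas, so the type-token ratio is 5/6.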
The depth of the syntactic tree measures how many levels of syntactic dependency a sentence contains. In a syntax tree, each word (token) is connected to a "head", the word on which it syntactically depends. Take the sentence "The cat eats fresh fish":
- eats (root)
- cat → eats
- fish → eats
- The → cat
- fresh → fish
"Eats" is the root (depth 0). "Cat" and "fish" depend directly on "eats" (depth 1), while "The" depends on "cat" and "fresh" depends on "fish" (depth 2).
For each word, we count how many "jumps" it takes to reach the root; averaging these counts over all the words gives the depth of the sentence. A higher average depth indicates more syntactically complex sentences, while a lower one indicates simpler or more linear structures.
The subordination rate is the proportion of subordinate clauses relative to the total number of clauses. It is an important indicator of a text's syntactic complexity, writing style, and formality. In the case of the Tractatus, it can help us understand Wittgenstein's argumentative structure.
With these terms defined, we can turn to the results of the analysis.
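Before turning to the results, the subordination rate can be sketched in Python, assuming each sentence has already been parsed into dependency labels. The hypothetical `SUBORDINATE_DEPS` set uses Universal Dependencies clausal relations as a stand-in for whatever criterion the actual analysis applied; each sentence is assumed to contain a single main clause (coordinated clauses are ignored); and the label sequences for propositions 1 and 1.1, in English translation, are illustrative hand annotations.

```python
# Dependency labels treated as heading a subordinate clause. This set is an
# assumption (Universal Dependencies clausal relations), not the essay's criterion.
SUBORDINATE_DEPS = {"advcl", "ccomp", "csubj", "acl", "acl:relcl"}

def subordination_rate(sentences: list[list[str]]) -> float:
    """Subordinate clauses divided by all clauses.

    Each sentence is a list of dependency labels, one per token. Every sentence
    is assumed to have exactly one main clause; coordinated clauses are ignored.
    """
    subordinate = sum(
        1 for labels in sentences for label in labels if label in SUBORDINATE_DEPS
    )
    total_clauses = len(sentences) + subordinate  # main clauses + subordinate ones
    return subordinate / total_clauses if total_clauses else 0.0

# Hand-annotated labels (illustrative) for two propositions in English translation.
# 1: "The world is everything that is the case."
prop_1 = ["det", "nsubj", "cop", "root", "nsubj", "cop", "det", "acl:relcl", "punct"]
# 1.1: "The world is the totality of facts, not of things."
prop_1_1 = ["det", "nsubj", "cop", "det", "root", "case", "nmod",
            "punct", "advmod", "case", "nmod", "punct"]

print(round(subordination_rate([prop_1, prop_1_1]), 2))
```

Proposition 1 contributes one relative clause ("that is the case") and proposition 1.1 none, so the rate is 1 subordinate clause out of 3 clauses, about 0.33.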
The structural analysis reveals a carefully crafted hierarchical organization that appears far from arbitrary. Beginning with seven fundamental propositions at level 0, which contain 72 tokens and have an average syntactic depth of 1.38, the text expands through the middle levels before returning to seven propositions at its final level.
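The essay does not spell out how propositions are assigned to levels. One reading consistent with the figures above (propositions 1 to 7 forming level 0) is that the level equals the number of digits after the decimal point in Wittgenstein's numbering. Under that assumption, the grouping can be sketched as follows; the sample numbers are genuine Tractatus propositions used only as input.

```python
from collections import Counter

def level(number: str) -> int:
    """Hierarchical level of a proposition: the count of digits after the decimal point.

    This mapping is an assumption; it places propositions 1 to 7 at level 0.
    """
    _, _, decimals = number.partition(".")
    return len(decimals)

# A few genuine proposition numbers, used here only as sample input.
sample = ["1", "1.1", "1.11", "2.0121", "2.02331", "4.0031", "7"]
for number in sample:
    print(number, "-> level", level(number))

# Applied to the full list of numbers, a Counter yields propositions per level.
print(Counter(level(n) for n in sample))
```

Here "1" and "7" fall at level 0, "1.11" at level 2 and "2.02331" at level 5.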
Level 1 expands to 25 propositions while maintaining similar syntactic depth (1.42), primarily serving to define and clarify the fundamental propositions. Level 2 marks a significant expansion with 120 propositions and the highest syntactic depth (1.52), suggesting increased complexity in argumentation. The text reaches its maximum extension at level 3 with 242 propositions and 1,944 tokens, though with slightly decreased syntactic depth (1.38). Level 4 contracts to 117 propositions while showing the highest subordination ratio (0.20), indicating complex syntactic structures despite reduced depth.
Finally, level 5 returns to seven propositions with minimal syntactic depth (1.14), creating a symmetrical structure that mirrors Wittgenstein's views on language's limits.
Lexical distribution patterns reflect the text's philosophical aims. Nouns and determiners dominate, together comprising 45.5% of all tokens, while verbs account for a notably low 8.6%. This distribution suggests a style focused on description and definition rather than action or narrative. The high proportion of prepositions (13.6%) indicates complex syntactic relationships, while the limited use of adjectives (5.9%), pronouns (5.5%), and adverbs (5.1%) points to a preference for direct, unembellished expression. The remaining 15.8% corresponds to punctuation, which I have also included in the analysis.
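These part-of-speech figures are relative frequencies over all tokens, punctuation included. A minimal Python sketch of the calculation, using hand-assigned Universal Dependencies tags for proposition 1 as illustrative input rather than real parser output:

```python
from collections import Counter

def pos_distribution(tags: list[str]) -> dict[str, float]:
    """Percentage of tokens carrying each part-of-speech tag, punctuation included."""
    counts = Counter(tags)
    total = len(tags)
    return {tag: round(100 * n / total, 1) for tag, n in counts.most_common()}

# Hand-assigned Universal Dependencies tags for proposition 1,
# "The world is everything that is the case." (illustrative, not parser output)
tags = ["DET", "NOUN", "AUX", "PRON", "PRON", "AUX", "DET", "NOUN", "PUNCT"]
distribution = pos_distribution(tags)
print(distribution)
print("nouns + determiners:", round(distribution["NOUN"] + distribution["DET"], 1))
```

In this single proposition, DET, NOUN, AUX and PRON each account for 22.2% of the tokens and punctuation for 11.1%.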
The text's lexical density shows an interesting evolution across levels. Early sections display higher lemma counts, with levels 0 and 1 averaging 3.14 and 3.4 lemmas per proposition respectively. This count decreases towards level 5, reaching 2.28 lemmas per proposition, while lexical density paradoxically increases to its maximum value of 1.0. This pattern suggests a progression from elaborate explanation to increasingly precise, economical expression.
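The essay does not give the formulas behind "lemmas per proposition" and "lexical density". The Python sketch below shows one possible operationalisation under stated assumptions: propositions arrive as (lemma, tag) pairs, punctuation is excluded, lemmas are counted as distinct lemmas within each proposition, and lexical density is the share of content words (nouns, proper nouns, verbs, adjectives, adverbs), a common textbook definition that may not match the one used to obtain the values above. The tagged propositions are hypothetical hand annotations.

```python
# Content-word tags for lexical density (one common definition; an assumption here).
CONTENT_TAGS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}

Tagged = list[tuple[str, str]]  # one proposition as (lemma, UD tag) pairs

def profile(propositions: list[Tagged]) -> tuple[float, float]:
    """Return (average distinct lemmas per proposition, lexical density)."""
    # Drop punctuation before counting (an assumption of this sketch).
    words = [[(lemma, tag) for lemma, tag in p if tag != "PUNCT"] for p in propositions]
    lemmas_per_prop = sum(len({lemma for lemma, _ in p}) for p in words) / len(words)
    tags = [tag for p in words for _, tag in p]
    density = sum(tag in CONTENT_TAGS for tag in tags) / len(tags)
    return lemmas_per_prop, density

# Hand-annotated propositions 1 and 1.1 (illustrative only).
prop_1 = [("the", "DET"), ("world", "NOUN"), ("be", "AUX"), ("everything", "PRON"),
          ("that", "PRON"), ("be", "AUX"), ("the", "DET"), ("case", "NOUN"), (".", "PUNCT")]
prop_1_1 = [("the", "DET"), ("world", "NOUN"), ("be", "AUX"), ("the", "DET"),
            ("totality", "NOUN"), ("of", "ADP"), ("fact", "NOUN"), (",", "PUNCT"),
            ("not", "PART"), ("of", "ADP"), ("thing", "NOUN"), (".", "PUNCT")]

lemmas_per_prop, density = profile([prop_1, prop_1_1])
print(lemmas_per_prop, round(density, 2))
```

For these two propositions the sketch gives 7.0 distinct lemmas per proposition and a density of 6/18, about 0.33.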