adding to intro. defining phonemes before diving into features

2024-02-08 21:11:17 -05:00
commit 09bff636e3
parent 4983b40f78
1 changed files with 79 additions and 24 deletions
--- a/in-progress/latl-primitives.scm
+++ b/in-progress/latl-primitives.scm
@@ -1,18 +1,41 @@
 #lang s-exp racket
 '(article
- (hgroup
-  (h1 "what must be true of latl primitives")
-  (p (em "what will latl need to ") (u "do") " out of the box?"))
- (p "at it's base latl is a tool for operating on languages, invented (or otherwise. maybe.) it helps then to think of the sorts of things that encompass language. spoken and signed languages will be the assumed base case language, as those are the things real human beings usually use to communicate with each other. the modality of a language needn't be important to the primitives used, but it's always a good practice to state assumptions")
- (section
-  (hgroup
-   (h2 "thinking about language for a moment"))
-  (p "by way of an opening query, i'm posing (to myself) the question: what does a human language need? naturally our base case spoken language has sounds and our base case signed language has gestures. for each of these cases, we have an articulatory mechanism (the vocal tract or the hand and arm in relation to the body) and a perceptual mechanism (the auditory system or the visual system.) i'm starting very basic with these assumptions, because decisions made here will have an outsized effect on the ultimate design. it is uncontroversial to suggest that there is a basic unit of a given language modality, a phoneme (historically, called a chereme in sign languages) which, when stung together, produce the basic units of meaning. (more on the basic units of meaning later)")
+  ((id "latl-primitives"))
+  ;; brief intro - restate the problem
  (section
+   ((id "intro"))
   (hgroup
-    (h3 "phonemes")
-    (p (em "what even are they?")))
-   (p "as just stated a phoneme is the basic articulatory and perceptual unit of a language. there's a fairly wide consensus amongst linguists that, despite being the minimal constituent needed to represent meaning in language, phonemes are ") (strong "not atomic.") " a phoneme can be decomposed into constituent features and minimal pairs of phonemes can be shown to be distinct only in their realization of one feature. by way of example, the [b] in the word 'shabby' and the [m] in the word 'shammy' differ only in that the [m] is pronounced with air passing through the nasal cavity. the feature [+/- nasal] is therefore taken to be a salient feature in english phonology")
+    (h1 "what must be true of latl primitives")
+    (p (em "what will latl need to ") (u "do") " out of the box?"))
+   (p "i've talked a little about "
+      (a ((href "/unsettled/1")) "conlanging and latl")
+      " previously here. the short version is this: making languages (for theoretical conscious beings) is fun! it's been a consistent hobby and artistic pursuit for me for much of my life. i've had different approaches from making extremely regular languages, to simulating the evolution of a family of spoken languages, to a family of synthesizer languages for a fractured machine society. every language project requires keeping track of a dictionary and a grammar (even if they never become more than quick sketches) and any sufficiently involved project can benefit from tools for generating new words that fit a language's 'phonotactics', generating derived words based on grammatical rules, and simulating language change over time. conlanging is a hobby with enough overlap with computation that some conlangers have created tools for some of these tasks. my own projects have become too ambitious to have my work live in spreadsheets over here and latex files over there and text files with the definitions i provide to web-based tools somewhere else entirely. i want to build on the work of those who came before and create a substrate upon which any tool a conlanger needs could be built and in which i can define the entirety of a language in one system.")
+   (p "at its base latl will be a tool for operating on languages, invented (or otherwise... maybe.) it helps then to think of the sorts of things that encompass language. spoken and signed languages will be the assumed base case language, as those are the things real human beings usually use to communicate with each other. the modality of a language needn't be important to the primitives used, but it's always a good practice to state assumptions"))
+  (section
+   ((id "contents"))
+   (ul (li (a ((href "#what-is-language")) "go to what-is-language"))
+       (li (a ((href "#what-conlangers-do")) "go to what-conlangers-do"))
+       (li (a ((href "#proposed-latl-primitives")) "go to proposed-latl-primitives"))
+       (li (a ((href "#signoff")) "go to signoff"))))
+  ;; article contains three main sections:
+  ;; 1. thinking about language
+  ;; 2. talking about conlangers
+  ;; 3. working through some ideas for primitives
+  ;; ---
+  ;; section 1. thinking about language
+  (section
+   ((id "what-is-language"))
+   (hgroup
+    (h2 "thinking about what language is for a moment"))
+   (p "by way of an opening query: what does a language need? naturally our base case spoken languages have sounds and our base case signed languages have gestures. for each of these, we have an articulatory mechanism: the vocal tract, or the hand and arm in relation to the body; and a perceptual mechanism: the auditory system, or the visual system. the unique thing about language among other forms of communication is how languages use time. it might seem basic, but it's easy to forget that 'my cat scratched the post' and 'the post scratched my cat' mean very different things. and the same sounds in a different sequence can become a smattering of ideas as in 'the my scratched cat post' or even lose meaning altogether as in 'cra catsm sde thymst opsh'. languages differ on how they use time--some languages allow freer word order than others, but relationships between articulations through time are essential to all known languages. for now, let's start with those articulatory bits! (more on time and on meaning later)")
+   ;; 1.1 phonemes
+   (section
+    ((id "phonemes"))
+    (hgroup
+     (h3 "phonemes")
+     (p (em "what even are they?")))
+    (p "no talk of letters here! no \"'ghoti' is pronounced 'fish'\" jokes! instead, let's imagine how to describe the difference in articulation and in reception between 'mine' and 'fine' and 'wine' and 'dine' or between ''. these words rhyme! which (for short words) is just a way of saying that most of their sounds are the same, and the similar bits come at the end. it is uncontroversial to suggest that there is a basic articulatory/perceptual unit in any given modality which, when strung together, produces the basic units of meaning. this basic unit, no matter the modality, is called the 'phoneme' (historically, called a 'chereme' in sign languages). i've already hinted at how linguists support the existence of these phonemes. 'dine' and 'fine' together form a 'minimal pair' of words with a different meaning that differ in only one perceptually distinct part of their articulation. human bodies are inexact things--perception is important here! it does us no good to describe an extra-tightly clenched middle finger in a closed hand shape as indicative of a distinct phoneme as it would be unlikely to be perceptible to an interlocutor and so could never disambiguate between two signs. environments are noisy and so articulation is also important! 'fish' and 'shiff' might be hard to distinguish on a windy overlook or in a compressed recording, but they are ") ;; TODO START HERE
+    (p "there's a fairly wide consensus amongst linguists that, despite being the minimal constituent needed to represent meaning in language, phonemes are " (strong "not atomic.") " a phoneme can be decomposed into constituent features and minimal pairs of phonemes can be shown to be distinct only in their realization of one feature. by way of example, the [b] in the word 'shabby' and the [m] in the word 'shammy' differ only in that the [m] is pronounced with air passing through the nasal cavity. the feature [+/- nasal] is therefore taken to be a salient feature in english phonology")
   (p "all well and good, but things start to get tricky when we start defining features. firstly, there is no single agreed-upon set of features by which to analyze all languages of a given modality. as stated, there's broad agreement that phonetic features exist and many proposed features are uncontroversial, yet even linguists analyzing the same language can disagree upon featural details. vowels in particular are quite slippery to analyze, with [+/- back], [+/- close], [+/- front], [+/- low], [+/- high], [+/- tongue root retracted], [+/- rounded] among the features present in different systems. there are also some linguists who, relying on auditory analysis, analyze vowels primarily via formant analysis. (formants are measures of what is sometimes referred to as 'resonance' or 'vowel color' -- they are the pitches above the fundamental frequency with the greatest relative amplitude.) it is this amateur crank's opinion that because articulation and perception are subject to different constraints and pressures, what is deemed a feature can elide a relationship between speech actor and interlocutor. thankfully, should latl allow for user definition of features and their phonemes, it can remain agnostic to the hairy work of actual linguistics")
   (p "users should therefore be able to define their own phonetic feature sets and use those to compose their phonemes. (i'm going to sneak in the undefended assertion here that users should be able to use "(em "other users'") " definitions as well. forgive me.) if you're reading this and are familiar with linguistics, you might now be wondering about the curious case of place of articulation. should place features be treated as hierarchical -- should [coronal] place of articulation be required for [+/- anterior] feature of the crown of the tongue? if so, how are coarticulations like [tʷ] or [k͡p] to be expressed in featural terms? here again, latl will allow for the definition of hierarchical features and make no assumptions about their use")
   (p "yet another problem is hiding in the view i've thus provided of phonological features. there is a wide (but not universal) belief that distinctive features in phonology are inherently binary. this is convenient from a computational perspective, but may not be descriptive of real language. firstly, it is possible to analyze [coronal] in the previous paragraph as a unary feature relevant to place of articulation. more distressingly, a proposed feature set that includes [+/- high] and [+/- low] predicts the nonsense value set: {[+ high] [+ low]}. one approach to this conundrum is to propose a feature scale [-1/0/1 height]. this is far from a settled matter, but latl should prioritize a user's ability to define such feature scales over implementation considerations or linguistic debate")
@@ -22,13 +45,15 @@
   (p "to recap thus far, we have phonemes, which for the purpose of latl are bundles of features of some value. features may be defined by the user of latl into feature systems, whereby they are usually but not always binary and may each have a dependency on another feature in the system. phonemes may have features of varying saliency allowing for allophony. these allophones are phones whose features are slightly different but retain the salient features of their phoneme, whether that phoneme is specified or an underspecified archiphoneme that could represent multiple phonemes. as an additional item, it is helpful to have a shorthand to refer to phonemes and their allophones, ie "(code "/t/") ", "(code "[tʰ]") ", " (code "[t]") ", and " (code "[ɾ]") " or " (code "/D/"))
   (p "an EBNF grammar (because grammars are fun!) of this relationship might be ")
   (code "phoneme = positive-integer * phone, { phoneme } ; (* a phoneme must be a set of phones and optional (archi-)phoneme *)" (br)
-   "phone = positive-integer * feature ; (* a phone must be a set of features *)" (br)
-   "feature = ( value, identifier ) | positive-integer * feature ; (* a feature must be a value with some identifier or a set of (dependent) features *)" (br)
-   "value = non-negative integer ;" (br)
-   "identifier = letter, { letter | \"-\" } ; (* lispy identifiers assumed for now *)" (br)
-   "non-negative-integer = digit , { digit } ; (* from here i'll take for granted the definition of digits and letters *)" (br))
+         "phone = positive-integer * feature ; (* a phone must be a set of features *)" (br)
+         "feature = ( value, identifier ) | positive-integer * feature ; (* a feature must be a value with some identifier or a set of (dependent) features *)" (br)
+         "value = non-negative-integer ;" (br)
+         "identifier = letter, { letter | \"-\" } ; (* lispy identifiers assumed for now *)" (br)
+         "non-negative-integer = digit , { digit } ; (* from here i'll take for granted the definition of digits and letters *)" (br))
   (p "this grammar is insufficient to the purpose, but i include it to point at the recursive nature of both phonemes and features revealed by the constraints defined so far. an additional constraint must be that features are bound in a global feature system and a featural definition of a phone requires values for every possible feature within that feature system. additionally, a feature value can be any within a bound set where each feature can be associated with a different set; so, [+/- nasal] and [-1/0/1 high] can exist within the same feature system, but any instance of [nasal] must have a value of [+] or [-] and any instance of [high] must have a value of [-1], [0], or [1]. the grammar handwaves with non-negative-integer by analogy with enums in many programming languages. this grammar also defines a language that would be repetitive and finicky to work with. instead of optimizing, i'd like to take a moment to consider phonemes already solved in latl and think a little bit about how they're used"))
-  (section
+   ;; 1.2 lexemes
+   (section
+   ((id "lexemes"))
   (hgroup
    (h3 "lexemes")
    (p (em "zooming out a little to the fundamental unit of meaning")))
@@ -38,10 +63,40 @@
   (p "i've also snuck in the pair 'she' and 'her'. traditionally, 'her' is held to be a derived form of 'she' violating our 'root morpheme' assumption. leaving aside the linguistic reasons to consider 'her' a derived form, there's still the question of what plausible derivation rule could turn the sound sequence /ʃi/ into /hɜɹ/? (" (small "the ancestral form of 'her' probably was transparently derived from the ancestral form of 'she', but in this project i'm concerned with how these derivations are obscured by language change through time") ")")
   (p "the write/right example and the she/her example, in slightly different ways, both recall the bidirectional nature of language. an idealized speaker *knows* which specific meaning (specific lexeme?!) of /ɹajt/ they are referring to, but their interlocutor must derive the appropriate meaning from context. likewise, a proficient speaker produces /ʃi/ and /hɜɹ/ in the appropriate position within a sentence without difficulty, while a language learner may struggle to hear the connection between the two forms. (other interesting possibilities include using one or the other form in all locations or in random distribution; analogizing the regularity of /hi/->/hɪm/ ('he'/'him') to /ʃi/->/ʃɪm/ where a /hɜɹ/ is expected; or using /hi/, /ʃi/, /ðej/ 'they' or other third person pronouns interchangeably. all of these point at some other juicy stuff that will have to be shelved for now.) this bidirectionality means that latl will need to support the mapping of a sequence of phonemes to an arbitrary number of lexemes, although for now it's safe to assume that a lexeme has only one associated sequence of phonemes. (ignoring, for the moment, variant pronunciations as in 'the' /ði/~/ðə/)")
   (p "a lexeme will probably need some additional stuff, tho. at the very least a 'dictionary definition' and, of course, a shorthand, ie " (code "/ʃi/") ", " (code "/hɜɹ/") ", or " (code "/ɹajt/")". there's absolutely more to what latl will require from a lexeme (and users should be able to extend the lexeme primitive to their own ends) but that will have to wait for now")
-   ))
- (section
-  (hgroup
-   (h2 "what's in a sound change rule?")
-   (p (em "using previous work as a starting point")))
-  "")
- )
+   (p "before moving on"))
+   ;; 1.3 morphosyntax or morphemes?
+   (section
+   ((id ""))
+   )
+  )
+  ;; section 2. okay but conlangers
+  (section
+   ((id "what-conlangers-do"))
+   (hgroup
+    (h2 "")
+    (p (em "")))
+   (p "")
+   (section
+    (hgroup
+     (h3 "what's in a sound change rule?")
+     (p (em "using previous work as a starting point")))
+    ""))
+  ;; section 3. introducing the primitives
+  (section
+   ((id "proposed-latl-primitives"))
+   (hgroup
+    (h2 "")
+    (p (em "")))
+   (p "")
+   (section
+    ((id ""))
+    (hgroup
+     (h3 "")
+     (p (em "")))
+    (p "")))
+  ;; brief conclusion and next steps
+  (section
+   ((id "signoff"))
+   (hgroup
+    (h2 "signing off")
+    (p (em "what's next for latl thinking?")))))
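
a minimal sketch, in racket since that is the file's language, of the feature-system constraint described in the article text of this change: every feature is bound to its own set of allowed values ([+/- nasal], [-1/0/1 high]), and a phone must supply a value for every feature in the system. all names here (`feature-system`, `valid-phone?`, `differing-features`) and the toy two-feature system are assumptions made for illustration; nothing below is latl's actual design.

```racket
#lang racket
;; hypothetical sketch, not latl's real data model.
;; a feature system maps each feature name to its set of allowed values.
(define feature-system
  (hash 'nasal '(+ -)      ; binary feature
        'high  '(-1 0 1))) ; scalar feature

;; a phone is a hash from feature name to value. it is valid when it has
;; exactly the features of the system, each holding an allowed value.
(define (valid-phone? system phone)
  (and (= (hash-count system) (hash-count phone))
       (for/and ([(name allowed) (in-hash system)])
         (and (hash-has-key? phone name)
              (member (hash-ref phone name) allowed)
              #t))))

;; list the features whose values differ between two phones,
;; sorted by name so the result is deterministic.
(define (differing-features a b)
  (sort (for/list ([(name value) (in-hash a)]
                   #:unless (equal? value (hash-ref b name #f)))
          name)
        symbol<?))

;; toy stand-ins for the [b] of 'shabby' and the [m] of 'shammy'
(define b-phone (hash 'nasal '- 'high 0))
(define m-phone (hash 'nasal '+ 'high 0))
```

with these definitions, `(valid-phone? feature-system b-phone)` holds, `(differing-features b-phone m-phone)` returns `'(nasal)`, and a phone such as `(hash 'nasal 0 'high 1)` is rejected because `0` is not in the value set bound to `nasal`. plain hashes are used here only to keep the sketch short; hierarchical (dependent) features would need a richer structure.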