just checking in a bunch of changes to the *big* latl-primitives post

2024-02-07 14:49:11 -05:00 · 5508d23290
commit 5508d23290
parent af165e5539
2 changed files with 70 additions and 10 deletions
--- a/.idea-log
+++ b/.idea-log
@@ -1,3 +1,30 @@
 IDEAS in progress
 ----- - - - - - -
 - root/links
 - unsettled/latl-primitives
 IDEAS for future time
 _____ - - - - - - - -
 - unsettled/that-has-no-name
 - - event system
 - - story as graph
 - unsettled/input-messenger
 - settled/go
 - - client only
 - settled/this-post-is-atomic (oxaliq.net utility scripts and atomicity)
 - unsettled/syntax-considered-harmful #playful
 - unsettled/ddd #playful
 - unsettled/library ????
 TH' LOG
 -------
 2024/02/07
 - working through the giant latl-primitives writeup
 - should really fix the post id thing before i publish it
 2023/01/25
 - published fca writeup
 2023/01/19
 - published about posts linked from index page
 - published beginning-latl
@@ -5,8 +32,8 @@
  - links ?
 - to write next
  - conlanging-tools
  - fca (latl first attempt)
  - latl-primitives
    - what is in a sound change rule?
  - syntax-considered-harmful ? (playful)
  - (l)ddd (living)-documentation-driven-development ? (playful) 
--- a/in-progress/latl-primitives.scm
+++ b/in-progress/latl-primitives.scm
@@ -1,14 +1,47 @@
-(article
+#lang s-exp racket
 '(article
 (hgroup
-  (h1 "latl primitives")
+  (h1 "what must be true of latl primitives")
-  (p (em "what will latl provide out of the box?")))
+  (p (em "what will latl need to ") (u "do") " out of the box?"))
- (p
+ (p "at its base, latl is a tool for operating on languages, invented (or otherwise, maybe). it helps, then, to think about the sorts of things a language is made of. spoken and signed languages will be the assumed base cases, as those are what real human beings usually use to communicate with each other. the modality of a language needn't matter to the primitives used, but it's always good practice to state assumptions"
  ;; show some diagrams of what is needed. no pseudocode.
  ;; 
  "")
 (section
  (hgroup
-   (h2 "")
+   (h2 "thinking about language for a moment"))
-   (p (em "")))
+  (p "by way of an opening query, i'm posing (to myself) the question: what does a human language need? naturally, our base case spoken language has sounds and our base case signed language has gestures. for each of these cases, we have an articulatory mechanism (the vocal tract, or the hand and arm in relation to the body) and a perceptual mechanism (the auditory system or the visual system). i'm starting with very basic assumptions because decisions made here will have an outsized effect on the ultimate design. it is uncontroversial to suggest that a given language modality has a basic unit, the phoneme (historically called a chereme in sign languages), and that phonemes, when strung together, produce the basic units of meaning. (more on the basic units of meaning later)")
  (section
   (hgroup
    (h3 "phonemes")
    (p (em "what even are they?")))
   (p "as just stated, a phoneme is the basic articulatory and perceptual unit of a language. even so, it is non-atomic: a phoneme can be decomposed into constituent features, and two phonemes that contrast in a minimal pair of words can be shown to differ in their realization of only one feature. by way of example, the [b] in the word 'shabby' and the [m] in the word 'shammy' differ only in that the [m] is pronounced with air passing through the nasal cavity. the feature [+/- nasal] is therefore taken to be a salient feature in english phonology")
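#|
a minimal sketch in racket (the language this file is already written in) of the decomposition just described, assuming a phone can be modeled as an immutable hash from feature name to value. the names feature-differences and single-feature-contrast?, along with the deliberately tiny feature set, are hypothetical illustrations, not a settled latl design.

```racket
#lang racket
;; hypothetical toy representation: a phone is an immutable hash from a
;; feature name to its value. only the features this example needs are shown.
(define b (hash 'labial 'present 'voice '+ 'continuant '- 'nasal '-))
(define m (hash 'labial 'present 'voice '+ 'continuant '- 'nasal '+))

;; list the features on which two phones disagree.
;; assumes both phones specify the same features.
(define (feature-differences p q)
  (for/list ([(feature value) (in-hash p)]
             #:unless (equal? value (hash-ref q feature)))
    feature))

;; true when exactly one feature separates the two phones.
(define (single-feature-contrast? p q)
  (= 1 (length (feature-differences p q))))

(feature-differences b m)      ; => '(nasal)
(single-feature-contrast? b m) ; => #t
```

run over a fuller feature set, the same comparison may well report more than one difference between [b] and [m]; that is one way a user could test a feature system against the contrasts they expect it to capture.
|#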
   (p "all well and good, but things get tricky once we start defining features. first, there is no single agreed-upon set of features by which to analyze all languages of a given modality. there's broad agreement that features exist, and many proposed features are uncontroversial, yet even linguists analyzing the same language can disagree on featural details. vowels in particular are slippery to analyze, with [+/- back], [+/- close], [+/- front], [+/- low], [+/- high], [+/- tongue root retracted], and [+/- rounded] among the features present in different systems. some linguists, relying on auditory analysis, describe vowels primarily by their formants. (formants are measures of what is sometimes referred to as 'resonance' or 'vowel color' -- they are the frequency bands above the fundamental frequency with the greatest relative amplitude, produced by resonances of the vocal tract.) it is this amateur crank's opinion that, because articulation and perception are subject to different constraints and pressures, what is deemed a feature can elide a relationship between speech actor and interlocutor. thankfully, if latl lets users define features and their phonemes, it can remain agnostic about the hairy work of actual linguistics")
   (p "users should be able to define their own phonetic feature sets and use those to compose their phonemes. (i'm going to sneak in the undefended assertion here that users should be able to use "(em "other users'") " definitions as well. forgive me.) if you're reading this and are familiar with linguistics, you might now be wondering about the curious case of place of articulation. should place features be treated as hierarchical? should the [coronal] place of articulation be required for the [+/- anterior] feature of the crown of the tongue? if so, how are coarticulations like [tʷ] or [k͡p] to be expressed in featural terms? here again, latl will allow for the definition of hierarchical features and make no assumptions about their use")
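#|
a minimal sketch of one way hierarchical features could be checked, assuming each feature in a user's system names at most one parent feature that must be present whenever the child is used. the names place-system and respects-dependencies?, the [round] under [labial] dependency, and the use of 'present as the value of unary place features are all hypothetical.

```racket
#lang racket
;; hypothetical user-defined feature system: each feature maps to the
;; feature it depends on, or #f when it has no parent.
(define place-system
  (hash 'labial #f
        'coronal #f
        'dorsal #f
        'round 'labial       ; assumed dependency, for illustration
        'anterior 'coronal)) ; [+/- anterior] only under [coronal]

;; a phone respects the system when every feature it uses is defined and
;; every dependent feature's parent is also present in the phone.
(define (respects-dependencies? system phone)
  (for/and ([feature (in-hash-keys phone)])
    (and (hash-has-key? system feature)
         (let ([parent (hash-ref system feature)])
           (or (not parent) (hash-has-key? phone parent))))))

(define t  (hash 'coronal 'present 'anterior '+))
(define tw (hash 'coronal 'present 'anterior '+ 'labial 'present 'round '+))
(define kp (hash 'labial 'present 'dorsal 'present))
(define orphan (hash 'anterior '+)) ; [anterior] with no [coronal]

(map (lambda (phone) (respects-dependencies? place-system phone))
     (list t tw kp orphan)) ; => '(#t #t #t #f)
```

nothing in this check forbids a phone from carrying two place features, so coarticulations like [tʷ] and [k͡p] come out as phones with more than one place node. whether that is the right answer is left to the user's feature system.
|#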
   (p "yet another problem is hiding in the view of phonological features i've provided thus far. there is a wide (but not universal) belief that distinctive features in phonology are inherently binary. this is convenient from a computational perspective, but may not describe real language. first, it is possible to analyze [coronal] in the previous paragraph as a unary feature relevant to place of articulation. more distressingly, a proposed feature set that includes [+/- high] and [+/- low] predicts the nonsense value set {[+ high] [+ low]}. one approach to this conundrum is to propose a feature scale [-1/0/1 height]. this is far from a settled matter, but latl should prioritize a user's ability to define such feature scales over implementation considerations or linguistic debate")
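#|
a minimal sketch of why a scale sidesteps the problem, assuming a feature system maps each feature to its list of allowed values. the names binary-system, scalar-system, and all-specifications are hypothetical, and the sketch only counts value combinations; it says nothing about which heights a given language uses.

```racket
#lang racket
;; hypothetical feature systems: each feature maps to its allowed values.
(define binary-system (hash 'high '(+ -) 'low '(+ -)))
(define scalar-system (hash 'height '(-1 0 1)))

;; enumerate every complete assignment of values a system permits.
(define (all-specifications system)
  (for/fold ([specs (list (hash))])
            ([(feature allowed) (in-hash system)])
    (for*/list ([spec (in-list specs)]
                [value (in-list allowed)])
      (hash-set spec feature value))))

;; two binary features license four heights, including [+ high][+ low]
(length (all-specifications binary-system)) ; => 4
(and (member (hash 'high '+ 'low '+) (all-specifications binary-system)) #t) ; => #t
;; the three-valued scale licenses exactly three, none contradictory
(length (all-specifications scalar-system)) ; => 3
```

enumerating every combination makes the nonsense {[+ high] [+ low]} show up as a legitimate member of the binary system, while the three-valued scale has no way to express it.
|#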
   (p "it's been a few paragraphs without any mention of sign languages, so it is worth gesturing at how their phonological features relate to these considerations, to ensure latl doesn't start its life with a modality bias. sign languages are widely understood to have featural phonological systems. as with spoken languages, the specifics of feature sets vary by language and researcher. features can be salient to a language and form minimal pairs; for example, [+/- palm prone] is one way of reading the difference between the ASL fingerspelling signs for /p/ and /k/. research suggests a high degree of hierarchical complexity in the phonological features of sign languages, which maps very neatly onto the place of articulation problem in spoken languages: features related to handshape, such as [+/- flex] or [+/- extension], only make sense with regard to selected fingers. i have not seen any research about featural scales in sign languages, but it would be unsurprising if the same nonsense combinations of binary features arose there too")
   (p "let's zoom back out to phonemes for a moment to add another wrinkle to the featural representation. a speaker's notion (unconscious or not) of what constitutes a single sound is understood to be a 'bundle' of features, but not every feature holds the same importance in every environment. by way of example, the /t/ phoneme in my dialect of english can be realized in a number of different ways depending on its location. it can be aspirated, with [+ spread glottis] (or [+ delayed onset] if you prefer an auditory approach), as in 'tock' [tʰɑk]; unaspirated, with [- spread glottis], as in 'stock' [stɑk]; or a flap, as in 'buttock' [ˈbʌ.ɾək]. this flap differs from the others at least in having [+ sonorant] and [+ voice], while retaining [coronal] and [+ anterior]. yet, if i heard *[ɾɑk] in isolation, i would assume the speaker was referring to a stone or a genre of music. this situation is called allophony, and latl must maintain a way to treat phonemes like /t/ as salient bundles of features, distinct from the more discrete phones [tʰ], [t], and [ɾ], whose features are more fully specified. once again, we see a similar situation with regard to the ASL phoneme /e handshape/, which has the allophonic realizations [+ open aperture] (the unmarked /e/ familiar from the fingerspelled alphabet) and [- open aperture] in certain environments")
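#|
a minimal sketch of allophony as feature matching, assuming a phoneme is a partially specified hash of salient features and a phone a more fully specified one. the names (phoneme-t, realizes?, and the phone definitions) are hypothetical, and treating [- nasal] as salient for /t/ is an added assumption so that a coronal nasal doesn't count as an allophone.

```racket
#lang racket
;; hypothetical: a phoneme lists only its salient features, while a phone
;; specifies more of them. [- nasal] on /t/ is an assumption.
(define phoneme-t (hash 'coronal 'present 'anterior '+ 'nasal '-))

(define t-aspirated (hash 'coronal 'present 'anterior '+ 'nasal '-
                          'voice '- 'sonorant '- 'spread-glottis '+))
(define t-plain (hash-set t-aspirated 'spread-glottis '-))
(define flap    (hash-set* t-plain 'voice '+ 'sonorant '+))
(define n       (hash-set* flap 'nasal '+)) ; a coronal nasal, for contrast

;; a phone realizes a phoneme when it agrees on every salient feature.
(define (realizes? phone phoneme)
  (for/and ([(feature value) (in-hash phoneme)])
    (equal? value (hash-ref phone feature #f))))

(map (lambda (phone) (realizes? phone phoneme-t))
     (list t-aspirated t-plain flap n)) ; => '(#t #t #t #f)
```

matching on salient features alone over-generates: nothing here says which allophone appears in which environment, and because this /t/ carries no [voice] value, the sketch can't yet tell /t/ from /d/.
|#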
   (p "warning! that [ɾ] in my dialect of english is an allophone of two different phonemes! the realizations of the words /bæt.ər/ and /bæd.ər/ ('batter' and 'badder') are the same: [bæɾ.ɚ]. this 'under-specification' is not unique to my dialect of english, and some linguists propose an archiphoneme /D/, a kind of set of /t/ and /d/, to account for it. in this view, 'batter' and 'badder' are orthographically distinct but phonemically both /bæD.ər/. is this 'really' what is going on? i'm not qualified to say, but i am confident that latl can be made to handle this situation without straining our abstractions too much")
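#|
a minimal sketch of the archiphoneme under one possible reading of 'a kind of set': /D/ keeps only the features /t/ and /d/ share, assuming phonemes are immutable hashes of salient features. the names archiphoneme, realizes?, and phoneme-D are hypothetical.

```racket
#lang racket
;; hypothetical: /t/ and /d/ as salient bundles that contrast in [voice].
(define phoneme-t (hash 'coronal 'present 'anterior '+ 'nasal '- 'voice '-))
(define phoneme-d (hash 'coronal 'present 'anterior '+ 'nasal '- 'voice '+))

;; one possible reading of an archiphoneme: keep only the features on which
;; every member phoneme agrees, leaving the rest unspecified.
(define (archiphoneme first-member . other-members)
  (for/hash ([(feature value) (in-hash first-member)]
             #:when (for/and ([other (in-list other-members)])
                      (equal? value (hash-ref other feature #f))))
    (values feature value)))

;; a phone realizes a phoneme when it agrees on every specified feature.
(define (realizes? phone phoneme)
  (for/and ([(feature value) (in-hash phoneme)])
    (equal? value (hash-ref phone feature #f))))

(define phoneme-D (archiphoneme phoneme-t phoneme-d))
(define flap (hash 'coronal 'present 'anterior '+ 'nasal '-
                   'voice '+ 'sonorant '+))

(hash-has-key? phoneme-D 'voice) ; => #f
(realizes? flap phoneme-D)       ; => #t
```

another reading would keep /D/ as a literal set of member phonemes, which preserves the link back to /t/ and /d/ that the intersection throws away.
|#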
   (p "to recap thus far: we have phonemes, which for the purpose of latl are bundles of valued features. users of latl may define features into feature systems, in which features are usually but not always binary and may each depend on another feature in the system. phonemes may have features of varying saliency, allowing for allophony. these allophones are phones whose features differ slightly but retain the salient features of their phoneme, whether that phoneme is fully specified or an underspecified archiphoneme that could represent multiple phonemes. as an additional item, it is helpful to have a shorthand to refer to phonemes and their allophones, e.g. " (code "/t/") " and its allophones " (code "[tʰ]") ", " (code "[t]") ", and " (code "[ɾ]") ", or the archiphoneme " (code "/D/"))
   (p "an EBNF grammar (because grammars are fun!) of this relationship might be ")
   (code "phoneme = positive-integer * phone, { phoneme } ; (* a phoneme must be a set of phones and optional nested (archi-)phonemes *)" (br)
   "phone = positive-integer * feature ; (* a phone must be a set of features *)" (br)
   "feature = ( value, identifier ) | positive-integer * feature ; (* a feature must be a value with some identifier or a set of (dependent) features *)" (br)
   "value = non-negative-integer ;" (br)
   "identifier = letter, { letter | \"-\" } ; (* lispy identifiers assumed for now *)" (br)
   "non-negative-integer = digit , { digit } ; (* from here i'll take for granted the definition of digits and letters *)" (br)
   "positive-integer = digit - \"0\", { digit } ;" (br))
   (p "this grammar is insufficient for the purpose, but i include it to point at the recursive nature of both phonemes and features revealed by the constraints defined so far. an additional constraint must be that features are bound in a global feature system, and that a featural definition of a phone requires values for every possible feature within that feature system. additionally, a feature's value can be any member of a bounded set, and each feature can be associated with a different set; so [+/- nasal] and [-1/0/1 high] can exist within the same feature system, but any instance of [nasal] must have a value of [+] or [-], and any instance of [high] must have a value of [-1], [0], or [1]. the grammar handwaves this with non-negative-integer, by analogy with enums in many programming languages. this grammar also defines a language that would be repetitive and finicky to work with. instead of optimizing, i'd like to take a moment to consider the phoneme solved (for now) in latl and think a little bit about how phonemes are used"))
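#|
a minimal sketch of those two constraints, assuming a feature system is a hash from each feature to its list of allowed values and a phone is a hash from feature to value. symbols and small integers stand in for the grammar's enum-like non-negative-integer values; the names feature-system and well-formed-phone? are hypothetical.

```racket
#lang racket
;; hypothetical global feature system: feature name -> allowed values.
;; binary and scalar features sit side by side.
(define feature-system
  (hash 'nasal '(+ -)
        'high '(-1 0 1)))

;; a phone is well formed when it gives every feature in the system one of
;; that feature's allowed values and uses no feature outside the system.
(define (well-formed-phone? system phone)
  (and (= (hash-count phone) (hash-count system))
       (for/and ([(feature allowed) (in-hash system)])
         (and (hash-has-key? phone feature)
              (member (hash-ref phone feature) allowed)
              #t))))

(well-formed-phone? feature-system (hash 'nasal '+ 'high 1)) ; => #t
(well-formed-phone? feature-system (hash 'nasal '+))         ; => #f
(well-formed-phone? feature-system (hash 'nasal 1 'high 1))  ; => #f
```

the second phone fails because it leaves [high] unspecified, and the third because 1 is not a value of [nasal].
|#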
  (section
   (hgroup
    (h3 "lexemes")
    (p (em "zooming out a little to the fundamental unit of meaning")))
   (p "note! for the purpose of this exploration, a lexeme is assumed to be synonymous with 'root morpheme'." (small " if you don't know what this note means, please be aware that i'm being a little bit of a crank again. if you do know what this note means and are suspicious, run with me here for a sec; we'll get to it"))
   (p "for now i'll posit that a lexeme is an ordered sequence of phoneme(s) that corresponds to a productive, atomic meaning. a lexeme MAY be subject to derivation rules which transform its meaning or its role in an utterance; i'll call the results 'derived forms' for now. this definition allows for any 'part of speech' so long as the lexeme is not derived. taking for granted, for a moment, the category 'word', here's a selection of english words that fit this definition of lexeme: 'a', 'she', 'her', 'for', 'four', 'write', 'right', 'quick', 'quit', 'dirigible', 'abstract'")
   (p "included are 'function words' (the closed set of grammatically necessary words without independent meaning) like 'a', 'she', 'her', and 'for'. 'content words' (the open set of words with semantic weight) are also included, beginning with 'four'. but of course, i've also chosen these words to illustrate some potential traps. we have some phonetic ambiguities: 'for' and 'four' are distinct in some english dialects, but i pronounce them both /fɔɹ/. 'write' and 'right' are indistinguishable from each other in every english dialect and sound something like /ɹajt/. the situation is tricky semantically as well! this is one sequence of sounds upon which multiple different etymologies (encoding mark-making, correctness, directionality, or politics) have converged. if the written forms are any hint, there should be at least two separate lexemes")
   (p "i've also snuck in the pair 'she' and 'her'. traditionally, 'her' is held to be a derived form of 'she', which violates our 'root morpheme' assumption. leaving aside the linguistic reasons to consider 'her' a derived form, there's still the question: what plausible derivation rule could turn the sound sequence /ʃi/ into /hɜɹ/? (" (small "the ancestral form of 'her' probably was transparently derived from the ancestral form of 'she', but in this project i'm concerned with how these derivations are obscured by language change through time") ")")
   (p "the write/right example and the she/her example, in slightly different ways, both recall the bidirectional nature of language. an idealized speaker *knows* which specific meaning (specific lexeme?!) of /ɹajt/ they are referring to, but their interlocutor must derive the appropriate meaning from context. likewise, a proficient speaker produces /ʃi/ and /hɜɹ/ in the appropriate position within a sentence without difficulty, while a language learner may struggle to hear the connection between the two forms. (other interesting possibilities include using one or the other form in all locations or in random distribution; analogizing the regularity of /hi/->/hɪm/ ('he'/'him') to /ʃi/->/ʃɪm/ where a /hɜɹ/ is expected; or using /hi/, /ʃi/, /ðej/ 'they' or other third person pronouns interchangeably. all of these point at some other juicy stuff that will have to be shelved for now.) this bidirectionality means that latl will need to support the mapping of a sequence of phonemes to an arbitrary number of lexemes, although for now it's safe to assume that a lexeme has only one associated sequence of phonemes. (ignoring, for the moment, variant pronunciations as in 'the' /ði/~/ðə/)")
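#|
a minimal sketch of that one-to-many mapping, assuming a lexeme is nothing more than an ordered list of phoneme symbols paired with a gloss. the struct lexeme, the function build-lexicon, and the sample glosses are hypothetical.

```racket
#lang racket
;; hypothetical lexeme: one ordered phoneme sequence plus a gloss.
;; phonemes are bare symbols here, standing in for feature bundles.
(struct lexeme (phonemes gloss) #:transparent)

(define sample-lexemes
  (list (lexeme '(ɹ a j t) "write: to make marks")
        (lexeme '(ɹ a j t) "right: correct")
        (lexeme '(ʃ i) "she: feminine third person pronoun")))

;; index lexemes by phoneme sequence. a sequence may map to any number of
;; lexemes, while each lexeme keeps exactly one sequence.
(define (build-lexicon lexemes)
  (for/fold ([lexicon (hash)])
            ([lx (in-list lexemes)])
    (hash-update lexicon
                 (lexeme-phonemes lx)
                 (lambda (entries) (append entries (list lx)))
                 '())))

(define lexicon (build-lexicon sample-lexemes))

(map lexeme-gloss (hash-ref lexicon '(ɹ a j t) '()))
; => '("write: to make marks" "right: correct")
```

a hearer looking up /ɹajt/ gets back every candidate and still has to choose from context, while the speaker's direction (lexeme to sound) is just lexeme-phonemes.
|#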
   (p "a lexeme will probably need some additional stuff, tho: at the very least, a 'dictionary definition' and, of course, a shorthand, e.g. " (code "/ʃi/") ", " (code "/hɜɹ/") ", or " (code "/ɹajt/") ". there's absolutely more that latl will require from a lexeme (and users should be able to extend the lexeme primitive to their own ends), but that will have to wait for now")
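#|
a minimal sketch of a lexeme carrying a dictionary definition and rendering its own shorthand, assuming phonemes are stored as symbols. the struct lexeme and the function shorthand are hypothetical names.

```racket
#lang racket
;; hypothetical lexeme: an ordered list of phoneme symbols plus a
;; dictionary definition.
(struct lexeme (phonemes definition) #:transparent)

;; render the slash shorthand for a lexeme's phonemic form.
(define (shorthand lx)
  (string-append "/"
                 (string-join (map symbol->string (lexeme-phonemes lx)) "")
                 "/"))

(define she (lexeme '(ʃ i) "feminine third person pronoun"))
(define her (lexeme '(h ɜ ɹ) "feminine third person pronoun (object or possessive)"))

(shorthand she) ; => "/ʃi/"
(shorthand her) ; => "/hɜɹ/"
```

because racket struct accessors accept instances of subtypes, a user extension such as (struct tagged-lexeme lexeme (part-of-speech)) would still work with shorthand unchanged.
|#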
   ))
 (section
  (hgroup
   (h2 "what's in a sound change rule?")
   (p (em "using previous work as a starting point")))
  "")
 )