proofread latl-primitives; add structure

sorrel 2024-02-25 20:45:11 -05:00
parent b64d7d47e0
commit ac176553f8


@ -6,11 +6,11 @@
((id "intro")) ((id "intro"))
(hgroup (hgroup
(h1 "what must be true of latl primitives") (h1 "what must be true of latl primitives")
(p (em "what will latl need to ") (u "do") " out of the box?")) (p (em "what will latl need to " (u "do") " out of the box?")))
(p "i've talked a little about " (p "i've talked a little about "
(a ((href "/unsettled/1")) "conlanging and latl") (a ((href "/unsettled/1")) "conlanging and latl")
" previously here. the short version is this: making languages (for theoretical conscious beings) is fun! it's been a consistent hobby and artistic pursuit for me for much of my life. i've had different approaches from making extremely regular languages, to simulating the evolution of a family of spoken languages, to a family of synthesizer languages for a fractured machine society. every language project requires keeping track of a dictionary and a grammar (even if they never become more than quick sketches) and any sufficiently involved project can benefit from tools for generating new words that fit a language's 'phonotactics', generating derived words based on grammatical rules, simulating language change over time. conlanging is a hobby with enough overlap with computation, that some conlangers have created tools for some of these tasks. my own projects have become too ambitious to have my work live in spreadsheets over here and latex files over there and text files with the defintions i provide to web-based tools somewhere else entirely. i want to build on the work of those came before and create a substrate upon which any tool a conlanger needs could be built and in which i can define the entirety of a language in one system.") " previously here. the short version is this: making languages (for theoretical conscious beings) is fun! it's been a consistent hobby and artistic pursuit for me for much of my life. i've had different approaches from making extremely regular languages, to simulating the evolution of a family of spoken languages, to a family of synthesizer languages for a fractured machine society. every language project requires keeping track of a dictionary and a grammar (even if they never become more than quick sketches) and any sufficiently involved project can benefit from tools for generating new words that fit a language's 'phonotactics', generating derived words based on grammatical rules, simulating language change over time. 
conlanging is a hobby with enough overlap with computation that some conlangers have created tools for some of these tasks. my own projects have become too ambitious to have my work live in spreadsheets over here and latex files over there and text files with the definitions i provide to web-based tools somewhere else entirely. i want to build on the work of those who came before and create a substrate upon which any tool a conlanger needs could be built and in which i can define the entirety of a language in one system.")
(p "at its base latl will be a tool for operating on languages, invented (or otherwise... maybe.) it helps then to think of the sorts of things that encompass language. spoken and signed languages will be the assumed base case language, as those are the things real human beings usually use to communicate with each other. the modality of a language needn't be important to the primitives used, but it's always a good practice to state assumptions")) (p "at its base latl will be a tool for operating on languages, invented (or otherwise... maybe.) it helps then to think of the sorts of things that encompass language. spoken and signed languages will be the assumed base case, as those are the things real human beings usually use to communicate with each other. the modality of a language needn't be important to the primitives used, but it's always a good practice to state assumptions"))
(section (section
((id "contents")) ((id "contents"))
(ul (li (a ((href "#what-is-language")) "go to what-is-language")) (ul (li (a ((href "#what-is-language")) "go to what-is-language"))
@ -27,20 +27,32 @@
((id "what-is-language")) ((id "what-is-language"))
(hgroup (hgroup
(h2 "thinking about what language is for a moment")) (h2 "thinking about what language is for a moment"))
(p "by way of an opening query: what does a language need? naturally our base case spoken languages have sounds and our base case signed languages have gestures. for each of these, we have an articulatory mechanism: the vocal tract, or the hand and arm in relation to the body; and a perceptual mechanism: the auditory system, or the visual system. the unique thing about language among other forms of communication is how languages use time. it might seem basic, but it's easy to forget that 'my cat scratched the post' and 'the post scratched my cat' mean very different things. and the same sounds in a different sequence can become a smattering of ideas as in 'the my scratched cat post' or even lose meaning all together as in 'cra catsm sde thymst opsh'. languages differ on how they use time--some languages allow freer word order than others, but relationships between articulations through time are essential to all known languages. for now, let's start with those articulatory bits! (more on time and on meaning later)") (p "by way of an opening query: what does a language need? naturally our base case spoken languages have sounds and our base case signed languages have gestures. for each of these, we have an articulatory mechanism: the vocal tract, or the face and hands and arms; and a perceptual mechanism: the auditory system, or the visual system. the unique thing about language among other forms of communication is how languages use time. it might seem basic, but it's easy to forget that 'my cat scratched the post' and 'the post scratched my cat' mean very different things. and the same sounds in a different sequence can become a smattering of ideas as in 'the my scratched cat post' or even lose meaning all together as in 'cra catsm sde thymst opsh'. languages differ on how they use time--some languages allow more freedom for word order than others, but relationships between articulations through time are essential to all known languages. 
for now, let's start with those articulatory bits! (more on time and on meaning later)")
;; 1.1 phonemes ;; 1.1 phonemes
(section (section
((id "phonemes")) ((id "phonemes"))
(hgroup (hgroup
(h3 "phonemes") (h3 "phonemes")
(p (em "what even are they?"))) (p (em "what even are they?")))
(p "no talk of letters here! no \"'ghoti' is pronounced 'fish'\" jokes! instead, let's imagine how to describe the difference in articulation and in reception between 'mine' and 'fine' and 'wine' and and 'dine' or between ''. these words rhyme! which (for short words) is just a way of saying that most of their sounds are the same, and the similar bits come at the end. it is uncontroversial to suggest that there is a basic articulatory/perceptual unit in any given modality which, when strung together, produces the basic units of meaning. this basic unit, no matter the modality, is called the 'phoneme' (historically, called a 'chereme' in sign languages). i've already hinted at how linguists support the existence of these phonemes. 'dine' and 'fine' together form a 'minimal pair' of words with a different meaning that differ in only one perceptually distinct part of their articulation. human bodies are inexact things--perception is important here! it does us no good to describe an extra-tightly clenched middle finger in a closed hand shape as indicative of a distinct phoneme as it would be unlikely to be perceptible to an interlocutor and so could never disambiguate between two signs. environments are noisy and so articulation is also important! 'fish' and 'shiff' might be hard to distinguish on a windy overlook or in a compressed recording, but they are ") ;; TODO START HERE (p "no talk of letters here! no "
(q ((style "font-family: serif; font-style: italic; padding: 1px 2px;"))
"'ghoti' is pronounced 'fish'")
" jokes! instead, let's imagine how to describe the difference in articulation and in reception between 'mine' and 'fine' and 'wine' and 'dine' or between '"
;; add ASL rhymes
"'. these words rhyme! which (for short words) is a way of saying that most of their sounds are the same, and the similar bits come at the end. "
(footnote "rhyme is a little more complicated than that, encompassing stress patterns") ;; describe ASL rhymes
;; awkward
""
"it is uncontroversial to suggest that there is a basic articulatory/perceptual unit in any given modality which, when strung together, produces the basic units of meaning. this basic unit, no matter the modality, is called the 'phoneme'."
(footnote "historically, called a 'chereme' in sign languages")
"in selecting the words i have, i've already hinted at how linguists support the existence of these phonemes. 'dine' and 'fine' together form a 'minimal pair' of words with a different meaning whose difference is found in only one perceptually distinct part of their articulation. because we know from english language usage that 'mine' and 'fine' and 'wine' and 'dine' are distinct words with distinct meaning, we have a clue that there is some phonemic difference between /m/ and /f/ and /w/ and /d/. by collecting more examples of these minimal pairs ('do' and 'moo' and 'wed' and 'dead' and on and on) we can begin to describe the physical sounds associated with each phoneme and how each is articulated.")
(p "human bodies are inexact things--perception is important here! it does us no good to describe an extra-tightly clenched middle finger in a closed hand shape as indicative of a distinct phoneme as it would be unlikely to be perceptible to an interlocutor and so could never disambiguate between two signs. environments are noisy and so articulation is also important! in my dialect of english the word 'put' /pʊt/, in a noisy environment, might be pronounced roughly [pʰʊtʰ]. in casual speech, however, this same word is frequently realized as [pʰɵʔ] with the only audible consonant at the end being the glottal closure of 'uh-oh'. the only ghost of the exaggerated realization is typically an inaudible tongue placement behind the alveolar ridge. a speaker recognizes what the phoneme 'could be' with more effort, but typically such effort is unnecessary for understanding. this suggests that phonemes are not simply sounds or handshapes or mouth movements. something must be underlying the equality of meaning between [pʰʊtʰ] and [pʰɵʔ]")
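going back to minimal pairs for a moment: the search for them can be sketched in a few lines of python. this is only an illustration, not anything latl does. the toy lexicon, its rough transcriptions, and the names `contrast` and `minimal_pairs` are all assumptions made up for the example.

```python
from itertools import combinations

# hypothetical toy lexicon: word -> rough phonemic transcription,
# one tuple slot per phoneme (the transcriptions are illustrative only)
LEXICON = {
    "mine": ("m", "aɪ", "n"),
    "fine": ("f", "aɪ", "n"),
    "wine": ("w", "aɪ", "n"),
    "dine": ("d", "aɪ", "n"),
    "do": ("d", "u"),
    "moo": ("m", "u"),
}


def contrast(a, b):
    """Return the differing (x, y) phonemes if a and b differ in exactly one slot."""
    if len(a) != len(b):
        return None
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    return diffs[0] if len(diffs) == 1 else None


def minimal_pairs(lexicon):
    """List (word, word, contrast) for every pair of words that is a minimal pair."""
    pairs = []
    for (w1, t1), (w2, t2) in combinations(lexicon.items(), 2):
        found = contrast(t1, t2)
        if found is not None:
            pairs.append((w1, w2, found))
    return pairs


for w1, w2, (x, y) in minimal_pairs(LEXICON):
    print(f"{w1} ~ {w2}: /{x}/ vs /{y}/")
```

comparing slot by slot only works because the transcriptions are already segmented into phonemes, which is of course the very thing a linguist is trying to establish; the sketch assumes that work has been done.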
(p "there's a fairly wide consensus amongst linguists that, despite being the minimal constituent needed to represent meaning in language, phonemes are " (strong "not atomic.") " a phoneme can be decomposed into constituent features and minimal pairs of phonemes can be shown to be distinct only in their realization of one feature. by way of example, the [b] in the word 'shabby' and the [m] in the word 'shammy' differ only in that the [m] is pronounced with air passing through the nasal cavity. the feature [+/- nasal] is therefore taken to be a salient feature in english phonology") (p "there's a fairly wide consensus amongst linguists that, despite being the minimal constituent needed to represent meaning in language, phonemes are " (strong "not atomic.") " a phoneme can be decomposed into constituent features and minimal pairs of phonemes can be shown to be distinct only in their realization of one feature. by way of example, the [b] in the word 'shabby' and the [m] in the word 'shammy' differ only in that the [m] is pronounced with air passing through the nasal cavity. the feature [+/- nasal] is therefore taken to be a salient feature in english phonology")
(p "all well and good, but things start to get tricky when we start defining features. firstly, there is no single agreed upon set of features by which to analyze all languages of a given modality. as stated, there's broad agreement that phonetic features exist and many proposed features are uncontroversial, yet even linguists analyzing the same language can disagree upon featural details. vowels in particular are quite slippery to analyze, with [+/- back], [+/- close], [+/- front], [+/- low], [+/- high], [+/- tongue root retracted], [+/- rounded] among the features present in different systems. there are also some linguists who, relying on auditory analysis, analyze vowels primarily via formant analysis. (formants are measures of what is sometimes referred to as 'resonance' or 'vowel color' -- they are the pitches above the fundamental frequency with the greatest relative amplitude.) it is this amateur crank's opinion that because articulation and perception are subject to different constraints and pressures, what is deemed a feature can elide a relationship between speach actor and interlocutor. thankfully, should latl allow for user definition of features and their phonemes, it can remain agnostic to the hairy work of actual linguistics") (p "all well and good, but things start to get tricky when we start defining features. firstly, there is no single agreed upon set of features by which to analyze all languages of a given modality. as stated, there's broad agreement that phonetic features exist and many proposed features are uncontroversial, yet even linguists analyzing the same language can disagree upon featural details. vowels in particular are quite slippery to analyze, with [+/- back], [+/- close], [+/- front], [+/- low], [+/- high], [+/- tongue root retracted], [+/- rounded] among the features present in different systems. there are also some linguists who, relying on auditory analysis, analyze vowels primarily via formant analysis. 
(formants are measures of what is sometimes referred to as 'resonance' or 'vowel color' -- they are the pitches above the fundamental frequency with the greatest relative amplitude.) it is this amateur crank's opinion that because articulation and perception are subject to different constraints and pressures, what is deemed a feature can elide a relationship between speech actor and interlocutor. thankfully, should latl allow for user definition of features and their phonemes, it can remain agnostic to the hairy work of actual linguistics")
(p "users should therefore be able to define their own phonetic feature sets and use those to compose their phonemes. (i'm going to sneak in the undefended assertion here that users should be able to use "(em "other users'") " definitions as well. forgive me.) if you're reading this and are familiar with linguistics, you might now be wondering about the curious case of place of articulation. should place features be treated as hierarchical -- should [coronal] place of articulation be required for the [+/- anterior] feature of the crown of the tongue? if so, how are coarticulations like [tʷ] or [k͡p] to be expressed in featural terms? here again, latl will allow for the definition of hierarchical features and make no assumptions about their use") (p "users should therefore be able to define their own phonetic feature sets and use those to compose their phonemes. (i'm going to sneak in the undefended assertion here that users should be able to use "(em "other users'") " definitions as well. forgive me.) if you're reading this and are familiar with linguistics, you might now be wondering about the curious case of place of articulation. should place features be treated as hierarchical -- should [coronal] place of articulation be required for the [+/- anterior] feature of the crown of the tongue? if so, how are coarticulations like [tʷ] or [k͡p] to be expressed in featural terms? here again, latl will allow for the definition of hierarchical features and make no assumptions about their use")
(p "yet another problem is hiding in the view i've thus provided of phonological features. there is a wide (but not universal) belief that distinctive features in phonology are inherently binary. this is convenient from a computational perspective, but may not be descriptive of real language. firstly, it is possible to analyze [coronal] in the previous paragraph as a unary feature relevant to place of articulation. more distressingly, a proposed feature set that includes [+/- high] and [+/- low] predicts the nonsense value set: {[+ high] [+ low]}. one approach to this conundrum is to propose a feature scale [-1/0/1 height]. this is far from a settled matter, but latl should prioritize a user's ability to define such feature scales over implementation considerations or linguistic debate") (p "yet another problem is hiding in the view i've thus provided of phonological features. there is a wide (but not universal) belief that distinctive features in phonology are inherently binary. this is convenient from a computational perspective, but may not be descriptive of real language. firstly, it is possible to analyze [coronal] in the previous paragraph as a unary feature relevant to place of articulation. more distressingly, a proposed feature set that includes [+/- high] and [+/- low] predicts the nonsense value set: {[+ high] [+ low]}. one approach to this conundrum is to propose a feature scale [-1/0/1 height]. this is far from a settled matter, but latl should prioritize a user's ability to define such feature scales over implementation considerations or linguistic debate")
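to make the last few paragraphs concrete, here is a minimal python sketch of a user-defined feature system that allows binary, unary, and scalar features plus a dependency between features. none of this is latl's actual design: `Feature`, `FeatureSystem`, `validate`, and the particular feature names and values are hypothetical, chosen only to mirror the examples in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    name: str
    values: tuple                   # every value a user may assign
    requires: Optional[str] = None  # parent feature that must also be specified


class FeatureSystem:
    def __init__(self, *features):
        self.features = {f.name: f for f in features}

    def validate(self, bundle):
        """Return a list of problems with a {feature name: value} bundle."""
        problems = []
        for name, value in bundle.items():
            feature = self.features.get(name)
            if feature is None:
                problems.append(f"unknown feature: {name}")
                continue
            if value not in feature.values:
                problems.append(f"bad value for {name}: {value!r}")
            if feature.requires is not None and feature.requires not in bundle:
                problems.append(f"{name} requires {feature.requires}")
        return problems


# hypothetical system echoing the examples in the text
SYSTEM = FeatureSystem(
    Feature("nasal", ("+", "-")),                         # binary
    Feature("coronal", (True,)),                          # unary: present or absent
    Feature("anterior", ("+", "-"), requires="coronal"),  # hierarchical
    Feature("height", (-1, 0, 1)),                        # scale, not [+/- high] and [+/- low]
)

print(SYSTEM.validate({"coronal": True, "anterior": "+"}))
print(SYSTEM.validate({"anterior": "+"}))
```

with a single `height` scale, the nonsense bundle {[+ high] [+ low]} cannot be written down at all, which is the appeal of scales here. whether a missing parent should be an error or just a warning is exactly the sort of decision the text argues should be left to the user.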
(p "it's been a few paragraphs without any mention of sign languages, so it is worth gesturing at how their phonological features relate to these considerations to ensure latl doesn't start its life with a modality bias. sign languages are widely understood to have phonological systems that are featural. as is the case with spoken languages, specifics of feature sets vary based on language and researcher. features can be salient to a language and form minimal pairs ie [+/- palm prone] is one way of reading the difference between the ASL fingerspelling signs for /p/ and /k/. research suggests that there is a high degree of hierarchical complexity in the phonological features of sign languages, which maps very neatly to the place of articulation problem in spoken languages. features related to handshape, such as [+/- flex] or [+/- extension] only make sense in regards to selected fingers. i have not seen any research about featural scales in sign languages, but it would be unsurprising to analogize the same issues arising from nonsense combinations of binary features") ;; TODO add some ASL images (maybe sound for some of the english bits? (p "it's been a few paragraphs without any mention of sign languages, so it is worth gesturing at how their phonological features relate to these considerations to ensure latl doesn't start its life with a modality bias. sign languages are widely understood to have phonological systems that are featural. as is the case with spoken languages, specifics of feature sets vary based on language and researcher. features can be salient to a language and form minimal pairs ie [+/- palm prone] is one way of reading the difference between the ASL fingerspelling signs for /p/ and /k/. research suggests that there is a high degree of hierarchical complexity in the phonological features of sign languages, which maps very neatly to the place of articulation problem in spoken languages. 
features related to handshape, such as [+/- flex] or [+/- extension] only make sense in regards to selected fingers. i have not seen any research about featural scales in sign languages, but it would be unsurprising to analogize the same issues arising from nonsense combinations of binary features") ;; TODO add some ASL images (maybe sound for some of the english bits?
(p "let's zoom back out to phonemes for a moment to add another wrinkle to the featural representation. the notion (unconscious or not) a speaker of a language has for what constitutes a single sound is understood to be a 'bundle' of features, but not every feature holds the same importance in every environment. by way of example, the /t/ phoneme in my dialect of english can be realized in a number of different ways depending on its location. it can be aspirated [tʰɑk] with [+ spread glottis] (or [+ delayed onset] if you prefer an auditory approach) in 'tock', without aspiration [stɑk] [- spread glottis] in 'stock', or as a flap [ˈbʌ.ɾək] in 'buttock'. this flap differs from the others at least in having [+ sonorant] and [+ voice], but retaining [coronal] [+ anterior]. yet, if i heard *[ɾɑk] in isolation, i would assume the speaker was referring to a stone or a genre of music. this situation is called allophony and latl must maintain a way to treat phonemes like /t/ as salient bundles of features distinct from the more discrete phones [tʰ], [t], [ɾ] whose features are more specified. once again, we see a similar situation with regards the ASL phoneme, /e handshape/ which has allophonic representations [+ open aperture] (the unmarked /e/ familiar in the fingerspelled alphabet) and [- open aperture] in certain environments") (p "let's zoom back out to phonemes for a moment to add another wrinkle to the featural representation. the notion (unconscious or not) a speaker of a language has for what constitutes a single sound is understood to be a 'bundle' of features, but not every feature holds the same importance in every environment. by way of example, the /t/ phoneme in my dialect of english can be realized in a number of different ways depending on its location. 
it can be aspirated [tʰɑk] with [+ spread glottis] (or [+ delayed onset] if you prefer an auditory approach) in 'tock', without aspiration [stɑk] [- spread glottis] in 'stock', or as a flap [ˈbʌ.ɾək] in 'buttock'. this flap differs from the others at least in having [+ sonorant] and [+ voice], while retaining [coronal] and [+ anterior]. yet, if i heard *[ɾɑk] in isolation, i would assume the speaker was referring to a stone or a genre of music. this situation is called allophony and latl must maintain a way to treat phonemes like /t/ as salient bundles of features distinct from the more discrete phones [tʰ], [t], [ɾ] (or [ʔ] from the earlier example) whose features are more specified. once again, we see a similar situation with regard to the ASL phoneme /e handshape/, which has allophonic representations [+ open aperture] (the unmarked /e/ familiar in the fingerspelled alphabet) and [- open aperture] in certain environments")
(p "warning! that [ɾ] in my dialect of english is an allophone of two different phonemes! the realization of the words /bæt.ər/ and /bæd.ər/ ('batter' and 'badder') is the same: [bæɾ.ɚ]. this 'under-specification' is not unique to my dialect of english, and some linguists propose an archiphoneme /D/, a kind of set of /t/ and /d/, to account for this. in this view 'batter' and 'badder' are orthographically distinct, but phonemically both /bæD.ər/. is this 'really' what is going on? i'm not qualified to say, but i am confident that latl can be made to handle this situation without straining our abstractions too much") (p "warning! that [ɾ] in my dialect of english is an allophone of two different phonemes! the realization of the words /bæt.ər/ and /bæd.ər/ ('batter' and 'badder') is the same: [bæɾ.ɚ]. this 'under-specification' is not unique to my dialect of english, and some linguists propose an archiphoneme /D/, a kind of set of /t/ and /d/, to account for this. in this view 'batter' and 'badder' are orthographically distinct, but phonemically both /bæD.ər/. is this 'really' what is going on? i'm not qualified to say, but i am confident that latl can be made to handle this situation without straining our abstractions too much")
(p "to recap thus far, we have phonemes, which for the purpose of latl are bundles of features of some value. features may be defined by the user of latl into feature systems, whereby they are usually but not always binary and may each have a dependency on another feature in the system. phonemes may have features of varying saliency allowing for allophony. these allophones are phones whose features are slightly different but retain the salient features of their phoneme, whether that phoneme is specified or an underspecified archiphoneme that could represent multiple phonemes. as an additional item, it is helpful to have a shorthand to refer to phonemes and their allophones, ie "(code "/t/") ", "(code "[tʰ]") ", " (code "[t]") ", and " (code "[ɾ]") " or " (code "/D/")) (p "to recap thus far, we have phonemes, which for the purpose of latl are bundles of features of some value. features may be defined by the user of latl into feature systems, whereby they are usually but not always binary and may each have a dependency on another feature in the system. phonemes may have features of varying saliency allowing for allophony. these allophones are phones whose features are slightly different but retain the salient features of their phoneme, whether that phoneme is specified or an underspecified archiphoneme that could represent multiple phonemes. as an additional item, it is helpful to have a shorthand to refer to phonemes and their allophones, ie "(code "/t/") ", "(code "[tʰ]") ", " (code "[t]") ", and " (code "[ɾ]") " or " (code "/D/"))
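the recap can be sketched as data. what follows is a rough python model, assuming feature bundles are plain dicts: a phoneme carries salient features that every allophone must retain, default features that an allophone may override, and a table of allophones keyed by shorthand. an archiphoneme is computed as whatever its member phonemes agree on. the class, the function names, and the specific feature values are hypothetical and are not latl's design.

```python
from dataclasses import dataclass, field


@dataclass
class Phoneme:
    label: str                                      # shorthand, e.g. "/t/"
    salient: dict                                   # every allophone must keep these
    default: dict = field(default_factory=dict)     # an allophone may override these
    allophones: dict = field(default_factory=dict)  # phone label -> full feature bundle

    def check(self):
        """Labels of allophones that fail to retain the salient features."""
        return [
            label
            for label, phone in self.allophones.items()
            if any(phone.get(k) != v for k, v in self.salient.items())
        ]


def archiphoneme(*members):
    """Underspecified bundle: only the features on which every member agrees."""
    bundles = [{**m.salient, **m.default} for m in members]
    first, rest = bundles[0], bundles[1:]
    return {k: v for k, v in first.items() if all(b.get(k) == v for b in rest)}


def phonemes_of(phone_label, inventory):
    """Every phoneme in the inventory that lists this phone as an allophone."""
    return [p.label for p in inventory if phone_label in p.allophones]


# illustrative values only, loosely following the /t/ and /d/ discussion
PLACE = {"coronal": True, "anterior": "+"}
FLAP = {**PLACE, "sonorant": "+", "voice": "+"}

T = Phoneme("/t/", PLACE, {"voice": "-", "sonorant": "-"}, {
    "[tʰ]": {**PLACE, "voice": "-", "sonorant": "-", "spread glottis": "+"},
    "[t]": {**PLACE, "voice": "-", "sonorant": "-", "spread glottis": "-"},
    "[ɾ]": FLAP,
})
D = Phoneme("/d/", PLACE, {"voice": "+", "sonorant": "-"}, {
    "[d]": {**PLACE, "voice": "+", "sonorant": "-"},
    "[ɾ]": FLAP,
})

print(phonemes_of("[ɾ]", [T, D]))  # the flap belongs to both phonemes
print(archiphoneme(T, D))          # /D/ leaves voicing unspecified
```

splitting features into `salient` and `default` is just one way to express 'varying saliency'; a weight per feature, or rules conditioned on environment, would be equally defensible readings of the prose.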
(p "an EBNF grammar (because grammars are fun!) of this relationship might be ") (p "an EBNF grammar (because grammars are fun!) of this relationship might be ")
@ -66,15 +78,18 @@
(p "before moving on")) (p "before moving on"))
;; 1.3 morphosyntax or morphemes? ;; 1.3 morphosyntax or morphemes?
(section (section
((id "")) ((id "morphosyntax"))
) (hgroup
(h3 "morphosyntax")
(p (em "where meaning and phonology and time start getting funky"))))
(p "")
) )
;; section 2. okay but conlangers ;; section 2. okay but conlangers
(section (section
((id "what-conlangers-do")) ((id "what-conlangers-do"))
(hgroup (hgroup
(h2 "") (h2 "what conlangers do")
(p (em ""))) (p (em "moving from a pile of language stuff to a pile of problems to solve")))
(p "") (p "")
(section (section
(hgroup (hgroup