The Sounds and Sound Patterns of Language | Learn Basic English

Vowels
For vowels, a different set of terms is used.

high-mid-low: height of the tongue in the mouth
front-central-back: frontness or backness of the tongue in the mouth
rounded-unrounded: the state of the lips. In English, as in many languages, this is predictable: rounded for high back and mid back vowels, unrounded for other vowels.
tense-lax: roughly, the degree of tension in the tongue

The terms refer, loosely speaking, to the location of the main tongue constriction within the mouth.

		Front	Central	Back
High	Tense	i		u
	Lax	I		U
Mid	Tense	e	ə	o
	Lax	ε	Λ	ɔː

schwa

vowel sounds

i	see	[si]
	seat	[sit]
	diva	[divə]
I	pin	[pIn]
e	say	[se]
	plain	[plen]
	take	[tek]
ε	let	[lεt]
	ten	[tεn]
æ	hat	[hæt]
	plaid	[plæd]
	laugh	[læf]
a	hot	[hat]
	papa	[papə]
ɔː	saw	[sɔː]
	caught	[kɔːt]
o	sew	[so]
	roam	[rom]
	home	[hom]
U	put	[pUt]
	took	[tUk]
u	ooze	[uz]
	use	[yuz]
	bloom	[blum]
	fume	[fyum]

Λ, while slightly lower, is extremely similar to ə. Λ is the stressed vowel in "cup", while ə is the unstressed (second) vowel in "papa".

Λ	up	[Λp]
ə	sofa	[sofə]
	attack	[ətæk]

In addition to these simple vowels, English has several diphthongs (i.e. vowel sounds that essentially combine a vowel with a glide or semi-vowel in a single unit). These are written, therefore, with two phonetic symbols, even if they can (in the case of "long i") be written with one symbol in English spelling.

ay	tie	[tay]
	sigh	[say]
	my	[may]
	mine	[mayn]
aw	cow	[kaw]
	bough	[baw]
	cloud	[klawd]
oy	boy	[boy]
	coin	[koyn]

(It should be noted here that, in most dialects of English, all of the tense vowels are actually diphthongs. For example, say, which we have represented above as [se], is actually pronounced [sey] by most speakers.)

Transcribing English

There are lots of things to be careful about when doing phonetic transcription. Most important is to pay attention to the sounds and not be distracted by the spelling. English spelling is not designed to faithfully represent the sounds of words and is frequently quite misleading in this respect, so it's best to try to ignore it. A single letter, or combination of letters, can stand for more than one pronunciation. For example, "ng" in English spelling can represent two different pronunciations.

Just a velar nasal [ŋ]

singer, hangar
Here "ng" is a digraph, like "ch"

A velar nasal [ŋ] followed by [g]

finger, anger
Here the two letters represent two sounds, like "nk" in thinker

These have to be distinguished in a correct transcription, even though the spellings are the same -- that's a defect of English orthography.

"finger"

"singer"

"think"

Similarly, "th" is ambiguous.

Voiceless fricative [θ] in thing, ether, thigh
Voiced fricative [ð] in this, either, thy

And vowels especially are spelled chaotically -- but in phonetic transcription a particular vowel sound is always written the same way. Some examples:

sound	spelling
[i]	fee, tea, be, key, thief, Leigh
[e]	say, great, made, prey, Mae
[u]	do, food, new, sue, soup, rude
diphthong [ay]	sigh, I, eye, my, hide, lie
sequence of sounds [si]	beginning of word: see, sea, senile, seize, scenic, siege, ceiling, cedar, cease; end of word: juicy, glossy, sexy

Phonology: the structure of sound

Recall the basic distinction mentioned earlier.

From The Speech Chain

Phonemes

The phonological elements of a language are the basic, distinctive sounds, also called phonemes. In English, these are the following (for a dialect of Standard American English).

consonants: p, t, k, b, d, g, č, ǰ, f, θ, s, š, h, v, ð, z, ž, m, n, ŋ, l, r, w, y
vowels: i, u, I, U, e, o, ε, ə, ɔː, æ, a, ay, aw, oy

These sounds are said to be "distinctive" because they can be used to make contrasts between different words. This can be illustrated for the stops, using minimal pairs (words that differ in exactly one sound).

pill
till
kill

bill
dill
gill
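
The minimal-pair test can be sketched in a few lines of Python. This is only an illustration: the function name `is_minimal_pair` and the choice to represent each word as a list of phoneme symbols are assumptions made for the example, not part of the analysis itself.

```python
def is_minimal_pair(word_a, word_b):
    """Return True if the two transcriptions differ in exactly one sound."""
    if len(word_a) != len(word_b):
        return False
    differences = sum(1 for a, b in zip(word_a, word_b) if a != b)
    return differences == 1


# Hypothetical transcriptions, one symbol per phoneme.
pill = ["p", "I", "l"]
till = ["t", "I", "l"]
bill = ["b", "I", "l"]
spill = ["s", "p", "I", "l"]

print(is_minimal_pair(pill, till))   # True
print(is_minimal_pair(pill, bill))   # True
print(is_minimal_pair(pill, spill))  # False: the words differ in length
```

Comparing transcriptions rather than spellings matters here, since two spellings can differ in several letters while the words differ in only one sound.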

And for the vowels (We can't get an exact minimal set for the entire range of vowels in the context [h_d], so in some cases the initial consonant also differs. For each individual pair of vowels, however, we could come up with a minimal pair.):

heed
who'd
hid
hood
aid
ode
head
HUD
awed
had
odd
hide
how'd
Boyd

And for the nasals:
rum
run
rung

In English, the velar nasal [ŋ] can't occur at the beginning of a word -- cf. map, nap, *ngap -- which will lead us to the next issue, the way these elements are organized into words.
But first, note that a basic way in which languages differ is their inventory of sounds, or phonemes. For example:

German has the voiceless velar fricative [x], as in Bach "creek". English has voiceless fricatives such as [s] and velars such as [k], but it doesn't have a single phoneme that has both of these properties.
German also has the high front rounded vowel [ü], as in kühn "bold". Again, English has high front [i] and rounded [u], but these properties are not combined in one vowel.
English [θ] sets it apart from many languages, including German and French. They have several voiceless fricatives, but not the interdental.

When you learn a new language, one of the things you have to do is learn the "list" or inventory of sounds. That's what children have to do also, when learning their native language.

Syllables

[Figure: syllable structure of "kitten", labeled with onset, nucleus, and coda]

Sonority

Languages tend to arrange their syllables so that the least sonorous sounds are restricted to the margins of the syllable -- the onset in the simplest case -- and the most sonorous sounds occur in the center of the syllable -- most often a vowel. Here are some typical English syllables that illustrate this pattern.

"soon"

"blend"

And in "pretending" each syllable corresponds to a peak in sonority.

As a consequence of this sonority requirement, an English word such as film is one syllable:

But if we try to reverse the last two consonants, the hypothetical word fiml comes out as two syllables, since [l] is a new peak, higher in sonority than the preceding nasal. (This new word would end just like pummel.)

Similarly, if we change the [l] in film to an obstruent such as [z] in hypothetical fizm, once again we end up with a new syllable. (It would rhyme with prism.)
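
The idea that each syllable corresponds to a sonority peak can be sketched in Python. Everything numeric here is an assumption for illustration: the five-level scale (obstruent < nasal < liquid < glide < vowel), the small symbol table, and the function name `count_sonority_peaks` are not taken from the text, and the transcriptions are simplified.

```python
# Assumed sonority scale (higher = more sonorous); the numbers are illustrative.
SONORITY_BY_CLASS = {"obstruent": 1, "nasal": 2, "liquid": 3, "glide": 4, "vowel": 5}

# Assumed classification of the few symbols used in the examples below.
SOUND_CLASS = {}
for symbols, sound_class in [
    ("p t k b d g f s z", "obstruent"),
    ("m n ŋ", "nasal"),
    ("l r", "liquid"),
    ("w y", "glide"),
    ("i I e ε u", "vowel"),
]:
    for symbol in symbols.split():
        SOUND_CLASS[symbol] = sound_class


def count_sonority_peaks(segments):
    """Count segments that are more sonorous than both of their neighbors."""
    values = [SONORITY_BY_CLASS[SOUND_CLASS[s]] for s in segments]
    padded = [0] + values + [0]  # treat the word edges as silence
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i - 1] < padded[i] > padded[i + 1]
    )


words = {
    "film": ["f", "I", "l", "m"],
    "fiml": ["f", "I", "m", "l"],
    "fizm": ["f", "I", "z", "m"],
    "pretending": ["p", "r", "I", "t", "ε", "n", "d", "I", "ŋ"],
}
for spelling, segments in words.items():
    print(spelling, count_sonority_peaks(segments))
# film 1
# fiml 2
# fizm 2
# pretending 3
```

Under these assumptions, reversing or replacing the final consonants of film creates a second peak, which matches the extra syllable described above.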

These syllabifications aren't something we need to learn for each word: they're a general property of the language. That's why we know how these hypothetical words would be pronounced.
In these last two words, the consonant serves as the sonority peak at the end of the word. The consonant is syllabic, serving as the nucleus in the absence of a vowel. English permits nasals and liquids to serve in this way, at least in unstressed syllables.
syllabic [m]: prism, bottom, sump'm (for "something"), cap'm (for "captain")
syllabic [n]: hidden, button, kitten, risen
syllabic [l]: bottle, little, towel
syllabic [r]: swimmer, higher, butter
For [r], the consonant can function as a vowel even in a stressed syllable.
bird, fur, word
In some dialects, such as Standard British, Boston, and Coastal Southern US, any [r] in the rhyme of a syllable (whether nucleus or coda) loses its r-ness and becomes a schwa-like vowel. These are called "r-less" dialects.

actual words with obstruent + liquid (two steps)	brick, true, free, crab; play, blue, flea, glib
possible words with obstruent + liquid	blick, clee
impossible words with obstruent + nasal (just one step)	*bnick, *fnee, *gmue, *dmay
historical loss of initial consonant in obstruent + nasal (letter now silent)	knee, knight, gnat, gnaw
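
The "steps" in the table can be read as distances on a sonority scale. The Python sketch below assumes a scale in which obstruents, nasals, and liquids are one step apart, and checks whether a two-consonant onset rises by at least two steps. The scale values, the symbol table, and the names `sonority_rise` and `passes_onset_check` are assumptions for illustration. The check is at best a necessary condition, and [s] is deliberately left out of the table.

```python
# Assumed sonority values: obstruent -> nasal is one step, obstruent -> liquid is two.
SONORITY = {"obstruent": 1, "nasal": 2, "liquid": 3}

# Assumed classification of a few English onset consonants.
SOUND_CLASS = {
    "p": "obstruent", "b": "obstruent", "t": "obstruent", "d": "obstruent",
    "k": "obstruent", "g": "obstruent", "f": "obstruent",
    "m": "nasal", "n": "nasal",
    "l": "liquid", "r": "liquid",
}


def sonority_rise(first, second):
    """Number of steps the second consonant sits above the first."""
    return SONORITY[SOUND_CLASS[second]] - SONORITY[SOUND_CLASS[first]]


def passes_onset_check(first, second, minimum_rise=2):
    """True if the cluster rises in sonority by at least `minimum_rise` steps."""
    return sonority_rise(first, second) >= minimum_rise


print(passes_onset_check("b", "r"))  # True  (brick)
print(passes_onset_check("b", "l"))  # True  (blick: possible but unattested)
print(passes_onset_check("b", "n"))  # False (*bnick)
print(passes_onset_check("k", "n"))  # False (knee: the k is now silent)
```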

snow (cf. no)

stop (cf. top)

spray (cf. pray)

This is a special property of [s] and no other obstruent in English. Essentially, it's because [s] is a perceptually salient sound with loud fricative noise: it doesn't depend in the normal way on syllable structure. Many other languages give similar special treatment to [s] and related sounds; in German (and Yiddish), for example, it's the (alveo)palatal fricative, as in Schmutz "dirt."

Once again, syllable structure is a way in which languages differ.

Hawaiian, for example, doesn't allow any coda consonants at all, and allows a maximum of one consonant in the onset. This means that borrowed words get a lot of extra vowels, to create new syllables of the proper type.

ink > 'înika
Norman > Nolemana

Polish, on the other hand, allows more consonants at the beginning or end of a word than English does. This is why some Polish names are hard for English speakers to pronounce, such as Gdansk or Zbigniew Brzezinski.

bzdura	"nonsense"
babsk	"witch"
grzbiet [gzhbyet]	"back"
marnotrawstw [-fstf]	"of wastes"

A language learner, when exposed to lots of examples of words and syllables in a new language, comes to understand what structures are possible in that language by observing the attested patterns.

Allophones

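A phoneme is written between /slashes/; its allophones, the variant pronunciations it takes in particular contexts, are written in square brackets.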

the phoneme /p/ becomes:	allophone [p]	immediately following [s]
the phoneme /p/ becomes:	allophone [pʰ]	at the beginning of the word

But the same generalization holds not just for /p/ but for the other voiceless stops, /t/ and /k/. Compare these word pairs:

top~stop, take~stake, tie~sty, etc.
kin~skin, cope~scope, can~scan, etc.

So more accurately, there's a single general statement that covers all these cases, stated in terms of natural classes.

voiceless stops are:	unaspirated	immediately following [s]
voiceless stops are:	aspirated	at the beginning of the word
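
The general statement can be written as a small rule in Python. This is a sketch under stated assumptions: it covers only the two contexts listed above, it takes a one-syllable word as a list of phoneme symbols, it marks aspiration with a superscript ʰ, and the function name `apply_aspiration` is hypothetical.

```python
# The natural class the rule refers to.
VOICELESS_STOPS = {"p", "t", "k"}


def apply_aspiration(phonemes):
    """Return the phonetic form of a one-syllable word given as phoneme symbols."""
    phones = list(phonemes)
    # A voiceless stop at the very beginning of the word is aspirated.
    if phones and phones[0] in VOICELESS_STOPS:
        phones[0] += "ʰ"
    # A stop immediately after [s] is not word-initial, so it stays unaspirated.
    return phones


print(apply_aspiration(["t", "a", "p"]))       # ['tʰ', 'a', 'p']
print(apply_aspiration(["s", "t", "a", "p"]))  # ['s', 't', 'a', 'p']
print(apply_aspiration(["k", "I", "n"]))       # ['kʰ', 'I', 'n']
```

Because the rule tests membership in the class of voiceless stops rather than naming /p/, /t/, and /k/ separately, one statement covers all three.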

The aspirated and unaspirated versions of the voiceless stops are in complementary distribution: each occurs in its own context, which does not overlap with the contexts of the other. The rule stated here assumes words of one syllable only. The full statement of where aspiration occurs in English is more complex: voiceless stops are aspirated when they occur syllable-initially and are followed by a stressed vowel (rápid, rapʰídity), as well as word-initially regardless of stress (pʰotʰáto). At the beginning of a word, a preceding /s/ prevents the stop from being syllable- or word-initial. If related words (containing the same morpheme, or meaningful element) have different stresses, then we often find alternations in whether the same underlying sound, such as /t/, is pronounced phonetically as plain [t] or aspirated [tʰ].

rápid [p]	rapídity [pʰ]
authéntic [t]	authentícity [tʰ]
récord [k]	recórd [kʰ]

This process is completely unconscious for most speakers, and often quite hard to unlearn. English speakers who learn a language like French or Spanish, in which all voiceless stops are unaspirated, typically impose aspiration according to their native rule; but that's wrong for these languages, and sounds foreign. Similarly, a French or Spanish speaker learning English will typically fail to produce aspiration in the right places; this is part of what it means to have a foreign accent. Aspiration in English is a small example of what phonological knowledge consists of:

it's learned unconsciously by children imitating (quite accurately!) the details of the language around them
it's systematic, applying to all words with voiceless stops, not just some random selection
it's defined in terms of a natural class (here "voiceless stops") rather than some arbitrary set of three consonants