I have written several blog posts with the tag ‘lost in autosubtitling’, most recently three days ago, so you may think I have a dim view of technological approaches to language. But sometimes technology gets it right, even when humans have made the mistake in the first place.
Yesterday morning I read a Facebook post in which someone complained about the “peroquialism” in a certain book sometimes considered an Australian classic. My first thought was that it was related to colloquialism (colloquial meaning “characteristic of or appropriate to ordinary or familiar conversation rather than formal speech or writing”), but the lack of a first l made that unlikely. (All the speech-related words have loqu- or loc-, from Latin loquī to speak.) When I searched for it, a well-known search engine suggested “Did you mean: parochialism” – that is, “excessive narrowness of interests or view”.
So how does a well-known search engine know that people searching for “peroquialism” probably want parochialism? I don’t know; I don’t work for a well-known search engine company. “Peroquialism” isn’t an established variant or even a common mistake. The well-known search engine found two instances, both from Australia. One is in a submission from a vocational education and training college in Melbourne to an Australian government discussion paper on the “Quality of assessment in vocational education and training”. The other is in a report by a committee of the Western Australian parliament into “the Governance of Western Australia’s Water Resources”. Both of those are prepared, formal, written sources. Peroquial gets 376 results, though, divided between travel reviews in Spanish (“la iglesia peroquial”), people referring to “a peroquial school” in English, and dictionary/spell checker sites.
The first two meanings of parochial are both neutral: “1. of, relating to, or financially supported by one or more church parishes: parochial churches in Great Britain. 2. of or relating to parochial schools or the education they provide.” I would always refer to “parish churches” and “parish councils”, but the Church of England officially has “parochial church councils”. In my experience, “parochial schools” are more associated with the Roman Catholic church and often bear the same name as the church; even then I would say “parish school”, or even “Roman Catholic school”.
Some time ago I downloaded a pdf copy of Metaphors we live by, by George Lakoff and Mark Johnson. Yesterday afternoon I read the first part of it, and it was very quickly obvious that this copy is not authorised by the publisher, being riddled with typesetting errors. Relevantly, in the preface, Lakoff and Johnson write:
When we first met, in early January 1979, we found that we shared, also, a sense that the dominant views on meaning in Western philosophy and linguistics are inadequate—that “meaning” in these traditions has very little to do with what people find ineaningfrrl in their lives.
What is “ineaningfrrl”? The well-known search engine immediately suggested ‘Did you mean: meaningful?’ So how did “ineaningfrrl” come about? I guess that the creator of the pdf scanned the text of the book and pasted it into a word processing document, then created the pdf from the word processing document without checking anything. The well-known search engine returns about 50 results for “ineaningfrrl”, all of them relating to this unauthorised pdf. (One of these is someone asking on a language-related forum “What does this word mean?” and not getting an answer. If they have the technological skills to post a question on a language-related forum, then they have the technological skills to search the internet, as I did.)
The original clearly says meaningful. Software that scans printed text often does not cope well with italics. Here, m has been rendered as in, and u as rr. Why didn’t I spot that? Probably because it was so unexpected. Is this the fault of the scanning software or of the human using it? Probably a bit of both. But the well-known search engine found the right answer even though, similarly, “ineaningfrrl” is not a recognised variant or even a common mistake.
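For the curious, the two misreadings can be undone mechanically. What follows is only a small Python sketch of the idea, not a claim about how any search engine or scanning program actually works. The confusion table, the tiny word list and the function names (candidates, repair) are all my own assumptions for illustration. The sketch tries every combination of reading “in” as m and “rr” as u, and keeps the results that are real words.

```python
from itertools import product

# Assumption: an illustrative confusion table built from the two
# misreadings described in the post (m read as "in", u read as "rr").
OCR_CONFUSIONS = {"in": "m", "rr": "u"}

# Assumption: a stand-in word list; a real checker would use a full dictionary.
KNOWN_WORDS = {"meaning", "meaningful", "parochialism"}


def candidates(word):
    """Yield every spelling obtained by optionally undoing each confusion."""
    # Break the word into pieces. A piece that matches a known confusion
    # has two possible readings (as scanned, or corrected); any other
    # character has only one.
    pieces = []
    i = 0
    while i < len(word):
        for wrong, right in OCR_CONFUSIONS.items():
            if word.startswith(wrong, i):
                pieces.append((wrong, right))
                i += len(wrong)
                break
        else:
            pieces.append((word[i],))
            i += 1
    # Try every combination of readings across all pieces.
    for choice in product(*pieces):
        yield "".join(choice)


def repair(word):
    """Return the candidate spellings that appear in the word list."""
    return sorted(c for c in candidates(word) if c in KNOWN_WORDS)


print(repair("ineaningfrrl"))  # ['meaningful']
```

Simply replacing every “in” with m would not do, because the genuine “in” in the middle of the word would turn it into “meanmgful”. That is why the sketch tries each combination and checks it against the word list.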
So, two conclusions: technological approaches sometimes get language right (possibly even usually get language right, to the point that we only notice when they get it wrong), and don’t trust unauthorised pdfs available on the internet.