1. Lexicographers Should Rethink their Work in the Digital Era
Printed dictionaries have built a genuine identity over the years. Lexicographers work for renowned publishers according to specific rules and processes; distribution channels are well-organized and efficient at delivering to educational or public markets. The emergence of new actors, exclusively focused on the Web, is a major upheaval as they deliver large corpora to a worldwide audience. Those Pure Players are now dominating the online dictionary market not only in terms of audience but also by establishing their own brands, independent of existing print brands.
These new actors bring their own vision of what an online dictionary should be. This presents a great opportunity for the industry to rethink the way dictionaries are written and published, inspired by the distinctive strengths of the Internet as a medium, which call for clarity of information, ease of use of the service and, above all, intrinsic value of the linguistic (i.e. lexicographic) data.
Our experience, built through day-to-day management of several major free online dictionary websites, demonstrates the strong draw of dictionary content. Since dictionary websites encompass a very broad spectrum of the language and make it available for free on the Internet, users discover online dictionaries by very diverse means. Their distinct paths to a dictionary reflect their different interests in the content, and also their different expectations for the content delivered.
Making dictionary data amenable to favourable placement in search engines, for searches made in many languages, requires close involvement of lexicographers. These lexicographers must adapt to a process of creating entries for dynamic display on screen in addition to static display in print, understand the impact of Search Engine Optimization (SEO) on entry structure, integrate a rich network of hyperlinks, and make use of non-textual media to enrich their lexical content. Lexicographers are in the spotlight of the digital paradigm!
The quality of the content and the publisher’s care over its data play a key role in building user loyalty and depth of visit on the Website. On average, in a language learning context, we observe that visits last between 5 and 7 pages, which gives the publisher the opportunity to stay in contact with its users over several pages. The question is: to do what? For the moment, most dictionary websites are dead ends: a user comes in for one or several definitions and leaves, although his or her needs or interests may run much deeper. The user may require course books, vocabulary lists, exercises for learners, novels, reference content, etc. Affiliation models make it possible to offer not only the publisher’s own content but also complementary content, products or services from partners. We are currently testing with a partner, successfully so far, an up-selling model based on free dictionary entries. Dictionary content is not only an effective draw but also acts as a user qualification filter for targeted up-selling. The dictionary is an intermediary between a query and a targeted product.
Let’s detail the opportunities offered by the online dictionary market in three areas:
- Search Engine Optimization (SEO): why dictionary content is a marvellous resource to answer a wide range of queries in search tools such as Google, Bing, Yandex or Baidu,
- Reaching local markets worldwide with bilingual content,
- User Generated Content: an unmissable resource.
2. Why Should you Care about SEO?
A search on Google.com for 'dictionary' yields nearly two hundred thousand results, of which more than the first hundred (it is difficult to assess accurately beyond that point) provide free dictionary content. Now, if one amends this query ('english dictionary', 'online dictionary'), the result lists change, promoting websites in a different order. The algorithmic decisions of a search engine greatly affect the traffic of dictionary sites. So dictionaries have to do with SEO and SEO has to do with lexicography.
2.1 A Dictionary is Applicable to a Broad Range of Interests
Let’s take the example of a monolingual English dictionary of approximately 40,000 entries. In a month, users reach the Website by querying for more than 150,000 keywords in search engines, varying from very generic – 'dictionary' for example – to very specific phrases.
By comparison, a Premium website with 250 free pages that we also consider is reached through 2,500 keywords per month.
Figure 1 – Interest for entries vs. interest for brand
Figure 1 illustrates the search engine traffic to a brand new dictionary website over time, broken down into branded queries (the dictionary’s or publisher’s name) and unbranded queries ('word', 'definition of word', 'is word plural?', etc.).
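A breakdown of this kind can be approximated from a search query report. The following is a minimal sketch under stated assumptions, not the method used to produce Figure 1: the function names, the brand term 'acme' and the visit counts are all hypothetical, and a query is treated as branded simply when it contains one of the brand terms.

```python
def is_branded(query, brand_terms):
    """Return True when the query mentions any brand term (case-insensitive)."""
    lowered = query.lower()
    return any(term.lower() in lowered for term in brand_terms)


def split_traffic(visits_by_query, brand_terms):
    """Sum visits separately for branded and unbranded queries."""
    branded = sum(
        visits
        for query, visits in visits_by_query.items()
        if is_branded(query, brand_terms)
    )
    total = sum(visits_by_query.values())
    return {"branded": branded, "unbranded": total - branded}


# Hypothetical monthly report: query -> number of visits.
sample_report = {
    "acme dictionary": 120,
    "Acme": 30,
    "definition of serendipity": 50,
    "is data plural?": 20,
}

breakdown = split_traffic(sample_report, brand_terms=["acme"])
print(breakdown)
```

Substring matching is a deliberately crude rule; a real report would probably need a curated list of brand spellings and misspellings.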
Search engines reveal the nature of a user’s interest in the information: users might be looking for a definition, a pronunciation, a linguistic particularity, etc., and they express their need to the engines in their own way. There is no standard way to search: two different people are likely to query a search engine differently to get to the same entry. In the benchmark we use here, each dictionary page is reached, on average, through three different search queries per month. This is a key difference from print, where the user is led to the content only by alphabetical order. The way users search for content must be taken into account when writing and shaping the dictionary, and this process has nothing to do with the print paradigm.
Figure 2 illustrates an SEO phenomenon: users access the dictionary through a very wide range of queries, and 60% of the incoming visits come from 99% of the keywords. This is the familiar “Long Tail” effect. In the figure below only the 2,000 keywords bringing the most traffic per month are represented.
Figure 2 – Dictionary content is open to many different search queries
Put differently, most of the traffic comes from words which, individually, account for a very small share of the usage. This argues for a very large number of entries in the product, so that it can answer the widest variety of searches. An online dictionary is not a physical object that must stay at a manageable size. It is therefore the lexicographers’ responsibility to design a product where users find their way easily between frequent and infrequent words, between different topical sets, etc. The consistency of the product is driven first by the data, not by the output medium.
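The long-tail reading of Figure 2 can be checked with simple arithmetic on a keyword report. The following is a minimal sketch, not the tooling behind the benchmark: the function name `long_tail_share`, the `head_fraction` parameter and the sample figures are hypothetical, chosen so that the top 1% of keywords carry 40% of the visits and the remaining 99% carry 60%.

```python
def long_tail_share(visits_by_keyword, head_fraction=0.01):
    """Share of visits coming from keywords outside the top `head_fraction`.

    Keywords are ranked by visits; the top `head_fraction` of them form the
    "head", and everything else is the long tail.
    """
    counts = sorted(visits_by_keyword.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Always keep at least one keyword in the head.
    head_size = max(1, round(len(counts) * head_fraction))
    head_visits = sum(counts[:head_size])
    return (total - head_visits) / total


# Hypothetical report: 2 heavy keywords and 198 keywords with 3 visits each.
sample_keywords = {"head_0": 198, "head_1": 198}
sample_keywords.update({f"tail_{i}": 3 for i in range(198)})

share = long_tail_share(sample_keywords, head_fraction=0.01)
print(f"Long-tail share of visits: {share:.0%}")
```

The same function can be run with other values of `head_fraction` to see how quickly the traffic share of the head keywords saturates.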
2.2 Interest in Phrases: a Job for Lexicographers, an Opportunity for Publishers
In the previous section, we demonstrated the role of unbranded queries in driving traffic to a free website. But what are those queries, in particular as applied to a dictionary?
Some of the unbranded queries very logically contain the standard keywords 'dictionary' or 'English'. However, early players in the free dictionary market have a stronghold at the top of the search engine result lists for such standard queries.
But searches for English phrases such as 'flushed with a howling success', 'see to the children’s breakfast' or 'humanitarian grounds definition' yield very different result lists, leaving room for websites with a smaller audience but with carefully edited and search engine-optimized data. In the free dictionary website benchmark index we use in this presentation, 90% of the visits generated by unbranded queries are phrase-centric rather than worded around generic dictionary keywords. However, it is important to underline that we cannot identify demand for single-word definitions (like 'flower'), since our benchmark index is outranked by too wide a margin by pure players on such queries.
The same applies to examples (users do want more examples than the limited number available in print), to synonyms and antonyms (users specifically search for 'synonym of word'), to audio pronunciations, especially for learners, and to collocations.
A lot is still to be done in those areas to both properly promote this targeted content toward search engines and also to provide users with the appropriate lexicographic guidelines to make good use of such content once they find it. This work requires specific linguistic, editorial and publishing skills which lexicographers definitely possess. But it also requires that lexicographers attain a deep understanding of how websites work and how users search, navigate, and experience the Web.
3. Local Languages and Local Markets
To learn or understand a foreign language, people need good bilingual content in which to root their own translation skills. The ability to bridge the gap between two different languages is of major interest and meets an urgent need. As is often the case, the Internet both creates this need and offers a solution to it. Let’s consider to what extent.
3.1 Native vs. English: Diversity, not Supremacy
Internet users search more and more in their native language. This statement may sound either obvious or debatable, but it can be taken as a fact based on statistical data.
Figure 3 – Comparison of volume of searches in Brazil for 'english dictionary' and 'dicionario de ingles'
Figure 3 compares (upper line) the number of times 'dicionario de ingles' has been searched in Google from Brazil with (lower line) the number of searches for 'english dictionary', also in Brazil, from 2004 to 2009. One thing appears clearly: Brazilian Internet users are searching more and more for the same information by typing it in Portuguese rather than in English.
We take here the example of the Brazilian market, but the same statement can be made, with differences in volume but not in trend, about other markets such as Japan, Italy, France or Spain. It is also the case in China, though in a more balanced way, and, interestingly, not in Germany. These facts invite us to consider each market on its own, which is exactly what publishers have done for years with printed editions.
3.2 To Grow, Provide Bilinguals!
Users are searching for bilingual content and this demand is increasing quickly.
Figure 4 – Searches growth rate in India: 'hindi english' vs. 'english dictionary' Google queries
In Figure 4, contrary to Figure 3, the line chart does not plot search volumes but the growth rate of the search queries, relative to the growth rate of a given Google category of queries. In this case, we consider the Google category Reference, which contains all the reference works. The dark middle line represents the average growth rate of searches within the Reference category. The upper line stands for the 'hindi english' query, while the lower line stands for the 'english dictionary' one. Those two queries must be read against the Reference line.
The conclusion we can draw from this graph is that demand for Hindi-English bilingual content is growing fast, while apparent demand for monolingual English content is flat. As in the previous section, this statement applies to many markets other than India.
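Reading a query against its category line amounts to comparing two growth rates. The sketch below illustrates that comparison only; the exact normalization applied by Google Insights for Search is not described here, so the definition used (query growth minus category growth over the same period), the function names and the index values are all assumptions made for illustration.

```python
def growth_rate(start, end):
    """Relative change between two index values, e.g. 0.5 for +50%."""
    if start == 0:
        raise ValueError("start value must be non-zero")
    return (end - start) / start


def relative_growth(query_start, query_end, category_start, category_end):
    """Growth of a query minus growth of its category over the same period.

    Positive: the query grows faster than the category as a whole.
    Negative: the query grows more slowly than the category.
    """
    return growth_rate(query_start, query_end) - growth_rate(
        category_start, category_end
    )


# Hypothetical index values at the start and end of a period.
bilingual = relative_growth(100, 150, category_start=1000, category_end=1100)
monolingual = relative_growth(200, 200, category_start=1000, category_end=1100)

print(f"bilingual query vs. category:   {bilingual:+.0%}")
print(f"monolingual query vs. category: {monolingual:+.0%}")
```

With these made-up figures the bilingual query sits above the category line and the monolingual query slightly below it, which is the pattern the chart is meant to reveal.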
3.3 Conclusion: A Local Approach
From the previous observations, we can assume that users search the Web for bilingual content in their native language.
Using the same growth rate comparison feature of Google Insights for Search, we can graph the following:
Figure 5 – Searches growth rate worldwide: 'espanol ingles' vs. 'spanish english' Google queries
According to the graph, in raw volumes (visible in the “Totals” panel, top right corner of the screenshot above), demand for Spanish-English bilingual content is still much higher when queried in English than when queried in Spanish (an index of 58 versus an index of 3 over the whole period 2004-2009). However, if we consider growth rates instead of volumes, demand for bilingual content queried in Spanish is growing very fast (upper line of the chart above).
The conclusion that can be drawn is that Internet strategy needs to be designed market-by-market and to take a local approach by a) providing bilingual content and b) translating the interfaces to maximize SEO efficiency. This is not exactly news: some parties are already entering local markets by these means. But it is reassuring to be able to confirm the validity of their choices. The combinations of markets and languages leave room for many players.