CS327 Intelligent Systems

Week 25: Applications of Artificial Intelligence



Intelligent Systems Application Areas

This page is meant as a general source of information for CS237 students on the variety of intelligent systems applications.
AI techniques are used in a wide range of application areas. The page is updated from time to time as students research new systems and add them to it: this makes it even more "Under Construction" than most…

Some applications of intelligent systems:
 


Expert Systems 


The range of areas in which expert systems are used is very broad. Below are some examples of commercial and research-oriented expert systems in a variety of areas.
 


Whale Watching 


Domain    Ecology, natural history

Type of task performed    Identification

Brief Description

The Whale Watching program allows a user to enter details of a whale sighting by answering a small number of questions. The system then identifies the whale s/he has seen, using photos of whales. Developed for the Canadian Dept of Environment by VVV Acquired Intelligence.

Shell or language(s)

Written using the Acquire knowledge acquisition system.

Technical details

Exists in two versions, one running over the Web, the other PC-based.
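The question-and-answer style of identification described above can be sketched as a small rule base that asks only the questions it needs and drops a candidate as soon as one answer rules it out. This is a hypothetical illustration: the questions, features and rules below are simplified choices made for this page, not the knowledge base of the real Whale Watching program or anything produced with Acquire.

```python
# Hypothetical, simplified identification rules (NOT from the real system):
# each species lists the answers a sighting must have.
WHALE_RULES = {
    "orca": {"dorsal_fin": "tall", "colouring": "black and white"},
    "humpback whale": {"dorsal_fin": "small", "long_flippers": "yes"},
    "grey whale": {"dorsal_fin": "none", "colouring": "mottled grey"},
}

QUESTIONS = {
    "dorsal_fin": "What was the dorsal fin like (tall/small/none)? ",
    "colouring": "What colour was the whale? ",
    "long_flippers": "Did it have very long flippers (yes/no)? ",
}

def identify(answer_fn):
    """Return every species whose conditions all match the sighting.

    answer_fn(feature, prompt) supplies the user's answer; each question is
    asked at most once and only if some remaining rule needs it."""
    answers = {}
    matches = []
    for species, conditions in WHALE_RULES.items():
        for feature, required in conditions.items():
            if feature not in answers:
                answers[feature] = answer_fn(feature, QUESTIONS[feature])
            if answers[feature] != required:
                break  # this species is ruled out
        else:
            matches.append(species)
    return matches

# A canned sighting stands in for an interactive user here.
sighting = {"dorsal_fin": "small", "long_flippers": "yes", "colouring": "dark grey"}
print(identify(lambda feature, prompt: sighting[feature]))  # ['humpback whale']
```

For an interactive session, `identify(lambda feature, prompt: input(prompt).strip().lower())` would put the questions to the user directly.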

Douglas Fir Cone and Seed Insects System 

A system similar to the Whale Watching program, designed to help seed orchard managers and seed/fir cone collectors.
 

Apple Orchard Advisor

Domain    Agriculture

Type of task performed    Orchard management

Brief Description

Advises on the running of apple orchards, including dealing with pests and diseases.

Shell or language

Written on the Mac in a frame-based expert system shell, itself based on the C programming language.
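To give a flavour of what "frame-based" means here, the sketch below shows frames as named slots plus an "is-a" link to a parent frame, with slot lookup falling back to the parent so that defaults are inherited. The frames and slot values are hypothetical examples chosen for illustration, not the contents of the Apple Orchard Advisor or its shell.

```python
class Frame:
    """A minimal frame: named slots plus an optional parent ("is-a" link).
    Looking up a slot falls back to the parent, giving default inheritance."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(f"{self.name} has no slot {slot!r}")

# Hypothetical frames for illustration only.
orchard_problem = Frame("orchard problem", treatment="consult an adviser")
disease = Frame("disease", parent=orchard_problem, cause="pathogen")
apple_scab = Frame("apple scab", parent=disease, affects="leaves and fruit")

print(apple_scab.get("affects"))    # own slot
print(apple_scab.get("cause"))      # inherited from "disease"
print(apple_scab.get("treatment"))  # inherited default from "orchard problem"
```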

EXTEL - Brush and Weed Control


Natural Language Processing 


Natural Language Processing is an umbrella term for a range of types of system that automate the processes involved in communicating via a human language such as English, Spanish or Chinese. In many NLP systems the representations and processes used are those we have studied in earlier weeks: knowledge is represented in formats such as predicate logic, frames, semantic networks, conceptual dependency and so on, and inferencing is carried out using the search processes we have already seen. Other types of system are being developed using neural networks, a quite different approach. For information on neural networks, used in particular for speech and handwriting recognition, try these links:

The term Language Engineering is used to describe the attempt to develop technologies to process language, for use in applications such as:

Here are some examples of NLP technologies, most of which include on-line demos:

Language Engineering is the core business of the University's IT Research Institute (ITRI): a visit to its web site will give some insights into the interesting practical projects under way there.

The easiest way to model the way we extract meaning from a stretch of language and then build up and produce a response is to adopt a convenient fiction that it consists of a series of sequential and discrete steps. In reality, these steps or sub-processes overlap with and influence each other, and indeed the architectures of certain language processing systems reflect this interactivity: for the most part, however, systems proceed one stage at a time.
 
 

A Simplified Stage Model of Natural Language Processing

Input utterance                          Output utterance
Stage 1 - Processing of raw input        Stage 8 - Text to output medium
Stage 2 - Linguistic analysis            Stage 7 - Deciding how to say it
Stage 3 - Semantic analysis              Stage 6 - Deciding what to say
Stage 4 - Understanding intentions       Stage 5 - Generating intentions
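The "convenient fiction" of sequential, discrete stages amounts to a simple pipeline in which each stage takes the output of the one before. The sketch below is a hypothetical skeleton: every stage is a placeholder that only records its own name, so it shows the control structure of the stage model and nothing of what each stage really computes.

```python
from typing import Callable

# Each stage receives the analysis so far (a dict) and returns an enriched
# copy. These placeholder stages only append their own name to a trace.
Stage = Callable[[dict], dict]

def make_stage(name: str) -> Stage:
    def stage(state: dict) -> dict:
        return {**state, "trace": state["trace"] + [name]}
    return stage

UNDERSTANDING = [make_stage(name) for name in (
    "1 processing of raw input", "2 linguistic analysis",
    "3 semantic analysis", "4 understanding intentions")]
GENERATION = [make_stage(name) for name in (
    "5 generating intentions", "6 deciding what to say",
    "7 deciding how to say it", "8 text to output medium")]

def run(utterance: str) -> dict:
    state = {"input": utterance, "trace": []}
    for stage in UNDERSTANDING + GENERATION:  # strictly one stage at a time
        state = stage(state)
    return state

print(run("Your horrible dog chased the poor paper boy this morning")["trace"])
```

An interactive architecture of the kind mentioned above would replace the single `for` loop with stages that can pass information back to earlier ones.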

Stage 1 - Processing of raw input 

Language can come to the system in the form of digital text, which is passed on fairly straightforwardly to the subsequent stages, probably after some pre-processing involving stripping out formatting information, or in spoken or hand-written form, which is much more difficult to process.
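The "stripping out formatting information" step for digital text can be very simple. The sketch below assumes, purely for illustration, that the text arrives as HTML; it uses the HTMLParser class from Python's standard library to keep only the character data and then collapses the leftover whitespace. It is deliberately naive: it would, for instance, run together words separated only by a tag such as <br>.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the character data of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def plain_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse line breaks and runs of spaces left behind by the markup.
    return " ".join("".join(parser.chunks).split())

print(plain_text("<p>Your <b>horrible</b> dog chased\n the poor paper boy</p>"))
# Your horrible dog chased the poor paper boy
```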

Speech analysis and recognition is an important discipline in itself. The input to a speech analysis system is the speech event captured as a waveform: a series of numbers representing the amplitude of the sound wave at successive points in the utterance. A speech analysis program transforms this into a more useful description by recognising the phonemes that make up the stream of speech. From there, a word recognition module, often using template matching, can work out which words the phonemes must represent. For example, ignoring the fact that words rarely occur alone in this way, let's say the input is an American pronunciation of the word "written". The waveform making up the utterance might be processed to produce a representation something like [riden] (N.B. not an exact rendering - I don't have the IPA font available, but it makes the point). The sound a Scottish person makes pronouncing the same word might be represented as something more like [Riten], with R representing a rolled "r". The point is that the actual sounds uttered for the "same" word are noticeably different. A word recogniser program would need to know that in some circumstances "r" and "R" are alternative pronunciations (allophones) of the sound that begins the word "written", and that "d", "t" or, for some speakers, a glottal stop are alternative pronunciations of the middle consonant. The program would then be able to match the "base" phonemic structure of the word against the patterns in its knowledge base and come up with "written" as a good candidate. In the American case, the word "ridden" would also be likely to be retrieved, since [riden] matches it too.
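The matching step in the "written"/"ridden" example can be sketched as follows. The lexicon stores each word's "base" phonemic form, and an allophone table lists which surface sounds count as acceptable realisations of each base phoneme. The one-letter transcriptions, the three-word lexicon and the allophone sets are hypothetical simplifications, not a real recogniser's data; "R" stands for a rolled r and "?" for a glottal stop.

```python
# Toy "base" phonemic forms, one character per segment (hypothetical).
PHONEMIC_LEXICON = {
    "written": "riten",
    "ridden": "riden",
    "rotten": "roten",
}

# Surface sounds accepted for each base phoneme. A phoneme that is not
# listed can only be realised as itself. Note the asymmetry: [d] may
# realise /t/, but [t] is not accepted for /d/.
ALLOPHONES = {
    "r": {"r", "R"},
    "t": {"t", "d", "?"},
}

def realises(surface: str, phoneme: str) -> bool:
    return surface in ALLOPHONES.get(phoneme, {phoneme})

def candidates(heard: str) -> list[str]:
    """Words whose base form could have been pronounced as `heard`."""
    return [
        word
        for word, base in PHONEMIC_LEXICON.items()
        if len(base) == len(heard)
        and all(realises(s, p) for s, p in zip(heard, base))
    ]

print(candidates("riden"))  # American: ['written', 'ridden']
print(candidates("Riten"))  # Scottish: ['written']
```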

Speech recognition is making rapid progress, driven by companies wanting speech interfaces to their telephone-accessed services such as enquiry services, booking, banking, automated surveys and so on. There is always a trade-off in speech recognition systems between language coverage and general applicability. If a system has to recognise input from the general public, the number of language items it can reliably recognise will be relatively small: some systems can recognise only yes, no and a restricted set of numbers, for instance. A telephone banking system would be a typical example. However, where the system will only be used by known users, who can train it to recognise their particular speech habits, a much larger number of items can be recognised successfully. An office dictation system would come into this category.

Speech links

An excellent interactive introduction to speech analysis is provided by the Oregon Graduate Institute's Center for Spoken Language Understanding. Highly recommended.

In many application areas - postal services, office work and medicine, for instance - the ability to automatically decipher hand-written text would be valuable. Handwriting analysis and recognition systems take as input digitised representations of hand-written texts (from a scanner, for instance, or directly input via a drawing tablet) and transform them into standard ASCII characters forming sequences of words.

Handwriting Recognition Links:

Stage 2 - Linguistic Analysis 

The sequences of words output by Stage 1 are typically analysed to create a representation that shows their morphemic and syntactic structure. For instance, if the input consisted of the sequence "Your horrible dog chased the poor paper boy this morning", some of the information we could extract would be that this is a complete English sentence consisting of a noun phrase [your horrible dog] and a verb phrase (the rest). The verb phrase consists of a verb [chased], followed by a noun phrase functioning as direct object [the poor paper boy] and an adverbial phrase indicating time [this morning]. We would know that the noun [dog] was qualified by two adjectives, a possessive [your] and an attributive [horrible], and that the verb involved a morpheme indicating simple past tense [-d]. We might not be able to give a single analysis of the compound noun [paper boy] if we have to analyse it from scratch. Is it a boy made out of paper, a boy who collects waste paper, a boy who delivers newspapers or what? Let's hope the system has access to a detailed lexicon or dictionary that can put it out of its misery on this point. "Poor" looks unproblematic at this point - it's just a simple adjective.

The types of system that carry out this stage of the analysis are parts-of-speech taggers, which identify the grammatical elements making up the word sequence, and parsers, which analyse the way the elements combine.
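A toy version of both kinds of system for the example sentence is sketched below: a tagger that simply looks each word up in a small dictionary, and a recursive-descent parser for a tiny grammar. The lexicon and grammar are hypothetical and cover only this sentence; real taggers use large dictionaries plus statistics to resolve ambiguous words, and unknown words here simply raise a KeyError. Note that the parser groups [paper boy] as a compound noun without deciding what it means, which is exactly the problem raised above.

```python
# Hypothetical lexicon covering only the example sentence.
POS_LEXICON = {
    "your": "Det", "the": "Det", "this": "Det",
    "horrible": "Adj", "poor": "Adj",
    "dog": "N", "paper": "N", "boy": "N", "morning": "N",
    "chased": "V",
}

def tag(words):
    """Parts-of-speech tagging by plain dictionary lookup."""
    return [(word, POS_LEXICON[word.lower()]) for word in words]

class ParseError(Exception):
    pass

class Parser:
    """Recursive-descent parser for the toy grammar
         S    -> NP VP
         VP   -> V NP [AdvP]
         AdvP -> NP              (e.g. "this morning")
         NP   -> Det Adj* N N*   (N N* allows compounds such as "paper boy")"""

    def __init__(self, tagged):
        self.tagged = tagged
        self.pos = 0

    def peek(self):
        return self.tagged[self.pos][1] if self.pos < len(self.tagged) else None

    def expect(self, category):
        if self.peek() != category:
            raise ParseError(f"expected {category} at word {self.pos}")
        word = self.tagged[self.pos][0]
        self.pos += 1
        return (category, word)

    def noun_phrase(self):
        parts = [self.expect("Det")]
        while self.peek() == "Adj":
            parts.append(self.expect("Adj"))
        parts.append(self.expect("N"))
        while self.peek() == "N":
            parts.append(self.expect("N"))
        return ("NP", *parts)

    def sentence(self):
        subject = self.noun_phrase()
        verb_phrase = ["VP", self.expect("V"), self.noun_phrase()]
        if self.peek() == "Det":
            verb_phrase.append(("AdvP", self.noun_phrase()))
        if self.peek() is not None:
            raise ParseError("words left over after the sentence")
        return ("S", subject, tuple(verb_phrase))

words = "Your horrible dog chased the poor paper boy this morning".split()
print(Parser(tag(words)).sentence())
```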

On-line demos of example parts-of-speech taggers and parsers are available from the University of Zurich's site.

At this stage, the analysis is good enough for a very shallow dialogue to take place, given a system that can manipulate syntactic forms. For instance, once the linguistic analysis has taken place and added the details of the "paper boy" sequence to the system's database, we could imagine a dialogue system that could take the question "Who chased the paper boy?" and return "The horrible dog chased the paper boy" or just "The dog" or even "The hearer's dog" if it is very tricky. It could manage this by matching the input question with its representation of the original sequence. It would then need a rule such as "For [Who Item1 Item2], where there is an item in the database [Item3 Item1 Item2], output [Item3 Item1 Item2] or [Item3]", which would give us the first two answers. The system's performance could then be made more impressive by adding facilities for switching between 1st and 2nd person pronouns and other forms, varying response types, keeping a dialogue history and so on.
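The "Who" rule above, together with the simplest of the suggested refinements (switching 1st and 2nd person possessives), might be sketched like this. Facts are stored as (Item3, Item1, Item2) word tuples; the database contents, the swap table and the function name are hypothetical, and the question must match the stored words exactly, which is precisely the brittleness of this shallow approach.

```python
# Facts as (Item3, Item1, Item2) = (subject, verb, object) word tuples.
DATABASE = [
    (("your", "horrible", "dog"), ("chased",), ("the", "poor", "paper", "boy")),
]

# 1st/2nd person switching: the user's "your" is the system's "my".
SWAP = {"your": "my", "my": "your"}

def answer(question, short=False, swap_person=True):
    """For [Who Item1 Item2], find [Item3 Item1 Item2] in the database and
    output [Item3 Item1 Item2], or just [Item3] if short=True."""
    words = question.rstrip("?").lower().split()
    if not words or words[0] != "who":
        return None
    rest = tuple(words[1:])
    for subj, verb, obj in DATABASE:
        if rest == verb + obj:  # the question matches [Item1 Item2]
            if swap_person:
                subj = tuple(SWAP.get(w, w) for w in subj)
            reply = subj if short else subj + verb + obj
            return " ".join(reply).capitalize()
    return None

print(answer("Who chased the poor paper boy?"))
# My horrible dog chased the poor paper boy
print(answer("Who chased the poor paper boy?", short=True))
# My horrible dog
```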

The simpler Eliza-like systems use a very crude version of this type of analysis, sometimes simply pattern matching without tagging or syntactic analysis, to obtain their results. They quickly break down when faced with complex constructions, though, and the more sophisticated systems do include parsers for better performance.
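An Eliza-style responder of the crudest kind can be sketched with regular expressions alone: no tagging, no parsing, just surface patterns and response templates. The two rules below are hypothetical examples in the spirit of such systems, not taken from any particular chatterbot. The last example in the usage shows the kind of breakdown mentioned above: the embedded "my" is echoed back unchanged because the system has no idea of the sentence's structure.

```python
import re

# Hypothetical Eliza-style rules: a surface pattern and a response template.
ELIZA_RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]

def reply(utterance: str) -> str:
    text = utterance.strip().rstrip(".!?")
    for pattern, template in ELIZA_RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return "Please go on."  # default when no pattern matches

print(reply("I feel tired today."))           # Why do you feel tired today?
print(reply("My dog chased the paper boy"))   # Tell me more about your dog.
print(reply("I feel that my dog hates me"))   # Why do you feel that my dog hates me?
```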

Links to some Eliza-like chatterbots:


Stage 3 - Semantic analysis 

The linguistic analysis outputs a grammatical description of the word sequence. This still doesn't come close to telling us what the sequence means. A human being hearing the sequence "Your horrible dog chased the poor paper boy this morning" would be able to extract many subtle aspects of its meaning, as well as the basic facts of the proposition, which are probably something like -

A simple semantic analysis of the sentence, attaching a meaning unit to each syntactic unit, would give us this much. The quality of the semantic information available is still very poor, though. It will allow the system to correctly answer the questions "Did it happen today?" (yes) and "Who owned the dog?" (the person being addressed) but not much more. What is needed in the system is extra knowledge about the meaning of the lexical items and their combination that will allow it to answer questions such as:

In other words, we need knowledge of objects and events in the real world, encoded in frames, semantic networks, conceptual dependency notation or some other formalism, which can be manipulated to allow the system to gain something more like what we human beings mean by "understanding". For instance, we need to know that "chasing" is normally something that happens at speed with an agent placed behind an object and not vice versa, that it's impossible to chase anything if you are asleep, that dogs normally have tails and that "boy" would not normally refer to a 60-year-old.
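The difference real-world knowledge makes can be shown with a very small sketch: the shallow analysis yields an event record, and a frame for "chase" holds default facts true of any chasing event. Questions are answered from the event first and from the frame second. The slot names, frame contents and helper function are hypothetical; they encode only the facts about chasing listed above.

```python
# Hypothetical world knowledge: defaults that hold for any chasing event.
CHASE_FRAME = {
    "agent_state": "awake",               # you cannot chase while asleep
    "agent_position": "behind the object",
    "typical_speed": "fast",
}

# The proposition produced by a shallow semantic analysis of the sentence.
event = {"action": "chase", "agent": "dog", "object": "paper boy",
         "owner_of_agent": "hearer", "time": "this morning"}

def ask(event, slot):
    """Answer from the event itself, else fall back on frame knowledge."""
    if slot in event:
        return event[slot]
    if event["action"] == "chase" and slot in CHASE_FRAME:
        return CHASE_FRAME[slot]
    return "unknown"

print(ask(event, "owner_of_agent"))  # hearer (from the sentence itself)
print(ask(event, "agent_state"))     # awake: "Was the dog awake?" needs the frame
```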

There are further aspects of the sentence that we as human beings have a feel for, but which are difficult to turn into rules for a system. For instance, I think we would assume the hearer has just one dog, otherwise the natural thing to say would have been "One of your horrible dogs" or something similar, but this is not watertight reasoning. Perhaps only one of several dogs is horrible and the others are loveable, or perhaps that particular dog is the current focus of interest for other reasons: in either of these cases "your horrible dog" would be natural. And what about "horrible"?  We might naively assume that this adjective conveyed some objective attribute of the dog: it is horrible in the same way that it might be hungry, ginger, smelly, patient, ill or whatever. This doesn't seem right though. The meaning most of us would extract from "horrible" in this context is simply that the speaker dislikes the dog. How do we encode this nuance for the system? We have a similar challenge with "poor" which seems to imply something like "the paper boy deserves/needs sympathy" and "the speaker feels sorry for the paper boy" at the same time.

Providing a system with this kind of knowledge is a huge but clearly necessary undertaking. For some large-scale attempts see:

Stage 4 - Understanding intentions 

Do you suspect that there's some motive other than simple transmission of information behind "Your horrible dog chased the poor paper boy this morning"? I think most of us would see it as a complaint of some sort, to which a reply such as "Who won?" or "That new dog food is working wonders" would not be appropriate. The area of linguistics that tries to work out the intentions of a speaker in a context is known as pragmatics, and in our case we would be well advised to attend to the speaker's intention before formulating a response. This aspect of pragmatics has concentrated on the notion of speech acts, and in particular has formulated rules for spotting and responding to what are known as "indirect speech acts" such as we have in our example. The idea here is that we very frequently express ourselves indirectly, not least for reasons of politeness. For instance, if we want a cup of tea, rather than say "Make me a cup of tea," which might seem to impose our will too heavily on the person we're talking to, we might say "Is the kettle on?", "This central heating can make you feel very thirsty" or "Do you still have any of that nice Earl Grey you bought at Harrods?" or some other indirect expression of our intention.

Here, it's hard to know exactly what the intention is without some context. If, for instance, the speaker has previously asked the hearer to keep the dog tied up, this could be seen as an indirect way of repeating the request (by stating the justification for it). If so, a reply such as "Sorry, I'll tie him up in the garden" would be in order.

Adding this sort of analysis would be a big step towards creating a truly human-like dialogue partner.

Stage 5 - Generating intentions 

When it comes to working in the opposite direction from language understanding, that is, when we attempt to produce language automatically, deciding which intention to express and how directly to express it is a good start. For instance, in this case, we might understand the original utterance as a complaint and plan a conciliatory answer along the lines of offering to tie the dog up. Or you might be in a bad mood and decide to tell your nosy neighbour to mind their own business. Or you might take exception to the word "horrible" and decide to tell them so. Taking the first approach, one way of expressing your goal would be to tell your neighbour that you are going to buy the dog a rope. This is an indirect way of expressing the fact that you are promising to tie him up in future. You have decided your neighbour is bright enough to spot this.

Let's take another example. Imagine an automatic hotel booking system for Brighton and Hove. The system receives a request for an hotel booking on a weekend when the Labour Party Conference is in town and there isn't a room to be had in the whole of Brighton. However, there are rooms available in Hove. The system might have a straightforward strategy of simply telling the user that no rooms are available in the requested location. Or the Tourist Board might have programmed it to suggest Hove as an alternative in such cases. The goal now is to suggest to the user that s/he should request a room in Hove. However, a straightforward "Stay in Hove instead" won't work. Perhaps the system should generate a sentence or two describing Hove's proximity to Brighton and stating that there are often rooms available there. This would be an indirect speech act (two, in fact) that would probably communicate the system's intention without making the user feel pushed around.
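The two strategies for the hotel booking system can be sketched as a single function with a switch for the Tourist Board's "suggest alternatives" policy. When that policy applies, the system states facts (proximity, availability) rather than issuing an instruction, which is the indirect approach described above. The availability figures, the neighbouring-town table and the wording are hypothetical.

```python
# Hypothetical availability data and neighbouring-town table.
AVAILABILITY = {"Brighton": 0, "Hove": 12}
NEARBY = {"Brighton": "Hove"}

def respond(town: str, suggest_alternatives: bool = True) -> str:
    if AVAILABILITY.get(town, 0) > 0:
        return f"There are rooms available in {town}."
    reply = f"I'm afraid there are no rooms available in {town} that weekend."
    alternative = NEARBY.get(town)
    if suggest_alternatives and alternative and AVAILABILITY.get(alternative, 0) > 0:
        # Indirect suggestion: two statements of fact, no instruction.
        reply += (f" {alternative} is right next to {town},"
                  f" and there are often rooms available there.")
    return reply

print(respond("Brighton"))
print(respond("Brighton", suggest_alternatives=False))
```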

Stage 6 - Deciding what to say 

The next stage involves the system selecting elements of meaning to be expressed in syntactic units, the mirror image of Stage 3. The semantic elements you might want to express could include:

Different systems have a range of mechanisms for specifying semantic content of this sort. You could probably imagine a frame-based or conceptual dependency version of this information. Have a look at ITRI's WYSIWYM system for an interesting approach to creating an interface to a specification module.

Stage 7 - Deciding how to say it 

How many ways could you think of to express the semantic units specified in Stage 6?  Let's try for twenty:
 

1. I'll have to buy him a leash tomorrow.
2. I'll have to buy a leash for him tomorrow.
3. I'll have to get him a leash tomorrow.
4. I'll have to get a leash for him tomorrow.
5. I must get him a leash tomorrow.
6. I must get a leash for him tomorrow.
7. Tomorrow, I'll have to buy him a leash.
8. Tomorrow, I'll have to buy a leash for him.
9. I will buy him a leash tomorrow. I feel I have to.
10. I will buy a leash for him tomorrow. I feel I have to.
11. I think I'd better get him a leash tomorrow.
12. I'll be getting a leash for him tomorrow.
13. I'll buy him a leash. I'll do it tomorrow.
14. I'll get a leash for him in the morning.
15. In the morning, I'll go and get a leash for him.
16. What I should do is get a leash for him. I'll do it tomorrow.
17. A leash is what I must buy. I'll get one for him tomorrow.
18. What I must buy is a leash for him. I'll buy it tomorrow.
19. What I must do is buy a leash for him. I'll get it tomorrow.
20. What I must buy for him is a leash. I'll get one tomorrow.
and so on. There are probably 100 sensible ways to express the content we want, not to mention the many possible versions that express the content but are stylistically not quite appropriate, e.g. "when the sun rises over the horizon I shall arise and go in search of a stout cord for my trusty hound…" and the like.

The relevant module of a natural language generation system will typically include a wide range of rules determining how semantic content should best be expressed given constraints of context, stylistic register, the history of the dialogue, the state of knowledge of the participants and other parameters. The output of this stage will be a textual representation of the generated text in pretty much final form, with lexical elements selected and ordered correctly, according to the system's parameters.
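A much-simplified sketch of such a module: a semantic specification of the leash example, a handful of realisation templates each tagged with a stylistic register, and a selection rule that respects two of the parameters mentioned above (register and dialogue history). The specification, templates and register labels are hypothetical, and real generators typically use grammars rather than fixed templates; the point is only how context parameters narrow the choice among many possible realisations.

```python
# Hypothetical semantic specification (Stage 6) for the leash example.
SPEC = {"agent": "speaker", "action": "buy", "object": "leash",
        "beneficiary": "dog", "time": "tomorrow", "modality": "obligation"}

# Hypothetical realisation templates (Stage 7), each tagged with a register.
TEMPLATES = [
    ("neutral", "I'll have to buy him a {object} {time}."),
    ("neutral", "I'll have to buy a {object} for him {time}."),
    ("informal", "I'd better get him a {object} {time}."),
    ("emphatic", "What I must buy is a {object} for him. I'll buy it {time}."),
]

def realise(spec: dict, register: str, history: list) -> str:
    """Pick the first template in the requested register whose sentence has
    not already been used in this dialogue, record it, and return it."""
    for reg, template in TEMPLATES:
        sentence = template.format(**spec)
        if reg == register and sentence not in history:
            history.append(sentence)
            return sentence
    raise LookupError(f"no unused template for register {register!r}")

history = []
print(realise(SPEC, "neutral", history))  # I'll have to buy him a leash tomorrow.
print(realise(SPEC, "neutral", history))  # I'll have to buy a leash for him tomorrow.
```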

Stage 8 - Text to output medium 

If text is to be the output medium, the message that has been generated can simply be displayed on a screen, or wherever it is needed, as plain text. However, some systems are now adding formatting information to the text in an attempt to add information: bulleted lists, headings and subheadings, footnotes and so on are useful tools of the human writer's trade and it seems foolish not to exploit them. The Iconoclast system concentrates on this issue.
Generating text suitable for display in hypertext format is another current research area, as is the integration of generated texts and illustrations.

The Iconoclast project
 

Text-to-speech mapping is a very important area, particularly as telephone commerce continues to expand. You will probably be familiar with BT's automatic synthesis of telephone numbers for Directory Enquiries and will have an opinion on the quality of the output. Similarly you will have heard the robotic tones generated by the text-to-speech synthesiser used by Stephen Hawking and other people with no speech production capability of their own. Mac owners can also experiment with a range of voices reading their texts in SimpleText.

There are various approaches to speech synthesis. The BT approach, judging by the output, is to splice together ready-made chunks of recorded natural speech, typically the introductory text as a single chunk, i.e. "the number you require is…" followed by a string of numbers, each a separate chunk. It would appear that there is only one version of each number. This is what makes for a slightly unnatural effect, as the rising and falling pattern we would naturally expect for each group of numbers, e.g. 01273, doesn't happen: the "3" is the same whether at the beginning of a group or at the end. Listen with a critical ear next time you phone 192.
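The splicing approach as inferred above can be sketched as building a playlist of recorded clips. With one clip per digit, the same "3" recording is used whatever its position in a group, which is the source of the flat intonation. The file names are hypothetical, and the second function is only one possible refinement, assuming two recordings per digit (group-final and non-final); it is not a description of BT's actual system.

```python
# Hypothetical clip inventory: one recording per token.
CLIPS = {"intro": "the_number_you_require_is.wav",
         **{d: f"digit_{d}.wav" for d in "0123456789"}}

def playlist(number: str) -> list[str]:
    """Splice ready-made chunks: the intro, then one clip per digit."""
    return [CLIPS["intro"]] + [CLIPS[d] for d in number if d.isdigit()]

def playlist_with_position(number: str) -> list[str]:
    """Refinement assuming two recordings per digit, chosen by whether
    the digit ends its group, so group-final digits can sound final."""
    clips = [CLIPS["intro"]]
    for group in number.split():
        for i, d in enumerate(group):
            position = "final" if i == len(group) - 1 else "nonfinal"
            clips.append(f"digit_{d}_{position}.wav")
    return clips

print(playlist("01273 345"))                # both 3s use digit_3.wav
print(playlist_with_position("01273 345"))  # final vs non-final 3
```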

This is similar to a general approach known as diphone synthesis, which takes natural utterances of all the pairs of phonemes in the language and stores an analysis of the section from the middle of the first phoneme to the middle of the second. This works better than simply storing a representation of each phoneme because it takes into account the fact that the actual sounds (phones) produced to represent a phoneme are strongly influenced by their context, producing different allophones in different contexts (e.g. "n" in "handbag" - try to work out the sound you produce as you pronounce the word naturally).
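The first step of diphone synthesis, cutting a phoneme sequence into the units to be looked up in the stored inventory, can be sketched as below. Silence ("_") is added at each end so the first and last phonemes also get a transition unit; if a language has N phonemes, the inventory therefore needs at most (N + 1) squared units. The one-letter transcription of "handbag" is a rough simplification for illustration, ignoring the assimilation the text invites you to listen for.

```python
def diphones(phonemes: list[str]) -> list[str]:
    """Split a phoneme sequence into diphone units (mid-phoneme to
    mid-phoneme), padding with silence '_' at both ends."""
    padded = ["_"] + phonemes + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

print(diphones(list("handbag")))
# ['_-h', 'h-a', 'a-n', 'n-d', 'd-b', 'b-a', 'a-g', 'g-_']
```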

Another approach is to associate a single acoustic description with each phoneme and to create sequences of phonemes from these base elements.

For more depth on speech generation and synthesis see:

Good background information from DISC, a European grouping concerned with dialogue evaluation.
The SOLE project
Talking Heads site - very interesting project to create heads that talk.
A collection of mostly University-based systems
IBM's text to speech system
Festival System on-line demo

For a demo system that puts many of these phases together, look at SRI's Translation system


To CS237 Home Page
To Lyn Pemberton's home page

Last edited 17 October 2001