Some applications of intelligent systems:
The range of areas in which expert systems are used is very broad. Below are some examples of commercial and research-oriented expert systems in a variety of areas.
Domain: Ecology, natural history
Type of task performed: Identification
Brief description: The Whale Watching program allows a user to enter details of a sighting of a whale, in the form of a small number of questions. The system then identifies the whale s/he has seen. Uses photos of whales. Developed for the Canadian Dept of Environment by VVV Acquired Intelligence.
Shell or language(s): Written using the Acquire knowledge acquisition system.
Technical details: Exists in two versions, one running over the Web, the other PC-based.
Douglas Fir Cone and Seed Insects System
A system similar to Whale Watcher, designed to help seed orchard managers and seed/fir cone collectors.
Domain: Agriculture
Type of task performed: Orchard management
Brief description: Advises on the running of apple orchards, including dealing with pests and diseases.
Shell or language: Written on the Mac in a frame-based expert system shell, based on the C programming language.
EXTEL - Brush and Weed Control
Natural Language Processing is an umbrella term for a range of types of system that automate the processes involved in communicating via a human language such as English, Spanish or Chinese. In many NLP systems the representations and processes used are those we have studied in earlier weeks: knowledge is represented in formats such as predicate logic, frames, semantic networks, conceptual dependency and so on, and inferencing is carried out using the search processes we have already seen. Other types of systems are being developed using neural networks, a quite different approach. For information on neural networks, used in particular for speech and handwriting recognition, try these links:
The easiest way to model the way we extract meaning from a stretch of language and then build up and produce a response is to adopt a convenient fiction that it consists of a series of sequential and discrete steps. In reality, these steps or sub-processes overlap with and influence each other, and indeed the architectures of certain language processing systems reflect this interactivity: for the most part, however, systems proceed one stage at a time.
| Input utterance | Output utterance |
| Stage 1 - Processing of raw input | Stage 8 - Text to output medium |
| Stage 2 - Linguistic Analysis | Stage 7 - Deciding how to say it |
| Stage 3 - Semantic analysis | Stage 6 - Deciding what to say |
| Stage 4 - Understanding intentions | Stage 5 - Generating intentions |
Stage 1 - Processing of raw input
Language can come to the system in the form of digital text, which is passed on fairly straightforwardly to the subsequent stages, probably after some pre-processing involving stripping out formatting information, or in spoken or hand-written form, which is much more difficult to process.
Speech analysis and recognition is an important discipline in itself. The input into a speech analysis system is the speech event captured as wave forms, a series of numbers representing the amplitude of the sound wave at successive points in the utterance. This can be transformed into a more useful description via a speech analysis program which takes the input sounds and recognises the phonemes that make up the stream of speech. Thence a word recognition module, often using template matching, can work out what words the phonemes must represent. For example, ignoring the fact that words rarely occur alone in this way, let's say the input is an American pronunciation of the word "written". The wave forms making up the utterance might be processed to produce a representation something like [riden] (N.B. not an exact rendering - I don't have the IPA font available, but it makes the point). The sound a Scottish person makes pronouncing the same word might be represented as something more like [Riten], with R representing a rolled "r". The point is that the actual sounds uttered for the "same" word are noticeably different. A word recogniser program would need to know that in some circumstances "r" and "R" are alternative pronunciations (allophones) of the sound that begins the word "written", and that "d" and "t" (or for some people, a glottal stop) are the sounds produced for the middle consonant. The program would then be able to match the "base" phonemic structure of the word against the patterns in its knowledge base, and come up with "written" as a good candidate. In the American case, the word "ridden" would also be likely to be retrieved as it would also match.
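The matching step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the allophone table, the three-word lexicon and the single-character sound symbols (including "?" for a glottal stop) are hypothetical stand-ins, not the data of any real recogniser.

```python
# Hypothetical allophone table: each surface sound maps to the set of
# base phonemes it might be realising.
ALLOPHONES = {
    "r": {"r"},
    "R": {"r"},        # rolled "r"
    "t": {"t"},
    "d": {"t", "d"},   # American-style middle consonant: /t/ or /d/
    "?": {"t"},        # glottal stop
}

# Hypothetical lexicon: word -> "base" phonemic template.
LEXICON = {
    "written": "riten",
    "ridden": "riden",
    "risen": "rizen",
}


def candidates(phones: str) -> list[str]:
    """Return the lexicon words whose template is compatible with the phones."""
    matches = []
    for word, template in LEXICON.items():
        if len(template) != len(phones):
            continue
        # A sound not listed in the table is assumed to stand only for itself.
        if all(t in ALLOPHONES.get(p, {p}) for p, t in zip(phones, template)):
            matches.append(word)
    return matches


print(candidates("riden"))  # American-style input
print(candidates("Riten"))  # Scottish-style input
```

With this toy data the American-style input retrieves both "written" and "ridden", while the Scottish-style input retrieves only "written", which mirrors the ambiguity noted in the text.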
Speech recognition is making rapid progress, under the pressure of companies wanting speech interfaces to their telephone accessed services such as enquiry services, booking, banking, automated surveys and so on. There is always a trade-off in speech recognition systems between language coverage and general applicability. If a system has to recognise input from the general public, the number of language items it can reliably recognise will be relatively small: some systems can recognise yes, no and a restricted set of numbers, for instance. A telephone banking system would be a typical example. However, where the system will only be used by known users, who can train it to recognise their particular speech habits, a much larger number of items can be recognised successfully. An office dictation system would come into this category.
Speech links
An excellent interactive introduction to speech analysis is provided by the Oregon Graduate Institute's Center for Spoken Language Understanding. Highly recommended.
Handwriting Recognition Links:
Stage 2 - Linguistic Analysis
The sequences of words output by Stage 1 are typically analysed to create a representation that shows their morphemic and syntactic structure. For instance, if the input consisted of the sequence "Your horrible dog chased the poor paper boy this morning", some of the information we could extract would be that this is a complete English sentence consisting of a noun phrase [your horrible dog] and a verb phrase (the rest). The verb phrase consists of a verb [chased], followed by a noun phrase functioning as direct object [the poor paper boy] and an adverbial phrase indicating time [this morning]. We would know that the noun [dog] was qualified by two adjectives, a possessive [your] and an attributive [horrible], and that the verb involved a morpheme indicating simple past tense [-d]. We might not be able to give a single analysis of the compound noun [paper boy] if we have to analyse it from scratch. Is it a boy made out of paper, a boy who collects waste paper, a boy who delivers newspapers or what? Let's hope the system has access to a detailed lexicon or dictionary that can put it out of its misery on this point. "Poor" looks unproblematic at this point - it's just a simple adjective.
The types of system that carry out this stage of the analysis are parts-of-speech taggers, which identify the grammatical elements making up the word sequence, and parsers, which analyse the way the elements combine.
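As a rough illustration of the two kinds of tool, here is a toy lexicon-lookup tagger followed by a very shallow noun-phrase chunker. The tag names and the lexicon are assumptions made for this sketch; real taggers and parsers use far richer resources and handle ambiguity, which this does not.

```python
# Hypothetical toy lexicon covering only the example sentence.
LEXICON = {
    "your": "POSS", "horrible": "ADJ", "dog": "NOUN", "chased": "VERB",
    "the": "DET", "poor": "ADJ", "paper": "NOUN", "boy": "NOUN",
    "this": "DET", "morning": "NOUN",
}
NP_TAGS = {"DET", "POSS", "ADJ", "NOUN"}


def tag(sentence):
    """Parts-of-speech tagging by simple dictionary lookup."""
    return [(w, LEXICON.get(w.lower(), "UNK")) for w in sentence.split()]


def chunk_noun_phrases(tagged):
    """Group runs of determiner/possessive/adjective/noun words into phrases."""
    phrases, current = [], []

    def flush():
        # Only keep a run that actually ends in a noun.
        if current and current[-1][1] == "NOUN":
            phrases.append(" ".join(w for w, _ in current))

    for word, pos in tagged:
        # A determiner or possessive in the middle of a run starts a new phrase.
        starts_new = bool(current) and pos in {"DET", "POSS"}
        if pos in NP_TAGS and not starts_new:
            current.append((word, pos))
        else:
            flush()
            current = [(word, pos)] if pos in NP_TAGS else []
    flush()
    return phrases


tagged = tag("Your horrible dog chased the poor paper boy this morning")
print(tagged)
print(chunk_noun_phrases(tagged))
```

Note that the chunker treats [this morning] as just another noun phrase; deciding that it functions as an adverbial of time is the parser's job.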
On-line demos of example parts-of-speech taggers and parsers are available from the University of Zurich's site.
At this stage, the analysis is good enough for a very shallow dialogue to take place, given a system that can manipulate syntactic forms. For instance, once the linguistic analysis has taken place and added the details of the "paper boy" sequence into the system's database, we could imagine a dialogue system that could take the question "Who chased the paper boy?" and return "The horrible dog chased the paper boy" or just "The dog" or even "The hearer's dog" if it is very tricky. It could manage this by matching the input question with its representation of the original sequence. It would then need a rule such as "For [Who Item1 Item2], where there is an item in the database [Item3 Item1 Item2], output [Item3 Item1 Item2] or [Item3]", which would give us the first two answers. The system's performance could then be made more impressive by adding facilities for switching between 1st and 2nd person pronouns and other forms, varying response types, keeping a dialogue history and so on.
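The [Who Item1 Item2] rule can be sketched directly. In this minimal version, which assumes the database holds (Item3, Item1, Item2) triples and that Item1 is a single word, the function name `answer_who` and the stored fact are illustrative only.

```python
# Hypothetical database of analysed sequences: (Item3, Item1, Item2).
FACTS = [("the horrible dog", "chased", "the paper boy")]


def answer_who(question, facts=FACTS, short=False):
    """Apply the rule: for [Who Item1 Item2], find [Item3 Item1 Item2]."""
    words = question.rstrip("?").split()
    if len(words) < 3 or words[0].lower() != "who":
        return None
    item1, item2 = words[1], " ".join(words[2:])
    for item3, verb, obj in facts:
        if verb == item1 and obj == item2:
            # Output [Item3] or [Item3 Item1 Item2], capitalised.
            answer = item3 if short else f"{item3} {verb} {obj}"
            return answer[0].upper() + answer[1:]
    return None


print(answer_who("Who chased the paper boy?"))
print(answer_who("Who chased the paper boy?", short=True))
```

Because the match is purely on surface forms, a question such as "Who chased the boy?" finds nothing, which shows how shallow this level of dialogue is.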
The simpler Eliza-like systems use a very crude version of this type of analysis, sometimes simply pattern matching without tagging or syntactic analysis, to obtain their results. They quickly break down when faced with complex constructions, though, and the more sophisticated systems do include parsers for better performance.
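A crude pattern matcher of this kind might look like the following sketch. The rules, response templates and pronoun table are invented for illustration and are not taken from any particular chatterbot.

```python
import re

# Hypothetical pronoun swaps between 1st and 2nd person.
REFLECTIONS = {"i": "you", "my": "your", "your": "my", "me": "you", "am": "are"}

# Hypothetical rules, tried in order: (pattern, response template).
RULES = [
    (re.compile(r"i feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"your (.*) chased (.*)", re.IGNORECASE), "Tell me more about my {0}."),
    (re.compile(r"(.*)"), "Please go on."),
]


def reflect(fragment):
    """Swap pronouns word by word, leaving other words alone."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())


def respond(utterance):
    """Return the response of the first rule whose pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please go on."


print(respond("I feel sad about my dog."))
print(respond("Your horrible dog chased the poor paper boy this morning"))
print(respond("The boy your dog chased is upset"))
```

The third input shows the breakdown on a more complex construction: no specific pattern fits, so the system falls back on a stock reply.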
Links to some Eliza-like chatterbots:
Stage 3 - Semantic analysis
The linguistic analysis outputs a grammatical description of the word sequence. This still doesn't come close to telling us what the sequence means. A human being hearing the sequence "Your horrible dog chased the poor paper boy this morning" would be able to extract many subtle aspects of its meaning, as well as the basic facts of the proposition, which are probably something like -
There are further aspects of the sentence that we as human beings have a feel for, but which are difficult to turn into rules for a system. For instance, I think we would assume the hearer has just one dog, otherwise the natural thing to say would have been "One of your horrible dogs" or something similar, but this is not watertight reasoning. Perhaps only one of several dogs is horrible and the others are loveable, or perhaps that particular dog is the current focus of interest for other reasons: in either of these cases "your horrible dog" would be natural. And what about "horrible"? We might naively assume that this adjective conveyed some objective attribute of the dog: it is horrible in the same way that it might be hungry, ginger, smelly, patient, ill or whatever. This doesn't seem right though. The meaning most of us would extract from "horrible" in this context is simply that the speaker dislikes the dog. How do we encode this nuance for the system? We have a similar challenge with "poor" which seems to imply something like "the paper boy deserves/needs sympathy" and "the speaker feels sorry for the paper boy" at the same time.
Providing a system with this kind of knowledge is a huge but clearly necessary undertaking. For some large-scale attempts see:
Stage 4 - Understanding intentions
Do you suspect that there's some motive other than simple transmission of information behind "Your horrible dog chased the poor paper boy this morning"? I think most of us would see it as a complaint of some sort, to which a reply such as "Who won?" or "That new dog food is working wonders" would not be appropriate. The area of linguistics that tries to work out the intentions of a speaker in a context is known as pragmatics and in our case we would be well advised to attend to the speaker's intention before formulating a response. This aspect of pragmatics has concentrated on the notion of speech acts, and in particular has formulated rules for spotting and responding to what are known as "indirect speech acts" such as we have in our example. The idea here is that we very frequently express ourselves indirectly, not least for reasons of politeness. For instance, if we want a cup of tea, rather than say "Make me a cup of tea," which might seem to impose our will too heavily on the person we're talking to, we might say "Is the kettle on?", "This central heating can make you feel very thirsty" or "Do you still have any of that nice Earl Grey you bought at Harrods?" or some other indirect expression of our intention.
Here, it's hard to know exactly what the intention is without some context. If, for instance, the speaker has previously asked the hearer to keep the dog tied up, this could be seen as an indirect way of repeating the request (by stating the justification for it). If so, a reply such as "Sorry, I'll tie him up in the garden" would be in order.
Adding this sort of analysis would be a big step towards creating a truly human-like dialogue partner.
Stage 5 - Generating intentions
When it comes to working in the opposite direction from language understanding, that is, when we attempt to produce language automatically, deciding which intention to express and how directly to express it is a good start. For instance, in this case, we might understand the original utterance as a complaint and plan a conciliatory answer along the lines of offering to tie the dog up. Or you might be in a bad mood and decide to tell your nosy neighbour to mind their own business. Or you might take exception to the word "horrible" and decide to tell them so. Taking the first approach, one way of expressing your goal would be to tell your neighbour that you are going to buy the dog a rope. This is an indirect way of expressing the fact that you are promising to tie him up in future. You have decided your neighbour is bright enough to spot this.
Let's take another example. Imagine an automatic hotel booking system for Brighton and Hove. The system receives a request for an hotel booking on a weekend when the Labour Party Conference is in town and there isn't a room to be had in the whole of Brighton. However, there are rooms available in Hove. The system might have a straightforward strategy of simply telling the user that no rooms are available in the requested location. Or the Tourist Board might have programmed it to suggest Hove as an alternative in such cases. The goal now is to suggest to the user that s/he should request a room in Hove. However, a straightforward "Stay in Hove instead" won't work. Perhaps the system should generate a sentence or two describing Hove's proximity to Brighton and stating that there are often rooms available there. This would be an indirect speech act (two, in fact) that would probably communicate the system's intention without making the user feel pushed around.
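The choice between the two strategies can be sketched as a small planning function. Everything here is assumed for illustration: the availability figures, the `NEARBY` table and the wording of the replies are hypothetical, and a real system would generate its text in the later stages rather than from fixed strings.

```python
# Hypothetical room counts for the conference weekend.
AVAILABILITY = {"Brighton": 0, "Hove": 12}
# Hypothetical table of alternatives the Tourist Board wants suggested.
NEARBY = {"Brighton": "Hove"}


def plan_response(requested, suggest_alternatives=True):
    """Return (intention, text) for a booking request."""
    if AVAILABILITY.get(requested, 0) > 0:
        return ("confirm", f"Rooms are available in {requested}.")
    alternative = NEARBY.get(requested)
    if suggest_alternatives and alternative and AVAILABILITY.get(alternative, 0) > 0:
        # Indirect speech act: describe the alternative instead of ordering
        # the user to "stay in Hove instead".
        return (
            "suggest_indirectly",
            f"{alternative} is very close to {requested}, "
            "and there are often rooms available there.",
        )
    return ("inform", f"No rooms are available in {requested}.")


print(plan_response("Brighton"))
print(plan_response("Brighton", suggest_alternatives=False))
```

The first call reflects the Tourist Board strategy; the second reflects the straightforward strategy of simply reporting that nothing is available.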
Stage 6 - Deciding what to say
The next stage involves the system selecting elements of meaning to be expressed in syntactic units, the mirror image of Stage 3. The semantic elements you might want to express could include:
Stage 7 - Deciding how to say it
How many ways could you think of to express the semantic units specified in Stage 6? Let's try for twenty:
1. I'll have to buy him a leash tomorrow.
2. I'll have to buy a leash for him tomorrow.
3. I'll have to get him a leash tomorrow.
4. I'll have to get a leash for him tomorrow.
5. I must get him a leash tomorrow.
6. I must get a leash for him tomorrow.
7. Tomorrow, I'll have to buy him a leash.
8. Tomorrow, I'll have to buy a leash for him.
9. I will buy him a leash tomorrow. I feel I have to.
10. I will buy him a leash tomorrow. I feel I have to.
11. I think I'd better get him a leash tomorrow.
12. I'll be getting a leash for him tomorrow.
13. I'll buy him a leash. I'll do it tomorrow.
14. I'll get a leash for him in the morning.
15. In the morning, I'll go and get a leash for him.
16. What I should do is get a leash for him. I'll do it tomorrow.
17. A leash is what I must buy. I'll get one for him tomorrow.
18. What I must buy is a leash for him. I'll buy it tomorrow.
19. What I must do is buy a leash for him. I'll get it tomorrow.
20. What I must buy for him is a leash. I'll get one tomorrow.
And so on. There are probably 100 sensible ways to express the content we want, not to mention the many possible versions that express the content but are stylistically not quite appropriate, e.g. "when the sun rises over the horizon I shall arise and go in search of a stout cord for my trusty hound…" and the like.
The relevant module of a natural language generation system will typically include a wide range of rules determining how semantic content should best be expressed given constraints of context, stylistic register, the history of the dialogue, the state of knowledge of the participants and other parameters. The output of this stage will be a textual representation of the generated text in pretty much final form, with lexical elements selected and ordered correctly, according to the system's parameters.
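To see how quickly the options multiply, here is a minimal sketch that treats a few of the choices in the leash example as independent parameters. The parameter names and the small sets of modal phrases and verbs are assumptions for illustration; a real generation module would select among such options using context and register rather than enumerating them all.

```python
from itertools import product

# Hypothetical lexical choices for obligation and for the action.
MODALS = ["I'll have to", "I must"]
VERBS = ["buy", "get"]


def realise(verb, modal, indirect_object_first, front_time):
    """Build one surface sentence from a fixed set of semantic units."""
    if indirect_object_first:
        core = f"{verb} him a leash"
    else:
        core = f"{verb} a leash for him"
    if front_time:
        return f"Tomorrow, {modal} {core}."
    return f"{modal} {core} tomorrow."


def all_variants():
    """Enumerate every combination of the four choices."""
    return [
        realise(verb, modal, io_first, front)
        for modal, verb, io_first, front in product(
            MODALS, VERBS, [True, False], [False, True]
        )
    ]


variants = all_variants()
print(len(variants))
for sentence in variants[:4]:
    print(sentence)
```

Four binary choices already give sixteen distinct sentences, which is why rules for selecting among them matter.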
Stage 8 - Text to output medium
If text is to be the output medium, the message that has been generated can simply be displayed on a screen or wherever it is needed, as plain text. However, some systems are now adding formatting information to the text in an attempt to add information: bulleted lists, headings and subheadings, footnotes and so on are useful tools of the human writer's trade and it seems foolish not to exploit them. The Iconoclast system concentrates on this issue.
Generating text suitable for display in hypertext format is another current research area, as is the integration of generated texts and illustrations.
Text to speech mapping is a very important area, particularly as telephone commerce continues to expand. You will probably be familiar with BT's automatic synthesis of telephone numbers for Directory Enquiries and will have an opinion on the quality of the output. Similarly you will have heard the robotic tones generated by the text-to-speech synthesiser used by Stephen Hawking and other people with no speech production capability of their own. Mac owners can also experiment with a range of voices reading their texts in SimpleText.
There are various approaches to speech synthesis. The BT approach, judging by the output, is to splice together ready made chunks of recorded natural speech, typically the introductory text as a single chunk, i.e. "the number you require is…" followed by a string of numbers, each a separate chunk. It would appear that there is only one version of each number. This is what makes for a slightly unnatural effect as the rising and falling pattern we would naturally expect for each group of numbers, e.g. 01273, doesn't happen: the "3" is the same whether at the beginning of a group or at the end. Listen with a critical ear next time you phone 192.
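The splicing approach guessed at above can be sketched as a lookup that turns a number into a playlist of recorded chunks. The chunk names are hypothetical, and this models only the behaviour inferred from the output, not BT's actual system.

```python
def splice_number(number: str) -> list[str]:
    """Return the sequence of pre-recorded chunks for a telephone number."""
    chunks = ["the_number_you_require_is"]  # introductory text as one chunk
    for digit in number:
        if digit in "0123456789":
            # One recording per digit, reused whatever its position.
            chunks.append(f"digit_{digit}")
    return chunks


print(splice_number("01273"))
```

Because "digit_3" is the same recording wherever it occurs, nothing in this scheme can produce the falling pitch we would expect at the end of a group.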
This is similar to a general approach known as diphone synthesis, which takes natural utterances of all the pairs of phonemes in the language and stores an analysis of the section from the middle of the first phoneme to the middle of the second. This works better than simply storing a representation of each phoneme because it takes into account the fact that the actual sounds (phones) produced to represent a phoneme are strongly influenced by their context, producing different allophones in different contexts (e.g. "n" in "handbag" - try to work out the sound you produce as you pronounce the word naturally).
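The unit-selection side of diphone synthesis can be illustrated by splitting a phoneme sequence into overlapping pairs. The single-letter phoneme symbols and the "_" silence marker are rough stand-ins chosen for this sketch, not a real phonemic transcription.

```python
def to_diphones(phonemes):
    """Split a phoneme sequence into overlapping pairs, padded with silence."""
    padded = ["_"] + list(phonemes) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# "hand" versus a casual pronunciation of "handbag" with the "d" dropped.
print(to_diphones(["h", "a", "n", "d"]))
print(to_diphones(["h", "a", "n", "b", "a", "g"]))
```

The "n" is covered by the unit "n-d" in the first word and "n-b" in the second, so each stored unit can carry the sound of "n" as it is actually produced before that neighbour.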
Another approach is to associate a single acoustic description with each phoneme and to create sequences of phonemes from these base elements.
For more depth on speech generation and synthesis see:
Good background information from DISC, a European grouping concerned with dialogue evaluation.
The SOLE project
Talking Heads site - very interesting project to create heads that talk.
A collection of mostly University-based systems
IBM's text to speech system
Festival System on-line demo
For a demo system that puts many of these phases together, look at SRI's Translation system.