Overview of Pandula Product Range
Morphological-Syntactic-Semantical NLP/NLU
- Pandula® Express: Fast and Simple NLP/NLU
- Pandula® Sharp: Fast, sharp but deterministic NLP/NLU
- Pandula® Deep: Thorough probabilistic NLP/NLU
- Pandula® Multistrategy: Everything combined
Pandula Express
Expression-based NLP/NLU analysis engine
This is a simple, pragmatic system, useful as the lowest layer for fast and efficient analysis of simple sentences without much substructure, i.e. sentences that are trivial or simple to analyse and understand.
The analysis is based on a kind of construction grammar language model that allows all-in-one analysis (NLP+NLU) using syntactico-semantic word patterns that describe word sequences or, in some cases, just keyword sequences.
Such patterns are grouped under a “PatternSet” label; all patterns in a set lead to the same semantic category conclusions. The intermediate NLP results are usually left out of the final conclusions of the analysis (NLU only), because the PatternSets do not necessarily correspond to dependency grammar phrases.
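The PatternSet idea can be illustrated with a minimal sketch. It assumes that patterns are plain regular expressions over lower-cased text; the PatternSet labels, the patterns and the analyze function are hypothetical and do not show the actual Pandula Express model format.

```python
import re

# Hypothetical PatternSets: each label groups word patterns that all lead
# to the same semantic category. Labels and patterns are illustrative only.
PATTERN_SETS = {
    "GREETING": [r"^(hello|hi|good (morning|evening))\b"],
    "ASK_OPENING_HOURS": [
        r"\bwhen (do|are) you (open|close)\b",
        r"\b(opening|closing) (hours|times?)\b",
    ],
    "CONFIRM": [r"^(yes|yeah|sure|ok(ay)?)\b"],
}


def analyze(sentence):
    """Return the labels of all PatternSets with at least one matching pattern."""
    text = sentence.lower().strip()
    return [
        label
        for label, patterns in PATTERN_SETS.items()
        if any(re.search(pattern, text) for pattern in patterns)
    ]


print(analyze("When are you open on Sunday?"))
print(analyze("The parcel that I sent last week never arrived"))
```

Only the PatternSet labels are returned, which mirrors the NLU-only end result described above. A sentence that matches no pattern yields an empty list, which is the typical failure mode for anything beyond basic sentence structure.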
The underlying model is simple and can be developed faster than models for higher-complexity NLP/NLU analysis, at least if the model is limited to analysing the simplest kinds of sentences.
Simple sentences nevertheless occur frequently, especially in dialogue applications, so the importance and capabilities of this layer should not be underestimated in a low to medium complexity dialogue environment.
The model for this layer is also often a handy precursor of models for higher layers.
Advantages:
- Quick and dirty development of a basic, pragmatic language model that integrates NLP and NLU aspects all-in-one.
- Can be used to “quickstart” dialogue systems even with an advanced dialogue level model.
- This layer remains useful later on, for fast analysis of (frequently occurring) simple sentences (see Pandula Multistrategy).
Disadvantages:
- A sentence structure that is more than basic can quickly pose a problem, usually leading to few or no analysis results.
- Very inflexible for varied input language.
Pandula Sharp
Sharp deterministic NLP/NLU
This technology has clearly separated NLP and NLU layers and is dependency tree compatible. No intermediate parse forests are generated, only the single most likely parse.
NLP
It is ideal for low to medium complexity natural language processing based on a syntactic analysis of “islands” of limited length in the sentences. This results in only the most probable syntactic analysis of each island, without taking the rest of the sentence into account. In many cases correct semantic conclusions can be drawn, but there is a relatively high chance of incorrect conclusions. To avoid this, the islands can be modelled up to full sentence level.
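As a rough illustration of island-based analysis, the sketch below collects short noun-phrase islands from a sentence that is already part-of-speech tagged and ignores everything else. The tag names, the island shape (DET? ADJ* NOUN+) and the find_islands function are assumptions made for this example, not the Pandula Sharp grammar formalism.

```python
def find_islands(tagged, max_len=4):
    """Collect noun-phrase islands (DET? ADJ* NOUN+) of at most max_len
    tokens; tokens outside any island are simply skipped."""
    islands = []
    i = 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DET":          # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "ADJ":   # any adjectives
            j += 1
        nouns_start = j
        while j < len(tagged) and tagged[j][1] == "NOUN":  # one or more nouns
            j += 1
        if j > nouns_start and j - i <= max_len:
            islands.append(" ".join(word for word, _ in tagged[i:j]))
            i = j                          # continue after the island
        else:
            i += 1                         # no island here, retry from next token
    return islands


sentence = [
    ("book", "VERB"), ("a", "DET"), ("cheap", "ADJ"), ("flight", "NOUN"),
    ("to", "ADP"), ("Paris", "NOUN"), ("tomorrow", "ADV"),
]
print(find_islands(sentence))
```

Each island is analysed on its own, so the relation between "a cheap flight" and "Paris" is not captured here. That is the kind of context a sentence-level model would add.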
In the context of a dialogue system (Intalxys), for medium complexity language and in popular domains, such a system can perform very well if the dialogue model stimulates the use of short sentences by the user. For more specialized domains, more complex user language and more language finesse, it will rather frequently miss important analysis detail and disambiguate wrongly, in which case Pandula Deep will perform better.
The Pandula Sharp engine’s output usually corresponds to concepts in a dependency tree, and the resulting analysis to dependency subtrees, not necessarily up to the S (sentence) level.
Advantages:
- High NLP processing speed, which is in many cases very important
- Flexible
- Relatively good robustness for incorrect or partly misrecognized (speech) sentences
- Language Model development time is relatively short (about 2-4 months).
- Close integration between NLP and NLU layers is possible, leading to faster model development.
Disadvantages:
- Less reliable disambiguation for more complex sentences.
- Barely relevant syntactic elements in sub-sentences and prepositional phrases may be passed to the NLU as if they were equally relevant.
- No concurrent syntactic analysis hypotheses (parse forests), so important disambiguation decisions are already made at the syntactic level instead of at the semantic layer, where they would be more reliable.
NLU
The NLU system is based on
- semantical role assignment of phrases and
- inheritance and
- checking of semantical properties derived from lexical entries.
It provides low to medium complexity natural language understanding, based on a limited set of semantical roles and a limited range of inheritance levels. The underlying ontology is relatively simple.
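The three ingredients listed above (role assignment, inheritance and property checking) can be sketched as follows. The mini-ontology, the lexical entries, the role constraints and the function names are all hypothetical and only illustrate the principle.

```python
# Hypothetical mini-ontology: each concept names its parent (None at the root).
PARENT = {
    "entity": None,
    "animate": "entity",
    "person": "animate",
    "food": "entity",
    "fruit": "food",
}

# Hypothetical lexical entries mapping words to concepts.
LEXICON = {"waiter": "person", "apple": "fruit", "pizza": "food"}

# Hypothetical semantic role constraints for one verb.
ROLE_CONSTRAINTS = {"eat": {"agent": "animate", "patient": "food"}}


def is_a(concept, ancestor):
    """Walk up the inheritance chain until ancestor is found or the root is passed."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT[concept]
    return False


def check_roles(verb, fillers):
    """For each role, report whether the filler's concept satisfies the constraint."""
    constraints = ROLE_CONSTRAINTS[verb]
    return {
        role: is_a(LEXICON[word], constraints[role])
        for role, word in fillers.items()
    }


print(check_roles("eat", {"agent": "waiter", "patient": "apple"}))
print(check_roles("eat", {"agent": "pizza", "patient": "waiter"}))
```

With only a handful of roles and a shallow parent chain, such a model stays quick to build, at the cost of expressiveness in richly structured domains.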
A new version is currently being developed with strongly improved ontological possibilities.
Advantages:
- Quick development time
- Optional: Little or no dependency on external ontology resources.
- Models can be useful precursor of Pandula Deep NLU models.
- Growth potential within this concept in upcoming versions
Disadvantages:
- Not recommended for complex domains with richly structured semantics (but soon to be improved)
- Limited growth potential within this concept in the current version
Pandula Deep
In-depth, thorough NLP/NLU technology
This most complex technology of the Pandula range has partly interacting NLP and NLU layers and is dependency tree compatible. The NLP/NLU processing generates intermediate parse forests with extensive and complex probability accounting on virtually every aspect.
NLP
This Natural Language Processing technology is best suited to analysing complex sentences that are compatible with the language model (i.e. correct). In Natlanco’s technology, the model can be very detailed, and it can contain probability information for almost every model aspect. Natlanco’s fast parser yields a syntactic parse forest, possibly containing many competing hypotheses of a sentence’s structure, each with its own probability. These probabilities are updated by the NLU, after which high confidence disambiguation can take place.
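The interplay between parse forest and NLU can be sketched as a rescoring step. The example assumes that a forest is a mapping from hypothesis to syntactic probability and that the NLU provides a plausibility score per hypothesis. Multiplying and renormalising is an illustrative choice, not necessarily the update rule used by Natlanco, and all names and numbers are hypothetical.

```python
def rescore(forest, semantic_score):
    """Weight each hypothesis' syntactic probability by a semantic
    plausibility score and renormalise so the results sum to 1."""
    weighted = {hyp: prob * semantic_score(hyp) for hyp, prob in forest.items()}
    total = sum(weighted.values())
    if total == 0:
        return dict(forest)  # no semantic support at all: keep syntactic scores
    return {hyp: weight / total for hyp, weight in weighted.items()}


# "I saw the man with the telescope": two competing attachments.
forest = {"pp_attaches_to_verb": 0.4, "pp_attaches_to_noun": 0.6}

# Hypothetical NLU judgement: seeing *with* a telescope is more plausible.
plausibility = {"pp_attaches_to_verb": 0.9, "pp_attaches_to_noun": 0.2}

updated = rescore(forest, plausibility.get)
best = max(updated, key=updated.get)
print(best, updated)
```

In this example the syntactically less likely hypothesis wins once the semantic evidence is taken into account. A deterministic single-parse approach would have committed to the other reading at the syntactic level.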
Advantages:
- Can deal with very complex sentences
- High confidence disambiguation
- Very Flexible
- Speed and depth are numerically scalable, even for the same language model
Disadvantages:
- Relatively slow, especially for longer sentences with many syntactic ambiguities
- Limited tolerance for incorrect sentences; frequently occurring language errors should be integrated into the model as if they were correct
- Relatively long language model development time (3-10 months)
NLU
This level of NLU is the most ambitious one, and allows almost all aspects of a sentence to be understood in a literal way. It can also include intrasentential anaphora resolution.
Intersentential resolution technology also exists, but it is part of, for instance, the dialogue system or another optional supersentential engine (see Suboption NLUDeep+).
To maximize the capabilities of understanding, an ambitious generic ontology system with at least wide hypo/hypernymy relations should be attached to the entire system, including the lexical and NLP levels. Another important optional ontology aspect is the “is associated with” relation.
Suboption NLUDeep
Low to High complexity natural language understanding based on dealing with very elaborated ontologies and a virtually unlimited range of inheritance levels.
Extra kinds of recursions in Natlanco’s own technology allow extra flexibility to understand and handle bigger meaning structure patterns, with direct consequences for the intelligent appearance of the system.
Advantages:
- Best possible technology, also in the long term.
- Major progress for a system’s intelligence in understanding language.
- Generic ontology development time can often be avoided by using, for instance, WordNet-type resources.
- Regular updates and improvements of generic ontology versions probable
- High availability of WordNets for many languages (see “Wordnet” on Wikipedia)
Disadvantages:
- High dependency on external ontology resources, some with unpredictable pricing evolution.
- Dependency on partly suboptimal standards, results of compromises between many partners.
- Domain-specific customisations of the ontology may still require considerable in-house work
Suboption NLUDeep+
Same as NLUDeep, but with added co-reference resolution layers, so that words like “he”, “it”, “that”, “then” and many more words and expressions get coupled to the correct meaning content, even if the entities they refer to occurred in other sentences.
Advantages:
- This is an important part of the artificial intelligence of a system.
Disadvantages:
- Slightly more time needed to develop the model compared with NLUDeep
Pandula Multistrategy NLP/NLU option
This option is the combination of all previous Pandula options, which are now considered as concurrent NLP/NLU layers.
The above layers are ordered according to processing speed, but in reverse order of precision.
When sentences come in, usually all layers start processing concurrently, and the first layer that reaches a high level of result confidence stops the other layers and overrides their results.
This technique is used to deal with all levels of sentence complexity at the same time. The NLP strategies, together with their NLU counterparts, compete for finding as early as possible a reasonably high confidence of analysis for incoming sentences.
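The competition between layers can be sketched with Python's standard concurrent.futures module. The three stand-in layers, their delays, the word-count heuristic and the 0.8 confidence threshold are assumptions made purely for illustration; they do not model the actual engines.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_layer(name, delay, max_words):
    """Build a stand-in layer: it sleeps for `delay` seconds, then reports
    high confidence only for sentences of at most `max_words` words."""
    def layer(sentence):
        time.sleep(delay)
        confident = len(sentence.split()) <= max_words
        return name, (0.9 if confident else 0.3)
    return layer


# Ordered from fastest / least precise to slowest / most precise.
LAYERS = [
    make_layer("express", 0.01, 4),
    make_layer("sharp", 0.05, 10),
    make_layer("deep", 0.15, 1000),
]


def analyze(sentence, threshold=0.8):
    """Start all layers concurrently and return the first (name, confidence)
    result that reaches the threshold, or the best result if none does."""
    pool = ThreadPoolExecutor(max_workers=len(LAYERS))
    futures = [pool.submit(layer, sentence) for layer in LAYERS]
    best = ("none", 0.0)
    try:
        for future in as_completed(futures):
            result = future.result()
            if result[1] > best[1]:
                best = result
            if result[1] >= threshold:
                break  # a confident layer overrides the others
    finally:
        # Threads that are already running cannot be interrupted here; a real
        # engine would need a cooperative stop signal for the losing layers.
        pool.shutdown(wait=False)
    return best


print(analyze("What time is it"))
print(analyze("Could you tell me when it opens"))
```

Simple sentences are answered by the fastest layer almost immediately, while more complex ones fall through to a slower layer. The time spent on the losing strategies is the limited speed cost of this approach.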
Advantages:
- Best possible technology, also in the long term, best speed/quality trade-off
- Can deal with very simple as well as very complex sentences
- Efficient: only limited speed is lost on the less appropriate strategies.
- Can be dynamically tuned for available processing power in server parks.
- High confidence disambiguation
- Relatively good robustness for incorrectly formulated sentences
- Very Flexible
Disadvantages:
- Not the ultimate in syntactic analysis speed
- Relatively long language model development time (4-10 months)
- Product still partly under development, first release expected mid-2015