Posts

Showing posts from December, 2018

Providers and Pipeline, Oh Why!!!

I have mentioned to a few people that DAIN and DIANA are built with providers and pipelines. Some providers are weighted, while others run conditionally based on rules. The question I often get is: why not just choose one algorithm and stick with it for simplicity? There are three reasons to have different providers: the first is Bias and the second is Context.

Bias

In the case of Natural Language Processing, you have probably heard a lot in the news about bias in AI and ML. Some of that is due to the data, which is the subject of another article. Other times, the algorithm has been trained on data that is not specific to your domain, so it can produce bias that has nothing to do with your data. That is why it is so important to use multiple algorithms that keep each other in check. You can weight one algorithm higher so it is most likely to produce the result, but there are cases where it differs greatly from other algorithms that is so
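To make the "keep each other in check" idea concrete, here is a minimal sketch of weighted voting across providers. It assumes each provider returns a single label for a piece of text. The `Provider` class, `weighted_vote` function, the labels, and the 0.75 agreement threshold are all hypothetical illustrations, not the actual DAIN/DIANA API. The heaviest provider usually wins, but the result is flagged for review when the other providers disagree strongly.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Provider:
    """Hypothetical provider: a name, a weight, and a function that labels text."""
    name: str
    weight: float
    classify: Callable[[str], str]


def weighted_vote(
    providers: List[Provider], text: str, agreement_threshold: float = 0.75
) -> Tuple[str, float, bool]:
    """Sum each provider's weight under the label it returns.

    Returns (winning label, winner's share of the total weight, needs_review).
    needs_review is True when the winner holds less than agreement_threshold of
    the total weight, i.e. the other providers disagree enough to warrant a check.
    """
    scores = defaultdict(float)
    for provider in providers:
        scores[provider.classify(text)] += provider.weight
    total = sum(scores.values())
    label, score = max(scores.items(), key=lambda item: item[1])
    share = score / total
    return label, share, share < agreement_threshold


if __name__ == "__main__":
    providers = [
        # A general-purpose model (stubbed here) is weighted highest...
        Provider("general", 3.0, lambda text: "negative"),
        # ...and checked by two smaller, domain-specific providers (also stubs).
        Provider("domain", 1.0, lambda text: "positive"),
        Provider("lexicon", 1.0, lambda text: "positive"),
    ]
    print(weighted_vote(providers, "The patient responded well to treatment."))
    # prints ('negative', 0.6, True)
```

Here the highest-weighted provider still produces the result, but because it holds only 60% of the weight, the disagreement is surfaced instead of silently ignored.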

Named-Entity Recognition vs Natural Language Processing

I have had a few questions about the difference between NER and NLP. Natural Language Processing (NLP) is the act of taking a body of text and manipulating and processing it so that you can respond to it. Named Entity Recognition (NER) is one part of the NLP process. If you have read the article on Natural Language Processing, you will note that Named Entity Recognition is step 6. Labeling parts of speech and recognizing named entities requires a model, and often multiple models. Why multiple models? Consider processing a series of tweets. You will likely want a model that recognizes the @ symbol as the start of a Twitter handle and # as the start of a hashtag. Twitter handles and hashtags carry extra meaning. A Twitter handle identifies a person or organization, and you can use it to look up the contact or company that owns it. A hashtag provides additional context as well: it has no spaces, but it often contains multiple words. For DAIN, we have a se
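The tweet example can be sketched with two regular expressions plus a word-break segmenter for hashtags. This is a minimal illustration under stated assumptions, not how DAIN does it: the helper names (`segment_hashtag`, `extract_entities`) and the tiny vocabulary are hypothetical, `\w+` only approximates Twitter's real handle and hashtag rules, and a production segmenter would use a large, frequency-weighted dictionary.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# (?<!\w) keeps e-mail addresses ("bob@example.com") and URL fragments from matching.
# \w+ approximates the handle/hashtag character set (letters, digits, underscore).
HANDLE_RE = re.compile(r"(?<!\w)@(\w+)")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def segment_hashtag(tag: str, vocabulary: Set[str]) -> Optional[List[str]]:
    """Split a space-free hashtag into vocabulary words (word-break dynamic programming).

    best[i] holds the shortest known segmentation of the first i characters,
    or None if those characters cannot be segmented. Returns None when the
    whole tag cannot be split into known words.
    """
    text = tag.lower()
    best: List[Optional[List[str]]] = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            prefix = best[start]
            word = text[start:end]
            if prefix is not None and word in vocabulary:
                candidate = prefix + [word]
                # Prefer fewer (and therefore longer) words.
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


def extract_entities(
    tweet: str, vocabulary: Set[str]
) -> Tuple[List[str], Dict[str, Optional[List[str]]]]:
    """Return the tweet's handles and a map from each hashtag to its segmented words."""
    handles = HANDLE_RE.findall(tweet)
    hashtags = {tag: segment_hashtag(tag, vocabulary) for tag in HASHTAG_RE.findall(tweet)}
    return handles, hashtags


if __name__ == "__main__":
    vocab = {"machine", "learning", "learn", "in", "rocks"}
    tweet = "Thanks @dain_bot for the demo! #MachineLearningRocks (mail bob@example.com)"
    print(extract_entities(tweet, vocab))
    # prints (['dain_bot'], {'MachineLearningRocks': ['machine', 'learning', 'rocks']})
```

The handle would then feed a contact or company lookup, and the segmented hashtag words add context for the rest of the pipeline.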

Natural Language Processing

Another key concept in DAIN and DIANA is Natural Language Processing. There are a variety of libraries and algorithms that can break a sentence or series of statements down into their parts of speech. DAIN and DIANA use the provider model the same way we did for Language Detection. This allows you to swap out your NLP provider, or use several at once, with weighting to determine which provider should be considered more important. Weighting can be a simple numeric weight, or it can include simple or more complex rules, such as "this provider works better for English than it does for French" or "if this sentence contains these keywords, use this provider over that one."

Natural Language Processing Steps

There are various phases in Natural Language Processing. This article explains the basic ones. Some libraries expose the individual steps, while others group them together or provide a single method that does all of them:

Splitting: This is the process of taki
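The two kinds of rules mentioned above (language-based and keyword-based) can be sketched as multipliers on a provider's base weight. This is a hypothetical illustration, not the DAIN/DIANA provider interface: `NlpProvider`, `language_rule`, `keyword_rule`, `choose_provider`, the boost and penalty values, and the provider names are all assumptions. Keyword matching here uses plain whitespace tokens, so punctuation attached to a word will prevent a match.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

# A rule looks at (text, language code) and returns a multiplier for a provider's weight.
Rule = Callable[[str, str], float]


@dataclass
class NlpProvider:
    """Hypothetical NLP provider with a base weight and optional weighting rules."""
    name: str
    base_weight: float
    rules: List[Rule] = field(default_factory=list)

    def weight_for(self, text: str, language: str) -> float:
        """Apply every rule's multiplier to the base weight."""
        weight = self.base_weight
        for rule in self.rules:
            weight *= rule(text, language)
        return weight


def language_rule(preferred: str, boost: float = 2.0, penalty: float = 0.5) -> Rule:
    """Boost the provider for its preferred language and penalize it otherwise."""
    def rule(text: str, language: str) -> float:
        return boost if language == preferred else penalty
    return rule


def keyword_rule(keywords: Iterable[str], boost: float = 3.0) -> Rule:
    """Boost the provider when the text contains any of the keywords (whitespace tokens)."""
    wanted = {k.lower() for k in keywords}
    def rule(text: str, language: str) -> float:
        return boost if wanted & set(text.lower().split()) else 1.0
    return rule


def choose_provider(providers: List[NlpProvider], text: str, language: str) -> NlpProvider:
    """Pick the provider with the highest rule-adjusted weight for this text."""
    return max(providers, key=lambda p: p.weight_for(text, language))


if __name__ == "__main__":
    providers = [
        NlpProvider("english", 1.0, [language_rule("en")]),
        NlpProvider("french", 1.0, [language_rule("fr")]),
        NlpProvider("medical", 1.0, [keyword_rule({"diagnosis", "dosage"})]),
    ]
    samples = [("Hello world", "en"), ("Bonjour tout le monde", "fr"), ("The dosage was doubled", "en")]
    for text, lang in samples:
        print(text, "->", choose_provider(providers, text, lang).name)
```

Multiplying rule outputs keeps the plain numeric weight as the baseline while letting a rule promote or demote a provider per sentence; in this sketch the keyword rule lets the specialized provider outrank the English one for the "dosage" sentence.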