Named-Entity Recognition vs Natural Language Processing
I have had a few questions regarding the difference between NER and NLP. Natural Language Processing (NLP) is the act of taking a body of text, then manipulating and processing it so that you can respond to it. Named Entity Recognition (NER) is one part of the NLP process. If you have read the article on Natural Language Processing, you will note that Named Entity Recognition is step 6.
The process of labeling parts of speech and recognizing named entities requires a model, or often multiple models. Why multiple models? Consider that you are processing a series of tweets. You will likely have a model that recognizes the @ symbol as the start of a Twitter handle and # as the start of a hashtag. Each of these entities carries extra information. A Twitter handle identifies a person, so you can use it to look up the contact or company that owns it. A hashtag provides additional context as well: it contains no spaces, but it may be made up of multiple words.
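As a rough illustration, the Twitter-specific recognition described above can be sketched with simple pattern matching. This is only a toy stand-in for a trained model; the function names and the CamelCase-splitting heuristic are my own assumptions, not part of DAIN.

```python
import re

def extract_entities(tweet):
    """Pull Twitter handles and hashtags out of a tweet.

    A toy stand-in for the Twitter-specific models described above;
    real NER models are trained, not regex-based.
    """
    handles = re.findall(r"@(\w+)", tweet)
    hashtags = re.findall(r"#(\w+)", tweet)
    return handles, hashtags

def split_hashtag(tag):
    """Split a CamelCase hashtag into its component words,
    since a hashtag has no spaces but may contain several words."""
    return re.findall(r"[A-Z][a-z]*|[a-z]+|\d+", tag)

handles, tags = extract_entities("Thanks @AcmeSupport, my #CameraIssue is fixed!")
# handles -> ['AcmeSupport'], tags -> ['CameraIssue']
# split_hashtag('CameraIssue') -> ['Camera', 'Issue']
```

From there, a recognized handle can be fed into a contact lookup, and the split hashtag words give you extra context about the tweet.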
For DAIN, we have separate models that are used when the source is Twitter: TwitterFollowers and HashTags. We also have CRM models for companies and contacts. Your provider may have CRM models as well that recognize customers, so you can respond to them better based on the other information you hold about that customer. For example, if a customer is complaining about an issue with their camera, and from their purchase history you know they bought model XYZ123, then before you respond you can look up known issues with that model and see whether any match their complaint. If one does, you would still confirm their model number, but you would already have the specifics, so when they tell you it is an XYZ123 you can say, "Yes, we are aware of that issue, and here is how to fix it." That is a much better experience than "Let me look into that."
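The purchase-history lookup above can be sketched as follows. Everything here is hypothetical: the in-memory dicts stand in for your CRM and support systems, and the names (`PURCHASE_HISTORY`, `KNOWN_ISSUES`, `suggest_fix`) are illustrative, not part of any real API.

```python
# Hypothetical data: in practice this would come from your CRM and
# support-ticket systems, not from in-memory dictionaries.
PURCHASE_HISTORY = {"alice@example.com": ["XYZ123"]}
KNOWN_ISSUES = {"XYZ123": "Lens error 41: power-cycle with the lens cap removed."}

def suggest_fix(customer_email):
    """Check the customer's purchased models against known issues,
    so the agent can respond with specifics instead of a vague reply."""
    for model in PURCHASE_HISTORY.get(customer_email, []):
        if model in KNOWN_ISSUES:
            return f"We are aware of an issue with {model}: {KNOWN_ISSUES[model]}"
    return "Let me look into that for you."
```

The point is the flow, not the data structures: recognize the customer entity, enrich it from CRM data, and only then compose the response.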
Determining your models early, at least the major pieces, is important, because it often takes time and a lot of work to populate them and clean any holes in the data. For a simple POC you could build your own models, but for a more advanced project we recommend you connect with a data scientist who can help you determine the right features to add and the right models to build. One extra column/feature multiplied across millions of rows can significantly grow your data set and cause performance problems you do not need. On the other hand, missing a key column/feature can mean your model is biased and will not give you the right answers.
If you need help let me know and I can get you in touch with someone to work with you to build the right model.
We will be discussing building a model as part of the Build-Your-Own Natural Language Processing Provider series, but those will be simple models that you will need to expand upon for your specific purposes.
If you are interested in learning more, reach out and let's discuss. I will be providing a basic QuickStart to SitecoreDain subscribers soon. If you cannot wait, email me at chris.williams@readwatchcreate.com and, as a subscriber, I can release an early version to get you started now.