Posts

Showing posts from February, 2019

At our core we are just a Brain in a Jar

Fans of Dungeons and Dragons will recognize this concept. For those not familiar with it, this article discusses the Brain in a Jar. The key concept is this: the Brain in a Jar uses mainly psionic abilities to do what its lack of moving parts would otherwise prevent: move itself, manipulate objects and the environment, and ward off attackers. Its main attack is Mind Thrust, an assault upon the mind of another creature. In addition, it can drive mad anyone who magically or psionically detects it, and it can control and rebuke other undead. Now let's look at this in the context of DAIN and DIANA; in this article I will just say DAIN for simplicity. Think of DainJar as the outer layer that contains the executive suite, which is responsible for making the brain perform core brain functions such as waking up, sleeping, napping and thinking. The executive suite also connects to the body, but not the body as you know it. DAIN is all electronic so i...

Building Your Own Natural Language Processor - Parts of Speech

Although you may think this is an easy lookup, it is not. For a first attempt you could do a plain lookup, and some simple sentences would work. But if someone said "That dumbbell was as heavy as lead", then lead is a noun, whereas if someone said "They lead the parade through town", then lead is an action, a verb. This is where context comes in: you need to establish rules for when lead is a noun and when it is a verb. If you look at our model you will notice that we do have context and we do have rules. The context is used during training and the rules are used to determine which Word is the right match. During training, you can play with different sentences, establish patterns for what works and what doesn't, and create additional rules to resolve these conflicts. Tagging parts of speech is a nested for loop: for each sentence, and for each word in it, look up the word in the dictionary. If it is listed once, check the rules and, if there is a match, use it. If there is more t...
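To make the nested loop concrete, here is a minimal sketch in C#. It is not the model from the post: the class name, the tiny dictionary, the tag labels and the single "after a pronoun, prefer the verb" rule are all assumptions made up for illustration. As a simplification, a word listed only once is used directly rather than being checked against rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PosTaggerSketch
{
    // Hypothetical dictionary: each word maps to every part of speech it can take.
    private static readonly Dictionary<string, string[]> Lexicon =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            ["they"] = new[] { "PRON" },
            ["heavy"] = new[] { "ADJ" },
            ["parade"] = new[] { "NOUN" },
            ["lead"] = new[] { "NOUN", "VERB" }   // ambiguous: needs a rule
        };

    public static List<List<(string Word, string Tag)>> Tag(IEnumerable<string[]> sentences)
    {
        var result = new List<List<(string Word, string Tag)>>();
        foreach (var sentence in sentences)          // outer loop: each sentence
        {
            var tagged = new List<(string Word, string Tag)>();
            foreach (var word in sentence)           // inner loop: each word
            {
                // Context here is just the tag of the previous word ("" at sentence start).
                string previousTag = tagged.Count > 0 ? tagged[tagged.Count - 1].Tag : "";
                tagged.Add((word, Resolve(word, previousTag)));
            }
            result.Add(tagged);
        }
        return result;
    }

    private static string Resolve(string word, string previousTag)
    {
        if (!Lexicon.TryGetValue(word, out var candidates))
            return "UNKNOWN";                        // word is not in the dictionary
        if (candidates.Length == 1)
            return candidates[0];                    // listed once: use it (simplification)

        // Assumed rule for illustration: directly after a pronoun, prefer the verb reading.
        if (previousTag == "PRON" && candidates.Contains("VERB"))
            return "VERB";

        return candidates[0];                        // fallback: first listed reading
    }
}
```

With this toy dictionary, "lead" in the already tokenized sentence "That dumbbell was as heavy as lead" resolves to NOUN, and in "They lead the parade through town" it resolves to VERB. A real tagger would need many more rules, and each one would sit where the single pronoun rule sits here.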

Building Your Own Natural Language Processor - Tokenize

Now that we have sentences, we need to break them into words. This phase is called "Tokenize": the process of taking each sentence and separating it into "words", or tokens. For a basic provider you can do a simple split into words. These functions can be found in the CSHARP.Text repository, but I have placed them here as well.

        /// <summary>
        /// Splits a string into its words for manipulation
        /// </summary>
        /// <param name="toSplit">String to split into words</param>
        /// <returns></returns>
        /// <remarks>Uses default values to split words</remarks>
        public List<string> SplitStringIntoWords(string toSplit)
        {
            return SplitStringIntoWords(...
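As a rough illustration of the basic split described above, here is a self-contained sketch. It is not the CSHARP.Text code: the class name, the method name `SplitIntoTokens` and the separator list are assumptions, since the default values used by the original function are not shown in the text. It follows the same pattern of a one-argument overload that delegates to a more general one.

```csharp
using System;
using System.Collections.Generic;

public static class TokenizeSketch
{
    // Assumed separators; the defaults used by the original library are not shown.
    private static readonly char[] DefaultSeparators =
        { ' ', '\t', '\r', '\n', ',', '.', ';', ':', '!', '?', '"', '(', ')' };

    // Splits a sentence into tokens using the assumed default separators.
    public static List<string> SplitIntoTokens(string toSplit)
    {
        return SplitIntoTokens(toSplit, DefaultSeparators);
    }

    // Splits a sentence into tokens on the given separators, dropping empty entries.
    public static List<string> SplitIntoTokens(string toSplit, char[] separators)
    {
        if (string.IsNullOrWhiteSpace(toSplit))
            return new List<string>();

        return new List<string>(
            toSplit.Split(separators, StringSplitOptions.RemoveEmptyEntries));
    }
}
```

Because the apostrophe is not in the separator list, a contraction such as "don't" stays a single token. Whether that is the right choice depends on how the dictionary lookup in the next phase expects to see such words.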