The Importance of the Right Metadata at the Right Time

As I dig deeper into research on Digital Asset Management, I keep hearing a lot of talk about metadata and the importance of adding as much of it as you can to assets.

On a quick tangent (or shameless plug), I have started DAM Guild as a mentoring community so we can learn more about asset management solutions and best practices for managing our assets. Here is how you can take part:

LinkedIn: Join Dam Guild LinkedIn group
Facebook: Join Dam Guild Facebook group
Twitter: Follow @DamGuild

After you join one of the above, continue reading below.

Some people may see an asset as content or a product, but metadata applies to any type of item, digital or physical. In data science, metadata is very important, but if you let it, a single item can end up with millions of pieces of metadata associated with it, and processing all of that is a lot of work.

Those familiar with data science know that each piece of metadata assigned to an item is called a feature, and the challenge is knowing which features are important and which can be ignored. Let's do a little simple math. Say you assign 20 features to each item, and you have 1 million items to process. That gives you 20 million pieces of metadata to process. If you look at the data set and determine that 2 features (pieces of metadata) are irrelevant, you save yourself 2 million pieces of metadata to process. However, if you remove the wrong features, you taint the data set and get the wrong results.
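The arithmetic above is simple enough to sketch in a few lines of Python. The function name metadata_values is made up for illustration; the numbers are the ones from the example.

```python
def metadata_values(num_items: int, num_features: int) -> int:
    """Total pieces of metadata to process: one value per feature per item."""
    return num_items * num_features


items = 1_000_000

total = metadata_values(items, 20)          # all 20 features kept
after_pruning = metadata_values(items, 18)  # 2 irrelevant features dropped
saved = total - after_pruning

print(f"before pruning: {total:,}")
print(f"after pruning:  {after_pruning:,}")
print(f"saved:          {saved:,}")
```

The savings scale with the number of items, which is why dropping even a couple of features matters on a large data set. The code only counts values, though; it says nothing about whether the 2 features you dropped were the right ones.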

Removing features is a fine line to walk, which is why you need an experienced data scientist to help. If you need help with that, I can introduce you to a team that does it all the time.

For DAIN and DIANA, we collect all the metadata we can on every item, then tag each feature with its own metadata describing when it is most useful. That helps us narrow down which feature to use when, and it is how we optimize. Currently the feature metadata is created by a human, but as we progress, the metadata on a feature could become more automated as well.
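Here is a rough sketch of what "metadata about a feature" could look like. This is not how DAIN or DIANA actually store it; the class FeatureInfo, the useful_for tags, and the feature names are all hypothetical and only illustrate the idea of keeping every feature while choosing a subset per task.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class FeatureInfo:
    """Metadata about a feature (hypothetical structure)."""
    name: str
    useful_for: FrozenSet[str]  # tasks where this feature is most useful


def select_features(catalog: List[FeatureInfo], task: str) -> List[str]:
    """Return the names of features tagged as useful for the given task."""
    return [f.name for f in catalog if task in f.useful_for]


def project(item: Dict[str, str], catalog: List[FeatureInfo], task: str) -> Dict[str, str]:
    """Keep only the item's features that matter for the task.

    The original item is left untouched, so nothing is thrown away.
    """
    keep = select_features(catalog, task)
    return {name: item[name] for name in keep if name in item}


# Feature metadata, written by a human for now (made-up tags).
catalog = [
    FeatureInfo("file_type", frozenset({"search", "archiving"})),
    FeatureInfo("capture_date", frozenset({"archiving"})),
    FeatureInfo("dominant_color", frozenset({"search"})),
]

# One item with everything we collected about it.
item = {"file_type": "jpeg", "capture_date": "2019-03-01", "dominant_color": "blue"}

search_view = project(item, catalog, "search")
print(search_view)
```

The point of the design is that the full item stays in storage and only the view handed to a given task is narrowed, so a bad call about which features matter can be undone later by changing the tags.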

If you are starting a data science project, it is important to collect all the data from the start. Remember, you can always throw data out, but collecting it later is more difficult and most of the time impossible.
