Semantic search, Classification and Data migration: the winning team

Seven years ago, I was given the mission of migrating data from one system to another. One of the major challenges was to import parts into a hierarchy of categories that had been designed for the new system. The legacy data had its own hierarchy, but it contained more than 900 entries, so users had mostly picked a wrong category, which made this information totally unreliable.

So we decided to try using the description field of the parts to classify them, guessing that users had chosen meaningful words to describe their objects. The method still had to be found.

So I devised an algorithm to do so. The method was to analyze the words of the description and compare them to a dictionary, applying to each word a multiplication factor that depended on its position in the description. In parallel, I built the technical dictionary by analyzing the descriptions of roughly 500,000 parts and finding the most used words.
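The idea described above can be sketched in a few lines of Python. This is only an illustration, not the original implementation: the dictionary entries, the category names and the function names are hypothetical, and the rule that earlier words weigh more than later ones is an assumption, since the exact position factor is not given here.

```python
import re
from collections import Counter, defaultdict

# Hypothetical dictionary: word -> category in the new system.
DICTIONARY = {
    "screw": "Fasteners",
    "bolt": "Fasteners",
    "washer": "Fasteners",
    "valve": "Hydraulics",
    "pump": "Hydraulics",
    "cable": "Electrical",
}


def tokenize(description):
    """Lower-case the description and split it into alphabetic words."""
    return re.findall(r"[a-z]+", description.lower())


def position_factor(index):
    """Assumed weighting: the first word counts most, later words less."""
    return 1.0 / (index + 1)


def classify(description, dictionary=DICTIONARY):
    """Return the best-scoring category, or None if no word is known."""
    scores = defaultdict(float)
    for index, word in enumerate(tokenize(description)):
        category = dictionary.get(word)
        if category is not None:
            scores[category] += position_factor(index)
    if not scores:
        return None
    return max(scores, key=scores.get)


def most_used_words(descriptions, top_n):
    """Count word frequencies across descriptions to seed a dictionary."""
    counts = Counter()
    for description in descriptions:
        counts.update(tokenize(description))
    return counts.most_common(top_n)
```

With this sketch, `classify("SCREW M6x20 for pump housing")` picks the category of "screw" because it comes first, and a description containing no known word returns `None`, which corresponds to a part left for manual classification.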

I showed that more than 75% of the parts could be automatically migrated using this algorithm. For the remaining 25%, I built an application that listed the parts to classify together with the possible categories available in the new system, and we asked experts to classify those parts manually. Having done that, I enriched my dictionary with new words whose meaning I had not been able to guess (including some funny ones…). With the new dictionary, we were able to automatically classify more than 90% of the parts.

Then we set up an automatic procedure using this algorithm to migrate data at night from the legacy system to the new one, as it had been decided to run both systems in parallel for a given period of time. This procedure ran for one year, until all project data had been migrated to the new system. Then the migration system was stopped and archived. I had created a semantic search engine without knowing it.

Years later, I now have to implement a search engine based on the Exalead search engine. This technology implements semantic options, and hopefully I can reuse the dictionary I built seven years ago to add more value to this new technology.

My conclusion today is that there are several lessons I learnt from this experience:

  • semantic search can help migrate data
  • semantic search can help classify data
  • data migration activity can bring value for future activities
  • companies should pay attention to building technical dictionaries that compile the words users use every day

Do we need SBA for Product data?

I recently had the chance to work on a project to overhaul part of the information system. This project brought together several applications serving different user communities such as engineering, program, purchasing and more. One of its objectives was to improve access to data across the different applications. Indeed, an average user commonly needs to find quickly the data he is searching for, even his own data. Working with an increasing number of applications that generate more and more data, users start to feel anxious about simply finding the data they generated some months ago. And this is even truer of data generated by others.

It was strange at first to admit that most of the existing applications contained their own search engine but were not able to satisfy people, and that only basic search methods were used because they are known and safe, like searching by number. This method is a bit frustrating for most people, who do not know the numbers. The other approach is to build classical search methods based on navigation paths across the application, paths that users simply cannot manage if they do not use the application every day. One solution is to implement those complex search paths as specific search functions, but this does not satisfy all needs and constraints.

So we turned our attention to a new technology, Exalead. This application came from a completely different world, the web world, where the problem is to manage huge volumes of data from different sources. It does not claim to provide a perfect result, any more than any existing search engine does, but it lets the user filter the data using qualifiers found in the indexed data.

Then it brought back to me numerous old situations where I was trying to figure out how to design data models and the associated business processes, but when in the end I wanted to retrieve the data created, I was simply not able to, because the way to search data had to be the one chosen when designing the application: by number, by state,…

With that application, I could imagine searching data in a transversal way, which was not the way the application was designed, but the way the users had used the application to enter data, which may be different. Whether they enter the name of a customer in the dedicated field or in the description of the object does not matter. And we know how much time we spend defining methodologies and control methods, telling the user to enter data following very specific rules, with a dash and no space, no slash, with # between the first and second terms, and so on.
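To make the idea of a transversal search concrete, here is a minimal Python sketch. It is not how Exalead works internally: it is a plain case-insensitive substring match over every field of a record, and the records, field names and function names are all hypothetical.

```python
# Hypothetical part records; the field names are illustrative only.
PARTS = [
    {"number": "P-001", "customer": "Acme", "description": "Bracket, left side"},
    {"number": "P-002", "customer": "", "description": "Housing for ACME pump"},
    {"number": "P-003", "customer": "Globex", "description": "Cable harness"},
]


def search_any_field(records, term):
    """Return (record, matching field names) for records containing term in any field."""
    needle = term.casefold()
    results = []
    for record in records:
        fields = [name for name, value in record.items()
                  if needle in str(value).casefold()]
        if fields:
            results.append((record, fields))
    return results


def facet_counts(results):
    """Count the fields in which hits were found, usable as filter qualifiers."""
    counts = {}
    for _, fields in results:
        for name in fields:
            counts[name] = counts.get(name, 0) + 1
    return counts
```

Searching for "acme" returns both the part where the customer was entered in the dedicated field and the part where it only appears in the description, and `facet_counts` shows where the hits came from so that the user can narrow the result afterwards.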

Then it brought me confusion and made me take a step back. For sure, the Google experience came first, and for personal usage we are now used to a single search field to find anything. But one objective of PLM applications is usually to enable data reuse, and search capability is a key tool for that purpose. This new way of retrieving data could be a source of innovation for product development activities, because when we think of reuse, we think of reusing exactly the same data. In fact, a lot of data exists today, and while in the past it was natural to start by brainstorming with a group of experts to design a new product and get ideas from them, now part of the building blocks for a new product are clearly available in the cloud, private or public. The important thing now is to have the right tool to find this data, or at least similar data, and not to spend too much time searching for it only to recreate it in the end.

But the most exciting characteristic of Exalead was the ability to investigate how to simplify the semantic challenges that a company with many sites worldwide has to manage every day, with many languages and many different wordings. This may lead to a new trend, building CPI, Cloud based Product Innovation.

So I see clear opportunities in implementing SBA for Product data in the future. What do you think?

Is it easy to find data in a PLM application?


Do you know the deep web? From Wikipedia: “The deep Web “…” refers to World Wide Web content that is not part of the surface Web, which is indexed by standard search engines”. In short, these are web pages that we cannot access through our preferred search engines.

Oleg pointed out some time ago that PLM apps could not help with searching the deep corporate web! What about deep PLM? While PLM apps always claim to contain the product knowledge, the corporate data or other major information inside the company, it may be quite difficult to find data in a PLM application. The claim of the PLM applications is true; the question is: “Can I find the data I am looking for in the PLM app?” And I have to admit that, as our PLM apps are quite complex, the typical user may encounter difficulties finding what he is looking for.

Still following Wikipedia, the first reason why the user cannot find the data he is searching for is: “Dynamic content: dynamic pages which are returned in response to a submitted query or accessed only through a form, especially if open-domain input elements (such as text fields) are used; such fields are hard to navigate without domain knowledge”.

Domain knowledge. Here we are. A PLM app usually “simply” encodes business processes that are supposed to be well known by the entire user community. Nobody is supposed to ignore the law. Yes. Unfortunately, laws are not always known, the user is guilty, and his penalty will be not finding what he is searching for. The second penalty may be having to re-create the data…

I would like to share an idea with you: provide easy access to data in PLM applications. Just one possible direction to make it easy:

Quick Access

This panel introduces domain knowledge and may implement complex queries, bringing results to the user, who should know how to enter the query but does not know how to browse the data to reach the result.

What do you think?