In a recent ODBMS.org interview, Adam Kocoloski, founder and CTO of Cloudant, described the lack of accurate simulations from machine-learning algorithms as the most difficult challenge in filtering Big Data to find useful information.

“People use machine-learning algorithms in many fields, and they don’t always understand the caveats of building in an appropriate training data set,” he said.

Kocoloski explained that people who apply training data without fully understanding how the process works will not realize when they have loaded improper data into their machine-learning algorithms.

When asked whether machine learning was the right way to analyze Big Data, Kocoloski said it is not a solution in itself, but that it goes beyond what any manually constructed analysis can do and offers the possibility of improving the signal-to-noise ratio.

“The potential is there, but you have to balance it with the need to understand the training data set,” he said.

“Algorithms have weak points. They have places where they fail. When you’re applying various machine-learning analyses, it’s important that you understand where those weak points are.”

The full interview is available on ODBMS.org.