Fujitsu Laboratories Ltd. says it has developed a machine-learning technology that can generate highly accurate predictive models from datasets of more than 50 million records in a matter of hours.
Current techniques for generating highly accurate predictive models need to examine every combination of learning algorithm and configuration, taking more than one week to learn from a dataset containing 50 million records.
Fujitsu Laboratories has now developed a technology that estimates machine-learning results from a small sample of the data and from the accuracy of past predictive models, extracts the combination of learning algorithm and configuration that produces the most accurate result, and applies that combination to the full dataset. This yields highly accurate predictive models from datasets of 50 million records in a few hours.
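The article does not disclose Fujitsu's actual method, but the core idea of evaluating candidate algorithm/configuration pairs on a small sample before committing to the full dataset can be sketched as follows. Everything here is illustrative: the candidate "algorithms" (`majority_class`, `threshold_rule`) and the function `select_on_sample` are hypothetical stand-ins, not Fujitsu's implementation.

```python
import random

# Hypothetical stand-ins for real learning algorithms: each candidate is a
# function that fits on (X, y) and returns a prediction function.
def majority_class(X, y):
    pred = max(set(y), key=y.count)  # always predict the most common label
    return lambda x: pred

def threshold_rule(cut):
    # A one-feature threshold classifier with a tunable configuration `cut`.
    def fit(X, y):
        return lambda x: 1 if x[0] > cut else 0
    return fit

def select_on_sample(candidates, X, y, sample_size=100, seed=0):
    """Fit every candidate on a small random sample of the data and keep the
    one with the highest held-out accuracy on the rest of the sample; the
    winner would then be retrained on the full dataset."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(X)), min(sample_size, len(X)))
    split = len(idx) // 2
    train, test = idx[:split], idx[split:]
    best_name, best_fit, best_acc = None, None, -1.0
    for name, fit in candidates.items():
        model = fit([X[i] for i in train], [y[i] for i in train])
        acc = sum(model(X[i]) == y[i] for i in test) / len(test)
        if acc > best_acc:
            best_name, best_fit, best_acc = name, fit, acc
    return best_name, best_fit
```

Because only the small sample is used for selection, the expensive full-dataset training runs once, for the single winning combination, rather than once per candidate.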
Predictive models produced by this technology can work to quickly make improvements, such as minimizing membership cancellations on e-commerce websites and enhancing response times to equipment failures.
Details of this technology are being presented at the workshop on Information-Based Induction Sciences and Machine Learning (IBISML), opening Monday, September 14 at Ehime University in Japan.
The popularity of smartphones and other advances make it possible to gather massive quantities of sensor data, and machine learning and other advanced analytic techniques are being used extensively to extract valuable information from that data. Using the access logs of e-commerce websites, for example, it is possible to discover when people are most likely to cancel memberships on a given website, to identify those people quickly, and to take measures to discourage cancellation.
Using detailed daily power-consumption data, it is possible to discover patterns of increased or decreased usage and to predict periods and times when power usage will increase. This can lead to a reduction in power costs by applying more precise controls over power generation, transmission, and storage. Developing predictive models by machine learning is considered an effective way to obtain accurate predictions.
There are numerous machine-learning algorithms, each suited to a different purpose, and they differ in both predictive accuracy and run time. Which algorithm produces the best accuracy depends on the data being analyzed, and getting the most accurate predictions also depends on fine-tuning that algorithm's configuration. Generating an effective predictive model therefore requires examining combinations of algorithms and configurations.
Attempting to examine every possible combination of algorithm and configuration causes the number of combinations to balloon quickly, and evaluating even a single combination can take days, making exhaustive search impractical. Instead, algorithms and configurations are typically selected by analysts based on their experience, so the results ultimately depend heavily on the analyst's skill.
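To make the combinatorial growth concrete, here is a small illustrative search space (the algorithms and parameter values are hypothetical, chosen only to show the arithmetic): even a few algorithms with a handful of settings each multiply into dozens of full training runs.

```python
# Hypothetical search space: three algorithms, each with a few tunable
# settings. The grid size is the product of the value counts per algorithm,
# summed over algorithms.
search_space = {
    "random_forest":     {"n_trees": [100, 300, 1000],
                          "max_depth": [4, 8, 16, None]},          # 3*4 = 12
    "gradient_boosting": {"learning_rate": [0.01, 0.05, 0.1],
                          "n_rounds": [100, 500, 1000],
                          "max_depth": [3, 6]},                    # 3*3*2 = 18
    "svm":               {"C": [0.1, 1, 10, 100],
                          "kernel": ["linear", "rbf"]},            # 4*2 = 8
}

def count_combinations(space):
    """Total number of (algorithm, configuration) pairs in the grid."""
    total = 0
    for params in space.values():
        n = 1
        for values in params.values():
            n *= len(values)
        total += n
    return total

print(count_combinations(search_space))  # 12 + 18 + 8 = 38
```

At even a few hours of training per combination on a 50-million-record dataset, 38 combinations already amounts to roughly a week of compute, which is why exhaustive examination does not scale.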
When the volume of data is large enough that analysis would run overnight or longer, examination is usually limited to a restricted number of combinations, or analysis is applied to only a small portion of the data. As a result, accurate predictive models cannot be derived automatically within a limited period of time.