Data Mining: IARPA Seeks New Automatic Machine Learning Solutions


“The U.S. Intelligence Advanced Research Projects Activity (IARPA) announced that it is looking for new ideas that may become the basis of cutting-edge machine-learning projects. ‘In many application areas, the amount of data to be analyzed has been increasing exponentially [sensors, audio and video, social network data, Web information], stressing even the most efficient procedures and most powerful processors,’ according to IARPA. ‘Most of these data are unorganized and unlabeled and human effort is needed for annotation and to focus attention on those data that are significant.’ IARPA’s request for information asks about proposed methods for the automation of architecture and algorithm selection and combination, feature engineering, and training data scheduling, as well as compelling reasons to use such approaches in a scalable multi-modal analytic system and whether supporting technologies are readily available. IARPA says that innovations in hierarchical architectures such as Deep Belief Nets and hierarchical clustering will be needed for useful automatic machine-learning systems. It wants to identify promising areas for investment and plans to hold a machine learning workshop in March 2012.”

IARPA: “Machine learning (ML) is used extensively in application areas of interest to IARPA including speech, language, vision, sensor processing, and multi-modal integration. Typically, expert practitioners in ML select appropriate architectures and algorithms for the application domain, performance requirements, and data characteristics of the problem at hand. Additionally, they engineer an appropriate set of features to be extracted from the data for use in the system design. Then, depending on the problem, data may be selected for training and scheduled for presentation to the system according to the requirements of the task. In some application areas, the data needed for training are extremely sparse, consisting of only a few instances, and important information may be missing, requiring the application of supplementary information and real-world knowledge for intelligent inference. In many other application areas, the amount of data to be analyzed has been increasing exponentially (sensors, audio and video, social network data, web information) stressing even the most efficient procedures and most powerful processors. Most of these data are unorganized and unlabeled and human effort is needed for annotation and to focus attention on those data that are significant.

The focus of this RFI is on recent advances toward automatic machine learning, including automation of architecture and algorithm selection and combination, feature engineering, and training data scheduling for usability by non-experts, as well as scalability for handling large volumes of data. Useful automatic machine learning systems will require significant innovations in the science and technology of machine learning, possibly including (but not limited to) hierarchical architectures like Deep Belief Nets and hierarchical clustering, methods for parallelization of computation, attentional mechanisms for focusing on data of significance, methods for transfer of previously learned knowledge to a new task, methods for incorporation of real-world knowledge to include human advisors and one-shot learning methods, methods to include different temporal scales and the effects of causality, the role of goals and environmental feedback in learning, and model selection from approaches like meta-learning.

Responses to this RFI will be used to help focus and organize an interactive workshop of selected machine-learning practitioners with the goal of eliciting plausible next steps through focused presentations of ideas and guided discussions.

Responses to this RFI should be as succinct as possible while providing specific information that addresses the following questions:

1. What are your proposed methods for (a) automation of architecture and algorithm selection and combination, (b) feature engineering, and (c) training data scheduling? How will these automation methods affect the usability of an analytic system by non-experts?

2. What are the compelling reasons to use your proposed approach in a scalable multi-modal analytic system?

3. How will your approach handle different time scales, missing data, and sparse data?

4. How will your approach be applied to diverse data, such as speech, language, vision, sensor processing, and multi-modal integration?

5. How will you supplement training data with real-world and previously learned knowledge?

6. What is known about your proposed approach? Please provide suitable references.

7. What are the appropriate metrics to measure performance?

8. What other solutions are being suggested to overcome the challenges in this RFI?

9. What is the timescale needed to demonstrate progress?

10. What are the data sets and other resources needed?

11. Are supporting technologies readily available or does new technology need to be created?”

Source: FBO via ACM TechNews / Network World

Photo Credits:
Under Surveillance by simonbooth / Flickr
KP “Visual Data Mining” 1 by Bompo / Flickr