In data analysis, it is crucial to select a compact subset of the available features that correctly represents the relevant information contained in the data. This problem, known as feature selection, is receiving increasing attention because the amount of information gathered in many practically important settings is often too large for advanced data mining and data analysis algorithms to handle. Feature selection is particularly important in the analysis of biological data, such as microarray, haplotype, and genotype data, where the number of variables describing each experiment can be extremely large. In such cases the feature selection step plays a key role in the success of the analysis and matters more than the modeling and classification steps. A special class of models proposed for this problem is based on integer programming formulations that represent the retained information through constraints associated with the observed data. Although very precise, these methods have a significant limitation: they rely on a computationally difficult formulation whose size grows rapidly with the size of the data. To solve the resulting combinatorial optimization problem efficiently, we propose a GRASP (Greedy Randomized Adaptive Search Procedure).
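The abstract does not state the integer programming model or the details of the GRASP, so the following is only an illustrative sketch under stated assumptions. It assumes a set-covering style formulation: choose the fewest features such that every pair of samples with different class labels differs on at least one chosen feature. The function names (`build_pairs`, `construct`, `local_search`, `grasp`), the parameter `alpha`, and the redundancy-removal local search are hypothetical choices made for this example, not the author's method.

```python
import random
from itertools import combinations


def build_pairs(X, y):
    """One covering constraint per pair of samples with different labels:
    the set of features on which the two samples differ."""
    pairs = []
    for i, j in combinations(range(len(X)), 2):
        if y[i] != y[j]:
            diff = {f for f in range(len(X[i])) if X[i][f] != X[j][f]}
            if not diff:
                raise ValueError(f"samples {i} and {j} cannot be distinguished")
            pairs.append(diff)
    return pairs


def construct(pairs, n_features, alpha, rng):
    """Greedy randomized construction: repeatedly pick a feature at random
    from a restricted candidate list (RCL) of high-gain features."""
    selected = set()
    uncovered = list(pairs)
    while uncovered:
        # Adaptive step: gain = number of still-uncovered pairs a feature covers.
        gain = [0] * n_features
        for diff in uncovered:
            for f in diff:
                gain[f] += 1
        best = max(gain)
        worst = min(g for g in gain if g > 0)
        threshold = best - alpha * (best - worst)  # alpha=0 is pure greedy
        rcl = [f for f in range(n_features) if gain[f] > 0 and gain[f] >= threshold]
        chosen = rng.choice(rcl)
        selected.add(chosen)
        uncovered = [d for d in uncovered if chosen not in d]
    return selected


def local_search(selected, pairs):
    """Drop any feature whose removal still leaves every pair covered."""
    selected = set(selected)
    for f in sorted(selected):
        trial = selected - {f}
        if all(d & trial for d in pairs):
            selected = trial
    return selected


def grasp(X, y, iterations=50, alpha=0.3, seed=0):
    """Return the smallest feature subset found over several GRASP iterations."""
    rng = random.Random(seed)
    pairs = build_pairs(X, y)
    n_features = len(X[0])
    best = None
    for _ in range(iterations):
        candidate = local_search(construct(pairs, n_features, alpha, rng), pairs)
        if best is None or len(candidate) < len(best):
            best = candidate
    return sorted(best)
```

On a small binary data set given as a list of rows `X` with labels `y`, `grasp(X, y)` returns a sorted list of selected feature indices. The result is a heuristic solution: it satisfies every covering constraint but is not guaranteed to be minimum.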

Problemi di feature selection modellati come problemi di programmazione lineare intera [Feature selection problems modeled as integer linear programming problems] / Festa, Paola. - (2008). (Paper presented at the conference Bioinformatica e Biologia Computazionale in Campania - BBCC 2008, held at the Istituto di Scienze dell'Alimentazione del CNR, Avellino, Italy, on 12 December 2008).

Problemi di feature selection modellati come problemi di programmazione lineare intera [Feature selection problems modeled as integer linear programming problems]

FESTA, PAOLA
2008


Use this identifier to cite or link to this document: https://hdl.handle.net/11588/395196