Publications


Book Chapters

2009

  • F. E. Psomopoulos and P. A. Mitkas, “Data Mining in Proteomics using Grid Computing,” in Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, M. Cannataro, Ed., UK: IGI Global, 2009, pp. 245-267. doi:10.4018/978-1-60566-374-6

    The scope of this chapter is the presentation of Data Mining techniques for knowledge extraction in proteomics, taking into account both the particular features of most proteomics issues (such as data retrieval and system complexity), and the opportunities and constraints found in a Grid environment. The chapter discusses the way new and potentially useful knowledge can be extracted from proteomics data, utilizing Grid resources in a transparent way. Protein classification is introduced as a current research issue in proteomics, which also demonstrates most of the domain-specific traits. An overview of common and custom-made Data Mining algorithms is provided, with emphasis on the specific needs of protein classification problems. A unified methodology is presented for complex Data Mining processes on the Grid, highlighting the different application types and the benefits and drawbacks in each case. Finally, the methodology is validated through real-world case studies, deployed over the EGEE grid environment.

    @incollection{DataMiningProteomicsGrid,
    abstract = {The scope of this chapter is the presentation of Data Mining techniques for knowledge extraction in proteomics, taking into account both the particular features of most proteomics issues (such as data retrieval and system complexity), and the opportunities and constraints found in a Grid environment. The chapter discusses the way new and potentially useful knowledge can be extracted from proteomics data, utilizing Grid resources in a transparent way. Protein classification is introduced as a current research issue in proteomics, which also demonstrates most of the domain-specific traits. An overview of common and custom-made Data Mining algorithms is provided, with emphasis on the specific needs of protein classification problems. A unified methodology is presented for complex Data Mining processes on the Grid, highlighting the different application types and the benefits and drawbacks in each case. Finally, the methodology is validated through real-world case studies, deployed over the EGEE grid environment.},
    address = {UK},
    author = {Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare},
    chapter = {13},
    doi = {10.4018/978-1-60566-374-6},
    editor = {Cannataro, Mario},
    isbn = {9781605663746},
    keywords = {Data Mining,Grid Computing,Protein Classification},
    mendeley-tags = {Data Mining,Grid Computing,Protein Classification},
    month = {may},
    pages = {245--267},
    publisher = {IGI Global},
    title = {{Data Mining in Proteomics using Grid Computing}},
    url = {http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/978-1-60566-374-6},
    year = {2009}
    }

Journals

2017

  • C. Zieliński, M. Stefańczyk, T. Kornuta, M. Figat, W. Dudek, W. Szynkiewicz, W. Kasprzak, J. Figat, M. Szlenk, T. Winiarski, K. Banachowicz, T. Zielińska, E. G. Tsardoulias, A. L. Symeonidis, F. E. Psomopoulos, A. M. Kintsakis, P. A. Mitkas, A. Thallas, S. E. Reppou, G. T. Karagiannis, K. Panayiotou, V. Prunet, M. Serrano, J. Merlet, S. Arampatzis, A. Giokas, L. Penteridis, I. Trochidis, D. Daney, and M. Iturburu, “Variable structure robot control systems: The RAPP approach,” Robotics and autonomous systems, vol. 94, pp. 226-244, 2017. doi:10.1016/j.robot.2017.05.002
    @article{RoboticsAutonomousSystems2017,
    title = {Variable structure robot control systems: The {RAPP} approach},
    journal = {Robotics and Autonomous Systems},
    volume = {94},
    pages = {226--244},
    year = {2017},
    doi = {10.1016/j.robot.2017.05.002},
    url = {http://www.sciencedirect.com/science/article/pii/S0921889016306248},
    author = {Cezary Zieliński and Maciej Stefańczyk and Tomasz Kornuta and Maksym Figat and Wojciech Dudek and Wojciech Szynkiewicz and Włodzimierz Kasprzak and Jan Figat and Marcin Szlenk and Tomasz Winiarski and Konrad Banachowicz and Teresa Zielińska and Emmanouil G. Tsardoulias and Andreas L. Symeonidis and Fotis E. Psomopoulos and Athanassios M. Kintsakis and Pericles A. Mitkas and Aristeidis Thallas and Sofia E. Reppou and George T. Karagiannis and Konstantinos Panayiotou and Vincent Prunet and Manuel Serrano and Jean-Pierre Merlet and Stratos Arampatzis and Alexandros Giokas and Lazaros Penteridis and Ilias Trochidis and David Daney and Miren Iturburu}
    }

  • A. Xanthopoulou, I. Ganopoulos, F. Psomopoulos, M. Manioudaki, T. Moysiadis, A. Kapazoglou, M. Osathanunkul, S. Michailidou, A. Kalivas, A. Tsaftaris, I. Nianiou-Obeidat, and P. Madesis, “De novo comparative transcriptome analysis of genes involved in fruit morphology of pumpkin cultivars with extreme size difference and development of EST-SSR markers,” Gene, 2017. doi:10.1016/j.gene.2017.04.035

    The genetic basis of fruit size and shape was investigated for the first time in Cucurbita species and genetic loci associated with fruit morphology have been identified. Although extensive genomic resources are available at present for tomato (Solanum lycopersicum), cucumber (Cucumis sativus), melon (Cucumis melo) and watermelon (Citrullus lanatus), genomic databases for Cucurbita species are limited. Recently, our group reported the generation of pumpkin (Cucurbita pepo) transcriptome databases from two contrasting cultivars with extreme fruit sizes. In the current study we used these databases to perform comparative transcriptome analysis in order to identify genes with potential roles in fruit morphology and fruit size. Differential Gene Expression (DGE) analysis between cv. ‘Munchkin’ (small-fruit) and cv. ‘Big Moose’ (large-fruit) revealed a variety of candidate genes associated with fruit morphology with significant differences in gene expression between the two cultivars. In addition, we have set the framework for generating EST-SSR markers, which discriminate different C. pepo cultivars and show transferability to related Cucurbitaceae species. The results of the present study will contribute to both further understanding the molecular mechanisms regulating fruit morphology and furthermore identifying the factors that determine fruit size. Moreover, they may lead to the development of molecular marker tools for selecting genotypes with desired morphological traits.

    @article{CucurbitaGene2017,
    title = {De novo comparative transcriptome analysis of genes involved in fruit morphology of pumpkin cultivars with extreme size difference and development of {EST-SSR} markers},
    journal = {Gene},
    year = {2017},
    issn = {0378-1119},
    doi = {10.1016/j.gene.2017.04.035},
    url = {http://www.sciencedirect.com/science/article/pii/S037811191730286X},
    author = {Aliki Xanthopoulou and Ioannis Ganopoulos and Fotis Psomopoulos and Maria Manioudaki and Theodoros Moysiadis and Aliki Kapazoglou and Maslin Osathanunkul and Sofia Michailidou and Apostolos Kalivas and Athanasios Tsaftaris and Irini Nianiou-Obeidat and Panagiotis Madesis},
    abstract = {The genetic basis of fruit size and shape was investigated for the first time in Cucurbita species and genetic loci associated with fruit morphology have been identified. Although extensive genomic resources are available at present for tomato (Solanum lycopersicum), cucumber (Cucumis sativus), melon (Cucumis melo) and watermelon (Citrullus lanatus), genomic databases for Cucurbita species are limited. Recently, our group reported the generation of pumpkin (Cucurbita pepo) transcriptome databases from two contrasting cultivars with extreme fruit sizes. In the current study we used these databases to perform comparative transcriptome analysis in order to identify genes with potential roles in fruit morphology and fruit size. Differential Gene Expression (DGE) analysis between cv. ‘Munchkin’ (small-fruit) and cv. ‘Big Moose’ (large-fruit) revealed a variety of candidate genes associated with fruit morphology with significant differences in gene expression between the two cultivars. In addition, we have set the framework for generating EST-SSR markers, which discriminate different C. pepo cultivars and show transferability to related Cucurbitaceae species. The results of the present study will contribute to both further understanding the molecular mechanisms regulating fruit morphology and furthermore identifying the factors that determine fruit size. Moreover, they may lead to the development of molecular marker tools for selecting genotypes with desired morphological traits.}
    }

  • F. E. Psomopoulos, D. M. Vitsios, S. Baichoo, and C. A. Ouzounis, “BioPAXViz: a Cytoscape application for the visual exploration of metabolic pathway evolution,” Bioinformatics, pp. 1-3, 2017. doi:10.1093/bioinformatics/btw813
    @article{BioPaxViz2017,
    author = {Psomopoulos, Fotis E. and Vitsios, Dimitrios M. and Baichoo, Shakuntala and Ouzounis, Christos A.},
    title = {{BioPAXViz: a Cytoscape application for the visual exploration of metabolic pathway evolution}},
    journal = {Bioinformatics},
    pages = {1--3},
    doi = {10.1093/bioinformatics/btw813},
    year = {2017}
    }

2016

  • A. M. Kintsakis, F. E. Psomopoulos, and P. A. Mitkas, “Data-aware optimization of bioinformatics workflows in hybrid clouds,” Journal of big data, vol. 3, iss. 20, pp. 1-26, 2016. doi:10.1186/s40537-016-0055-2
    @article{KintsakisBigData2016,
    author = {Kintsakis, Athanassios M. and Psomopoulos, Fotis E. and Mitkas, Pericles A.},
    title = {{Data-aware optimization of bioinformatics workflows in hybrid clouds}},
    journal = {Journal of Big Data},
    volume = {3},
    number = {20},
    keywords = {Cloud computing, Component-based workflows, Bioinformatics, Big data management, Hybrid cloud, Comparative genomics},
    pages = {1--26},
    doi = {10.1186/s40537-016-0055-2},
    year = {2016},
    url = {https://journalofbigdata.springeropen.com/articles/10.1186/s40537-016-0055-2}
    }

  • A. Zambounis, F. E. Psomopoulos, I. Ganopoulos, E. Avramidou, F. A. Aravanopoulos, A. Tsaftaris, and P. Madesis, “In silico analysis of the LRR receptor-like serine threonine kinases subfamily in Morus notabilis,” Plant omics journal, vol. 9, iss. 5, pp. 319-326, 2016. doi:10.21475/poj.09.05.16.pne126
    @article{PlantOmics2016,
    author = {Zambounis, Antonios and Psomopoulos, Fotis E. and Ganopoulos, Ioannis and Avramidou, Evangelia and Aravanopoulos, Filippos A. and Tsaftaris, Athanasios and Madesis, Panagiotis},
    journal = {Plant Omics Journal},
    volume = {9},
    number = {5},
    pages = {319--326},
    issn = {18363644},
    month = {jun},
    title = {{In silico analysis of the LRR receptor-like serine threonine kinases subfamily in Morus notabilis}},
    doi = {10.21475/poj.09.05.16.pne126},
    year = {2016},
    url = {http://www.pomics.com/zambounis_9_5_2016_319_326.pdf}
    }

  • E. G. Tsardoulias, A. M. Kintsakis, K. Panayiotou, A. G. Thallas, S. E. Reppou, G. T. Karagiannis, M. Iturburu, S. Arampatzis, C. Zielinski, V. Prunet, F. E. Psomopoulos, A. L. Symeonidis, and P. A. Mitkas, “Towards an integrated robotics architecture for social inclusion – The RAPP paradigm,” Journal of cognitive systems research, pp. 1-12, 2016. doi:10.1016/j.cogsys.2016.08.004

    Scientific breakthroughs have led to an increase in life expectancy, to the point where senior citizens comprise an ever increasing percentage of the general population. In this direction, the EU funded RAPP project “Robotic Applications for Delivering Smart User Empowering Applications” introduces socially interactive robots that will not only physically assist, but also serve as a companion to senior citizens. The proposed RAPP framework has been designed aiming towards a cloud-based integrated approach that enables robotic devices to seamlessly deploy robotic applications, relieving the actual robots from computational burdens. The Robotic Applications (RApps) developed according to the RAPP paradigm will empower consumer social robots, allowing them to adapt to versatile situations and materialize complex behaviors and scenarios. The RAPP pilot cases involve the development of RApps for the NAO humanoid robot and the ANG-MED rollator targeting senior citizens that (a) are technology illiterate, (b) have been diagnosed with mild cognitive impairment or (c) are in the process of hip fracture rehabilitation. Initial results establish the robustness of RAPP in addressing the needs of end users and developers, as well as its contribution in significantly increasing the quality of life of senior citizens.

    @article{Tsardoulias2016,
    author={Tsardoulias, Emmanouil G. and Kintsakis, Athanassios M. and Panayiotou, Konstantinos and Thallas, Aris G. and Reppou, Sofia E. and Karagiannis, George T. and Iturburu, Miren and Arampatzis, Stratos and Zielinski, Cezary and Prunet, Vincent and Psomopoulos, Fotis E. and Symeonidis, Andreas L. and Mitkas, Pericles A.},
    title={{Towards an integrated robotics architecture for social inclusion – The RAPP paradigm}},
    journal={Journal of Cognitive Systems Research},
    year={2016},
    pages={1--12},
    abstract={Scientific breakthroughs have led to an increase in life expectancy, to the point where senior citizens comprise an ever increasing percentage of the general population. In this direction, the EU funded RAPP project “Robotic Applications for Delivering Smart User Empowering Applications” introduces socially interactive robots that will not only physically assist, but also serve as a companion to senior citizens. The proposed RAPP framework has been designed aiming towards a cloud-based integrated approach that enables robotic devices to seamlessly deploy robotic applications, relieving the actual robots from computational burdens. The Robotic Applications (RApps) developed according to the RAPP paradigm will empower consumer social robots, allowing them to adapt to versatile situations and materialize complex behaviors and scenarios. The RAPP pilot cases involve the development of RApps for the NAO humanoid robot and the ANG-MED rollator targeting senior citizens that (a) are technology illiterate, (b) have been diagnosed with mild cognitive impairment or (c) are in the process of hip fracture rehabilitation. Initial results establish the robustness of RAPP in addressing the needs of end users and developers, as well as its contribution in significantly increasing the quality of life of senior citizens.},
    doi={10.1016/j.cogsys.2016.08.004},
    url={http://dx.doi.org/10.1016/j.cogsys.2016.08.004}
    }

  • S. E. Reppou, E. G. Tsardoulias, A. M. Kintsakis, A. L. Symeonidis, P. A. Mitkas, F. E. Psomopoulos, G. T. Karagiannis, C. Zielinski, V. Prunet, J. Merlet, M. Iturburu, and A. Gkiokas, “RAPP: A Robotic-Oriented Ecosystem for Delivering Smart User Empowering Applications for Older People,” International journal of social robotics, pp. 1-14, 2016. doi:10.1007/s12369-016-0361-z

    It is a general truth that increase of age is associated with a level of mental and physical decline but unfortunately the former are often accompanied by social exclusion leading to marginalization and eventually further acceleration of the aging process. A new approach in alleviating the social exclusion of older people involves the use of assistive robots. As robots rapidly invade everyday life, the need of new software paradigms in order to address the user’s unique needs becomes critical. In this paper we present a novel architectural design, the RAPP [a software platform to deliver smart, user empowering robotic applications (RApps)] framework that attempts to address this issue. The proposed framework has been designed in a cloud-based approach, integrating robotic devices and their respective applications. We aim to facilitate seamless development of RApps compatible with a wide range of supported robots and available to the public through a unified online store.

    @article{Reppou2016,
    author={Reppou, Sofia E. and Tsardoulias, Emmanouil G. and Kintsakis, Athanassios M. and Symeonidis, Andreas L. and Mitkas, Pericles A. and Psomopoulos, Fotis E. and Karagiannis, George T. and Zielinski, Cezary and Prunet, Vincent and Merlet, Jean-Pierre and Iturburu, Miren and Gkiokas, Alexandros},
    title={{RAPP: A Robotic-Oriented Ecosystem for Delivering Smart User Empowering Applications for Older People}},
    journal={International Journal of Social Robotics},
    year={2016},
    pages={1--14},
    abstract={It is a general truth that increase of age is associated with a level of mental and physical decline but unfortunately the former are often accompanied by social exclusion leading to marginalization and eventually further acceleration of the aging process. A new approach in alleviating the social exclusion of older people involves the use of assistive robots. As robots rapidly invade everyday life, the need of new software paradigms in order to address the user's unique needs becomes critical. In this paper we present a novel architectural design, the RAPP [a software platform to deliver smart, user empowering robotic applications (RApps)] framework that attempts to address this issue. The proposed framework has been designed in a cloud-based approach, integrating robotic devices and their respective applications. We aim to facilitate seamless development of RApps compatible with a wide range of supported robots and available to the public through a unified online store.},
    issn={1875-4805},
    doi={10.1007/s12369-016-0361-z},
    url={http://dx.doi.org/10.1007/s12369-016-0361-z}
    }

  • M. Chatzidimopoulos, F. Psomopoulos, E. E. Malandrakis, I. Ganopoulos, P. Madesis, E. K. Vellios, and P. Drogoudi, “Comparative Genomics of Botrytis cinerea Strains with Differential Multi-Drug Resistance,” Frontiers in plant science, vol. 7, pp. 1-3, 2016. doi:10.3389/fpls.2016.00554
    @article{Chatzidimopoulos2016,
    author = {Chatzidimopoulos, Michael and Psomopoulos, Fotis and Malandrakis, Emmanouil E. and Ganopoulos, Ioannis and Madesis, Panagiotis and Vellios, Evangelos K. and Drogoudi, Pavlina},
    doi = {10.3389/fpls.2016.00554},
    issn = {1664-462X},
    journal = {Frontiers in Plant Science},
    keywords = {Botrytis cinerea,anilinopyrimidines,whole genome,next-generation sequencing,NGS},
    month = {apr},
    pages = {1--3},
    title = {{Comparative Genomics of Botrytis cinerea Strains with Differential Multi-Drug Resistance}},
    url = {http://journal.frontiersin.org/article/10.3389/fpls.2016.00554},
    volume = {7},
    year = {2016}
    }

  • A. Xanthopoulou, F. Psomopoulos, I. Ganopoulos, M. Manioudaki, A. Tsaftaris, I. Nianiou-Obeidat, and P. Madesis, “De novo transcriptome assembly of two contrasting pumpkin cultivars,” Genomics data, vol. 7, pp. 200-201, 2016. doi:10.1016/j.gdata.2016.01.006
    @article{Xanthopoulou2016,
    author = {Xanthopoulou, Aliki and Psomopoulos, Fotis and Ganopoulos, Ioannis and Manioudaki, Maria and Tsaftaris, Athanasios and Nianiou-Obeidat, Irini and Madesis, Panagiotis},
    doi = {10.1016/j.gdata.2016.01.006},
    issn = {22135960},
    journal = {Genomics Data},
    keywords = {Contrasting cultivars,Cucurbita pepo,Pumpkin,RNA-Seq},
    month = {mar},
    pages = {200--201},
    publisher = {Elsevier B.V.},
    title = {{De novo transcriptome assembly of two contrasting pumpkin cultivars}},
    url = {http://linkinghub.elsevier.com/retrieve/pii/S221359601630006X},
    volume = {7},
    year = {2016}
    }

2015

  • A. M. S. Duarte, F. E. Psomopoulos, C. Blanchet, A. M. J. J. Bonvin, M. Corpas, A. Franc, R. C. Jimenez, J. M. de Lucas, T. Nyrönen, G. Sipos, and S. B. Suhr, “Future opportunities and trends for e-infrastructures and life sciences: going beyond the grid to enable life science data analysis,” Frontiers in genetics, vol. 6, p. 197, 2015. doi:10.3389/fgene.2015.00197

    With the increasingly rapid growth of data in life sciences we are witnessing a major transition in the way research is conducted, from hypothesis-driven studies to data-driven simulations of whole systems. Such approaches necessitate the use of large-scale computational resources and e-infrastructures, such as the European Grid Infrastructure (EGI). EGI, one of the key enablers of the digital European Research Area, is a federation of resource providers set up to deliver sustainable, integrated and secure computing services to European researchers and their international partners. Here we aim to provide the state of the art of Grid/Cloud computing in EU research as viewed from within the field of life sciences, focusing on key infrastructures and projects within the life sciences community. Rather than focusing purely on the technical aspects underlying the currently provided solutions, we outline the design aspects and key characteristics that can be identified across major research approaches. Overall, we aim to provide significant insights into the road ahead by establishing ever-strengthening connections between EGI as a whole and the life sciences community.

    @article{Duarte2015,
    abstract = {With the increasingly rapid growth of data in life sciences we are witnessing a major transition in the way research is conducted, from hypothesis-driven studies to data-driven simulations of whole systems. Such approaches necessitate the use of large-scale computational resources and e-infrastructures, such as the European Grid Infrastructure (EGI). EGI, one of the key enablers of the digital European Research Area, is a federation of resource providers set up to deliver sustainable, integrated and secure computing services to European researchers and their international partners. Here we aim to provide the state of the art of Grid/Cloud computing in EU research as viewed from within the field of life sciences, focusing on key infrastructures and projects within the life sciences community. Rather than focusing purely on the technical aspects underlying the currently provided solutions, we outline the design aspects and key characteristics that can be identified across major research approaches. Overall, we aim to provide significant insights into the road ahead by establishing ever-strengthening connections between EGI as a whole and the life sciences community.},
    author = {Duarte, Afonso M. S. and Psomopoulos, Fotis E. and Blanchet, Christophe and Bonvin, Alexandre M. J. J. and Corpas, Manuel and Franc, Alain and Jimenez, Rafael C. and de Lucas, Jesus M. and Nyr{\"{o}}nen, Tommi and Sipos, Gergely and Suhr, Stephanie B.},
    doi = {10.3389/fgene.2015.00197},
    issn = {1664-8021},
    journal = {Frontiers in Genetics},
    keywords = {Big Data,Cloud computing,Grid computing,e-infrastructures,life sciences},
    month = {jun},
    pages = {197},
    pmid = {26157454},
    title = {{Future opportunities and trends for e-infrastructures and life sciences: going beyond the grid to enable life science data analysis}},
    url = {http://journal.frontiersin.org/Article/10.3389/fgene.2015.00197/abstract},
    volume = {6},
    year = {2015}
    }

  • D. M. Vitsios, F. E. Psomopoulos, P. A. Mitkas, and C. A. Ouzounis, “Inference of Pathway Decomposition Across Multiple Species Through Gene Clustering,” International journal on artificial intelligence tools, vol. 24, iss. 01, p. 1540003, 2015. doi:10.1142/S0218213015400035
    @article{Vitsios2015,
    author = {Vitsios, Dimitrios M. and Psomopoulos, Fotis E. and Mitkas, Pericles A. and Ouzounis, Christos A.},
    doi = {10.1142/S0218213015400035},
    issn = {0218-2130},
    journal = {International Journal on Artificial Intelligence Tools},
    month = {feb},
    number = {01},
    pages = {1540003},
    title = {{Inference of Pathway Decomposition Across Multiple Species Through Gene Clustering}},
    url = {http://www.worldscientific.com/doi/abs/10.1142/S0218213015400035},
    volume = {24},
    year = {2015}
    }

2013

  • F. E. Psomopoulos, P. A. Mitkas, and C. A. Ouzounis, “Detection of genomic idiosyncrasies using fuzzy phylogenetic profiles,” PLoS ONE, vol. 8, iss. 1, p. e52854, 2013. doi:10.1371/journal.pone.0052854

    Phylogenetic profiles express the presence or absence of genes and their homologs across a number of reference genomes. They have emerged as an elegant representation framework for comparative genomics and have been used for the genome-wide inference and discovery of functionally linked genes or metabolic pathways. As the number of reference genomes grows, there is an acute need for faster and more accurate methods for phylogenetic profile analysis with increased performance in speed and quality. We propose a novel, efficient method for the detection of genomic idiosyncrasies, i.e. sets of genes found in a specific genome with peculiar phylogenetic properties, such as intra-genome correlations or inter-genome relationships. Our algorithm is a four-step process where genome profiles are first defined as fuzzy vectors, then discretized to binary vectors, followed by a de-noising step, and finally a comparison step to generate intra- and inter-genome distances for each gene profile. The method is validated with a carefully selected benchmark set of five reference genomes, using a range of approaches regarding similarity metrics and pre-processing stages for noise reduction. We demonstrate that the fuzzy profile method consistently identifies the actual phylogenetic relationship and origin of the genes under consideration for the majority of the cases, while the detected outliers are found to be particular genes with peculiar phylogenetic patterns. The proposed method provides a time-efficient and highly scalable approach for phylogenetic stratification, with the detected groups of genes being either similar to their own genome profile or different from it, thus revealing atypical evolutionary histories.

    @article{FuzzyBMCBioinformatics,
    abstract = {Phylogenetic profiles express the presence or absence of genes and their homologs across a number of reference genomes. They have emerged as an elegant representation framework for comparative genomics and have been used for the genome-wide inference and discovery of functionally linked genes or metabolic pathways. As the number of reference genomes grows, there is an acute need for faster and more accurate methods for phylogenetic profile analysis with increased performance in speed and quality. We propose a novel, efficient method for the detection of genomic idiosyncrasies, i.e. sets of genes found in a specific genome with peculiar phylogenetic properties, such as intra-genome correlations or inter-genome relationships. Our algorithm is a four-step process where genome profiles are first defined as fuzzy vectors, then discretized to binary vectors, followed by a de-noising step, and finally a comparison step to generate intra- and inter-genome distances for each gene profile. The method is validated with a carefully selected benchmark set of five reference genomes, using a range of approaches regarding similarity metrics and pre-processing stages for noise reduction. We demonstrate that the fuzzy profile method consistently identifies the actual phylogenetic relationship and origin of the genes under consideration for the majority of the cases, while the detected outliers are found to be particular genes with peculiar phylogenetic patterns. The proposed method provides a time-efficient and highly scalable approach for phylogenetic stratification, with the detected groups of genes being either similar to their own genome profile or different from it, thus revealing atypical evolutionary histories.},
    author = {Psomopoulos, Fotis E and Mitkas, Pericles A and Ouzounis, Christos A},
    doi = {10.1371/journal.pone.0052854},
    issn = {1932-6203},
    journal = {PLoS ONE},
    keywords = {Algorithm,Archaea,Archaea: genetics,Archaeal,Archaeal: genetics,Bacteria,Bacteria: genetics,Bacterial,Bacterial: genetics,Bioinformatics,Fuzzy,Fuzzy Logic,Genes,Genome,Phylogenetic Profiles,Phylogeny,Reproducibility of Results,Species Specificity},
    mendeley-tags = {Algorithm,Bioinformatics,Fuzzy,Phylogenetic Profiles},
    month = {jan},
    number = {1},
    pages = {e52854},
    pmid = {23341912},
    publisher = {PLOS},
    title = {{Detection of genomic idiosyncrasies using fuzzy phylogenetic profiles}},
    url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3544837{\&}tool=pmcentrez{\&}rendertype=abstract},
    volume = {8},
    year = {2013}
    }

2012

  • F. E. Psomopoulos, V. I. Siarkou, N. Papanikolaou, I. Iliopoulos, A. S. Tsaftaris, V. J. Promponas, and C. A. Ouzounis, “The Chlamydiales pangenome revisited: structural stability and functional coherence,” Genes, vol. 3, iss. 2, pp. 291-319, 2012. doi:10.3390/genes3020291

    The entire publicly available set of 37 genome sequences from the bacterial order Chlamydiales has been subjected to comparative analysis in order to reveal the salient features of this pangenome and its evolutionary history. Over 2,000 protein families are detected across multiple species, with a distribution consistent to other studied pangenomes. Of these, there are 180 protein families with multiple members, 312 families with exactly 37 members corresponding to core genes, 428 families with peripheral genes with varying taxonomic distribution and finally 1,125 smaller families. The fact that, even for smaller genomes of Chlamydiales, core genes represent over a quarter of the average protein complement, signifies a certain degree of structural stability, given the wide range of phylogenetic relationships within the group. In addition, the propagation of a corpus of manually curated annotations within the discovered core families reveals key functional properties, reflecting a coherent repertoire of cellular capabilities for Chlamydiales. We further investigate over 2,000 genes without homologs in the pangenome and discover two new protein sequence domains. Our results, supported by the genome-based phylogeny for this group, are fully consistent with previous analyses and current knowledge, and point to future research directions towards a better understanding of the structural and functional properties of Chlamydiales.

    @article{GenesChlamydiae,
    abstract = {The entire publicly available set of 37 genome sequences from the bacterial order Chlamydiales has been subjected to comparative analysis in order to reveal the salient features of this pangenome and its evolutionary history. Over 2,000 protein families are detected across multiple species, with a distribution consistent to other studied pangenomes. Of these, there are 180 protein families with multiple members, 312 families with exactly 37 members corresponding to core genes, 428 families with peripheral genes with varying taxonomic distribution and finally 1,125 smaller families. The fact that, even for smaller genomes of Chlamydiales, core genes represent over a quarter of the average protein complement, signifies a certain degree of structural stability, given the wide range of phylogenetic relationships within the group. In addition, the propagation of a corpus of manually curated annotations within the discovered core families reveals key functional properties, reflecting a coherent repertoire of cellular capabilities for Chlamydiales. We further investigate over 2,000 genes without homologs in the pangenome and discover two new protein sequence domains. Our results, supported by the genome-based phylogeny for this group, are fully consistent with previous analyses and current knowledge, and point to future research directions towards a better understanding of the structural and functional properties of Chlamydiales.},
    author = {Psomopoulos, Fotis E and Siarkou, Victoria I and Papanikolaou, Nikolas and Iliopoulos, Ioannis and Tsaftaris, Athanasios S and Promponas, Vasilis J and Ouzounis, Christos A},
    doi = {10.3390/genes3020291},
    issn = {2073-4425},
    journal = {Genes},
    keywords = {Bioinformatics,Chlamydiales,Pangenome},
    mendeley-tags = {Bioinformatics,Chlamydiales,Pangenome},
    month = {jan},
    number = {2},
    pages = {291--319},
    pmid = {24704919},
    publisher = {MDPI, Basel, Switzerland},
    title = {{The chlamydiales pangenome revisited: structural stability and functional coherence.}},
    url = {http://www.mdpi.com/2073-4425/3/2/291/},
    volume = {3},
    year = {2012}
    }

  • F. E. Psomopoulos and P. A. Mitkas, “Multi-Level Clustering of Phylogenetic Profiles,” International journal on artificial intelligence tools (special issue on bioinformatics and bioengineering), vol. 21, iss. 5, p. 1240023, 2012. doi:10.1142/S0218213012400234
    [BibTeX] [Abstract] [Download PDF]

    The prediction of gene function from genome sequences is one of the main issues in Bioinformatics. Most computational approaches are based on the similarity between sequences to infer gene function. However, the availability of several fully sequenced genomes has enabled alternative approaches, such as phylogenetic profiles. Phylogenetic profiles are vectors which indicate the presence or absence of a gene in other genomes. The main concept of phylogenetic profiles is that proteins participating in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion. In this paper, a multi-level clustering algorithm of phylogenetic profiles is presented, which aims to detect inter- and intra-genome gene clusters.

    @article{MultiLevelClusteringJournal,
    abstract = {The prediction of gene function from genome sequences is one of the main issues in Bioinformatics. Most computational approaches are based on the similarity between sequences to infer gene function. However, the availability of several fully sequenced genomes has enabled alternative approaches, such as phylogenetic profiles. Phylogenetic profiles are vectors which indicate the presence or absence of a gene in other genomes. The main concept of phylogenetic profiles is that proteins participating in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion. In this paper, a multi-level clustering algorithm of phylogenetic profiles is presented, which aims to detect inter- and intra-genome gene clusters.},
    author = {Psomopoulos, Fotis E and Mitkas, Pericles A},
    doi = {10.1142/S0218213012400234},
    issn = {0218-2130},
    journal = {International Journal on Artificial Intelligence Tools (special issue on Bioinformatics and Bioengineering)},
    keywords = {Bioinformatics,Clustering,Data Mining,Phylogenetic Profiles},
    mendeley-tags = {Bioinformatics,Clustering,Data Mining,Phylogenetic Profiles},
    month = {oct},
    number = {5},
    pages = {1240023},
    publisher = {World Scientific Publishing Co.},
    title = {{Multi-Level Clustering of Phylogenetic Profiles}},
    url = {http://www.worldscientific.com/doi/abs/10.1142/S0218213012400234},
    volume = {21},
    year = {2012}
    }

2010

  • F. E. Psomopoulos and P. A. Mitkas, “Bioinformatics algorithm development for Grid environments,” Journal of systems and software, vol. 83, iss. 7, pp. 1249-1257, 2010. doi:10.1016/j.jss.2010.01.051
    [BibTeX] [Abstract] [Download PDF]

    A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher throughput computing by taking advantage of many computers geographically dispersed and connected by a network. Bioinformatics applications stand to gain in such a distributed environment in terms of increased availability, reliability and efficiency of computational resources. There is already considerable research in progress toward applying parallel computing techniques on bioinformatics methods, such as multiple sequence alignment, gene expression analysis and phylogenetic studies. In order to cope with the dimensionality issue, most machine learning methods either focus on specific groups of proteins or reduce the size of the original data set and/or the number of attributes involved. Grid computing could potentially provide an alternative solution to this problem, by combining multiple approaches in a seamless way. In this paper we introduce a unifying methodology coupling the strengths of the Grid with the specific needs and constraints of the major bioinformatics approaches. We also present a tool that implements this process and allows researchers to assess the computational needs for a specific task and optimize the allocation of available resources for its efficient completion.

    @article{Psomopoulos2010a,
    abstract = {A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher throughput computing by taking advantage of many computers geographically dispersed and connected by a network. Bioinformatics applications stand to gain in such a distributed environment in terms of increased availability, reliability and efficiency of computational resources. There is already considerable research in progress toward applying parallel computing techniques on bioinformatics methods, such as multiple sequence alignment, gene expression analysis and phylogenetic studies. In order to cope with the dimensionality issue, most machine learning methods either focus on specific groups of proteins or reduce the size of the original data set and/or the number of attributes involved. Grid computing could potentially provide an alternative solution to this problem, by combining multiple approaches in a seamless way. In this paper we introduce a unifying methodology coupling the strengths of the Grid with the specific needs and constraints of the major bioinformatics approaches. We also present a tool that implements this process and allows researchers to assess the computational needs for a specific task and optimize the allocation of available resources for its efficient completion.},
    author = {Psomopoulos, Fotis E and Mitkas, Pericles A},
    doi = {10.1016/j.jss.2010.01.051},
    journal = {Journal of Systems and Software},
    keywords = {Bioinformatics,Data Mining,Grid Computing,Workflow},
    mendeley-tags = {Bioinformatics,Data Mining,Grid Computing,Workflow},
    number = {7},
    pages = {1249--1257},
    publisher = {Elsevier Inc.},
    title = {{Bioinformatics algorithm development for Grid environments}},
    url = {http://linkinghub.elsevier.com/retrieve/pii/S0164121210000373},
    volume = {83},
    year = {2010}
    }

2009

  • F. E. Psomopoulos, P. A. Mitkas, C. S. Krinas, and I. N. Demetropoulos, “A grid-enabled algorithm yields figure-eight molecular knot,” Molecular simulation, vol. 35, iss. 9, pp. 725-736, 2009. doi:10.1080/08927020902833103
    [BibTeX] [Abstract] [Download PDF]

    The recently proposed general molecular knotting algorithm and its associated package, MolKnot, introduce programming into certain sections of stereochemistry. This work reports the G-MolKnot procedure that was deployed over the grid infrastructure; it applies a divide-and-conquer approach to the problem by splitting the initial search space into multiple independent processes and, combining the results at the end, yields significant improvements with regards to the overall efficiency. The algorithm successfully detected the smallest ever reported alkane configured to an open-knotted shape with four crossings.

    @article{gMolKnotMolSimul,
    abstract = {The recently proposed general molecular knotting algorithm and its associated package, MolKnot, introduce programming into certain sections of stereochemistry. This work reports the G-MolKnot procedure that was deployed over the grid infrastructure; it applies a divide-and-conquer approach to the problem by splitting the initial search space into multiple independent processes and, combining the results at the end, yields significant improvements with regards to the overall efficiency. The algorithm successfully detected the smallest ever reported alkane configured to an open-knotted shape with four crossings.},
    author = {Psomopoulos, Fotis E and Mitkas, Pericles A and Krinas, Christos S and Demetropoulos, Ioannis N},
    doi = {10.1080/08927020902833103},
    issn = {0892-7022},
    journal = {Molecular Simulation},
    keywords = {Data Mining,Grid Computing,Molecular Knots,Molecular Simulation},
    mendeley-tags = {Data Mining,Grid Computing,Molecular Knots,Molecular Simulation},
    month = {aug},
    number = {9},
    pages = {725--736},
    publisher = {Taylor {\&} Francis},
    title = {{A grid-enabled algorithm yields figure-eight molecular knot}},
    url = {http://www.tandfonline.com/doi/abs/10.1080/08927020902833103},
    volume = {35},
    year = {2009}
    }

Conferences

2016

  • A. Via, P. Fernandes, E. Korpelainen, A. M. Duarte, and F. E. Psomopoulos, “E-Infrastructures for the Bioinformatics long tail of science: a user perspective,” in Digital infrastructures for research 2016, Krakow, Poland, 2016, pp. 1-2.
    [BibTeX]
    @incollection{DI4R2016,
    address = {Krakow, Poland},
    author = {Via, Allegra and Fernandes, Pedro and Korpelainen, Eija and Duarte, Afonso MS and Psomopoulos, Fotis E},
    booktitle = {Digital Infrastructures for Research 2016},
    pages = {1--2},
    title = {{E-Infrastructures for the Bioinformatics long tail of science: a user perspective}},
    year = {2016}
    }

  • E. Stergiadis, A. M. Kintsakis, F. E. Psomopoulos, and P. A. Mitkas, “A scalable Grid Computing framework for extensible phylogenetic profile construction,” in 5th mining humanistic data workshop (mhdw2016) in conjunction with the 12th international conference on artificial intelligence applications and innovations (aiai 2016), Thessaloniki, Greece, 2016, pp. 1-8.
    [BibTeX]
    @incollection{12thAIAI,
    address = {Thessaloniki, Greece},
    author = {Stergiadis, Emmanouil and Kintsakis, Athanassios M and Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {5th Mining Humanistic Data Workshop (MHDW2016) in conjunction with the 12th International Conference on Artificial Intelligence Applications and Innovations (AIAI 2016)},
    pages = {1--8},
    title = {{A scalable Grid Computing framework for extensible phylogenetic profile construction}},
    year = {2016}
    }

  • F. E. Psomopoulos, A. M. Kintsakis, and P. A. Mitkas, “A pan-genome approach and application to species with photosynthetic capabilities, [v1; not peer reviewed],” F1000research 2016, vol. 5(ECCB 2016), iss. 2132, p. 1, 2016. doi:10.7490/f1000research.1112964.1
    [BibTeX]
    @article{ECCB2016Pangenomes,
    author = {Psomopoulos, Fotis E and Kintsakis, Athanassios M and Mitkas, Pericles A},
    doi = {10.7490/f1000research.1112964.1},
    journal = {F1000Research 2016},
    number = {2132},
    pages = {1},
    title = {{A pan-genome approach and application to species with photosynthetic capabilities, [v1; not peer reviewed]}},
    volume = {5(ECCB 2016)},
    year = {2016}
    }

  • F. E. Psomopoulos, E. Korpelainen, K. Mattila, and D. Scardaci, “Bioinformatics resources on EGI Federated Cloud, [v1; not peer reviewed],” F1000research 2016, vol. 5(ECCB 2016), iss. 2131, p. 1, 2016. doi:10.7490/f1000research.1112963.1
    [BibTeX]
    @article{ECCB2016FedCloud,
    author = {Psomopoulos, Fotis E and Korpelainen, Eija and Mattila, Kimmo and Scardaci, Diego},
    doi = {10.7490/f1000research.1112963.1},
    journal = {F1000Research 2016},
    number = {2131},
    pages = {1},
    title = {{Bioinformatics resources on EGI Federated Cloud, [v1; not peer reviewed]}},
    volume = {5(ECCB 2016)},
    year = {2016}
    }

2015

  • A. Kintsakis, O. Vrousgou, A. Vardi, M. Karypidou, E. Stalika, E. Korabou, P. P Zerva, A. Anagnostopoulos, A. Chatzidimitriou, K. Stamatopoulos, and F. E. Psomopoulos, “Large-scale detection of sequencing errors using immunogenetic data of β chain T cell receptors (in Greek),” in 26th panhellenic haematology congress, Athens, Greece, 2015, pp. 1-2.
    [BibTeX]
    @incollection{26thPHCErrorCorrection,
    address = {Athens, Greece},
    author = {Kintsakis, Athanassios and Vrousgou, Olga and Vardi, Anna and Karypidou, M and Stalika, Evangelia and Korabou, E and P Zerva, P and Anagnostopoulos, Achilles and Chatzidimitriou, Anastasia and Stamatopoulos, Kostas and Psomopoulos, Fotis E},
    booktitle = {26th Panhellenic Haematology Congress},
    pages = {1--2},
    title = {{Large-scale detection of sequencing errors using immunogenetic data of β chain T cell receptors (in Greek)}},
    year = {2015}
    }

  • S. Ntoufa, N. Papakonstantinou, M. Tsagiopoulou, T. Moisiadis, A. Malousi, D. Papazoglou, K. Pasentsis, F. Psomopoulos, E. Stalika, M. Laidou, N. Maglaveras, A. Anagnostopoulos, and K. Stamatopoulos, “Distinct methylation patterns in subgroups of patients with Chronic Lymphocytic Leukemia reveal TP63 as a new player in clinical aggressiveness ("Arkagathos Gouttas" Prize award),” in 26th panhellenic haematology congress, Athens, Greece, 2015, pp. 1-2.
    [BibTeX]
    @incollection{26thPHCAward,
    address = {Athens, Greece},
    author = {Ntoufa, Stavroula and Papakonstantinou, Nikos and Tsagiopoulou, Maria and Moisiadis, Theodoros and Malousi, Antigoni and Papazoglou, Despoina and Pasentsis, Kostas and Psomopoulos, Fotis and Stalika, Evangelia and Laidou, Mata and Maglaveras, Nikolaos and Anagnostopoulos, Achilles and Stamatopoulos, Kostas},
    booktitle = {26th Panhellenic Haematology Congress},
    pages = {1--2},
    title = {{Distinct methylation patterns in subgroups of patients with Chronic Lymphocytic Leukemia reveal TP63 as a new player in clinical aggressiveness ("Arkagathos Gouttas" Prize award)}},
    year = {2015}
    }

  • P. Mokos, F. E. Psomopoulos, C. Ainali, A. Charalampidou, M. Hadzopoulou-Cladaras, and D. Dafou, “Large integrative analysis for the identification of novel molecular targets in hepatocellular carcinoma,” in 66th congress of the hellenic society of biochemistry and molecular biology, Eugenides Foundation, Athens, Greece, 2015, pp. 1-4.
    [BibTeX]
    @incollection{EEBMB2015,
    address = {Eugenides Foundation, Athens, Greece},
    author = {Mokos, Panagiotis and Psomopoulos, Fotis E. and Ainali, Chrysanthi and Charalampidou, Alexandra and Hadzopoulou-Cladaras, Margarita and Dafou, Dimitra},
    booktitle = {66th Congress of the Hellenic Society of Biochemistry and Molecular Biology},
    pages = {1--4},
    title = {{Large integrative analysis for the identification of novel molecular targets in hepatocellular carcinoma}},
    year = {2015}
    }

  • A. Duarte, F. E. Psomopoulos, G. Sipos, N. Ferreira, and D. Scardaci, “EGI Support for Genome Analysis and Protein Folding – A Virtual Team for Life Sciences,” in International symposium on grids and clouds (isgc) 2015, 2015, p. 2.
    [BibTeX]
    @inproceedings{Duarte2015a,
    author = {Duarte, Afonso and Psomopoulos, Fotis E. and Sipos, Gergely and Ferreira, Nuno and Scardaci, Diego},
    booktitle = {International Symposium on Grids and Clouds (ISGC) 2015},
    pages = {2},
    title = {{EGI Support for Genome Analysis and Protein Folding - A Virtual Team for Life Sciences}},
    year = {2015}
    }

  • F. E. Psomopoulos, O. T. Vrousgou, and P. A. Mitkas, “Large-scale modular comparative genomics: the Grid approach [v1; not peer reviewed],” F1000research 2015, vol. 4(ISCB Com, iss. 377, p. 1, 2015. doi:10.7490/f1000research.1110127.1
    [BibTeX]
    @article{Psomopoulos2015,
    author = {Psomopoulos, Fotis E and Vrousgou, Olga T and Mitkas, Pericles A},
    doi = {10.7490/f1000research.1110127.1},
    journal = {F1000Research 2015},
    number = {377},
    pages = {1},
    title = {{Large-scale modular comparative genomics: the Grid approach [v1; not peer reviewed]}},
    volume = {4(ISCB Com},
    year = {2015}
    }

  • O. T. Vrousgou, F. E. Psomopoulos, and P. A. Mitkas, “A grid-enabled modular framework for efficient sequence analysis workflows,” in 16th international conference on engineering applications of neural networks, Island of Rhodes, 2015, p. 10.
    [BibTeX] [Abstract]

    In the era of Big Data in Life Sciences, efficient processing and analysis of vast amounts of sequence data is becoming an ever daunting challenge. Among such analyses, sequence alignment is one of the most commonly used procedures, as it provides useful insights on the functionality and relationship of the involved entities. Sequence alignment is one of the most common computational bottlenecks in several bioinformatics workflows. We have designed and implemented a time-efficient distributed modular application for sequence alignment, phylogenetic profiling and clustering of protein sequences, by utilizing the European Grid Infrastructure. The optimal utilization of the Grid with regards to the respective modules, allowed us to achieve significant speedups to the order of 1400%.

    @inproceedings{Vrousgou2015,
    abstract = {In the era of Big Data in Life Sciences, efficient processing and analysis of vast amounts of sequence data is becoming an ever daunting challenge. Among such analyses, sequence alignment is one of the most commonly used procedures, as it provides useful insights on the functionality and relationship of the involved entities. Sequence alignment is one of the most common computational bottlenecks in several bioinformatics workflows. We have designed and implemented a time-efficient distributed modular application for sequence alignment, phylogenetic profiling and clustering of protein sequences, by utilizing the European Grid Infrastructure. The optimal utilization of the Grid with regards to the respective modules, allowed us to achieve significant speedups to the order of 1400{\%}.},
    address = {Island of Rhodes},
    author = {Vrousgou, Olga T and Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {16th International Conference on Engineering Applications of Neural Networks},
    keywords = {bioinformatics,comparative genomics,grid computing,modular software engineering,parallel processing,phylogenetic profiles,protein clustering,sequence alignment},
    pages = {10},
    title = {{A grid-enabled modular framework for efficient sequence analysis workflows}},
    year = {2015}
    }

2014

  • F. E. Psomopoulos and P. A. Mitkas, “Algebraic Interpretations Towards Clustering Protein Homology Data,” in Artificial intelligence applications and innovations, ifip advances in information and communication technology volume 437, 2014. doi:10.1007/978-3-662-44722-2_15
    [BibTeX] [Download PDF]
    @incollection{Psomopoulos2014,
    author = {Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {Artificial Intelligence Applications and Innovations, IFIP Advances in Information and Communication Technology Volume 437},
    doi = {10.1007/978-3-662-44722-2_15},
    title = {{Algebraic Interpretations Towards Clustering Protein Homology Data}},
    url = {http://link.springer.com/10.1007/978-3-662-44722-2{\_}15},
    year = {2014}
    }

  • F. E. Psomopoulos and C. A. Ouzounis, “Computation and visualization of ancestral pathway reconstruction and inference,” in Eccb 2014, 2014, p. 52854.
    [BibTeX]
    @inproceedings{Psomopoulos2014a,
    author = {Psomopoulos, Fotis E and Ouzounis, Christos A},
    booktitle = {ECCB 2014},
    pages = {52854},
    title = {{Computation and visualization of ancestral pathway reconstruction and inference}},
    volume = {8},
    year = {2014}
    }

  • F. Psomopoulos, E. Tsardoulias, A. Giokas, C. Zielinski, V. Prunet, I. Trochidis, D. Daney, M. Serrano, L. Courtes, S. Arampatzis, and P. A. Mitkas, “RAPP System Architecture,” in Workshop on assistance and service robotics in a human environment (asrob), in conjunction with the ieee/rsj international conference on intelligent robots and systems (iros’14), Chicago, Illinois, USA, 2014, p. 8.
    [BibTeX]
    @inproceedings{Psomopoulos2014b,
    address = {Chicago, Illinois, USA},
    author = {Psomopoulos, Fotis and Tsardoulias, Emmanouil and Giokas, Alexandros and Zielinski, Cezary and Prunet, Vincent and Trochidis, Ilias and Daney, David and Serrano, Manuel and Courtes, Ludovic and Arampatzis, Stratos and Mitkas, Pericles A},
    booktitle = {Workshop on Assistance and Service Robotics in a Human Environment (ASROB), in conjunction with the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'14)},
    pages = {8},
    title = {{RAPP System Architecture}},
    year = {2014}
    }

2012

  • F. E. Psomopoulos, A. Argiriou, A. S. Tsaftaris, and C. A. Ouzounis, “A Pangenome Analysis of the Glucosinolate Pathway in Plants,” in 7th conference of the hellenic society for computational biology & bioinformatics – hscbb ’12, FORTH, Heraklion, Crete, Greece, 2012, p. 63.
    [BibTeX] [Abstract]

    Glucosinolates are active compounds that contain sulfur and nitrogen, found as secondary metabolites in certain plant taxa and especially in cruciferous vegetables, such as the Brassicales family. They remain inert unless the plant receives damage and they are brought into contact with the enzyme myrosinase, releasing glucose and highly reactive breakdown products such as isothiocyanates, nitriles, epithionitriles and thiocyanates. Several decades ago glucosinolate breakdown products were considered only as natural toxicants and used accordingly (e.g. as natural pesticides). However, the situation changed in 1992 after the identification and purification of sulforaphane (from glucoraphanin) exhibiting anticarcinogenic properties. Our goal is to determine the qualitative and quantitative characteristics of the glucosinolate biosynthesis pathway across the entire available plant pangenome.

    @inproceedings{HSCBB12Gluco,
    abstract = {Glucosinolates are active compounds that contain sulfur and nitrogen, found as secondary metabolites in certain plant taxa and especially in cruciferous vegetables, such as the Brassicales family. They remain inert unless the plant receives damage and they are brought into contact with the enzyme myrosinase, releasing glucose and highly reactive breakdown products such as isothiocyanates, nitriles, epithionitriles and thiocyanates. Several decades ago glucosinolate breakdown products were considered only as natural toxicants and used accordingly (e.g. as natural pesticides). However, the situation changed in 1992 after the identification and purification of sulforaphane (from glucoraphanin) exhibiting anticarcinogenic properties. Our goal is to determine the qualitative and quantitative characteristics of the glucosinolate biosynthesis pathway across the entire available plant pangenome.},
    address = {FORTH, Heraklion, Crete, Greece},
    author = {Psomopoulos, Fotis E and Argiriou, A and Tsaftaris, Athanasios S and Ouzounis, Christos A},
    booktitle = {7th conference of the Hellenic Society for Computational Biology {\&} Bioinformatics - HSCBB '12},
    keywords = {Glucosinolates,Pangenome,Phylogenetic Profiles,Plant},
    mendeley-tags = {Glucosinolates,Pangenome,Phylogenetic Profiles,Plant},
    pages = {63},
    publisher = {Hellenic Society for Computational Biology {\&} Bioinformatics},
    title = {{A Pangenome Analysis of the Glucosinolate Pathway in Plants}},
    year = {2012}
    }

  • F. E. Psomopoulos and C. A. Ouzounis, “Ancestral Reconstruction of Metabolic Pathway Content at the Paleome Level,” in 7th conference of the hellenic society for computational biology & bioinformatics – hscbb ’12, FORTH, Heraklion, Crete, Greece, 2012, p. 90.
    [BibTeX] [Abstract]

    In the post-genomic era, one key problem is the interpretation and understanding of the complex functional and evolutionary relationships between genes. The evolutionary history of metabolic pathways can provide significant insights into the organization of functionally related genes, and the way they interact with each other in biological space and time. There have been several approaches toward this goal in general and ancestral genome content reconstruction in particular, both at the gene content and, recently, at the pathway content level.

    @inproceedings{HSCBB12Ancestral,
    abstract = {In the post-genomic era, one key problem is the interpretation and understanding of the complex functional and evolutionary relationships between genes. The evolutionary history of metabolic pathways can provide significant insights into the organization of functionally related genes, and the way they interact with each other in biological space and time. There have been several approaches toward this goal in general and ancestral genome content reconstruction in particular, both at the gene content and, recently, at the pathway content level.},
    address = {FORTH, Heraklion, Crete, Greece},
    author = {Psomopoulos, Fotis E and Ouzounis, Christos A},
    booktitle = {7th conference of the Hellenic Society for Computational Biology {\&} Bioinformatics - HSCBB '12},
    keywords = {Ancestral,Pangenome,Pathway,Phylogenetic Profiles},
    mendeley-tags = {Ancestral,Pangenome,Pathway,Phylogenetic Profiles},
    pages = {90},
    publisher = {Hellenic Society for Computational Biology {\&} Bioinformatics},
    title = {{Ancestral Reconstruction of Metabolic Pathway Content at the Paleome Level}},
    year = {2012}
    }

  • D. M. Vitsios, F. E. Psomopoulos, P. A. Mitkas, and C. A. Ouzounis, “Multi-genome Core Pathway Identification through Gene Clustering,” in Lecture series ifip advances in information and communication technology (proceedings of the 1st workshop on algorithms for data and text mining in bioinformatics (wadtmb 2012) in conjunction with the 8th aiai), L. Iliadis et al., Eds., Halkidiki, Greece: Springer New York, 2012, vol. 382, pp. 545-555. doi:10.1007/978-3-642-33412-2_56
    [BibTeX] [Abstract] [Download PDF]

    In the wake of gene-oriented data analysis in large-scale bioinformatics studies, focus in research is currently shifting towards the analysis of the functional association of genes, namely the metabolic pathways in which genes participate. The goal of this paper is to attempt to identify the core genes in a specific pathway, based on a user-defined selection of genomes. To this end, a novel methodology has been developed that uses data from the KEGG database, and through the application of the MCL clustering algorithm, identifies clusters that correspond to different “layers” of genes, either on a phylogenetic or a functional level. The algorithm’s complexity, evaluated experimentally, is presented and the results on a characteristic case study are discussed.

    @incollection{8thAIAI,
    abstract = {In the wake of gene-oriented data analysis in large-scale bioinformatics studies, focus in research is currently shifting towards the analysis of the functional association of genes, namely the metabolic pathways in which genes participate. The goal of this paper is to attempt to identify the core genes in a specific pathway, based on a user-defined selection of genomes. To this end, a novel methodology has been developed that uses data from the KEGG database, and through the application of the MCL clustering algorithm, identifies clusters that correspond to different “layers” of genes, either on a phylogenetic or a functional level. The algorithm's complexity, evaluated experimentally, is presented and the results on a characteristic case study are discussed.},
    address = {Halkidiki, Greece},
    author = {Vitsios, Dimitrios M. and Psomopoulos, Fotis E and Mitkas, Pericles A and Ouzounis, Christos A},
    booktitle = {Lecture Series IFIP Advances in Information and Communication Technology (proceedings of the 1st Workshop on Algorithms for Data and Text Mining in Bioinformatics (WADTMB 2012) in conjunction with the 8th AIAI)},
    doi = {10.1007/978-3-642-33412-2_56},
    editor = {Iliadis, L. and others},
    keywords = {Clustering,Evolution,Pathway,Phylogenetic Profiles},
    mendeley-tags = {Clustering,Evolution,Pathway,Phylogenetic Profiles},
    pages = {545--555},
    publisher = {Springer New York},
    title = {{Multi-genome Core Pathway Identification through Gene Clustering}},
    url = {http://link.springer.com/10.1007/978-3-642-33412-2{\_}56},
    volume = {382},
    year = {2012}
    }

  • D. Vitsios, F. E. Psomopoulos, P. A. Mitkas, and C. A. Ouzounis, “Detection of closely correlated genes across genomes using pathway data,” in 5th national student conference of electrical and computer engineering, Democritus University of Thrace, Xanthi, Greece, 2012, pp. 111-117.
    [BibTeX] [Abstract]

    In the wake of gene-oriented data analysis in large-scale bioinformatics studies, focus in research is currently shifting towards the analysis of the functional association of genes, namely the metabolic pathways in which genes participate. The goal of this paper is to attempt to identify the core genes in a specific pathway, based on a user-defined selection of genomes. To this end, a novel algorithm has been developed that uses data from the KEGG database, and through the application of the MCL clustering algorithm, identifies clusters that correspond to different “layers” of genes, either on a phylogenetic or a functional level. The algorithm’s complexity, evaluated experimentally, is presented and the results on two characteristic case studies are discussed.

    @inproceedings{SFHMMY5,
    abstract = {In the wake of gene-oriented data analysis in large-scale bioinformatics studies, focus in research is currently shifting towards the analysis of the functional association of genes, namely the metabolic pathways in which genes participate. The goal of this paper is to attempt to identify the core genes in a specific pathway, based on a user-defined selection of genomes. To this end, a novel algorithm has been developed that uses data from the KEGG database, and through the application of the MCL clustering algorithm, identifies clusters that correspond to different “layers” of genes, either on a phylogenetic or a functional level. The algorithm's complexity, evaluated experimentally, is presented and the results on two characteristic case studies are discussed.},
    address = {Democritus University of Thrace, Xanthi, Greece},
    author = {Vitsios, Dimitrios and Psomopoulos, Fotis E and Mitkas, Pericles A and Ouzounis, Christos A},
    booktitle = {5th National Student Conference of Electrical and Computer Engineering},
    keywords = {Algorithm,Clustering,Pathway,Phylogenetic Profiles},
    mendeley-tags = {Algorithm,Clustering,Pathway,Phylogenetic Profiles},
    pages = {111--117},
    title = {{Detection of closely correlated genes across genomes using pathway data}},
    year = {2012}
    }

  • D. Vitsios, F. E. Psomopoulos, and C. A. Ouzounis, “Null Entries for BioPAX Pathways, Visualization and Pathway Phylogenetic Profiling Based on Paxtools,” in 7th conference of the hellenic society for computational biology & bioinformatics – hscbb ’12, FORTH, Heraklion, Crete, Greece, 2012, p. 92.
    [BibTeX] [Abstract]

    In computational biology, various data formats for a multitude of complex biological data types have arisen. Specifically, databases of metabolic pathways have been developed based on different internal data models. Thus, access and manipulation of those data has become critical to facilitate access to these resources. Adoption of common formats with regard to metabolic pathway representation along with the development of software tools supporting that common format has become imperative. One of the most widely accepted standards for biological pathway data exchange is BioPAX [1]. We cite the most important tools that utilize the BioPAX format and we also present a new tool that extends its functionality.

    @inproceedings{HSCBB12BioPax,
    abstract = {In computational biology, various data formats for a multitude of complex biological data types have arisen. Specifically, databases of metabolic pathways have been developed based on different internal data models. Thus, access and manipulation of those data has become critical to facilitate access to these resources. Adoption of common formats with regard to metabolic pathway representation along with the development of software tools supporting that common format has become imperative. One of the most widely accepted standards for biological pathway data exchange is BioPAX [1]. We cite the most important tools that utilize the BioPAX format and we also present a new tool that extends its functionality.},
    address = {FORTH, Heraklion, Crete, Greece},
    author = {Vitsios, Dimitrios and Psomopoulos, Fotis E and Ouzounis, Christos A},
    booktitle = {7th conference of the Hellenic Society for Computational Biology {\&} Bioinformatics - HSCBB '12},
    keywords = {BioPax,Pathway},
    mendeley-tags = {BioPax,Pathway},
    pages = {92},
    publisher = {Hellenic Society for Computational Biology {\&} Bioinformatics},
    title = {{Null Entries for BioPAX Pathways, Visualization and Pathway Phylogenetic Profiling Based on Paxtools}},
    year = {2012}
    }

2011

  • D. Vitsios, F. E. Psomopoulos, P. A. Mitkas, and C. A. Ouzounis, “Detecting species evolution through metabolic pathways,” in 6th conference of the hellenic society for computational biology and bioinformatics – hscbb ’11, University of Patras Conference Center, 2011, p. 16.
    [BibTeX] [Abstract]

    The emergence and evolution of metabolic pathways represented a crucial step in molecular and cellular evolution. With the current advances in genomics and proteomics, it has become imperative to explore the impact of gene evolution as reflected in the metabolic signature of each genome. To this end a methodology is presented, which applies a clustering algorithm to genes from different species participating in the same pathway.

    @inproceedings{HSCBB11,
    abstract = {The emergence and evolution of metabolic pathways represented a crucial step in molecular and cellular evolution. With the current advances in genomics and proteomics, it has become imperative to explore the impact of gene evolution as reflected in the metabolic signature of each genome. To this end a methodology is presented, which applies a clustering algorithm to genes from different species participating in the same pathway.},
    address = {University of Patras Conference Center},
    author = {Vitsios, Dimitrios and Psomopoulos, Fotis E and Mitkas, Pericles A and Ouzounis, Christos A},
    booktitle = {6th Conference of the Hellenic Society For Computational Biology and Bioinformatics - HSCBB '11},
    keywords = {Clustering,Evolution,Pathway,Phylogenetic Profiles},
    mendeley-tags = {Clustering,Evolution,Pathway,Phylogenetic Profiles},
    pages = {16},
    title = {{Detecting species evolution through metabolic pathways}},
    year = {2011}
    }

2010

  • K. C. Chatzidimitriou, F. E. Psomopoulos, and P. A. Mitkas, “Grid-enabled parameter initialization for high performance machine learning tasks,” in 5th egee user forum, Uppsala, Sweden, 2010, pp. 113-114.
    [BibTeX] [Abstract]

    In this work we use the NeuroEvolution of Augmenting Topologies (NEAT) methodology for optimising Echo State Networks (ESNs), in order to achieve high performance in machine learning tasks. The large parameter space of NEAT, the many variations of ESNs and the stochastic nature of evolutionary computation, requiring many evaluations for statistically valid conclusions, promote the Grid as a viable solution for robustly evaluating the alternatives and deriving significant conclusions.

    @inproceedings{KyrchaPsomopoulosEGEEForum,
    abstract = {In this work we use the NeuroEvolution of Augmenting Topologies (NEAT) methodology for optimising Echo State Networks (ESNs), in order to achieve high performance in machine learning tasks. The large parameter space of NEAT, the many variations of ESNs and the stochastic nature of evolutionary computation, requiring many evaluations for statistically valid conclusions, promote the Grid as a viable solution for robustly evaluating the alternatives and deriving significant conclusions.},
    address = {Uppsala, Sweden},
    author = {Chatzidimitriou, Kyriakos C and Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {5th EGEE User Forum},
    keywords = {Echo State Networks,Grid Computing,Neural Networks},
    mendeley-tags = {Echo State Networks,Grid Computing,Neural Networks},
    pages = {113--114},
    title = {{Grid-enabled parameter initialization for high performance machine learning tasks}},
    year = {2010}
    }

  • F. E. Psomopoulos and P. A. Mitkas, “Multi Level Clustering of Phylogenetic Profiles,” in 10th ieee international conference on bioinformatics and bioengineering, Philadelphia, PA, USA, 2010, pp. 308-309. doi:10.1109/BIBE.2010.67
    [BibTeX] [Abstract] [Download PDF]

    The prediction of gene function from genome sequences is one of the main issues in Bioinformatics. Most computational approaches are based on the similarity between sequences to infer gene function. However, the availability of several fully sequenced genomes has enabled alternative approaches, such as phylogenetic profiles. Phylogenetic profiles are vectors which indicate the presence or absence of a gene in other genomes. The main concept of phylogenetic profiles is that proteins participating in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion. In this paper, a multi level clustering algorithm of phylogenetic profiles is presented, which aims to detect inter- and intra-genome gene clusters.

    @inproceedings{Psomopoulos2010,
    abstract = {The prediction of gene function from genome sequences is one of the main issues in Bioinformatics. Most computational approaches are based on the similarity between sequences to infer gene function. However, the availability of several fully sequenced genomes has enabled alternative approaches, such as phylogenetic profiles. Phylogenetic profiles are vectors which indicate the presence or absence of a gene in other genomes. The main concept of phylogenetic profiles is that proteins participating in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion. In this paper, a multi level clustering algorithm of phylogenetic profiles is presented, which aims to detect inter- and intra-genome gene clusters.},
    address = {Philadelphia, PA, USA},
    author = {Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {10th IEEE International Conference on Bioinformatics and Bioengineering},
    doi = {10.1109/BIBE.2010.67},
    isbn = {9781424474943},
    keywords = {algorithm,clustering,phylogenetic profiles},
    mendeley-tags = {algorithm,clustering,phylogenetic profiles},
    pages = {308--309},
    publisher = {IEEE},
    title = {{Multi Level Clustering of Phylogenetic Profiles}},
    url = {http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5521662},
    year = {2010}
    }

  • F. E. Psomopoulos, P. A. Mitkas, and C. A. Ouzounis, “Clustering of discrete and fuzzy phylogenetic profiles,” in 5th conference of the hellenic society for computational biology and bioinformatics – hscbb ’10, Alexandroupoli, Greece, 2010, p. 58.
    [BibTeX] [Abstract]

    Phylogenetic profiles have long been a focus of interest in computational genomics. Encoding the subset of organisms that contain a homolog of a gene or protein, phylogenetic profiles are originally defined as binary vectors of n entries, where n corresponds to the number of target genomes. It is widely accepted that similar profiles, especially those not connected by sequence similarity, correspond to a correlated pattern of functional linkage. To this end, our study presents two methods of phylogenetic profile data analysis, aiming at detecting genes with peculiar, unique characteristics.

    @inproceedings{HSCBB10,
    abstract = {Phylogenetic profiles have long been a focus of interest in computational genomics. Encoding the subset of organisms that contain a homolog of a gene or protein, phylogenetic profiles are originally defined as binary vectors of n entries, where n corresponds to the number of target genomes. It is widely accepted that similar profiles, especially those not connected by sequence similarity, correspond to a correlated pattern of functional linkage. To this end, our study presents two methods of phylogenetic profile data analysis, aiming at detecting genes with peculiar, unique characteristics.},
    address = {Alexandroupoli, Greece},
    author = {Psomopoulos, Fotis E and Mitkas, Pericles A and Ouzounis, Christos A},
    booktitle = {5th Conference of the Hellenic Society For Computational Biology and Bioinformatics - HSCBB '10},
    keywords = {Algorithm,Clustering,Phylogenetic Profiles},
    mendeley-tags = {Algorithm,Clustering,Phylogenetic Profiles},
    pages = {58},
    title = {{Clustering of discrete and fuzzy phylogenetic profiles}},
    year = {2010}
    }

  • F. A. Tzima, F. E. Psomopoulos, and P. A. Mitkas, “An investigation of the effect of clustering-based initialization on Learning Classifiers Systems’ effectiveness: leveraging the Grid infrastructure,” in 5th egee user forum, Uppsala, Sweden, 2010, pp. 111-112.
    [BibTeX] [Abstract]

    Strength-based Learning Classifier Systems (LCS) are machine learning systems designed to tackle both sequential and single-step decision tasks by coupling a gradually evolving population of rules with a reinforcement component. ZCS-DM, a Zeroth-level Classifier System for Data Mining, is a novel algorithm in this field, recently shown to be very effective in several benchmark classification problems. In this paper, we evaluate the effect of clustering-based initialization on the algorithm’s performance, utilizing the EGEE infrastructure as a robust framework for an efficient parameter sweep.

    @inproceedings{TzimaPsomopoulosEGEEForum,
    abstract = {Strength-based Learning Classifier Systems (LCS) are machine learning systems designed to tackle both sequential and single-step decision tasks by coupling a gradually evolving population of rules with a reinforcement component. ZCS-DM, a Zeroth-level Classifier System for Data Mining, is a novel algorithm in this field, recently shown to be very effective in several benchmark classification problems. In this paper, we evaluate the effect of clustering-based initialization on the algorithm's performance, utilizing the EGEE infrastructure as a robust framework for an efficient parameter sweep.},
    address = {Uppsala, Sweden},
    author = {Tzima, Fani A and Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {5th EGEE User Forum},
    keywords = {Classifier Systems,Grid Computing,Parameter Exploration},
    mendeley-tags = {Classifier Systems,Grid Computing,Parameter Exploration},
    pages = {111--112},
    title = {{An investigation of the effect of clustering-based initialization on Learning Classifiers Systems' effectiveness: leveraging the Grid infrastructure}},
    year = {2010}
    }

2009

  • K. M. Karagiannis, F. E. Psomopoulos, and P. A. Mitkas, “Multi Level Clustering of Phylogenetic Profiles,” in 4th conference of the hellenic society for computational biology and bioinformatics – hscbb ’09, Athens, Greece, 2009, p. 1.
    [BibTeX] [Abstract]

    The prediction of gene function from genome sequences is one of the main issues in Bioinformatics. Most computational approaches are based on the similarity between sequences to infer gene function. However, the availability of several fully sequenced genomes has enabled alternative approaches, such as phylogenetic profiles. Phylogenetic profiles (pp) are vectors which indicate the presence or absence of a gene in other genomes. The main concept of pp’s is that proteins participating in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion. In this paper, a multi level clustering algorithm of pp’s is presented, which aims to detect inter- and intra-genome gene clusters.

    @inproceedings{HSCBB09,
    abstract = {The prediction of gene function from genome sequences is one of the main issues in Bioinformatics. Most computational approaches are based on the similarity between sequences to infer gene function. However, the availability of several fully sequenced genomes has enabled alternative approaches, such as phylogenetic profiles. Phylogenetic profiles (pp) are vectors which indicate the presence or absence of a gene in other genomes. The main concept of pp's is that proteins participating in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion. In this paper, a multi level clustering algorithm of pp's is presented, which aims to detect inter- and intra-genome gene clusters.},
    address = {Athens, Greece},
    author = {Karagiannis, Konstantinos M and Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {4th Conference of the Hellenic Society For Computational Biology and Bioinformatics - HSCBB '09},
    keywords = {Clustering,Data Mining,Phylogenetic Profiles},
    mendeley-tags = {Clustering,Data Mining,Phylogenetic Profiles},
    pages = {1},
    publisher = {Hellenic Society for Computational Biology {\&} Bioinformatics},
    title = {{Multi Level Clustering of Phylogenetic Profiles}},
    year = {2009}
    }

  • F. E. Psomopoulos and P. A. Mitkas, “BADGE: Bioinformatics Algorithm Development for Grid Environments,” in 13th panhellenic conference on informatics, Corfu, Greece, 2009, pp. 93-107.
    [BibTeX] [Abstract]

    A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher throughput computing by taking advantage of many computers geographically dispersed and connected by a network. Bioinformatics applications stand to gain in such a distributed environment in terms of availability, reliability and efficiency of computational resources. There is already considerable research in progress toward applying parallel computing techniques on bioinformatics methods, such as multiple sequence alignment, gene expression analysis and phylogenetic studies. In order to cope with the dimensionality issue, most machine learning methods focus on specific groups of proteins or reduce either the size of the original data set or the number of attributes involved. Grid computing could potentially provide an alternative solution to this problem, by combining multiple approaches in a seamless way. In this paper we introduce a unifying methodology coupling the strengths of the Grid with the specific needs and constraints of the major bioinformatics approaches. We also present a tool that implements this process and allows researchers to assess the computational needs for a specific task and optimize the allocation of available resources for its efficient completion.

    @inproceedings{FpsomPCI2009,
    abstract = {A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher throughput computing by taking advantage of many computers geographically dispersed and connected by a network. Bioinformatics applications stand to gain in such a distributed environment in terms of availability, reliability and efficiency of computational resources. There is already considerable research in progress toward applying parallel computing techniques on bioinformatics methods, such as multiple sequence alignment, gene expression analysis and phylogenetic studies. In order to cope with the dimensionality issue, most machine learning methods focus on specific groups of proteins or reduce either the size of the original data set or the number of attributes involved. Grid computing could potentially provide an alternative solution to this problem, by combining multiple approaches in a seamless way. In this paper we introduce a unifying methodology coupling the strengths of the Grid with the specific needs and constraints of the major bioinformatics approaches. We also present a tool that implements this process and allows researchers to assess the computational needs for a specific task and optimize the allocation of available resources for its efficient completion.},
    address = {Corfu, Greece},
    author = {Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {13th Panhellenic Conference on Informatics},
    keywords = {Algorithm,Bioinformatics,Data Mining,Grid Computing,Workflow},
    mendeley-tags = {Algorithm,Bioinformatics,Data Mining,Grid Computing,Workflow},
    pages = {93--107},
    title = {{BADGE: Bioinformatics Algorithm Development for Grid Environments}},
    year = {2009}
    }

  • P. A. Mitkas, A. Delopoulos, K. Vavliakis, C. Maramis, M. Falelakis, F. E. Psomopoulos, V. Koutkias, I. Lekka, A. Tantsis, A. Mikos, N. Maglaveras, and T. Agorastos, “Combining multiple sources towards cervical cancer prognosis – the ASSIST platform (in Greek),” in 8th scientific forum on "new trends in prognosis and treatment of cervical cancer", Thessaloniki, Greece, 2009, pp. 109-118.
    [BibTeX]
    @inproceedings{ASSISTForum2009,
    address = {Thessaloniki, Greece},
    author = {Mitkas, Pericles A and Delopoulos, Anastasios and Vavliakis, Kostantinos and Maramis, Christos and Falelakis, Manolis and Psomopoulos, Fotis E and Koutkias, Vasilis and Lekka, Irini and Tantsis, A and Mikos, A and Maglaveras, Nikolaos and Agorastos, T},
    booktitle = {8th Scientific Forum on "New trends in prognosis and treatment of cervical cancer"},
    pages = {109--118},
    isbn = {9601217975},
    title = {{Combining multiple sources towards cervical cancer prognosis – the ASSIST platform (in Greek)}},
    year = {2009}
    }

2008

  • T. Agorastos, P. A. Mitkas, M. Falelakis, F. E. Psomopoulos, A. N. Delopoulos, A. L. Symeonidis, S. Diplaris, C. Maramis, A. Batzios, I. Lekka, V. Koutkias, T. Mikos, A. Tatsis, and N. Maglaveras, “Large Scale Association Studies Using Unified Data for Cervical Cancer and beyond: The ASSIST Project,” in World cancer congress, Geneva, Switzerland, 2008, p. 1.
    [BibTeX] [Abstract]

    Despite the proven close connection of cervical cancer with the human papillomavirus (HPV), intensive ongoing research investigates the role of specific genetic and environmental factors in determining HPV persistence and subsequent progression of the disease. To this end, genetic association studies constitute a significant scientific approach that may lead to a more comprehensive insight into the origin of complex diseases. Nevertheless, association studies are most of the time inconclusive, since the datasets employed are small, usually incomplete and of poor quality. The main goal of ASSIST is to aid research in the field of cervical cancer by providing larger, high-quality datasets, via a software system that virtually unifies multiple heterogeneous medical records, located in various sites. Furthermore, the system is being designed in a generic manner, with provision for future extensions to include other types of cancer or even different medical fields. Within the context of ASSIST, innovative techniques have been elaborated for the semantic modelling and fuzzy inferencing on medical knowledge aiming at meaningful data unification: (i) The ASSIST core ontology (being the first ontology ever modelling cervical cancer) permits semantically equivalent but differently coded data to be mapped to a common language. (ii) The ASSIST inference engine maps medical entities to syntactic values that are understood by legacy medical systems, supporting the processes of hypothesis testing and association studies, and at the same time calculating the severity index of each patient record. These modules constitute the ASSIST Core and are accompanied by two other important subsystems: (1) The Interfacing to Medical Archives subsystem maps the information contained in each legacy medical archive to corresponding entities as defined in the knowledge model of ASSIST.
These patient data are generated by an advanced anonymisation tool also developed within the context of the project. (2) The User Interface enables transparent and advanced access to the data repositories incorporated in ASSIST by offering query expression as well as patient data and statistical results visualisation to the ASSIST end-users. We also have to point out that the system is easily extendable virtually to any medical domain, as the core ontology was designed with this in mind and all subsystems are ontology-aware i.e., adaptable to any ontology changes/additions. Using ASSIST, a medical researcher can have seamless access to medical records of participating sites and, through a particularly handy computing environment, collect data records satisfying his criteria. Moreover he can define cases and controls, select records adjusting their validity and use the most popular statistical tools for drawing conclusions. The logical unification of medical records of participating sites, including clinical and genetic data, to a common knowledge base is expected to increase the effectiveness of research in the field of cervical cancer as it permits the creation of on-demand study groups as well as the recycling of data used in previous studies.

    @inproceedings{WCCAssist,
    abstract = {Despite the proven close connection of cervical cancer with the human papillomavirus (HPV), intensive ongoing research investigates the role of specific genetic and environmental factors in determining HPV persistence and subsequent progression of the disease. To this end, genetic association studies constitute a significant scientific approach that may lead to a more comprehensive insight into the origin of complex diseases. Nevertheless, association studies are most of the time inconclusive, since the datasets employed are small, usually incomplete and of poor quality. The main goal of ASSIST is to aid research in the field of cervical cancer by providing larger, high-quality datasets, via a software system that virtually unifies multiple heterogeneous medical records, located in various sites. Furthermore, the system is being designed in a generic manner, with provision for future extensions to include other types of cancer or even different medical fields. Within the context of ASSIST, innovative techniques have been elaborated for the semantic modelling and fuzzy inferencing on medical knowledge aiming at meaningful data unification: (i) The ASSIST core ontology (being the first ontology ever modelling cervical cancer) permits semantically equivalent but differently coded data to be mapped to a common language. (ii) The ASSIST inference engine maps medical entities to syntactic values that are understood by legacy medical systems, supporting the processes of hypothesis testing and association studies, and at the same time calculating the severity index of each patient record. These modules constitute the ASSIST Core and are accompanied by two other important subsystems: (1) The Interfacing to Medical Archives subsystem maps the information contained in each legacy medical archive to corresponding entities as defined in the knowledge model of ASSIST.
These patient data are generated by an advanced anonymisation tool also developed within the context of the project. (2) The User Interface enables transparent and advanced access to the data repositories incorporated in ASSIST by offering query expression as well as patient data and statistical results visualisation to the ASSIST end-users. We also have to point out that the system is easily extendable virtually to any medical domain, as the core ontology was designed with this in mind and all subsystems are ontology-aware i.e., adaptable to any ontology changes/additions. Using ASSIST, a medical researcher can have seamless access to medical records of participating sites and, through a particularly handy computing environment, collect data records satisfying his criteria. Moreover he can define cases and controls, select records adjusting their validity and use the most popular statistical tools for drawing conclusions. The logical unification of medical records of participating sites, including clinical and genetic data, to a common knowledge base is expected to increase the effectiveness of research in the field of cervical cancer as it permits the creation of on-demand study groups as well as the recycling of data used in previous studies.},
    address = {Geneva, Switzerland},
    author = {Agorastos, Theodoros and Mitkas, Pericles A and Falelakis, Manolis and Psomopoulos, Fotis E and Delopoulos, Anastasios N and Symeonidis, Andreas L and Diplaris, Sotiris and Maramis, Christos and Batzios, Alexandros and Lekka, Irini and Koutkias, Vasilis and Mikos, T and Tatsis, A and Maglaveras, Nikolaos},
    booktitle = {World Cancer Congress},
    keywords = {ASSIST Project,Biomedical Engineering,Cervical Cancer,Ontologies},
    mendeley-tags = {ASSIST Project,Biomedical Engineering,Cervical Cancer,Ontologies},
    pages = {1},
    title = {{Large Scale Association Studies Using Unified Data for Cervical Cancer and beyond: The ASSIST Project}},
    year = {2008}
    }

  • C. N. Gkekas, F. E. Psomopoulos, and P. A. Mitkas, “A Parallel Data Mining Methodology for Protein Function Prediction Utilizing Finite State Automata,” in 2nd electrical and computer engineering student conference, Athens, Greece, 2008, p. 6.
    [BibTeX] [Abstract]

    One of the most important challenges in modern bioinformatics is the accurate prediction of the functional behaviour of proteins. The strong correlation that exists between the properties of a protein and its motif sequence makes such a prediction possible. In this paper a novel parallel methodology for protein function prediction will be presented. Data mining techniques are employed in order to construct a model for each Gene Ontology term, based on data generated from already annotated protein sequences. In order to predict the annotation of an unknown protein, its motif sequence is run through each GO term model, producing similarity scores for every term. Although it has been experimentally proven that this process is efficient, it unfortunately requires heavy processor resources. In order to address this issue, a parallel application has been implemented and tested using the EGEE Grid infrastructure.

    @inproceedings{GkekasPsomopoulosSFHMMY,
    abstract = {One of the most important challenges in modern bioinformatics is the accurate prediction of the functional behaviour of proteins. The strong correlation that exists between the properties of a protein and its motif sequence makes such a prediction possible. In this paper a novel parallel methodology for protein function prediction will be presented. Data mining techniques are employed in order to construct a model for each Gene Ontology term, based on data generated from already annotated protein sequences. In order to predict the annotation of an unknown protein, its motif sequence is run through each GO term model, producing similarity scores for every term. Although it has been experimentally proven that this process is efficient, it unfortunately requires heavy processor resources. In order to address this issue, a parallel application has been implemented and tested using the EGEE Grid infrastructure.},
    address = {Athens, Greece},
    author = {Gkekas, Christos N and Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {2nd Electrical and Computer Engineering Student Conference},
    keywords = {Data Mining,Gene Ontology,Grid Computing,Parallel Computing,Protein Classification},
    mendeley-tags = {Data Mining,Gene Ontology,Grid Computing,Parallel Computing,Protein Classification},
    pages = {6},
    title = {{A Parallel Data Mining Methodology for Protein Function Prediction Utilizing Finite State Automata}},
    year = {2008}
    }

  • C. N. Gkekas, F. E. Psomopoulos, and P. A. Mitkas, “Exploiting parallel data mining processing for protein annotation,” in Student eureka 2008: 2nd panhellenic scientific student conference, Samos, Greece, 2008, pp. 242-252.
    [BibTeX] [Abstract]

    Proteins are large organic compounds consisting of amino acids arranged in a linear chain and joined together by peptide bonds. One of the most important challenges in modern Bioinformatics is the accurate prediction of the functional behavior of proteins. In this paper a novel parallel methodology for automatic protein function annotation is presented. Data mining techniques are employed in order to construct models based on data generated from already annotated protein sequences. The first step of the methodology is to obtain the motifs present in these sequences, which are then provided as input to the data mining algorithms in order to create a model for every term. Experiments conducted using the EGEE Grid environment as a source of multiple CPUs clearly indicate that the methodology is highly efficient and accurate, as the utilization of many processors substantially reduces the execution time.

    @inproceedings{CkekasFpsomSamos,
    abstract = {Proteins are large organic compounds consisting of amino acids arranged in a linear chain and joined together by peptide bonds. One of the most important challenges in modern Bioinformatics is the accurate prediction of the functional behavior of proteins. In this paper a novel parallel methodology for automatic protein function annotation is presented. Data mining techniques are employed in order to construct models based on data generated from already annotated protein sequences. The first step of the methodology is to obtain the motifs present in these sequences, which are then provided as input to the data mining algorithms in order to create a model for every term. Experiments conducted using the EGEE Grid environment as a source of multiple CPUs clearly indicate that the methodology is highly efficient and accurate, as the utilization of many processors substantially reduces the execution time.},
    address = {Samos, Greece},
    author = {Gkekas, Christos N and Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {Student EUREKA 2008: 2nd Panhellenic Scientific Student Conference},
    keywords = {Data Mining,Gene Ontology,Grid Computing,Parallel Computing},
    mendeley-tags = {Data Mining,Gene Ontology,Grid Computing,Parallel Computing},
    pages = {242--252},
    title = {{Exploiting parallel data mining processing for protein annotation}},
    year = {2008}
    }

  • C. N. Gkekas, F. E. Psomopoulos, and P. A. Mitkas, “A Parallel Data Mining Application for Gene Ontology Term Prediction,” in 3rd egee user forum, Clermont-Ferrand, France, 2008, p. 1.
    [BibTeX] [Abstract]

    Protein classification is one of the most commonly discussed problems in bioinformatics. One of the latest tools for protein function annotation is the Gene Ontology (GO) project which provides a controlled vocabulary to describe gene and gene product attributes in organisms. Although there are several cases of automated annotation, the bulk of the annotation process is performed by human curators. We present a parallel algorithm for GO term prediction, deployed over the EGEE grid environment. Gene ontology can be thought of as a database of expert-based terms. The application presented utilizes the motifs that exist in already annotated protein sequences in order to model the corresponding GO terms. The input data set is created in a semi-automatic way, using the unique (UNIPROT) code of each protein and the InterProScan tool so that all available sequence databases (such as PRODOM, PFAM, etc.) will be taken into consideration. For each GO term that appears in the original protein set, a new training set is created, which contains all the protein sequences that have been annotated with the specific GO term. Based on the motifs present in the new data sets, a finite state automaton model is created for each GO term. In order to predict the annotation of an unknown protein, its motif sequence is run through each GO model thus producing similarity scores for every term. Results have shown that the algorithm is both efficient and accurate in predicting the correct GO term. The methodology has been implemented so that it can be used either as a standalone or as a grid-based application. The algorithm, however, is by design an embarrassingly parallel one, allowing for multiple models to be trained simultaneously, thus making the Grid the ideal environment for execution. In fact, it has been shown experimentally that the time to process the entire dataset on a single processor is prohibitively long.
In an MPI-enabled application the utilization of the clusters available over the Grid provides a significant reduction of the processing time. The Grid also enables the seamless integration of the training process with the actual model evaluation, by allowing the concurrent retraining of GO models from different input sources or experts and the use of the existing ones. The initial dataset is stored and replicated as a single compressed file on multiple storage elements (SEs). The application was executed on available clusters using from 4 to 32 processors in different experiment configurations. In all cases a significant speedup was observed, ranging from 300 to over 500. Overall, the utilization of the Grid as the application platform has provided both a reduction in processing time and a seamless environment for running simultaneously different experiments.

    @inproceedings{GkekasPsomopoulosEGEEForum,
    abstract = {Protein classification is one of the most commonly discussed problems in bioinformatics. One of the latest tools for protein function annotation is the Gene Ontology (GO) project, which provides a controlled vocabulary to describe gene and gene product attributes in organisms. Although there are several cases of automated annotation, the bulk of the annotation process is performed by human curators. We present a parallel algorithm for GO term prediction, deployed over the EGEE grid environment. Gene ontology can be thought of as a database of expert-based terms. The application presented utilizes the motifs that exist in already annotated protein sequences in order to model the corresponding GO terms. The input data set is created in a semi-automatic way, using the unique (UNIPROT) code of each protein and the InterProScan tool so that all available sequence databases (such as PRODOM, PFAM, etc.) will be taken into consideration. For each GO term that appears in the original protein set, a new training set is created, which contains all the protein sequences that have been annotated with the specific GO term. Based on the motifs present in the new data sets, a finite state automaton model is created for each GO term. In order to predict the annotation of an unknown protein, its motif sequence is run through each GO model, thus producing similarity scores for every term. Results have shown that the algorithm is both efficient and accurate in predicting the correct GO term. The methodology has been implemented so that it can be used either as a standalone or as a grid-based application. The algorithm, however, is by design an embarrassingly parallel one, allowing for multiple models to be trained simultaneously, thus making the Grid the ideal environment for execution. In fact, it has been shown experimentally that the time to process the entire dataset on a single processor is prohibitively long.
In an MPI-enabled application the utilization of the clusters available over the Grid provides a significant reduction of the processing time. The Grid also enables the seamless integration of the training process with the actual model evaluation, by allowing the concurrent retraining of GO models from different input sources or experts and the use of the existing ones. The initial dataset is stored and replicated as a single compressed file on multiple storage elements (SEs). The application was executed on available clusters using from 4 to 32 processors in different experiment configurations. In all cases a significant speedup was observed, ranging from 300 to over 500. Overall, the utilization of the Grid as the application platform has provided both a reduction in processing time and a seamless environment for running different experiments simultaneously.},
    address = {Clermont-Ferrand, France},
    author = {Gkekas, Christos N and Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {3rd EGEE User Forum},
    keywords = {Gene Ontology,Grid Computing,Parallel Computing,Protein Classification},
    mendeley-tags = {Gene Ontology,Grid Computing,Parallel Computing,Protein Classification},
    pages = {1},
    title = {{A Parallel Data Mining Application for Gene Ontology Term Prediction}},
    year = {2008}
    }

  • P. A. Mitkas, C. Maramis, A. N. Delopoulos, A. L. Symeonidis, S. Diplaris, M. Falelakis, F. E. Psomopoulos, A. Batzios, N. Maglaveras, I. Lekka, V. Koutkias, T. Agorastos, T. Mikos, and A. Tatsis, “ASSIST: Employing Inference and Semantic Technologies to Facilitate Association Studies on Cervical Cancer,” in 6th European symposium on biomedical engineering, Chania, Greece, 2008.
    [BibTeX] [Abstract]

    Advances in biomedical engineering have lately facilitated medical data acquisition, leading to increased availability of both genetic and phenotypic patient data. Particularly, in the area of cervical cancer, intensive research investigates the role of specific genetic and environmental factors in determining the persistence of the HPV virus – which is the primary causal factor of cervical cancer – and the subsequent progression of the disease. To this end, genetic association studies constitute a widely used scientific approach for medical research. However, despite the increased data availability worldwide, individual studies are often inconclusive due to the physical and conceptual isolation of the medical centers, which limits the pool of data actually available to each researcher. ASSIST, an EU-funded research project, aims at facilitating medical research on cervical cancer by tackling these data isolation issues. To accomplish that, it virtually unifies multiple patient record repositories, physically located at different sites, and subsequently employs inferencing techniques on the unified medical knowledge to enable the execution of cervical cancer related association studies that comprise both genotypic and phenotypic study factors, allowing medical researchers to perform more complex and reliable association studies on larger, high-quality datasets.

    @inproceedings{EsbmeAssist,
    abstract = {Advances in biomedical engineering have lately facilitated medical data acquisition, leading to increased availability of both genetic and phenotypic patient data. Particularly, in the area of cervical cancer, intensive research investigates the role of specific genetic and environmental factors in determining the persistence of the HPV virus – which is the primary causal factor of cervical cancer – and the subsequent progression of the disease. To this end, genetic association studies constitute a widely used scientific approach for medical research. However, despite the increased data availability worldwide, individual studies are often inconclusive due to the physical and conceptual isolation of the medical centers, which limits the pool of data actually available to each researcher. ASSIST, an EU-funded research project, aims at facilitating medical research on cervical cancer by tackling these data isolation issues. To accomplish that, it virtually unifies multiple patient record repositories, physically located at different sites, and subsequently employs inferencing techniques on the unified medical knowledge to enable the execution of cervical cancer related association studies that comprise both genotypic and phenotypic study factors, allowing medical researchers to perform more complex and reliable association studies on larger, high-quality datasets.},
    address = {Chania, Greece},
    author = {Mitkas, Pericles A and Maramis, Christos and Delopoulos, Anastasios N and Symeonidis, Andreas L and Diplaris, Sotiris and Falelakis, Manolis and Psomopoulos, Fotis E and Batzios, Alexandros and Maglaveras, Nikolaos and Lekka, Irini and Koutkias, Vasilis and Agorastos, Theodoros and Mikos, T and Tatsis, A},
    booktitle = {6th European Symposium on Biomedical Engineering},
    keywords = {ASSIST Project,Biomedical Engineering,Cervical Cancer,Ontologies},
    mendeley-tags = {ASSIST Project,Biomedical Engineering,Cervical Cancer,Ontologies},
    title = {{ASSIST: Employing Inference and Semantic Technologies to Facilitate Association Studies on Cervical Cancer}},
    year = {2008}
    }

  • I. K. Mprouza, F. E. Psomopoulos, and P. A. Mitkas, “AMoS: Agent-based Molecular Simulations,” in Student EUREKA 2008: 2nd Panhellenic scientific student conference, Samos, Greece, 2008, pp. 175-186.
    [BibTeX] [Abstract]

    Molecular Dynamics (MD) is a form of computer simulation wherein atoms and molecules are allowed to interact for a period of time, utilizing theories from mathematics, physics and chemistry. At the core of any MD simulation lies the potential function (or force field), which describes the interactions between the particles of the simulation. In this paper a new framework for MD simulations is presented, which utilizes software agents. Every agent in our multi-agent system corresponds to a single particle and probes its environment for candidate agent-particles with which an interaction is possible. The framework is applied on protein structural data (PDB files) using an implicit solvent environment and a time step of 5 femtoseconds. Although the system is fully parameterized, the experiments were based on a specific force field and set of parameters, known as ENCAD.

    @inproceedings{MprouzaFpsomSamos,
    abstract = {Molecular Dynamics (MD) is a form of computer simulation wherein atoms and molecules are allowed to interact for a period of time, utilizing theories from mathematics, physics and chemistry. At the core of any MD simulation lies the potential function (or force field), which describes the interactions between the particles of the simulation. In this paper a new framework for MD simulations is presented, which utilizes software agents. Every agent in our multi-agent system corresponds to a single particle and probes its environment for candidate agent-particles with which an interaction is possible. The framework is applied on protein structural data (PDB files) using an implicit solvent environment and a time step of 5 femtoseconds. Although the system is fully parameterized, the experiments were based on a specific force field and set of parameters, known as ENCAD.},
    address = {Samos, Greece},
    author = {Mprouza, Ioanna K and Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {Student EUREKA 2008: 2nd Panhellenic Scientific Student Conference},
    mendeley-tags = {Data Mining,Molecular Dynamics,Software Agents},
    pages = {175--186},
    title = {{AMoS: Agent-based Molecular Simulations}},
    year = {2008}
    }

  • F. E. Psomopoulos and P. A. Mitkas, “Sizing Up: Bioinformatics in a Grid Context,” in 3rd conference of the Hellenic Society for Computational Biology and Bioinformatics – HSCBB ’08, Thessaloniki, Greece, 2008, p. 1.
    [BibTeX] [Abstract]

    A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher throughput computing by taking advantage of many computers geographically distributed and connected by a network. Bioinformatics applications stand to gain in such an environment, both in terms of the computational resources available and in terms of reliability and efficiency. There are several approaches in the literature that present the use of Grid resources in bioinformatics. Nevertheless, scientific progress is hindered by the fact that each researcher operates in relative isolation, regarding datasets and effort, since there is no universally accepted methodology for performing bioinformatics tasks in a Grid. Given the complexity of both the data and the algorithms involved in the majority of cases, a case study on protein classification utilizing the Grid infrastructure may be the first step in presenting a unifying methodology for bioinformatics in a Grid context.

    @inproceedings{HSCBB08,
    abstract = {A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher throughput computing by taking advantage of many computers geographically distributed and connected by a network. Bioinformatics applications stand to gain in such an environment, both in terms of the computational resources available and in terms of reliability and efficiency. There are several approaches in the literature that present the use of Grid resources in bioinformatics. Nevertheless, scientific progress is hindered by the fact that each researcher operates in relative isolation, regarding datasets and effort, since there is no universally accepted methodology for performing bioinformatics tasks in a Grid. Given the complexity of both the data and the algorithms involved in the majority of cases, a case study on protein classification utilizing the Grid infrastructure may be the first step in presenting a unifying methodology for bioinformatics in a Grid context.},
    address = {Thessaloniki, Greece},
    author = {Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {3rd Conference of the Hellenic Society For Computational Biology and Bioinformatics - HSCBB '08},
    keywords = {Data Mining,Grid Computing,Protein Classification},
    mendeley-tags = {Data Mining,Grid Computing,Protein Classification},
    pages = {1},
    publisher = {Hellenic Society for Computational Biology {\&} Bioinformatics},
    title = {{Sizing Up: Bioinformatics in a Grid Context}},
    year = {2008}
    }

  • F. E. Psomopoulos, P. A. Mitkas, C. S. Krinas, and I. N. Demetropoulos, “G-MolKnot: A grid enabled systematic algorithm to produce open molecular knots,” in 1st HellasGrid user forum, Athens, Greece, 2008, pp. 24-25.
    [BibTeX]
    @inproceedings{PsomopoulosDemetropoulosHellasGrid,
    address = {Athens, Greece},
    author = {Psomopoulos, Fotis E and Mitkas, Pericles A and Krinas, Christos S and Demetropoulos, Ioannis N},
    booktitle = {1st HellasGrid User Forum},
    keywords = {Algorithm,Grid Computing,Molecular Knots},
    mendeley-tags = {Algorithm,Grid Computing,Molecular Knots},
    pages = {24--25},
    title = {{G-MolKnot: A grid enabled systematic algorithm to produce open molecular knots}},
    year = {2008}
    }

2007

  • C. N. Gkekas, F. E. Psomopoulos, and P. A. Mitkas, “Modeling Gene Ontology Terms using Finite State Automata,” in Hellenic bioinformatics and medical informatics meeting, Biomedical Research Foundation, Academy of Athens, Greece, 2007, p. 1.
    [BibTeX]
    @inproceedings{GkekasPsomopoulosBioacademy,
    address = {Biomedical Research Foundation, Academy of Athens, Greece},
    author = {Gkekas, Christos N and Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {Hellenic Bioinformatics and Medical Informatics Meeting},
    keywords = {Finite State Automata,Gene Ontology,Protein Classification},
    mendeley-tags = {Finite State Automata,Gene Ontology,Protein Classification},
    pages = {1},
    title = {{Modeling Gene Ontology Terms using Finite State Automata}},
    year = {2007}
    }

  • P. A. Mitkas, A. N. Delopoulos, A. L. Symeonidis, and F. E. Psomopoulos, “A Framework for Semantic Data Integration and Inferencing on Cervical Cancer,” in Hellenic bioinformatics and medical informatics meeting, Biomedical Research Foundation, Academy of Athens, Greece, 2007, p. 1.
    [BibTeX]
    @inproceedings{ASSISTBioacademy,
    address = {Biomedical Research Foundation, Academy of Athens, Greece},
    author = {Mitkas, Pericles A and Delopoulos, Anastasios N and Symeonidis, Andreas L and Psomopoulos, Fotis E},
    booktitle = {Hellenic Bioinformatics and Medical Informatics Meeting},
    keywords = {ASSIST Project,Cervical Cancer,Ontologies},
    mendeley-tags = {ASSIST Project,Cervical Cancer,Ontologies},
    pages = {1},
    title = {{A Framework for Semantic Data Integration and Inferencing on Cervical Cancer}},
    year = {2007}
    }

  • I. K. Mprouza, F. E. Psomopoulos, and P. A. Mitkas, “Simulating molecular dynamics through intelligent software agents,” in Hellenic bioinformatics and medical informatics meeting, Biomedical Research Foundation, Academy of Athens, Greece, 2007, p. 1.
    [BibTeX]
    @inproceedings{MprouzaPsomopoulosBioacademy,
    address = {Biomedical Research Foundation, Academy of Athens, Greece},
    author = {Mprouza, Ioanna K and Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {Hellenic Bioinformatics and Medical Informatics Meeting},
    keywords = {Molecular Dynamics,Molecular Simulation,Software Agents},
    mendeley-tags = {Molecular Dynamics,Molecular Simulation,Software Agents},
    pages = {1},
    title = {{Simulating molecular dynamics through intelligent software agents}},
    year = {2007}
    }

2006

  • H. E. Polychroniadou, F. E. Psomopoulos, and P. A. Mitkas, “G-Class: A Divide and Conquer Application for Grid Protein Classification,” in Proceedings of the 2nd ADMKD 2006: workshop on data mining and knowledge discovery (in conjunction with ADBIS 2006: the 10th East-European conference on advances in databases and information systems), Thessaloniki, Greece, 2006, pp. 121-132.
    [BibTeX] [Abstract]

    Protein classification has always been one of the major challenges in modern functional proteomics. The presence of motifs in protein chains can make the prediction of the functional behavior of proteins possible. The correlation between protein properties and their motifs is not always obvious, since more than one motif may exist within a protein chain. Due to the complexity of this correlation, most data mining algorithms are either inefficient or time-consuming. In this paper a data mining methodology that utilizes grid technologies is presented. First, data are split into multiple sets while preserving the original data distribution in each set. Then, multiple models are created by using the data sets as independent training sets. Finally, the models are combined to produce the final classification rules, containing all the previously extracted information. The methodology is tested using various protein and protein class subsets. Results indicate the improved time efficiency of our technique compared to other known data mining algorithms.

    @inproceedings{PolychroniadouPsomopoulosGClass,
    abstract = {Protein classification has always been one of the major challenges in modern functional proteomics. The presence of motifs in protein chains can make the prediction of the functional behavior of proteins possible. The correlation between protein properties and their motifs is not always obvious, since more than one motif may exist within a protein chain. Due to the complexity of this correlation, most data mining algorithms are either inefficient or time-consuming. In this paper a data mining methodology that utilizes grid technologies is presented. First, data are split into multiple sets while preserving the original data distribution in each set. Then, multiple models are created by using the data sets as independent training sets. Finally, the models are combined to produce the final classification rules, containing all the previously extracted information. The methodology is tested using various protein and protein class subsets. Results indicate the improved time efficiency of our technique compared to other known data mining algorithms.},
    address = {Thessaloniki, Greece},
    author = {Polychroniadou, Helen E and Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {Proceedings of the 2nd ADMKD 2006: Workshop on Data Mining and Knowledge Discovery (in conjunction with ADBIS 2006: The 10th East-European Conference on Advances in Databases and Information Systems)},
    keywords = {Algorithm,Grid Computing,Protein Classification},
    mendeley-tags = {Algorithm,Grid Computing,Protein Classification},
    pages = {121--132},
    publisher = {Springer-Verlag},
    title = {{G-Class: A Divide and Conquer Application for Grid Protein Classification}},
    year = {2006}
    }

  • F. E. Psomopoulos and P. A. Mitkas, “PROTEAS: A Finite State Automata based data mining algorithm for rule extraction in protein classification,” in Proceedings of the 5th Hellenic data management symposium, Thessaloniki, Greece, 2006, pp. 118-126.
    [BibTeX] [Abstract]

    An important challenge in modern functional proteomics is the prediction of the functional behavior of proteins. Motifs in protein chains can make such a prediction possible. The correlation between protein properties and their motifs is not always obvious, since more than one motif may exist within a protein chain. Thus, the behavior of a protein is a function of many motifs, where some overpower others. In this paper a data mining approach for a motif-based classification of proteins is presented. A new classification algorithm that induces rules and exploits finite state automata is introduced. First, data are modeled in terms of prefix tree acceptors, which are later merged into finite state automata. Finally, a new algorithm is proposed for the induction of protein classification rules from finite state automata. The data mining model is trained and tested using various protein and protein class subsets, as well as the whole dataset of known proteins and protein classes. Results indicate the efficiency of our technique compared to other known data mining algorithms.

    @inproceedings{PsomopoulosHDMS,
    abstract = {An important challenge in modern functional proteomics is the prediction of the functional behavior of proteins. Motifs in protein chains can make such a prediction possible. The correlation between protein properties and their motifs is not always obvious, since more than one motif may exist within a protein chain. Thus, the behavior of a protein is a function of many motifs, where some overpower others. In this paper a data mining approach for a motif-based classification of proteins is presented. A new classification algorithm that induces rules and exploits finite state automata is introduced. First, data are modeled in terms of prefix tree acceptors, which are later merged into finite state automata. Finally, a new algorithm is proposed for the induction of protein classification rules from finite state automata. The data mining model is trained and tested using various protein and protein class subsets, as well as the whole dataset of known proteins and protein classes. Results indicate the efficiency of our technique compared to other known data mining algorithms.},
    address = {Thessaloniki, Greece},
    author = {Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {Proceedings of the 5th Hellenic Data Management Symposium},
    keywords = {Data Mining,Finite State Automata,Protein Classification,Rule Extraction},
    mendeley-tags = {Data Mining,Finite State Automata,Protein Classification,Rule Extraction},
    pages = {118--126},
    title = {{PROTEAS: A Finite State Automata based data mining algorithm for rule extraction in protein classification}},
    year = {2006}
    }

2005

  • F. E. Psomopoulos and P. A. Mitkas, “A protein classification engine based on stochastic finite state automata,” in Lecture series on computer and computational sciences VSP/Brill (proceedings of the symposium 35: computational methods in molecular biology in conjunction with ICCMSE), Loutraki, Greece, 2005, pp. 1371-1374.
    [BibTeX] [Abstract]

    Accurate protein classification is one of the major challenges in modern bioinformatics. Motifs that exist in the protein chain can make such a classification possible. A plethora of algorithms to address this problem have been proposed by both the artificial intelligence and the pattern recognition communities. In this paper, a data mining methodology for classification rules induction is proposed. Initially, expert-based protein families are processed to create a new hybrid set of families. Then, a prefix tree acceptor is created from the motifs in the protein chains, and subsequently transformed into a stochastic finite state automaton using the ALERGIA algorithm. Finally, an algorithm is presented for the extraction of classification rules from the automaton.

    @inproceedings{LoutrakiPsomopoulos,
    abstract = {Accurate protein classification is one of the major challenges in modern bioinformatics. Motifs that exist in the protein chain can make such a classification possible. A plethora of algorithms to address this problem have been proposed by both the artificial intelligence and the pattern recognition communities. In this paper, a data mining methodology for classification rules induction is proposed. Initially, expert-based protein families are processed to create a new hybrid set of families. Then, a prefix tree acceptor is created from the motifs in the protein chains, and subsequently transformed into a stochastic finite state automaton using the ALERGIA algorithm. Finally, an algorithm is presented for the extraction of classification rules from the automaton.},
    address = {Loutraki, Greece},
    author = {Psomopoulos, Fotis E and Mitkas, Pericles A},
    booktitle = {Lecture Series on Computer and Computational Sciences VSP/Brill (Proceedings of the Symposium 35: Computational Methods in Molecular Biology in conjunction with ICCMSE)},
    keywords = {Algorithm,Finite State Automata,Protein Classification,Rule Extraction},
    mendeley-tags = {Algorithm,Finite State Automata,Protein Classification,Rule Extraction},
    pages = {1371--1374},
    publisher = {Springer-Verlag},
    title = {{A protein classification engine based on stochastic finite state automata}},
    volume = {4B},
    year = {2005}
    }

2004

  • F. E. Psomopoulos, S. Diplaris, and P. A. Mitkas, “A finite state automata based technique for protein classification rules induction,” in Proceedings of the second European workshop on data mining and text mining in bioinformatics (in conjunction with ECML/PKDD), Pisa, Italy, 2004, pp. 54-60.
    [BibTeX] [Abstract]

    An important challenge in modern functional proteomics is the prediction of the functional behavior of proteins. Motifs in protein chains can make such a prediction possible. The correlation between protein properties and their motifs is not always obvious, since more than one motif can exist within a protein chain. Thus, the behavior of a protein is a function of many motifs, where some overpower others. In this paper a data-mining approach for motif-based classification of proteins is presented. A new classification rules inducing algorithm that exploits finite state automata is introduced. First, data are modeled in terms of prefix tree acceptors, which are later merged into finite state automata. Finally, we propose a new algorithm for the induction of protein classification rules from finite state automata. The data-mining model is trained and tested using various protein and protein class subsets, as well as the whole dataset of known proteins and protein classes. Results indicate the efficiency of our technique compared to other known data-mining algorithms.

    @inproceedings{PsomopoulosDiplarisPisa,
    abstract = {An important challenge in modern functional proteomics is the prediction of the functional behavior of proteins. Motifs in protein chains can make such a prediction possible. The correlation between protein properties and their motifs is not always obvious, since more than one motif can exist within a protein chain. Thus, the behavior of a protein is a function of many motifs, where some overpower others. In this paper a data-mining approach for motif-based classification of proteins is presented. A new classification rules inducing algorithm that exploits finite state automata is introduced. First, data are modeled in terms of prefix tree acceptors, which are later merged into finite state automata. Finally, we propose a new algorithm for the induction of protein classification rules from finite state automata. The data-mining model is trained and tested using various protein and protein class subsets, as well as the whole dataset of known proteins and protein classes. Results indicate the efficiency of our technique compared to other known data-mining algorithms.},
    address = {Pisa, Italy},
    author = {Psomopoulos, Fotis E and Diplaris, Sotiris and Mitkas, Pericles A},
    booktitle = {Proceedings of the Second European Workshop on Data Mining and Text Mining in Bioinformatics (in conjunction with ECML/PKDD)},
    keywords = {Algorithm,Finite State Automata,Protein Classification},
    mendeley-tags = {Algorithm,Finite State Automata,Protein Classification},
    pages = {54--60},
    title = {{A finite state automata based technique for protein classification rules induction}},
    year = {2004}
    }

Other publications

2015

  • F. Psomopoulos and R. Jimenez, “EGI-ELIXIR project: Integrating datasets for bioinformatics,” iss. February, pp. 5-6, 2015.
    [BibTeX]
    @article{Psomopoulos2015a,
    author = {Psomopoulos, Fotis and Jimenez, Rafael},
    number = {February},
    pages = {5--6},
    title = {{EGI-ELIXIR project: Integrating datasets for bioinformatics}},
    year = {2015}
    }

2014

  • A. Duarte and F. Psomopoulos, “Future opportunities and trends for e-infrastructures and life sciences,” EGI Inspire newsletter #14, iss. January, p. 1, 2014.
    [BibTeX]
    @article{Duarte2014,
    author = {Duarte, Afonso and Psomopoulos, Fotis},
    journal = {EGI Inspire Newsletter {\#}14},
    number = {January},
    pages = {1},
    title = {{Future opportunities and trends for e-infrastructures and life sciences}},
    year = {2014}
    }

2010

  • F. E. Psomopoulos, “Parallel Data Mining and Analysis Algorithms in a Grid environment and applications in Bioinformatics,” PhD Thesis, 2010.
    [BibTeX] [Abstract]

    Although Bioinformatics and Computational Biology are often confused as the same multidisciplinary field, there exist several differences that distinguish them. Bioinformatics focuses on the analysis and processing of biological data and, consequently, on the promotion of research at the algorithmic and technical level of both the methods and the theory for solving formal problems of data management. On the other hand, Computational Biology aims to solve specific problems in Biology, utilizing the potential of computers for testing and evaluating hypotheses. Nevertheless, the two fields share several areas of convergence. Proteomics is one of these areas and is also the focus of extensive ongoing research. Proteomics is essentially the large-scale study of proteins, ranging from the identification and analysis of their structure to the prediction of their functionality and the construction of metabolic pathways. In recent years there has been a shift in research interest in Bioinformatics from genomics to proteomics, which is widely considered as the next step in the study of biological systems. While the genome of an organism remains fairly constant in different cells of the same organism, the proteome of a species is highly differentiated from cell to cell. In previous years, genomics and proteomics could only focus on one gene or protein at a time. However, technological advancements in the Life Sciences have led to an exponentially growing amount of data. For this reason there has been a shift in research, from hypothesis-driven to data-driven studies. As the demand for automated analysis of large and distributed data grows, new challenges emerge regarding both the modeling and the development of algorithms for high throughput data analysis. This thesis presents a general methodology for Bioinformatics Algorithm Development in Grid Environments (BADGE) aiming at precisely these challenges.
Grid Computing can be viewed as a virtual computing architecture that provides the ability to perform higher throughput processing by taking advantage of many computers geographically dispersed and connected by a network. Bioinformatics applications can benefit greatly from the increased availability, reliability and efficiency of computational resources in such a distributed environment. There is already considerable research in progress toward applying parallel computing techniques to bioinformatics methods, such as multiple sequence alignment, gene expression analysis and phylogenetic studies. In order to cope with the dimensionality issue, most machine learning methods focus on specific groups of proteins or reduce either the size of the original data set or the number of attributes involved. Grid Computing can potentially provide an alternative solution to this problem, by combining multiple approaches in a seamless way. The BADGE methodology presented in this thesis couples the strengths of the Grid with the specific needs and constraints of proven bioinformatics approaches. In order to evaluate the BADGE methodology, we applied it to several existing algorithms and to a series of new algorithms, which were developed to address issues both in Bioinformatics and in other research areas as well. The bioinformatics algorithms we designed mainly focus on proteomics and aim to provide solutions to the problems of protein classification, prediction of protein function, and abnormal gene detection. Finally, the methodology was also used in the development of an algorithm in Computational Chemistry, addressing the problem of identifying molecular knots in three-dimensional space. In every case, it was shown, both theoretically and experimentally, that the new approach presents clear advantages over conventional approaches in terms of time performance and robustness.

    @phdthesis{FpsomPhDThesis,
    abstract = {Although Bioinformatics and Computational Biology are often treated as the same multidisciplinary field, several differences distinguish them. Bioinformatics focuses on the analysis and processing of biological data and, consequently, on advancing the algorithms, methods and theory needed to solve formal problems of data management. Computational Biology, on the other hand, aims to solve specific problems in Biology, utilizing the potential of computers for testing and evaluating hypotheses. Nevertheless, the two fields share several areas of convergence. Proteomics is one of these areas and is also the focus of extensive ongoing research. Proteomics is essentially the large-scale study of proteins, ranging from the identification and analysis of their structure to the prediction of their functionality and the construction of metabolic pathways. In recent years there has been a shift in research interest in Bioinformatics from genomics to proteomics, which is widely considered the next step in the study of biological systems. While the genome of an organism remains fairly constant across different cells of the same organism, the proteome is highly differentiated from cell to cell. In previous years, genomics and proteomics could only focus on one gene or protein at a time. However, technological advancements in the Life Sciences have led to an exponentially growing amount of data. For this reason there has been a shift in research from hypothesis-driven to data-driven studies. As the demand for automated analysis of large and distributed data grows, new challenges emerge regarding both the modeling and the development of algorithms for high-throughput data analysis. This thesis presents a general methodology for Bioinformatics Algorithm Development in Grid Environments (BADGE) aimed at precisely these challenges.
Grid Computing can be viewed as a virtual computing architecture that provides higher-throughput processing by taking advantage of many geographically dispersed computers connected by a network. Bioinformatics applications can benefit greatly from the increased availability, reliability and efficiency of computational resources in such a distributed environment. There is already considerable research in progress toward applying parallel computing techniques to bioinformatics methods, such as multiple sequence alignment, gene expression analysis and phylogenetic studies. To cope with the dimensionality issue, most machine learning methods focus on specific groups of proteins or reduce either the size of the original data set or the number of attributes involved. Grid Computing can potentially provide an alternative solution to this problem by combining multiple approaches in a seamless way. The BADGE methodology presented in this thesis couples the strengths of the Grid with the specific needs and constraints of proven bioinformatics approaches. To evaluate the BADGE methodology, we applied it to several existing algorithms and to a series of new algorithms, which were developed to address issues both in Bioinformatics and in other research areas. The bioinformatics algorithms we designed focus mainly on proteomics and aim to provide solutions to the problems of protein classification, prediction of protein function, and abnormal gene detection. Finally, the methodology was also used in the development of an algorithm in Computational Chemistry, addressing the problem of identifying molecular knots in three-dimensional space. In every case, it was shown, both theoretically and experimentally, that the new approach presents clear advantages over conventional approaches in terms of time performance and robustness.},
    annote = {184},
    author = {Psomopoulos, Fotis E},
    keywords = {Bioinformatics,Data Mining,Grid Computing},
    mendeley-tags = {Bioinformatics,Data Mining,Grid Computing},
    pages = {184},
    school = {Aristotle University of Thessaloniki, Greece},
    title = {{Parallel Data Mining and Analysis Algorithms in a Grid environment and applications in Bioinformatics}},
    type = {PhD Thesis},
    month = jun,
    year = {2010}
    }

2008

  • F. E. Psomopoulos, C. Gkekas, and P. A. Mitkas, “Data Mining in Bioinformatics using Grid Computing (in Greek),” AUTH Grid Team, Aristotle University of Thessaloniki, iss. January, p. 1, 2008.
    [BibTeX]
    @article{Psomopoulos2008,
    author = {Psomopoulos, Fotis E and Gkekas, Christos and Mitkas, Pericles A},
    journal = {AUTH Grid Team, Aristotle University of Thessaloniki},
    number = {January},
    pages = {1},
    title = {{Data Mining in Bioinformatics using Grid Computing (in Greek)}},
    year = {2008}
    }

2004

  • F. E. Psomopoulos, “A finite state automata algorithm for the extraction of association rules and applications in protein data classification (Greek text only),” Master Thesis, Aristotle University of Thessaloniki, Greece, p. 108, 2004.
    [BibTeX] [Abstract]

    An important challenge in modern functional proteomics is the prediction of the functional behavior of proteins. Motifs in protein chains can make such a prediction possible. The correlation between protein properties and their motifs is not always obvious, since more than one motif can exist within a protein chain. Thus, the behavior of a protein is a function of many motifs, some of which overpower others. In this paper a data-mining approach for motif-based classification of proteins is presented. A new algorithm for inducing classification rules that exploits finite state automata is introduced. First, data are modeled in terms of prefix tree acceptors, which are later merged into finite state automata. Finally, we propose a new algorithm for the induction of protein classification rules from finite state automata. The data-mining model is trained and tested using various protein and protein class subsets, as well as the whole dataset of known proteins and protein classes. Results indicate the efficiency of our technique compared to other known data-mining algorithms.

    @mastersthesis{FpsomMasterThesis,
    abstract = {An important challenge in modern functional proteomics is the prediction of the functional behavior of proteins. Motifs in protein chains can make such a prediction possible. The correlation between protein properties and their motifs is not always obvious, since more than one motif can exist within a protein chain. Thus, the behavior of a protein is a function of many motifs, some of which overpower others. In this paper a data-mining approach for motif-based classification of proteins is presented. A new algorithm for inducing classification rules that exploits finite state automata is introduced. First, data are modeled in terms of prefix tree acceptors, which are later merged into finite state automata. Finally, we propose a new algorithm for the induction of protein classification rules from finite state automata. The data-mining model is trained and tested using various protein and protein class subsets, as well as the whole dataset of known proteins and protein classes. Results indicate the efficiency of our technique compared to other known data-mining algorithms.},
    annote = {108},
    author = {Psomopoulos, Fotis E},
    keywords = {Data Mining,Finite State Automata,Protein Classification},
    mendeley-tags = {Data Mining,Finite State Automata,Protein Classification},
    pages = {108},
    school = {Aristotle University of Thessaloniki, Greece},
    title = {{A finite state automata algorithm for the extraction of association rules and applications in protein data classification (Greek text only)}},
    type = {Master Thesis},
    month = jul,
    year = {2004}
    }