Research Areas

I. Software Engineering Areas of Research

Research in this area concentrates on open issues in predicting the overall cost of developing or enhancing software systems from incomplete, imprecise, uncertain and/or noisy input. Two main approaches are investigated: the quantitative and the qualitative. The quantitative approach examines the use of various forms of Intelligent Systems, Artificial Neural Networks (ANN), Probabilistic Systems, Fuzzy Decision Trees (FDT), Fuzzy Inference Systems (FIS) and Hybrid Systems (ANN combined with Genetic Algorithms (GA); Ridge Regression combined with GA; Conditional Sets combined with GA; Classification and Regression Trees combined with FIS) to model and forecast software development effort. Four datasets have been utilised for this purpose: COCOMO, Kemerer, Albrecht and Desharnais. Each dataset includes historical data on a number of software projects (such as lines of code, function points and effort). Relevant research activities have also included the investigation of the ISBSG dataset, which involved developing models for forecasting development effort and isolating the factors that have the highest descriptive power on the predicted variable (effort). The ISBSG dataset, obtained from the International Software Benchmarking Standards Group, contains software project cost data drawn from a broad cross-section of industry and from various countries. The projects vary in size, effort, platform, language and development technique, and the dataset records a large number of projects together with measures of a large number of attributes. Statistical methods (Regression), Conditional Sets, Categorical and Regression Decision Trees and Genetic Programming, along with various Clustering and Classification Algorithms, are also used to understand and quantify the effect of project attributes and to estimate effort by analogy.
The projects that are classified in the same chance nodes and are shown to comply with the same regression equations, association rules and genetically evolved ranges are then used to identify any correlations between effort and cost attributes. Research in this area also utilises Input Sensitivity Analysis (ISA), Attribute Ranking and Feature Subset Selection (FSS) algorithms to extract the optimum subset of features and establish accurate cost estimations for a particular technique. Relevant research has employed Conformal Predictors (CP) to produce reliable confidence measures and to define ranges of effort that support improved project effort estimations. The qualitative approach identifies the critical cost factors and attempts to model cost estimation using Fuzzy Cognitive Maps (FCM), representing the factors that affect cost as nodes with certain interrelationships. Once such a model is finalised, various scenarios can be simulated to test the validity of the model and its efficiency in providing indications for cost estimation, as well as to provide support for project management and decision making.
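As a rough illustration of estimating effort by analogy, the sketch below averages the effort of the most similar past projects. It is only a minimal example under stated assumptions: the function name estimate_by_analogy, the min-max scaling, the Euclidean distance and the three projects are all hypothetical and are not taken from the COCOMO, Kemerer, Albrecht, Desharnais or ISBSG data.

```python
import math

def estimate_by_analogy(history, target, k=2):
    """Average the effort of the k past projects closest to the target.

    history: list of (features, effort) pairs; target: feature tuple.
    Features are min-max scaled so that no single attribute dominates.
    """
    dims = range(len(target))
    lows = [min(f[d] for f, _ in history) for d in dims]
    highs = [max(f[d] for f, _ in history) for d in dims]

    def scale(features):
        return [
            (features[d] - lows[d]) / (highs[d] - lows[d]) if highs[d] > lows[d] else 0.0
            for d in dims
        ]

    scaled_target = scale(target)
    ranked = sorted(history, key=lambda p: math.dist(scale(p[0]), scaled_target))
    nearest = ranked[:k]
    return sum(effort for _, effort in nearest) / len(nearest)

# Hypothetical projects: (thousand lines of code, function points) -> effort in person-months.
past_projects = [((10, 100), 20), ((12, 110), 24), ((50, 400), 120)]
estimate = estimate_by_analogy(past_projects, (11, 105), k=2)
print(estimate)
```

The choice of k and of the distance measure strongly affects the estimate, which is one reason feature subset selection matters for this family of techniques.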

Software reliability is one of the most significant quality features according to the ISO/IEC 9126 software quality standard. This research aims at investigating the nature and structure of a set of dataseries known as the “Musa datasets” for software reliability. Non-parametric methods such as R/S analysis and its variations were employed to test for the presence of long-term dependence in the datasets. So far, results have shown that these software reliability dataseries follow a pink noise structure with randomness being the dominant characteristic. Artificial Neural Networks (ANNs) have been employed to investigate how well these datasets can be forecast, and the results were compared against the previous R/S findings that strongly supported the random structure. Future work will focus on collecting new data from modern software systems (e.g., operating systems, e-mail servers, web browsers) and testing the new datasets with R/S analysis to investigate consistencies in the structure of software reliability. Finally, hybrid systems comprising Artificial Neural Networks (ANNs) and Genetic Algorithms (GA) will be employed for prediction purposes, and the new results will be contrasted with those obtained for Musa’s dataseries so as to detect whether, and to what extent, modern approaches to producing software affect its reliability.
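For readers unfamiliar with R/S analysis, the following minimal sketch shows the classical rescaled-range computation and a Hurst exponent estimate obtained as the slope of log(R/S) against log(window size). It is an illustration only: the function names, the window sizes and the synthetic Gaussian series are assumptions, and it does not reproduce the variations of R/S analysis or the Musa data used in the research.

```python
import math
import random
import statistics

def rescaled_range(series):
    """R/S for one window: range of cumulative mean deviations over the std dev."""
    mean = statistics.fmean(series)
    cumulative, running = [], 0.0
    for x in series:
        running += x - mean
        cumulative.append(running)
    spread = max(cumulative) - min(cumulative)
    std = statistics.pstdev(series)
    return spread / std if std > 0 else 0.0

def hurst_exponent(series, window_sizes):
    """Least-squares slope of log(mean R/S) against log(window size).

    Needs at least two usable window sizes, otherwise the slope is undefined.
    """
    xs, ys = [], []
    for n in window_sizes:
        chunks = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        values = [v for v in (rescaled_range(c) for c in chunks) if v > 0]
        if values:
            xs.append(math.log(n))
            ys.append(math.log(statistics.fmean(values)))
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Synthetic white noise standing in for a reliability dataseries (assumption).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
h = hurst_exponent(noise, [8, 16, 32, 64, 128, 256])
print(h)
```

An exponent close to 0.5 is usually read as the absence of long-term dependence, while values well above 0.5 suggest persistence; the simple estimator above is known to be biased for short windows, so it should be treated as indicative only.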

Software components management involves the handling of components in a repository and their extraction based on ideas borrowed from the field of Computational Intelligence. The clustering of components uses techniques to group components in heterogeneous sets, aiming for both the efficient storing and retrieval of the most suitable components from a repository. One such technique employs GAs, where a binary representation of the components in a repository is used to compute an optimal component categorisation based on initially randomised classifiers along with a threshold of bit similarity which guides the assignment of a component to a class. Users are then able to select a component by providing their preference, which is matched against the optimal classifiers produced by the GA; the system subsequently retrieves the components assigned to the best class and displays them for the user to select. Another technique uses a hybrid combination of entropy-based and fuzzy k-modes clustering. With this approach, components in a repository are first grouped by an entropy-based clustering algorithm, as a pre-processing step, in order to discover which components are the most representative of the repository as well as how many clusters are inherent in the repository. Thereafter, a fuzzy k-modes algorithm is employed to perform the actual clustering of the components using the outputs of the entropy-based clustering. With the addition of fuzziness, the k-modes algorithm allows components to belong to several clusters, each with a degree of participation. As a result, it deals with the uncertainty residing in a component repository and also provides flexibility as the repository grows in size. Users give their preference to be matched against the final cluster centres (or representatives), which are computed through the hybrid algorithm. The nearest cluster is selected and the most suitable components assigned to that cluster are presented for the user to choose from.
Both approaches were thoroughly tested and validated, and the results indicate that the utilisation of Computational Intelligence methods greatly assists in the classification and retrieval of software components and in component management as a whole.
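The degree of participation mentioned above can be sketched with the usual fuzzy k-modes membership formula, which uses the number of mismatching categorical attributes as the dissimilarity. This is a minimal sketch under assumptions: the attribute tuples, the fuzziness value of 1.5 and the equal split between identical modes are illustrative choices, and the entropy-based pre-processing step is not shown.

```python
def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def memberships(component, modes, fuzziness=1.5):
    """Degree of participation of one component in each cluster mode.

    Uses u_i = 1 / sum_k (d_i / d_k) ** (1 / (fuzziness - 1)), with fuzziness > 1.
    A component identical to a mode belongs fully to it (shared equally if
    several modes are identical, which is an assumption of this sketch).
    """
    distances = [mismatch(component, m) for m in modes]
    if 0 in distances:
        hits = distances.count(0)
        return [1.0 / hits if d == 0 else 0.0 for d in distances]
    exponent = 1.0 / (fuzziness - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]

# Hypothetical component descriptions: (language, domain, licence).
cluster_modes = [("java", "gui", "mit"), ("c", "db", "mit")]
u = memberships(("java", "gui", "lgpl"), cluster_modes)
print(u)
```

A user preference can be matched in the same way: it is encoded as an attribute tuple, and the cluster with the highest membership is the one whose components are presented.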

In this research area, two different approaches have been followed. The first involves a new elicitation methodology for enhancing the traditional requirements engineering process. The methodology is based on Human, Social and Organizational (HSO) factors that exist in the business environment of the client organisation and affect the functional and non-functional parts of a software system under development, as well as its future users. Such factors include, for example, the working procedures of potential users, their working habits and customs, their workload, the communication and cooperation between users within their working environment, their psychology and temperament, the organisation and visibility of their everyday working activities, the level to which the product will promote employees’ productivity and contentment, and any legal and ethical issues posed. The methodology proposes a set of activities for uncovering HSO factors, assessing their effect on known system requirements and recording new requirements resulting from these factors. The second approach constitutes a new and complete requirements engineering process which is based on Natural Language Syntax and Semantics (NLSS). Although at a preliminary stage, this newly proposed process focuses on the way requirements are elicited, analysed and recorded. Basic elements of the syntax and semantics of a sentence (e.g., verbs, nouns, adjectives, roles) guide elicitation activities so as to ask specific, predetermined questions and gather the relevant functional and constraint information. This information is then written in dedicated syntactic forms of requirements classes. The resulting requirements are thus more complete and are written in a semi-formal natural language style, with less ambiguity and vagueness, which reduces the time needed to engineer requirements from huge documents.
Once requirements are expressed as semi-formal statements of the proposed type, a dedicated CASE tool, specially developed to support the NLSS process, reads the statements and automatically produces semi-formal diagrammatic notations, such as Data Flow Diagrams and Class Diagrams.

In the current age of information overload, it is becoming increasingly hard to find relevant content. This problem is not only widespread but also alarming. Over the last 10-15 years, recommender system technologies have been introduced to help people deal with these vast amounts of information, and they have been widely used in research as well as in e-commerce applications. The main aim of a Recommender System (RS) is to provide the user with accurate recommendations for items that he/she may not have thought about or may have found difficult to discover. Recommender Systems are divided into two main categories: Content-Based (CB) systems and Collaborative Filtering (CF) systems. Content-based systems provide recommendations to the active user based on text similarities between different articles. This method is limited to media to which text or tags can be assigned. Collaborative Filtering systems are divided into memory-based and model-based algorithms. Memory-based algorithms generate recommendations for the active user based on similarities with his/her neighbors (users with similar characteristics) or between items he/she has previously bought or seen. Model-based algorithms, such as matrix factorization or latent factor models, try to build a model from the user-item matrix and then use it to predict ratings and provide accurate recommendations to the user. Recommender systems face many challenges and limitations (e.g., sparsity and the cold-start problem); to deal with these, hybrid techniques that combine CB and CF have been introduced. Other algorithms and techniques have also been proposed to address the limitations of RS and provide accurate recommendations to the user.
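The memory-based, user-based variant of Collaborative Filtering described above can be sketched in a few lines. The example below is a minimal illustration, assuming cosine similarity over co-rated items and a similarity-weighted average of the neighbors' ratings; the user names, items and ratings are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two users, computed over co-rated items only."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(ratings, user, item):
    """Similarity-weighted average of the ratings that neighbors gave to item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        weight = cosine(ratings[user], their)
        num += weight * their[item]
        den += abs(weight)
    return num / den if den else None  # None when no neighbor rated the item

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "carol": {"a": 1, "b": 5, "c": 2},
}
p = predict(ratings, "alice", "c")
print(p)
```

Because bob's ratings match alice's more closely than carol's do, the prediction is pulled towards bob's rating of item c. The sketch also makes the cold-start problem visible: a user with no ratings has zero similarity to everyone, so no prediction can be made.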

The fact that Cloud Computing is steadily becoming one of the most significant fields of Information and Communication Technology (ICT) has led many organizations to consider the benefits of migrating their business operations to the Cloud. Decision makers face considerable challenges when assessing the feasibility of adopting Cloud Computing for their organizations. Cloud adoption is a multi-level decision which is influenced by a number of intertwined factors and concerns, making it a complex real-world problem that is difficult to model. The extremely fast-moving nature of the cloud computing environment makes the decision-making process particularly difficult. Any model or framework that aims to support cloud computing adoption should therefore be flexible and dynamically adaptable. Guided by these prerequisites, we approached the problem using adapted computational intelligence techniques, which have shown promising results and a strong ability to capture the dynamics of complex environments. A brief description of our related work in this area follows. The decision to adopt the cloud environment was first addressed using an approach based on Fuzzy Cognitive Maps (FCM), which models the parameters that potentially influence such a decision. The construction and analysis of the map is based on factors reported in the relevant literature and on experts’ opinion. The proposed approach is evaluated through four real-world experimental cases, and the suggestions of the model are compared with the customers’ final decisions. The evaluation indicated that the proposed approach is capable of capturing the dynamics behind the interdependencies of the participating factors. Further research on the same topic suggested the use of Influence Diagrams (ID) as modeling tools to support the decision process.
The developed ID model combines a number of factors which were identified through literature review and input received from field experts. The proposed approach is validated against four experimental cases, two realistic and two real-world, and it proved highly capable of correctly predicting the right decision. Continuing in the same direction, we proposed two decision support modeling approaches based on ID, aiming to model the answer to the question “Adopt Cloud Services or Not?” Two models were developed and tested; the first is a generic ID with nodes interacting in a probabilistic manner, while the second is a more flexible version that utilizes Fuzzy Logic. Both models combine several factors that influence the decision to be taken, which were identified through literature review and input received from field experts. The proposed approaches are validated using five experimental scenarios, two synthetic and three real-world cases, and their performance suggests that they are highly capable of supporting the right decision. Our most recent work proposes a multi-layer FCM approach which models a number of factors that play a decisive role in the cloud adoption issue and offers the means to study their influence. The factors are organized in different layers which focus on specific aspects of the cloud environment; this, on the one hand, enables tracking the causes of the decision outcome and, on the other, offers the ability to study the dependencies between the leading determinants of the decision. The construction and analysis of the model is based on factors reported in the relevant literature and on experts’ opinion. The efficacy and applicability of the proposed approach are demonstrated through four real-world experimental cases.

Research carried out in the area of software project management focuses on two main topics. The first topic involves solving the problem of human resource allocation and task scheduling in software development projects using computational intelligence. Traditional approaches for automated decision support, or tools for project scheduling and staffing found in the literature, often make heavy assumptions so as to lower the complexity of the whole process. Examples of such assumptions are that all software developers have the same skills and/or level of experience, that developers are expected to deliver with the same productivity regardless of the working environment and their teammates, that members of a team either possess a skill or not (with no intermediate case where a skill may be possessed at different levels by different employees), that all tasks are worked on in the same way, and so on. The work carried out in this topic includes the use of evolutionary algorithms to minimize project duration and cost based on developer productivity, type of task interdependence and communication overhead. The second topic concerns project staffing using personality types, aiming to improve various features of software development, such as software quality, job satisfaction, team performance, and social cohesion and conflict. Attributes of personality are often partially or totally neglected in the development process, leading to significantly inaccurate estimates in terms of time, cost and quality. Studies focus on investigating personality issues of software professionals to help assess the effectiveness of a team as a whole, in order to form teams that will not have communication and cooperation problems because of personality type mismatches. Additionally, personality types are used for associating developers with tasks, so as to ensure that the right type of personality is selected to undertake a task.
Teams formed in these ways will be able to execute tasks and activities with the maximum possible productivity, at the same time shortening development schedules and lowering effort.

Research here addresses several open issues in the context of software testing and in particular deals with static and dynamic program analysis. Problems encountered in this area involve the creation and presentation of control flow graphs, code initialisation sequences, the support of large and complex programs in a variety of programming languages, the extraction of program paths, and the identification of variables’ scopes and method identification. Another equally important issue raised by programmers is the presentation of program analysis results in a friendly graphical user interface with interactive capabilities. So far, a novel multilayered architecture has been proposed that attempts to offer specific solutions to the aforementioned problems by providing a set of embedded cooperating software modules for program analysis and by focusing on practical, static and dynamic analysis tools for programmers. Two types of analysis are supported: runtime and non-runtime; each type comprises modules that collaborate and provide an interactive user-programmer GUI displaying the results in a relatively short execution time, the latter being proportional to the size of the program (in lines of code). Based on the proposed architecture, the issue of automatically producing software test cases using intelligent optimisation algorithms, such as Genetic Algorithms (GAs) has been investigated. More specifically, a new evolutionary algorithm has been proposed, which is able to automatically produce a set of test data for a given program according to a specified criterion (e.g., statement, edge, condition/edge). The performance of the algorithm is measured over a pool of sample programs and benchmarks and may be characterised as highly successful. In addition, program slicing is currently being studied, as well as the automatic creation of source code dependence graphs and the integration of these into the main architecture. 
Finally, techniques of program slicing and testing using symbolic execution and model-based checking (e.g., using JML) are also under examination.
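To make the idea of evolutionary test data generation concrete, the sketch below searches for integer inputs that drive a program into a hard-to-reach branch, using the branch distance as the fitness to minimise. It is a generic illustration and not the algorithm proposed in this research: the program under test, its branch condition, the truncation selection, the crossover and mutation settings and all function names are assumptions.

```python
import random

def branch_distance(x, y):
    """Distance from taking the branch `if x == 2 * y + 10` of a hypothetical
    program under test; 0 means the input pair covers the branch."""
    return abs(x - (2 * y + 10))

def evolve(generations=200, size=30, seed=1):
    """Evolve (x, y) test inputs; returns the best pair and the best distance
    recorded at the start of each generation."""
    rng = random.Random(seed)
    population = [(rng.randint(-100, 100), rng.randint(-100, 100)) for _ in range(size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda p: branch_distance(*p))
        history.append(branch_distance(*population[0]))
        if history[-1] == 0:
            break  # the target branch is covered
        parents = population[: size // 2]      # truncation selection
        children = [population[0]]             # elitism: keep the best unchanged
        while len(children) < size:
            a, b = rng.sample(parents, 2)
            child = (a[0], b[1])               # one-point crossover
            if rng.random() < 0.3:             # small random mutation
                child = (child[0] + rng.randint(-5, 5), child[1] + rng.randint(-5, 5))
            children.append(child)
        population = children
    best = min(population, key=lambda p: branch_distance(*p))
    return best, history

best, history = evolve()
print(best, branch_distance(*best))
```

Coverage criteria such as statement or edge coverage can be handled in the same style by defining one distance function per uncovered target and running the search for each of them.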

Mobile Commerce (M-Commerce) is an evolving area of e-Commerce, where users can interact with service providers through a mobile and wireless network, using mobile devices for information retrieval and transaction processing. M-Commerce services and applications can be delivered over different wireless and mobile networks, with the aid of a variety of mobile devices. However, constraints inherent in both mobile networks and devices influence their operational performance; therefore, there is a strong need to take those constraints into consideration in the design and development phases of m-Commerce services and applications in order to improve their quality. Another important factor in designing quality m-Commerce services and applications is the identification of mobile users’ requirements. Furthermore, m-Commerce services and applications need to be classified based on the functionality they provide to mobile users. This kind of classification results in two major classes: directory-oriented and transaction-oriented services and applications. This research builds upon and extends the lab’s previous work on designing and developing m-Commerce services and applications. The approach takes account of mobile users’ needs and requirements, the classification of m-Commerce services and applications, as well as the current technologies for mobile and wireless computing and their constraints. In this context, the different characteristics and capabilities of modern mobile devices (e.g., smart phones, PDAs) have been studied to determine the extent to which such devices affect the quality that mobile software must provide to meet the needs of the modern mobile user.

II. Intelligent Information Systems Areas of Research

A new framework for developing IIS has been proposed, which is based on Fuzzy Logic, Neuro-fuzzy computing and Genetic Algorithms (GAs). More specifically, we have introduced and used a modified form of Fuzzy Cognitive Maps (FCM), enhanced by a specially designed Fuzzy Knowledge Base and Genetic Algorithms to produce a hybrid form of IIS. Such a hybrid system is able to handle the problems of traditional FCM (e.g., the limit-cycle phenomenon) and offers the ability to perform multi-objective scenario analysis and optimisation. The proposed IIS has been successfully employed for crisis modelling and decision support in various real world problems (such as the settlement of the Cyprus issue, the S-300 missiles and Imia crises), yielding very promising results. This research part also examines the extension of FCMs proposing a multi-layered structure comprising smaller FCMs and working using inheritance characteristics. In particular, the multi-layered FCM targets problems that involve factors with high levels of complexity. These factors may be decomposed into other sub-factors describing the behavior of their originator. This decomposition may continue at various layers thus resulting in a hierarchy of elementary and more easily manageable factors. The modelling of this elementary piece of information is based on the multi-layered structure of FCMs where each composite node-factor is essentially a child FCM at a lower level. In addition, multi-objective optimisation is addressed in order to provide the means for introducing hypothetical scenarios and enable simulation at various levels and concepts of interest.
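For orientation, a conventional (single-layer) FCM simulation can be sketched as follows. This is a minimal sketch of one common textbook formulation, in which each concept keeps its previous activation, adds the weighted influence of the other concepts and is squashed by a sigmoid; it is not the modified, genetically optimised or multi-layered FCM described above. The three concepts and the weight matrix are hypothetical.

```python
import math

def sigmoid(x, steepness=1.0):
    """Squashing function that keeps activations in the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * x))

def step(state, weights):
    """One synchronous update; weights[j][i] is the influence of concept j on i."""
    n = len(state)
    return [
        sigmoid(state[i] + sum(weights[j][i] * state[j] for j in range(n) if j != i))
        for i in range(n)
    ]

def run(state, weights, max_steps=100, tol=1e-6):
    """Iterate until activations change by less than tol or max_steps is reached.

    Returns the last state and the number of steps taken; reaching max_steps
    may indicate a limit cycle or chaotic behaviour rather than a fixed point.
    """
    for t in range(max_steps):
        nxt = step(state, weights)
        if max(abs(a - b) for a, b in zip(nxt, state)) < tol:
            return nxt, t + 1
        state = nxt
    return state, max_steps

# Hypothetical map with three concepts and signed causal weights in [-1, 1].
weights = [
    [0.0, 0.6, -0.4],
    [0.3, 0.0, 0.5],
    [-0.7, 0.2, 0.0],
]
final_state, steps = run([0.5, 0.2, 0.8], weights)
print(final_state, steps)
```

A scenario is simulated by fixing the initial activation of selected concepts and observing where the map settles; in a multi-layered arrangement, the activation of a composite node would itself be produced by running a child map of this kind.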