Cloud computing and Big Data are among the most talked-about topics in technology, with more and more people and organisations trying out new ways to interact with their data. Data mining is an important concept in these domains, so anyone looking to exploit it needs to be aware of the tools available today.
The ELKI framework is written in Java and built around a modular architecture. Most of the algorithms currently included cover clustering, outlier detection and database indexing. A key concept of ELKI is that arbitrary algorithms, data types, distance functions and indexes can be combined and the combinations evaluated. When developing new algorithms or index structures, the existing components can be reused and combined.
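The idea of pairing an algorithm with interchangeable distance functions can be illustrated with a minimal sketch. It is written in plain Python rather than against ELKI's Java API, and every name in it (`knn_outlier_scores`, `euclidean`, `manhattan`, the sample `data`) is a hypothetical chosen for illustration. It assumes a simple outlier score: the distance from each point to its k-th nearest neighbour.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Point, b: Point) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_outlier_scores(points: List[Point], distance: Distance, k: int = 2) -> List[float]:
    """Score each point by the distance to its k-th nearest neighbour.

    The distance function is a parameter, so the same algorithm can be
    combined with any metric without being rewritten.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(distance(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Illustrative data: four points in a tight square and one far away.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]

for name, dist in (("euclidean", euclidean), ("manhattan", manhattan)):
    scores = knn_outlier_scores(data, dist, k=2)
    strongest = max(range(len(data)), key=scores.__getitem__)
    print(name, "flags", data[strongest])
```

Both metrics flag the distant point here, but the scoring function never needs to know which metric it was given. That separation is what makes combinations easy to swap and compare.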
SCaViS is an interactive framework for scientific computation, data analysis and data visualisation designed for scientists, engineers and students. Because it is written in Java, SCaViS is multiplatform: it runs on any operating system where the Java virtual machine can be installed.
KNIME, the Konstanz Information Miner, is an open source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining concept. A graphical user interface allows assembly of nodes for data preprocessing (ETL: Extraction, Transformation, Loading), for modeling and data analysis and visualisation.
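A modular pipeline of this kind can be sketched in a few lines of code. The example below is not KNIME's API; it is a plain-Python illustration in which each "node" is a function from a table to a table, and the node names (`drop_missing`, `add_ratio`, `keep_where`), the column names and the sample rows are all assumptions made for the sketch.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Table = List[Row]
Node = Callable[[Table], Table]


def drop_missing(column: str) -> Node:
    """Preprocessing node: discard rows where `column` has no value."""
    def node(table: Table) -> Table:
        return [row for row in table if row.get(column) is not None]
    return node


def add_ratio(numerator: str, denominator: str, target: str) -> Node:
    """Transformation node: add a derived column numerator / denominator."""
    def node(table: Table) -> Table:
        return [{**row, target: row[numerator] / row[denominator]} for row in table]
    return node


def keep_where(column: str, minimum: float) -> Node:
    """Filter node: keep rows whose `column` is at least `minimum`."""
    def node(table: Table) -> Table:
        return [row for row in table if row[column] >= minimum]
    return node


def run_pipeline(table: Table, nodes: List[Node]) -> Table:
    """Feed the output of each node into the next one, in order."""
    return reduce(lambda current, node: node(current), nodes, table)


# Illustrative input: one row has a missing "sales" value.
rows: Table = [
    {"sales": 120.0, "visits": 40.0},
    {"sales": None, "visits": 10.0},
    {"sales": 30.0, "visits": 60.0},
]

result = run_pipeline(rows, [
    drop_missing("sales"),
    add_ratio("sales", "visits", "conversion"),
    keep_where("conversion", 1.0),
])
print(result)
```

Because every node has the same table-in, table-out shape, nodes can be reordered, removed or replaced without touching the others, which is the property a graphical node editor relies on.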
OpenNN (Open Neural Networks Library) is a software library written in the C++ programming language which implements neural networks. The library is open source, hosted at SourceForge and licensed under the GNU Lesser General Public License. OpenNN was formerly known as Flood.
RapidMiner is a software platform that provides an integrated environment for machine learning, data mining, text mining, predictive analytics and business analytics. It is used for business and industrial applications as well as for research, education, training, rapid prototyping and application development. It supports all steps of the data mining process, including results visualisation, validation and optimisation.
Orange is a component-based data mining and machine learning software suite, featuring a visual programming front-end for explorative data analysis and visualisation, and Python bindings and libraries for scripting. It includes a set of components for data preprocessing, feature scoring and filtering, modeling, model evaluation, and exploration techniques. It is implemented in C++ and Python.
The Weka workbench contains a collection of visualisation tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to this functionality.
JasperReports is an open source Java reporting tool that can write to a variety of targets, such as the screen, a printer, or PDF, HTML, Microsoft Excel, RTF, ODT, comma-separated values (CSV) or XML files. It can be used in Java-enabled applications, including Java EE or web applications, to generate dynamic content. It reads its instructions from an XML or .jasper file.