

The fda.clust package provides specialized methods for clustering functional data, inspired by the functional data analysis (FDA) framework. It offers tools for clustering, validating, and visualizing functional data, and includes real and simulated datasets for evaluating the performance of clustering methods.

Features

  • Clustering Methods:
    • fkmeans: Functional k-means clustering.
    • fdbscan: Functional DBSCAN clustering.
    • fmeanshift: Functional mean-shift clustering.
    • fhclust: Functional hierarchical clustering.
  • Validation Metrics:
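    • Silhouette: Measure of cohesion and separation (ranges from -1 to 1; higher is better).
    • Dunn: Ratio of the smallest inter-cluster distance to the largest intra-cluster distance (higher is better).
    • Davies-Bouldin: Average, over all clusters, of each cluster's similarity to its most similar cluster (lower is better).
    • Calinski-Harabasz: Ratio of between-cluster dispersion to within-cluster dispersion (higher is better).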
  • Data Simulation:
    • rprocKclust: Simulate functional data for a predefined number of clusters.
    • rprocKmu: Simulate mean functions for multiple clusters.
  • Datasets:
    • ECG200: 200 heartbeats classified as normal or myocardial infarction.
    • ECG5000: 5000 heartbeats classified into four groups.
    • growth_ldata: Longitudinal growth data from the Berkeley Growth Study.

Installation

To install the development version from GitHub, run the following commands in R:

# Install the development version from GitHub
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("moviedo5/fda.clust")

Usage

Below is an example demonstrating how to use the fkmeans function for clustering functional data.

library(fda.clust)

# Load the example dataset ECG200
data(ECG200)

# Perform k-means clustering on functional data with 2 clusters
set.seed(123)
result <- fkmeans(ECG200$x, ncl = 2)

# Plot the functional data with cluster assignments
plot(ECG200$x, col = result$cluster, main = "ECG200 Clustered with fkmeans")

Datasets

The package includes three functional datasets for testing clustering methods:

  1. ECG200: Electrical signals from heartbeats (2 classes: normal and myocardial infarction).
  2. ECG5000: A larger dataset of 5000 heartbeats (4 classes).
  3. growth_ldata: Longitudinal growth data from the Berkeley Growth Study, including the heights of boys and girls at 31 ages.

Available Functions

Clustering Methods

  • fkmeans: Perform functional k-means clustering.
  • fdbscan: Perform functional DBSCAN clustering.
  • fmeanshift: Perform functional mean-shift clustering.
  • fhclust: Perform functional hierarchical clustering.
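All of these methods work from distances between curves. As a rough illustration of the hierarchical case only, the sketch below uses base R (`dist`, `hclust`, `cutree`) on curves sampled on a common grid. It is not the `fhclust` implementation: the simulated sine and cosine curves, the grid, the Euclidean approximation of the L2 distance, and the Ward linkage are all assumptions made for the example.

```r
# Sketch only: base-R hierarchical clustering of discretized curves,
# NOT the fhclust implementation.
set.seed(42)
tt <- seq(0, 1, length.out = 101)            # common evaluation grid

# Two hypothetical groups of noisy curves: sine-shaped and cosine-shaped
X <- rbind(
  t(replicate(8, sin(2 * pi * tt) + rnorm(length(tt), sd = 0.1))),
  t(replicate(8, cos(2 * pi * tt) + rnorm(length(tt), sd = 0.1)))
)

# On a uniform grid, Euclidean distance times sqrt(grid step)
# approximates the L2 distance between curves (rectangle rule)
h <- tt[2] - tt[1]
d <- dist(X) * sqrt(h)

hc <- hclust(d, method = "ward.D2")          # agglomerative clustering
groups <- cutree(hc, k = 2)                  # cut the tree into 2 clusters
table(groups, rep(c("sin", "cos"), each = 8))
```

Multiplying every distance by the same constant changes only the merge heights of the dendrogram, not the order in which clusters merge, so the grid scaling matters for interpretation rather than for the resulting partition.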

Cluster Validation and Measures

  • fclust.measures: Evaluate the quality of clusters using internal indices such as silhouette, Dunn, Davies-Bouldin, and Calinski-Harabasz indices.
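Each of these indices is a short formula on a distance matrix or on cluster means. The sketch below computes three of them in base R for curves stored as rows of a matrix on a common grid. `dunn_index`, `calinski_harabasz`, and `davies_bouldin` are hypothetical helpers written for illustration, not the `fclust.measures` interface, and plain Euclidean distance on the grid stands in for a functional metric.

```r
# Hypothetical helpers, NOT the fclust.measures interface.
# X: curves as rows of a matrix on a common grid; cl: cluster labels.

# Dunn: smallest between-cluster distance / largest cluster diameter
dunn_index <- function(d, cl) {
  d <- as.matrix(d)
  diam <- sapply(unique(cl), function(g) {
    idx <- which(cl == g)
    if (length(idx) < 2) 0 else max(d[idx, idx])
  })
  min(d[outer(cl, cl, "!=")]) / max(diam)
}

# Calinski-Harabasz: between / within dispersion, each divided by its degrees of freedom
calinski_harabasz <- function(X, cl) {
  X <- as.matrix(X)
  n <- nrow(X)
  k <- length(unique(cl))
  overall <- colMeans(X)
  B <- 0
  W <- 0
  for (g in unique(cl)) {
    Xg <- X[cl == g, , drop = FALSE]
    mg <- colMeans(Xg)
    B <- B + nrow(Xg) * sum((mg - overall)^2)   # between-cluster part
    W <- W + sum(sweep(Xg, 2, mg)^2)            # within-cluster part
  }
  (B / (k - 1)) / (W / (n - k))
}

# Davies-Bouldin: mean over clusters of the largest similarity ratio
davies_bouldin <- function(X, cl) {
  X <- as.matrix(X)
  groups <- unique(cl)
  k <- length(groups)
  cent <- do.call(rbind, lapply(groups, function(g)
    colMeans(X[cl == g, , drop = FALSE])))
  # average distance of each cluster's members to its centroid
  s <- sapply(seq_len(k), function(i) {
    Xi <- X[cl == groups[i], , drop = FALSE]
    mean(sqrt(rowSums(sweep(Xi, 2, cent[i, ])^2)))
  })
  R <- sapply(seq_len(k), function(i) {
    max(sapply(setdiff(seq_len(k), i), function(j)
      (s[i] + s[j]) / sqrt(sum((cent[i, ] - cent[j, ])^2))))
  })
  mean(R)
}

# Toy check with one-point "curves": two tight, well-separated groups
X <- matrix(c(0, 1, 10, 11), ncol = 1)
cl <- c(1, 1, 2, 2)
dunn_index(dist(X), cl)      # 9 / 1 = 9
calinski_harabasz(X, cl)     # (100 / 1) / (1 / 2) = 200
davies_bouldin(X, cl)        # (0.5 + 0.5) / 10 = 0.1
```

Because every index here is a ratio of distances or of squared distances, rescaling the grid distance by a constant (for example, to approximate an L2 distance) leaves the values unchanged.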

Data Simulation

  • rprocKclust: Generate functional data with known clusters for testing clustering methods.
  • rprocKmu: Generate mean functions for multiple clusters.
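Simulated data with known labels make it possible to check whether a clustering method recovers the true grouping. The sketch below mimics that idea in base R under deliberately simple assumptions: each curve is its cluster's mean function plus white Gaussian noise. `simulate_curves` and the three mean functions are hypothetical; the package's own generators may use smoother stochastic processes and different arguments.

```r
# Hypothetical helper, NOT the rprocKclust/rprocKmu interface:
# each curve = its cluster's mean function on the grid + white noise.
simulate_curves <- function(n_per_cluster, tt, mean_funs, sd = 0.2) {
  k <- length(mean_funs)
  cluster <- rep(seq_len(k), each = n_per_cluster)   # known labels
  X <- t(sapply(cluster, function(g) {
    mean_funs[[g]](tt) + rnorm(length(tt), sd = sd)
  }))
  list(X = X, tt = tt, cluster = cluster)
}

set.seed(123)
tt <- seq(0, 1, length.out = 101)
mean_funs <- list(
  function(s) sin(2 * pi * s),
  function(s) cos(2 * pi * s),
  function(s) 2 * s - 1
)
sim <- simulate_curves(20, tt, mean_funs)
dim(sim$X)          # 60 curves x 101 grid points
table(sim$cluster)  # 20 curves per cluster
```

The returned `cluster` vector is the ground truth against which a fitted partition can be compared, for example with a cross-tabulation.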

Utility Functions

  • kmeans.assig.groups: Assign functional data to clusters based on distances.
  • kmeans.centers.update: Update cluster centers during k-means clustering.
  • kmeans.fd.dist: Calculate distances for functional k-means clustering.
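These three utilities appear to match the distance, assignment, and center-update steps of Lloyd-style k-means. The sketch below strings those steps together in base R for curves on a common grid, approximating the L2 distance with the trapezoidal rule. All names (`l2_dist`, `init_centers`, `functional_kmeans`) are hypothetical rather than the package's `kmeans.*` functions, and the deterministic farthest-first initialization is an assumption made for reproducibility; `fkmeans` may initialize differently.

```r
# Hypothetical sketch of functional k-means on curves sampled on a common grid.
# These helpers are NOT the fda.clust kmeans.* functions.

# L2 distances between each row of X and each row of C (trapezoidal rule)
l2_dist <- function(X, C, tt) {
  w <- (c(diff(tt), 0) + c(0, diff(tt))) / 2    # trapezoid weights on the grid
  D <- matrix(0, nrow(X), nrow(C))
  for (j in seq_len(nrow(C))) {
    sq <- sweep(X, 2, C[j, ])^2                  # squared pointwise differences
    D[, j] <- sqrt(colSums(t(sq) * w))           # weighted sum over the grid
  }
  D
}

# Farthest-first initialization: start from curve 1, then repeatedly add
# the curve farthest from the centers chosen so far
init_centers <- function(X, tt, k) {
  idx <- 1L
  while (length(idx) < k) {
    D <- l2_dist(X, X[idx, , drop = FALSE], tt)
    idx <- c(idx, which.max(apply(D, 1, min)))
  }
  X[idx, , drop = FALSE]
}

functional_kmeans <- function(X, tt, k, max_iter = 100) {
  centers <- init_centers(X, tt, k)
  cluster <- integer(nrow(X))
  for (iter in seq_len(max_iter)) {
    D <- l2_dist(X, centers, tt)                         # distance step
    new_cluster <- max.col(-D, ties.method = "first")    # assign to nearest center
    if (all(new_cluster == cluster)) break               # no change: converged
    cluster <- new_cluster
    for (j in seq_len(k)) {                              # update step: mean curves
      members <- cluster == j
      if (any(members)) centers[j, ] <- colMeans(X[members, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = centers, iterations = iter)
}

# Usage: two vertically shifted groups of noisy sine curves
set.seed(1)
tt <- seq(0, 1, length.out = 101)
X <- rbind(
  t(replicate(10, sin(2 * pi * tt) + rnorm(length(tt), sd = 0.1))),
  t(replicate(10, sin(2 * pi * tt) + 3 + rnorm(length(tt), sd = 0.1)))
)
fit <- functional_kmeans(X, tt, k = 2)
table(fit$cluster)
```

Farthest-first starts are easy to reason about but sensitive to outlying curves; random starts repeated several times, or k-means++ style seeding, are common alternatives.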

Documentation

The vignettes give a more comprehensive overview of the fda.clust package. You can open them directly in R:

vignette("Introduction", package = "fda.clust")
vignette("Simulations", package = "fda.clust")

Details on individual functions and their usage are available in the reference manual and on the pkgdown documentation site.

Issues & Feature Requests

To report bugs, request features, or raise other issues, please use the GitHub Issues page. Contributions and feedback are always welcome.

