Clustering Methods for Functional Data Analysis • fda.clust

The fda.clust package provides specialized methods for clustering functional data, inspired by the functional data analysis (FDA) framework. It offers tools for clustering, validation, and visualization, and it includes both real datasets and simulation functions for evaluating the performance of clustering methods.

Features

Clustering Methods:
- fkmeans: Functional k-means clustering.
- fdbscan: Functional DBSCAN clustering.
- fmeanshift: Functional mean-shift clustering.
- fhclust: Functional hierarchical clustering.
Validation Metrics:
- Silhouette: Measure of cohesion within clusters and separation between clusters (higher is better).
- Dunn: Ratio of the smallest inter-cluster distance to the largest intra-cluster distance (higher is better).
- Davies-Bouldin: Average similarity between each cluster and its most similar cluster (lower is better).
- Calinski-Harabasz: Ratio of between-cluster dispersion to within-cluster dispersion (higher is better).
Data Simulation:
- rprocKclust: Simulate functional data for a predefined number of clusters.
- rprocKmu: Simulate mean functions for multiple clusters.
Datasets:
- ECG200: 200 heartbeats classified as normal or myocardial infarction.
- ECG5000: 5000 heartbeats classified into four groups.
- growth_ldata: Longitudinal growth data from the Berkeley Growth Study.
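
To make the validation metrics listed above more concrete, the following base R sketch computes the Dunn index and the mean silhouette width from a matrix of approximate L2 distances between curves. It is not the package's implementation: the toy data and the helper names dunn_index and mean_silhouette are hypothetical and used only for illustration.

```r
# Toy functional data: 2 groups of 10 noisy curves on an equispaced grid
# (illustrative only, not one of the package datasets)
set.seed(1)
tt <- seq(0, 1, length.out = 50)
g1 <- t(replicate(10, sin(2 * pi * tt) + rnorm(50, sd = 0.1)))
g2 <- t(replicate(10, cos(2 * pi * tt) + rnorm(50, sd = 0.1)))
X  <- rbind(g1, g2)            # 20 x 50 matrix, one curve per row
cl <- rep(1:2, each = 10)      # cluster labels to be validated

# Approximate L2 distances: Euclidean distance scaled by sqrt(grid step)
D <- as.matrix(dist(X)) * sqrt(tt[2] - tt[1])

# Dunn index (hypothetical helper): smallest between-cluster distance
# divided by the largest within-cluster distance
dunn_index <- function(D, cl) {
  same <- outer(cl, cl, "==")
  min(D[!same]) / max(D[same])
}

# Mean silhouette width (hypothetical helper): for each curve,
# a = mean distance to the other members of its own cluster,
# b = smallest mean distance to the members of any other cluster,
# s = (b - a) / max(a, b)
mean_silhouette <- function(D, cl) {
  cl <- as.integer(factor(cl))
  s <- sapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    own[i] <- FALSE
    if (!any(own)) return(0)   # convention for singleton clusters
    other <- cl != cl[i]
    a <- mean(D[i, own])
    b <- min(tapply(D[i, other], cl[other], mean))
    (b - a) / max(a, b)
  })
  mean(s)
}

dunn_index(D, cl)
mean_silhouette(D, cl)
```

Both indices depend only on the distance matrix and the labels, so the same helpers can be applied to the output of any clustering method once a distance between curves has been chosen.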

Installation

To install the development version from GitHub, run the following command in R:

# Install the development version from GitHub
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("moviedo5/fda.clust")

Usage

Below is an example demonstrating how to use the fkmeans function for clustering functional data.

library(fda.clust)

# Load the example dataset ECG200
data(ECG200)

# Perform k-means clustering on functional data with 2 clusters
set.seed(123)
result <- fkmeans(ECG200$x, ncl = 2)

# Plot the functional data with cluster assignments
plot(ECG200$x, col = result$cluster, main = "ECG200 Clustered with fkmeans")
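
For intuition about what a functional k-means partition looks like, here is a minimal base R sketch that does not use fda.clust. It relies on the fact that, for curves observed on a common equispaced grid, the squared Euclidean distance between rows is proportional to the squared L2 distance between curves, so stats::kmeans yields an L2-type partition. The toy data are made up, and this is not necessarily how fkmeans is implemented.

```r
# Toy curves on an equispaced grid: two groups that differ by a vertical shift
# (illustrative data, not ECG200)
set.seed(123)
tt <- seq(0, 1, length.out = 100)
X <- rbind(
  t(replicate(15, sin(2 * pi * tt) + rnorm(100, sd = 0.2))),
  t(replicate(15, sin(2 * pi * tt) + 1 + rnorm(100, sd = 0.2)))
)
truth <- rep(1:2, each = 15)   # known group of each curve

# k-means on the discretized curves; several random starts reduce the
# risk of stopping in a poor local optimum
km <- kmeans(X, centers = 2, nstart = 10)

# Each row of km$centers is a cluster mean curve evaluated on tt
dim(km$centers)

# Cross-tabulate the known groups against the estimated clusters
table(truth, km$cluster)
```

Cluster labels are arbitrary, so agreement with the known groups should be read from the cross-tabulation rather than by comparing label values directly.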

Datasets

The package includes three functional datasets to test clustering methods:

- ECG200: Electrical signals from heartbeats (2 classes: normal and myocardial infarction).
- ECG5000: A larger dataset of 5000 heartbeats (4 classes).
- growth_ldata: Longitudinal growth data from the Berkeley Growth Study, including the heights of boys and girls at 31 ages.
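
Curves measured at unevenly spaced points, as the growth curves are, need a distance that accounts for the spacing. The base R sketch below computes an L2 distance with the trapezoidal rule. The age grid is the one used for the Berkeley growth data in the fda package and is only assumed to match growth_ldata. The helper l2_dist and the two height curves are hypothetical.

```r
# Age grid of the Berkeley growth data as distributed in the 'fda' package
# (assumed, not verified, to match growth_ldata): quarterly up to age 2,
# yearly up to age 8, then half-yearly up to age 18
age <- c(seq(1, 2, by = 0.25), seq(3, 8, by = 1), seq(8.5, 18, by = 0.5))
length(age)  # 31 measurement ages

# L2 distance between two curves observed on a common, possibly irregular
# grid: trapezoidal rule applied to the squared difference (hypothetical helper)
l2_dist <- function(x, y, grid) {
  d2 <- (x - y)^2
  sqrt(sum(diff(grid) * (head(d2, -1) + tail(d2, -1)) / 2))
}

# Two made-up height curves in cm, differing by a constant 2 cm
h1 <- 75 + 6 * (age - 1)
h2 <- h1 + 2

l2_dist(h1, h2, age)  # equals 2 * sqrt(18 - 1), since the offset is constant
```

A plain Euclidean distance on the raw vectors would give extra weight to the age ranges that are sampled more densely, which is why the grid spacing enters the sum here.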

Available Functions