Simulating Functional Data with fda.clust

Introduction

The fda.clust package provides powerful tools for clustering functional data. To test and benchmark the performance of these clustering methods, it is essential to generate synthetic functional data with known properties. This vignette introduces the main functions for simulating functional data using fda.clust, namely rprocKclust() and rprocKmu().

These functions allow users to generate functional data with a predefined number of clusters, controlled smoothness, and variability.

Simulating Functional Data

The fda.clust package offers the following key functions for data generation:

rprocKclust(): Simulates functional data with a known number of clusters.
rprocKmu(): Simulates mean functions for multiple clusters.

Each function provides a set of parameters that allow control over the type, smoothness, and structure of the generated data.

Generating Data with `rprocKclust()`

The rprocKclust() function generates functional data for K clusters. Users can control the number of observations per cluster, the type of generating process, and the variability of the curves.

Usage

rprocKclust(t, n, process, c, s)

Arguments:

t: A numeric vector defining the time points where the functional data is observed.
n: A vector specifying the number of curves in each cluster.
process: A vector defining the process type for each cluster (e.g., “sin”, “cos”, etc.).
c: A numeric vector indicating shifts or translations for each cluster.
s: A numeric vector controlling the smoothness or variability of each cluster.
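
The per-cluster arguments can be read as parallel vectors with one entry per cluster. The sketch below is not part of fda.clust: check_cluster_args() is a hypothetical helper, and both the equal-length requirement and the ordering of the labels are assumptions made for illustration.

```r
# Hypothetical helper (NOT an fda.clust function): checks that the
# per-cluster arguments line up and derives the implied group labels.
check_cluster_args <- function(n, process, c, s) {
  K <- length(n)  # one entry of n per cluster, so K = length(n)
  stopifnot(length(process) == K, length(c) == K, length(s) == K)
  # Assumed label layout: cluster 1 repeated n[1] times, then cluster 2, ...
  list(K = K, groups = rep(seq_len(K), times = n))
}

spec <- check_cluster_args(n = c(30, 50, 40),
                           process = c("cos_sin", "sin", "cos"),
                           c = c(-1, 1, 1),
                           s = c(0.2, 0.3, 0.1))
spec$K              # number of clusters implied by n
table(spec$groups)  # curves per cluster
```

Under these assumptions, adding a fourth cluster means appending one entry to each of n, process, c, and s.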

Example

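library(fda.clust)

set.seed(123)                              # make the simulation reproducible
t <- seq(0, 2 * pi, length.out = 101)      # evaluation grid on [0, 2*pi]

# Three clusters with 30, 50, and 40 curves respectively
res <- rprocKclust(t, n = c(30, 50, 40),
                   process = c("cos_sin", "sin", "cos"),
                   c = c(-1, 1, 1),
                   s = c(0.2, 0.3, 0.1))
sim_data <- res$X        # simulated curves
group <- res$groups      # cluster label of each curve

# Color each curve by its cluster
plot(sim_data, col = rainbow(3)[group], main = "Simulated Data from rprocKclust")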

[Figure: Functional data]

The plot above illustrates the curves generated by rprocKclust(), with three distinct clusters, each having different shapes, shifts, and variability.
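
Because the cluster labels are known, a clustering result can be scored directly against them. The following is a minimal base-R sketch under stated assumptions: the toy curves (sine and cosine means plus Gaussian noise) merely stand in for simulated data, and kmeans() on the discretized curves is used as a generic clustering method, not as one of the fda.clust algorithms.

```r
set.seed(1)
tt <- seq(0, 2 * pi, length.out = 101)
truth <- rep(1:2, times = c(30, 40))   # known cluster labels

# Toy curves (one per row) scattered around two mean shapes.
means <- rbind(sin(tt), cos(tt))
X <- means[truth, ] +
  matrix(rnorm(length(truth) * length(tt), sd = 0.2), nrow = length(truth))

# Cluster the discretized curves and cross-tabulate against the truth.
fit <- kmeans(X, centers = 2, nstart = 10)
confusion <- table(truth = truth, estimated = fit$cluster)
confusion

# Purity: share of curves that fall in the majority true group
# of their estimated cluster (1 means perfect recovery up to relabeling).
purity <- sum(apply(confusion, 2, max)) / length(truth)
purity
```

Purity ignores how the estimated clusters are numbered, which matters because a clustering method has no reason to reproduce the label order used in the simulation.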

Generating Mean Functions with `rprocKmu()`

The rprocKmu() function generates mean functions for K clusters. This is useful for controlling the general shape or structure of the curves within each cluster.

Usage

rprocKmu(t, process, c)

Arguments:

t: A numeric vector defining the time points where the mean functions are evaluated.
process: A vector defining the process type for each cluster (e.g., “sin”, “cos”, etc.).
c: A numeric vector indicating shifts or translations for each cluster.

Example

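set.seed(123)
t <- seq(0, 2 * pi, length.out = 101)      # evaluation grid on [0, 2*pi]

# One mean function per cluster, using the same process types and shifts as above
mean_functions <- rprocKmu(t, process = c("cos_sin", "sin", "cos"), c = c(-1, 1, 1))

plot(mean_functions, main = "Mean Functions from rprocKmu")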

[Figure: Functional centroids]

The plot above shows the mean functions for three clusters. These mean functions are later used in the rprocKclust() function to generate actual functional data.
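
One way to picture the step from mean functions to curves is sketched below in base R. Everything in it is an assumption for illustration, not the package's implementation: the shapes list (including the placeholder form of "cos_sin"), the reading of c as a vertical offset, and the reading of s as a Gaussian noise standard deviation.

```r
set.seed(42)
tt <- seq(0, 2 * pi, length.out = 101)

# Placeholder shapes keyed by name; NOT the definitions used by fda.clust.
shapes <- list(sin = sin, cos = cos,
               cos_sin = function(u) cos(u) + sin(u))

process <- c("cos_sin", "sin", "cos")
shift   <- c(-1, 1, 1)        # assumed to act as a vertical offset
n       <- c(30, 50, 40)
s       <- c(0.2, 0.3, 0.1)   # assumed to act as a noise standard deviation

# K x length(tt) matrix of mean functions, one row per cluster.
mu <- do.call(rbind, lapply(seq_along(process), function(k) {
  shapes[[process[k]]](tt) + shift[k]
}))

# Each curve is its cluster mean plus independent Gaussian noise whose
# standard deviation depends on the cluster (matrix is filled column-wise).
groups <- rep(seq_along(n), times = n)
noise <- matrix(rnorm(length(groups) * length(tt),
                      sd = rep(s[groups], times = length(tt))),
                nrow = length(groups))
curves <- mu[groups, ] + noise

matplot(tt, t(curves), type = "l", lty = 1, col = rainbow(3)[groups],
        xlab = "t", ylab = "X(t)")
```

Independent pointwise noise yields rough curves; a generator that controls smoothness would need correlated errors instead, which this sketch does not attempt.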

Customizing Simulations

The rprocKclust() and rprocKmu() functions allow users to customize the following aspects:

Number of Clusters: The number of clusters is set by the length of the n argument in rprocKclust(), since n holds one entry (the number of curves) per cluster.
Types of Functions: The shape of the curves (e.g., sinusoidal, cosine, etc.) is defined by the process argument.
Smoothness: The parameter s controls the smoothness or variability of the generated curves.

Manuel Oviedo (UDC), Manuel Febrero (USC)

2024-12-16
