We iteratively optimize the objective function \[ \lambda\underbrace{\sum_{i=1}^n{\sum_{k=1}^{d-1}}\left(\frac{1}{\sqrt{d}}\|{\mathbf{P}_k}\mathbf{X}_i\|-1\right)^2}_{\substack{\text{Proj's distance to}\\ \sqrt{d}\times\mathbb{S}^{2 d-1}\supset(\mathbb{S}^1)^d}}+(1-\lambda)\underbrace{\sum_{i=1}^n\|(\mathbf{I}-\mathbf{P})\mathbf{X}_i\|^2}_{\substack{\text{Complement to}\\\text{proj's variation}}} \] We set \(\lambda=0.5\) in the next plots. We use the raw scores (no rescaling, see Issue 1). The legend for the next plots:
distanceScaled = FALSE (the default). See Issue 1.
## Reduction to dimension d = 2. Time: 0.1 seconds.
## Reduction to dimension d = 1. Time: 0.309 seconds.
## Reduction to dimension d = 2. Time: 0.142 seconds.
## Reduction to dimension d = 1. Time: 0.347 seconds.
## Reduction to dimension d = 2. Time: 0.262 seconds.
## Reduction to dimension d = 1. Time: 0.461 seconds.
## Reduction to dimension d = 2. Time: 0.085 seconds.
## Reduction to dimension d = 1. Time: 0.349 seconds.
## Reduction to dimension d = 2. Time: 0.054 seconds.
## Reduction to dimension d = 1. Time: 0.235 seconds.
## Reduction to dimension d = 2. Time: 0.056 seconds.
## Reduction to dimension d = 1. Time: 0.237 seconds.
## Reduction to dimension d = 2. Time: 0.058 seconds.
## Reduction to dimension d = 1. Time: 0.232 seconds.
## Reduction to dimension d = 2. Time: 0.064 seconds.
## Reduction to dimension d = 1. Time: 0.342 seconds.
## Reduction to dimension d = 2. Time: 0.148 seconds.
## Reduction to dimension d = 1. Time: 0.337 seconds.
## Reduction to dimension d = 2. Time: 0.164 seconds.
## Reduction to dimension d = 1. Time: 0.431 seconds.
## Reduction to dimension d = 2. Time: 0.192 seconds.
## Reduction to dimension d = 1. Time: 0.495 seconds.
Legend for the next plots:
## Reduction to dimension d = 2. Time: 0.259 seconds.
## Reduction to dimension d = 1. Time: 0.533 seconds.
## Reduction to dimension d = 2. Time: 0.252 seconds.
## Reduction to dimension d = 1. Time: 0.62 seconds.
## Reduction to dimension d = 2. Time: 1.029 seconds.
## Reduction to dimension d = 1. Time: 1.893 seconds.
## Reduction to dimension d = 2. Time: 0.925 seconds.
## Reduction to dimension d = 1. Time: 1.741 seconds.
## Reduction to dimension d = 2. Time: 0.3 seconds.
## Reduction to dimension d = 1. Time: 0.701 seconds.
## Reduction to dimension d = 2. Time: 0.291 seconds.
## Reduction to dimension d = 1. Time: 0.727 seconds.
Therefore, \((\text{Score}_1,\text{Score}_2)\not\in[-\pi,\pi)^2\). With distanceScaled = TRUE, these scores are multiplied by \(\frac{\pi}{l/2}\) and \(\frac{\pi}{\sqrt{2} \pi / 2}\), respectively, in order to force them to be in \([-\pi,\pi)\).
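A minimal sketch of this rescaling, assuming that the raw first score lies in \([-l/2,l/2)\) (with \(l\) the length of the fitted curve) and the raw second score in \([-\sqrt{2}\pi/2,\sqrt{2}\pi/2)\), as the multipliers above suggest. The function `rescale_scores` and its arguments are hypothetical and are not part of the code used in this document.

```r
# Hypothetical helper: rescale raw scores so that each lies in [-pi, pi)
# scores: n x 2 matrix with columns (Score_1, Score_2)
# l: length of the fitted one-dimensional curve (assumed to be known)
rescale_scores <- function(scores, l) {
  scores <- as.matrix(scores)
  cbind(
    scores[, 1] * pi / (l / 2),
    scores[, 2] * pi / (sqrt(2) * pi / 2)
  )
}

# Usage with an illustrative curve length (not a value taken from the text)
l <- 3 * pi
raw <- rbind(
  c(-l / 2, -sqrt(2) * pi / 2),  # lower endpoints of the raw ranges
  c(0, 0),
  c(l / 4, sqrt(2) * pi / 4)     # halfway to the upper endpoints
)
rescale_scores(raw, l = l)
```

The lower endpoints of the raw ranges are mapped to \(-\pi\), and the zero score stays at zero, so the rescaling only stretches or shrinks each axis.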
Recall that for a random variable \(X\) with \(\mathrm{supp}(X)=(a,b)\), \(\mathbb{V}\mathrm{ar}[X]\leq\frac{(b-a)^2}{4}\). This means that:
For general \(d\), \[
\text{Score}_d=\mathrm{Distance\_from\_point\_to\_surface\_projection}\in[-\frac{\sqrt{d}\pi}{2},\frac{\sqrt{d}\pi}{2})
\] and therefore \(\mathbb{V}\mathrm{ar}[\text{Score}_d]\leq\frac{1}{4}d\pi^2=\frac{d}{4}\pi^2=:C_d\). This means that \[
C_2<C_3<\ldots<C_{d-1}<C_{d}
\] and that \[
C_{1}=C_{16}>C_{15}>\ldots>C_2,\quad C_1<C_{17}<C_{18}<\ldots<C_d.
\] Setting distanceScaled = FALSE (raw scores) or distanceScaled = TRUE (applies rescaling) has consequences, as illustrated below.
## Reduction to dimension d = 2. Time: 0.119 seconds.
## Reduction to dimension d = 1. Time: 0.353 seconds.
## Reduction to dimension d = 2. Time: 0.152 seconds.
## Reduction to dimension d = 1. Time: 0.331 seconds.
## Reduction to dimension d = 2. Time: 0.061 seconds.
## Reduction to dimension d = 1. Time: 0.229 seconds.
## Reduction to dimension d = 2. Time: 0.056 seconds.
## Reduction to dimension d = 1. Time: 0.281 seconds.
## Reduction to dimension d = 2. Time: 0.061 seconds.
## Reduction to dimension d = 1. Time: 0.321 seconds.
## Reduction to dimension d = 2. Time: 0.056 seconds.
## Reduction to dimension d = 1. Time: 0.236 seconds.
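The variance bounds \(C_d\) discussed above can be checked numerically. The sketch below assumes, as the rescaling factor \(\frac{\pi}{l/2}\) suggests, that \(\text{Score}_1\) ranges over an interval of length \(l\), so that \(C_1=l^2/4\), and it takes the illustrative value \(l=4\pi\) (the length of the squarish curves), which is the case in which \(C_1=C_{16}\). The function names are hypothetical.

```r
# Bound used in the text: Var[X] <= (b - a)^2 / 4 when supp(X) = (a, b)
var_bound <- function(a, b) (b - a)^2 / 4

# C_d for d >= 2: Score_d ranges over [-sqrt(d) * pi / 2, sqrt(d) * pi / 2)
C <- function(d) var_bound(-sqrt(d) * pi / 2, sqrt(d) * pi / 2)

# C_1 under the assumption that the fitted curve has length l = 4 * pi
l <- 4 * pi
C1 <- var_bound(-l / 2, l / 2)

# Compare the bounds for the raw scores across dimensions
d <- 2:20
data.frame(d = d, C_d = C(d), exceeds_C1 = C(d) > C1)
```

Under this assumption on \(l\), the bound for the first score is exceeded only from \(d=17\) onwards, in line with the ordering stated above.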
Some choices of the vectors \(\mathbf{u}\) and \(\mathbf{v}\) yield more squarish curves than others. Squarish fits tend to produce degenerate projections onto the corners. This is likely more problematic in higher dimensions: once the projections collapse, they remain degenerate for all lower-dimensional fits, producing a sequence of degenerate scores.
I do not know which parametrization of \(\mathbf{u}\) and \(\mathbf{v}\) yields squarish curves (squarish curves can be characterized by their length, which is close to \(4\pi\)). It seems to be related to the vectors \(\mathbf{u}\) and \(\mathbf{v}\) having entries close to \(0\) (respectively, close to \(1\)), as the following empirical evidence suggests (regressions of the curve lengths on the entries of the vectors):
# Sample random curves and evaluate their lengths
# (anglesToSphere(), Hu(), and distPC1Curve() are assumed to be loaded from
# the code accompanying this document)
library(ggplot2)
library(gridExtra)
M <- 1e4
l <- numeric(M)
x <- matrix(nrow = M, ncol = 3)
y <- matrix(nrow = M, ncol = 2)
u <- v <- matrix(nrow = M, ncol = 4)
for (i in 1:M) {
  x[i, ] <- c(runif(2, min = 0, max = pi), runif(1, min = -pi, max = pi))
  y[i, ] <- c(runif(1, min = 0, max = pi), runif(1, min = -pi, max = pi))
  u[i, ] <- anglesToSphere(theta = x[i, ])
  v[i, ] <- c(anglesToSphere(theta = y[i, ]) %*% Hu(u = u[i, ])[-1, ])
  l[i] <- distPC1Curve(alpha = c(0, 2 * pi - 1e-4), u = u[i, ], v = v[i, ],
                       N = 1e3, shortest = FALSE, der = FALSE)
}
# Long format: one row per curve, entry index k, and vector (u or v)
df <- data.frame("l" = l, "u" = u, "v" = v)
df <- reshape(df, direction = "long",
              varying = c("u.1", "u.2", "u.3", "u.4", "v.1", "v.2", "v.3", "v.4"),
              times = 1:4, timevar = "k") # times index the four entries
df <- reshape2::melt(df, measure.vars = c("u", "v"))
df$id <- NULL
df$k <- as.factor(df$k)
names(df) <- c("len", "k", "uv", "vec")
# Smoothed regressions of the curve length on each entry, by vector
grid.arrange(
  ggplot(data = df[df$uv == "u", ], mapping = aes(x = vec, y = len, colour = k)) +
    geom_smooth(se = FALSE) +
    ggtitle("Regression of the curve length on the entries of u"),
  ggplot(data = df[df$uv == "v", ], mapping = aes(x = vec, y = len, colour = k)) +
    geom_smooth(se = FALSE) +
    ggtitle("Regression of the curve length on the entries of v")
)
If we could characterize when the curves, and likewise the surfaces, are squarish, we could add a small penalty to the objective function to avoid degenerate solutions.
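One possible shape for such a penalty, purely as an illustration: since squarish curves have length close to \(4\pi\), a hinge on the curve length could be used. The function `squarish_penalty`, the weight `mu`, and the tolerance `tol` are hypothetical. This is not a method taken from this document, and it does not address the surface case.

```r
# Hypothetical penalty on the curve length: zero for curves whose length is
# well below 4 * pi, growing quadratically as the length approaches 4 * pi
# len: curve length(s); mu: penalty weight; tol: relative tolerance
squarish_penalty <- function(len, mu = 0.01, tol = 0.1) {
  threshold <- (1 - tol) * 4 * pi
  mu * pmax(0, len - threshold)^2
}

# Usage: only the curve with length 4 * pi is penalized with these defaults
squarish_penalty(len = c(2 * pi, 3.5 * pi, 4 * pi))
```

A hinge keeps the objective unchanged for non-squarish fits, so the penalty would only act near the degenerate configurations. Whether the length alone is a sufficient characterization remains open, as noted above.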