PH.D. DISSERTATION PROPOSAL
Privacy-Preserving Machine Learning
for Skeleton-Based Data
Thomas Carr
Advised by Dr. Depeng Xu
Committee Members:
Dr. Aidong Lu • Dr. Xi "Sunshine" Niu • Dr. Minwoo "Jake" Lee • Dr. Jeremy
Holleman
December 5th, 2025
Proposal Outline
1. Introduction
Motivation & threat model.
2. Research Questions
The core problems this dissertation addresses.
3. Study 1
Explanation-Based Anonymization (PAKDD 25).
4. Study 2
Privacy-centric Motion Retargeting (ICCV 25).
5. Study 3
Disentangled Transformer Motion Retargeting.
6. Summary
Timeline & Broader Implications.
Background: Skeleton Data Utility
What is Skeleton Data?
Skeleton data: joint coordinates over time, no RGB pixels.
Why is it important?
- VR: Avatar control.
- Healthcare: Gait and rehab monitoring.
- Surveillance: Security and activity monitoring.
O. Seredin, A. Kopylov, S.-C. Huang, and D. Rodionov, “A skeleton features-based fall detection using Microsoft Kinect v2 with one-class classifier outlier removal,” ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLII-2/W12, pp. 189–195, May 2019.
The Privacy Threat
S. Moon, M. Kim, Z. Qin, Y. Liu, and D. Kim, “Anonymization for skeleton action recognition,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023.
The "Anonymous" Myth
| Assumption: | No video → no privacy risk |
| Reality: | Motion is a biometric |
- Static: limb lengths, bone ratios.
- Dynamic: gait frequency, posture.
Motion alone is enough to track users or infer sensitive health attributes.
Attack Vectors
We analyze three primary threats to skeleton privacy.
1. Re-ID ("Who?"): matching a skeleton to a database of known users. Accuracy: >80%*
2. Inference ("What?"): predicting sensitive traits. Accuracy: ~87% (gender)*
3. Linkage ("Same person?"): cross-session tracking. Accuracy: 74%†
* S. Moon, M. Kim, Z. Qin, Y. Liu, and D. Kim, “Anonymization for skeleton action recognition,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023.
† T. Carr, A. Lu, and D. Xu, “Linkage attack on skeleton-based motion visualization,” in Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 3758-3762, 2023.
Evidence: The Linkage Attack (CIKM 23)
A Siamese network, trained without identity labels, links the same user across recording sessions.
T. Carr, A. Lu, and D. Xu, “Linkage attack on skeleton-based motion visualization,” in Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 3758–3762, 2023.
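To make the linkage mechanism concrete, here is a minimal sketch (illustrative names, not the CIKM implementation): both sequences pass through one shared encoder, and embedding similarity decides the link.

```python
import torch
import torch.nn.functional as F

def link_sessions(embed, seq_a, seq_b, threshold=0.8):
    """Decide whether two skeleton sequences belong to the same user.

    embed: any network mapping a (T, J, 3) sequence to a fixed-size vector.
    The Siamese trick: both sequences pass through the *same* encoder,
    and cosine similarity of the embeddings drives the link decision.
    """
    with torch.no_grad():
        za = embed(seq_a.unsqueeze(0))  # (1, D)
        zb = embed(seq_b.unsqueeze(0))  # (1, D)
    sim = F.cosine_similarity(za, zb).item()
    return sim >= threshold, sim
```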
Research Gaps
Gap 1: Varying Joint Sensitivity
Traditional methods apply uniform noise, ignoring that joints have varying sensitivity. Global perturbation disproportionately degrades utility.
Gap 2: Attacker Agnostic Anonymization
Existing techniques rely on "white-box" assumptions. These fail against unknown, real-world attack vectors where the model is black-box.
Gap 3: Architectural Limitations
Modern ML uses Transformers for temporal dependencies, yet privacy methods still rely on frame-by-frame CNNs, missing long-term context.
Gap 4: Kinematic Inconsistency
Treating skeletons as unstructured vectors ignores biomechanics, leading to impossible motions like bone stretching and jitter.
Research Questions
RQ1: Explainable AI-guided anonymization
How can explainable AI techniques identify privacy-sensitive joints to enable targeted anonymization that overcomes the utility loss of uniform perturbation methods?
RQ2: Motion-retargeting as a defense
How can motion retargeting serve as an attacker-agnostic defense by effectively disentangling user identity from motion content without reliance on specific threat models?
RQ3: Transformer-based anonymization
How can transformer architectures be leveraged to capture long-horizon dependencies and enforce kinematic consistency, thereby strengthening the privacy-utility balance and ensuring physical plausibility?
Proposed Studies
Based on the research questions, we propose three studies.
Study 1
Explanation-Based Anonymization
Targeted masking using Integrated Gradients.
Published in PAKDD 25
Study 2
Privacy-centric Motion Retargeting (PMR)
Implicit disentanglement via CNNs.
Published in ICCV 25
Study 3
Disentangled Transformer (TMR)
Explicit disentanglement via Transformers.
Study 1
EXPLANATION-BASED ANONYMIZATION METHODS FOR MOTION PRIVACY
White-Box Defense using Integrated Gradients
Study 1: Introduction
The Goal
To significantly reduce the performance of the Threat Model (Privacy) while maintaining high performance on the Utility Model (Action Recognition).
The Approach
We train two competing deep learning models, a Utility Model and a Threat Model, and perturb the skeleton data so that the Threat Model's accuracy drops while the Utility Model's accuracy stays high.
Hypothesis: Targeted noise (based on sensitivity) beats uniform noise on the privacy–utility trade-off.
- Train Utility and Threat models.
- Use XAI to identify privacy-sensitive joints.
- Add noise to those joints.
Methodology: Training Two Models
We train two independent deep learning models to capture the opposing goals of utility preservation and privacy protection.
Utility Model ($f^U$)
Task: Action Recognition
- Labels: 60 Actions.
- Goal: Maximize Accuracy.
Threat Model ($f^P$)
Task: Re-Identification
- Labels: 40 User IDs.
- Goal: Minimize Accuracy.
Technical Foundation: Integrated Gradients
Integrated Gradients (IG):
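The standard IG attribution for input feature $i$, relative to a baseline $x'$, is:

$$\mathrm{IG}_i(x) = (x_i - x'_i)\int_0^1 \frac{\partial f\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\,d\alpha$$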
Key Properties:
- Attribution: Explains predictions by attributing importance to input features.
- Axioms: Satisfies Completeness and Sensitivity.
- Applicability: Works with any differentiable model (e.g., SGN).
- Efficiency: Suitable for high-dimensional data like skeletons.
Image adapted from Gilbert Tanner, "Interpreting PyTorch models with Captum," 2019.
Technical Foundation: Differential Privacy
Definition: A randomized function $\mathcal{M}$ satisfies $(\epsilon, \delta)$-differential privacy if for all neighboring datasets $D$ and $D'$:
$ P(\mathcal{M}(D) \in O) \leq \exp(\epsilon) \cdot P(\mathcal{M}(D') \in O) + \delta $
where $\epsilon$ denotes the privacy budget and $\delta$ is the failure probability, i.e., the probability that the $\epsilon$-guarantee is broken.
Our Application:
- Budget Distribution: Total privacy budget $\epsilon$ is distributed across joints.
- Targeted Noise: Sensitive joints receive a smaller budget (more noise).
- Gaussian Mechanism: $\mathcal{M}(X) = f(X) + (Z_1,\ldots,Z_k)$, where each $Z_i \sim \mathcal{N}(0, \sigma^2)$ with $\sigma$ calibrated to $(\epsilon, \delta)$.
C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,” Foundations and Trends in Theoretical Computer Science, vol. 9, no. 3-4, pp. 211–407, 2014.
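A minimal sketch of this per-joint budget distribution: the $\sigma$ calibration is the standard Gaussian-mechanism bound from Dwork & Roth (assuming unit sensitivity), while the choice of budget vector is left to the approaches that follow.

```python
import numpy as np

def gaussian_sigma(eps, delta, sensitivity=1.0):
    # Standard Gaussian-mechanism calibration (Dwork & Roth, 2014):
    # sigma >= sensitivity * sqrt(2 ln(1.25/delta)) / eps
    return sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps

def perturb(skeleton, joint_eps, delta=1e-5):
    """skeleton: (T, J, 3) array; joint_eps: per-joint budgets, shape (J,).

    Sensitive joints get a *smaller* epsilon, hence a *larger* sigma.
    """
    sigmas = np.array([gaussian_sigma(e, delta) for e in joint_eps])
    noise = np.random.normal(0.0, sigmas[None, :, None], size=skeleton.shape)
    return skeleton + noise
```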
Methodology: Sensitivity Score
We use Integrated Gradients (IG) to generate per-joint attributions ($\phi$) for both the Utility and Threat models.
IG integrates gradients along a path from a baseline to the input, quantifying how much each joint contributes to the model's prediction. The two attributions are combined (weighted by $\alpha$) into a per-joint sensitivity score $\psi_j$; a sketch follows below.
High $\psi_j$ = Target for masking/noise.
(Risky for privacy, not crucial for utility)
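A sketch of how such per-joint attributions could be computed with Captum, the library pictured earlier; the zero-pose baseline and the aggregation over time and coordinates are illustrative assumptions.

```python
import torch
from captum.attr import IntegratedGradients

def joint_attributions(model, x, target):
    """x: (1, T, J, 3) skeleton sequence; returns (J,) per-joint scores."""
    ig = IntegratedGradients(model)
    baseline = torch.zeros_like(x)          # assumed zero-pose baseline
    attr = ig.attribute(x, baselines=baseline, target=target)
    # Aggregate |attribution| over batch, time, and xyz to one score
    # per joint (assumed reduction; the paper may aggregate differently).
    return attr.abs().sum(dim=(0, 1, 3))
```

Running this once against $f^P$ and once against $f^U$ yields two attribution vectors; a hypothetical combination consistent with the $\alpha$ weighting in the experimental setup is $\psi_j = \alpha\,\phi_j^P - (1-\alpha)\,\phi_j^U$.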
Approach 1: Smart Masking
Sets the most sensitive joints to zero.
- Step 1: Sort joints by Sensitivity Score ($\psi_j$).
- Step 2: Select top $\beta\%$ as Sensitive Group ($G_s$).
- Step 3: Mask them ($s_j = 0$).
Definitions:
- $\psi_j$: Sensitivity Score.
- $\beta$: Percentage of joints to mask.
- $G_s$: Sensitive Group (masked).
- $G_n$: Non-Sensitive Group (kept).
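A minimal sketch of Smart Masking under these definitions:

```python
import numpy as np

def smart_mask(skeleton, psi, beta=0.2):
    """Zero out the top-beta fraction of joints by sensitivity.

    skeleton: (T, J, 3); psi: (J,) sensitivity scores.
    """
    J = psi.shape[0]
    k = max(1, int(round(beta * J)))
    sensitive = np.argsort(psi)[-k:]   # G_s: top-beta% most sensitive
    out = skeleton.copy()
    out[:, sensitive, :] = 0.0         # s_j = 0 for every j in G_s
    return out
```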
Approach 2: Group Noise
Strategy: Split joints into two groups.
- Sensitive ($G_s$): Top $\beta\%$ joints. Receives More Noise.
- Non-Sensitive ($G_n$): Remaining joints. Receives Less Noise.
Definitions:
- $\epsilon$: Total Privacy Budget.
- $\epsilon_s, \epsilon_n$: Budget for sensitive/non-sensitive groups.
- $\beta$: Proportion of sensitive joints ($|G_s|/J$).
- $\gamma$: Noise Ratio ($\epsilon_s / \epsilon_n$).
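Assuming the per-joint budgets average to the total budget (an illustrative composition rule, not necessarily the paper's), these definitions determine both group budgets:

$$\beta\,\epsilon_s + (1-\beta)\,\epsilon_n = \epsilon,\quad \gamma = \frac{\epsilon_s}{\epsilon_n} \;\Longrightarrow\; \epsilon_n = \frac{\epsilon}{\beta\gamma + (1-\beta)},\qquad \epsilon_s = \gamma\,\epsilon_n.$$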
Problem: Jittery Motion
High noise on specific joints causes intense fluctuations and bone stretching.
Approach 3: Smart Noise
Intuition: Continuous noise scaling based on sensitivity.
Definitions:
- $\psi_j$: Sensitivity Score.
- $\epsilon$: Privacy Budget.
- $\sigma_j$: Noise magnitude for joint $j$.
Compared to Group Noise: better visual quality, still strong privacy.
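One plausible realization of the continuous scaling (the exact scaling law in the paper may differ):

```python
import numpy as np

def smart_noise(skeleton, psi, sigma_base=0.01):
    """Continuously scale Gaussian noise by normalized sensitivity.

    sigma_j grows with psi_j, so perturbation concentrates on risky
    joints without the hard group boundary that caused jitter and
    bone stretching in Group Noise.
    """
    w = (psi - psi.min()) / (psi.max() - psi.min() + 1e-8)  # in [0, 1]
    sigmas = sigma_base * (1.0 + w)                         # assumed form
    noise = np.random.normal(0.0, sigmas[None, :, None], skeleton.shape)
    return skeleton + noise
```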
Experimental Setup
Datasets & Protocols
- NTU-60 & NTU-120: Large-scale skeleton datasets.
- Protocols: Cross-Subject (CS) and Cross-View (CV).
Model
- Backbone: Semantics-Guided Neural Network (SGN)*.
Baselines
- Consumer VR: Mask all but Head + Hands (Left+Right).
- Naive Noise: Uniform Gaussian noise.
Hyperparameters Tested
- $\alpha$ (Weighting): 0.1, 0.5, 0.9
- $\beta$ (Masking %): 0.1, 0.2, 0.3, 0.4, 0.5
- $\gamma$ (Noise Ratio): 0.01, 0.03, 0.05, 0.07, 0.1
- $\sigma$ (Smart Noise): 0.01, 0.05, 0.1
Metrics
- Privacy: Re-ID Accuracy (Lower is better).
- Utility: Action Rec. Accuracy (Higher is better).
* P. Zhang, C. Lan, W. Zeng, J. Xing, J. Xue, and N. Zheng, “Semantics-guided neural networks for efficient skeleton-based human action recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1112-1121, 2020.
Study 1 Results (NTU-60)
Key point: Smart Noise offers the best balance of privacy, utility, and visual quality.
| Method | Action Rec. $\uparrow$ | Re-ID $\downarrow$ | Qualitative Analysis |
|---|---|---|---|
| Original Data | 94.72% | 81.58% | - |
| Consumer VR (3 Joints) | 90.82% | 58.97% | Poor (Not visualizable) |
| Naive Noise | 90.04% | 76.08% | Good (Smooth) |
| Smart Masking | 81.23% | 21.82% | Poor (Missing limbs) |
| Group Noise | 72.20% | 12.70% | Poor (Jittery) |
| Smart Noise | 79.97% | 28.24% | Good (Smooth) |
Hyperparameters: $\alpha=0.9, \beta=0.2, \gamma=0.03, \sigma=0.01$.
AnonVis Demo (ISMAR 25)
We validate not just numbers, but how the motion looks to humans.
Interactive Visualization
To validate visual quality, we built a VR pipeline.
- Pipeline: Skeleton data is processed in Blender, then loaded into Unity.
- Mapping: Same mesh, different privacy transformations to isolate motion artifacts from mesh deformations.
- Result: Users can visually distinguish actions even with high noise.
Study 1: Summary & Contributions
Contributions
- Methodology: First use of Integrated Gradients for skeleton privacy.
- Technique: Developed Smart Masking, Smart Noise, and Group Noise.
- Validation: AnonVis VR tool for human-in-the-loop evaluation.
- Outcome: Re-ID reduced by ~84% in white-box settings while preserving useful action recognition.
Limitation: White-Box
While effective, Study 1 relies on knowing the attacker's model parameters.
We need a defense that works without knowing the attacker (Black-Box).
This motivates Study 2: can we defend against unknown attackers?
Study 2
PRIVACY-CENTRIC DEEP MOTION RETARGETING FOR ANONYMIZATION OF SKELETON-BASED MOTION VISUALIZATION
Black-Box Defense via Implicit Disentanglement
The Pivot: Motion Retargeting
The Goal
Create an anonymization method that works against unknown attackers.
The Idea: "Don't Hide. Swap."
We transfer the user's motion onto a "Dummy" skeleton.
By definition, the output skeleton has none of the User's structural PII (limb lengths, ratios).
Background: Motion Retargeting
The Objective
- Perform the MR switch: transfer the user's motion ($a$) onto the dummy's body ($p'$).
- Ensure the two parts (Motion & Identity) are completely disentangled.
Challenge: We use Adversarial & Cooperative Learning to ensure the Motion Encoder doesn't accidentally learn Identity.
Architecture: Dual Encoders
We use two separate CNN encoders to split the input features.
Motion Encoder ($E_M$)
Takes the Source Skeleton. Trained to extract Temporal Dynamics ($a$).
Privacy Encoder ($E_P$):
Takes the Dummy Skeleton ($p'$). Trained to extract Structural Identity ($p'$).
Enforcing Disentanglement
We use an iterative game with 4 classifiers operating on the embeddings to shape the latent space (a sketch follows the lists below).
Cooperative (2)
Ensures Encoders learn their domain.
- $M(E_M) \to$ Predict Action
- $P(E_P) \to$ Predict Identity
Adversarial (2)
Ensures Encoders forget the other domain.
- $M(E_P) \to$ Fail Action
- $P(E_M) \to$ Fail Identity
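A sketch of the four-classifier game, assuming cross-entropy for the cooperative heads and a confusion-to-uniform loss for the adversarial heads; PMR's exact losses may differ.

```python
import torch
import torch.nn.functional as F

def disentangle_losses(z_m, z_p, M, P, action, identity):
    """z_m, z_p: motion/privacy embeddings; M, P: classifier heads."""
    # Cooperative: each encoder must support its own task.
    coop = F.cross_entropy(M(z_m), action) + F.cross_entropy(P(z_p), identity)

    # Adversarial: each encoder must confuse the *other* task.
    # Pushing predictions toward uniform is minimized exactly when the
    # classifier can extract nothing about the wrong domain.
    def confusion(logits):
        logp = F.log_softmax(logits, dim=-1)
        return -logp.mean()            # cross-entropy to uniform, up to a constant
    adv = confusion(M(z_p)) + confusion(P(z_m))
    return coop, adv
```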
Mechanism: Cross-Reconstruction
The Decoder ($D$) reconstructs a skeleton from the motion embedding of one sequence and the identity embedding of another, so the swap is learned directly. $D$ is CNN-based with upscaling (UNet-like, without residual connections).
Methodology: PMR Overview
We propose Privacy-centric Deep Motion Retargeting (PMR).
It is a CNN-based Autoencoder framework designed to disentangle motion from identity by using Adversarial/Cooperative Learning.
We also employ a GAN-style discriminator to ensure the generated skeletons are realistic and physically plausible.
Training Procedure
The model is trained in 3 sequential phases.
| Stage | Components Trained | Objective |
|---|---|---|
| 1. Warm-up | $E_M, E_P, D$ | Autoencoder Reconstruction ($L_{rec}$) |
| 2. Separation | Classifiers ($M, P$) | Train classifiers on fixed embeddings. |
| 3. Alignment | All + $Q$ (iterative) | Alternate: (1) update discriminators ($Q, M, P$); (2) update generators ($E_M, E_P, D$) with the novel Smoothness & Latent-Consistency losses. |
Experimental Setup
Datasets
- NTU RGB+D 60 (CV Split)
- NTU RGB+D 120 (CV Split)
Metrics
- Privacy: Re-ID (Top-1/5).
- Utility: Action Recognition (Top-1/5), MSE.
Baselines
- Original: No defense.
- Moon et al. (2023)*: Adversarial Perturbation (White-Box model evaluated as Black-Box).
- DMR**: Standard Deep Motion Retargeting (No privacy loss).
* S. Moon, M. Kim, Z. Qin, Y. Liu, and D. Kim, “Anonymization for skeleton action recognition,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023.
** K. Aberman, P. Li, D. Lischinski, O. Sorkine-Hornung, D. Cohen-Or, and B. Chen, “Skeleton-aware networks for deep motion retargeting,” ACM Transactions on Graphics (TOG), vol. 39, no. 4, pp. 62:1–62:14, 2020.
Results: Privacy-Utility Trade-off (NTU-60)
Ablation Study
Key Findings
- w/o Classifiers: Re-ID (Top-1) jumps to 15.75%. Adversarial learning reduced privacy leakage.
- w/o Smoothness: AR (Top-1) drops to 27.41%. Smooth motion aids utility retention.
- w/o Latent Consistency: Encoders fail to separate domains.
- w/o Quality Discriminator: AR drops to 29.61%. Discriminator aids in utility retention.
*Note: Ablations performed on all novel components.
Qualitative Analysis
Visual Inspection:
PMR successfully alters the body structure (e.g., height, shoulder width) to match the dummy.
However, complex actions (e.g., "Drink Water") show loss of subtle motion cues, explaining the utility drop.
Study 2: Summary & Contributions
Contributions
- Methodology: First use of Motion Retargeting for Privacy.
- Technique: PMR Framework (Dual-Encoder + Adversarial).
- Outcome: Solved the Black-Box problem (Attacker Agnostic).
Limitation: Utility Gap
We achieved Black-Box Privacy, but AR accuracy (35%) is low.
Root Cause: CNNs struggle with long-range dependencies, and implicit adversarial separation is difficult to balance.
Study 3
DISENTANGLED TRANSFORMER MOTION RETARGETING FOR PRIVACY PRESERVATION IN SKELETON-BASED MOTION DATA
Explicit Disentanglement & Inductive Bias
Closing the Gap: Transformers
1. Why Transformers?
Transformers capture global dependencies via Self-Attention. This boosts Action Recognition performance on complex sequences (addressing Study 2's weakness).
2. Autoregressive Generation
Retargeting requires temporal continuity. We use an Autoregressive Decoder to generate motion frame-by-frame, ensuring smoothness and consistency.
Proposed Architecture (TMR)
High-Level Flow: Feature Engineering -> Specialized Encoders -> Autoregressive Decoder
Difference from PMR: Sequential decoding ensures temporal continuity; Transformers handle long-range dependencies.
1. The Gated Action Encoder ($E_A$)
Inductive Bias: Dynamics
Input: Velocity ($V_t$) + Acceleration ($A_t$). No Static Pose.
Dimension: $H_{action} \in \mathbb{R}^{T \times 256}$.
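A sketch of the dynamics-only input, assuming simple finite differences for $V_t$ and $A_t$:

```python
import torch

def dynamics_features(pos):
    """pos: (T, J, 3) joint positions.

    First differences as velocity, second differences as acceleration;
    the static pose itself is deliberately excluded from E_A's input.
    """
    vel = pos[1:] - pos[:-1]            # V_t, shape (T-1, J, 3)
    acc = vel[1:] - vel[:-1]            # A_t, shape (T-2, J, 3)
    T = acc.shape[0]
    return torch.cat([vel[:T], acc], dim=-1)   # (T-2, J, 6)
```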
Iterative Training Stream
We fuse the LSTM stream with features from an Iteratively Trained Action Recognition Model (e.g., MixFormer) via a learned gate.
2. The Identity Encoder ($E_I$)
Inductive Bias: Spatial Structure
Input: Position ($P_t$) + Bone Lengths.
Dimension: $H_{identity} \in \mathbb{R}^{1 \times 64}$ (Small to limit leakage).
Spatial Attention
A Spatial Attention mechanism learns the topological relationships between bones (e.g., arm length vs leg length ratios), ignoring temporal dynamics.
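A sketch of the bone-length input, with a hypothetical parent list (NTU skeletons have 25 joints; the short chain here is purely illustrative):

```python
import torch

# Hypothetical parent list for a toy 5-joint chain; None marks the root.
PARENTS = [None, 0, 1, 2, 3]

def bone_lengths(pos, parents=PARENTS):
    """pos: (T, J, 3). Bone j = distance from joint j to its parent.

    Averaging over time yields the static structural signature that
    E_I consumes alongside raw positions.
    """
    lengths = []
    for j, p in enumerate(parents):
        if p is None:
            continue
        lengths.append((pos[:, j] - pos[:, p]).norm(dim=-1).mean())
    return torch.stack(lengths)         # (num_bones,)
```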
3. Autoregressive Decoder
Cross-Attention Transformer
Inspired by Style Transfer ($\text{StyTr}^2$)*, the decoder attends to Action (Content) and Identity (Style) separately.
To our knowledge, no existing Transformer decoder has been designed for skeleton-based data or motion retargeting.
Autoregressive Generation
It generates frame $t$ based on $t-1$, ensuring temporal continuity.
* Y. Deng et al., "Stytr2: Image style transfer with transformers," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11326-11336, 2022.
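A sketch of such a decoder in PyTorch; for brevity it concatenates action and identity into one memory, whereas the proposal attends to them separately, and all sizes follow the slide's stated dimensions.

```python
import torch
import torch.nn as nn

class ARDecoder(nn.Module):
    """Frame t attends to past frames (masked self-attention) and to the
    action/identity memories (cross-attention), echoing the StyTr^2 split.
    d_model matches H_action (256); identity is lifted from 64 dims."""
    def __init__(self, d_model=256, nhead=8, out_dim=75):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.dec = nn.TransformerDecoder(layer, num_layers=4)
        self.id_proj = nn.Linear(64, d_model)    # lift identity token
        self.in_proj = nn.Linear(out_dim, d_model)
        self.out = nn.Linear(d_model, out_dim)   # 25 joints x 3 coords

    def forward(self, prev_frames, h_action, h_identity):
        # memory = action tokens (B, T, 256) plus one identity token
        mem = torch.cat([h_action, self.id_proj(h_identity)], dim=1)
        tgt = self.in_proj(prev_frames)
        t = tgt.size(1)
        # Causal mask: frame t may only look at frames <= t.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        return self.out(self.dec(tgt, mem, tgt_mask=mask))
```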
Training Strategy
To prevent mode collapse and ensure disentanglement, we employ a 3-Stage Training Process.
Stage 1: Encoders
Objective: Disentanglement
We train only the Encoders ($E_A, E_I$).
- Goal: Force $E_A$ to capture Dynamics and $E_I$ to capture Structure.
- Method: Use auxiliary tasks to maximize separation (e.g., $E_A$ should not predict Identity).
Stage 2: Decoder
Objective: Reconstruction
We freeze Encoders and train only the Decoder.
- Goal: Learn to generate smooth motion sequences from the disentangled features.
- Method: Teacher Forcing: We feed Ground Truth history to stabilize autoregressive learning.
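A sketch of one Stage-2 step with teacher forcing, using a decoder with the signature sketched earlier (encoders frozen, their outputs precomputed):

```python
import torch
import torch.nn.functional as F

def stage2_step(decoder, h_action, h_identity, gt_frames, optimizer):
    """One teacher-forced reconstruction step.

    Ground-truth history gt_frames[:, :-1] conditions the decoder,
    which must predict the next frames gt_frames[:, 1:].
    """
    pred = decoder(gt_frames[:, :-1], h_action, h_identity)
    loss = F.mse_loss(pred, gt_frames[:, 1:])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```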
Stage 3: Fine-Tuning
Objective: Integration
We unfreeze all components.
- Goal: Jointly optimize the pipeline for both Privacy and Utility.
- Method: End-to-End training with a low learning rate to refine the boundaries.
Evaluation Plan
Metrics
- Privacy: Re-ID (Top-1/5). Same classifier methods as previous studies.
- Utility: Action Recognition (Top-1/5), MSE.
Generalization
We will use the ETRI dataset to evaluate cross-dataset generalization.
Target Outcomes
- Privacy: Match PMR (<10%).
- Utility: > 60% (Top-1), > 80% (Top-5).
- Generalization: Cross-Dataset (NTU -> ETRI).
Future Work: Updating AnonVis
Why update?
The quality from Study 2 (PMR) was not sufficient for visual demonstration. Study 1 was limited to White-Box settings.
Plan
We will integrate DisentangledTMR into the AnonVis VR pipeline to demonstrate high-fidelity, anonymized motion in real-time.
Study 3 Contributions
- Architecture: First Transformer-based Motion Retargeting model specifically designed for Privacy.
- Method: Explicit Gating + Inductive Bias for Disentanglement.
- Result: Closing the Privacy-Utility Gap.
Proposal Summary
Dissertation Summary
Addressing the Research Questions.
RQ1: Precision
Study 1 proved we can target specific joints using XAI.
RQ2: Agnostic Defense
Study 2 proved Motion Retargeting works as a Black-Box defense.
RQ3: Utility Gap
Study 3 proposes Transformers to restore utility.
Publications
- ISMAR 25 Carr, Thomas, et al. "AnonVis: A Visualization Tool for Human Motion Anonymization." 2025 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2025. (Demo)
- ICCV 25 Carr, Thomas, et al. "Privacy-centric Deep Motion Retargeting for Anonymization of Skeleton-Based Motion Visualization." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 13162-13170.
- PAKDD 25 Carr, Thomas, et al. "Explanation-Based Anonymization Methods for Motion Privacy." Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer Nature Singapore, 2025, pp. 52-64.
- BigData 24 Carr, Thomas, and Depeng Xu. "User Privacy in Skeleton-based Motion Data." 2024 IEEE International Conference on Big Data (BigData), IEEE, 2024, pp. 8219-8221.
- MetaCom 24 Carr, Thomas, et al. "A Review of Privacy and Utility in Skeleton-based Data in Virtual Reality Metaverses." 2024 IEEE International Conference on Metaverse Computing, Networking, and Applications (MetaCom), IEEE, 2024, pp. 198-205.
- CIKM 23 Carr, Thomas, et al. "Linkage Attack on Skeleton-Based Motion Visualization." Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023, pp. 3758-3762.
Dissertation Timeline
- TMR Implementation
- Experiments & Ablation
- Submit to ECCV
- Final Defense
Thank You
Questions?
tcarr23@charlotte.edu