Abstract
arXiv:2606.04752v1 (cross-listed)
Transformers consuming multi-channel scalar signals must embed $C$ simultaneous values into one $d_{\text{model}}$-dimensional vector per time step. We empirically audit eight input encoders (a shared-scalar baseline, per-channel linear projections, an orthogonality regulariser, a nonlinear MLP stem, block-partitioned concatenation, channel-independent and channel-as-token architectures, and a projected positional encoding) on a synthetic benchmark designed to make channel identity informative and on ETTh1 as a real-data check, measured in next-step negative log-likelihood (NLL). The headline is one of practical near-equivalence within a wide "top tier": the standard per-channel linear projection (nn.Linear(C, $d_{\text{model}}$)) matches every alternative in that tier, up to small differences that are statistically real but practically modest.
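To make the per-channel linear projection concrete, here is a minimal pure-Python sketch of what a layer shaped like nn.Linear(C, $d_{\text{model}}$) computes for a single time step. It is an illustration under stated assumptions, not the paper's code: the helper names (`make_linear`, `embed_step`, `embed_step_per_channel`), the sizes, the input values, and the uniform initialisation are all hypothetical choices made for this example.

```python
import random

def make_linear(C, d_model, seed=0):
    """Build (W, b) with W of shape d_model x C, the layout a C -> d_model linear layer uses.

    The uniform initialisation here is an illustrative assumption.
    """
    rng = random.Random(seed)
    bound = 1.0 / C ** 0.5
    W = [[rng.uniform(-bound, bound) for _ in range(C)] for _ in range(d_model)]
    b = [rng.uniform(-bound, bound) for _ in range(d_model)]
    return W, b

def embed_step(x, W, b):
    """Embed one time step as y = W x + b (x has C values, y has d_model values)."""
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]

def embed_step_per_channel(x, W, b):
    """Same map, written as a sum of per-channel direction vectors.

    Column c of W acts as channel c's own embedding direction,
    scaled by that channel's scalar value.
    """
    y = list(b)
    for c, x_c in enumerate(x):
        for i in range(len(W)):
            y[i] += x_c * W[i][c]
    return y

# Hypothetical sizes and input values, chosen only for the demonstration.
C, d_model = 3, 8
W, b = make_linear(C, d_model)
x_t = [0.5, -1.2, 2.0]  # C simultaneous scalar readings at one time step
y_a = embed_step(x_t, W, b)
y_b = embed_step_per_channel(x_t, W, b)
```

Both functions return the same $d_{\text{model}}$-dimensional vector up to floating-point rounding. The second form makes explicit why the projection is called per-channel: each channel contributes its own learned direction, so channel identity is encoded in the direction and the scalar value in the magnitude.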