How Small Can You Go? LoRA Fine-Tuning 270M-8B Models for Merchant Information Extraction in Financial Transactions

ArXiv CS.AI | 2026-06-09 | NEWS | en | Authors: Donghao Huang, Tomas Drietomsky, Benjamin Barrett, Zhaoxia Wang

Details

Source site
ArXiv CS.AI
Authors
Donghao Huang, Tomas Drietomsky, Benjamin Barrett, Zhaoxia Wang
Article type
NEWS
Language
en
Publication date
2026-06-09

Abstract

arXiv:2606.08051v1 (announce type: new)

Financial transaction processing requires extracting structured merchant information from noisy, abbreviated bank transaction strings at scale. Our current production system, a LoRA-fine-tuned LLaMA 3.1-8B, achieves 96.95% F1 on this task, but deploying 8-billion-parameter models imposes prohibitive memory, latency, and cost constraints. To identify more efficient alternatives, we conduct a deployment-focused study of 24 model variants spanning four model families: Gemma 3 (270M, 1B, 4B), Qwen 3.5 (0.8B, 2B, 4B), Aya (3.35B), and LLaMA 3.1-8B, systematically evaluating accuracy, inference throughput, training cost, and hardware behavior to assess production suitability. Our findings show that: (1) reproducing the LLaMA 3.1-8B fine-tune with a LoRA rank of 8 achieves 96.75% F1, only 0.20 points below the rank-32 baseline; (2) Qwen 3.5 4B with JSON-only prompting reaches 96.60% F1, within 0.
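The rank-8 versus rank-32 comparison in finding (1) is easier to weigh with the adapter sizes in view. The following is a minimal sketch, not the authors' code: it uses the standard LoRA parameter count for a single weight matrix, and the 4096 x 4096 matrix shape is a hypothetical illustration rather than a figure taken from the paper. The helper name `lora_params` is likewise an assumption made for this example.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns the weight update as B @ A, with A of shape (rank, d_in)
    and B of shape (d_out, rank), so the count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


# Hypothetical square projection matrix; an assumption for illustration only.
D = 4096
frozen = D * D  # parameters in the frozen base matrix

for rank in (8, 32):
    added = lora_params(D, D, rank)
    share = added / frozen
    print(f"rank={rank}: {added} adapter params ({share:.2%} of the frozen matrix)")
```

Because the count is linear in the rank, a rank-8 adapter trains one quarter of the parameters of a rank-32 adapter on the same matrices, which is the trade-off set against the reported 0.20-point F1 difference.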
