Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers
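A three-step method that needs neither labels nor attribution gradients can systematically identify attention-head circuits in pretrained Transformers. It uses the time-integrated participation ratio of each head's attention output as a spectral signal to select heads that perform sustained, content-dependent computation. A task-pattern screen then narrows these heads to task-specific candidate circuits, and group ablation against matched random controls provides causal validation. The method works on models that differ in parameter scale, architecture, and pretraining pipeline, and the results indicate that induction circuits are necessary in every model tested.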
Popularity: 45 · Quality: 70 · Impact: 55
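In-Depth Analysis
Background and Problem
Understanding the internal computation of large pretrained Transformers is a central challenge in interpretability research. Conventional attention analyses depend either on labels or on gradient information, which makes it hard to identify, systematically and without supervision, the attention-head circuits responsible for a specific function. The lack of a general, portable method that can also establish causality has held back deeper insight into how these models work internally.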
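Core Content
The paper proposes and validates a general three-step identification method:
- Spectral signal ranking: Compute the time-integrated participation ratio of each attention head's output as a task-agnostic signal, and use it to rank, without supervision, the heads that perform sustained, content-dependent computation. This ranking is the basis for identifying functional circuits.
- Task-pattern screen: Combine the generic signal with a task-specific pattern (for example, a synthetic induction task) to select task-specific candidate circuit heads.
- Causal ablation validation: Ablate the candidate heads as a group and compare the result with matched random control groups, establishing that the heads are causally necessary for performance on the task.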
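The first two steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the summary does not define "time-integrated", so the sketch assumes it means the mean over positions of the participation ratio of the across-sequence covariance spectrum, and it assumes a higher score ranks first. The induction screen uses the common diagnostic of attention mass on the token that followed the previous occurrence. All function names and the toy data are hypothetical.

```python
import numpy as np

def participation_ratio(eigvals):
    """PR = (sum λ)^2 / sum λ^2, between 1 and the number of eigenvalues."""
    lam = np.clip(eigvals, 0.0, None)  # drop tiny negative values from round-off
    denom = np.sum(lam ** 2)
    return float(np.sum(lam) ** 2 / denom) if denom > 0 else 0.0

def time_integrated_pr(head_out):
    """head_out: (n_sequences, n_positions, d_head) outputs of one head.

    ASSUMPTION: "time-integrated" is read as the mean over positions of the
    PR of the across-sequence covariance spectrum at each position.
    """
    prs = []
    for t in range(head_out.shape[1]):
        cov = np.atleast_2d(np.cov(head_out[:, t, :], rowvar=False))
        prs.append(participation_ratio(np.linalg.eigvalsh(cov)))
    return float(np.mean(prs))

def induction_pattern_score(attn, period):
    """attn: (n_pos, n_pos) attention of one head on a sequence whose second
    half repeats the first; period is the length of the repeated block.

    Returns the mean attention from each second-half query to the token that
    followed its previous occurrence (offset period - 1 back).
    """
    queries = np.arange(period, attn.shape[0])
    return float(attn[queries, queries - period + 1].mean())

# Toy data (hypothetical): one head with full-rank outputs, one nearly rank-1.
rng = np.random.default_rng(0)
n_seq, n_pos, d_head, period = 256, 16, 8, 8
direction = rng.normal(size=d_head)
outputs = {
    "rich": rng.normal(size=(n_seq, n_pos, d_head)),
    "flat": rng.normal(size=(n_seq, n_pos, 1)) * direction
            + 0.01 * rng.normal(size=(n_seq, n_pos, d_head)),
}
scores = {head: time_integrated_pr(x) for head, x in outputs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # step 1: rank heads

# Step 2: an idealized induction pattern versus uniform causal attention.
queries = np.arange(period, n_pos)
ideal = np.zeros((n_pos, n_pos))
ideal[queries, queries - period + 1] = 1.0  # only the query rows are filled in
uniform = np.tril(np.ones((n_pos, n_pos)))
uniform /= uniform.sum(axis=1, keepdims=True)
```

In this toy setup the full-rank head scores near `d_head` and the nearly rank-1 head scores near 1, so the ranking separates them. In practice a candidate set would presumably keep heads that rank high on the spectral signal and also clear a threshold on the pattern score; the threshold itself is not given in the summary.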
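Key validation findings:
- Generality: Across models spanning a reported 8× parameter range (51M to 7B total parameters), two architecture families (dense and mixture-of-experts), and four pretraining pipelines, the method identified an induction circuit of 2 to 6 heads in every case. Ablating these heads reduced top-1 accuracy on the synthetic induction task by 94-100%.
- Unsupervised predictive power: On six independent random seeds of the 51M-parameter probe model, the unsupervised signal correctly predicted each seed's own circuit.
- Regularity: In the Pythia model family (124M to 410M), the fraction of attention heads identified as performing specialized computation stays at 17-19%, while the induction circuit itself comprises 3 to 11 heads and grows sublinearly with the total head count.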
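The ablation comparison behind an accuracy-drop figure like the one reported above can be sketched as follows. This is an illustration under stated assumptions: the drop is taken as relative to baseline accuracy, "matched" is taken to mean random groups of the same size drawn from heads outside the candidate set, and `evaluate` is a hypothetical callback that returns top-1 accuracy with the given heads ablated. The toy model and its numbers are invented for the demo and are not results from the paper.

```python
import random

def ablation_report(evaluate, candidate, all_heads, n_controls=20, seed=0):
    """Compare ablating `candidate` with size-matched random head groups.

    evaluate(ablated) -> top-1 accuracy in [0, 1] with `ablated` heads removed.
    ASSUMPTION: controls are same-size groups sampled outside the candidate set.
    """
    rng = random.Random(seed)
    baseline = evaluate(frozenset())
    pool = sorted(set(all_heads) - set(candidate))

    def relative_drop(acc):
        # Fraction of baseline accuracy lost after ablation.
        return (baseline - acc) / baseline if baseline > 0 else 0.0

    candidate_drop = relative_drop(evaluate(frozenset(candidate)))
    control_drops = [
        relative_drop(evaluate(frozenset(rng.sample(pool, len(candidate)))))
        for _ in range(n_controls)
    ]
    return {
        "baseline": baseline,
        "candidate_drop": candidate_drop,
        "control_drop_mean": sum(control_drops) / len(control_drops),
        "control_drop_max": max(control_drops),
    }

# Toy model (hypothetical): 4 layers x 8 heads, with a 3-head "true" circuit.
ALL_HEADS = [(layer, head) for layer in range(4) for head in range(8)]
TRUE_CIRCUIT = {(1, 3), (2, 0), (2, 5)}

def toy_evaluate(ablated):
    # Accuracy falls with the share of circuit heads removed, and only
    # slightly with each unrelated head removed.
    hit = len(ablated & TRUE_CIRCUIT) / len(TRUE_CIRCUIT)
    other = len(ablated - TRUE_CIRCUIT)
    return max(0.0, 0.95 * (1.0 - hit) - 0.01 * other)

report = ablation_report(toy_evaluate, TRUE_CIRCUIT, ALL_HEADS)
```

A candidate set counts as causally necessary in this sketch when `candidate_drop` is large while `control_drop_max` stays small. Comparing against the maximum over controls, not only the mean, guards against a single lucky random group explaining the effect.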
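Significance and Impact
- Methodological advance: The paper provides an unsupervised, portable, general framework for causally locating the functional subnetworks that carry out a specific computation in a Transformer, making the analysis of internal mechanisms more systematic and reliable.
- Theoretical insight: The results show that a stable fraction of heads form functionally specialized circuits, and that key task circuits such as induction grow sublinearly in size. This offers a new perspective on computational efficiency and modular organization in Transformers.
- Research directions: As a methodological anchor, the paper opens the way for follow-up work, including tracing how circuits develop over the course of pretraining and analyzing composite-task circuits in which pattern selectivity is decoupled from the task's causal structure.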
Disclaimer: The content above was generated by AI and is for reference only.