<!DOCTYPE html>
<html lang="zh-CN">
<head>
<head><meta name="generator" content="Hexo 3.9.0">
<!-- Title -->
<meta charset="utf-8">
<meta name="applicable-device" content="pc,mobile">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=3.0, viewport-fit=cover">
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1">
<meta name="author" content="flypython">
<meta name="designer" content="flypython">
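<meta name="keywords" content="cs224n Q&amp;A notes: why word2vec trains two different vectors for each word, FlyPython - A Professional Python Learning Community, flypython, FlyPython, Python, Python basics, Python automation, Python Daily">
<meta property="og:title" content="cs224n Q&amp;A notes: why word2vec trains two different vectors for each word | FlyPython - A Professional Python Learning Community">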
<meta property="og:site_name" content="http://www.flypython.com">
<meta property="og:type" content="article">
<meta property="og:url" content="http://www.flypython.com/article/python-cs224n-01/">
<meta property="og:image" content="http://www.flypython.com/images/cs224n-01.png">
<meta property="og:description" content="cs224n Q&amp;A notes: why word2vec trains two different vectors for each word -- cs224n answers">
<meta name="description" content="cs224n Q&amp;A notes: why word2vec trains two different vectors for each word -- cs224n answers">
<meta name="rating" content="general">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<meta name="format-detection" content="telephone=yes">
<meta name="mobile-web-app-capable" content="yes">
<meta name="robots" content="index, follow">
<link rel="icon" href="/images/favicon.ico">
<title>cs224n Q&amp;A notes: why word2vec trains two different vectors for each word | FlyPython - A Professional Python Learning Community</title>
<link rel="stylesheet" href="/css/f25.css">
<link rel="stylesheet" href="/css/highlight.css">
<link rel="stylesheet" href="/css/gitalk.css">
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-147288599-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-147288599-1');
</script>
</head>
</head>
<body>
<header class="wrapper header-wrapper">
<div class="container header-nav-wrapper">
<div class="logo"><a href="/" title="FlyPython - A Professional Python Learning Community"><h1 class="title">FlyPython</h1></a></div>
<nav class="nav-wrapper">
<a href="https://flypython.com/python" title="FlyPython Micro-Classroom">FlyPython Micro-Classroom</a>
<a href="https://flypython.com/flypython_daily" title="Python Daily">Python Daily</a>
<a href="https://flypython.com/PyCon/" title="PyCon">PyCon</a>
<a href="https://github.com/flypythoncom" title="Github">Github</a>
<a href="/article/about" title="About">About</a>
</nav>
<span class="btn-menu" id="J_header_menu">
<div class="inner">
<span class="line line-01"></span>
<span class="line line-02"></span>
<span class="line line-03"></span>
</div>
</span>
<div class="wrapper mb-nav-wrapper" id="J_header_menu_list">
<nav class="wrapper mb-nav-container">
<a href="https://flypython.com/python" title="FlyPython Micro-Classroom">FlyPython Micro-Classroom</a>
<a href="https://flypython.com/flypython_daily" title="Python Daily">Python Daily</a>
<a href="https://flypython.com/PyCon/" title="PyCon">PyCon</a>
<a href="https://github.com/flypythoncom" title="Github">Github</a>
<a href="/article/about" title="About">About</a>
</nav>
</div>
</div>
</header>
<section class="body-wrapper">
<section class="wrapper post-banner">
<div class="container post-banner-container">
<h2 class="wrapper title">cs224n Q&amp;A notes: why word2vec trains two different vectors for each word</h2>
<div class="wrapper tips">
<span>Author: </span><span>flypython</span> | <span>Date: </span><span>2020-01-01</span> | <span>Category: </span><span><a href="/fly/自然语言处理/" title="Natural Language Processing">Natural Language Processing</a> <a href="/fly/自然语言处理/cs224n/" title="cs224n">cs224n</a></span>
</div>
</div>
</section>
<section class="wrapper main-wrapper">
<article class="sub-container post-content">
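<h3 id="1-前言为何一个词有2个向量"><a href="#1-前言为何一个词有2个向量" class="headerlink" title="1. Introduction: why does one word have 2 vectors?"></a>1. Introduction: why does one word have 2 vectors?</h3><blockquote>
<p>Lectures 1 and 2 of the cs224n 2019 course cover the word2vec skip-gram model in detail, and you have probably mastered it by now.</p>
</blockquote>
<p>But there is one detail that surely puzzles many students.<br>The video mentions that each word has 2 vectors, so the total number of parameters is 2dV.</p>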
<p><img src="https://raw.githubusercontent.com/jcjview/jcjview.github.io/master/img/2vec_20200114150901.png" alt></p>
<p>Lecture 01, Introduction and Word Vectors (slide 23), says to "use two vectors per word".</p>
<p><img src="https://raw.githubusercontent.com/jcjview/jcjview.github.io/master/img/2vec_20200114145552.png" alt></p>
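<p>Lecture 01, Introduction and Word Vectors (slide 27)</p>
<p>As the slides above show, there are 2 word vectors per word during training. How should we understand this, when each word clearly ends up with only one word vector in the final output?</p>
<h3 id="2-word2vec回顾"><a href="#2-word2vec回顾" class="headerlink" title="2. word2vec recap"></a>2. word2vec recap</h3><p>Professor Manning only touches on this briefly in the video, so here we explain in detail why word2vec uses 2 word vectors during training, how the 2 vectors relate to each other, and which vector is output in the end.</p>
<p>Of these 2 vectors, one is the word representation $v_w$ of word $w$, shown as V in the figure below, and the other is the context representation $u_w$ of the same word $w$, shown as U.</p>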
<p><img src="https://raw.githubusercontent.com/jcjview/jcjview.github.io/master/img/2vec_20200114145755.png" alt></p>
<p>Lecture 02, Word Vectors 2 and Word Senses (slide 4)</p>
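<p>The two tables and the 2dV parameter count can be sketched in a few lines of Python. This is only an illustration, not the course's code: the helper name <code>init_tables</code>, the toy vocabulary, the dimension d=4 and the random initialisation are all assumptions made for the example.</p>

```python
import random

def init_tables(vocab, d, seed=0):
    """Create the two tables: V (word representations) and U (context representations)."""
    rng = random.Random(seed)
    V = {w: [rng.uniform(-0.1, 0.1) for _ in range(d)] for w in vocab}
    U = {w: [rng.uniform(-0.1, 0.1) for _ in range(d)] for w in vocab}
    return V, U

vocab = ["the", "dog", "cat", "has", "a", "tail"]  # toy vocabulary (assumed)
d = 4                                              # toy dimension (assumed)
V, U = init_tables(vocab, d)

# Each word owns one row in V and one row in U,
# so training holds 2 * d * |vocab| numbers in total.
n_params = sum(len(vec) for table in (V, U) for vec in table.values())
print(n_params)  # 48 == 2 * 4 * 6
```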
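<p>When training word vectors, we want to maximize the probability P(o|c) over the training corpus, which is the same as minimizing<br>J=-logP(o|c). (Without the minus sign the problem is a maximization instead; the cs224n slides are written without the minus sign.)</p>
<p>How do we define P(o|c)?</p>
<p>The definition is:</p>
<p>$P(o|c)=\frac{\exp(u_o^Tv_c)}{\sum_{w=1}^V \exp(u_w^Tv_c)}$</p>
<p>As you can see, this conditional probability takes the dot product of the center word's vector and the context word's vector, exponentiates it, and divides by the sum of the exponentiated dot products of every word's vector with the center word's vector. In essence, the formula is a vector dot product followed by a softmax.</p>
<p>Here, when a word is the center word c we use its word representation ($v_c$), and when a word is the context word o we use its context representation ($u_o$).</p>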
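<p>As a rough sketch of the definition above, the naive softmax can be written directly in Python. The function name <code>naive_softmax_prob</code> and the 2-dimensional toy vectors are made up for illustration; they are not trained values and not the course's implementation.</p>

```python
import math

def naive_softmax_prob(o, c, U, V):
    """Return P(o | c) = exp(u_o . v_c) / sum_w exp(u_w . v_c)."""
    v_c = V[c]  # center word: looked up in the word-representation table
    scores = {w: sum(a * b for a, b in zip(u_w, v_c)) for w, u_w in U.items()}
    m = max(scores.values())  # subtract the max score for numerical stability
    exp_scores = {w: math.exp(s - m) for w, s in scores.items()}
    return exp_scores[o] / sum(exp_scores.values())

# Toy vectors (assumed numbers): V holds v_w, U holds u_w.
V = {"the": [0.1, 0.3], "dog": [0.5, -0.2], "has": [-0.4, 0.2]}
U = {"the": [0.2, 0.1], "dog": [-0.3, 0.4], "has": [0.6, 0.0]}

# Distribution over context words given the center word "dog".
probs = {w: naive_softmax_prob(w, "dog", U, V) for w in U}
print(probs)
```

<p>Because the denominator runs over the whole vocabulary, the values in <code>probs</code> sum to 1, and the context word whose $u_w$ has the largest dot product with $v_c$ gets the largest probability.</p>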
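<h3 id="3-问:为什么要这么做呢,为什么不都使用同一个词向量呢?"><a href="#3-问:为什么要这么做呢,为什么不都使用同一个词向量呢?" class="headerlink" title="3. Question: why do it this way, and why not use a single vector for everything?"></a>3. Question: why do it this way, and why not use a single vector for everything?</h3><p>The purpose of using 2 word vectors is this: during training we need words that share the same contexts to end up with word vectors close to each other, but we do not want the context words of those words to be pulled close together as well.</p>
<p>Let us walk through an example with the following 2 sentences:</p>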
<p>the <strong>dog</strong> has a tail</p>
<p>the <strong>cat</strong> has a tail</p>
<p>Here we use the word2vec skip-gram model with naive softmax and a window size of k=1.</p>
<p>For the center word dog, we need to compute:</p>
<p>P(has|dog) </p>
<p>P(the|dog) </p>
<p>For the center word cat, we need to compute:</p>
<p>P(has|cat)</p>
<p>P(the|cat)</p>
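<p>To make the example concrete, here is a small sketch that enumerates the (center, context) pairs for the two sentences with window size k=1. The helper name <code>skipgram_pairs</code> is hypothetical; the point is only that dog and cat are paired with exactly the same context words.</p>

```python
def skipgram_pairs(tokens, k=1):
    """List (center, context) pairs for every position, with window size k."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - k), min(len(tokens), i + k + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentences = ["the dog has a tail".split(), "the cat has a tail".split()]

# Collect the set of context words seen for each center word.
contexts = {}
for sent in sentences:
    for center, outside in skipgram_pairs(sent, k=1):
        contexts.setdefault(center, set()).add(outside)

print(sorted(contexts["dog"]))  # ['has', 'the']
print(sorted(contexts["cat"]))  # ['has', 'the']
```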
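<p>The conditional probability here is the formula from above:<br>$P(o|c)=\frac{\exp(u_o^Tv_c)}{\sum_{w=1}^V \exp(u_w^Tv_c)}$</p>
<p>Substituting the center words dog/cat and the context words has/the gives the probabilities.<br>We want to minimize the objective J=-logP(o|c), so the word2vec algorithm runs gradient descent. Here we use SGD, which only needs one gradient step per training sample, as in the following formulas:</p>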
<p>$u_o^{new}=u_o^{old}-\alpha \frac{\partial J}{\partial u_o}$</p>
<p>$v_c^{new}=v_c^{old}-\alpha \frac{\partial J}{\partial v_c}$</p>
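<p>For reference, the two gradients in these updates can be worked out from the softmax definition of P(o|c). The derivation is not spelled out in the original post; it is a sketch that starts from rewriting the objective as $J=-u_o^Tv_c+\log\sum_{w=1}^V \exp(u_w^Tv_c)$:</p>
<p>$\frac{\partial J}{\partial v_c}=-u_o+\sum_{w=1}^V P(w|c)\,u_w$</p>
<p>$\frac{\partial J}{\partial u_o}=(P(o|c)-1)\,v_c$</p>
<p>Since $P(o|c)-1$ is negative, the step on $u_o$ moves it in the direction of $v_c$, and the step on $v_c$ moves it toward $u_o$ and away from the probability-weighted average of all context vectors.</p>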
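<p>After many SGD iterations J gets smaller and the dot product of $u_o$ and $v_c$ gets larger, which tends to bring the two vectors closer in cosine similarity. So once dog and cat have each moved close to has and the, the word vectors of dog and cat also end up close to each other.</p>
<p>Now suppose $u_o$ and $v_c$ were the same kind of vector. Then not only would dog and cat be close in cosine similarity, but cat and has would be close too, and even has and the would be close.</p>
<p>But we want the final output vectors of dog and cat to be close without cat and has being close, and without has and the being close. If everything were close, the word vectors would lose their discriminative power.</p>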
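<p>The difference can be illustrated with a single SGD step on the pair (center=cat, context=has). The sketch below is a toy, assuming 2-dimensional made-up vectors and a hypothetical helper <code>sgd_step</code>; passing the same dictionary for both roles simulates the "one vector per word" setting.</p>

```python
import copy
import math

def sgd_step(center, context, c, o, lr=0.1):
    """One SGD step on J = -log P(o | c) with naive softmax.

    `center` holds the v vectors and `context` holds the u vectors.
    Passing the same dict for both ties the two tables into one.
    """
    v_c = list(center[c])                          # snapshot of old values
    u_old = {w: list(u) for w, u in context.items()}
    scores = {w: sum(a * b for a, b in zip(u, v_c)) for w, u in u_old.items()}
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    p = {w: math.exp(s - m) / z for w, s in scores.items()}
    d = len(v_c)
    # dJ/dv_c = -u_o + sum_w P(w|c) u_w
    for i in range(d):
        grad = sum(p[w] * u_old[w][i] for w in u_old) - u_old[o][i]
        center[c][i] -= lr * grad
    # dJ/du_w = (P(w|c) - 1[w == o]) v_c
    for w in u_old:
        coeff = p[w] - (1.0 if w == o else 0.0)
        for i in range(d):
            context[w][i] -= lr * coeff * v_c[i]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

init = {"cat": [0.3, -0.1], "has": [0.2, 0.4], "the": [-0.2, 0.1]}

# Two tables: V is what gets output, U is a separate context table.
V, U = copy.deepcopy(init), copy.deepcopy(init)
sgd_step(V, U, c="cat", o="has")
two_table_has_moved = V["has"] != init["has"]

# One shared table E used in both roles.
E = copy.deepcopy(init)
before = dot(E["cat"], E["has"])
sgd_step(E, E, c="cat", o="has")
after = dot(E["cat"], E["has"])
shared_has_moved = E["has"] != init["has"]

print(two_table_has_moved, shared_has_moved)  # False True
```

<p>With two tables, the step changes $v_{cat}$ and the rows of U but leaves the output vector of has untouched. With one shared table, the same step moves the output vector of has as well, and in this toy run the dot product between cat and has grows (<code>after</code> is larger than <code>before</code>).</p>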
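<p>So we use two vectors for every word: the word dog has a word representation ($v_c$) that is used when it is the center word, and a context representation ($u_o$) that is used when it is a context word.</p>
<p>The context representations are trained so that the word representations of center words with similar contexts end up close. The context representations are an intermediate result and are not output, while the word representations are output as the final result. This avoids the trap of all words' vectors ending up close to one another.</p>
<p>In the end we save the center-word word representations, i.e. $v_c$ in the formula.</p>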
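<p>As a small illustration of this last step, the sketch below writes out only the V table and simply discards U. The function name <code>export_word_vectors</code> and the toy numbers are assumptions; the layout (a header line with vocabulary size and dimension, then one word per line) follows the plain-text format commonly used for word2vec vectors.</p>

```python
def export_word_vectors(V):
    """Format the center-word table V as text lines, one word per line."""
    dim = len(next(iter(V.values())))
    lines = [f"{len(V)} {dim}"]  # header: vocabulary size and dimension
    for word, vec in V.items():
        lines.append(word + " " + " ".join(f"{x:.4f}" for x in vec))
    return lines

# Toy tables after training (made-up numbers).
V = {"dog": [0.5, -0.2], "cat": [0.45, -0.15]}
U = {"dog": [-0.3, 0.4], "cat": [0.1, 0.2]}  # intermediate result, never exported

for line in export_word_vectors(V):
    print(line)
```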
<h3 id="4-考古"><a href="#4-考古" class="headerlink" title="4. Digging into the history"></a>4. Digging into the history</h3><p>With that question explained, let us check whether the original algorithm also used 2 vectors.</p>
<p>In the paper Distributed Representations of Words and Phrases and their Compositionality by Tomas Mikolov, who proposed word2vec,<br>and in the word2vec.c code he released, every word has 2 word vectors during training.</p>
<p>As shown below:<br><img src="https://raw.githubusercontent.com/jcjview/jcjview.github.io/master/img/2vec_20200114150350.png" alt><br>Screenshot from the paper Distributed Representations of Words and Phrases and their Compositionality</p>
<p><img src="https://raw.githubusercontent.com/jcjview/jcjview.github.io/master/img/2vec_20200114150726.png" alt><br>Screenshot of the word2vec.c code</p>
<p>So it is clear that the original word2vec algorithm did keep 2 vectors per word during training.</p>
<h3 id="5-参考文献:"><a href="#5-参考文献:" class="headerlink" title="5. References"></a>5. References</h3><p><a href="https://www.quora.com/Why-does-word2vec-have-two-different-representation-for-words" target="_blank" rel="noopener">https://www.quora.com/Why-does-word2vec-have-two-different-representation-for-words</a></p>
<p>Paper: Distributed Representations of Words and Phrases and their Compositionality (Tomas Mikolov et al.)</p>
</article>
<div class="sub-container gitalk-wrapper" id="gitalk-container"></div>
</section>
<div class="tips-top-wrapper">
<span class="tip-top-container" onclick="scrollToWindowTop()">
<span class="l-bar"></span>
<span class="r-bar"></span>
</span>
</div>
<footer class="wrapper footer-wrapper">
<div class="container"><span class="copyright">&copy; 2020 FlyPython . All Rights Reserved.</span></div>
</footer>
</section>
<script src="/js/f25.js"></script>
<script src="/js/gitalk.min.js"></script>
<script>
var gitalkAdmin = 'xxg1413'.split(',');
var gitalk = new Gitalk({
clientID: 'd0e566bfc45c0b852c6c',
clientSecret: '6b69b3a841c85a6223e5a904c47f5e2d84322980',
repo: 'gitalk',
owner: 'flypythoncom',
admin: gitalkAdmin,
id: location.pathname.length > 50 ? location.pathname.substr(0,50) : location.pathname, // Ensure uniqueness and length less than 50
distractionFreeMode: false // Facebook-like distraction free mode
});
gitalk.render('gitalk-container');
</script>
</body>
</html>