Site updated: 2020-01-28 16:35:08
This commit is contained in:
183
article/python-cs224n-01/index.html
Normal file
183
article/python-cs224n-01/index.html
Normal file
@@ -0,0 +1,183 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="zh-CN">
|
||||
<head>
|
||||
<head><meta name="generator" content="Hexo 3.9.0">
|
||||
<!-- Title -->
|
||||
|
||||
<meta charset="utf-8">
|
||||
<meta name="applicable-device" content="pc,mobile">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=3.0, viewport-fit=cover">
|
||||
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1">
|
||||
<meta name="author" content="flypython">
|
||||
<meta name="designer" content="flypython">
|
||||
<meta name="keywords" content="cs224n 解答拾遗: 为何word2vec 训练的时候每个词有两个不同的向量,FlyPython - 专业的Python学习社区,flypython, 飞蟒,飞蟒Python,Python入门,Python自动化,Python日报">
|
||||
<meta property="og:title" content="cs224n 解答拾遗: 为何word2vec 训练的时候每个词有两个不同的向量 | FlyPython - 专业的Python学习社区">
|
||||
<meta property="og:site_name" content="http://www.flypython.com">
|
||||
|
||||
<meta property="og:type" content="article">
|
||||
<meta property="og:url" content="http://www.flypython.com/article/python-cs224n-01/">
|
||||
<meta property="og:image" content="http://www.flypython.com/images/cs224n-01.png">
|
||||
<meta property="og:description" content="cs224n 解答拾遗: 为何word2vec 训练的时候每个词有两个不同的向量--cs224n解答">
|
||||
<meta name="description" content="cs224n 解答拾遗: 为何word2vec 训练的时候每个词有两个不同的向量--cs224n解答">
|
||||
|
||||
<meta name="rating" content="general">
|
||||
<meta name="apple-mobile-web-app-capable" content="yes">
|
||||
<meta name="apple-mobile-web-app-status-bar-style" content="black">
|
||||
<meta name="format-detection" content="telephone=yes">
|
||||
<meta name="mobile-web-app-capable" content="yes">
|
||||
<meta name="robots" content="index, follow">
|
||||
<link rel="icon" href="/images/favicon.ico">
|
||||
<title>cs224n 解答拾遗: 为何word2vec 训练的时候每个词有两个不同的向量 | FlyPython - 专业的Python学习社区</title>
|
||||
<link rel="stylesheet" href="/css/f25.css">
|
||||
<link rel="stylesheet" href="/css/highlight.css">
|
||||
|
||||
<link rel="stylesheet" href="/css/gitalk.css">
|
||||
|
||||
|
||||
<!-- Global site tag (gtag.js) - Google Analytics -->
|
||||
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-147288599-1"></script>
|
||||
<script>
|
||||
window.dataLayer = window.dataLayer || [];
|
||||
function gtag(){dataLayer.push(arguments);}
|
||||
gtag('js', new Date());
|
||||
|
||||
gtag('config', 'UA-147288599-1');
|
||||
</script>
|
||||
|
||||
</head>
|
||||
</head>
|
||||
<body>
|
||||
<header class="wrapper header-wrapper">
|
||||
<div class="container header-nav-wrapper">
|
||||
<div class="logo"><a href="/" title="FlyPython - 专业的Python学习社区"><h1 class="title">FlyPython</h1></a></div>
|
||||
<nav class="nav-wrapper">
|
||||
|
||||
<a href="https://flypython.com/python" title="飞蟒微课堂">飞蟒微课堂</a>
|
||||
|
||||
<a href="https://flypython.com/flypython_daily" title="Python日报">Python日报</a>
|
||||
|
||||
<a href="https://flypython.com/PyCon/" title="PyCon">PyCon</a>
|
||||
|
||||
<a href="https://github.com/flypythoncom" title="Github">Github</a>
|
||||
|
||||
<a href="/article/about" title="关于">关于</a>
|
||||
|
||||
</nav>
|
||||
<span class="btn-menu" id="J_header_menu">
|
||||
<div class="inner">
|
||||
<span class="line line-01"></span>
|
||||
<span class="line line-02"></span>
|
||||
<span class="line line-03"></span>
|
||||
</div>
|
||||
</span>
|
||||
<div class="wrapper mb-nav-wrapper" id="J_header_menu_list">
|
||||
<nav class="wrapper mb-nav-container">
|
||||
|
||||
<a href="https://flypython.com/python" title="飞蟒微课堂">飞蟒微课堂</a>
|
||||
|
||||
<a href="https://flypython.com/flypython_daily" title="Python日报">Python日报</a>
|
||||
|
||||
<a href="https://flypython.com/PyCon/" title="PyCon">PyCon</a>
|
||||
|
||||
<a href="https://github.com/flypythoncom" title="Github">Github</a>
|
||||
|
||||
<a href="/article/about" title="关于">关于</a>
|
||||
|
||||
</nav>
|
||||
</div>
|
||||
</div>
|
||||
</header>
|
||||
<section class="body-wrapper">
|
||||
<section class="wrapper post-banner">
|
||||
<div class="container post-banner-container">
|
||||
<h2 class="wrapper title">cs224n 解答拾遗: 为何word2vec 训练的时候每个词有两个不同的向量</h2>
|
||||
<div class="wrapper tips">
|
||||
<span>Author:</span><span>flypython</span> | <span>Date: </span><span>2020-01-01</span> | <span>Category:</span><span><a href="/fly/自然语言处理/" title="自然语言处理">自然语言处理</a><a href="/fly/自然语言处理/cs224n/" title="cs224n">cs224n</a></span>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
<section class="wrapper main-wrapper">
|
||||
<article class="sub-container post-content">
|
||||
<h3 id="1-前言:为何一个词有2个向量"><a href="#1-前言:为何一个词有2个向量" class="headerlink" title="1.前言:为何一个词有2个向量"></a>1.前言:为何一个词有2个向量</h3><blockquote>
|
||||
<p>在cs224n 2019课程中的L1和L2中,我们详细论述了word2vec skip-gram 模型,相信大家都已经掌握了。</p>
|
||||
</blockquote>
|
||||
<p>但这里有一个细节,肯定让不少同学非常疑惑。<br>视频中提到,一个词有2个向量,总参数是2d*V。</p>
|
||||
<p><img src="https://raw.githubusercontent.com/jcjview/jcjview.github.io/master/img/2vec_20200114150901.png" alt></p>
|
||||
<p>(Lecture 01 Introduction and Word Vectors,ppt第23页)提到了use two vectors per word</p>
|
||||
<p><img src="https://raw.githubusercontent.com/jcjview/jcjview.github.io/master/img/2vec_20200114145552.png" alt></p>
|
||||
<p>(Lecture 01 Introduction and Word Vectors,ppt第27页)</p>
|
||||
<p>如上图示,在训练过程中是存在2个词向量的。这怎么理解?明明一个词最终输出的时候,只有一个词向量呀。</p>
|
||||
<h3 id="2-word2vec回顾"><a href="#2-word2vec回顾" class="headerlink" title="2.word2vec回顾"></a>2.word2vec回顾</h3><p>在视频中,Manning教授简单提了一嘴,我们这里详细说明一下,word2vec为什么要在训练的时候使用2个词向量,这2个词向量是什么关系,最终输出的是什么词向量。</p>
|
||||
<p>这2个词向量一个是词$w$的word representations ($v_w$) 在下图中表示为V,一个是该词w的context representations ($u_w$)在下图中表示为U。</p>
|
||||
<p><img src="https://raw.githubusercontent.com/jcjview/jcjview.github.io/master/img/2vec_20200114145755.png" alt></p>
|
||||
<p>(Lecture 02 Word Vectors 2 and Word Senses ppt第4页)</p>
|
||||
<p>在我们训练词向量的时候,对于训练语料,使得P(o|c)的概率最大,也就是<br>J=-logP(o|c)最小(如果没有负号则是求最大,cs224n视频的ppt就没有负号)。</p>
|
||||
<p>我们是怎么定义P(o|c)的呢?</p>
|
||||
<p>定义如下:</p>
|
||||
<p>$P(o|c)={exp(u_o^Tv_c)}/{\sum_{w=1}^V exp(u_w^Tv_c)}$</p>
|
||||
<p>可以看到,这个条件概率是中心词和上下文词2个词向量的点乘,再取e的指数,并除以所有词的词向量和中心词的点乘取e的指数之和,公式的本质是一个向量点乘+softmax。</p>
|
||||
<p>这里当w是o 中心词的时候,使用词的word representations,而w是c上下文词的时候,使用词的context representations。</p>
|
||||
<h3 id="3-问:为什么要这么做呢,为什么不都使用同一个词向量呢?"><a href="#3-问:为什么要这么做呢,为什么不都使用同一个词向量呢?" class="headerlink" title="3.问:为什么要这么做呢,为什么不都使用同一个词向量呢?"></a>3.问:为什么要这么做呢,为什么不都使用同一个词向量呢?</h3><p>答:使用2个词向量的目的是,训练时我们需要让具有相同上下文的词的词向量相互接近,但我们不希望这些词的上下文的词也相互接近。</p>
|
||||
<p>以下面2句话的例子讲解一下:</p>
|
||||
<p>the <strong>dog</strong> has a tail</p>
|
||||
<p>the <strong>cat</strong> has a tail</p>
|
||||
<p>这里采用word2vec skip-gram ,naive softmax 模型,窗口大小采用k=1。</p>
|
||||
<p>对于中心词dog来说,需要计算:</p>
|
||||
<p>P(dog|has) </p>
|
||||
<p>P(dog|the) </p>
|
||||
<p>对于中心词cat来说,需要计算:</p>
|
||||
<p>P(cat|has)</p>
|
||||
<p>P(cat|the)</p>
|
||||
<p>这里的条件概率公式是上面的:<br>$P(o|c)={exp(u_o^Tv_c)}/{\sum_{w=1}^V exp(u_w^Tv_c)}$</p>
|
||||
<p>分别把中心词dog/cat 和上下文词has/the带入,就可求出概率。<br>对于目标函数J=-logP(o|c),我们要求其最小值,那么在word2vec算法中,需要做梯度下降算法。这里使用sgd,只需要对每个样本进行梯度下降,如下公式:</p>
|
||||
<p>$u_o(new)=u_o(old)-\alpha \partial J/(\partial u_o)$</p>
|
||||
<p>$v_c(new)=v_c(old)-\alpha \partial J/(\partial v_c)$</p>
|
||||
<p>多次迭代sgd后,会使得J变小,而$u_o$和$v_c$的点乘会变大,进而使得两者余弦距离接近。那么如果dog/cat 分别和has/the 接近了,dog和cat的词余弦距离就会接近。</p>
|
||||
<p>假设这里$u_o$和$v_c$都是同一种向量,那么不光dog和cat的词余弦距离接近,cat和has也会相近,连has和the都会相近。</p>
|
||||
<p>但是我们希望最后输出的词向量,dog和cat相近,但不希望cat和has相近,也不希望has和the相近。如果都相近,词就没有区分度了。</p>
|
||||
<p>所以我们对于每个词采用双词向量,对于dog 这个词,有一个word representations ($v_c$) 用来作为中心词时计算,有一个context representations ($u_o$)作为上下文词时进行计算。</p>
|
||||
<p>context representations训练为了使得中心词word representations相近,context representations作为中间结果不输出,而word representations作为最终结果输出。这样就避免了所有中心词的词向量都接近的困境了。</p>
|
||||
<p>最终,我们保存中心词的word representations,既公式中的$v_c$ 。</p>
|
||||
<h3 id="4-考古"><a href="#4-考古" class="headerlink" title="4.考古"></a>4.考古</h3><p>解释清楚这个问题以后,我们知道最早的算法是否使用2个vec的。</p>
|
||||
<p>在word2vec提出者,Tomas Mikolov 的论文Distributed Representations of Words and Phrases and their Compositionality<br>和他提供的word2vec.c代码里,训练过程中,每个word都存在2个词向量。</p>
|
||||
<p>如下图<br><img src="https://raw.githubusercontent.com/jcjview/jcjview.github.io/master/img/2vec_20200114150350.png" alt><br>论文Distributed Representations of Words and Phrases and their Compositionality 中截图</p>
|
||||
<p><img src="https://raw.githubusercontent.com/jcjview/jcjview.github.io/master/img/2vec_20200114150726.png" alt><br>word2vec.c代码截图</p>
|
||||
<p>所以我们清楚了,最早的word2vec算法确实在训练的时候是存在2个vec的。</p>
|
||||
<h3 id="5-参考文献:"><a href="#5-参考文献:" class="headerlink" title="5.参考文献:"></a>5.参考文献:</h3><p><a href="https://www.quora.com/Why-does-word2vec-have-two-different-representation-for-words" target="_blank" rel="noopener">https://www.quora.com/Why-does-word2vec-have-two-different-representation-for-words</a></p>
|
||||
<p>论文:Distributed Representations of Words and Phrases<br>and their Compositionality</p>
|
||||
|
||||
</article>
|
||||
<div class="sub-container gitalk-wrapper" id="gitalk-container"></div>
|
||||
</section>
|
||||
|
||||
<div class="tips-top-wrapper">
|
||||
<span class="tip-top-container" onclick="scrollToWindowTop()">
|
||||
<span class="l-bar"></span>
|
||||
<span class="r-bar"></span>
|
||||
</span>
|
||||
</div>
|
||||
<footer class="wrapper footer-wrapper">
|
||||
<div class="container"><span class="copyright">© 2020 FlyPython . All Rights Reserved.</span></div>
|
||||
</footer>
|
||||
</section>
|
||||
<script src="/js/f25.js"></script>
|
||||
|
||||
<script src="/js/gitalk.min.js"></script>
|
||||
|
||||
<script>
|
||||
var gitalkAdmin = 'xxg1413'.split(',');
|
||||
var gitalk = new Gitalk({
|
||||
clientID: 'd0e566bfc45c0b852c6c',
|
||||
clientSecret: '6b69b3a841c85a6223e5a904c47f5e2d84322980',
|
||||
repo: 'gitalk',
|
||||
owner: 'flypythoncom',
|
||||
admin: gitalkAdmin,
|
||||
id: location.pathname.length > 50 ? location.pathname.substr(0,50) : location.pathname, // Ensure uniqueness and length less than 50
|
||||
distractionFreeMode: false // Facebook-like distraction free mode
|
||||
});
|
||||
gitalk.render('gitalk-container');
|
||||
</script>
|
||||
|
||||
|
||||
</body>
|
||||
</html>
|
||||
Reference in New Issue
Block a user