The Father of GAN Is in Dispute: "I Came Up With Generative Adversarial Networks Three Years Before Goodfellow"

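By pitting two neural networks against each other, Ian Goodfellow created a powerful AI tool: the generative adversarial network, or GAN. The method has since had an enormous impact on machine learning and has made its creator, Goodfellow, a major figure in artificial intelligence. The story of GAN's birth is well known in tech circles, but Goodfellow, it seems, was not the only one to come up with this ingenious adversarial idea.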

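One night in 2014, Ian Goodfellow went out drinking to celebrate with a doctoral student who had just graduated. At a bar in Montreal, some friends asked for his help with a thorny project: getting a computer to generate images on its own.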

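Researchers were already using neural networks (algorithms loosely modeled on the networks of neurons in the human brain) as generative models to create plausible new data. But the results were often disappointing: computer-generated faces tended to be blurry, or to be missing ears or noses.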

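The approach his friends proposed was to run a complex statistical analysis of the elements that make up an image, so that the machine could generate images by itself. That would require an enormous amount of number crunching, and Goodfellow told them it simply would not work.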

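Mulling the problem over his beer, he suddenly had an idea: what would happen if two neural networks were pitted against each other? His friends were skeptical.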

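When he got home, his girlfriend was already asleep, and he decided to try the idea right away. He wrote code into the early hours of the morning and then tested it. It worked on the first run!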

The technique he came up with that night is now called a GAN, short for "generative adversarial network."

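The story of GAN's birth is well known in tech circles, but Ian Goodfellow, it seems, was not the only one to come up with this ingenious adversarial idea.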

Jürgen Schmidhuber, another leading figure in machine learning, for example, claims to have done similar work earlier. A related dispute played out at NIPS 2016.

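Today, a blog post from 2010 is also stirring up heated discussion on Reddit. The post is very short, but it lays out the basic idea of GAN quite precisely, and the diagram that accompanies it shows directly how a GAN is set up.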

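The post drew a great deal of discussion. Many commenters found it a pity, saying that if the author had taken his own idea more seriously at the time, "he might have been the one who changed the world."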

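Others, of course, pointed out that having such an idea matters, but that it only counts once it is put into practice, and that the hardware of 2010 may not have been able to support the applications that later made GAN take off. One commenter even invoked Columbus and the New World: "Columbus may have been the first discoverer, but surely plenty of people had predicted long before that 'maybe there are some islands in the Atlantic'?"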

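In fact, the post's author, Olli Niemitalo, has been far more relaxed about it than the commenters. Olli is an electrical engineer from Finland. In an article written in 2017, he described how he felt when he first discovered GAN: "In May 2017, I saw Ian Goodfellow's tutorial on YouTube, made my day! What I had written down was just a basic idea, and he has done a lot of work to make it produce good results. The talk answered the questions I had run into, and many more."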

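The author's personal site shows that Olli is a lively thinker who enjoys floating new ideas, a real "hidden gem." Since 2007 he has been jotting ideas down on his blog, from "singing bicycle brakes" to "a watch that is never late," and the list naturally includes this early form of GAN.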

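As Goodfellow puts it: "An idea is only really valuable if you think it can work and you also have the domain knowledge to recognize that it actually does. It took me only about an hour to come up with GAN, and two weeks to write the paper. This is definitely a '99% inspiration, 1% perspiration' story, but before that I had spent four years working on a PhD on related topics."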

To close, here is the short GAN idea itself, posted in 2010, several years before Goodfellow's.

A method for training artificial neural networks to generate missing data within a variable context. As the idea is hard to put in a single sentence, I will use an example:

An image may have missing pixels (let's say, under a smudge). How can one restore the missing pixels, knowing only the surrounding pixels? One approach would be a "generator" neural network that, given the surrounding pixels as input, generates the missing pixels.

But how to train such a network? One can't expect the network to exactly produce the missing pixels. Imagine, for example, that the missing data is a patch of grass. One could teach the network with a bunch of images of lawns, with portions removed. The teacher knows the data that is missing, and could score the network according to the root mean square difference (RMSD) between the generated patch of grass and the original data. The problem is that if the generator encounters an image that is not part of the training set, it would be impossible for the neural network to put all the leaves, especially in the middle of the patch, in exactly the right places. The lowest RMSD error would probably be achieved by the network filling the middle area of the patch with a solid color that is the average of the color of pixels in typical images of grass. If the network tried to generate grass that looks convincing to a human and as such fulfills its purpose, there would be an unfortunate penalty by the RMSD metric.
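The averaging effect described above can be checked with a small numerical sketch. Everything here is an assumption made for illustration, not something from the original post: "grass" is reduced to pixels that are randomly dark (0.2) or light (0.8), and the surrounding context is assumed to carry no information about which is which. Under those assumptions, a flat fill with the average value is off by 0.3 at every pixel, while a second, equally grass-like random patch is penalized more heavily even though it would look more convincing.

```python
import math
import random

def rmsd(a, b):
    """Root mean square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

random.seed(0)
n = 10_000

# Hypothetical "grass": each pixel is a dark (0.2) or light (0.8) blade at random.
truth = [random.choice((0.2, 0.8)) for _ in range(n)]

# Candidate 1: a solid fill with the average pixel value.
flat_fill = [0.5] * n

# Candidate 2: a patch with the same texture statistics as real grass,
# but with the blades in different (unpredictable) places.
plausible = [random.choice((0.2, 0.8)) for _ in range(n)]

rmsd_flat = rmsd(flat_fill, truth)
rmsd_plausible = rmsd(plausible, truth)

print(f"flat fill RMSD:       {rmsd_flat:.3f}")
print(f"plausible patch RMSD: {rmsd_plausible:.3f}")
```

The flat fill wins on RMSD, which is exactly the "unfortunate penalty" on convincing output that motivates replacing the fixed metric with a learned classifier.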

My idea is this (see figure below): Train simultaneously with the generator a classifier network that is given, in random or alternating sequence, generated and original data. The classifier then has to guess, in the context of the surrounding image context, whether the input is original (1) or generated (0). The generator network is simultaneously trying to get a high score (1) from the classifier. The outcome, hopefully, is that both networks start out really simple, and progress towards generating and recognizing more and more advanced features, approaching and possibly defeating human's ability to discern between the generated data and the original. If multiple training samples are considered for each score, then RMSD is the correct error metric to use, as this will encourage the classifier network to output probabilities.
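One round of the two-player scheme described above might be sketched as follows. This is a toy under stated assumptions rather than the post's actual setup: the data are single numbers instead of image patches, the surrounding context is dropped, the "generator" is a hypothetical one-parameter model `z + b`, and the classifier is a logistic unit with parameters `w` and `c`. All function names are made up for this sketch. The classifier is scored with the squared difference between its output and the label (1 = original, 0 = generated), following the post's suggestion of an RMSD-style metric.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def classify(x, w, c):
    """Classifier score: estimated probability that x is original (1), not generated (0)."""
    return sigmoid(w * x + c)

def classifier_loss(originals, generated, w, c):
    """Mean squared difference between scores and labels (1 = original, 0 = generated)."""
    errors = [(classify(x, w, c) - 1.0) ** 2 for x in originals]
    errors += [classify(x, w, c) ** 2 for x in generated]
    return sum(errors) / len(errors)

def generator_score(noise, b, w, c):
    """Average score the classifier gives to generated samples z + b."""
    return sum(classify(z + b, w, c) for z in noise) / len(noise)

def classifier_step(originals, generated, w, c, lr):
    """One gradient-descent step on the classifier's squared error."""
    grad_w = grad_c = 0.0
    labeled = [(x, 1.0) for x in originals] + [(x, 0.0) for x in generated]
    for x, label in labeled:
        s = classify(x, w, c)
        common = 2.0 * (s - label) * s * (1.0 - s)  # d(error) / d(w*x + c)
        grad_w += common * x
        grad_c += common
    n = len(labeled)
    return w - lr * grad_w / n, c - lr * grad_c / n

def generator_step(noise, b, w, c, lr):
    """One gradient-ascent step that raises the classifier's score of generated samples."""
    grad_b = sum(classify(z + b, w, c) * (1.0 - classify(z + b, w, c)) * w for z in noise)
    return b + lr * grad_b / len(noise)

random.seed(0)
originals = [random.gauss(4.0, 0.5) for _ in range(256)]  # toy "original" data
noise = [random.gauss(0.0, 0.5) for _ in range(256)]      # generator input
w, c, b, lr = 0.0, 0.0, 0.0, 0.1                          # both players start out simple

# Classifier's turn: learn to tell original from generated.
generated = [z + b for z in noise]
loss_before = classifier_loss(originals, generated, w, c)
w, c = classifier_step(originals, generated, w, c, lr)
loss_after = classifier_loss(originals, generated, w, c)

# Generator's turn: move toward whatever the classifier currently calls "original".
score_before = generator_score(noise, b, w, c)
b = generator_step(noise, b, w, c, lr)
score_after = generator_score(noise, b, w, c)

print(f"classifier loss: {loss_before:.4f} -> {loss_after:.4f}")
print(f"generator score: {score_before:.6f} -> {score_after:.6f}")
```

Repeating these two turns is the alternating training the post describes. The sketch only shows that each player improves its own score within a single round; it makes no claim about how the two settle over many rounds, which in practice depends heavily on the models and step sizes.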