Data Analysis Reveals the Financials Behind JD.com's IPO



Google · Machine Learning · Data Centers

Google is using machine learning and artificial intelligence to wring even more efficiency out of its mighty data centers.

In a presentation today at Data Centers Europe 2014, Google’s Joe Kava said the company has begun using a neural network to analyze the oceans of data it collects about its server farms and to recommend ways to improve them. Kava is the Internet giant’s vice president of data centers.

In effect, Google has built a computer that knows more about its data centers than even the company’s engineers. The humans remain in charge, but Kava said the use of neural networks will allow Google to reach new frontiers in efficiency in its server farms, moving beyond what its engineers can see and analyze.

Google already operates some of the most efficient data centers on earth. Using artificial intelligence will allow Google to peer into the future and model how its data centers will perform in thousands of scenarios.

In early usage, the neural network has been able to predict Google’s Power Usage Effectiveness with 99.6 percent accuracy. Its recommendations have led to efficiency gains that appear small, but can lead to major cost savings when applied across a data center housing tens of thousands of servers.

Why turn to machine learning and neural networks? The primary reason is the growing complexity of data centers, a challenge for Google, which uses sensors to collect hundreds of millions of data points about its infrastructure and its energy use.

“In a dynamic environment like a data center, it can be difficult for humans to see how all of the variables interact with each other,” said Kava. “We’ve been at this (data center optimization) for a long time. All of the obvious best practices have already been implemented, and you really have to look beyond that.”

Enter Google’s ‘Boy Genius’

Google’s neural network was created by Jim Gao, an engineer whose colleagues have given him the nickname “Boy Genius” for his prowess analyzing large datasets. Gao had been doing cooling analysis using computational fluid dynamics, which uses monitoring data to create a 3D model of airflow within a server room.

Gao thought it was possible to create a model that tracks a broader set of variables, including IT load, weather conditions, and the operations of the cooling towers, water pumps and heat exchangers that keep Google’s servers cool.

“One thing computers are good at is seeing the underlying story in the data, so Jim took the information we gather in the course of our daily operations and ran it through a model to help make sense of complex interactions that his team – being mere mortals – may not otherwise have noticed,” Kava said in a blog post. “After some trial and error, Jim’s models are now 99.6 percent accurate in predicting PUE. This means he can use the models to come up with new ways to squeeze more efficiency out of our operations. ”

[Figure: A graph showing how the projections by Google's neural network tool aligned with actual PUE readings.]

How it Works

Gao began working on the machine learning initiative as a “20 percent project,” a Google tradition of allowing employees to spend a chunk of their work time exploring innovations beyond their specific work duties. Gao wasn’t yet an expert in artificial intelligence. To learn the fine points of machine learning, he took a course from Stanford University Professor Andrew Ng.

Neural networks mimic how the human brain works, allowing computers to adapt and “learn” tasks without being explicitly programmed for them. Google’s search engine is often cited as an example of this type of machine learning, which is also a key research focus at the company.

“The model is nothing more than a series of differential calculus equations,” Kava explained. “But you need to understand the math. The model begins to learn about the interactions between these variables.”

Gao’s first task was crunching the numbers to identify the factors that had the largest impact on energy efficiency of Google’s data centers, as measured by PUE. He narrowed the list down to 19 variables and then designed the neural network, a machine learning system that can analyze large datasets to recognize patterns.

“The sheer number of possible equipment combinations and their setpoint values makes it difficult to determine where the optimal efficiency lies,” Gao writes in the white paper on his initiative. “In a live DC, it is possible to meet the target setpoints through many possible combinations of hardware (mechanical and electrical equipment) and software (control strategies and setpoints). Testing each and every feature combination to maximize efficiency would be unfeasible given time constraints, frequent fluctuations in the IT load and weather conditions, as well as the need to maintain a stable DC environment.”
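As a rough illustration of the kind of model being described, the sketch below fits a small feed-forward neural network that regresses PUE on 19 operational variables. It is a minimal sketch assuming scikit-learn and synthetic stand-in data; the feature values, network size, and target function are illustrative assumptions, not Google's actual configuration.

```python
# Minimal sketch: regressing PUE on operational variables with a small
# feed-forward neural network. The synthetic data, network shape, and target
# function below are illustrative assumptions, not Google's setup.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples, n_features = 5000, 19                 # 19 variables, as in the article
X = rng.normal(size=(n_samples, n_features))     # stand-in for sensor readings
# Pretend PUE is a noisy nonlinear function of a few of the sensors.
pue = (1.1 + 0.05 * np.tanh(X[:, 0] * X[:, 1]) + 0.02 * X[:, 2] ** 2
       + rng.normal(scale=0.005, size=n_samples))

X_train, X_test, y_train, y_test = train_test_split(X, pue, random_state=0)
scaler = StandardScaler().fit(X_train)

model = MLPRegressor(hidden_layer_sizes=(50, 50), max_iter=2000, random_state=0)
model.fit(scaler.transform(X_train), y_train)

pred = model.predict(scaler.transform(X_test))
print("mean absolute PUE error:", np.abs(pred - y_test).mean())
```

In practice the inputs would be the monitored quantities Gao describes (IT load, weather, cooling tower, pump, and heat exchanger operation), and the trained model could then be queried with candidate setpoint combinations to estimate their effect on PUE before trying them on live equipment.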

[Figure: A diagram illustrating the complexity of analyzing the many variables that factor into a data center PUE calculation, which can be analyzed more closely using artificial intelligence.]

Runs On a Single Server

As for hardware, the machine learning doesn’t require unusual computing horsepower, according to Kava, who says it runs on a single server and could even work on a high-end desktop.

The system was put to work inside several Google data centers. The machine learning tool was able to suggest several changes that yield incremental improvements in PUE, including refinements in data center load migrations during upgrades of power infrastructure, and small changes in the water temperature across several components of the chiller system.

“Actual testing on Google (data centers) indicates that machine learning is an effective method of using existing sensor data to model DC energy efficiency and can yield significant cost savings,” Gao writes.

The Machines Aren’t Taking Over

Kava said that the tool may help Google run simulations and refine future designs. But not to worry — Google’s data centers won’t become self-aware anytime soon. While the company is keen on automation, and has recently been acquiring robotics firms, the new machine learning tools won’t be taking over the management of any of its data centers.

“You still need humans to make good judgments about these things,” said Kava. “I still want our engineers to review the recommendations.”

The neural networks’ biggest benefits may be seen in the way Google builds its server farms in years to come. “I can envision using this during the data center design cycle,” said Kava. “You can use it as a forward-looking tool to test design changes and innovations. I know that we’re going to find more use cases.”

Google is sharing its approach to machine learning in Gao's white paper, in the hopes that other hyperscale data center operators may be able to develop similar tools.

“This isn’t something that only Google or only Jim Gao can do,” said Kava. “I would love to see this type of analysis tool used more widely. I think the industry can benefit from it. It’s a great tool for being as efficient as possible.”


NTT Uses NFV to Power Services-On-Demand


We now have a better idea of why NTT Communications acquired Virtela Technology Services. It wasn’t just for scale; it was for the ability to offer the on-demand services that software-defined networking (SDN) has been promising.

In a press conference webcast from Japan at 1:00 a.m. (matching more reasonable times-of-day for North America and Europe), NTT claimed it’s the first service provider to offer cloud-based services that users can activate almost instantly.

Vendors and carriers alike have talked dreamily about on-demand services as a byproduct of SDN and network functions virtualization (NFV). Even if NTT isn't the very first to offer on-demand services, it's almost certainly the largest, as the services will be available in all 190 countries that NTT's network reaches.

While NTT was quick to jump on OpenFlow and has worked closely with NEC on SDN technology, the services announced today are being powered by Virtela.

NTT/Virtela is “using an in-house NFV and SDN platform that includes the [in-house] controller” to provide these services, said Vab Goel, Virtela’s founder, during the press conference. He didn’t specify whether OpenFlow is involved.

SDN got some mention during the press conference, but NFV seems to have more of a star role in delivering these services. (Executives gave little detail as to how it’s all done.) The new services are made possible by the infusion of NFV in 50 cloud networking centers worldwide, said Takashi Ooi, NTT’s vice president of enterprise network services.

The on-demand services, being offered under NTT's Arcstar Universal One brand name, include things like firewalls, WAN acceleration, and SSL and IPsec VPNs. Ooi seemed to indicate the services are available now, although the press release mentions July availability.

The services will be billed on a per-use basis, meaning no long-term contracts are required. Ooi and Goel played up this aspect, because it’s a different way of doing business (and one that enterprises have craved for some time, by the sound of it).

NTT announced the $525 million acquisition of Virtela, a provider of cloud-based managed services, in October and closed the deal in January. It’s one of a couple of recent deals that expand NTT’s presence in North America; the company also paid $350 million in February to take an 80 percent interest in Sacramento-based RagingWire Data Centers.


25 Cloud Computing, Security, and Mobile Startups Worth Watching

Network World — There’s no scientific formula behind this list: It’s just a bunch of new-ish, mainly enterprise-focused computing and networking companies that have launched, received fresh funding of late or otherwise popped onto my radar screen.

I’ll give you a brief description of the companies, then links to their respective websites or to stories about them so that you can explore further. Don’t read anything into the order in which these companies are presented either: It’s basically the order in which I came across them or their latest news. (IDG News Service and Network World staff reporting is included in this article.)

*Tactus: Appearing and disappearing tactile buttons for your otherwise flatscreen tablet or smartphone. Tactus, which got $6 million in Series A funding in December of 2011, announced a mysterious Series B round in January 2014: Weirdly, no financial details disclosed. One founder led JDSU’s Optical Communications Division, and the other led development efforts for microfluidic-based programmable transdermal drug delivery systems and advanced optical sensors at Los Gatos Research.

*Confide: Quickly became known as a Snapchat for professionals (not that Facebook has offered $3 billion for it) because, like the popular photo-sharing app, Confide is designed for highly secure messaging. Confide messages, on sensitive topics like job inquiries or work conspiracies, disappear upon being read.

*ClearSky Data: This Boston-based startup is enjoying calling itself a stealthy venture on its Twitter account. The Boston Globe has reported that ClearSky’s founders previously helped get tech startups CloudSwitch and EqualLogic off the ground. Highland Capital, which along with General Catalyst invested a combined $12 million in ClearSky, describes the newcomer as “working on solving an enterprise infrastructure problem for medium and large enterprises.”

*Aorato: This Israeli company, which has received $10 million in funding from Accel Partners and others, calls its offering a firewall designed to protect Microsoft Active Directory shops. A pair of brothers are among the company’s founders.

*Nextbit: Rock Star alert! Oh, technical rock stars, from Google, Amazon, Dropbox and Apple, according to the company's skimpy website. The San Francisco company is focused on mobile something-or-other, possibly some sort of mobile OS reinvention, and has received $18 million in funding from Accel and Google.

*Confer: This Waltham, Mass., startup with perhaps the most boring name on this list hopes to make a name for itself nonetheless with software and services aimed at sniffing out malware and attackers targeting enterprise servers, laptops and mobile devices through its application behavior-analysis approach and its cloud-managed threat-intelligence platform.

 

*Bluebox: The latest in a long line of “Blue” IT companies (Blue Coat, BlueCat, Blue Jeans, Blue Prism, etc.), this startup has received more than $27 million in venture funding to back its mobile security technology. Its “data-wrapping” technology for Apple iOS and Android is designed to give IT control over enterprise apps but leave personal apps…personal.

*Viddme: A no-sign-up video posting and sharing app that works on mobile devices and the desktop. Simpler than YouTube. The development team is out of Los Angeles.

*MemSQL: This Big Data startup formed by a couple of ex-Facebook engineers raised $45 million through January. This database company’s technology is designed to run on commodity hardware but handle high-volume apps.

Read the full article »


APT1: Exposing One of China's Cyber Espionage Units (PDF)


A Background Analysis of Computer Science Professors at Top US Universities

[Original article: http://jeffhuang.com/computer_science_professors.html]

Analysis of Over 2,000 Computer Science Professors at Top Universities

The Educational Backgrounds of Professors and the Composition and Hiring Trends of US Universities

As part of a class assignment in my human-computer interaction seminar, students used crowdsourcing to collect information about the computer science professors from 5 universities each. This information comprised the names, institution, degrees obtained, and when they joined the university, for professors in the traditional role that involves both research and teaching. The data excludes lecturers, professors of practice, clinical, adjunct, affiliate, or research professors; only because we were constrained by time and resources. My Ph.D. student Alexandra Papoutsaki worked with a handful of students in the course to correct, normalize, and merge the data, and has posted it along with some descriptive information.

The posted data includes 51 top universities in the United States and is already useful for students planning to apply to graduate schools, but there are some interesting insights that we can still draw from a more aggregate analysis. This analysis is meant to supplement the data and Alexandra’s report, and looks more at:

  1. Composition of Computer Science Departments
  2. Hiring Trends of Universities
  3. Educational Background of Professors

Composition of Computer Science Departments

One way of looking at department composition is the ranks of professors currently in the department. Having more assistant or associate professors indicates recent hiring, while having nearly all full professors usually means the department has kept the same professors for a while, or makes mostly senior hires. Here are the numbers for the universities in our data, sorted in descending order by percentage of full professors.

The data includes the primary subfield a professor does research in, and it might be interesting to see how the research subfields are represented in each department. The subfields are chosen based on a taxonomy from Microsoft Academic Search, but while we do have the numbers for each subfield, they are too small to show trends so I categorized them into broader research areas:

  • Theory: algorithms, theory, machine learning
  • Systems: architecture, networking, systems, databases, programming languages
  • Informatics: artificial intelligence, data mining, human-computer interaction, computer education, information retrieval, vision, graphics, security/privacy, multimedia
  • Scientific: computational biology or scientific computing

Informatics is a bit of a catch-all, but these are subfields that are somewhat more applied toward directly solving everyday human problems, whereas Systems is about improving computing itself, and Scientific is aimed at scientific progress. Obviously you may disagree with the categorization (for example, some "Security & Privacy" folks lean more towards Theory), but I used my best judgment, and you are welcome to perform your own analysis if you disagree with mine.
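As a concrete sketch of this bucketing, the snippet below maps subfields to the four broader areas and computes each university's research-area mix. The column names ("university", "subfield") and the local CSV filename are assumptions about the published spreadsheet, not its exact schema; anything not listed in the mapping falls into Informatics.

```python
# Sketch: collapsing subfields into four broader areas and computing each
# university's research-area mix. Column names and filename are assumptions.
import pandas as pd

area_of = {
    "algorithms": "Theory", "theory": "Theory", "machine learning": "Theory",
    "architecture": "Systems", "networking": "Systems", "systems": "Systems",
    "databases": "Systems", "programming languages": "Systems",
    "computational biology": "Scientific", "scientific computing": "Scientific",
}
# Everything else (AI, HCI, vision, graphics, security/privacy, ...) -> Informatics.

profs = pd.read_csv("professors.csv")                       # hypothetical local export
profs["area"] = profs["subfield"].str.lower().map(area_of).fillna("Informatics")

mix = (profs.groupby("university")["area"]
            .value_counts(normalize=True)
            .unstack(fill_value=0)
            .sort_values("Theory", ascending=False))        # as in the table below
print(mix.head(10))
```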

The research areas are illustrated in the following table, sorted by proportion of professors in Theory:

It’s clear that there are some universities with more emphasis on theory, and others that lean more towards informatics or systems. But one surprise is that 9 of the 10 most theory-heavy universities are private universities, whereas 9 of the 10 least theory-heavy universities are public. That is quite a contrast, and I don’t have an explanation for this. Update: Marc Snir from the University of Illinois sent an email to explain, “Public universities, especially land-grant universities, are more engineering-oriented. They were founded to promote the practical arts of agriculture and machinery. Private universities, especially the ivy leagues, are more science oriented.”

Hiring Trends of Universities

We can also see when professors in each research area were hired by combining that field with the "join year" plotted on the x-axis. Here are the trends from 1990 to 2014 (2014 hiring is grossly underreported since it is still ongoing).

One thing that is clear is that there is growth in computer science, mainly in Systems and Informatics in the past couple of years. It looks like these 51 universities have been hiring about 100 professors per year (although this includes transfers). I would love to see the trends more clearly but we are missing join years for many professors; this information is sometimes quite difficult to find.
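A hedged sketch of that count, assuming the same hypothetical CSV export and a numeric "join_year" column (rows with missing years are simply dropped):

```python
# Sketch: hires per join year for 1990-2014; column names are assumptions.
import pandas as pd

profs = pd.read_csv("professors.csv")                  # hypothetical local export
recent = profs[profs["join_year"].between(1990, 2014)]
hires_per_year = recent.groupby("join_year").size()
print(hires_per_year.tail())                           # the text estimates ~100/year
```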

Now, which universities have been doing the hiring? Here is the number of computer science professors each university has hired in the last 7 years, and the percentage of its professors who are recent hires.

It should be expected that larger computer science programs top the list; between 20% and 42% of their professors are recent hires.

Do universities hire their own Ph.D. students? Some computer science departments engage in the practice of "self-hires" frequently; in fact, 39% of the professors at MIT received their degree there. Of course, the data here doesn't differentiate between those who became professors right after receiving their Ph.D. and those who left for other institutions and returned later.
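A quick way to estimate the self-hire rate per department, again assuming the hypothetical "university" and "doctorate" columns:

```python
# Sketch: share of each department's professors whose doctorate came from the
# same university ("self-hires"); column names are assumptions.
import pandas as pd

profs = pd.read_csv("professors.csv")                  # hypothetical local export
self_hire_rate = ((profs["doctorate"] == profs["university"])
                  .groupby(profs["university"]).mean()
                  .sort_values(ascending=False))
print(self_hire_rate.head())                           # MIT is reported at about 39%
```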

Educational Background of Professors

It is possible to become a computer science professor without a doctorate, but 99.72% of professors in the data have a Ph.D., so you may be at a slight disadvantage if your highest degree is a Masters. The data gives us a look into which universities professors received their education from. Where did they get their bachelor's and doctorate degrees? If a university's graduate is listed in the spreadsheet, I call it a "placement".

The table below shows universities whose graduates make up at least 1% of professors on the list. Over 20% of professors received their Ph.D. from MIT or Berkeley, while more than half of professors received their Ph.D. from just 10 universities. Note how closely the ranking of total placements by Ph.D.-granting institution matches the US News Computer Science Rankings.

However, some departments are obviously smaller and thus tend to graduate fewer doctoral students. The fourth column (placement ratio) shows the number of Ph.D. students who go on to become professors, normalized by the size of the department; in other words, the average number of each professor's Ph.D. graduates who themselves become professors.
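The placement ratio can be sketched as two value counts divided against each other, with the same caveat that the column names and filename are assumed:

```python
# Sketch: placements per Ph.D.-granting institution, normalized by that
# institution's current faculty size (the "placement ratio" described above).
import pandas as pd

profs = pd.read_csv("professors.csv")                  # hypothetical local export
placements = profs["doctorate"].value_counts()         # professors produced per school
faculty = profs["university"].value_counts()           # current department sizes
placement_ratio = (placements / faculty).dropna().sort_values(ascending=False)
print(placement_ratio.head(10))
```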

For bachelor's degrees our data is less complete, but for the professors for whom we have data, here is where they completed their undergraduate education:

It's no surprise that there is more diversity in undergraduate institutions than in Ph.D. programs. In fact, there are several institutions in India (the IITs) and one in China (Tsinghua University) whose graduates have successfully become professors in the United States.

An aside: should you stay or should you go? Students are sometimes discouraged from obtaining their bachelor's and doctorate degrees from the same institution because it may limit a student's diversity of experiences, but the data shows that many still end up as professors: 230 (13.6%) stayed at the same university for both degrees, while 1,456 (86.4%) went elsewhere for their Ph.D. The practice of staying occurs most commonly at MIT.

Final Note

Please understand that much of the data was generated by anonymous crowd workers and is inaccurate and incomplete. Only the columns "name", "university", "subfield", "rank", and "doctorate" are complete for all professors. We are aware of these problems, and you can help us improve the data by tagging comments on the spreadsheet cells (we have addressed hundreds of errors already).

We would love to add more universities, add more columns to the data, and include other faculty such as research professors, but we simply do not have the time and resources. So if you would like to see those in the data and have spare time, we would be thrilled to receive your contribution.

These figures were generated from the spreadsheet data that was current as of May 24, 2014.

 


China Internet Research Center · "The United States' Global Surveillance Record" (Full Text, PDF)


GFT, You're This Awesome, and Your Fake Fans Don't Even Know It (Part 4)

The Background and Application Foundations of GFT

Subtitle: May technological progress benefit everyone

 

This is a series devoted to fawning over GFT. The previous three articles dissected GFT from a methodological angle. As the conclusion of the series, and to achieve the flashy effect of "ending on a perfect note," this article surveys GFT from a (pseudo) "metaphysical" height and savors the humanistic spirit of a high-tech product.

 

As a liberal, the author endorses "individual autonomy" and believes that an ideal social organization should ensure that each individual's actions are unconstrained: the proper way for the "soul" to operate is to feel, from the inside out, "what do I want to do," rather than to have "what I am required to do" imposed on the individual from the outside in.

Guided by this thinking, this article first states in plain language what the GFT team wants to do, then argues that such a thing is "needed" (elevating it to grand principles [profuse sweating]; the author has no choice), then turns to its "feasibility"; the implementation of GFT itself was already covered in the previous three articles. This logical ordering is, of course, the author's subjective conjecture; the author is merely connecting isolated pieces according to his own inclinations. Still, as an idealist, the author is willing to work toward building such a beautiful future.

Now, back to the main topic.

 

1. What is it?

The well-known "five stages of grief" in psychology contains an interesting phenomenon. The five stages are [2]:

1) "Denial": "No way, this can't be happening. I feel fine."

2) "Anger": "Why me? It's not fair! How can anyone accept this?"

3) "Bargaining": "Just let me live to see my son graduate. Please, give me a few more years. I'll do anything."

4) "Depression": "Why bother with any of this? I'm going to die anyway. Nothing matters anymore."

5) "Acceptance": "I'm going to be okay. Since I can't change it, I may as well prepare for it."

Psychologists use these five stages to describe how people work through grief and loss. The "interesting" part is that "acceptance" comes last. In the first four stages, people refuse to see reality as it is and hold distorted attitudes toward the "objective." The lesson is that before arriving at what is "correct," we will inevitably be tempted by "distractors." To avoid detours and ungrounded thinking, a "clear definition" is the first step, and actually writing ambiguous ideas down is what helps most in clarifying one's thinking.

Put bluntly, GFT analyzes large numbers of Google search queries to track influenza-like illness in a population [1]. If GFT were a dairy cow, the grass it eats would be the search terms users submit, as recorded in Google's server logs, and the milk it produces would be the level of influenza illness in the population.

Looking purely at its utility (in the ideal case), it can tell the world what proportion of the population in each region is ill with influenza (dengue fever was added later). The prerequisite for higher accuracy is that the (regional) population contain a sufficient number of Google users (which in turn implies internet infrastructure, widespread education, the realization of basic rights, and so on, but that is a topic for later).

 

2. Why is it?

Big data techniques (aside: the term is hopelessly vague) can be used for many things that change the world and benefit humanity, for example combining Bayesian probabilistic models with random walk theory to predict stock market movements (with fairly high accuracy; I vaguely recall above 80%, though what that probability means in terms of actual losses in practice deserves a concrete analysis), where handsome profits are there for the taking. So why choose to predict influenza? The GFT team saw it this way:

(1) Influenza is suffering borne by individual human beings, and it erodes society's accumulated wealth. Seasonal influenza epidemics are a major public health concern, causing tens of millions of respiratory illnesses and 250,000 to 500,000 deaths worldwide each year. In addition to seasonal influenza, a new strain of influenza virus against which no previous immunity exists and that demonstrates human-to-human transmission could result in a pandemic with millions of fatalities.

(2) Detecting an epidemic early helps reduce the damage. Early detection of disease activity, when followed by a rapid response, can reduce the impact of both seasonal and pandemic influenza.

As mentioned in the earlier articles, between the lines of these manifesto-like foreign sentences the author clearly sees the Three Represents' "representing the fundamental interests of the overwhelming majority of the people" (the author's political consciousness is not high; if this reading is too far off, please bear with me), as well as a sense of nobility on par with American heroes "saving humanity."

The whole line of thinking is both down-to-earth ("everyone is equal before the flu") and brimming with love for humankind. GFT reports the severity of the epidemic in each region to the world, and decision makers at every level can then turn to the question of allocating resources.

 

The point of this section is to convey that GFT is not a group of people with nothing better to do whose ultimate and only aim is raking in money. Its humanistic spirit (its sense of responsibility toward the world) comes up again below.

 

3. How it’s done?

There are many ways to improve public health. GFT has an enormous store of search queries sitting in server logs (some related to flu, some not). Is it possible to get from Google search queries to CDC ILI data? The GFT team may have made up its mind along these lines:

(1) A 2006 Pew Research Center survey on internet use found that 90 million American adults search online for information about diseases or medications every year.

(2) In the supplementary material of [1], Google gives an example on a different topic to show that changes in the volume of particular search queries are linked to the occurrence of certain real-world "events." In the figure referenced there, a curve plots the weekly volume of eclipse-related queries over time, and black dots mark when solar eclipses occurred. Around the date of each eclipse, submissions of the query "solar eclipse" rise well above their everyday level. In other words, if we had no way of knowing when an eclipse would occur, monitoring how often Google users search for "solar eclipse" would give us a very accurate prediction.

Source: the supplementary material of reference [1]

The facts above not only show that changes in the volume of particular queries are "correlated" with certain events in the objective world (exactly the kind of correlation big data emphasizes); they also hint at the possibility that changes in the volume of characteristic queries can reflect the number of influenza-related doctor visits (as a share of the total population). With the preparation done, all that remains is to heed Michael Jordan's teaching: JUST DO IT.
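For concreteness, the model at the heart of [1] is essentially a linear fit on the log-odds scale: the logit of the ILI visit percentage is regressed on the logit of the flu-related query fraction. The sketch below uses synthetic data in place of real query logs and CDC numbers, so the coefficients and the query fractions are illustrative assumptions.

```python
# Minimal sketch of a GFT-style fit: logit-linear regression of the weekly
# ILI visit fraction on the flu-related query fraction, as described in [1].
# The data below is synthetic; real inputs would be weekly query-log counts
# and CDC ILI surveillance numbers.
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

rng = np.random.default_rng(1)
weeks = 150
query_frac = np.clip(rng.beta(2, 200, size=weeks), 1e-4, 0.5)   # share of flu queries
true_beta0, true_beta1 = -0.7, 0.9
ili_frac = 1 / (1 + np.exp(-(true_beta0 + true_beta1 * logit(query_frac)
                             + rng.normal(scale=0.1, size=weeks))))

# Ordinary least squares on the logit scale.
X = np.column_stack([np.ones(weeks), logit(query_frac)])
beta, *_ = np.linalg.lstsq(X, logit(ili_frac), rcond=None)
print("fitted intercept/slope:", beta)        # should be close to (-0.7, 0.9)

# Prediction for a new week's query fraction:
new_q = 0.012
pred_ili = 1 / (1 + np.exp(-(beta[0] + beta[1] * logit(new_q))))
print("predicted ILI visit fraction:", pred_ili)
```

Applying the fitted line to the current week's query fraction is what makes the estimate available well before the corresponding CDC report, which is the speed advantage discussed in the summary below.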

 

Summary:

Great work of this kind has never lacked for people doing it, and attention to baseline public health did not start with GFT. Consider the largest charitable foundation on the planet, the Bill & Melinda Gates Foundation, whose dedicated Global Health division supports scientific and technical programs on infectious disease, vaccines, and more. Compared with these "traditional" efforts, GFT's advantage is speed. From reference [1] we know that GFT runs two weeks ahead of the CDC's statistics, which is to say essentially in real time; and if it were combined with time-series techniques, genuine advance warning would not be surprising. Its value in guiding decisions such as preparing and allocating resources or evacuating (quarantining) early is self-evident. Small wonder that many public health officials are GFT users. (Reference [3]: In addition to the general public, an important target audience for GFT has been public health officials, who can benefit from reliable daily estimates and often make far-reaching decisions based on predicted flu incidence, such as how to stock and distribute vaccine, and the content of public health messaging.)

God is a bitch: either limited in power or ill-intentioned, content to watch human society remain riddled with holes. GFT's inherent imperfections, briefly:

1. GFT's entire technical apparatus is big data, yet the target of its predictions is a traditional statistic. When a prediction is "off," is it because GFT's output disagrees with "reality," or because the CDC numbers never represented "objective reality" in the first place? (For the obsessive and the perfectionist, this is maddening.)

2. The previous article brazenly claimed that GFT's core tension is the unbounded need to keep optimizing the statistical learning process. True enough, but there is more. Seen through the lens of "connections," GFT does not exist in isolation, and its problems are not singular. As Google spreads, and indeed as human society advances (more internet infrastructure, more widespread basic education, basic guarantees of basic rights, and so on), GFT will become better and better, and perhaps by then GFT will no longer be needed.

 

References:

[1] Ginsberg J, Mohebbi M H, Patel R S, et al. Detecting influenza epidemics using search engine query data[J]. Nature, 2009, 457(7232): 1012-1014.

[2] http://zh.wikipedia.org/wiki/%E5%BA%93%E4%BC%AF%E5%8B%92-%E7%BD%97%E4%B8%9D%E6%A8%A1%E5%9E%8B

[3] Copeland P, et al. Google Disease Trends: an Update. International Society for Neglected Tropical Diseases. 2013. available at: http://patrickcopeland.org/papers/isntd.pdf


Declassified: The United States' Global Surveillance Record (Full Text Download)

The United States' Global Surveillance Record

Internet News Research Center

May 26, 2014

Contents

Introduction

I. The United States conducts extensive covert surveillance worldwide

II. The United States treats China as a primary target of covert surveillance

III. The United States stops at nothing in its covert surveillance

IV. The United States' global surveillance has drawn widespread criticism

Introduction

In June 2013, media in Britain, the United States, and Hong Kong, China, drawing on documents provided by former NSA employee Edward Snowden, reported on the NSA's secret program code-named "PRISM," and the details were shocking. After several months of verification, the relevant Chinese authorities found the reported theft of secrets directed at China to be essentially accurate.

As a superpower, the United States has exploited its political, economic, military, and technological hegemony to eavesdrop without scruple on other countries, its allies included. In substance this conduct long ago exceeded the needs of "counterterrorism" and exposes an ugly willingness to abandon all morality in pursuit of self-interest. Such behavior flagrantly violates international law, gravely infringes human rights, and endangers global cyber security; it deserves the joint resistance and condemnation of the entire world.

The United States' acts of covert surveillance against the world and against China include:

- Collecting nearly 5 billion mobile phone records around the globe every day.

- Spying on the mobile phone of Germany's sitting Chancellor, Angela Merkel, for more than a decade.

- Secretly breaking into the main communication links between Yahoo and Google data centers in various countries, stealing data on hundreds of millions of users.

- Monitoring mobile phone apps for years and scraping personal data from them.

- Mounting large-scale cyber attacks against China, with Chinese leaders and Huawei listed as targets.

America's surveillance reaches the Chinese government and its leaders, Chinese companies, research institutions, ordinary internet users, and vast numbers of mobile phone subscribers. China is committed to the path of peaceful development, and there is no justification whatsoever for it to be a target of covert surveillance conducted under the banner of "counterterrorism."

The United States must explain its surveillance activities, must stop this grave violation of human rights, and must stop creating tension and hostility in global cyberspace.

I. The United States conducts extensive covert surveillance worldwide

1. The United States spies on world leaders

At the end of 2013, Britain's Guardian reported that as many as 35 national leaders, including UN Secretary-General Ban Ki-moon, German Chancellor Angela Merkel, and Brazilian President Dilma Rousseff, appeared on the NSA's surveillance list.

On March 29 of this year, Germany's Der Spiegel, citing documents provided by Snowden, revealed that in 2009 the NSA monitored 122 foreign leaders and maintained a database dedicated to information on foreign leaders, including 300 reports on Chancellor Merkel alone. The list begins at "A" and is ordered by the first letter of each leader's name; the first entry is Abdullah Badawi, then prime minister of Malaysia, and Merkel ranks ninth in the "A" section. The last of the 122 names is Yulia Tymoshenko, then prime minister of Ukraine.

Germany's Spiegel Online reported that the United Nations headquarters and the EU's mission to the UN both fell within the NSA's surveillance, which covered political, economic, and commercial matters.

In the summer of 2012, the NSA broke into the internal video teleconferencing system at UN headquarters and cracked its encryption. The leaked secret documents note that "the data traffic gives us internal video teleconferences of the United Nations."

The New York Times reported that in May 2010, when the UN Security Council was weighing sanctions against Iran over its nuclear program, the voting intentions of several member states remained unclear. Susan Rice, then the US ambassador to the UN, asked the NSA for assistance "so that she could develop a strategy." The NSA quickly drafted the legal paperwork needed to monitor the diplomats of four of those member states.

According to documents leaked by Snowden, the embassies and missions the NSA has penetrated include those of Brazil, Bulgaria, Colombia, the European Union, France, Georgia, Greece, India, Italy, Japan, Mexico, Slovenia, South Africa, South Korea, Venezuela, and Vietnam.

Beyond UN headquarters, the United States has obtained information on the IT infrastructure and servers of the European Union and the International Atomic Energy Agency. Even after the EU's UN mission moved into new offices, the United States continued its eavesdropping.

A document Snowden provided to The Guardian shows that a US intelligence outpost in North Yorkshire, England, intercepted satellite calls between then-Russian President Dmitry Medvedev and Moscow during the 2009 G20 summit. The interception took place just hours after Medvedev's talks with US President Barack Obama, in which the two had just agreed to build mutual trust.

A classified document dated June 2012 shows that the NSA secretly intercepted the emails of Enrique Peña Nieto, at the time a Mexican presidential candidate, including the names of some of the cabinet members he planned to nominate. In stealing the communications of Brazilian President Dilma Rousseff, the NSA used a special computer program capable of intercepting emails and online chats.

At the 2007 UN Climate Change Conference, Australia's defence intelligence agency worked with the NSA to conduct large-scale surveillance of Indonesia.

When the G20 summit convened in Toronto in June 2010, the NSA ran a six-day espionage operation, with the US embassy in Canada serving as its security command post.

The leaked documents also show that Japan, together with Brazil and Iraq, was listed as a priority surveillance country in the area of "economic stability and influence." Priority targets in the area of "emerging strategic science and technology" included Russia, India, Germany, France, South Korea, Israel, Singapore, Sweden, and Japan; the "foreign policy" area covered 17 countries, including China, Germany, France, Russia, Iran, North Korea, and Japan, as well as the United Nations.

The New York Times concluded that the NSA conducts its routine surveillance "without distinguishing friend from foe" in order to secure "diplomatic advantage over allies such as France and Germany" and "economic advantage over Japan and Brazil."

2. The United States monitors ordinary people worldwide

America's dedicated internet surveillance programs are vast and can track nearly everything a targeted internet user does online. The Guardian revealed that US intelligence personnel monitor internet activity through a program called "XKeyscore," which runs on 500 servers deployed around the world. The paper described it as the NSA's "most sweeping" surveillance program, saying analysts "can monitor nearly all of a targeted user's internet activity."

Documents leaked by Snowden show that by tapping into global mobile networks, the NSA collects nearly 5 billion mobile phone location records worldwide every day and aggregates them into an enormous database. The NSA also harvests text messages globally on a massive scale, collecting roughly 2 billion per day.

Some US media observed that it is hardly news for US intelligence agencies to wiretap suspects' phones to gather intelligence, but collecting such a staggering volume of information abroad is hard to fathom.

According to The Washington Post, the NSA secretly broke into the main communication links between Yahoo and Google data centers around the world, stealing hundreds of millions of user records and retaining large amounts of data. By analyzing the data, the NSA can determine who sent and received a given communication, and when and where the two parties communicated.

Brazil's Fantastico reported that the NSA used man-in-the-middle (MITM) attacks, presenting forged security certificates to impersonate legitimate websites, bypass browser protections, and intercept user data. Using this technique, the NSA successfully obtained user data while posing as Google's website.

The Guardian revealed that the NSA shares raw surveillance data with Israel, data that may include the emails and other information of American citizens, even though President Obama had insisted that surveillance would not target US citizens.

On December 31, 2013, Der Spiegel reported that the NSA had illegally obtained data from SEA-ME-WE-4, the largest submarine communications cable system between Europe and Asia, acquiring a large volume of sensitive material, and planned to continue tapping other undersea cables.

France's Le Monde reported that between December 10, 2012, and January 8, 2013, the NSA intercepted 70.3 million phone conversations of French citizens.

Inside the NSA, the Apple and Android mobile operating systems are described as "gold mines of data." US and British intelligence agencies have cooperated on monitoring mobile apps since 2007, and the NSA at one point raised its budget for this work from $204 million to $767 million.

According to The Guardian and The New York Times, the NSA has for years scraped personal data from mobile apps, including users' GPS-based location data, ethnicity, age, and other personal details. The apps involved include the mobile game Angry Birds, Google Maps, and the mobile clients of Facebook, Twitter, and the photo-sharing service Flickr.

Since at least 2008, the NSA has implanted specialized software on nearly 100,000 computers worldwide so that it can monitor or attack the target machines at any time; even when a computer is not connected to the internet, the NSA can still break in via radio waves.

Since 2010, the NSA has used the material it collects to analyze some American citizens' "social connections, identifying their associates, their locations at certain times, their traveling companions, and other personal information."

All of the NSA's surveillance is conducted in the dark, and the government's decision to relax the limits on surveillance was likewise made in secret, without review by the intelligence court or public debate. A 2006 US Department of Justice memorandum shows the department had warned against the abuse of intelligence surveillance.

Through a program called "co-traveler" analytics, the NSA starts from the records it has collected on "targets," uses the known targets' movements to uncover their unknown social connections, and within an hour can distill from a sea of information a complete picture of when and where a target has been. Anyone who has been in contact with a target can become a new NSA target.

US officials argue that this mass surveillance is lawful and does not aim at people inside the United States, but in fact those swept up include many Americans traveling abroad. US media reported that in 2010 and 2011 the NSA ran a pilot project on the bulk collection of mobile phone location data inside the United States.

In April 2013, the UN Human Rights Council's Special Rapporteur on freedom of expression, La Rue, noted in a report to the Council that the United States had revised its Foreign Intelligence Surveillance Amendments Act, expanding the government's power to monitor non-US persons abroad and covering any communications carried on cloud services hosted in the United States.

Leaked documents show that Germany, South Korea, Japan, and many other countries have been subjected to mass surveillance, and that US and European intelligence agencies are jointly conducting large-scale monitoring of internet and telephone communications, posing a serious threat to the cyber security of countries around the world.

Norwegian media reported that Norway, too, is a victim of the US "surveillance-gate": between December 10, 2012, and January 8, 2013, the NSA intercepted more than 33 million calls on mobile phones registered in Norway.

According to the Italian weekly L'Espresso, British and American intelligence agencies have been wiretapping Italian phone calls and intercepting internet data on a large scale.

3. Monitoring foreign companies

The commercial networks attacked by the US government are not limited to the internet; they extend to finance, transportation, electric power, education, and many other critical sectors that touch the national economy and people's livelihoods.

Documents disclosed by Snowden show that the NSA's mass surveillance covers not only world leaders but also numerous international organizations and business leaders.

According to Der Spiegel, the NSA's surveillance programs cover international financial transactions, especially credit card transactions. Both Visa, the globally known credit card company, and the Brussels-based Society for Worldwide Interbank Financial Telecommunication (SWIFT) fall within its scope.

A surveillance program called "Follow the Money" focuses specifically on international banking and financial transactions. The NSA's premise was that by following financial leads it could track down more terrorists. For this purpose it built a dedicated financial database called Tracfin to store the information obtained from financial institutions. In 2011 the database held 180 million records, 84 percent of them credit card data, with the affected users concentrated mainly in Europe, the Middle East, and Africa.

Part of the database's information also comes from Europe's SWIFT. After the September 11 terrorist attacks, SWIFT began secretly providing financial transaction data to the United States. When this was exposed by the media in 2006, the EU demanded negotiations with the US to safeguard the security of European banking data and citizens' right to privacy. After multiple rounds of talks, the EU and the US reached an agreement in 2010 allowing the US to obtain European banks' transaction information through the SWIFT system for counterterrorism purposes, on the condition that the US comply with the strict rules of EU data protection law when using and storing this financial information. Yet according to Snowden's latest revelations, the United States never stopped monitoring SWIFT's financial transaction traffic, which means that all of the US-EU negotiations during that period were mere window dressing with no practical effect.

On December 29, 2013, Der Spiegel reported that the NSA had, years earlier, already defeated nearly all of the security architectures developed by major vendors, including products from Cisco, Huawei, Juniper, and Dell.

Media reports say the United States has also penetrated the computer network of Petrobras, Brazil's national oil company.

II. The United States treats China as a primary target of covert surveillance


Andrew Ng Joins Baidu

On May 18, at the launch ceremony for the new site of Baidu's US R&D center in Sunnyvale, California, Stanford professor Andrew Ng was named Baidu's Chief Scientist, with overall responsibility for Baidu Research. Below are some questions and answers about Andrew collected from Quora:

  1. Andrew's basic information: http://en.wikipedia.org/wiki/Andrew_Ng
  2. Why did Andrew Ng leave Google research to join Baidu?

    George Anders, Author of The Benjamins, a social satire:

    I spoke with him over the weekend and he sounded pretty excited about the research opportunities at Baidu. For fuller details, see this write-up of our interview, posted on Forbes's website: Baidu's Coup: Ng Aims To Build Silicon Valley's New Brain Trust. Areas that extend beyond our conversation, which might be relevant, too:

    • The budget available at Baidu is BIG. It’s not as if Google doesn’t have lots of money, too, but it may be harder at Google to take command of a project that size (without many caveats being attached.)
    • He says he likes the opportunity to pool U.S. and Chinese research efforts. This is perhaps easier to do at Baidu than Google.
    • Wired reported that his Baidu deal took shape over a three-hour lunch with Baidu CEO Robin Li. If you’re lingering over lunch that long, you’re both excited about the topic of conversation and the degree of rapport. He could have quite a solid relationship with senior Google management and still not have that level of access or support. Here’s the Wired article.
  3. What role and responsibilities will Andrew Ng have at Baidu? Who will he be reporting to?

    Kaiser Kuo, Director, International Communications, Baidu Inc.:

    He will report to Jing Wang, senior VP. His role is head of Baidu Research, which is our name for the company’s advanced research initiatives. He has overall responsibility for three labs at present (the Silicon Valley AI Lab, under Adam Coates; the Beijing Deep Learning Lab, under Yu Kai; and the Beijing Big Data Lab under Zhang Tong) but that number is certainly expected to grow. He sets overall strategic direction of research and coordinates work between and among the labs. He has been given a very long leash, and is expected to pursue advanced projects on long timetables. In a group interview that he gave along with Robin Li and other senior Baidu officials just now, Robin said that Andrew’s time horizons could be as much as 10 years.

  4. What is Baidu working on at their research center in Sunnyvale (which Andrew Ng will be heading)?

    Kaiser Kuo, Director, International Communications, Baidu Inc.:

    Andrew's purview includes three labs at present: The Beijing Deep Learning Lab, the Beijing Big Data Lab, and the Silicon Valley AI Lab. A shorthand way of understanding what the Sunnyvale-based research team will be working on is unsupervised (as opposed to tagged-data based) deep learning, whereas Beijing—closer to the product teams and in close proximity to the prodigious volume of data thrown off by Baidu's search engine and other products—will still focus on the many applications of tagged-data learning.

  5. How do employees at Baidu feel about the hire of Andrew Ng?

    Kaiser Kuo, Director, International Communications, Baidu Inc.:

    I couldn't be happier myself! We actually get a twofer with Andrew, as he's also bringing on his longtime collaborator (and former grad student) Adam Coates, who's an AI and robotics pioneer in his own right. In the time I've spent with Andrew I've found him to be extremely thoughtful, well-spoken, inspiring and of course massively intelligent. Most importantly we're very confident that he's going to lead our labs (in Deep Learning, Big Data, AI and eventually more) into real groundbreaking territory—unsupervised deep learning (in addition to the strengths Baidu already possesses in tagged-data learning) and much more I'm not at liberty to share. His guiding principles I really like: semi-porous, open, not publication-fixated, not at all snobbish vis-a-vis the D part of R&D, interested in seeing the work done at our labs actually impact hundreds of millions of people. We're also of course very proud that he was as excited by the work that our Deep Learning team has already done, and how well we've already integrated it into our products, from image identification and image-based search to voice IME to OCR to natural language and semantic intelligence to improved search results and ad matching.
