Pursuit of Happyness


I have just finished watching The Pursuit of Happyness, starring Will Smith, for the second time. I wasn’t that touched when I first watched this movie two years ago; the life in the movie felt too far away from mine.


However, with two more years of life experience, especially the hardship I have gone through in America, I have come to realize that it is not that far away. It is actually the story of you and me, of every person who wants to pursue their dreams.


This movie tells the story of a broke salesman who suffered much from poverty and misfortune, yet never gave up, tried every way to change his fate, and turned out to be a successful stockbroker.


There are two scenes that touched my heart the most. One is when Will Smith and his son were kicked out of the motel and had to spend a night in the restroom of a subway station. The strong man, who had never given up after so much hardship and pain, cried for the first time in the movie while stroking his lovely son’s head. He cried because he felt powerless to give himself and his beloved son the happiness and freedom that Jefferson wrote of in the Declaration of Independence. The second scene is the last part of the movie, when Will Smith got the job and rushed to the kindergarten to embrace his son. Light poured into the room, and life would no longer be dark. He cried again, as his perspiration and persistence had made an impossible thing possible.

Few people are as unfortunate as the hero of the movie. However, most of us have felt powerless at some point, especially when we are small potatoes. I am an example. When my qualifying exam turned out to be a fiasco, I was on the edge of losing the RA job that supports me and the qualification to continue my Ph.D. study. I felt that sense of powerlessness at the time. However, thanks to my previous efforts and help from several professors, everything is OK now.


So whenever you get stuck in a problem, just believe you can handle it and don’t give up. You will eventually figure it out.

Jing Conan Wang


Plan Before Qualifying Exam 2011-05-11


I thought my life would be easier after finals. But the reality is that it isn’t.
My qualifying exam is on 24-26 May.
Plan before that:
Project for Yannis
Project for Staro
Qualifying Exam.
The project for Staro can be finished by tomorrow, May 12th. Yannis will come back around May 19th, so the next meeting can be at that time. The Qualifying Exam consists of three subjects.
1. Optimization
2. Stochastic Process
3. Dynamic Programming.

11-12: Finish Staro Project. Meeting at 12.
12-14: Project for Yannis. Get preliminary results for the comparison of PageRank and TrafficRank. Use less than 6 hours per day.
15-17: Optimization
Day: Read the textbook of 524.
Redo the homework.
Work through the derivations for the exercises in the textbook.
18-19: Dynamic Programming
18 Read the notes.
19 Scan the book.
Redo the HWs.

20: Stochastic Process.
Scan Notes.

21-23 Review.
Do previous qualifying exams.

Hope I can pass the Qualifying Exam!!!


Sentences From Outliers: The Story of Success

Everything we have learned in Outliers says that success follows a predictable course. It is not the brightest who succeed. Nor is success simply the sum of the decisions and efforts we make on our own behalf. It is, rather, a gift. Outliers are those who have been given opportunities and who have had the strength and presence of mind to seize them.



A Simple Spider for Researcher Ranking Project

An important issue in ranking researchers is constructing the citation network between researchers. To achieve this, I need to crawl the citation database from the HistCite website, whose URL is


I downloaded a Python spider from the web and revised it to make it usable. It is not perfect, but it is good enough for this project. It is really cool to watch the command window scroll as it downloads thousands of pages. Wow, is this the so-called geek behaviour?


Source Code

The original version comes from this website

Draft of my Report about ReRank Project.


This is the draft of my report on the ReRank project. It is sketchy but includes most of the points.

My basic idea is to transform the ranking problem into a network optimization problem. Others have proposed an entropy maximization scheme for ranking web pages on the WWW, which is fairer than the PageRank that Google uses. I borrowed this framework and analyzed the citation network formed by papers. Then I modeled the relation between researchers and papers with a bipartite graph. By maximizing entropy again, I obtained the most general indicators of researchers’ popularity and influence.
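As a rough illustration of the data structures only (this is not the ReRank code itself), the sketch below stores the researcher-paper relation as a bipartite mapping and projects the paper citation network onto researchers. The paper IDs, the author names, and the function name `researcher_citations` are all made up for the example.

```python
from collections import defaultdict

# Hypothetical toy data: paper -> authors (the bipartite researcher-paper graph)
authors_of = {
    "p1": ["alice"],
    "p2": ["bob", "carol"],
    "p3": ["alice", "bob"],
}

# Hypothetical toy data: citing paper -> cited papers (the paper citation network)
cites = {
    "p2": ["p1"],
    "p3": ["p1", "p2"],
}


def researcher_citations(authors_of, cites):
    """Count how often each researcher cites each other researcher."""
    counts = defaultdict(int)
    for citing, cited_papers in cites.items():
        for cited in cited_papers:
            for src in authors_of[citing]:
                for dst in authors_of[cited]:
                    if src != dst:  # ignore self-citations
                        counts[(src, dst)] += 1
    return dict(counts)


edges = researcher_citations(authors_of, cites)
for (src, dst), weight in sorted(edges.items()):
    print(f"{src} -> {dst}: {weight}")
```

The weighted researcher-to-researcher edges are one possible input for a ranking algorithm; how the real model weights them is a separate question.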
The attachment is a brief summary of my idea. I would highly appreciate it if you could look through it briefly before the discussion.
This is only the basic idea of the theoretical framework; there is a long way to go before getting any meaningful result.
I will update it if I revise it later.

The Price for Growth.

I suddenly realize that I am no longer a little boy snuggling in mom’s arms, and that I will eventually become a man for other people to rely on.
When I was in high school, I thought my life would be comfortable once I passed the College Entrance Exam. But when I successfully enrolled in a prominent university, I didn’t enjoy the comfort I thought I would. Instead I got caught up in endless exams and competition, suffering much worry and frustration. Again I promised myself more leisure and less work in return for fulfilling my dream of studying abroad.
The real situation is that I still haven’t got what I promised myself as I sit in my lab in Boston typing these musings, yet my dreams have been fulfilled one by one. That’s the price for growth!
I like the feeling of running, not in the physical but in the psychological sense. I like to set out on a new journey immediately after finishing one, filling my life with rich and colorful experiences. The process is painstaking as well as joyful.

Please run faster than life goes!!

Conan Wang

ReRank Algorithm Progress Mar 17

Do you remember the idea of ranking researchers? I spent the whole spring break exploring that idea.
I read some articles about the h-index, which is a research ability indicator used by Science and Nature. The h-index can be summarized as: “a researcher has an h-index of h if h of his papers have at least h citations each and his other papers have no more than h citations each.” Actually it makes use of only very limited information. Although ignoring so much information makes it robust and, in some sense, hard to manipulate, it is disputable because the information filter is arbitrarily defined by humans. In fact, we can define many h-index-like indicators that yield different rank orders. It is impossible to justify which is the fairest one.
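A minimal sketch of the standard h-index computation may make the definition concrete. The citation counts are invented; the second and third calls show how two quite different citation records collapse to the same h, which is the "limited information" point above.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Invented citation counts, one number per paper
print(h_index([25, 8, 5, 3, 3]))      # 3
print(h_index([10, 8, 5, 4, 3]))      # 4
print(h_index([100, 80, 50, 4, 3]))   # 4, despite far more citations
```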
Another method to rank researchers is citation analysis. Similar methods have been widely used in ranking webpages; probably the most famous one is the “PageRank” algorithm that powers Google. The essential idea of the PageRank algorithm is to calculate the stationary probability distribution of a random walk in which the surfer follows each out-link with equal probability.
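The random walk can be sketched with a few lines of power iteration. This is only an illustration on a made-up three-page web, not Google's implementation. The `damping` parameter (the surfer occasionally jumps to a random page) is the usual addition from the PageRank literature and is not part of the description above; `damping=1.0` gives the pure walk, which only settles down on well-connected graphs. The sketch also assumes every page has at least one out-link.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate the stationary distribution of the random-surfer walk.

    links maps each page to the list of pages it links to. Every page is
    assumed to appear as a key and to have at least one out-link.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Random-jump part: spread (1 - damping) evenly over all pages
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outs in links.items():
            # The surfer follows each out-link with equal probability
            share = damping * rank[page] / len(outs)
            for target in outs:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page c collects links from both a and b, so it ends up with the highest score in this toy example.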
PageRank works very well in ranking webpages. Yet it relies on an assumption: “The surfer follows each out-link with equal probability.” The fairest ranking system should be based on facts only and not rely on any human-defined assumption.
After removing this assumption from the PageRank algorithm, we get the TrafficRank algorithm.
The key idea of the TrafficRank algorithm is to derive the most general (most uncertain) conclusion about the ranking order based on the existing information. Because the uncertainty of a system is characterized by entropy, this is actually an optimization problem that maximizes entropy. You can refer to the paper “A New Paradigm for Ranking Pages on the World Wide Web” by John A. Tomlin, or the report I will post later, for details.
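As far as I understand Tomlin's paper, the entropy maximization can be written roughly as below. The notation is my own shorthand and may differ from the paper: E is the set of links and p_ij is the fraction of all traffic that flows along the link from page i to page j.

```latex
\begin{aligned}
\max_{p}\quad & -\sum_{(i,j)\in E} p_{ij}\,\log p_{ij} \\
\text{s.t.}\quad & \sum_{(i,j)\in E} p_{ij} = 1, \\
& \sum_{i:(i,j)\in E} p_{ij} \;-\; \sum_{k:(j,k)\in E} p_{jk} = 0 \quad \text{for every page } j, \\
& p_{ij} \ge 0 \quad \text{for every } (i,j)\in E.
\end{aligned}
```

The second constraint says that the traffic entering a page equals the traffic leaving it. The total traffic through page j, H_j = \sum_i p_{ij}, then plays the role of its score. Unlike PageRank, nothing here forces the out-links of a page to share its traffic equally.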
Both PageRank and TrafficRank target webpages. Ranking researchers and ranking webpages share some characteristics, but they are different.
Webpages are connected only by links, while the relationships between researchers are much more complicated. My goal in the following days will be to characterize the relations between researchers with a suitable network model.


New Project About Ranking Researchers.

I have just thought of an idea for ranking researchers, which I think is a good topic for my EC 724 course project.

There are millions of researchers all around the world. Although science has been broken down into many specific fields, there are still thousands of researchers in each field.

Who is the best researcher in a field? A researcher can usually point out several famous researchers in the field he is familiar with. However, we still lack a reasonable quantitative metric to describe “how famous” a researcher is.

We realize that in real life the harm of a ranking system usually surpasses the benefit it brings. A poorly designed ranking system that relies openly on a certain set of metrics will draw people’s attention to those metrics and make them ignore the true meaning of research.

For example, the existing university ranking system in China relies heavily on the number of published papers. That’s the reason why the number of papers published by Chinese researchers has increased so rapidly. However, a large portion of those papers are not worth reading.

In our research, we want to design a ranking system that is independent of any specific metric. The score we assign to each professor just reflects other researchers’ opinions of him.

That’s a score that

This is indeed an optimization problem. Suppose there are N researchers, and the vector x holds the score assigned to each researcher. We need to find an optimal x* that satisfies each professor as much as possible.

So it is a multiobjective optimization problem with N objectives. We can find all the Pareto optimal solutions first.

What are the constraints? There is a training set that must be satisfied. For example, if we know that professor A is surely better than B, the score of A should be no less than that of B.

The problem is that we cannot send a questionnaire to every researcher. Instead, we deduce a researcher’s opinions from his publications. We assume that a professor’s publications represent all his academic opinions.

We need to overcome the problem of frequent mutual reference. We know that researchers in the same group cite each other’s work frequently. We need to prove that our algorithm is insensitive to this factor. That is to say, people in a group cannot gain an advantage by deliberately citing each other’s papers.

We need to show that the result is insensitive to the number of publications. A researcher cannot gain an advantage by publishing more low-quality papers.


The Long Tail of Labor—Influence of Crowd Sourcing on Labor Market

Author: Jing Conan Wang

Email: hbhzwj


Abstract: This article describes the influence of crowd sourcing and human computation on the labor market. Crowd sourcing and human computation help to break a large job, previously done by one employee, into small tasks that are distributed to and accomplished by a crowd of people in a pleasant and rewarding way. As a result, they will extend the tail of the labor market to a great extent and generate an extremely large commercial opportunity.

The long tail dominates the Internet. Many interesting things happen when the entry threshold of a specific field approaches zero and thus generates an extremely long tail. For example, the success of Google AdWords and AdSense is owed to the long tail of advertising. The essence of the long tail is to generate large profits from small needs.

Usually, if you want to have a job finished, you need to hire an employee, sign a contract with him, provide a workplace, train him, describe the job to him, and eventually pay him. It is very complicated. One benefit of such complexity is that an employer is less willing to fire a person he has hired, because doing so brings the extra cost of hiring a new person. As a definite “employee”, I must thank whoever created this process, as it protects me better than federal labor law.

Example of Cleaning Your Office

But at the same time, it also means that employers are less willing to hire people for small tasks. You may think your office is dirty, but unless it is dirty enough (the degree of “enough” depends on personality), you would not want to hire a new cleaner. In this case the threshold, namely the cost of hiring a cleaner yourself to clean the room, is very high. The head of this market consists of the big companies that really need to recruit cleaners themselves.

A cleaning company reduces the cost by maintaining the employee-employer relationships itself, and you outsource your tasks to it. In this way, the cost of cleaning your office is reduced and the tail is thus extended.

What About a Strange Job? The Tail of Jobs

But that’s not enough. Cleaning rooms is a common task for which demand is large enough to support a company. What if you want to find someone to “steal vegetables” for you in the Happy Farm game at 4 AM every day? I think fewer than 1 in 10,000 people have a similar need, and another 1 in 10,000 people are willing to do that for pay.

I don’t think there is a company you can call to help you “steal vegetables” at 4 AM every day. But in the Internet age there is still a way to meet your need: you can post the task on a website (this kind of website is not popular yet), and people who are interested can contact you. Maybe a boy who lives 3 blocks from your home will send you an email saying that he is willing to do it for 9 dollars an hour. The tail is extended once more.

I only said “maybe”. Personally, I believe it is still highly likely that you get no response, because few people can bear this work for more than a month. So why not break the task into smaller ones and hire a person to “steal vegetables” for only one day? It will be much easier to find a person who happens to be awake at 4 AM on a specific day (I often stay up on Fridays, so it would not be bad to earn some bucks by clicking a mouse a few times that day).

The general trend is that a job is broken into many tiny parts that can be outsourced to and finished by many people. In this way, the gap between demand and supply is greatly reduced. Here come the questions: which kinds of jobs can be broken up, how should they be broken up, and how do we find people who are willing to do the parts and assign each part to them in an optimal way?

Unfortunately, the rule of “breaking” is not applicable to many jobs. For example, it is impossible to change your babysitter every day; you would not like that, and neither would your baby.

But there are still some applications. For example, Wikipedia has proved that multi-person cooperation can generate high-quality articles. Yahoo Answers has proved that the Internet can take the place of consulting, at least partially. Actually, most office work can be finished by a crowd of people connected to the Internet, the so-called “crowd sourcing”. That is to say, crowd sourcing can eliminate the jobs of most white-collar workers.

People don’t need to sit formally in an office building. Instead, they only need to do what they like on the Internet, such as playing games, listening to music and so on. And they will finish other people’s tasks indirectly and get paid. THAT SOUNDS FANTASTIC!!

Why Is It a Revolution? How Large Is the Opportunity?

The world of the long tail is a world of monopoly; that is to say, the long tail of each specific field is always dominated by a small number of companies. For example, Google dominates the long tail of online ads, eBay dominates the long tail of e-commerce, and so on and so forth.

Although the price of each task is tiny, the long tail can account for 40% or more of the total value because of the massive number of tasks. Each year, hundreds of billions of dollars are spent on office-related jobs. That is to say, the dominant company would have sales of 40% of these hundreds of billions of dollars, which is on the order of $100,000,000,000!! It would be a giant like Google or Apple.
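A back-of-the-envelope check of that arithmetic: the 40% tail share is the figure from the paragraph above, while the total market sizes below are assumptions chosen only to stand in for "hundreds of billions".

```python
TAIL_SHARE = 0.40  # share of total value in the long tail (figure from the text)


def tail_value(total_market_usd, tail_share=TAIL_SHARE):
    """Value of the long tail for a given total market size in US dollars."""
    return total_market_usd * tail_share


# Assumed totals standing in for "hundreds of billions" of dollars
for total in (200e9, 250e9, 500e9):
    print(f"total ${total / 1e9:.0f}B -> tail ${tail_value(total) / 1e9:.0f}B")
```

Under these assumptions the tail reaches 100 billion dollars once the total market is about 250 billion dollars.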

When Will It Become a Reality? What Problems Make It Impractical Today?

People do the things they like, and they get paid enough to support themselves. It sounds like utopia! However, this scenario is still impractical now, mainly because of technical limitations.
Both crowd sourcing and human computation are still in their early stages. Luis von Ahn, the creator of the human computation concept and the inventor of many games with a purpose, has also admitted that human computation games can only be used in a small range of fields, such as image labelling, and that he does not know how to develop a general framework either.
I think the development of human computation and crowd sourcing is closely related to artificial intelligence and data mining. Maybe it will take another 5-10 years for this field to have a significant breakthrough.

New Technology and New Company

The history of the IT industry has witnessed the rise and fall of many famous companies, most of which are identified with a certain kind of technology. The development of these companies is tightly tied to the corresponding technology.

For example, prior to the Microsoft age, software was rarely considered profitable, let alone the center of the whole IT industry. It was Microsoft, led by Bill Gates, that opened the gate to the age of proprietary software. Microsoft taught people all around the world (maybe except China) that software deserves as much respect as hardware as an essential part of the PC.
In the mid-90s, few people realized the power of the Internet. Yahoo was one of the first companies that really treated the Internet as a paramount new medium that could totally change the way information is distributed. Since then, online advertising has gradually become the mainstream method of ad distribution. Jerry Yang’s efforts greatly damaged traditional media like TV and newspapers and reshuffled the whole media industry.
Similarly, another Internet giant, Google, has totally changed the way people get information since it was founded in 1998. With its novel PageRank technology, Google shows web users organized information indexed by millions of keywords instead of unordered web pages.
Now Microsoft, Yahoo and Google have become synonyms for software, online ads and search engines. It is new technology that created the prosperity of these IT giants. However, this is not always the case: new technology doesn’t always mean a new successful company.
Take Netscape as an example. Netscape Navigator represented the advent of the Internet age with rich multimedia content. With this revolutionary product, Netscape became the favorite of Wall Street, and its market value surpassed 5 billion dollars in a short time. This situation changed totally when Microsoft entered the web browser field. In just 3 years, Netscape suffered severe defeats in both the web browser and intranet markets. Under huge financial pressure from Wall Street, Netscape ended its brilliant but short life and was sold to AOL.
Compared with the tragedy of Netscape, the ending of YouTube is more pleasant. The creators sold YouTube to Google at a high price, at a time when YouTube was burning money and didn’t have a clear profit model. The two creators each gained about 300 million dollars. Many other companies have had a similar happy ending, like Hotmail and DoubleClick.
All of these companies held new technology that totally redefined a market; however, some of them succeeded in the end and some others were sold. We must admit that the founders’ characters play an important role in the process. But the founders’ choices are not the only reason. The characteristics of the technology and the market conditions have pre-decided the destiny of each company.
To produce a new IT giant, a new technology must have the following characteristics:

1. Foundational Application with Huge Market Demand.
Not every technology is enough to support a huge IT giant. The OS, online ads and the search engine all share a common point: each is a necessity and a foundation of the whole industry. Every PC needs an operating system, and all other software must be built on the API of the operating system. Online advertising is the base of the Internet economy; no Internet company survives without the support of online ads. The search engine is the portal of the Internet, and it directly determines other websites’ traffic, which is essential for generating profit.
Although applications like Google Maps are awesome, they are not enough to support an independent company, because a map service is not a basic application of the Internet industry. It sits in a layer above online ads and the search engine. Map services rely on online ads to make profits and on search engines to bring in users. However, no other service is based on the map service. Mail service is similar. That’s the reason why Google Maps, Gmail and Hotmail cannot be operated by independent companies, but only by a search engine company or an online ad provider.
2. Easy to Understand.
No investor wants to invest in a project he doesn’t understand. No matter how good the technology is, you cannot even start without the initial investment.
3. Gap with Existing Technology or Neglect by Existing Giants.
The fall of Netscape was due to its violation of this condition. Despite its importance, a web browser is actually simple software that is easy to replicate. At first, Microsoft didn’t realize the importance of the web browser and didn’t pay much attention to this emerging market. However, the improper remarks of Netscape’s CEO and the alliance between Netscape and Sun, Microsoft’s main competitor at that time, offended the godfather of the software industry. A fierce revenge started when Microsoft bundled its own web browser with the OS. In fact, it took Microsoft only several months to develop its own web browser, and the performance of IE caught up with Netscape Navigator within a year. With the profits generated by Windows and the Office suite, Microsoft had no intention of earning money from its browser until Netscape was shut down.
4. Low Monetary Investment at the Beginning Stage and Foreseeable Profit Model.
This can be used to explain why YouTube was sold to Google. It is well known that online video is a promising mainstream application of the future, and YouTube had consolidated its leading position in the video sharing market. If YouTube’s cash had been enough to support itself for 3 years or longer, there would have been no need for Steve Chen to sell YouTube. However, online video uses so much network bandwidth that few investors have the patience and enough money to wait for such a long time. Besides, if YouTube had had a foreseeable profit model that could help persuade investors, it could have walked further along the road of independent development.

Unfortunately, YouTube lacked both of these essential conditions. As a result, it was reasonable to accept the price when Google bid 1.65 billion dollars.