Networks: The Power of Hubs

How do some webpages on the Internet become so ubiquitous that we are rarely more than two short links away from viewing them? How have some webpages become so popular among the hundreds of billions of webpages currently on the Internet, not to mention the new ones that pop up almost every second? More importantly, how do certain pages become centres of web activity and lead us to other pages within their network? Albert-László Barabási explores these questions about networks and the organization (and disorganization) of interconnectivity in his chapter titled “Hubs and Connectors.”

Barabási begins the piece by explaining both the power and importance of links within the web. Quite simply, the larger the number of incoming links that direct visitors to your webpage, the higher the number of visitors you will have, which in turn raises your overall profile on the web. This sets your page apart from others, allows for greater visibility, and secures your place within the hierarchy as a destination site online. The perception of networks, their growth and mechanisms, has traditionally followed the random worldview of networks as defined by Erdős and Rényi. Within this theory the interior of a network is connected at random by nodes (which can be thought of as webpages online), and no node within the network is distinguishable from another, meaning that they all uniformly connect to the other nodes roughly the same number of times. (This example is presented in the previous post.) Hence the distribution of links is equal between all the nodes, abolishing any distinguishing character that one node might have over another. Such distinguishing character within a network typically implies that there are “a few highly connected nodes, or hubs.” (58) According to random network theory hubs could not exist, because all nodes share the same number of links to other nodes. Additionally, the total number of nodes is static over time, leaving no room for growth in the number of nodes existing within the network. This theory has typically been represented by bell curves, in which the various nodes, dispersed randomly and all connected an equal number of times, create a peak in the middle of the graph. Accordingly, within random network theory all webpages would have an equal opportunity to be seen and heard. Barabási observes that this theory of networks is quite simply not true and holds no ‘real world’ application or example.
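The Erdős–Rényi picture described above is easy to simulate. Below is a minimal sketch in Python (the node count and link probability are my own illustrative choices, not figures from Barabási's text): it samples a random graph in which each possible link exists with equal probability, and shows that every node's degree stays close to the average, i.e. no hubs emerge.

```python
import random

def erdos_renyi_degrees(n, p, seed=0):
    """Sample an Erdős–Rényi G(n, p) random graph and return each node's degree."""
    rng = random.Random(seed)
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:   # each possible link exists independently with probability p
                deg[i] += 1
                deg[j] += 1
    return deg

degrees = erdos_renyi_degrees(n=1000, p=0.01)
avg = sum(degrees) / len(degrees)       # expected value is (n - 1) * p, about 10
print("average degree:", round(avg, 2))
print("maximum degree:", max(degrees))  # stays close to the average: a bell curve, no hubs
```

Plotting a histogram of `degrees` would give the bell-shaped curve the post describes, peaked at the average node.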

Barabási contends that the Erdős–Rényi theory of networks, which consists of nodes that are randomly connected with “roughly the same number of incoming links”, is not applicable at all to the hierarchical pattern of popular webpages prevalent on the web today. He observes that some webpages, such as Amazon.com and Yahoo!, have an exceptional number of incoming links and thus dominate the web through the sheer enormity and range of their incoming links from various networks across the web. Webpages like Amazon inherently become the hubs of the Internet. Additionally, Barabási notes that, just like actors in Hollywood (as observed in the Kevin Bacon phenomenon, in which Bacon is on average only 2.79 degrees removed from everyone else in the Hollywood network), the strength of these Internet hubs relies not on the sheer size of the network but on its range: the links these nodes make to other, more distant networks add range to the network that the hubs possess. This adds variety and brings dissimilar things together. In contrast to the hubs, Barabási points out that his own webpage is so insignificant within the web, due to the near-infinite range of the Internet, that the chance of a person viewing it is about forty in a billion. Within this contrast lies the central question of the Internet and its network: how do some pages become these hubs of the Internet, while the majority of pages exist in near obscurity?

Barabási claims that the theory and research of networks as represented by Erdős–Rényi random network theory fails to answer this question because of its fundamental underlying assumptions: (1) Static: the number of nodes in an Erdős–Rényi network is set, meaning there is no room for growth, which is in exact opposition to the nature of the web itself; and (2) Equality (random interconnectivity): each node within the network has an equal number of links to the other nodes, making each node indistinguishable from the last; hence the nodes are linked randomly and are equal (leaving no room for hub webpages such as Google!). Within a random network, nodes have a “characteristic scale[,] embodied by the average node and fixed by the peak degree of distribution.” (70)

Barabási dismisses this entire method of perceiving networks, and states that the actual distribution, connection, and dominance (or lack thereof) of certain webpages within the Internet are explained by inverting the two random network assumptions. The new foundations of network theory are ruled by the following:

1. “Growth: For each given period of time we add a new node to the network. This step underscores the fact that networks are assembled one node at a time.”

2. “Preferential Attachment: We assume that each new node connects to the existing nodes with two links. The probability that it will choose a given node is proportional to the number of links the chosen node has. That is, given the choice between two nodes, one with twice as many links as the other, it is twice as likely that the new node will connect to the more connected node.” (86)
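The two rules above can be turned into a small simulation. This is only a sketch of the growth-plus-preferential-attachment idea in Python (the network size and random seed are my own choices, not values from the text); picking a target uniformly from a list in which each node appears once per link it holds is a standard trick for making the choice degree-proportional.

```python
import random

def grow_network(n, m=2, seed=0):
    """Grow a network one node at a time; each new node adds m links,
    choosing targets with probability proportional to their current degree."""
    rng = random.Random(seed)
    degrees = {0: 1, 1: 1}      # seed network: two nodes joined by one link
    repeated = [0, 1]           # node i appears here once per link it has
    for new in range(2, n):
        # a uniform pick from `repeated` is a degree-proportional pick of a node
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        degrees[new] = 0
        for target in chosen:
            degrees[new] += 1
            degrees[target] += 1
            repeated.extend([new, target])
    return degrees

deg = grow_network(10_000)
top = sorted(deg.values(), reverse=True)
print("largest hub degree:", top[0])    # far above the average: hubs emerge
print("average degree:", round(sum(deg.values()) / len(deg), 2))  # about 4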

These foundations are represented not by bell curves but by the Italian economist Vilfredo Pareto's 80/20 rule and the power law it produces. Within a power-law-ruled network the majority of nodes have only a few links (incoming and outgoing), and these tiny nodes co-exist with a few big hubs that are highly connected to many nodes across various differing networks. The power law exemplifies Pareto's rule, which states that roughly 80% of effects derive from only 20% of causes. The approximate 80/20 rule applied to the web means that “80% of the links on the web point to only 15% of total webpages”. (66)
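As a quick numerical illustration (my own toy numbers, not Barabási's data), a Zipf-like degree sequence, where the page at rank k holds roughly 10,000/k incoming links, reproduces a Pareto-style split between a small head and a long tail:

```python
# Toy power-law (Zipf-like) degree sequence: the page at rank k gets ~10000/k links.
n = 10_000
degrees = [round(10_000 / rank) for rank in range(1, n + 1)]  # already sorted, largest first

total_links = sum(degrees)
head = degrees[: n // 5]                 # the best-connected 20% of pages
share = sum(head) / total_links
print(f"top 20% of pages hold {share:.0%} of all links")      # roughly 80-85%
```

The exact share depends on the exponent of the toy sequence; the point is only that a heavy-tailed distribution concentrates most links in a small fraction of pages, which a bell curve cannot do.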

Pareto's 80/20 rule and the power law graph that supports it provide us with a very different perspective on networks, especially as Barabási perceives networks on the Internet. The power law graph shows a curve that starts high on the left and then declines into a long tail running nearly horizontally out to infinity. The nodes located within this horizontal section account for the roughly 80% of webpages with few links, while the peak of the curve, located on the left-hand side of the graph closest to the vertical axis, represents the 15% of webpages that act as hubs. This unequal distribution of incoming links, as presented within the power law graph, is a more accurate description of the current state of the Internet.

As Barabási describes:

“The power law distribution thus forces us to abandon the idea of scale, or a characteristic node. In a continuous hierarchy there is no single node which we could pick out and claim to be characteristic of all the nodes. There is no intrinsic scale in these networks. This is the reason my research group started to describe networks with power-law degree distribution as scale-free.” (70)

This principle that describes the web and its interconnectivity is not a phenomenon distinctive to the web but applies to a variety of situations within human life.

The effect of power laws and the study of their presence is not a new phenomenon at all, but in the last few years the existence of the Internet has dramatically changed how some view the various power laws at work within our world. The Long Tail, as it was coined by Chris Anderson, has become a staple of what some believe will be the future of commerce, specifically within the music industry. Anderson felt that the hubs within the music industry were an effect of the old 20th-century system of hits culture, which encompassed the production, manufacturing, and distribution of music. These hubs within music could be seen as the major record labels, major publishing companies, big-name acts, and their hit repertoire. For many years the music industry was ruled by power laws, with only 20% of products (the ‘head’ of the power law; ‘products’ here primarily means music recordings such as CDs and vinyl LPs) accounting for 80% of sales, while the remaining 80% of products make up what Anderson calls the ‘tail’. The ‘head’ comprises the goliaths of the music industry, such as Michael Jackson's classic ‘Thriller’ or more recent releases like Rihanna's album ‘Good Girl Gone Bad’. The other 80%, the niche-genre music (say, Acid Rock instead of the main genre of Rock), was virtually inaccessible, because the cost of producing, manufacturing, distributing, and stocking this niche music could not be justified by mass profits… until now.

Anderson proposed that with the democratization of distribution via Internet retail sites like Amazon, anyone could find and purchase a piece of music that they enjoyed. By riding the tail of the power law to its outer reaches, music lovers could find music deeper within the niche that they had always wanted to own, and/or discover new music. Additionally, if we take Barabási's notion of networks within the Internet, a hub of music such as Jay-Z could refer music lovers to other musicians within the rap/hip-hop genre, like Nas, and the links from the Nas node could lead on to other lesser-known acts.

The sheer enormity of information and access that the Internet provides would allow music lovers (or consumers of any product or service) to find exactly what they want and to obtain it. Through the Long Tail, Anderson argues that we could begin to focus more on the niche products (or, in this case, music genres) that have for so long been relegated to the sidelines whenever the hit parade came into town. In fact, Anderson argues that in terms of commerce, businesses could focus on selling ‘a lot of a little’. For example, Anderson starts by exploring just the ‘A’ section of music genres and discovers the niche of Afro-Cuban Jazz. Although this isn't a big genre within the ‘A’ section, within it there are hubs, like the artist Tito Puente. As a retailer you could focus on stocking Tito Puente in addition to other artists within the same genre, like the Buena Vista Social Club. You may only sell a few copies of each artist, but the combined total of those sales would equate to a profit equal or similar to what you might have made by focusing only on the ‘head’. Accordingly, this would mark the end of hits. Or would it?

Kanye, Lil Wayne, Miley Cyrus, Pink, Kings of Leon, etc., are all still selling millions of copies of their albums (or at the very least a solid half million). These are mass-marketed and mass-produced artists, and, more importantly, as hubs of the music industry they still influence a lot of other developing artists. Although our preferences might become very niche, the power and influence of hubs remains. Hence, although the playing field has been levelled, and as consumers we have access to music like never before, we still need some sort of guidance. Anderson calls this ‘filtering’, which sites like Amazon, with its recommendations technology, and Google, through its search engine, provide by offering a means of sifting through all of the products available.

However, once again he is missing the bigger point of networks and the position of hubs within them. For many music lovers, finding new music is a journey that starts with a hub like John Mayer; through the connections that artist makes as a hub, whether through genre, featured artists on his album, or label mates, music lovers start at the ‘head’ of the power law and then ride down to the ‘tail’ to find that music. If those hubs didn't exist, discovery would be highly difficult. The Erdős–Rényi random network theory was disproved for this very reason: if all nodes have the same scale and there are no distinguishing characteristics making any one node different from another, then the same information is passed throughout that type of network. Within the music industry this would mean that no new music could be found, and by focusing only on the niches, the bigger picture of music trends and tastes would be ignored. Hubs need niches and niches need hubs. Anderson's continued crusade against hubs, or ‘hits’, destabilizes the entire network theory and its applications within music, the Internet, and life in general. Anderson fails to realize that, as within any network, when a node has a choice between linking to two other nodes it will most likely choose the node with the higher number of links. The same can be said for music fans and webpages alike: hits are still hits.


5 responses to “Networks: The Power of Hubs”

  1. I believe that the accessibility of links may not be as far removed from connectivity as the 80/20 model explains. The so-called cluster effect of links can be used as an example: with the rise of social media, the pertinence of links and the accessibility of separate clusters are not dependent upon correlating nodes. This is because the social ranking systems of these sites provide links of (theoretically) greater value than others. Sites like StumbleUpon, Reddit, and Digg do not act as super-nodes (like Amazon or Google) because they are not direct nodes. In turn, they are counter-examples to the power law distribution, because they establish links between preferential links without routing through nodes. This interlacing of links diminishes the obscurity of links, as it grants a higher degree of visibility to users. Of course, users must use these avenues of social media in order to increase the connectivity and distribution of links, yet this does not discount the fact that the connection between these links exists. Similarly, social networks negate the separation of access to nodes, because they create direct links between links.

  2. I also feel that it's important to consider the actual user when trying to represent Internet traffic from links. Similar to the 80/20 rule with links, the same rule can be applied to users: some people religiously check a website all day, while others check it once a day. If you take that into consideration alongside the 80/20 rule on links, the data can be biased. I also think it's fascinating how some websites can achieve Internet monopolies; the Internet really isn't built on the laws of competition. Once there is a website to do something, everyone is content to go to that website. That's how iTunes can dominate the music market and YouTube can dominate the video market. In the world of the Internet, in order to become a popular link, you really have to be first.

  3. Pingback: Cooperative Frameworks « Introduction to Digital Media

  4. As for the “long tail” becoming the future of commerce, I think we may already be beyond that point. Maybe a few years ago it would have been the “future”, but with such powerful Internet hubs and connectivity made so easy, finding obscure products has become commonplace. Netflix actually built some of its business strategy around the long tail, which proves the viability of selling more specific goods. By offering more titles than their competitors, they were able to capitalize on a whole market that was unattractive to those competitors. Within a few years their library grew tenfold because they saw such growth potential.

    In response to tyl9876, there have been several websites that have overcome their predecessors, so superior usability will put you above the competition. The two examples that pop into mind right now are Google overcoming Yahoo, Ask, and all the incumbent search engines, and Facebook overcoming Myspace. (http://blog.compete.com/2009/02/26/facebook-myspace/)

  5. Using a bell curve (normal distribution) implies that links are random.

    “The Long Tail” talks about markets, but misses the subcultural basis for markets for persistent, formerly hit-based goods like Beatles records. For the die-hard Beatles fan, the songs and the artifacts have meaning that only Beatles fans understand. It's not “I like the Beatles,” but “I am a Beatle,” or something along that line.

    The cognitive link to “artifact-based” subcultures led me to ethnography, which in turn led to something called a scree diagram, a graph of a factor analysis, an extreme form of regression analysis built around correlations. Such diagrams show that, typically, three factors account for as much as 85% of the variance of a situation. These diagrams look like long tails.

    We have used normal curves to build market segmentation, which ignored culture and its associated meanings. Our requirements processes do not account for meaning groups, aka cultures. Meaning loss is real and costly in software products.

    We need to move away from Bayesian statistics across the board. It makes for weak science, but still manages to serve as the basis for the social sciences.
