Copyright

考试周刊 journal
  • Name: 考试周刊
  • CN: 22-1381/G4
  • ISSN: 1673-8918
  • Indexed in: 中国知网 (CNKI), 万方数据 (Wanfang Data)
  • Website: www.kszktg.com


Improvement and Research on Social Networks Based on the PageRank Algorithm

Authors: 冷若冰, 袁航  Word count: 7607

In the final design of PageRank, Brin and Page stipulate that websites transmit an “importance measure” through links. The importance measure of each website equals the sum of the measures that other websites transmit to it, so the measure flows throughout the whole network. From the point of view of in-links, a website can obtain a high measure for two reasons: either many websites give measure to it, or only a few do but each of them gives a large amount. From the point of view of out-links, once the measure of a website is fixed, the more out-links it has, the less measure each out-link carries.

The following example shows how the “importance measure” is transmitted among websites. The website in the top left corner has a measure of 100 and transmits 50 to each of the top right and bottom right websites through its 2 out-links. The website in the bottom left corner has a measure of only 9. It has 3 out-links, each carrying a measure of 3; only one of them points to the top right website, and the other two targets are not in the figure. The top right website therefore receives 53 in total and the bottom right website receives 50. Since both of them have two out-links, each out-link of the top right website transmits more measure (26.5) than each out-link of the bottom right website (25).

In-link of website i: a hyperlink pointing to website i from another website.

Out-link of website i: a hyperlink pointing from website i to another website.
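The arithmetic of this example can be checked with a short sketch. The website names and the two `off_figure` targets are labels assumed here for illustration; only the measures and the link counts come from the example itself.

```python
from collections import defaultdict

# Measures held by the two left-hand websites (taken from the example).
measure = {"top_left": 100.0, "bottom_left": 9.0}

# Out-links of each website. The two "off_figure" targets are hypothetical
# stand-ins for the websites that are not shown in the figure.
out_links = {
    "top_left": ["top_right", "bottom_right"],
    "bottom_left": ["top_right", "off_figure_1", "off_figure_2"],
}


def transmit(measure, out_links):
    """Split each website's measure evenly over its out-links."""
    received = defaultdict(float)
    for site, targets in out_links.items():
        share = measure[site] / len(targets)
        for target in targets:
            received[target] += share
    return dict(received)


received = transmit(measure, out_links)
print(received["top_right"], received["bottom_right"])  # 53.0 50.0

# Both right-hand websites have two out-links, so each link carries half.
per_link_top_right = received["top_right"] / 2
per_link_bottom_right = received["bottom_right"] / 2
print(per_link_top_right, per_link_bottom_right)  # 26.5 25.0
```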

Define a directed network $G = (V, E)$, where $V$ is the set of nodes, that is, the set of all websites, and $E$ is the set of directed edges, that is, the hyperlinks. Let $n = |V|$ be the number of websites in the network. The PageRank value of website i, denoted $P(i)$, is defined as:

$$P(i) = \sum_{(j,i) \in E} \frac{P(j)}{O_j}$$

where $O_j$ is the number of out-links of website j. Mathematically, this gives n linear equations in n unknowns, which can be written in matrix form. Let the n-dimensional column vector $P$ hold all the PageRank values:

$$P = (P(1), P(2), \ldots, P(n))^T$$

Let $A$ be the adjacency matrix of the graph, with entries

$$A_{ij} = \begin{cases} 1/O_i & \text{if } (i,j) \in E \\ 0 & \text{otherwise} \end{cases}$$

The system can then be written as:

$$P = A^T P$$

It can be seen that $P$ is the eigenvector of the matrix $A^T$ corresponding to the eigenvalue 1.
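A minimal sketch of this eigenvector view, assuming the usual weighting in which entry (i, j) of A is 1/O_i when website i links to website j: repeatedly applying A^T to a starting vector (power iteration) converges to P when the graph is strongly connected and aperiodic. The three-website graph and the function names are hypothetical, chosen only for illustration.

```python
def link_matrix(out_links, n):
    """Return A with A[i][j] = 1/O_i if website i links to j, else 0."""
    A = [[0.0] * n for _ in range(n)]
    for i, targets in out_links.items():
        for j in targets:
            A[i][j] = 1.0 / len(targets)
    return A


def power_iteration(A, iterations=200):
    """Repeatedly apply P <- A^T P, starting from the uniform vector."""
    n = len(A)
    P = [1.0 / n] * n
    for _ in range(iterations):
        # (A^T P)[i] = sum over j of A[j][i] * P[j]
        P = [sum(A[j][i] * P[j] for j in range(n)) for i in range(n)]
    return P


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
out_links = {0: [1, 2], 1: [2], 2: [0]}
P = power_iteration(link_matrix(out_links, 3))
print([round(p, 4) for p in P])  # approximately [0.4, 0.2, 0.4]
```

Website 1 receives only half of the measure of website 0, while websites 0 and 2 pass their full measure around the cycle, which is why they end up with equal values.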

Solving this equation requires some conditions. The matrix $A$ must be a stochastic matrix, and it must be irreducible and aperiodic, which means that the directed graph corresponding to $A$ is strongly connected and non-periodic. A real network (or social network) does not satisfy those conditions. In fact, the equations above can be derived from Markov chains, and $A^T$ needs some modifications to satisfy the conditions. To make the matrix irreducible, so that every node effectively has out-links to all other nodes, a damping factor, denoted $d$, is defined: multiply $A^T$ by $d$ and add $(1-d)\frac{E}{n}$, where $E = ee^T$ and $e$ is the n-dimensional vector of all ones. A surfer on any website then jumps to a uniformly chosen website with probability $1-d$, so every website reaches every other website with probability at least $(1-d)/n$, and a strongly connected graph is formed.

A modified PageRank model can be deduced:

$$P = \left((1-d)\frac{E}{n} + dA^T\right)P$$
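The effect of the damping factor can be illustrated with a small sketch. It assumes the modified model has the common form P = ((1-d)E/n + dA^T)P, with E the all-ones matrix, and that every website has at least one out-link. The graph, the value d = 0.85, and the function name are assumptions made for the example.

```python
def damped_pagerank(out_links, n, d=0.85, iterations=200):
    """Iterate P <- ((1 - d) E / n + d A^T) P, with E the all-ones matrix.

    Assumes every website has at least one out-link, so each row of A
    sums to 1 and the entries of P keep summing to 1.
    """
    P = [1.0 / n] * n
    for _ in range(iterations):
        # (E P)[i] equals sum(P) for every i, so the first term is constant.
        jump = (1.0 - d) * sum(P) / n
        new_P = [jump] * n
        for j, targets in out_links.items():
            share = d * P[j] / len(targets)  # d * A[j][i] * P[j]
            for i in targets:
                new_P[i] += share
        P = new_P
    return P


# Hypothetical graph that is not strongly connected:
# no website links to website 2.
out_links = {0: [1], 1: [0], 2: [0]}
P = damped_pagerank(out_links, 3)
print([round(p, 4) for p in P])
```

Without damping, website 2 would end up with a value of 0 because nothing links to it. With damping it keeps the baseline (1-d)/n = 0.05, and the iteration converges even though the undamped graph is neither strongly connected nor aperiodic.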

If personalized settings for the initial matrix are needed, the uniform vector $e/n$ can be replaced by a probability vector $v$ (named the ‘personalization vector’), so that a value is still added to every element of the adjacency matrix. The matrix $G$ can be deduced:

$$G = dA + (1-d)\,ev^T$$

The matrix $G$ is also called the ‘Google matrix’. The formula above can be expressed as:

$$\pi^T = \pi^T G$$

The $\pi$ here is the same as the $P$ above, only transposed: $\pi$ is the vector of PageRank values. Define $e_i$ as the i-th column of the identity matrix; the PageRank value of node i then equals:

$$\pi_i = \pi^T e_i$$

Different personalization vectors $v$ can be set, and different $v$ yield different $\pi$, so we write $\pi = \pi(v)$. In the simplest situation, $v = e/n$.
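To see how the personalization vector changes the result, the following sketch assumes the Google matrix takes the common form G = dA + (1-d)ev^T for a row-stochastic A, and iterates the row-vector equation until it settles. The three-website cycle, d = 0.85, and both choices of v are hypothetical.

```python
def google_matrix(A, v, d=0.85):
    """Return G = d*A + (1 - d) * e v^T for a row-stochastic A."""
    n = len(A)
    return [[d * A[i][j] + (1.0 - d) * v[j] for j in range(n)]
            for i in range(n)]


def stationary_vector(G, iterations=200):
    """Iterate pi^T <- pi^T G, starting from the uniform vector."""
    n = len(G)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical cycle 0 -> 1 -> 2 -> 0, written as a row-stochastic A.
A = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]

uniform = stationary_vector(google_matrix(A, [1 / 3, 1 / 3, 1 / 3]))
biased = stationary_vector(google_matrix(A, [1.0, 0.0, 0.0]))
print([round(x, 4) for x in uniform])
print([round(x, 4) for x in biased])
```

With the uniform vector all three websites are symmetric and receive the same value. Putting all of v on website 0 raises its value and lowers the others, which is the sense in which the result depends on v.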

2 Definition of Community Tree

After the Community Tree is derived from a social network, the network's communities and their organizational structure can be deduced. Figure 2.6 is an example: node 1 and node 5 are the cores of community 1 and community 2, respectively, and the immediate leader of node 1 and node 2.

The PageRank algorithm calculates a global value for every website by analyzing the links between websites; this value represents the website's significance. The significance of every member of a social network can be evaluated in the same way by using PageRank to calculate an m-Score value for every node. In a network, random walks implicitly perform a soft clustering of the nodes, so they can be used to find the immediate leader of every member. A Community Tree can then be formed by combining the random walks with the m-Score value of every node.

3 Detailed design

First, we obtain the one-step probability transition matrix A of the social network G. Let t be the jump frequency of the random walk, that is, the number of steps. After normalization, we obtain the t-step probability transition matrix M. Then we call calc_m-Score(G) to calculate the m-Score value of each node. For each node i, we use M to find the node j that node i is most likely to jump to after t steps. If the PageRank value of node j is larger than that of node i, we consider node j the father node of node i.
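The construction of the t-step matrix can be sketched as follows. The sketch assumes that the one-step matrix is the row-normalized adjacency matrix and that the normalization divides each column of the t-th power by its column sum. The helper names and the four-node network are hypothetical.

```python
def one_step_transition_matrix(adjacency):
    """Row-normalise a 0/1 adjacency matrix (assumed one-step matrix)."""
    A = []
    for row in adjacency:
        degree = sum(row)
        A.append([x / degree if degree else 0.0 for x in row])
    return A


def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def t_step_matrix(A, t):
    """Return A^t with every column divided by its column sum (t >= 1)."""
    n = len(A)
    A_t = A
    for _ in range(t - 1):
        A_t = mat_mul(A_t, A)
    col_sums = [sum(A_t[i][j] for i in range(n)) for j in range(n)]
    return [[A_t[i][j] / col_sums[j] if col_sums[j] else 0.0
             for j in range(n)] for i in range(n)]


# Hypothetical undirected network with edges 0-1, 0-2, 0-3, 1-2.
adjacency = [[0, 1, 1, 1],
             [1, 0, 1, 0],
             [1, 1, 0, 0],
             [1, 0, 0, 0]]
M = t_step_matrix(one_step_transition_matrix(adjacency), t=2)
for row in M:
    print([round(x, 3) for x in row])
```

After the normalization every column of M sums to 1, so entries in different columns are directly comparable.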

Pseudo-code that calculates the improved CT Tree

Algorithm: revised_CT_Deriving

Input: social network G, jump frequency t

Output: the improved CT Tree

Procedure:

    CT ← [null, …, null]
    A ← getOneStepTransMatrix(G)
    Z ← diagonal matrix with Z_jj = ∑_i [A^t]_ij
    M^t ← A^t · Z^(-1)
    R ← calc_m-Score(G)
    for each node i in R
        order ← all nodes sorted by M^t[i] in descending order
        for each node k in order
            if R[k] > R[i]
                CT[i] ← k
                break
    return CT

In the improved CT_Deriving, the father node of node i is not simply the node with the largest t-step transition probability. Instead, we sort all nodes by their t-step transition probability from node i in descending order and check the PageRank value of each candidate node k in turn until we find a node k whose PageRank value is larger than that of node i. We then set node k as the father node of node i.
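The selection rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the t-step matrix `M_t` and the score vector `R` are taken as given inputs, and the numbers used below are invented for the example.

```python
def revised_ct_deriving(M_t, R):
    """Return CT, where CT[i] is the father of node i, or None for a root.

    M_t[i][k] is the t-step transition probability from node i to node k,
    and R[k] is the PageRank (m-Score) value of node k.
    """
    n = len(R)
    CT = [None] * n
    for i in range(n):
        # Candidates ordered by descending transition probability from i.
        order = sorted(range(n), key=lambda k: M_t[i][k], reverse=True)
        for k in order:
            if R[k] > R[i]:
                CT[i] = k  # first candidate that outranks node i
                break
    return CT


# Hypothetical inputs, chosen only to exercise the rule.
R = [0.4, 0.3, 0.2, 0.1]
M_t = [[0.1, 0.5, 0.3, 0.1],
       [0.2, 0.1, 0.6, 0.1],
       [0.3, 0.5, 0.1, 0.1],
       [0.1, 0.2, 0.6, 0.1]]
print(revised_ct_deriving(M_t, R))  # [None, 0, 1, 2]
```

Node 1 is most likely to jump to node 2, but node 2 has a lower score, so the search moves on to node 0. Node 0 has the highest score and therefore has no father; it is the root of the tree.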

4 Result

In Figure 3.10, the blue broken line shows the PageRank values without offset, the red broken line shows the PageRank values with offset, and the green broken line shows the PageRank values when p_2 has been offset. It can be clearly seen that the PageRank values of nodes {5, 6, 7, 11} increase with the level of offset, whereas the other nodes show different degrees of decrease.

After the offset, nodes {5, 6, 7, 11} enter the Candidate Set, and their action scope can be clearly seen. Node 17 does not enter the Candidate Set; after the offset, its action scope changes from 3 nodes (13, 18, 22) to 4 nodes (12, 13, 18, 22). Node 1, however, remains in the Candidate Set: although it is not dominant in ‘interest’, its ‘influence’ cannot be ignored because it has many ‘friends’.

Figure 3.10

In this model, we made some improvements to the method of creating the CT Tree. Our tests show that a customized personalization vector can reflect user behavior, which in turn affects the PageRank value of the user. The PageRank value then combines information about the link structure of the network with information about user behavior. Combined with the improved Random Walks algorithm, this lets us identify the “loose relationships” among users, which reflect that nodes may affect each other with a certain probability. We present the users' dependencies as a CT Tree in a figure and select a certain number of decision nodes in the CT Tree. The information publisher can then affect other nodes through these reference nodes.



Supervising body: 吉林省新闻出版局舆林报刊发展中心  Sponsor: 吉林省新闻出版局舆林报刊发展中心
