A natural power-law alternative useful to estimate the number of isolated nodes

Author

Duarte, A.; Perez-Casany, M.

Type of activity

Presentation of work at congresses

Name of edition

2018 Sunbelt Conference of the International Network for Social Network Analysis

Date of publication

2018

Presentation's date

2018-07-01

Abstract

Over the last decades, the Zipf distribution has been considered a good candidate for
fitting the degree sequence of a real network. This is reasonable because, in a large number
of networks, a few nodes are highly connected while many are only slightly connected.
Nevertheless, a recent work by A. Broido and A. Clauset shows, by analyzing approximately
1000 degree sequences, that the fits obtained with the Power-Law (PL) distribution are not
as accurate as expected. One of the main reasons for this lack of fit is that the PL is
linear in log-log scale and thus not flexible enough to fit the top-concavity that is
usually observed in real data.
In this work we present a new alternative distribution named the Zipf-Poisson Stopped
Sum (Zipf-PSS). This family is obtained by applying the concept of Poisson Stopped Sum to
the Zipf distribution. This means that a random variable (r.v.) Y ~ Zipf-PSS(α, λ) if, and
only if, it is the sum of N independent and identically distributed r.v.’s X_i with a Zipf(α)
distribution, where N is Poisson(λ) distributed. Thus, the new model is a two-parameter
model with parameters (α, λ) ∈ (1, +∞) × (0, +∞), where α corresponds to the Zipf
parameter and λ to the Poisson one.
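
Because Y is a Poisson stopped sum of positive integer summands, its probabilities can be computed with the standard Panjer recursion for compound Poisson distributions. The sketch below is only an illustration of that idea, not the authors' implementation: the names `zeta` and `zipf_pss_pmf`, their arguments (`alpha` for the Zipf parameter, `lam` for the Poisson parameter), and the Euler-Maclaurin approximation of the zeta function are assumptions made here.

```python
import math


def zeta(alpha, n_terms=1000):
    """Approximate the Riemann zeta function for alpha > 1.

    Uses a partial sum plus an Euler-Maclaurin tail correction
    (an assumption of this sketch, chosen to avoid external libraries).
    """
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    head = sum(k ** -alpha for k in range(1, n_terms))
    n = float(n_terms)
    tail = (n ** (1 - alpha) / (alpha - 1)
            + 0.5 * n ** -alpha
            + alpha * n ** (-alpha - 1) / 12)
    return head + tail


def zipf_pss_pmf(alpha, lam, k_max):
    """Return [P(Y=0), ..., P(Y=k_max)] for Y ~ Zipf-PSS(alpha, lam)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    z = zeta(alpha)
    # j * P(X = j) for the Zipf(alpha) summands, j = 1..k_max (index 0 unused).
    jf = [0.0] + [j ** (1 - alpha) / z for j in range(1, k_max + 1)]
    # N = 0 gives the empty sum, so P(Y = 0) = exp(-lam).
    g = [math.exp(-lam)]
    # Panjer recursion: P(Y=k) = (lam / k) * sum_j j * P(X=j) * P(Y=k-j).
    for k in range(1, k_max + 1):
        s = sum(jf[j] * g[k - j] for j in range(1, k + 1))
        g.append(lam / k * s)
    return g
```

For instance, `zipf_pss_pmf(2.5, 1.5, 50)` returns the first 51 probabilities; the first entry equals exp(-1.5), the probability that N = 0.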
The Zipf-PSS has several advantages over the Zipf distribution. The first is model
interpretation: the model arises naturally when one considers the influence of time on the
graph generation mechanism. Imagine a graph whose nodes are Instagram profiles, and
suppose we are interested in the number of likes that a person gives in a day, which
corresponds to the degree of a node. In this situation, it makes sense to assume that the
number of likes given in each connection of the user follows a Zipf distribution, and that
the number of times the user connects in a day is Poisson distributed. Under these
assumptions, the total number of likes given by a person per day follows a Zipf-PSS
distribution. Other important advantages are the following: i) it is more flexible than the
Zipf model, since the additional parameter allows for top-concavity; ii) it contains the
zero value in its support, allowing the number of isolated nodes to be estimated when they
are not observed, as happens, for instance, in wireless networks; iii) the Zipf model is
obtained when the Poisson parameter tends to zero. Thus, the parameter λ is interpreted as
a measure of departure from the Zipf model.
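
Advantage ii) can be made concrete with a small calculation. Under the model, a node is isolated exactly when N = 0, which has probability exp(-λ). The sketch below assumes, hypothetically, that an estimate of λ is already available (for example from a fit to the observed positive degrees) and scales the observed node count accordingly. The function name `estimate_isolated_nodes` and this plug-in estimator are assumptions of this illustration, not a procedure stated in the abstract.

```python
import math


def estimate_isolated_nodes(n_observed, lam):
    """Expected number of unobserved degree-0 nodes.

    n_observed: number of nodes seen with degree >= 1.
    lam: estimated Poisson parameter (assumed to be given).

    With p0 = exp(-lam), the observed nodes are a fraction (1 - p0) of
    all nodes, so the missing ones number
    n_observed * p0 / (1 - p0) = n_observed / (exp(lam) - 1).
    """
    if n_observed < 0:
        raise ValueError("n_observed must be non-negative")
    if lam <= 0:
        raise ValueError("lam must be positive")
    # expm1 keeps precision when lam is close to zero.
    return n_observed / math.expm1(lam)
```

When λ equals log 2, half of all nodes are expected to be isolated, so the estimate matches the number of observed nodes.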
To show the better performance of the Zipf-PSS distribution, we fit the degree
sequences of several real networks. The fits obtained are compared, by means of the Akaike
Information Criterion, with those achieved by other two-parameter models.
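
The Akaike Information Criterion is 2k - 2 log L, where k is the number of fitted parameters and L is the maximized likelihood; lower values indicate a better trade-off between fit and complexity. A minimal sketch of such a comparison follows. The helper names `aic` and `rank_by_aic` and the input layout are assumptions of this illustration, and the log-likelihoods would have to come from actual model fits.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(fits):
    """Rank candidate models from best (lowest AIC) to worst.

    fits: dict mapping a model name to a pair
          (maximized log-likelihood, number of parameters).
    Returns a list of (name, AIC) pairs sorted by increasing AIC.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

Since the competing models considered here all have two parameters, ranking them by AIC is equivalent to ranking them by maximized log-likelihood.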