2.2. Temporal Triangles

This is a synthetic dataset used as a benchmark kernel for assessing graph search performance.

2.2.1. Dataset Description

We use an RMAT Graph generator to generate a graph with a power-law degree distribution. Useful information about RMAT generation can be found here:

  • PaRMAT is a parallel RMAT graph generator

  • Graph500 is a benchmark aimed at assessing graph algorithms, but not specifically graph search. It has a description of RMAT generation.

We then attach a single property value to each edge called timestamp that has a uniformly random integer between 0 and 10,000.

2.2.2. Algorithm/pattern

The goal of this benchmark is to find all patterns in the generated graph that have a directed 3-cycle where the timestamps on the 3 edges are in monotonically increasing order around the cycle. That is, the timestamps are non-decreasing around the cycle.

../../_images/TT.png

A final constraint is that the elapsed time from the first to the third edge be within some small threshold of time. The idea is that the presence of the cycle represents a collection of things that were observed in the data around some single event (or cause) of the three edges.

2.2.3. Reference Implementation

2.2.3.1. Cypher description of this graph pattern:

MATCH (a)-[e0]->(b)-[e1]->(c)-[e2]->(a)
WHERE a <> b AND b <> c AND a <> c
  AND e0.time <= e1.time
  AND e1.time <= e2.time
  AND e2.time - e0.time < 42
RETURN a, e0.time AS t0, b, e1.time AS t1, c, e2.time AS t2

2.2.3.2. Gremlin description of this graph pattern:

t.V().match(
  __.as('a').outE('edge').as('e0').values('time').as('t0').math('_ + 42').as('t3').select('e0').inV().as('b').where('b',neq('a')),
  __.as('b').outE('edge').as('e1').values('time').as('t1').where('t1',gte('t0')).select('e1').inV().as('c').where('c',neq('b')).where('c',neq('a')),
  __.as('c').outE('edge').as('e2').values('time').as('t2').where('t2', lt('t3')).where('t2',gte('t1')).select('e2').inV().as('a')
)

2.2.4. Dataset Generator

The datasets are generated in two phases:

  1. Generate the topology using an RMAT generator.

  2. Enrich the graph structure by adding a uniformly-distributed timestamp to each edge.

2.2.4.1. Generating the topology

We used PaRMAT to generate the initial topology of the graph using the following options: PaRMAT -nVertices XXXXX -nEdges YYYYY. This generates a file called out.txt.

Our convention was to generate one vertex for every ten edges, and to consider the size of the dataset to be the number of edges.

To generate a size-100 graph, we would do: PaRMAT -nVertices 10 -nEdges 100.

2.2.4.2. Enrich with timestamps.

#!/usr/bin/env python

import argparse
import random
import sys

def genOutput(infile, outfile, seed=None, timestampRange=10000):
    """Generate the output file from the input file"""
    rand = random.SystemRandom()
    if seed:
        rand = random.Random(long(seed))
    with open(infile, 'r') as f:
        for line in f:
            t = rand.randint(0,timestampRange)
            flds = line.split(",")
            row = [int(x) for x in flds]
            rowstr = "%d" % row[0]
            for k in row[1:]:
               rowstr += ",%d" % k
            outfile.write("%s,%d\n" % (rowstr,t))

def temporalTriangleEnrichment(argv):
    """The main entry point for the temporal triangle enrichment generator"""
    # ---- Step 1:  process the arguments
    parser = argparse.ArgumentParser(prog='tt-enrichment',
                                description="Temporal Triangle data enrichment")
    parser.add_argument('-d', '--debug', dest='debug', action="store_true",
                        help="provide debugging information")
    parser.add_argument('-o', '--outfile', dest='outfile', default='out.csv',
                        help="the output data file, default: out.csv")
    parser.add_argument('-r', '--rmat', dest='rmat', default='out.txt',
                        help="the filename holding the RMAT generated file (default \"out.txt\")")
    parser.add_argument('-s', '--seed', dest='seed', default=None,
                        help="the seed to use for the randome generator (default None)")
    parser.add_argument('-t', '--timeRange', dest='timeRange', default=10000,
                        help="the timestamp range (0 to N-1), default N=10000")

    args = parser.parse_args(argv[1:])
    if args.debug:
        print "Args:", str(args)

    # ---- Step 2: Generate the output
    with open(args.outfile, 'w') as outfile:
        genOutput(args.rmat, outfile,
                  seed=args.seed, timestampRange=args.timeRange)

#----------------------------------------------------------------------------
if __name__ == "__main__":
    temporalTriangleEnrichment(sys.argv)

2.2.5. Pre-generated datasets

We have already generated many datasets over a wide range of sizes that can be downloaded directly from our site: