1.2. Unified Host and Network Dataset

1.2.1. Data modifications

We modified the data in the following ways:

  • We transformed “PortXXXX” to “XXXX” so that we can use an INTEGER data type for storing and manipulating both the Source Port and Destination Port.

  • We transformed the Windows Logging Service (wls) data from JSON to CSV format.

  • The wls records are divided into two categories:

    • host-events about a single device (file names end in _1v.csv)

    • authentication events (auth-event) that may involve one or two devices (file names end in _2v.csv)

  • The wls records use three fields inconsistently: LogHost, Source, and Destination. In any record, either or both of the Source and Destination fields may be an empty string. Because we use the Source and Destination fields to identify the source and target vertices of each edge, we ensure they are non-empty by applying the following updates to each record independently:

    • if Source == "" then set Source = LogHost;

    • if Destination == "" then set Destination = LogHost;
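
The port and endpoint rules above can be sketched in plain Python. This is a minimal illustration, not the actual conversion script: the function names strip_port_prefix and fill_endpoints are hypothetical, and treating a missing Source or Destination key the same as an empty string is an assumption.

```python
def strip_port_prefix(value: str) -> int:
    """Convert 'PortXXXX' (or a plain 'XXXX') to the integer XXXX."""
    text = value.strip()
    if text.startswith("Port"):
        text = text[len("Port"):]
    return int(text)


def fill_endpoints(record: dict) -> dict:
    """Return a copy of a wls record whose empty Source/Destination
    fields are replaced by LogHost. The input record is not modified."""
    fixed = dict(record)
    for field in ("Source", "Destination"):
        # Assumption: a missing key is treated like an empty string.
        if fixed.get(field, "") == "":
            fixed[field] = fixed["LogHost"]
    return fixed
```

Each record is handled on its own, so the two functions can be applied row by row while streaming the files.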

1.2.2. Graph schema

There is a single vertex type that LANL calls device.

import xgt
con = xgt.Connection()
devices = con.create_vertex_frame(
    name='Devices',
    schema=[['device', xgt.TEXT]],
    key='device')

The Netflow edges look like this.

import xgt
con = xgt.Connection()
netflow = con.create_edge_frame(
    name='Netflow',
    schema=[['epoch_time', xgt.INT],
            ['duration', xgt.INT],
            ['src_device', xgt.TEXT],
            ['dst_device', xgt.TEXT],
            ['protocol', xgt.INT],
            ['src_port', xgt.INT],
            ['dst_port', xgt.INT],
            ['src_packets', xgt.INT],
            ['dst_packets', xgt.INT],
            ['src_bytes', xgt.INT],
            ['dst_bytes', xgt.INT]],
    source = 'Devices',
    target = 'Devices',
    source_key='src_device',
    target_key='dst_device')
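
Before loading, it can help to check that a netflow CSV row converts cleanly to the column types in the schema above. The following is a rough sketch that uses only the Python standard library and does not call xgt. The name parse_netflow_row is hypothetical, the sample row is made up rather than taken from the dataset, and the column order is assumed to match the schema.

```python
import csv
import io

# Column names and Python types mirroring the Netflow edge schema
# (xgt.INT -> int, xgt.TEXT -> str).
NETFLOW_COLUMNS = [
    ("epoch_time", int), ("duration", int),
    ("src_device", str), ("dst_device", str),
    ("protocol", int), ("src_port", int), ("dst_port", int),
    ("src_packets", int), ("dst_packets", int),
    ("src_bytes", int), ("dst_bytes", int),
]


def parse_netflow_row(row):
    """Coerce one CSV row to typed values; raises ValueError on a bad row."""
    if len(row) != len(NETFLOW_COLUMNS):
        raise ValueError(f"expected {len(NETFLOW_COLUMNS)} fields, got {len(row)}")
    return {name: cast(value) for (name, cast), value in zip(NETFLOW_COLUMNS, row)}


# Made-up sample row, with ports already stripped of the "Port" prefix.
sample = "100,5,CompA,CompB,6,443,51234,10,12,2048,4096\n"
rows = [parse_netflow_row(r) for r in csv.reader(io.StringIO(sample))]
```

A port value that still carries the “Port” prefix would fail the int conversion here, which makes unconverted rows easy to spot.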

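The host-event wls edges look like this. Because a host event involves a single device, log_host serves as both the source key and the target key, so each edge is a self-loop on that device.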

import xgt
con = xgt.Connection()
host_events = con.create_edge_frame(
    name='HostEvents',
    schema=[['epoch_time', xgt.INT],
            ['event_id', xgt.INT],
            ['log_host', xgt.TEXT],
            ['user_name', xgt.TEXT],
            ['domain_name', xgt.TEXT],
            ['logon_id', xgt.INT],
            ['process_name', xgt.TEXT],
            ['process_id', xgt.INT],
            ['parent_process_name', xgt.TEXT],
            ['parent_process_id', xgt.INT]],
    source = 'Devices',
    target = 'Devices',
    source_key='log_host',
    target_key='log_host')

The auth-event wls edges look like this.

import xgt
con = xgt.Connection()
auth_events = con.create_edge_frame(
    name='AuthEvents',
    schema=[['epoch_time',xgt.INT],
            ['event_id', xgt.INT],
            ['log_host', xgt.TEXT],
            ['logon_type',xgt.INT],
            ['logon_typeDescription',xgt.TEXT],
            ['username',xgt.TEXT],
            ['domain_name',xgt.TEXT],
            ['logon_id',xgt.INT],
            ['subject_username',xgt.TEXT],
            ['subject_domain_name',xgt.TEXT],
            ['subject_logon_id',xgt.TEXT],
            ['status',xgt.TEXT],
            ['source',xgt.TEXT],
            ['service_name',xgt.TEXT],
            ['destination',xgt.TEXT],
            ['authentication_package',xgt.TEXT],
            ['failure_reason',xgt.TEXT],
            ['process_name',xgt.TEXT],
            ['process_id',xgt.INT],
            ['parent_process_name',xgt.TEXT],
            ['parent_process_id',xgt.INT]],
    source = 'Devices',
    target = 'Devices',
    source_key = 'source',
    target_key = 'destination')