Quickstart

This guide will show you how to get started using Intake to read packet capture (PCAP) data. It assumes the reader is already familiar with tcpdump, the command-line packet analyzer. Given a tcpdump command, we will show how you can find the equivalent set of network packets with the Intake PCAP plugin.

Installation

For conda users, the Intake PCAP plugin is installed with the following command:

conda install -c intake intake-pcap

If you wish to follow along with the tcpdump examples, consult your operating system's documentation for instructions on installing tcpdump.

Reading an Existing PCAP File

The simplest use case for this plugin is to read an existing PCAP file. Assuming the path to this file is stored in the variable filename, the following reads the entire file into a dataframe:

>>> import intake
>>> ds = intake.open_pcap(filename)
>>> df = ds.read()
>>> df
                        time       src_host src_port       dst_host dst_port protocol
0 2018-01-09 08:16:12.210010   192.168.0.39    54703  172.123.4.567      443      udp
1 2018-01-09 08:16:12.210910   192.168.0.39    54703  172.123.4.567      443      udp
2 2018-01-09 08:16:12.236176  172.123.4.567      443   192.168.0.39    54703      udp
3 2018-01-09 08:16:12.236543  172.123.4.567      443   192.168.0.39    54703      udp
4 2018-01-09 08:16:12.236726   192.168.0.39    54703  172.123.4.567      443      udp
5 2018-01-09 08:16:12.236791   192.168.0.39    54703  172.123.4.567      443      udp
6 2018-01-09 08:16:12.252565  172.123.4.567      443   192.168.0.39    54703      udp
7 2018-01-09 08:16:12.313082  172.123.4.567      443   192.168.0.39    54703      udp
8 2018-01-09 08:16:12.313479  172.123.4.567      443   192.168.0.39    54703      udp
...

The remaining sections describe other use cases. But first, we must set up a sample catalog and its associated data.

Creating Sample Data

Unix/Linux users with access to tcpdump can bootstrap a sample PCAP file from local traffic with the following command:

sudo tcpdump -c 100 -w local.pcap

This will capture 100 packets (including, but not limited to, IP traffic) from the default network interface and write them to local.pcap.
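
As a quick sanity check on the capture, the number of packet records in the file can be counted with the standard library alone. The sketch below is illustrative and not part of the plugin: the helper name count_packets is hypothetical, and it assumes the file is in the classic PCAP format with microsecond timestamps (not pcapng, which some tcpdump builds write). It demonstrates the idea on a tiny synthetic file so that it runs without tcpdump or root access.

```python
import os
import struct
import tempfile

GLOBAL_HEADER_SIZE = 24   # fixed-size file header of a classic PCAP file
RECORD_HEADER_SIZE = 16   # per-packet header: ts_sec, ts_usec, incl_len, orig_len


def count_packets(path):
    """Count the packet records in a classic (non-pcapng) PCAP file."""
    with open(path, "rb") as f:
        data = f.read()
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"          # file written in little-endian byte order
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"          # file written in big-endian byte order
    else:
        raise ValueError("not a classic microsecond-resolution PCAP file")

    offset, count = GLOBAL_HEADER_SIZE, 0
    while offset + RECORD_HEADER_SIZE <= len(data):
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        offset += RECORD_HEADER_SIZE + incl_len
        if offset > len(data):
            break             # final record is truncated; do not count it
        count += 1
    return count


# Demonstration on a synthetic file holding three dummy 4-byte "packets".
demo_path = os.path.join(tempfile.mkdtemp(), "demo.pcap")
with open(demo_path, "wb") as f:
    # magic, version 2.4, timezone offset, sigfigs, snaplen, link type 1 (Ethernet)
    f.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
    for i in range(3):
        body = bytes([i]) * 4
        f.write(struct.pack("<IIII", 1515456972, i, len(body), len(body)) + body)

demo_count = count_packets(demo_path)
print(demo_count)
```

Under the same format assumption, count_packets("local.pcap") would be expected to return 100 for a file captured with -c 100.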

Otherwise, you can use examples/dump-live.py to write local traffic to a PCAP file. The syntax for this script is:

python examples/dump-live.py PATH INTERFACE LIMIT

where PATH is the path to the PCAP file to write, INTERFACE is the OS-specific network interface, and LIMIT is the number of packets to capture.

Creating a Catalog

The remaining examples assume the existence of a catalog description file, catalog.yml, in the same directory as local.pcap:

sources:
  raw_live:
    driver: pcap
    args:
      urlpath: ~
      interface: en0
      chunksize: 10
  raw_local:
    driver: pcap
    args:
      urlpath: '{{ CATALOG_DIR }}/local.pcap'
  tcp_local:
    driver: pcap
    args:
      urlpath: '{{ CATALOG_DIR }}/local.pcap'
      protocol: tcp
  udp_local:
    driver: pcap
    args:
      urlpath: '{{ CATALOG_DIR }}/local.pcap'
      protocol: udp

This file defines one live-capture source (raw_live) and three sources based on the sample data we created in the previous section (raw_local, tcp_local, and udp_local). We will now describe the output associated with each entry.
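
The urlpath values use the CATALOG_DIR placeholder so that local.pcap is found relative to the catalog file rather than the current working directory. The sketch below is a simplified stand-in for Intake's templating, not its actual implementation: it assumes the placeholder is replaced by the directory containing the catalog file, and the helper name expand_catalog_dir is hypothetical.

```python
from pathlib import Path


def expand_catalog_dir(urlpath, catalog_file):
    """Replace the CATALOG_DIR placeholder with the catalog file's directory."""
    catalog_dir = Path(catalog_file).resolve().parent
    return urlpath.replace("{{ CATALOG_DIR }}", str(catalog_dir))


# With catalog.yml in the current directory, the path resolves next to it.
resolved = expand_catalog_dir("{{ CATALOG_DIR }}/local.pcap", "catalog.yml")
print(resolved)
```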

Reading a Live Stream

To read a live stream of packets, you will need to start the Python interpreter or Jupyter as a privileged user (root on Unix-like systems).

NOTE: Intake does not currently support streaming packets from the network interface. Instead, packets are collected into a dataframe in chunks, and the chunk size can be adjusted with the chunksize argument.

Example: Unfiltered tcpdump

This example will show the first 10 packets on the default interface. Each packet will be timestamped, and IP addresses will be displayed numerically rather than resolved to host names. No packets will be filtered. The exact output will vary depending on your local machine:

$ sudo tcpdump -c 10 -tttt -n -q
2018-01-08 23:37:21.882212 IP 8.8.8.8.53 > 192.168.0.39.61362: UDP, length 172
2018-01-08 23:37:21.882927 IP 192.168.0.39.61447 > 52.12.34.56.443: tcp 0
2018-01-08 23:37:21.953415 IP 52.23.45.67.443 > 192.168.0.39.61445: tcp 0
2018-01-08 23:37:21.953528 IP 192.168.0.39.61445 > 52.23.45.67.443: tcp 0
2018-01-08 23:37:21.991435 IP 52.12.34.56.443 > 192.168.0.39.61447: tcp 0
2018-01-08 23:37:21.991523 IP 192.168.0.39.61447 > 52.12.34.56.443: tcp 0
2018-01-08 23:37:21.993620 IP 192.168.0.39.61447 > 52.12.34.56.443: tcp 517
2018-01-08 23:37:22.093955 IP 52.12.34.56.443 > 192.168.0.39.61447: tcp 0
2018-01-08 23:37:22.099580 IP 52.12.34.56.443 > 192.168.0.39.61447: tcp 1448
2018-01-08 23:37:22.099587 IP 52.12.34.56.443 > 192.168.0.39.61447: tcp 1448

Example: Get unfiltered stream of packets without catalog

This example is equivalent to the tcpdump example, except that the packets will be available in a dataframe. Unlike tcpdump, the plugin requires the network interface to be specified (typical values are en0 for macOS and eth0 for Linux):

>>> import intake
>>> ds = intake.open_pcap(None, interface='en0', chunksize=10)
>>> df = ds.read()
>>> df
                        time      src_host src_port         dst_host dst_port protocol
0 2018-01-09 07:42:36.055605   52.12.34.56      443     192.168.0.39    61614      tcp
1 2018-01-09 07:42:36.055682  192.168.0.39    61614      52.12.34.56      443      tcp
2 2018-01-09 07:42:37.839555  192.168.0.39    17500  255.255.255.255    17500      udp
3 2018-01-09 07:42:37.840472  192.168.0.39    17500    192.168.0.255    17500      udp
4 2018-01-09 07:42:37.890092  192.168.0.39    61614      52.12.34.56      443      tcp
5 2018-01-09 07:42:37.890243  192.168.0.39    61616      52.12.34.56      443      tcp
6 2018-01-09 07:42:37.912166   52.12.34.56      443     192.168.0.39    61616      tcp
7 2018-01-09 07:42:37.912237  192.168.0.39    61616      52.12.34.56      443      tcp
8 2018-01-09 07:42:37.912399  192.168.0.39    61616      52.12.34.56      443      tcp
9 2018-01-09 07:42:37.912833  192.168.0.39    61376     104.12.34.56     4070      tcp

Example: Get unfiltered stream of packets with catalog

This example is equivalent to the tcpdump example, except that the packets will be available in a dataframe. The raw_live data source is defined in the catalog above:

>>> from intake.catalog import Catalog
>>> c = Catalog("catalog.yml")
>>> df = c.raw_live.read()
>>> df
                        time     src_host src_port         dst_host dst_port protocol
0 2018-01-09 07:47:26.825023  192.168.0.1    36123  239.255.255.250     1900      udp
1 2018-01-09 07:47:26.825845  192.168.0.1    36123  239.255.255.250     1900      udp
2 2018-01-09 07:47:26.826602  192.168.0.1    36123  239.255.255.250     1900      udp
3 2018-01-09 07:47:26.827547  192.168.0.1    36123  239.255.255.250     1900      udp
4 2018-01-09 07:47:26.828168  192.168.0.1    36123  239.255.255.250     1900      udp
5 2018-01-09 07:47:26.829162  192.168.0.1    36123  239.255.255.250     1900      udp
6 2018-01-09 07:47:26.829865  192.168.0.1    36123  239.255.255.250     1900      udp
7 2018-01-09 07:47:26.830832  192.168.0.1    36123  239.255.255.250     1900      udp
8 2018-01-09 07:47:26.831615  192.168.0.1    36123  239.255.255.250     1900      udp
9 2018-01-09 07:47:26.832476  192.168.0.1    36123  239.255.255.250     1900      udp

Reading a PCAP File

Example: Unfiltered tcpdump

This example will show the first 10 packets from local.pcap. Each packet will be timestamped, and IP addresses will be displayed numerically rather than resolved to host names. No packets will be filtered. The exact output will vary depending on your local machine:

$ tcpdump -c 10 -tttt -n -q -r local.pcap
2018-01-09 00:16:12.210010 IP 192.168.0.39.54703 > 172.123.4.567.443: UDP, length 1350
2018-01-09 00:16:12.210910 IP 192.168.0.39.54703 > 172.123.4.567.443: UDP, length 998
2018-01-09 00:16:12.236176 IP 172.123.4.567.443 > 192.168.0.39.54703: UDP, length 1350
2018-01-09 00:16:12.236543 IP 172.123.4.567.443 > 192.168.0.39.54703: UDP, length 31
2018-01-09 00:16:12.236726 IP 192.168.0.39.54703 > 172.123.4.567.443: UDP, length 41
2018-01-09 00:16:12.236791 IP 192.168.0.39.54703 > 172.123.4.567.443: UDP, length 38
2018-01-09 00:16:12.251367 STP 802.1d, Config, Flags [none], bridge-id 7b00.01:23:45:67:89:00.8002, length 35
2018-01-09 00:16:12.252565 IP 172.123.4.567.443 > 192.168.0.39.54703: UDP, length 30
2018-01-09 00:16:12.313082 IP 172.123.4.567.443 > 192.168.0.39.54703: UDP, length 814
2018-01-09 00:16:12.313479 IP 172.123.4.567.443 > 192.168.0.39.54703: UDP, length 16

Example: Get unfiltered stream of packets without catalog

This example is equivalent to the tcpdump example, except that the packets will be available in a dataframe. Note that the output contains one fewer packet: the plugin shows only IP traffic, whereas tcpdump includes all traffic by default (here, the STP frame):

>>> import intake
>>> ds = intake.open_pcap("local.pcap")
>>> df = ds.read()
>>> df
                        time       src_host src_port       dst_host dst_port protocol
0 2018-01-09 08:16:12.210010   192.168.0.39    54703  172.123.4.567      443      udp
1 2018-01-09 08:16:12.210910   192.168.0.39    54703  172.123.4.567      443      udp
2 2018-01-09 08:16:12.236176  172.123.4.567      443   192.168.0.39    54703      udp
3 2018-01-09 08:16:12.236543  172.123.4.567      443   192.168.0.39    54703      udp
4 2018-01-09 08:16:12.236726   192.168.0.39    54703  172.123.4.567      443      udp
5 2018-01-09 08:16:12.236791   192.168.0.39    54703  172.123.4.567      443      udp
6 2018-01-09 08:16:12.252565  172.123.4.567      443   192.168.0.39    54703      udp
7 2018-01-09 08:16:12.313082  172.123.4.567      443   192.168.0.39    54703      udp
8 2018-01-09 08:16:12.313479  172.123.4.567      443   192.168.0.39    54703      udp
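
To illustrate why the STP frame is absent from the dataframe, the sketch below classifies Ethernet frames by their EtherType field. It is only an illustration of the general idea, not the plugin's actual logic, which this guide does not show. The helper name is_ipv4_frame is hypothetical, the frames are synthetic, and VLAN tags and IPv6 are ignored.

```python
import struct

ETHERTYPE_IPV4 = 0x0800   # EtherType value that marks an IPv4 payload


def is_ipv4_frame(frame):
    """Return True if an Ethernet frame carries an IPv4 packet."""
    if len(frame) < 14:
        return False      # too short to hold an Ethernet header
    # Bytes 12-13 hold the EtherType (or, for 802.3 frames, a length <= 1500).
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    return ethertype == ETHERTYPE_IPV4


# Synthetic IPv4 frame: zeroed MAC addresses, EtherType 0x0800, dummy payload.
ip_frame = bytes(6) + bytes(6) + struct.pack("!H", ETHERTYPE_IPV4) + bytes(20)

# Synthetic STP frame: 802.3 length field (38) followed by an LLC header,
# so the two bytes at offset 12 are a length, not the IPv4 EtherType.
stp_frame = (
    bytes.fromhex("0180c2000000") + bytes(6)
    + struct.pack("!H", 38) + b"\x42\x42\x03" + bytes(35)
)

print(is_ipv4_frame(ip_frame), is_ipv4_frame(stp_frame))
```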

Example: Get unfiltered stream of packets with catalog

This example is equivalent to the tcpdump example, except that the packets will be available in a dataframe. Note that the output contains one fewer packet: the plugin shows only IP traffic, whereas tcpdump includes all traffic by default (here, the STP frame):

>>> from intake.catalog import Catalog
>>> c = Catalog("catalog.yml")
>>> df = c.raw_local.read()
>>> df
                        time       src_host src_port       dst_host dst_port protocol
0 2018-01-09 08:16:12.210010   192.168.0.39    54703  172.123.4.567      443      udp
1 2018-01-09 08:16:12.210910   192.168.0.39    54703  172.123.4.567      443      udp
2 2018-01-09 08:16:12.236176  172.123.4.567      443   192.168.0.39    54703      udp
3 2018-01-09 08:16:12.236543  172.123.4.567      443   192.168.0.39    54703      udp
4 2018-01-09 08:16:12.236726   192.168.0.39    54703  172.123.4.567      443      udp
5 2018-01-09 08:16:12.236791   192.168.0.39    54703  172.123.4.567      443      udp
6 2018-01-09 08:16:12.252565  172.123.4.567      443   192.168.0.39    54703      udp
7 2018-01-09 08:16:12.313082  172.123.4.567      443   192.168.0.39    54703      udp
8 2018-01-09 08:16:12.313479  172.123.4.567      443   192.168.0.39    54703      udp

Filtering Data

The PCAP plugin shows only IP traffic. To see traffic from a single protocol, set the protocol argument on the data source to one of tcp, udp, icmp, or igmp.
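
These four names correspond to values of the protocol field in the IPv4 header (byte 9), using the standard IANA protocol numbers. The sketch below shows that mapping on a synthetic header. It is not the plugin's code, and the helper name ip_protocol_name is hypothetical.

```python
import socket
import struct

# IANA-assigned IP protocol numbers for the four supported names.
PROTOCOL_NAMES = {1: "icmp", 2: "igmp", 6: "tcp", 17: "udp"}


def ip_protocol_name(ip_header):
    """Return the protocol name stored in byte 9 of an IPv4 header."""
    if len(ip_header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    return PROTOCOL_NAMES.get(ip_header[9], "other")


# Synthetic 20-byte IPv4 header: version/IHL, TOS, total length, id,
# flags/fragment offset, TTL, protocol (17 = UDP), checksum, src, dst.
udp_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 48, 0, 0, 64, 17, 0,
    socket.inet_aton("192.168.0.39"),
    socket.inet_aton("52.12.34.56"),
)

print(ip_protocol_name(udp_header))
```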

If you are familiar with the powerful filter expressions of tcpdump, you will notice that the plugin’s filtering is limited by comparison at this time.

Example: Get filtered stream of packets without catalog

>>> import intake
>>> ds = intake.open_pcap("local.pcap", protocol='udp')
>>> df = ds.read()
>>> df
                        time       src_host src_port       dst_host dst_port protocol
0 2018-01-09 08:16:12.210010   192.168.0.39    54703  172.123.4.567      443      udp
1 2018-01-09 08:16:12.210910   192.168.0.39    54703  172.123.4.567      443      udp
2 2018-01-09 08:16:12.236176  172.123.4.567      443   192.168.0.39    54703      udp
3 2018-01-09 08:16:12.236543  172.123.4.567      443   192.168.0.39    54703      udp
4 2018-01-09 08:16:12.236726   192.168.0.39    54703  172.123.4.567      443      udp
5 2018-01-09 08:16:12.236791   192.168.0.39    54703  172.123.4.567      443      udp
6 2018-01-09 08:16:12.252565  172.123.4.567      443   192.168.0.39    54703      udp
7 2018-01-09 08:16:12.303790  172.123.4.567      443   192.168.0.39    54703      udp
8 2018-01-09 08:16:12.313082  172.123.4.567      443   192.168.0.39    54703      udp
9 2018-01-09 08:16:12.313479  172.123.4.567      443   192.168.0.39    54703      udp

Example: Get filtered stream of packets with catalog

>>> from intake.catalog import Catalog
>>> c = Catalog("catalog.yml")
>>> df = c.udp_local.read()
>>> df
                        time       src_host src_port       dst_host dst_port protocol
0 2018-01-09 08:16:12.210010   192.168.0.39    54703  172.123.4.567      443      udp
1 2018-01-09 08:16:12.210910   192.168.0.39    54703  172.123.4.567      443      udp
2 2018-01-09 08:16:12.236176  172.123.4.567      443   192.168.0.39    54703      udp
3 2018-01-09 08:16:12.236543  172.123.4.567      443   192.168.0.39    54703      udp
4 2018-01-09 08:16:12.236726   192.168.0.39    54703  172.123.4.567      443      udp
5 2018-01-09 08:16:12.236791   192.168.0.39    54703  172.123.4.567      443      udp
6 2018-01-09 08:16:12.252565  172.123.4.567      443   192.168.0.39    54703      udp
7 2018-01-09 08:16:12.303790  172.123.4.567      443   192.168.0.39    54703      udp
8 2018-01-09 08:16:12.313082  172.123.4.567      443   192.168.0.39    54703      udp
9 2018-01-09 08:16:12.313479  172.123.4.567      443   192.168.0.39    54703      udp
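
Because the plugin's own filter is limited, finer-grained selection can be done on the dataframe after reading. The sketch below uses ordinary pandas boolean indexing on a hand-built dataframe with the plugin's column names (the time column is omitted for brevity). It is roughly comparable to the tcpdump expression "src host 192.168.0.39 and dst port 443". The sample rows are illustrative, and the sketch assumes the port columns hold integers; if they hold strings, compare against '443' instead.

```python
import pandas as pd

# Hand-built stand-in for the dataframe returned by ds.read().
df = pd.DataFrame(
    {
        "src_host": ["192.168.0.39", "52.12.34.56", "192.168.0.39", "192.168.0.39"],
        "src_port": [61614, 443, 61616, 17500],
        "dst_host": ["52.12.34.56", "192.168.0.39", "52.12.34.56", "192.168.0.255"],
        "dst_port": [443, 61614, 443, 17500],
        "protocol": ["tcp", "tcp", "tcp", "udp"],
    }
)

# Keep only packets sent by the local host to port 443.
mask = (df["src_host"] == "192.168.0.39") & (df["dst_port"] == 443)
outbound_https = df[mask]
print(outbound_https)
```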

Displaying Packet Payload

By default, the full packet data is not included. To see the binary data, set payload=True on the data source. For example:

>>> import intake
>>> ds = intake.open_pcap("local.pcap", payload=True)
>>> df = ds.read()
>>> df
                        time       src_host src_port       dst_host dst_port protocol  payload
0 2018-01-09 08:16:12.210010   192.168.0.39    54703  172.123.4.567      443      udp  j23j4n234023023d
1 2018-01-09 08:16:12.210910   192.168.0.39    54703  172.123.4.567      443      udp  df9b9i293ivaiqid
2 2018-01-09 08:16:12.236176  172.123.4.567      443   192.168.0.39    54703      udp  j23irg93f9129ed1
3 2018-01-09 08:16:12.236543  172.123.4.567      443   192.168.0.39    54703      udp  ni23nf2jg92j3f91
4 2018-01-09 08:16:12.236726   192.168.0.39    54703  172.123.4.567      443      udp  12dj1nd1281j2d12
5 2018-01-09 08:16:12.236791   192.168.0.39    54703  172.123.4.567      443      udp  ni12rn30fj9j1j2e
6 2018-01-09 08:16:12.252565  172.123.4.567      443   192.168.0.39    54703      udp  18291n182d12j912
7 2018-01-09 08:16:12.303790  172.123.4.567      443   192.168.0.39    54703      udp  21nd91n2f192fn91
8 2018-01-09 08:16:12.313082  172.123.4.567      443   192.168.0.39    54703      udp  n93f293nf2398f23
9 2018-01-09 08:16:12.313479  172.123.4.567      443   192.168.0.39    54703      udp  9tt9090239d903g9
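
Binary payloads are easier to inspect as a hex dump. The sketch below is a small standard-library helper, not part of the plugin. It assumes that each value in the payload column is a bytes object, and the name hexdump is hypothetical.

```python
def hexdump(payload, width=16):
    """Format bytes as offset, hex values, and printable ASCII, one row per line."""
    pad = width * 3 - 1   # width of a full row of "xx xx xx ..." text
    lines = []
    for offset in range(0, len(payload), width):
        chunk = payload[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:04x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


sample = b"GET / HTTP/1.1\r\nHost: example\r\n"
print(hexdump(sample))
```

With a dataframe read using payload=True, the helper could be applied to a single cell, for example hexdump(df.payload[0]), provided the assumption about the column's type holds.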