Overview

The USC/ISI GAWSEED project publishes threat feeds, built from datasets gathered from global sources, that may be used to search enterprise logs for potential concerns.

Please do not redistribute any data found in these feeds without permission.

You may also wish to subscribe to the #isi-gift channel in Slack.

Status

These feeds should currently be considered ALPHA/BETA in quality and may occasionally be wiped, updated, refreshed, etc. as the project gathers feedback and works toward a stable platform.

The feed data to date is intended to show the breadth of data that fits into this space, not all of which is equally useful or relevant. For example, the mediawiki blacklist data is probably the least useful, but it shows where threat data can be pulled from. Not all of the feeds described are populated, and this feed list has a lot of additions lined up for early 2020.

Code and a framework for searching Kafka/Druid/Bro data for matches against threats in the feeds, and for producing reports, are available on GitHub. They need further documentation to be fully useful. See Wes for example configuration files for running them.

Kafka stream information

The threat feed is published via Kafka:

bootstrap_servers: ['k01.ant.isi.edu', 'k02.ant.isi.edu']

The servers are not currently open to the public. To request that your source address (or small block) be whitelisted, please contact Wes Hardaker.
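As a rough illustration of subscribing to the stream, the sketch below assumes the third-party kafka-python package and a topic named gawseed_ipaddress (the feed name used in the example data at the end of this document; actual topic names should be confirmed before use). The helper names decode_feed_item and consume_feed are hypothetical, and the consumer only works from an address that has already been whitelisted.

```python
import json


def decode_feed_item(raw_bytes):
    """Decode one Kafka message payload into a feed-item dictionary."""
    return json.loads(raw_bytes.decode("utf-8"))


def consume_feed(topic="gawseed_ipaddress", limit=10):
    """Print up to `limit` feed items from `topic`.

    Assumes kafka-python is installed and the caller's source address
    has been whitelisted by the server operators.
    """
    # Imported lazily so decode_feed_item() works without kafka-python.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,  # assumption: topic name taken from the example feed name
        bootstrap_servers=["k01.ant.isi.edu", "k02.ant.isi.edu"],
        auto_offset_reset="earliest",  # assumption: read from the oldest retained item
        value_deserializer=decode_feed_item,
        consumer_timeout_ms=10000,  # stop iterating after 10 s without messages
    )
    try:
        for count, message in enumerate(consumer, start=1):
            print(message.value)
            if count >= limit:
                break
    finally:
        consumer.close()
```

Calling consume_feed() would print the first few items; decode_feed_item() can be used on its own to decode a captured payload.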

Feed Format

Each Kafka topic will contain only a single type of data as a search key (e.g. just IP addresses, just DNS names, etc.). However, a topic may contain multiple sources of information, which are delineated by tags.

Each feed item will consist of a JSON dictionary with the following entries:

Datatypes

Datatypes currently in use:

- 'ip_address'
- 'url_regexp'
- 'url_raw'
- 'dns_regexp'
- 'dns_raw'
- 'ip_address_range' ['a.b.c.d','w.x.y.z']
- 'ip_address_with_mask' (a.b.c.d/n)
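The three IP-based datatypes can be checked against a log address with Python's standard ipaddress module. The following is a minimal sketch, assuming that ip_address_range values arrive as a two-element [first, last] list with both ends inclusive and that values are bare addresses without a port suffix; the function name ip_matches is hypothetical.

```python
import ipaddress


def ip_matches(candidate, datatype, value):
    """Return True if `candidate` (an IP string) matches a feed value."""
    addr = ipaddress.ip_address(candidate)
    if datatype == "ip_address":
        return addr == ipaddress.ip_address(value)
    if datatype == "ip_address_range":
        # Assumed shape: [first, last], both ends inclusive.
        first, last = (ipaddress.ip_address(v) for v in value)
        if addr.version != first.version or addr.version != last.version:
            return False  # never compare IPv4 against IPv6
        return first <= addr <= last
    if datatype == "ip_address_with_mask":
        # strict=False tolerates host bits set in the network address.
        return addr in ipaddress.ip_network(value, strict=False)
    raise ValueError(f"not an IP datatype: {datatype}")
```

For example, ip_matches("192.0.2.77", "ip_address_with_mask", "192.0.2.0/24") evaluates to True.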

Tags

Each feed item will be associated with a tag indicative of its source. Tags will be a ‘:’ separated list, starting with the source of the information (e.g. “uscisi” if it originated from our analysis), followed by more details about its sub-source.
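Because tags are ‘:’ separated, a consumer can select feeds by source or by any leading portion of the tag. The sketch below shows one way to do this; the helper names split_tag and tag_matches are hypothetical, and the sample tag is taken from the example feed data in this document.

```python
def split_tag(tag):
    """Split a ':'-separated tag into (source, list_of_sub_source_parts)."""
    source, *details = tag.split(":")
    return source, details


def tag_matches(tag, prefix):
    """Return True if `tag` starts with every component of `prefix`."""
    parts = tag.split(":")
    wanted = prefix.split(":")
    # Compare whole components so "uscisi:gaw" does not match "uscisi:gawseed".
    return parts[:len(wanted)] == wanted


source, details = split_tag("uscisi:gawseed:honeypot_ips:dests")
```

Comparing whole components rather than using a plain string prefix avoids accidental matches on partial names.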

Attributes

Any other supplemental, tag-specific attributes will be included in the other_attributes field. Note: the goal of the feed is to provide the minimal amount of information needed to search on. A future “callback” mechanism will be created to serve additional information about a particular entry. Thus, the attributes inserted are generally limited to just what is needed to perform a remote search on the data.

Priority

The priority value described in the field list below ranges from 1 to 10 and is a base indication of how dangerous a network administrator should consider a match against that feed tag. Each feed entry may have a ‘priority_adj’ value that should be added to the feed’s base_priority.
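The adjustment is a simple addition. The sketch below assumes that priority_adj appears as a top-level key of the feed item and may be encoded as a string (as other values in the example data are); the BASE_PRIORITY table and its value are hypothetical placeholders, not real feed settings.

```python
# Hypothetical base priorities keyed by tag; real values come from the feed list.
BASE_PRIORITY = {"uscisi:gawseed:honeypot_ips:dests": 5}


def effective_priority(item, base_priorities=BASE_PRIORITY):
    """Return the tag's base priority plus the item's optional priority_adj."""
    base = base_priorities[item["tag"]]
    # Assumption: a missing priority_adj means "no adjustment".
    adj = float(item.get("priority_adj", 0))
    return base + adj
```

No clamping to the 1-10 range is applied here, since the feed description does not say whether adjusted values are bounded.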

Update Frequency

The feeds are updated on a periodic basis. The ‘update_frequency’ indicates the typical period that the feed is expected to add new values. Note that repeated keys are possible from period to period and may serve as an indication that the threat is still current.

Source

If the source for the feed is external, a ‘source’ field below will indicate where the feed data came from.

Current Feed Items

GAWSEED honeypot source ips

IP addresses known to have logged into Internet honeypots. Connection matches from these IoCs to any destination should be suspect. Of particular importance are internal destinations running SSH servers, as a match may indicate a sign of compromise. Also important are outgoing connections to these addresses, which may indicate compromise or potential data exfiltration.

GAWSEED honeypot destination ips

IP addresses known to be outgoing targets of commands issued by intruders on honeypots. Outgoing connections to matching addresses from an enterprise could be early indicators of potential compromise or data exfiltration.

GAWSEED honeypot destination dns names

DNS names known to have been used by attackers logging into honeypots and downloading files from servers running on these names. The format of the value field will be a raw string coming from a DNS extraction. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google, etc) have a negative priority_adj score.

GAWSEED honeypot destination dns names

DNS names known to have been used by attackers when logging into honeypots and downloading files from servers running on these names. The format of the value field will be a regexp string coming from a DNS extraction. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google.com, etc) have a negative priority_adj score.
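The raw and regexp forms of the DNS feeds call for slightly different matching. The following is a minimal sketch, assuming that dns_regexp values are compatible with Python's re syntax (the feed does not state which regular expression dialect is used); the function name dns_name_matches and the sample pattern are hypothetical.

```python
import re


def dns_name_matches(name, datatype, value):
    """Match a queried DNS name against a dns_raw or dns_regexp feed value."""
    # DNS names are case-insensitive; ignore a trailing root dot.
    name = name.rstrip(".").lower()
    if datatype == "dns_raw":
        return name == value.rstrip(".").lower()
    if datatype == "dns_regexp":
        # Assumption: the pattern is valid Python `re` syntax.
        return re.search(value, name, re.IGNORECASE) is not None
    raise ValueError(f"not a DNS datatype: {datatype}")


# Hypothetical pattern matching a name and all of its subdomains.
pattern = r"(^|\.)bad\.example\.com$"
hit = dns_name_matches("x.bad.example.com", "dns_regexp", pattern)
```

A match found this way would then be weighted using the entry's priority_adj, so that popular names contribute less than rarely seen ones.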

domain expert dns names

DNS names known by human analysts to be of concern. These names may have been gathered from media, recent events, internal research, etc.

GAWSEED honeypot destination URLs

URL patterns known to have ties to intruders that have logged into honeypots and executed commands like wget/curl/etc. referencing these patterns. Currently this is mainly suffixes of URLs matching script names that have been downloaded. We ignore unique URL suffixes. Unfortunately, common false positives do exist in this data feed, since attackers have been routinely adding common path suffixes to their download targets, such as /21, /80 and /f.

GAWSEED honeypot destination URLs

URLs known to have ties to intruders that have logged into honeypots and executed commands like wget/curl/etc referencing these patterns. Currently this is mainly suffixes of URLs matching script names that have been downloaded. We ignore unique URL suffixes. Unfortunately, common false positives do exist in this data feed since hackers have been routinely adding common path suffixes to their download targets, such as /21, /80 and /f.

mediawiki blacklist

IP addresses known to have contributed spam to mediawiki based wikis. Connections from these addresses should be reviewed, though it is unclear how malicious the source might be.

GAWSEED DDoS sources

IP addresses known to have suddenly increased in sending potentially suspicious DNS requests to a domain.

GAWSEED DGA Sources

IP addresses known to have suddenly increased in sending potentially suspicious DNS requests to one or more public suffix list break points.

GAWSEED DGA Names

A regular expression to match DNS names that were seen in a potential Domain Generation Algorithm (DGA) DNS query storm. Similar DNS requests within an enterprise may be an indicator of machines running botnet software.

SANS/DShield recommend block list

From the source: This list summarizes the top 20 attacking class C (/24) subnets over the last three days. The number of ‘attacks’ indicates the number of targets reporting scans from this subnet.

SANS attack source list

From the source: “Information summary from the last 30 days about source IPs with return limit.”

SANS threat category research addresses

These are reportedly IP addresses belonging to research projects, and thus likely (though not certainly) safe. Useful for creating accept-lists.

GAWSEED DNS tunnel detection domain-names

DNS names from DNS tunnel broker endpoints. DNS tunnels are a common way to circumvent firewalls, restricted access, etc. They are less efficient than other tunnel technologies, so usage likely indicates suspicious behavior. Queries from enterprise networks may indicate hosts that are hiding traffic or violating network usage policies.

B-Root DNS Backscatter

This feed reports IPv4 and v6 addresses that show large amounts of DNS backscatter (reverse DNS queries from many different places). DNS backscatter often occurs when firewalls and IDS systems look up who is bothering them, and such IP addresses are often scanners or spammers. Backscatter also occurs in some benign behaviors, such as NTP servers. For details and technical papers, see https://ant.isi.edu/datasets/dns_backscatter/ . The current feed includes daily counts (an indication of level of activity). A future version will add classification (spammer, scanner, ntp, cloud, etc.). This feed could cause a high number of false positives if lots of spam is being received by the network from common spam sources. The priority_adj field should help remove some of the false positives by considering only records with a priority_adj of 1.

Spamhaus DROP list

From the source: Spamhaus Don’t Route Or Peer List (DROP) (c) 2020 The Spamhaus Project The DROP list will not include any IP address space under the control of any legitimate network - even if being used by “the spammers from hell”. DROP will only include netblocks allocated directly by an established Regional Internet Registry (RIR) or National Internet Registry (NIR) such as ARIN, RIPE, AFRINIC, APNIC, LACNIC or KRNIC or direct RIR allocations.

Spamhaus EDROP list

From the source: Spamhaus Extended DROP List (EDROP) (c) 2020 The Spamhaus Project EDROP is an extension of the DROP list that includes suballocated netblocks controlled by spammers or cyber criminals. EDROP is meant to be used in addition to the direct allocations on the DROP list.

Spamhaus DROPv6 list

From the source: Spamhaus IPv6 DROP List (DROPv6) (c) 2020 The Spamhaus Project The DROPv6 list includes IPv6 ranges allocated to spammers or cyber criminals. DROPv6 will only include IPv6 netblocks allocated directly by an established Regional Internet Registry (RIR) or National Internet Registry (NIR) such as ARIN, RIPE, AFRINIC, APNIC, LACNIC or KRNIC or direct RIR allocations.

dataplane vnc rfb list

From the source: Entries below consist of fields with identifying characteristics of a source IP address that has been seen initiating a VNC remote frame buffer (RFB) session to a remote host. This report lists hosts that are suspicious of more than just port scanning. These hosts may be VNC server cataloging or conducting various forms of remote access abuse. Each entry is sorted according to a route origination ASN. An entry for the IP address may be listed more than once if there are multiple origin AS (MOAS) announcements for the covering prefix. We use the Team Cymru IP address to ASN mapping service to construct an origin AS number and name.

dataplane dns ANY list

From the source: Entries below consist of fields with identifying characteristics of a source IP address that has been identified as sending recursive DNS IN ANY queries to a remote host. This report lists addresses that may be cataloging open DNS resolvers for the purpose of later using them to facilitate DNS amplification and reflection attacks. Each entry is sorted according to a route origination ASN. An entry for the IP address may be listed more than once if there are multiple origin AS (MOAS) announcements for the covering prefix. We use the Team Cymru IP address to ASN mapping service to construct an origin AS number and name.

dataplane SSH PW authentication attempts

From the source: Entries below consist of fields with identifying characteristics of a source IP address that has been seen attempting to remotely login to a host using SSH password authentication. This report lists hosts that are highly suspicious and are likely conducting malicious SSH password authentication attacks. Each entry is sorted according to a route origination ASN. An entry for the IP address may be listed more than once if there are multiple origin AS (MOAS) announcements for the covering prefix. We use the Team Cymru IP address to ASN mapping service to construct an origin AS number and name.

URLhaus abuse list

Note: entries marked as “offline” by the source have a reduced priority_adj. From the source: URLhaus is a project operated by abuse.ch. The purpose of the project is to collect, track and share malware URLs, helping network administrators and security analysts to protect their network and customers from cyber threats. Submissions to URLhaus are being shared with security solution providers, antivirus vendors and blacklist providers, including: Google Safe Browsing (GSB), Spamhaus DBL, SURBL.

URLhaus abuse list raw

Note: entries marked as “offline” by the source have a reduced priority_adj. From the source: URLhaus is a project operated by abuse.ch. The purpose of the project is to collect, track and share malware URLs, helping network administrators and security analysts to protect their network and customers from cyber threats. Submissions to URLhaus are being shared with security solution providers, antivirus vendors and blacklist providers, including: Google Safe Browsing (GSB), Spamhaus DBL, SURBL. This feed is identical to urlhaus:abuse:urls but is in raw form rather than regexp encoded.

Honeypot active root-kit and other binaries

These binaries were seen being used within a honeypot network. If transferred across an enterprise boundary, they almost certainly are indicative of a compromise. Some false positives may be present, but for known false positives a priority_adj field will be added. The other_attributes field will contain extra information that includes a file_type and mime attribute describing the contents of the file, as returned by various combinations of magic/mime analysis.

Top SANS malicious binaries seen

From the source: MOST PREVALENT MALWARE FILES COMPILED BY TALOS SECURITY INTELLIGENCE AND RESEARCH GROUP

CISCO Talos Intelligence IP block list

A blocklist compiled by Talos Intelligence at CISCO

abuse.ch feodotracker IP block list

From the source: Dridex, Heodo (aka Emotet) and TrickBot botnet command&control servers (C&Cs) reside on compromised servers and servers that have been rent and setup by the botnet herder itself for the sole purpose of botnet hosting. Feodo Tracker offers a blocklist of IP addresses that are associated with such botnet C&Cs that can be used to detect and block botnet C2 traffic from infected machines towards the internet. An IP address will only get added to the blocklist if it responds with a valid botnet C2 response. However, a botnet C2 may become offline later. The Botnet C2 IP Blocklist is available in different formats documented below.

abuse.ch malware tracker

New additions to abuse.ch in the last 48 hours, per the source

openphish URLs

Data feed collected from openphish.com; the terms of service for using the data can be found here: https://openphish.com/terms.html

openphish URLs

Data feed collected from openphish.com; the terms of service for using the data can be found here: https://openphish.com/terms.html

Parsons TPOT known bad IPv4 addresses

This is a list of known bad IP addresses from Parsons’ running t-pot instance. The Parsons feed carries the following attributes:

- asn: ASN associated with the IP address if known, ‘0’ if unknown
- commands_risk: simple interpretation of the command line commands attempted by the IP address; currently one of these values:
  - 0: no command seen
  - 3: commands seen
  - 6: particularly bad commands seen

Data Format:

{"timestamp": "%d",
 "value": "%s",
 "datatype": "ip_address",
 "tag": "parsons:gawseed:badips:tpot{%d}",
 "other_attributes": {"asn": "%d", "commands_risk": "%d"}}

Parsons TPOT cumulative bad IPv4 addresses

This is a list of bad IP information for a 1-day window across multiple Parsons t-pot instances. The Parsons feed carries the following attributes:

- count: total number of times the IP was seen
- tpots: number of tpots the IP was seen on
- winstart: timestamp of window start time
- winend: timestamp of window end time

Data Format:

{"value": "%s",
 "datatype": "ip_address",
 "tag": "parsons:gawseed:cumulative:badips:tpots",
 "other_attributes": {"count": "%d", "tpots": "%d", "winstart": "%d", "winend": "%d"}}
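Since the cumulative record encodes its numeric fields as strings, a consumer will usually want to convert them before use. The sketch below is one way to do so, assuming that winstart and winend are Unix epoch seconds (the format only says they are timestamps); the function name parse_cumulative and the sample values, including the documentation address 192.0.2.10, are hypothetical.

```python
import json
from datetime import datetime, timezone


def parse_cumulative(raw):
    """Convert a cumulative bad-IP record's string fields into typed values."""
    item = json.loads(raw)
    attrs = item["other_attributes"]
    return {
        "ip": item["value"],
        "count": int(attrs["count"]),
        "tpots": int(attrs["tpots"]),
        # Assumption: winstart/winend are Unix epoch seconds in UTC.
        "window_start": datetime.fromtimestamp(int(attrs["winstart"]), tz=timezone.utc),
        "window_end": datetime.fromtimestamp(int(attrs["winend"]), tz=timezone.utc),
    }


# Hypothetical record following the documented data format.
sample_record = json.dumps({
    "value": "192.0.2.10",
    "datatype": "ip_address",
    "tag": "parsons:gawseed:cumulative:badips:tpots",
    "other_attributes": {"count": "12", "tpots": "3",
                         "winstart": "1576627200", "winend": "1576713600"},
})
parsed = parse_cumulative(sample_record)
```

With typed values in hand, a report could, for instance, rank addresses by count or by the number of tpots that observed them.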

James Brine discovered brute force addresses

IP addresses attempting to brute force SSH discovered by James Brine. See https://jamesbrine.com.au/ for details.

IP Summary Feed

A summary of the daily IP addresses seen with a priority_adj

ISI/mcollins research IP addresses

The acknowledged scanner repository tracks the IP addresses and other indicia of scanners which we believe are non-hostile.

ISI/mcollins research IP address blocks

The acknowledged scanner repository tracks the IP addresses and other indicia of scanners which we believe are non-hostile.

SSH scans received at B-root anycast instance in ARI

IP addresses known to have attempted probing the SSH port at the USC/ISI root server in ARI.

SSH scans received at B-root anycast instance in AMS

IP addresses known to have attempted probing the SSH port at the USC/ISI root server in AMS.

SSH scans received at B-root anycast instance in IAD

IP addresses known to have attempted probing the SSH port at the USC/ISI root server in IAD.

SSH scans received at B-root anycast instance in LAX

IP addresses known to have attempted probing the SSH port at the USC/ISI root server in LAX.

SSH scans received at B-root anycast instance in SIN

IP addresses known to have attempted probing the SSH port at the USC/ISI root server in SIN.

SSH scans received at B-root anycast instance in MIA

IP addresses known to have attempted probing the SSH port at the USC/ISI root server in MIA.

SANS Suspicious Domain list - High

This list has since been retracted by SANS and is no longer active, but its description remains in the GAWSEED descriptive list for historical purposes. From the source: This list consists of High Level Sensitivity website URLs. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google, etc) have a negative priority_adj score.

SANS Suspicious Domain list - Medium

This list has since been retracted by SANS and is no longer active, but its description remains in the GAWSEED descriptive list for historical purposes. From the source: This list consists of Medium Level Sensitivity website URLs. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google, etc) have a negative priority_adj score.

SANS Suspicious Domain list - Low

This list has since been retracted by SANS and is no longer active, but its description remains in the GAWSEED descriptive list for historical purposes. From the source: This list consists of Low Level Sensitivity website URLs. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google, etc) have a negative priority_adj score. Note: this list may contain many false positives.

Covid-19 malware ip addresses

This source is deprecated but is kept in the GAWSEED source list for historical purposes. From the source: IP addresses known to be used by malware targeting people’s fear with respect to the COVID-19 pandemic. Note: the confidence level of this data is unknown and may result in false positives.

Covid-19 malware URLs

This source is deprecated but is kept in the GAWSEED source list for historical purposes. From the source: URLs known to be used by malware targeting people’s fear with respect to the COVID-19 pandemic. Note: the confidence level of this data is unknown and may result in false positives.

Covid-19 malware URLs

This source is deprecated but is kept in the GAWSEED source list for historical purposes. From the source: URLs known to be used by malware targeting people’s fear with respect to the COVID-19 pandemic. Note: the confidence level of this data is unknown and may result in false positives.

Covid-19 DNS names from malicious URLs

This source is deprecated but is kept in the GAWSEED source list for historical purposes. From the source: DNS names known to be used by malware targeting people’s fear with respect to the COVID-19 pandemic. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google, etc) have a negative priority_adj score. Note: the confidence level of this data is unknown and may result in false positives.

Covid-19 DNS names from malicious URLs

This source is deprecated but is kept in the GAWSEED source list for historical purposes. From the source: DNS names known to be used by malware targeting people’s fear with respect to the COVID-19 pandemic. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google, etc) have a negative priority_adj score. Note: the confidence level of this data is unknown and may result in false positives.
These are the raw DNS names, not regexp encoded.

Covid-19 newly registered DNS names

This source is deprecated but is kept in the GAWSEED source list for historical purposes. From the source: DNS names recently registered related to the COVID-19 pandemic. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google, etc) have a negative priority_adj score. Note: the confidence level of this data is very low and will result in (likely many) false positives.

JKellog90 Honeypot source ip addresses

This source is deprecated but is kept in the GAWSEED source list for historical purposes. IP addresses known to have logged into an SSH based honeypot. Matches to any connection should be suspect. Of particular importance are internal destinations running SSH servers, as a match may indicate a sign of compromise. Also important are outgoing connections to these addresses, which may indicate compromise or potential data exfiltration.

JKellog90 Honeypot binaries

This source is deprecated but is kept in the GAWSEED source list for historical purposes. SHA-256 sums of files uploaded to an SSH based honeypot.

JKellog90 Honeypot raw URLs

This source is deprecated but is kept in the GAWSEED source list for historical purposes. URLs accessed from within an ssh honeypot, likely containing malicious binaries.

JKellog90 Honeypot regexp URLs

This source is deprecated but is kept in the GAWSEED source list for historical purposes. Regexps for URLs accessed from within an ssh honeypot, likely containing malicious binaries.

Example Feed Data

Here are some example outputs from the gawseed_ipaddress feed:

{"timestamp": "1576627200",
 "value": "94.191.72.82:5893",
 "datatype": "ip_address",
 "tag": "uscisi:gawseed:honeypot_ips:urls:dests"}
{"timestamp": "1576627200",
 "value": "211.141.179.140",
 "datatype": "ip_address",
 "tag": "uscisi:gawseed:honeypot_ips:urls:dests"}
{"value": "14.191.226.204",
 "timestamp": "%s",
 "datatype": "ip_address",
 "tag": "uscisi:gawseed:honeypot_ips:dests"}
{"value": "37.204.101.200",
 "timestamp": "%s",
 "datatype": "ip_address",
 "tag": "uscisi:gawseed:honeypot_ips:dests"}
{"value": "140.207.46.136",
 "timestamp": "%s",
 "datatype": "ip_address",
 "tag": "uscisi:gawseed:honeypot_ips:dests"}
{"value": "149.129.57.214",
 "timestamp": "%s",
 "datatype": "ip_address",
 "tag": "uscisi:gawseed:honeypot_ips:dests"}
{"value": "121.170.222.92",
 "timestamp": "%s",
 "datatype": "ip_address",
 "tag": "uscisi:gawseed:honeypot_ips:dests"}
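Example output in this back-to-back, multi-line layout cannot be loaded with a single json.loads call. The sketch below shows one way to read such text using the standard library's json.JSONDecoder.raw_decode; the function name iter_feed_items is hypothetical, and the sample text reuses two of the records shown above.

```python
import json


def iter_feed_items(text):
    """Yield each JSON object from text holding several back-to-back objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


sample_text = '''{"timestamp": "1576627200",
 "value": "211.141.179.140",
 "datatype": "ip_address",
 "tag": "uscisi:gawseed:honeypot_ips:urls:dests"}
{"value": "14.191.226.204",
 "timestamp": "%s",
 "datatype": "ip_address",
 "tag": "uscisi:gawseed:honeypot_ips:dests"}
'''
items = list(iter_feed_items(sample_text))
```

Messages read directly from Kafka normally arrive one object per message, so this helper is mainly useful for saved or pasted output like the sample above.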