The USC/ISI GAWSEED project publishes threat feeds built from datasets gathered from global sources. The feeds can be used to search enterprise logs for potential concerns.
Please do not redistribute any data found in these feeds without permission.
You may also wish to subscribe to the #isi-gift channel in Slack.
These feeds should currently be considered ALPHA/BETA in quality and may occasionally be wiped, updated, refreshed, etc as the project gathers feedback and works toward a stable platform.
The feed data to date is intended to show the breadth of data that fits into this space, and some of it is less useful or relevant than the rest. For example, the mediawiki blacklist data is probably the least useful, but it shows an example of where threat data can be pulled from. Not all of the feeds described are populated, and this feed list has a lot of additions lined up for early 2020.
Code and a framework for searching Kafka/Druid/Bro data for matches against threats in the feeds, and for producing reports, are available on GitHub. They need further documentation to be truly useful. See Wes for example configuration files for running the framework.
The threat feed is published via Kafka:
bootstrap_servers: ['k01.ant.isi.edu', 'k02.ant.isi.edu']
The servers are not currently open to the public. To request that your source address (or small block) be whitelisted, please contact Wes Hardaker.
Each Kafka topic will contain only a single type of data as a search key (e.g. just IP addresses, just DNS names, etc.). A topic may, however, contain multiple sources of information, which are delineated by tags.
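As a rough illustration of reading from the brokers listed above, the sketch below uses the kafka-python client. The topic name is an assumption taken from the "gawseed_ipaddress" feed named in the example outputs, the broker port is left to the client default, and the helper names are hypothetical. It will only work from a source address that has been granted access.

```python
import json

# Broker names come from the text above; the topic name is an assumption
# based on the "gawseed_ipaddress" feed named in the example outputs.
BOOTSTRAP_SERVERS = ["k01.ant.isi.edu", "k02.ant.isi.edu"]
ASSUMED_TOPIC = "gawseed_ipaddress"


def decode_feed_item(raw_bytes):
    """Decode one Kafka message payload (UTF-8 JSON) into a dict."""
    return json.loads(raw_bytes.decode("utf-8"))


def iter_feed_items(topic=ASSUMED_TOPIC, servers=BOOTSTRAP_SERVERS):
    """Yield decoded feed items from a topic (needs network access)."""
    # Imported lazily so decode_feed_item() is usable without kafka-python.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        auto_offset_reset="earliest",  # start from the oldest retained message
        value_deserializer=decode_feed_item,
    )
    for message in consumer:
        yield message.value
```

A caller would loop over `iter_feed_items()` and pass each dictionary to whatever log-search code is in use.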
Each feed item will consist of a JSON dictionary with the following entries:
Datatypes currently in use:
- 'ip_address'
- 'url_regexp'
- 'url_raw'
- 'dns_regexp'
- 'dns_raw'
- 'ip_address_range' ['a.b.c.d','w.x.y.z']
- 'ip_address_with_mask' (a.b.c.d/n)
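To make the three IP-related datatypes concrete, here is a minimal matching sketch built on Python's standard `ipaddress` module. It assumes that an `ip_address_range` value arrives as a two-element list and that an `ip_address_with_mask` value is a CIDR string, as the notation above suggests. The function name is hypothetical.

```python
import ipaddress


def ip_item_matches(datatype, value, candidate):
    """Return True if the IP string `candidate` is covered by a feed item."""
    addr = ipaddress.ip_address(candidate)
    if datatype == "ip_address":
        return addr == ipaddress.ip_address(value)
    if datatype == "ip_address_range":
        # Assumed shape: [first, last], both inclusive.
        first, last = (ipaddress.ip_address(v) for v in value)
        return first.version == addr.version and first <= addr <= last
    if datatype == "ip_address_with_mask":
        # strict=False tolerates host bits being set in the prefix.
        return addr in ipaddress.ip_network(value, strict=False)
    raise ValueError(f"not an IP datatype: {datatype}")
```

For example, `ip_item_matches("ip_address_with_mask", "192.0.2.0/24", "192.0.2.77")` is true. Values that carry a port suffix would need to be split before being passed in.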
Each feed item will be associated with a tag indicative of its source. Tags will be a ‘:’ separated list, starting with the source of the information (e.g. “uscisi” if it originated from our analysis), followed by more details about its sub-source.
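The tag layout can be handled with a plain string split. The sketch below is illustrative only; the helper names are hypothetical, and the sample tag is taken from the example outputs at the end of this document.

```python
def split_tag(tag):
    """Split a ':' separated tag into (source, [sub-source details])."""
    source, *details = tag.split(":")
    return source, details


def tag_has_prefix(tag, prefix):
    """True if `tag` equals `prefix` or extends it with further components."""
    tag_parts = tag.split(":")
    prefix_parts = prefix.split(":")
    return tag_parts[: len(prefix_parts)] == prefix_parts
```

Comparing whole components rather than raw string prefixes keeps `uscisi:gawseed:honeypot_ips` from accidentally matching a tag such as `uscisi:gawseed:honeypot_ips_v2`.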
Any other supplemental, tag-specific attributes will be included in the other_attributes field. Note: the goal of the feed is to provide the minimal amount of information needed to search on. A future “callback” mechanism will be created to serve additional information about a particular entry. Thus, the attributes inserted are generally limited to just what is needed to perform a remote search on the data.
The priority value described in the field list below ranges from 1 to 10 and is a base indication of how dangerous a network administrator should consider a match against that feed tag. Each feed entry may have a ‘priority_adj’ value that should be added to the feed’s base_priority.
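The arithmetic is a simple sum. A small sketch follows, assuming that a missing priority_adj counts as zero and that the adjustment may arrive as either a number or a numeric string. The text does not say whether the result should be clamped to the 1-10 range, so this sketch does not clamp. The function name is hypothetical.

```python
def effective_priority(base_priority, priority_adj=0):
    """Add an entry's priority_adj to its feed's base_priority."""
    # int() accepts both 2 and "2"; negative adjustments lower the result.
    return int(base_priority) + int(priority_adj)
```

For instance, a feed with a base_priority of 5 and an entry carrying a priority_adj of -2 yields 3.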
The feeds are updated on a periodic basis. The ‘update_frequency’ indicates the typical interval at which the feed is expected to add new values. Note that repeated keys are possible from period to period and may serve as an indication that the threat is still current.
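One way to take advantage of repeated keys is to remember when each (tag, value) pair was last published. The sketch below assumes the timestamp field holds epoch seconds as a string, as in the example outputs. The function name and the in-memory dictionary are illustrative choices only.

```python
def record_sighting(last_seen, item):
    """Update `last_seen` in place; return True if the key was seen before."""
    value = item["value"]
    if isinstance(value, list):  # e.g. an ip_address_range pair
        value = tuple(value)     # lists cannot be used as dictionary keys
    key = (item["tag"], value)
    timestamp = int(item["timestamp"])
    repeated = key in last_seen
    # Keep the newest timestamp even if items arrive out of order.
    last_seen[key] = max(timestamp, last_seen.get(key, timestamp))
    return repeated
```

Entries whose last-seen time falls well behind the feed's update_frequency could then be treated as stale by the consumer.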
If the source for the feed is external, a ‘source’ field below will indicate where the feed data came from.
IP addresses known to have logged into Internet honeypots. Connection matches from these IoCs to any destination should be suspect. Of particular importance are internal destinations running SSH servers, as a match may indicate a sign of compromise. Also important are outgoing connections to these addresses, which may indicate compromise or potential data exfiltration.
IP addresses known to be outgoing targets of commands issued by intruders on honeypots. Outgoing connections from an enterprise to matching addresses could be early indicators of potential compromise or data exfiltration.
DNS names known to have been used by attackers logging into honeypots and downloading files from servers running on these names. The format of the value field will be a raw string coming from a DNS extraction. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google, etc) have a negative priority_adj score.
DNS names known to have been used by attackers when logging into honeypots and downloading files from servers running on these names. The format of the value field will be a regexp string coming from a DNS extraction. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google.com, etc) have a negative priority_adj score.
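The difference between the raw and regexp forms can be sketched as follows. This is only an illustration: it assumes the regexp values are compatible with Python's `re` module, that matching is case-insensitive, and that patterns are applied with an unanchored search unless they carry their own anchors. None of this is stated by the feed documentation, and the function name is hypothetical.

```python
import re


def dns_item_matches(datatype, value, qname):
    """Return True if the queried name `qname` matches a DNS feed item."""
    name = qname.rstrip(".").lower()  # drop the trailing root dot, if any
    if datatype == "dns_raw":
        return name == value.rstrip(".").lower()
    if datatype == "dns_regexp":
        return re.search(value, name, re.IGNORECASE) is not None
    raise ValueError(f"not a DNS datatype: {datatype}")
```

With a hypothetical pattern such as `(^|\.)example\.com$`, the name `www.example.com` matches while `notexample.com` does not.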
DNS names known by human analysts to be of concern. These names may have been gathered from media, recent events, internal research, etc.
URL patterns known to have ties to intruders that have logged into honeypots and executed commands like wget/curl/etc referencing these patterns. Currently this is mainly suffixes of URLs matching script names that have been downloaded. We ignore unique URL suffixes. Unfortunately, common false positives do exist in this data feed since hackers have been routinely adding common path suffixes to their download targets, such as /21, /80 and /f.
URLs known to have ties to intruders that have logged into honeypots and executed commands like wget/curl/etc referencing these patterns. Currently this is mainly suffixes of URLs matching script names that have been downloaded. We ignore unique URL suffixes. Unfortunately, common false positives do exist in this data feed since hackers have been routinely adding common path suffixes to their download targets, such as /21, /80 and /f.
IP addresses known to have contributed spam to mediawiki based wikis. Connections from these addresses should be reviewed, though it is unclear how malicious the source might be.
IP addresses that have suddenly started sending more potentially suspicious DNS requests to a domain.
IP addresses that have suddenly started sending more potentially suspicious DNS requests to one or more public suffix list break points.
A regular expression to match DNS names that were seen in a potential Domain Generation Algorithm (DGA) DNS query storm. Similar DNS requests within an enterprise may be an indicator of machines running botnet software.
From the source: This list summarizes the top 20 attacking class C (/24) subnets over the last three days. The number of ‘attacks’ indicates the number of targets reporting scans from this subnet.
From the source: “Information summary from the last 30 days about source IPs with return limit.”
These are supposedly IP addresses known to belong to research projects, and thus likely safe. Ish. Useful for creating accept-lists.
DNS names from DNS tunnel broker endpoints. DNS tunnels are a common way to circumvent firewalls, restricted access, etc. They are less efficient than other tunnel technologies, so usage likely indicates suspicious behavior. Queries from enterprise networks may indicate hosts that are hiding traffic or violating network usage policies.
This feed reports IPv4 and v6 addresses that show large amounts of DNS backscatter (reverse DNS queries from many different places). DNS backscatter often occurs when firewalls and IDS systems look up who is bothering them, and such IP addresses are often scanners or spammers. Backscatter also occurs in some benign behaviors, such as NTP servers. For details and technical papers, see https://ant.isi.edu/datasets/dns_backscatter/ . The current feed includes daily counts (an indication of level of activity). A future version will add classification (spammer, scanner, ntp, cloud, etc.). This feed could cause a high number of false positives if lots of spam is being received by the network from common spam sources. The priority_adj field should help remove some of the false positives by considering only records with a priority_adj of 1.
From the source: Spamhaus Don’t Route Or Peer List (DROP) (c) 2020 The Spamhaus Project The DROP list will not include any IP address space under the control of any legitimate network - even if being used by “the spammers from hell”. DROP will only include netblocks allocated directly by an established Regional Internet Registry (RIR) or National Internet Registry (NIR) such as ARIN, RIPE, AFRINIC, APNIC, LACNIC or KRNIC or direct RIR allocations.
From the source: Spamhaus Extended DROP List (EDROP) (c) 2020 The Spamhaus Project EDROP is an extension of the DROP list that includes suballocated netblocks controlled by spammers or cyber criminals. EDROP is meant to be used in addition to the direct allocations on the DROP list.
From the source: Spamhaus IPv6 DROP List (DROPv6) (c) 2020 The Spamhaus Project The DROPv6 list includes IPv6 ranges allocated to spammers or cyber criminals. DROPv6 will only include IPv6 netblocks allocated directly by an established Regional Internet Registry (RIR) or National Internet Registry (NIR) such as ARIN, RIPE, AFRINIC, APNIC, LACNIC or KRNIC or direct RIR allocations.
From the source: Entries below consist of fields with identifying characteristics of a source IP address that has been seen initiating a VNC remote frame buffer (RFB) session to a remote host. This report lists hosts that are suspicious of more than just port scanning. These hosts may be VNC server cataloging or conducting various forms of remote access abuse. Each entry is sorted according to a route origination ASN. An entry for the IP address may be listed more than once if there are multiple origin AS (MOAS) announcements for the covering prefix. We use the Team Cymru IP address to ASN mapping service to construct an origin AS number and name.
From the source: Entries below consist of fields with identifying characteristics of a source IP address that has been identified as sending recursive DNS IN ANY queries to a remote host. This report lists addresses that may be cataloging open DNS resolvers for the purpose of later using them to facilitate DNS amplification and reflection attacks. Each entry is sorted according to a route origination ASN. An entry for the IP address may be listed more than once if there are multiple origin AS (MOAS) announcements for the covering prefix. We use the Team Cymru IP address to ASN mapping service to construct an origin AS number and name.
From the source: Entries below consist of fields with identifying characteristics of a source IP address that has been seen attempting to remotely login to a host using SSH password authentication. This report lists hosts that are highly suspicious and are likely conducting malicious SSH password authentication attacks. Each entry is sorted according to a route origination ASN. An entry for the IP address may be listed more than once if there are multiple origin AS (MOAS) announcements for the covering prefix. We use the Team Cymru IP address to ASN mapping service to construct an origin AS number and name.
Note: entries marked as “offline” by the source have a reduced priority_adj. From the source: URLhaus is a project operated by abuse.ch. The purpose of the project is to collect, track and share malware URLs, helping network administrators and security analysts to protect their network and customers from cyber threats. Submissions to URLhaus are being shared with security solution providers, antivirus vendors and blacklist providers, including: Google Safe Browsing (GSB), Spamhaus DBL, and SURBL.
Note: entries marked as “offline” by the source have a reduced priority_adj. From the source: URLhaus is a project operated by abuse.ch. The purpose of the project is to collect, track and share malware URLs, helping network administrators and security analysts to protect their network and customers from cyber threats. Submissions to URLhaus are being shared with security solution providers, antivirus vendors and blacklist providers, including: Google Safe Browsing (GSB), Spamhaus DBL, and SURBL. This feed is identical to urlhaus:abuse:urls but is in raw form rather than regexp encoded.
These binaries were seen being used within a honeypot network. If transferred across an enterprise boundary, they almost certainly are indicative of a compromise. Some false positives may be present, but for known false positives a priority_adj field will be added. The other_attributes field will contain extra information that includes a file_type and mime attribute describing the contents of the file, as returned by various combinations of magic/mime analysis.
From the source: MOST PREVALENT MALWARE FILES COMPILED BY TALOS SECURITY INTELLIGENCE AND RESEARCH GROUP
A blocklist compiled by Talos Intelligence at Cisco.
From the source: Dridex, Heodo (aka Emotet) and TrickBot botnet command&control servers (C&Cs) reside on compromised servers and servers that have been rent and setup by the botnet herder itself for the sole purpose of botnet hosting. Feodo Tracker offers a blocklist of IP addresses that are associated with such botnet C&Cs that can be used to detect and block botnet C2 traffic from infected machines towards the internet. An IP address will only get added to the blocklist if it responds with a valid botnet C2 response. However, a botnet C2 may become offline later. The Botnet C2 IP Blocklist is available in different formats documented below.
New additions to abuse.ch in the last 48 hours, per the source
Data feed collected from openphish.com; their terms of service for using the data can be found here: https://openphish.com/terms.html
Data feed collected from openphish.com; their terms of service for using the data can be found here: https://openphish.com/terms.html
This feed contains known bad IP addresses from Parsons’ running t-pot instance. Note: the Parsons feed carries the following other_attributes:
- asn: ASN associated with the IP address if known, ‘0’ if unknown
- commands_risk: simple interpretation of the command-line commands attempted by the IP address, currently one of 0 (no command seen), 3 (commands seen), or 6 (particularly bad commands seen)
Data Format: {"timestamp": "%d", "value": "%s", "datatype": "ip_address", "tag": "parsons:gawseed:badips:tpot{%d}", "other_attributes": {"asn": "%d", "commands_risk": "%d"}}
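As an illustration of consuming this format, the sketch below turns one decoded item into a short summary. It assumes the attribute values arrive as numeric strings, as the "%d" placeholders in the data format suggest. The function name, the output layout, and the sample item in the usage note are hypothetical.

```python
# Labels follow the commands_risk values described for this feed.
COMMANDS_RISK_LABELS = {
    0: "no command seen",
    3: "commands seen",
    6: "particularly bad commands seen",
}


def describe_parsons_item(item):
    """Summarise a decoded Parsons bad-IP feed item."""
    attrs = item.get("other_attributes", {})
    asn = int(attrs.get("asn", 0))
    risk = int(attrs.get("commands_risk", 0))
    return {
        "ip": item["value"],
        "asn": asn or None,  # '0' means the ASN is unknown
        "risk": COMMANDS_RISK_LABELS.get(risk, "unrecognised risk value"),
    }
```

For a made-up item with value "192.0.2.1", asn "0" and commands_risk "6", this returns an unknown ASN and the label "particularly bad commands seen".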
This feed lists bad IP information for a 1-day window across multiple Parsons t-pot instances. Note: the Parsons feed carries the following other_attributes:
- count: total number of times the IP was seen
- tpots: number of tpots the IP was seen on
- winstart: timestamp of window start time
- winend: timestamp of window end time
Data Format: {"value": "%s", "datatype": "ip_address", "tag": "parsons:gawseed:cumulative:badips:tpots", "other_attributes": {"count": "%d", "tpots": "%d", "winstart": "%d", "winend": "%d"}}
IP addresses attempting to brute force SSH discovered by James Brine. See https://jamesbrine.com.au/ for details.
A summary of the daily IP addresses seen with a priority_adj
The acknowledged scanner repository tracks the IP addresses and other indicia of scanners which we believe are non-hostile.
The acknowledged scanner repository tracks the IP addresses and other indicia of scanners which we believe are non-hostile.
IP addresses known to have attempted probing the SSH port at the USC/ISI root server in ARI.
IP addresses known to have attempted probing the SSH port at the USC/ISI root server in AMS.
IP addresses known to have attempted probing the SSH port at the USC/ISI root server in IAD.
IP addresses known to have attempted probing the SSH port at the USC/ISI root server in LAX.
IP addresses known to have attempted probing the SSH port at the USC/ISI root server in SIN.
IP addresses known to have attempted probing the SSH port at the USC/ISI root server in MIA.
This service has since been suspended, but remains in the gawseed descriptive list for historical purposes. From the source: This list consists of High Level Sensitivity website URLs. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google, etc) have a negative priority_adj score. UPDATE: this list has been retracted by SANS and is no longer active, but this description is left for historic purposes.
This service has since been suspended, but remains in the gawseed descriptive list for historical purposes. From the source: This list consists of Medium Level Sensitivity website URLs. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google, etc) have a negative priority_adj score. UPDATE: this list has been retracted by SANS and is no longer active, but this description is left for historic purposes.
This service has since been suspended, but remains in the gawseed descriptive list for historical purposes. From the source: This list consists of Low Level Sensitivity website URLs. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google, etc) have a negative priority_adj score. UPDATE: this list has been retracted by SANS and is no longer active, but this description is left for historic purposes. Note: this may contain many false positives.
This source is deprecated but is kept in the GAWSEED source list for historical purposes. From the source: IP addresses known to be used by malware targeting people’s fear with respect to the COVID-19 pandemic. Note: the confidence level of this data is unknown and may result in false positives.
This source is deprecated but is kept in the GAWSEED source list for historical purposes. From the source: URLs known to be used by malware targeting people’s fear with respect to the COVID-19 pandemic. Note: the confidence level of this data is unknown and may result in false positives.
This source is deprecated but is kept in the GAWSEED source list for historical purposes. From the source: URLs known to be used by malware targeting people’s fear with respect to the COVID-19 pandemic. Note: the confidence level of this data is unknown and may result in false positives.
This source is deprecated but is kept in the GAWSEED source list for historical purposes. From the source: DNS names known to be used by malware targeting people’s fear with respect to the COVID-19 pandemic. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google, etc) have a negative priority_adj score. Note: the confidence level of this data is unknown and may result in false positives.
This source is deprecated but is kept in the GAWSEED source list for historical purposes. From the source: DNS names known to be used by malware targeting people’s fear with respect to the COVID-19 pandemic. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google, etc) have a negative priority_adj score. Note: the confidence level of this data is unknown and may result in false positives.
These are the raw DNS names, not regexp encoded.
This source is deprecated but is kept in the GAWSEED source list for historical purposes. From the source: DNS names recently registered related to the COVID-19 pandemic. A priority_adj should exist that reflects the confidence level of a particular entry being problematic based on its popularity within the DNS. Less popular entries are perceived to be more important, while more popular entries (github.com, google, etc) have a negative priority_adj score. Note: the confidence level of this data is very low and will result in (likely many) false positives.
This source is deprecated but is kept in the GAWSEED source list for historical purposes. IP addresses known to have logged into an SSH based honeypot. Matches to any connection should be suspect. Of particular importance are internal destinations running SSH servers, as a match may indicate a sign of compromise. Also important are outgoing connections to these addresses, which may indicate compromise or potential data exfiltration.
This source is deprecated but is kept in the GAWSEED source list for historical purposes. SHA256 sums of files uploaded to an SSH based honeypot.
This source is deprecated but is kept in the GAWSEED source list for historical purposes. URLs accessed from within an SSH honeypot, likely containing malicious binaries.
This source is deprecated but is kept in the GAWSEED source list for historical purposes. Regexps for URLs accessed from within an SSH honeypot, likely containing malicious binaries.
Here are some example outputs from the gawseed_ipaddress feed:
"timestamp": "1576627200",
{"value": "94.191.72.82:5893",
"datatype": "ip_address",
"tag": "uscisi:gawseed:honeypot_ips:urls:dests"}
"timestamp": "1576627200",
{"value": "211.141.179.140",
"datatype": "ip_address",
"tag": "uscisi:gawseed:honeypot_ips:urls:dests"}
"value": "14.191.226.204",
{"timestamp": "%s",
"datatype": "ip_address",
"tag": "uscisi:gawseed:honeypot_ips:dests"}
"value": "37.204.101.200",
{"timestamp": "%s",
"datatype": "ip_address",
"tag": "uscisi:gawseed:honeypot_ips:dests"}
"value": "140.207.46.136",
{"timestamp": "%s",
"datatype": "ip_address",
"tag": "uscisi:gawseed:honeypot_ips:dests"}
"value": "149.129.57.214",
{"timestamp": "%s",
"datatype": "ip_address",
"tag": "uscisi:gawseed:honeypot_ips:dests"}
"value": "121.170.222.92",
{"timestamp": "%s",
"datatype": "ip_address",
"tag": "uscisi:gawseed:honeypot_ips:dests"}