Network Analysis #2

Building a decent PCAP analysis engine has turned out to be a lot of work. Since my last post I decided to scrap the Postgres database and RESTful API design for an Elasticsearch backend. The decision was primarily motivated by how unscalable the entire setup was. Every time I wanted to add a new log I had to create a corresponding model. Initially this seemed simple, but defining constraints became a huge issue. For example, DNS queries sometimes returned hundreds of lines of response data, which would break INSERTs 1 in 10000 times. A schema-based backend forced me to define fields for every potential value, but many times only a fraction of the fields were populated, leaving tons of null data in my database. Another issue was the absolute shit data-transport protocol I improvised. Moving the data from analysis nodes to storage nodes often took twice as long as the actual analysis.

Switching to an Elastic backend was a huge pain, but it ended up being the perfect solution for the unstructured data I was storing. The effort paid off: I no longer have to define new log sources. Instead, the processing nodes translate each extracted log into a list of dictionaries, where each dictionary represents a row in the log. The result is wrapped in a JSON object, given a type, and then stored in an analysis index on my Elastic cluster.
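
For a sense of what that looks like in practice, here is a rough sketch of the kind of request a processing node ends up making; the index name, type, and field names below are illustrative assumptions, not the actual schema.

[code language="bash"]
# Hypothetical example: index one parsed dns.log row into the analysis index.
# Index, type, and field names are assumptions for illustration only.
curl -s -XPOST 'http://localhost:9200/analysis/dns_log' -d '{
  "submission_id": "pcap-42",
  "ts": "2016-11-22T21:00:00Z",
  "query": "example.com",
  "qtype_name": "A",
  "answers": ["93.184.216.34"]
}'
[/code]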

I cut analysis times from 2 minutes to 8 seconds by reading, not skimming, the BRO IDS documentation (really important to do that). Up to this point I had assumed BRO operated only in live mode and did not realize that -r would read PCAP files in offline mode, generating logs without reading directly off the network card. Previously I had been using tcpreplay to replay the PCAP over a physical network interface at max speed. This was fairly inefficient; even with PF_RING kernel modules installed the process took almost 2 minutes. I swapped the tcpreplay method for bro -r and got results almost instantly.
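
For anyone curious, the difference boils down to something like this; the interface name, file name, and site policy here are just illustrative.

[code language="bash"]
# Old approach: replay the capture over a real interface and sniff it live
tcpreplay --topspeed -i eth1 submission.pcap &
bro -i eth1 local

# New approach: read the capture offline; Bro writes its logs to the
# working directory without ever touching a network card
bro -r submission.pcap local
[/code]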

Another area I spent a lot of time on was the UI itself, which got a complete redesign. During each stage of the submission and analysis process the interface rearranges itself, displaying only the information relevant to that stage. When analysis of the PCAP file is complete, only the analysis panel and a small tools interface are shown. I also incorporated several jQuery-UI widgets to allow drag-drop and resizing of panels.

I took advantage of lobipanel.js’s built-in full-screen mode in case a user wants to focus attention on one specific panel.

Another concept I have been experimenting with is row-specific tools, the idea being that each row contains data which may warrant further analysis. I decided to categorize each potential cell value as a datatype using various regex patterns. When a user clicks “tools” on the left of any entry, that row is parsed and any fields which were assigned a datatype are extracted. I then generate a set of tools which can be used to provide further information about the extracted row data.
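
A minimal sketch of that datatype tagging, with deliberately simplified patterns (not the ones the UI actually uses):

[code language="bash"]
# Toy classifier: map a cell value to a datatype using regex patterns.
classify() {
  local value="$1"
  if echo "$value" | grep -qE '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    echo "ipv4"
  elif echo "$value" | grep -qE '^[a-fA-F0-9]{32}$'; then
    echo "md5"
  elif echo "$value" | grep -qE '^[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'; then
    echo "domain"
  else
    echo "unknown"
  fi
}

classify "8.8.8.8"       # -> ipv4
classify "example.com"   # -> domain
[/code]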

Two row-tools I’ve built so far are a simple IP2Geo tool as well as a whois lookup.


I also plan on adding tools which make it easy to pivot between corresponding connections in various BRO logs (those sharing connection UID).

The last major improvement came from incorporating Suricata into my analysis nodes. BRO is great at extracting protocol information and giving you a good idea of the content of the PCAP file. This provides context and is necessary for any decent PCAP analysis; however, out of the box BRO is not very good at telling you whether PCAP data contains indicators of malicious activity. Suricata, on the other hand, uses Emerging Threats signatures and can instantly tell whether or not malicious binaries, suspicious HTTP requests, or other IOCs exist within the capture.
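
Like BRO, Suricata will happily process a capture offline; a minimal sketch, with illustrative paths:

[code language="bash"]
# Run Suricata against the uploaded capture without a live interface
suricata -c /etc/suricata/suricata.yaml -r submission.pcap -l /tmp/suricata-out

# Any Emerging Threats signatures that fired end up in fast.log / eve.json
wc -l /tmp/suricata-out/fast.log
[/code]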


The next steps of the project are around making this tool actually useful. Up to this point I have been capturing a ton of data about individual PCAPs but ultimately throwing the PCAP away once analysis is complete. I want to allow the user to download the PCAP as well as artifacts extracted from it, and for this I am considering several large-scale storage options.

Hopefully, my next update will be weeks not months from now.

 

Network Analysis #1 – New Projects!

Two years ago I began working on SmartTorrent, a sentiment-analysis-based torrent search engine. The goal of the project was to rank torrent searches based on user content such as comments and determine whether or not a torrent was safe to download. At the time this seemed like a feasible goal, however as the project grew I began to realize how incredibly complex the problem actually was. I realized my approach was inherently flawed, as I was assuming some level of consistency in the semantic structure of the content I was analyzing. Frustrated by this and the monolithic pile of crap the tool had become, I decided to discontinue work on the project and begin work on a more feasible one: automating various aspects of network analysis for incident responders.

Actually, this has taken the form of several projects, the two foremost being a web-based (I hate the word “cloud”) packet-capture (PCAP) analysis engine (think VirusTotal with PCAPs) and a NetFlow log visualizer which identifies top talkers, potential lateral movement, and other incident-response-related metrics.

The web-based PCAP analysis engine allows a user to upload a PCAP file to our site, where it is offloaded to a processing node, replayed over a virtual network interface, and analyzed by several IDSs. The resulting analysis will return:

  1. Protocols found within the capture.
  2. Detailed logs of all connections.
  3. Signatures fired.
  4. A list of related PCAP submissions containing similar data.


The obvious value of this tool comes from its ability to group similar packet captures into one consolidated view. This allows an analyst to search our database using indicators such as IPs, host names, URLs, etc. and receive results which could be used to extend existing blacklists.

Over the next few months I will go into greater detail about each of these projects as I add features.

The link to the Netflow log visualizer can be found here. Please feel free to fork and improve.

 

 

Security, Sentiment Analysis, and Machine Learning

Human beings are unpredictable, and for those of us in security this poses a problem. Even the most resilient systems with multiple layers of security are subject to failure because of human error (usually incompetence). However, where individuals by themselves often make stupid errors that can compromise themselves or your infrastructure as a whole, groups of individuals together often reach intelligent consensuses. This phenomenon is known as group intelligence, and it can be a powerful tool for identifying evil in data-driven communities such as content-hosting sites, public forums, image-boards, and torrent sites.

The current solution to such problems involves hiring moderators to crawl comments/discussions and remove any spam or harmful download links from the site. Although this solution works, it is time-consuming, and depending on the size of your site it can be expensive as well. In the context of forums and image-boards, moderator bots are also used, but these bots usually only fire on a few key words or combinations of words, and don’t really persist or analyze this potentially useful data later. To make this data useful you need a way to persist, correlate, and quantify it. You also need a feedback loop that essentially learns based on new content.

For the sake of avoiding confusion I will use a torrent downloading site as an easy-to-understand illustration. Most torrent sites host multiple torrents and allow their users to rate and comment on each. The mechanism we will use to rate this content is called sentiment analysis, which will give us a negative or positive integer rating based on the “feeling” of each individual comment. These comment “feelings” are calculated based on content-specific criteria of good and bad words and signatures. The overall rating of the content can then be calculated by adding up the ratings of individual comments.


 

Here is a very simplified wordlist containing a few key words and their values, negative, positive, or neutral.

Negators:

no -1
not -1
never -1
Positive:

amazing +5
good +3
excellent +5
Negative:

awful -5
bad -3
horrible -5
malware -5
Context:

copy 0
video 0
software 0
Now let’s use this small amount of context to analyze the following comment.

“This is a good copy does not contain malware”

Bolded words are those that our sentiment analysis algorithm understands:

This is a good copy does not contain malware

Evaluated:

(good + copy) + (not * malware) = (3 + 0) + (-1 * -5) = + 8 rating
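
For the curious, here is a toy scorer that reproduces the arithmetic above. It is only an illustration of the wordlist idea (negators flip the sign of the next scored word), not a production sentiment engine, and it assumes bash 4 for associative arrays.

[code language="bash"]
#!/bin/bash
# Toy sentiment scorer for the wordlist above (illustration only).
declare -A score=( [amazing]=5 [good]=3 [excellent]=5
                   [awful]=-5 [bad]=-3 [horrible]=-5 [malware]=-5
                   [copy]=0 [video]=0 [software]=0 )
declare -A negator=( [no]=1 [not]=1 [never]=1 )

comment="This is a good copy does not contain malware"
total=0
sign=1
for word in $comment; do
  word=${word,,}                       # normalize to lowercase
  if [[ ${negator[$word]+x} ]]; then
    sign=-1                            # flip the next scored word
  elif [[ ${score[$word]+x} ]]; then
    (( total += sign * score[$word] ))
    sign=1
  fi
done
echo "rating: $total"                  # prints: rating: 8
[/code]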


 

Obviously, the flexible syntax of natural language can pose problems for even the most advanced natural language processing algorithms. However, as long as you are able to correctly interpret the majority of your content you do not need to worry so much about these outliers; after all, we are trying to find a consensus.

Once you have a consensus you can use this data to advance your knowledge-base through learning algorithms. For example, suppose you have one hundred comments for a particular torrent. Eighty of these comments received positive scores, twenty negative. Think about what this tells us: with a population of this size we can reasonably say the content is both safe and good quality. We can now use association algorithms to look for commonly reoccurring unknown words in our newly identified positive comments. From there we can say with some certainty that these newly identified words are positive, and add them to our existing positive wordlist. This information is persisted and used in the next analysis cycle. The same concept can be applied to negative words as well, and more specific “feelings” can be assigned to word groups by custom signatures.
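
A crude way to prototype that association step is plain word counting over the comments that scored positive, ignoring words already in the wordlists; the file names here are hypothetical.

[code language="bash"]
# Count the most frequent unknown words across positively scored comments.
tr ' ' '\n' < positive_comments.txt \
  | tr '[:upper:]' '[:lower:]' \
  | grep -vwFf known_words.txt \
  | sort | uniq -c | sort -rn | head -20
[/code]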

The ultimate goal is to make your data somewhat context aware, where each cycle of analysis builds on the previous cycle. This way, as the community content on your site grows, so does the overall “intelligence” of the algorithm. In the next few months I will be adding write-ups to this blog on my own “context-aware” security project and sharing what I have learned from the data I have gleaned.


~Jamin Becker

Security Analysts Discuss SIEMs – ElasticSearch/Logstash/Kibana vs ArcSight, Splunk, and more

 

Hello! The conversation below took place between multiple analysts who work in different security operations centers at multiple companies. Their names and affiliations in the actual discussion have been removed for privacy/attribution reasons.

Some of the contributors include but are not limited to:

Brandon Levene | @SeraphimDomain

Chris Clark | @xenosec

Christophe Vandeplas | @cvandeplas

Jeff Atkinson

Jorge Capmany | @theweeZ

Greg Martin | @gregcmartin

Max Rogers | @MaxRogers5

Hopefully others who are trying to build their security operations program can benefit from the discussion below!

 

Hi there,

The universal question that everyone has had in the past comes up for me
too. We are faced with a pleasant surprise in budgets, but an
unpleasant surprise in time-to-spend-the-money.

This means I don’t have the time to do real comparisons and tests with
the various potential products. Aka: We need to choose in 1 or max 2
weeks, and don’t even have the time to do the research / webinars /

Any advice about pros and cons is definitely welcome.

The use:

  • Detection of (targeted) malware and attacks (also needs to be able to automate searches based on IOCs, can be scripted)
  • Adds ‘context’ for the analysis of IDS alerts (so searches must be relatively fast, aka not 10 minutes per search)
  • Incident response : use IOCs to search for other victims, grouping must be possible (like group all source_ip’s)
  • 2 possible modes: data comes in live (production environment), data is batch imported (external incident response)

Volume:

  • Around 30 GB of data / day cleartext

What technical, usability, and price/quality pros and cons would
you have for

  • Splunk
  • ELK (Elasticsearch, Logstash, Kibana)
  • ArcSight

The goal here is not to be greedy with money. So if ELK is chosen we
will buy support and spend the ‘license price’ in a
consultancy/development investment.

I already have limited experience with Splunk and ELK, none with ArcSight.

Thanks a lot!
-Sam


I would take the ELK route. ArcSight is going to get very expensive, very quickly, since you will have to pay based on how much you’re pumping into it. Same goes for Splunk. Your dollars will most likely be better spent with ELK. ELK is very scalable, so should you want to add more log sources and your current setup can’t handle it…you just buy another machine and add it to the cluster. It’s a very simple process. You can also send almost any log format into ELK as long as you dedicate a little time to create “filters” inside Logstash that normalize the data. I’ve found the configuration of these filters to be very simple, and they work better when you have weird logs coming in. With ArcSight you’ll probably have to deal with “Smart Connectors” which are used to do the parsing of the logs. You can make custom connectors, but ELK seems to be the simplest with how easily you can filter logs.

If you have 30 GB a day, that’s only about 900 GB a month. If you buy just ONE Dell PowerEdge R720 and max it out, you will have 32 TB of disk space. This means on just one box you will be able to retain almost 3 years of logs. You just won’t get that type of retention elsewhere.

– detection of (targeted) malware and attacks (also needs to be able to automate searches based on IOCs, can be scripted):

Use Kibana to set up queries and automate them/build dashboards, and they will run on whatever interval you decide. These are fairly easy to set up. I believe it even supports sending SMS or email alerts if things start popping off.

– adds ‘context’ for the analysis of IDS alerts (so searches must be relatively fast, aka not 10 minutes per search):

In my experience of using ELK…it’s fast. Much faster than ArcSight ever was. I hear they are moving to the new CORR database though, and the demos I’ve seen of it are fast. Splunk…Splunk has always been very fast.

– Incident response: use IOCs to search for other victims, grouping must be possible (like group all source_ip’s):

ELK does this naturally and it’s very simple. You could easily do a query to see all hosts that reached out to a certain domain over the last X days and very quickly you will know every host as long as the logs are making it into ElasticSearch.

– 2 possible modes: data comes in live (production environment), data is batch imported (external incident response):

Very easy to configure any type of log to be sent to ELK. You can also very quickly spin up Logstash and start sending logs to the Logstash server ad hoc. A big part of why Logstash is so successful is that it gives you options on how to get logs to the server. You can run the Logstash agent on devices and it will “ship” logs to the central server, you can configure syslog using rsyslog, syslog-ng, or other syslog tools to send syslog to the central server, or you can use Lumberjack, which has the ability to ship logs from devices and encrypt the traffic so that the logs can’t be read over the wire.
WARNING: With any of these solutions, I HIGHLY suggest that you dedicate a person full time to maintaining the logs coming in and the health of the servers. Their role will be to create the filters that make logs query-able, configure devices to send logs to the central logging server, and make sure disk and CPU are being used adequately.

TL;DR Go ahead and buy as many Dell PowerEdge R720s as you can, max them out, and get someone in to teach you all about how amazing ELK is. Then make sure someone has the cycles to maintain the logs coming in and the system health of the servers. Buy the Logstash book; it’s a great starter guide.

I wish you the best of luck!

– Doug


At $dayjob we use both Splunk and Arcsight. Although we’re only starting to implement Arcsight (ESM + Logger) now, and my impressions are rather superficial, here they are:
With Arcsight the first decision you will have to make is whether you want physical or software appliances. You have typically two components, a logger, and an ESM (or an express appliance which is a mix of both).

Physical appliances will limit your storage options, heavily. Software appliances are basically VMs.

Logger does log management (surprise!) and provides basic analysis capabilities (I would say they are trying to mimic Splunk since logger 5).

Licensing model is deliberately confusing, or at least it was when we bought it. Logger uses different metrics for the licensing than ESM. EPS vs GB/day. At some point they also use “monitored devices” for licensing. Oh and also concurrent users of the console. They also charge for that.
http://www8.hp.com/us/en/software-solutions/arcsight-logger-log-management/tech-specs.html
http://www8.hp.com/us/en/software-solutions/arcsight-esm-enterprise-security-management/tech-specs.html

Correlation appears to be decent and works as advertised so far, but we have only implemented the basic use cases, bruteforce, etc.

It has some neat visualizations out of the box.

The console is java based. There are also “read only” and feature stripped web interfaces.

Price wise, it is the most expensive. Scaling it is also expensive. You might have to swap from physical to virtual appliances, etc.

“It’s not for big data stuff”, “you cannot just throw all your logs at it, you have to filter”. You get these warnings from $consultancy when you are about to implement.

From my PoV it is aimed at first level SoC and mostly searching for “known bads”, and tracking them (it has a case management tool built in).

 

Splunk:
License is pricey, but it’s worth it (probably not as expensive as ArcSight). It is also more transparent. You get X GB/day, you can have as many analysts as you want looking at it, and it can come from an unlimited number of devices.

Data analysis capabilities are better compared to Arcsight and Logger. Splunk search language is powerful.

Visualization capabilities are definitely better than Arcsight and Logger, but not so “out of the box” for some use cases e.g. visualizing alerts in terms of origin and affected objects (unless you buy or get splunk apps that have already done the visualization bit). You can even use d3js for visualizations now. To me, visualization is rather important.

Newer versions allow you to pivot on data with a functionality similar to excel pivot tables.

Correlations are slightly more complicated than with Arcsight.

If you want to, for example, search your historical logs for lists of indicators, it is probably the best tool between the two.

Web interface.

It’s also easy to scale to reduce the time of your searches.
It makes my life easier and helps model cases to feed into Arcsight.

Fully scriptable, you can add context with lookups. REST API. And sdks for multiple languages and web frameworks.

More of an analyst framework than a SIEM per se, but will fulfil a lot of the tasks that a SIEM does easily.
ELK stack:
Only starting to play with it now hence I cannot provide educated feedback. I am primarily looking into it for “secondary” logs that I cannot afford (yet) to have in Splunk.

My $0.02 ~ 0.014 Eur.

Happy to receive feedback, since my experience is limited by time/effort constraints.

– Randall


+1 to Sam’s points. Right on target.

One caveat to note is that Splunk in its current maturity (especially with the Enterprise Security app) is a lot closer to being “ready to go” out of the box. Since time is an issue, you may want to seriously consider a Splunk deployment as a short-term plan (year+); then slowly work ELK into what you want as a long-term, scalable management system.

Also consider long-term storage, i.e. duplicating streams for long-term retention into something “big data”. That will likely help in the long term with queries in your “main” solution.

-Kent


 

So far in this thread we have seen a few key factors mentioned:

  • ElasticSearch is very scalable
  • The purchase of any of these technologies should come with personnel to administer the technology
  • ArcSight’s pricing model can be very confusing
  • ArcSight is probably the most expensive of the three options
  • Splunk is very close to being ready “Out of the Box”

 


Love this thread. These are some of the decisions we are struggling with right now. We have been using ArcSight for years at $dayjob, so if you or anyone wants to talk specifics let me know. Just some high-level observations:

ArcSight ESM is great for SOC-type work. When you need real-time correlation, automation and workflow to support multiple people across multiple tiers (L1, L2, etc) working cases together it really shines. For example, we have a number of regex-based rules looking for exploit kit hits, followed by downloads of an exe to suggest successful compromise. I don’t know how you would do this kind of multi-stage correlation with a “search” type product, which is what I consider ELK and Splunk to be, but I could be wrong.

ArcSight Logger has been trying to catch up to Splunk for years, but they are always 2-3 years behind across all functionality. Right now we are suffering from being unable to scale storage/search for large volumes of data, so we are looking into ELK/Hadoop. For example, we collect about 250M-275M proxy log entries per day. Can’t use Logger to search across more than 3-5 days at a time before queries start taking too long to complete or timing out. Not exactly acceptable in my book, and I am not even talking about visualization/dashboards. Those are non-existent. I’ve had a number of conversations with a lot of folks there and they obviously know about this and have been making some advancements, especially with recent changes in product management, but that’s still at least a year from being seen in the product.

As a result of the above issues our current plan is to stream logs into Hadoop and then run ELK on top, since ELK can read/index data from HDFS. Doing this instead of just straight ELK because longer term we think we’ll want to run other tools on the data in Hadoop, specifically for visualization that ELK can’t do, or machine-learning type jobs for which we’d need MapReduce functionality.

Love this thread and other recent threads on ELK, so let’s keep it going.

-Tom


Can you give an example of what kind of workflows you’re managing with ArcSight? Is it only search/correlation or also business logic, reporting, collaboration, etc.?

-Michael


<disclaimer> Have only worked with Splunk </disclaimer>

One area that I immediately thought of in your use cases was host-based capabilities. I know that’s not part of the original equation, but I’d be curious to know if ELK would be cost effective enough to also consider a host-based (CarbonBlack, Tanium, etc.) logging solution (if not already present). Anyone else have any idea about that?

If you’re looking for possible context, hunting capability via a wider range of IOCs, etc. Then it would be a huge benefit to get those logs feeding in as well.

Just a tertiary thought.

-Steve


In my opinion, the two platforms you are looking at can be apples and oranges depending on how you implement. I think the answer to your question lies in the amount of resources you have to dedicate to either. ArcSight – as with any other SIEM – requires care and feeding, but the limitations of that tool also have a limiting effect on the run rate and effort required to realize value.

Splunk is a different animal. It is vastly more powerful than a SIEM, but requires a large up-front investment in people, process, and tech to provide the same level of value a SIEM will out of the box. You are then able to take it well beyond, into some very cool stuff.

-Robert


To add to the apples & oranges argument, I think it’s worth noting that IMO Splunk’s not a SIEM and ArcSight (or any other SIEM) is not a log manager/log searcher. Two different things, two different use cases and two different approaches.

If you’re into free-form, human-driven, what-if searches and explorations then Splunk (or any other free-form log manipulation toolset) will fare much better. Setting up alerting utilizing regexes and defined known-bad strings in a SIEM is not the same; your success comes from the intelligence the human users running it have.
If you want workflow-driven, automated, audit-able and repeatable first-line alerting for well defined conditions (and running defined regexes against data in real time is defined conditions IMO) then you’ll fare better with a SIEM. You won’t do as well if you want to drill into any of the great patterns provided on this list on the fly with most SIEMs though; Splunk et al will most likely outperform.

At the end of the day it’s your use case, and your human resources, that are the factors here. At $dayjob we run both approaches, as Splunk’s not providing the strict framework we need for BAU activities while the SIEMs we checked/deploy do not fare as well with free-range searches.

-Randy


Has anyone considered some way to manage multiple analysts? I’m trying to figure out how to allow collaboration between different domain experts working on the same incident.

-Michael


 

That’s what I meant in my earlier post with workflow vs go-nuts searches. If you do the latter, you probably need a very robust ticketing system that can be used to attach logs, data, screenshots etc. while keeping access rights controlled. Very different problem space 🙂

-Randy


 

As the conversation progressed, the following opinions surfaced:

  • ArcSight’s Logger technology seems to be trying to catch up to Splunk and just can’t compete.
  • ArcSight’s Logger tends to time out or take hours/days to pull back data based on queries
  • ArcSight works well when a team needs to have multiple analysts working on an incident. (Decent case management structure designed to aid SOCs)
  • Splunk and ELK don’t appear to be strict enough to assist in audit/compliance tracking

+1 to Robert’s points here.

ELK seems an ideal longer term and retroactive alerting/search
platform… between that and Splunk is more of a religious debate as
the functionality is very similar. This may very well be all that is
required depending upon your IR capabilities and overall maturity. It
“can do” nearly everything ArcSight can do, and this capability
(longer term retention with the ability to run queries and searches)
is a required first step as often you won’t have proactive
indicators/signatures.

A well tuned SIEM is worth the investment in time and energy to
facilitate top level time sensitive alerting and correlation rule
application. Yes, the log normalization and optimization is a bit time
consuming at first, but the ability to watch it light up like a
Christmas tree in real time as you detect every piece of an
adversary’s attack is priceless; gap analysis in your defenses is
almost instinctive in a properly tuned SIEM, as well as the ability to
perform very quick triage and preliminary DFIR investigation.

If you must choose ONE I’d go Splunk/ELK (Disclaimer: I don’t have
direct experience with ELK used in this way…) or similar solution
which allows for retroactive alerting off of cron’ed searches/queries
and IOC ingestion.

Ideally this will be topped by a minimal retention (3-5 days) SIEM
deployment in which your Intrusion Analysts can live, pivot, and
triage. I love ArcSight, it quickly becomes a part of you if you
harness it’s capabilities. I’d stay away from NITRO like the plague or
anything which attempts to tell YOU what’s bad/normalize internally
based on opaque criteria, but anything that allows for effective
“active lists” “active channels” and boolean filter/query/alerting
creation is a great front end.

TL;DR: If you are mature enough to actionably respond to real-time
alerting at scale, and constantly evolve those criteria based on
threat intel you want a SIEM on top of the required Longer term
logstorage/search/investigation platform (Splunk/ELK).

– Troy


As others have mentioned, beyond actually “using” the technology, carefully consider who is going to operate & maintain the system(s); this is often overlooked. In my experience some platforms are easier than others, but all require it.

If you don’t have a body to do maintenance and optimization, look at getting some managed support service from a vendor. Short term you can get usability out of the system until you can get body(s) in-house to operate.

-Nathan


At $dayjob we have both Splunk and ArcSight and I agree with the difference in usage. Ad hoc searches vs. workflow, automation, etc.

Great note by Nathan; both get up and running digesting standard logs quickly, giving a 75% solution in my opinion. The rest will take talent and/or vendor support.

Every shop has resource constraints that they have to operate within.  $dayjob prefers to put the money toward vendor support and we still have flexibility with these tools.  This abstracts the technology from the individuals building the system to help ease support and transition.

I admit my ELK experience is limited. That said, I am building out my custom platform with ELK and will see how it handles.

Thanks to all for the insight. We can open source our entire infrastructure but we can not reallocate those savings for additional people. So $dayjob chose to use the money and get vendor expertise, although your mileage may vary…

-Jerry


That’s a great point Nathan. The cap-ex vs op-ex issue is very real.

Beyond the bare minimum (network sensors and a log storage mechanism
for incoming web/email/dns/AV/access logs). I’d take people over
product all day long. Having millions in tools and watching thousands
of alerts fall on the floor daily is not a good time.
We’re moving forward, but right now comprehensive netsec is still a
human driven game though the force multipliers (tools) are becoming
much more effective … I think an honest accounting of your staff’s
core competencies is required first, and I’d be careful about relying on any
single points of failure on the personnel side… especially in
critical roles.

Replacing highly skilled folks in this environment is a time intensive
and expensive process as the industry is not standardized/mature…
there should at least be some available talent out there for
ArcSight/Splunk given the install base. Some of the more cutting edge
and/or custom solutions carry a much higher risk on the personnel side
finding the right mix of sec-analyst/engineer/developer is neigh
impossible.

-Troy


I wanted to chime in here on some general lessons from doing a SIEM deployment, and learning along the way. I didn’t use either of the products mentioned in the thread, but I thought that this may be helpful.

Your use cases fall into two categories, which I will call online (live, streaming, logs as they are coming in), and offline (searching historical logs for newly identified indicators). I do both of these with a tool that wasn’t really designed to do both.

TL;DR:

  • Given the choice, allocate more money toward analyst salaries and less toward software.
  • You need dedicated people for the care and feeding of the solution itself (admins/engineers), besides analysts to process the data.
  • Focus on a good, flexible “back-end-database-ish” solution, and build on top of it.
  • Use VIPs for syslog, because syslogs can be easily dropped.
  • Over-spec your LPD/EPS. The solution will always expand; it will never contract.

For example:
The tool that we use at $dayjob now is designed to be a live alerting SIEM. It is built on top of MS SQL Server, and the architecture is great for live alerting. That is what they designed it for. The rules language is powerful and it is easy to use. The limitation is the historical search capabilities.

The tool we used at $PreviousDayJob was a great distributed database and could return searches in no time flat. We put something else on top of it to do the SIEM analysis. The direct SQLish access was great for our analyst to build his own scripts to query against.
General:
Lesson #1: “Offline” searching of the last “N days” of logs is much different from alarming and alerting on logs as they come in.
Lesson #2: Not all tools do both “online” and “offline” well; some do both at a mediocre level and neither well, and some do one well and the other not so much.

Architecture:
Lesson #1: There are limitations to VM deployments and SAN architecture that will show up in high volume log capture and SIEM deployments. Appliances are more expensive, but can be higher performance. SAN/VM deployments can be cheaper, but you have to be mindful of performance.
Lesson #2: The time it takes your people (or consultants) to tune and optimize the VM infrastructure may make it worth purchasing appliances
Lesson #3: Regardless of which you choose, you will need more of it than you think, more of it than you estimate, and more of it than what you think you know to be the proper log volumes.
Lesson #4: Some SIEM solutions just stop accepting logs when they hit their LPD/EPS/MB limits.
Lesson #5: Syslogs can be dropped (even with NG). You need to watch network volumes, especially on VM’s. You probably want to use a VIP for log collection if you have high volume syslogs.
Lesson #6: Make a list of every device you own, and get a sample of the log. Make it part of the contract that they actually can parse each of those log source types into the fields you want.

Use Cases:
Lesson #1: Regression searches are not built in to most platforms (e.g. search for one IP address, find all sources that communicated with that malicious IP, find all IPs that communicated with those source IPs, filter out known ones (e.g. Google, Yahoo, known corporate partners), etc.). You will have to build this yourself.

Lesson #2: A good analyst is worth more than any product. If you have your choice of allocations, more analysts is better than a more complex SIEM engine to do online alerting.

Staffing:
Lesson #1: You need separate teams of dedicated analysts to process the data, and engineers/admins to administer the platform and keep it running.

-Tim


I concur that log management/search is a different core capability from SIEM. As the vendors in different spaces have matured, they have added more features of the other (e.g. Splunk is becoming more SIEM-like).

For starting out, I’m in the camp of getting your arms around the security
significant logs first for IR purposes, then layering in whatever alerting
is necessary to move forward. This drives me to start with Splunk/ELK.
Each time I’ve stood this up, we found that we got to 85-90% of the total
goal of the capability (log collection + correlation + alerting), and the
additional cost required to stand up a SIEM on top wasn’t worth the
additional value. More/Better analysts has always been our choice.

Agreed that Splunk/ELK is a bit of a religious war, and I think they
provide very similar capabilities. From my experience, I have seen Splunk
do well for a couple core reasons (each of these in my experience… not
necessarily universal):

  • It has been more stable
  • Easier to onboard data (DBConnect, Universal Forwarder, scripted inputs). More user friendly to define field extractions/normalization after indexing
  • More support from the vendor
  • Better acceptance rate from analysts (Partly due to going through training but the search language also seems a bit easier)
  • Enterprise Security (ES) app, which is a premium app, provides some of the capabilities typically found in a SIEM. It also has a reasonable threat feed framework which is easily extensible.

Nothing’s a silver bullet, but that’s my .00004BC

-Jack


TL;DR – Go with ELK. Spend your money on a few servers and professional services for implementation.

I’ve been there, done that, with all aspects of what you’re asking. The “you need to spend it now” aspect is always a good time. Others have pointed out cap-ex vs op-ex, but it’s important to understand if this is a one-time pot of money or a perpetual bump. If it’s just one time, then your commercial products simply aren’t an option. Both ArcSight and Splunk have a significant annual spend aspect. Further, those costs will only increase with time/amount of events. Buy a bunch of servers, some proserv to help you get ELK all set up, and call it a day.

Setting aside the money aspect, the other thing I’ve been through is having used all tools you mention in a larger environment, so I’ll try and walk through our experiences with them.

We used to rely solely on ArcSight ESM. That might have worked eight years ago, but even six years ago it couldn’t keep up. It was barely able to keep up with the event throughput you’re asking about. While they have worked on event throughput, they have a fundamental architecture issue. ESM cannot scale beyond a single “app” server. This means all of the correlation aspects can only scale to the largest Linux capable server you can find. On top of that, the rules language is nonsense and you’ll need to send people to training to get the most out of it. We added ArcSight Logger about four years ago (pretty much when it came out) to get reasonable event retention, but even with 8 servers, it was dropping events and searching was stupid slow (24 hours to search over 14 days of data – searching for anything longer would cause the logger to crash). And even with all that we had spent on ArcSight (it was a lot), we weren’t even really using the event correlation capability. It was doing stupid things like X logins in Y minutes, but that doesn’t need to be done real time. A year and a half ago, we walked away from all things ArcSight.

Our grand design for ArcSight replacement was a combination of Splunk and Hadoop. We dual stream events into Splunk and Hadoop simultaneously and then set a retention in Splunk to drop events after 90 days. We shoehorned event correlation into Splunk with a combination of saved searches and a home-grown system sitting outside of Splunk that takes the results, does some aggregation, and then sends the results as alerts into our case management system. Hadoop is long term event retention as well as larger correlation/analysis jobs. For instance, we automatically mail all admins a list of their logins for the last 24 hours so they can spot any inconsistencies.

Hadoop works really well for retention and bigger jobs. About 6 months ago, we realized we had some issues with Splunk. One of them is that its scalability and high availability seems bolted on. Plus, indexers have pretty heavy IO requirements, so your servers are spendy. As our EPS kept increasing, we were teetering on a decision point, faced with having to buy another $25k server. Looking at all that, we’ve made a decision to walk away from Splunk. For what it’s worth, we’re pushing about 150k EPS today.

Our new design is ELK + Hadoop + Spark + custom event normalizer. The nice thing about this approach is that it is highly scalable (just add more cheap servers) and 100% open source. This means that our data is 100% free to do with as we please. We don’t have to worry about future flexibility as our data isn’t locked up. With the way that ELK, Hadoop, and Spark scale, we’re shooting for something in the range of 1M Events Per Second and everything tells us it’s doable. We use ElasticSearch for another open source project (moloch full packet capture) and it scales very well performance wise (we can search across 60TB of data in less than a minute, and searches for shorter time windows do come back in seconds). We’re not yet using percolators, but they have been designed for simple rules based triggers.

Hadoop sits right alongside our ELK install in the new architecture. We dual deliver (Flume for HDFS and Logstash for ElasticSearch) to both our short term event search system (ELK) and to our analysis and repository (Hadoop). We use Cloudera Enterprise for HDFS, MapReduce, Impala, and Spark. The management interface makes spinning up new cluster nodes or new products/features very trivial.

Ultimately, I can’t speak highly enough for ELK. It’s come a long way in a year, and there’s even cooler stuff on the horizon. We’re very happy with what we’ve seen thus far and it’s a critical part of our new log management system.

A few recommendations:

  1. Don’t use VMs for ElasticSearch. It -might- be possible to make it work, but we tried VMs and then switched to physical hardware. Lots of stability issues we had just magically disappeared.
  2. Invest a bit in ElasticSearch’s “developer” support for ELK. It’s essentially a PS engagement for a few months to help you get it all set up and tuned correctly. ElasticSearch also offers ongoing support, but it’s that developer support that will help you hit the ground running.
  3. Use many, cheap boxes. ES scales horizontally quite nicely. We spend roughly $4k per box and we don’t have to think twice if we need more capacity. This also means you don’t have to buy today for what you’re expecting a year from now. You can just buy for today and then add more tomorrow.

Once we actually have all of our new system laid out, I’d be happy to share if people are interested. We’re planning on open sourcing the event normalizer so that the whole thing is all open source.

-Dan


 

At this point it seems that ArcSight has won the hearts of those who want to create a formalized and tiered Security Operations Center, but ELK/Splunk type tools provide a more adaptable, dynamic, and innovative platform. Some believe Security Operations Centers should use SIEM technologies such as ArcSight for only a few days’ worth of live/real-time event alerting and leave log management/historical needs to ELK or Splunk:

  • More talk on who will administer the technology you choose
  • It seems that people use tools such as ArcSight or ELK/Splunk as both log management tools and SIEMs
  • If your current security program is mostly human driven (someone retroactively searching for evil) go with an ELK or Splunk instance
  • If your program is mature enough to respond to security incidents in real time at large scale, a SIEM may be the right move.
  • “ElasticSearch scales horizontally quite nicely. We spend roughly $4k per box and we don’t have to think twice if we need more capacity. This also means you don’t have to buy today for what you’re expecting a year from now. You can just buy for today and then add more tomorrow.”
  • I believe that this statement holds a lot of truth: “If you must choose ONE I’d go Splunk/ELK (Disclaimer: I don’t have direct experience with ELK used in this way…) or similar solution which allows for retroactive alerting off of cron’ed searches/queries and IOC ingestion. Ideally this will be topped by a minimal retention (3-5 days) SIEM deployment in which your Intrusion Analysts can live, pivot, and triage. I love ArcSight; it quickly becomes a part of you if you harness its capabilities. I’d stay away from NITRO like the plague or anything which attempts to tell YOU what’s bad/normalize internally based on opaque criteria”

I hope not to offend with this, but…

Temper your expectation of Splunk Professional Services – especially on the security side. They are perfect for deployment-type, operational tasks, but fall down with anything complex or advanced…we went through several before we gave up and started requesting engineers.

Having said that, Fred Wilmot at Splunk is maybe one of the smartest people I know. They have brilliant folks, you just have to ask.

-Robert


 

Agreed, not meant to offend either but my experience with PS was less
than satisfactory.

They struggled with basics, like field extractions, regex, and even
their own application. Sadly, we’ve never had a working ES install; but
I did utilize their TAs heavily. Be wary of their RegEx, it may work for
most, but taking a few minutes to check the transforms you use is well
worth it.

I love Splunk, and spend most of my day hunting through 180 days of
data. I also invested a good bit of time field extracting as much as I
could, in a way that lets us use the CIM to search through as much data
as possible, effectively.

– Matt


 

Next, the discussion shifts to whether or not a SIEM type solution could be hosted in the cloud.


I’ve noticed server-investment raised a few times, and it is indeed a hassle for me too. Has anyone ever considered building their SIEM solution in the cloud?

-Michael


Whilst it is possible to do (you can certainly do it with ArcSight and I see no reason why you can’t do it with the others), there would be a few considerations (one might be the delivery of logs in a timely manner if you have large amounts of data going into the cloud such as Firewall logs) and of course you also need to be blessed by your regulators to do it (if you are subject to the fantastic joy which is regulation).

-Trevor


I am guessing uploading to the cloud will be about as fast as routing to a private-cloud SIEM in a central location, especially for global orgs.

Anyone running cloud-based SIEM and would be willing to share his experience, here or privately?

-Michael


We’ve played with Storm a bit (Splunk’s cloud offering), but not seriously. I have a friend who gets great use out of it as a log repo for various honeypots, but nothing more serious than that. It can be a bit delayed, depending on a number of factors.

While we’re talking about cloud here, I’d love to know how folks are getting logs from cloud providers to their store. We’ve not yet found an elegant way to do this for a couple of our providers…and they aren’t exactly helping.

-Robert


I would take a look at Sumo Logic; it was built by the original developers of ArcSight who wanted to do cloud SIEM right.

I’ve seen many attempts to run SIEM in a virtualized environment and almost all of them failed.

I would strongly recommend against it unless you’re using a built-from-the-ground-up cloud product like Sumo Logic, Loggly, Splunk Cloud, etc.

-Mike


One more product that might be of interest.

There is one company that has a solution that is somewhat similar to the structure of ELK (in that it uses nodes for compute and storage distributing both the DB and CPU across commodity systems). It used to be known by the name “Sensage”, but now goes by Hawkeye. I received a sales call from these guys a while back and they were doing a makeover of Sensage into a full fledged SIEM. I know that the old Sensage was a good platform for log storage and queries/reporting, but I know nothing about the “new hotness” from Hexis that uses it as a back end.

http://www.hexiscyber.com/products/hawkeye-ap/extending-your-existing-siem

-John


Maybe the best answer is to look into using both in your infrastructure. In my experience, Splunk is a very powerful tool for initial aggregation and high-level analysis of data. I use it pretty often for gluing different data sets together and prototyping ideas, as it’s very tolerant of different log formats (check out the cefkv add-on for processing ArcSight CEF files). Then I usually implement the prototypes into production analytics using open source stacks such as ElasticSearch or Hadoop.

If you’re building custom analytics or processes to search and act on data at near real-time speeds, I’d check out ElasticSearch (neat statistics capabilities, and great free-text search). If you’re feeling really ambitious Apache Storm or Druid.io for large scale statistics. If you don’t have a dedicated development team but want to build interactive workflows you can still do quite a bit using Splunk.

-Elon


 

It seems like there are several different type of product functionality being discussed:

  • Data aggregation:
    – Splunk, ELK, NetWitness, etc.

    • Log collection
    • Packet collection
  • Threat detection:
    – Seems to be mainly open source/home-grown solutions.

    • Data enrichment
    • Threat intelligence
    • Correlation functionality
  • Work-flow: 
    – Seems to be ArcSight, but high overlap with Splunk… What else is in here?

    • Audit trail / Work-logging
    • Collaboration
    • I’m not sure what else goes here, but gut says this is important. Ideas?
Anything else I missed?
It seems that putting this all under the umbrella of “SIEM” doesn’t cut it anymore. Too many different requirements for one vendor to cover well. What else do you think is missing? Do you have any recommendations for every domain separately? How do you manage all the different solutions? What’s missing? Maybe there’s a place for some open source development?
-Michael

Good call in pulling these out into different categories. We often use SIEM to describe all of these as one thing, but it’s clear that we should be moving away from that in both discussion and in practice. The reason ArcSight fails at its job is because it tries to solve $WORKFLOW, $THREAT-DETECTION, and $DATA-AGGREGATION. I’d propose changing $WORKFLOW to the term $CASE-MANAGEMENT. Workflow could mean interacting with multiple tools, but what we are really talking about is case management. Tools that I see being under the $CASE-MANAGEMENT section include things like:

  • Jira | https://ucdavis.jira.com/wiki/display/PMO/Computer+Security+Incident+Response+Capability
  • ArcSight | Lots of case management/ticket system functionality for incidents
  • RSA Archer | Case management tool to track incidents, keep audit trail, collaborate on incidents

I’m sure many others can add to my limited list.

The day we stop buying SIEMs to solve all 3 problems and instead use (ELK or Splunk) + Jira + (NetWitness or Moloch) + (Snort, host-based detection, real-time sandbox detection, etc.) will be a good day.

-Doug


Here’s how we look at it (I tried ascii art, but failed):

Log collection (rsyslog + custom log fetchers where needed) feeds into log aggregation (rsyslog) which feeds into log normalization (to be finalized, we have POC code now). The normalization tier is a bus that simultaneously feeds log search (ES) and log retention (Hadoop) and log correlation/alerting (Spark) and log enrichment (this is where threat feeds fit in). Enrichment does its thing and feeds back into log aggregation with the original log line added to the new enrichment (which all feeds back through the search/retention/correlation/alerting layer). Alerting then feeds into our case management system, which does include workflow, contact management, etc (this last bit is a $DayJob app, which we’ll be sharing for free soonish). The alerts will include enriched data, as well as pre-populated searches back into ES for easy investigations. Hadoop is there for long term retention and also running analysis jobs.

-Dan


Btw, for those interested. Below is an overview of the approximate prices of Elasticsearch support.

Platinum support: 1h/4h/1d response
– per node 9k EUR
– 10 nodes : 45k EUR
– 25 nodes: 102k EUR

Gold: 4h/1d/2d response
– per node 6k EUR
– 10 nodes: 30k EUR
– 25 nodes: 68k EUR

Silver: 1d/2d/4d response
– per node: 3.4k EUR
– 10 nodes : 17k EUR
– 25 nodes : 38k EUR

Development support: 2d response
– 6 months: 20k EUR
– 3 months: 13k EUR

They also offer training.

This is fairly reasonable if you compare the total cost of their competitors (license + yearly support)

-Sam


As for the support contracts, I’ll take a moment to note that those are list prices. There’s wiggle room even at those node counts. As the node counts go up, so do the discounts. Unfortunately due to NDA I can’t share what we negotiated down to on the list (but it’s much lower per node than list). For those considering support, decide if platinum is -really- worth it to you. We’ve got enough redundancy built in that even if one of our clusters explodes, we can still recover. Gold support is plenty if you prepare. Oh, and one other thing, the pricing is per node and per project. That means if you run, say, logstash on top of ElasticSearch -and- another ElasticSearch back-ended app, you’ll have to pay double. I’ve told them several times how stupid this pricing model is (we’ve been talking with them for nearly a year about support), but they’re very insistent. So be prepared.

-Dan

 

There we have it! This conversation was one of the best I’ve seen thus far on the SIEM vs ELK & Splunk debate. I hope that others out there who are building out their security operations program can benefit from the text above. Towards the end of the conversation it appears that many analysts believe that buying one of these technologies to solve all of your SOC problems isn’t the best way to approach the problem. Lots of analysts believe a shift needs to happen so that SIEM technologies are responsible for alerting on real-time data covering roughly the last couple of weeks. On top of that, log management tools should be used to handle everything past a month of data; they are designed to take in massive amounts of logs and give analysts the ability to hunt retroactively for evil in their environment. Please feel free to comment on this post with your thoughts and questions. Thanks to all the analysts who took the time to participate in this discussion and share their experience with the security community!


Easy BASH for making data difficult to recover without damaging your filesystem.

There are many programs designed specifically to recover lost data, even when a file has been “deleted” from a user’s hard drive. This is not difficult to do, because when a file is “deleted,” all that happens is that the pointer to a specific block of data is removed. The removal of this pointer is instantaneous; the removal of the data block is not. However, once the pointer is removed the filesystem allows overwriting of the data block. This means that files you meant to delete are sometimes still recoverable even months after they are removed from your computer. Most modern recovery programs are able to reverse this process by looking at the headers of data blocks and determining what type of file resides there.

If you are paranoid about your security like I am, you may ask yourself how to prevent this from happening. One way is to utilize software that overwrites the data block with random patterns of data several times. Obviously, the more passes, the harder it is for the original data to ever be recovered. Another method, which works on a much larger scale, is to create so much write activity on the disk that the original data will almost always be corrupted.

This script does just that by generating a specified number of temporary files and then deleting them, overwriting the equivalent amount of data stored at that location.

[code language="bash"]

#!/bin/bash
# Overwrite free space by writing large junk files and then deleting them.

read -p "How much space do you want to overwrite? [MB] " space
read -p "Enter a valid directory path [example: /home/user/]: " directory

# Build a ~2 MB chunk of random digits ($RANDOM appended 8500 times).
i=0
template=""
while [ $i -lt 8500 ]; do
    template+=$RANDOM
    (( i = i + 1 ))
done

# Each temp file gets 50 copies of the template (~2 MB), so space/2 files
# covers roughly the requested number of megabytes.
overwrites=$(( space / 2 ))
i=0
while [ $i -lt $overwrites ]; do
    clear
    echo "$(( i + 1 ))/$overwrites overwrites ~$(( 2 * i )) MB"
    (( i = i + 1 ))
    j=0
    while [ $j -lt 50 ]; do
        (( j = j + 1 ))
        echo "$template" >> "$directory/$i-temp"
    done
done

# Delete the temporary files, leaving the overwritten blocks behind.
i=0
while [ $i -lt $overwrites ]; do
    (( i = i + 1 ))
    rm "$directory/$i-temp"
done
[/code]

Still, if you want to guarantee the permanent deletion of your data, Thermite is the way to go.

~Jamin Becker


Quick & Easy Malware Discovery/Submission

In this quick project I decided that the goal would be to automate downloading malware and submitting samples to VirusTotal that aren’t already in its dataset. To gather the malware, I decided to use Maltrieve.

From the GitHub page: “Maltrieve originated as a fork of mwcrawler. It retrieves malware directly from the sources as listed at a number of sites, including Malc0de, Malware Black List, Malware Domain List, Malware Patrol, Sacour.cn, VX Vault, URLquery, and CleanMX.” I would like to thank its author for taking the time to build this out.

To upload samples to VirusTotal, I utilized a script written by @it4sec. It can be found at http://ondailybasis.com/blog/wp-content/uploads/2012/12/yaps.py_.txt. All I had to do was add my API key to the script and tell it what directory my samples were in. At that point, it handles checking whether the sample is already in the VirusTotal data set and, if it isn’t, it will upload it. It even keeps track of everything in a log file for future reference.
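
The logic boils down to “look up the hash first, upload only if it is unknown.” Here is a rough sketch of that flow against the public VirusTotal v2 API; this is an illustration, not the actual yaps.py code.

[code language="bash"]
# Rough sketch: check a sample's hash, submit the file only if VT hasn't seen it.
APIKEY="your-api-key"
SAMPLE="/data/samples/suspicious.exe"
HASH=$(md5sum "$SAMPLE" | cut -d ' ' -f1)

known=$(curl -s 'https://www.virustotal.com/vtapi/v2/file/report' \
          -d apikey="$APIKEY" -d resource="$HASH" | grep -c '"response_code": 1')

if [ "$known" -eq 0 ]; then
    # Not in the dataset yet, so upload the sample itself
    curl -s 'https://www.virustotal.com/vtapi/v2/file/scan' \
         -F apikey="$APIKEY" -F file=@"$SAMPLE"
fi
[/code]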

I added a cron job that runs Maltrieve at the top of every hour and another cron job that runs yaps.py 30 minutes after. This essentially allows me to pull down new samples every hour and do my part in uploading new samples to VirusTotal.
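
The crontab entries end up looking something like the sketch below; the paths to Maltrieve, yaps.py, and the sample directory are placeholders, not my actual layout.

[code language="bash"]
# m  h  dom mon dow  command
# Pull new samples at the top of every hour (paths are illustrative).
0  *  *   *   *    cd /opt/maltrieve && python maltrieve.py
# Submit anything new to VirusTotal 30 minutes later.
30 *  *   *   *    python /opt/yaps.py >> /var/log/yaps-cron.log 2>&1
[/code]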

Analysis:
So far I’ve pulled down 7,424 malware samples with Maltrieve over the last few days. Of those 7,424 samples, roughly 1,400 had never been seen by VirusTotal. I’ve found different variants of malware such as Zeus and Asprox, plus lots of malicious iframe injections on web pages. I’m actually surprised at the number of unique samples I’m uploading; I expected someone else to already be running this exact same process and beating me to the submissions.

The next step of the project is to automatically upload the malware to Malwr.com to generate sandbox reports for each sample. I look forward to expanding this out and hopefully receiving some input on what direction it should or could go.

– Max Rogers


Fast, Easy Linux Monitoring with Bash

When it comes to security on a large scale, it is usually necessary to set up an IDS/IPS to monitor network traffic. But what happens if an attacker already has backdoors installed on your Linux box? The following script addresses exactly that, giving the admin a bird’s-eye view of what is currently going on within the system.

This script monitors network traffic, system changes, recent and current logins, user permissions, command aliases, and cronjobs.

[code language="bash"]
if [ $(whoami) != "root" ];then
echo "THIS SCRIPT MUST BE RUN AS ROOT!"
exit
fi

# Locate every .bashrc on the system in the background (errors suppressed).
find / -name .bashrc 2>/dev/null > temp4 &
# Baseline hashes of important files and directory listings for later diffing.
md5sum /etc/passwd /etc/group /etc/profile /etc/sudoers /etc/hosts /etc/ssh/ssh_config /etc/ssh/sshd_config > temp2
ls -a /etc/ /usr/ /sys/ /home/ /bin/ /etc/ssh/ >> temp2
while true;
do
netstat -n -A inet | grep ESTABLISHED > temp
# Splitting on ':' puts the local port at the start of field 2 and the remote
# port at the start of field 3, so field 3 identifies outgoing connections and
# field 2 identifies incoming sessions.
outgoing_ftp=$(cat temp | cut -d ':' -f3 | grep -c "^21")
incoming_ftp=$(cat temp | cut -d ':' -f2 | grep -c "^21")

outgoing_ssh=$(cat temp | cut -d ':' -f3 | grep -c "^22")
incoming_ssh=$(cat temp | cut -d ':' -f2 | grep -c "^22")

outgoing_telnet=$(cat temp | cut -d ':' -f3 | grep -c "^23")
incoming_telnet=$(cat temp | cut -d ':' -f2 | grep -c "^23")

echo "ACTIVE NETWORK CONNECTIONS:"
echo "—————————"
if [ $outgoing_telnet -gt 0 ]; then
echo $outgoing_telnet successful outgoing telnet connection.
fi

if [ $incoming_telnet -gt 0 ]; then
echo $incoming_telnet successful incoming telnet session.
fi

if [ $outgoing_ssh -gt 0 ]; then
echo $outgoing_ssh successful outgoing ssh connection.
fi

if [ $incoming_ssh -gt 0 ]; then
echo $incoming_ssh successful incoming ssh session.
fi

if [ $outgoing_ftp -gt 0 ]; then
echo $outgoing_ftp successful outgoing ftp connection.
fi

if [ $incoming_ftp -gt 0 ]; then
echo $incoming_ftp successful incoming ftp session.
fi

cat temp
sleep 5
clear

echo "CURRENT LOGIN SESSIONS:"
echo "———————–"
w
echo
echo "RECENT LOGIN SESSIONS:"
echo "———————-"
last | head -n5
sleep 5
clear

# List any processes currently sitting in a sleep call.
sleepingProcs=$(pstree | grep sleep)
if [[ ! -z "$sleepingProcs" ]];then
echo "SLEEP PROCESSES:"
echo "—————-"
echo "$sleepingProcs"
sleep 5
clear
fi

#Check for changes to important files.

md5sum /etc/passwd /etc/group /etc/profile /etc/sudoers /etc/hosts /etc/ssh/ssh_config /etc/ssh/sshd_config > temp3
ls -a /etc/ /usr/ /sys/ /home/ /bin/ /etc/ssh/ >> temp3
fileChanges=$(diff temp2 temp3)
if [[ ! -z "$fileChanges" ]];then
echo CHANGE TRACKER:
echo -e "\n"
echo "$fileChanges"
sleep 5
clear
fi

echo "CRON JOBS:"
echo "Found Cronjobs for the following users:"
echo "—————————————"
ls /var/spool/cron/crontabs
echo
echo "Cronjobs in cron.d:"
echo "——————-"
ls /etc/cron.d/
sleep 5
clear

echo "ALIASES:"
echo "——–"
alias
echo
echo ".BASHRC LOCATIONS:"
echo "——————"
cat temp4 | while read line
do
echo $line
done
sleep 5
clear

echo "USERS ABLE TO LOGIN:"
echo "——————–"
grep -v -e "/bin/false" -e "/sbin/nologin" /etc/passwd | cut -d ':' -f1
sleep 5
clear

echo "CURRENT PROCESS TREE:"
echo "———————"
pstree
sleep 7
clear
done

[/code]

~Jamin Becker

Linksys & Netgear Backdoor by the Numbers

If you’d like to just skip to the data, feel free to scroll on down. Research is not endorsed or attributable to $DayJob 🙂

After reading Rick Lawshae’s post on Hunting Botnets with ZMAP, I started wondering what other cool things ZMAP could be used for. It wasn’t but a day or two later that something fell into my hands. On January first, Eloi Vanderbeken posted his findings on a backdoor that listens on TCP port 32764. The backdoor appears to affect older versions of Netgear and Linksys routers, but some users report that other brands are affected as well. Eloi also wrote a Python script that can check for the vulnerability, among other functions. To get more info on the backdoor and how Eloi discovered it, check out his write-up here: https://github.com/elvanderb/TCP-32764/blob/master/backdoor_description_for_those_who_don-t_like_pptx.pdf.

Once I had wrapped up my reading on his work, I got excited. I realized I finally had a way of answering a question that usually goes unanswered. Every couple of months you hear someone say, “There’s another backdoor in XYZ product!” and that’s about when the media blows up, PR statements are released, Snort sigs are written, and we all wait for the first exploits to start rolling out.

I know I don’t speak for everyone, but the general mindset seems to be that when a major backdoor or zero-day makes headlines, hundreds of thousands, maybe millions, of users are affected by the vulnerability. With this in mind I set out to answer the question, “How bad is it?”

Step one was to figure out how to use Zmap, so I installed it on my Kali VM and gave it a shot. I followed the extremely simple instructions on the project’s webpage, and in one line I had my scan configured: “$ zmap -p 32764 -o OpenPorts.csv”.

I then went to my VPS provider of choice and purchased a VPS with a gigabit connection to the intertubes. I loaded up a vanilla install of Ubuntu Server 12.04 and installed Zmap. Before I launched the scan, I made sure to read the Scanning Best Practices section of the Zmap documentation, which lists things such as “conduct scans no longer than the time needed for research” and “should someone ask you to cease scanning their environment, add their range to your blacklist”.
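
In practice that meant keeping a blacklist file and capping the scan bandwidth. A full invocation looks roughly like the sketch below; the blacklist path and bandwidth cap are examples, not the exact values I used.

[code language="bash"]
# Scan for TCP port 32764, honoring opt-out ranges and capping send bandwidth
# so the VPS uplink isn't saturated. blacklist.conf holds CIDR ranges of
# networks that asked not to be scanned.
zmap -p 32764 -B 100M -b blacklist.conf -o OpenPorts.csv
[/code]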

The scan took roughly 22 hours to complete. The Zmap documentation and advertising state that you can get it done in less than an hour, but I think they used a cluster setup; either way, 22 hours isn’t bad by any means. Twenty-two hours and 13 abuse complaints later (all complaints were acknowledged and scanning of those ranges was ceased), I had my list of roughly 1.5 million IP addresses that currently had TCP port 32764 open. 1.5 million… I thought to myself, “That’s a pretty big number.”

I knew this number probably wasn’t accurate, though, because there had been no validation that the backdoor was actually the service listening on those open ports. To help determine how many of the 1.5 million hosts were actually vulnerable, I pulled in my friend Farel (Jamin) Becker.

Using Eloi’s findings, Jamin wrote some bash and Python scripts that allowed us to quickly check the 1.5 million hosts for the vulnerability. The check simply reached out to the port and looked for indicators that the backdoor service was running; no exploitation or malicious actions were taken against the vulnerable routers. Our checking was comparable to connecting to a web page.

To check for the vulnerable service efficiently, Jamin’s scripts split the list of 1.5 million IPs into roughly 2,000 smaller lists and then spun up 2,000 independent Python instances to perform the work. To do this we needed a pretty beefy machine, so we rented the largest EC2 instance we could find. Needless to say it worked beautifully and only cost about $2.40 for the hour it took to complete the validation.
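
The orchestration side is simple enough to sketch in bash. To be clear, this is not Jamin’s actual code: hosts.txt stands in for the Zmap output and check_chunk.py is a hypothetical checker, but the split-then-fan-out pattern is the same idea.

[code language="bash"]
# Split the master host list into ~2000 chunks without breaking lines,
# then launch one Python checker per chunk in the background.
mkdir -p chunks results
split -n l/2000 -a 3 hosts.txt chunks/chunk_

for chunk in chunks/*; do
    # check_chunk.py is a hypothetical stand-in: it reads a file of IPs and
    # prints the ones where the backdoor service answers on TCP 32764.
    python check_chunk.py "$chunk" > "results/$(basename "$chunk").out" &
done
wait

# Merge the per-chunk results into a single list of vulnerable hosts.
cat results/*.out > vulnerable_hosts.txt
[/code]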

This is where the real data comes in. My first thought was “Oh man, here comes the part where we get to tell the world 400,000 routers are vulnerable RIGHT NOW!” The results were actually quite surprising. It turns out that only 4,998 routers were exposed and vulnerable. Safe to say I expected more, and I suspect most people would have too. Below is some statistical data from what Jamin and I found. Geo data was gathered by querying the MaxMind database.
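
For anyone wanting to reproduce that last step, the legacy MaxMind GeoIP database can be queried straight from bash with geoiplookup (from the geoip-bin package on Debian/Ubuntu). This is just one way to do it, not necessarily the exact tooling we used.

[code language="bash"]
# Tally vulnerable hosts by country using MaxMind's legacy GeoIP database.
while read ip; do
    geoiplookup "$ip"        # e.g. "GeoIP Country Edition: US, United States"
done < vulnerable_hosts.txt | sort | uniq -c | sort -rn | head
[/code]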

[Charts: vulnerable hosts broken down by country, by ISP, and by state.]

-Max Rogers & Jamin Becker


5,000 whois lookups in under a minute

There have been many times in my IT career when I have been required to solve very specific problems. My first step in solving these problems almost always begins with the same question: “Has it been done before?” Most of the time, if the problem is specific enough, the answer is a resounding “No.”

The other day I came across the following conundrum. I was asked to whois 5000 hosts in a reasonable amount of time (minutes) given a server with only one processor core and 2GB of RAM.

I ended up creating a BASH script to do the following:

1. Break down the huge text file into smaller text files each containing only a few IPs.

2. Spin up a separate process for each of these text files to do the whois lookups.

[code language="bash"]
#!/bin/bash
# Super fast whois lookup - Jamin B
echo "Enter the file you want to read in:"
read file
echo "Spawn into how many threads?"
read splitNumber

totalLines=$(wc -l < "$file")               # total number of IPs to look up
currentLine=0
currentFile=0
splitSize=$(( totalLines / splitNumber ))   # IPs per chunk file
echo "Each file will contain $splitSize IPs."

# Start with clean working directories.
rm -rf tmpdir whois
mkdir tmpdir whois

# Step 1: break the input file into smaller chunk files under tmpdir/.
cat "$file" | while read line
do
    (( currentLine = currentLine + 1 ))
    if [ $(( currentLine % splitSize )) -eq 0 ]; then
        (( currentFile = currentFile + 1 ))
        clear
        echo "Creating file $currentFile in tmpdir"
        grep MemFree /proc/meminfo
    fi
    echo "$line" >> "tmpdir/$currentFile"
done

# Step 2: background one worker per chunk file; each worker runs its whois
# lookups sequentially, so concurrency is capped at the number of chunks.
# Iterating over the directory also picks up the 0th chunk and any overflow
# chunk that straggler lines land in.
for chunk in tmpdir/*; do
    echo "Spawning process for $chunk"
    (
        while read line; do
            whois "$line" > "whois/$line.txt"
        done < "$chunk"
    ) &
done
wait
echo "All whois lookups complete. Results are in the whois/ directory."
[/code]

The result: 5000 whois reports generated in under a minute!


-Jamin Becker