Hello! The conversation below took place among multiple analysts who work in security operations centers at several different companies. Their names and affiliations have been removed from the discussion for privacy/attribution reasons.
Some of the contributors include, but are not limited to:
- Brandon Levene
- Chris Clark
- Christophe Vandeplas
- Jorge Capmany
- Greg Martin
- Max Rogers
Hopefully others who are trying to build their security operations program can benefit from the discussion below!
The universal question that everyone has faced at some point has come up for me too. We are faced with a pleasant surprise in budgets, but an unpleasant surprise in time-to-spend-the-money.
This means I don't have the time to do real comparisons and tests with the various candidate products. Aka: we need to choose in one or at most two weeks, and don't even have time for the research, webinars, etc.
Any advice about pros and cons is definitely welcome.
- Detection of (targeted) malware and attacks (also needs to be able to automate searches based on IOCs; can be scripted)
- Adds 'context' for the analysis of IDS alerts (so searches must be relatively fast, aka not 10 minutes per search)
- Incident response: use IOCs to search for other victims; grouping must be possible (like grouping all source_ip's)
- Two possible modes: data comes in live (production environment), or data is batch imported (external incident response)
- Around 30 GB of data/day cleartext
What technical, usability, and price/quality pros and cons would you have for:
- ArcSight
- Splunk
- ELK (Elasticsearch, Logstash, Kibana)
The goal here is not to be greedy with money. So if ELK is chosen, we will buy support and spend the 'license price' on a support contract instead.
I already have limited experience with Splunk and ELK, none with ArcSight.
Thanks a lot!
I would take the ELK route. ArcSight is going to get very expensive, very quickly, since you have to pay based on how much you're pumping into it. The same goes for Splunk. Your dollars will most likely be better spent with ELK. ELK is very scalable, so should you want to add more log sources and your current setup can't handle it, you just buy another machine and add it to the cluster. It's a very simple process. You can also send almost any log format into ELK as long as you dedicate a little time to creating "filters" inside Logstash that normalize the data. I've found the configuration of these filters to be very simple, and they work well when you have weird logs coming in. With ArcSight you'll probably have to deal with "Smart Connectors," which are used to do the parsing of the logs. You can make custom connectors, but ELK seems to be the simplest in terms of how easily you can filter logs.
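The Logstash "filters" mentioned here are typically grok patterns: named capture groups that turn raw log lines into structured, searchable fields. A rough Python equivalent of the same idea, using a made-up proxy log format purely for illustration:

```python
import re

# A grok filter in Logstash names its capture groups so that raw lines
# become structured fields.  This regex is an invented example for a
# simple proxy log line; real patterns depend on your log format.
PROXY_LINE = re.compile(
    r"(?P<timestamp>\S+) (?P<src_ip>\d+\.\d+\.\d+\.\d+) "
    r"(?P<action>\w+) (?P<domain>\S+)"
)

line = "2014-05-01T12:00:00Z 10.0.0.5 ALLOWED evil.example.com"
event = PROXY_LINE.match(line).groupdict()
print(event["src_ip"], event["domain"])
```

Once every line is reduced to fields like these, queries and grouping in Elasticsearch become straightforward.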
If you have 30Gb’s a day then thats only about 900Gb’s a month. If you buy just ONE Dell PowerEdge r720 and max it out, you will have 32TB’s of disk space. This means on just one box you will be able to retain almost 3 years of logs. You just won’t get that type of retention elsewhere.
– detection of (targeted) malware and attacks (also needs to be
able to automate searches based on IOCs, can be scripted):
Use Kibana to set up queries, automate them, and build dashboards, and they will run on whatever interval you decide. These are fairly easy to set up. I believe it even supports sending SMS or email alerts if things start popping off.
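A scheduled IOC search of the kind described could be as simple as POSTing a query body to Elasticsearch from cron. A sketch of building that body (the index, field name, and endpoint below are assumptions for illustration, not details from the thread):

```python
import json

def ioc_query(iocs, field="destination_domain", last_days=7):
    """Build an Elasticsearch query body that matches any of the given
    IOCs in the chosen field over a recent time window.  The field name
    is hypothetical; adjust to your own Logstash mappings."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"terms": {field: iocs}},
                    {"range": {"@timestamp": {"gte": f"now-{last_days}d"}}},
                ]
            }
        }
    }

body = ioc_query(["evil.example.com", "bad.example.net"])
print(json.dumps(body, indent=2))
# POST this body to http://<es-host>:9200/logstash-*/_search on a
# schedule (cron) and alert whenever the hit count is non-zero.
```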
– adds ‘context’ for the analysis of IDS alerts (so searches must be
relatively fast, aka not 10 minutes per search):
In my experience of using ELK, it's fast. Much faster than ArcSight ever was. I hear they are moving to the new CORR database though, and the demos I've seen of it are fast. Splunk… Splunk has always been very fast.
– Incident response : use IOCs to search for other victims, grouping
must be possible (like group all source_ip’s):
ELK does this naturally, and it's very simple. You could easily run a query to see all hosts that reached out to a certain domain over the last X days, and very quickly you will know every host, as long as the logs are making it into Elasticsearch.
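As an illustration of the grouping, here is the same idea in plain Python over toy records (in practice a terms aggregation in Elasticsearch does this server-side; the field names are invented):

```python
from collections import defaultdict

# Toy parsed proxy-log records.  In practice these come back from an
# Elasticsearch search, already normalized by your Logstash filters.
events = [
    {"src_ip": "10.0.0.5", "domain": "evil.example.com"},
    {"src_ip": "10.0.0.9", "domain": "evil.example.com"},
    {"src_ip": "10.0.0.5", "domain": "evil.example.com"},
    {"src_ip": "10.0.0.7", "domain": "benign.example.org"},
]

# "Group all source_ip's" that touched a given IOC domain.
victims = defaultdict(set)
for e in events:
    victims[e["domain"]].add(e["src_ip"])

print(sorted(victims["evil.example.com"]))
```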
– 2 possible modes: data comes in live (production environment) , data
is batch imported (external-incident response):
Very easy to configure any type of log to be sent to ELK. You can also very quickly spin up Logstash and start sending logs to the Logstash server ad hoc. A big part of why Logstash is so successful is that it gives you options for how to get logs to the server: you can run the Logstash agent on devices and it will "ship" logs to the central server; you can configure syslog using rsyslog, syslog-ng, or other syslog tools to send syslog to the central server; or you can use Lumberjack, which can ship logs from devices and encrypt the traffic so that the logs can't be read over the wire.
WARNING: With any of these solutions, I HIGHLY suggest that you dedicate a person full time to maintaining the logs coming in and the health of the servers. Their role will be to create the filters that make logs queryable, configure devices to send logs to the central logging server, and make sure disk and CPU are being used adequately.
TL;DR: Go ahead and buy as many Dell PowerEdge R720s as you can, max them out, and get someone in to teach you all about how amazing ELK is. Then make sure someone has the cycles to maintain the logs coming in and the system health of the servers. Buy the Logstash book; it's a great starter guide.
I wish you the best of luck!
At $dayjob we use both Splunk and ArcSight. Although we're only starting to implement ArcSight (ESM + Logger) now, and my impressions are rather superficial, here they are:
With Arcsight the first decision you will have to make is whether you want physical or software appliances. You have typically two components, a logger, and an ESM (or an express appliance which is a mix of both).
Physical appliances will limit your storage options, heavily. Software appliances are basically VMs.
Logger does log management (surprise!) and provides basic analysis capabilities (I would say they are trying to mimic Splunk since logger 5).
The licensing model is deliberately confusing, or at least it was when we bought it. Logger uses different metrics for licensing than ESM: EPS vs. GB/day. At some point they also use "monitored devices" for licensing. Oh, and also concurrent users of the console; they charge for that too.
Correlation appears to be decent and works as advertised so far, but we have only implemented the basic use cases, bruteforce, etc.
It has some neat visualizations out of the box.
The console is java based. There are also “read only” and feature stripped web interfaces.
Price wise, it is the most expensive. Scaling it is also expensive. You might have to swap from physical to virtual appliances, etc.
“It’s not for big data stuff”, “you cannot just throw all your logs at it, you have to filter”. You get these warnings from $consultancy when you are about to implement.
From my PoV it is aimed at first-level SOC work, mostly searching for "known bads" and tracking them (it has a case management tool built in).
As for Splunk: the license is pricey, but it's worth it (probably not as expensive as ArcSight). It is also more transparent: you get X GB/day, you can have as many analysts as you want looking at it, and data can come from an unlimited number of devices.
Data analysis capabilities are better than ArcSight's and Logger's. The Splunk search language is powerful.
Visualization capabilities are definitely better than ArcSight's and Logger's, but not so "out of the box" for some use cases, e.g. visualizing alerts in terms of origin and affected objects (unless you buy or get Splunk apps that have already done the visualization bit). You can even use d3.js for visualizations now. To me, visualization is rather important.
Newer versions allow you to pivot on data with a functionality similar to excel pivot tables.
Correlations are slightly more complicated than with ArcSight.
If you want, for example, to search your historical logs for lists of indicators, it is probably the better tool of the two.
It’s also easy to scale to reduce the time of your searches.
It makes my life easier and helps model cases to feed into Arcsight.
Fully scriptable; you can add context with lookups. There is a REST API, and SDKs for multiple languages and web frameworks.
More of an analyst framework than a SIEM per se, but it will easily fulfil a lot of the tasks that a SIEM does.
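The REST API mentioned above makes scripted IOC sweeps straightforward. A sketch of building the request that kicks off a search job (the endpoint path is Splunk's documented /services/search/jobs; the host, index, and field names here are placeholders):

```python
from urllib.parse import urlencode

def search_job_payload(iocs, earliest="-30d"):
    """Build the form body for a Splunk search job that sweeps for a
    list of IOC domains and groups victim source IPs per domain.
    'index=proxy' and 'dest_domain' are invented names; adjust to
    your own data model."""
    terms = " OR ".join(f'dest_domain="{i}"' for i in iocs)
    spl = (f"search index=proxy ({terms}) earliest={earliest} "
           "| stats values(src_ip) AS victims BY dest_domain")
    return urlencode({"search": spl, "output_mode": "json"})

payload = search_job_payload(["evil.example.com"])
print(payload)
# POST this to https://<splunk-host>:8089/services/search/jobs with
# your credentials, then poll the returned search ID for results.
```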
As for ELK: I'm only starting to play with it now, hence I cannot provide educated feedback. I am primarily looking into it for "secondary" logs that I cannot afford (yet) to have in Splunk.
My $0.02 ~ 0.014 Eur.
Happy to receive feedback, since my experience is limited by time/effort constraints.
+1 to Sam’s points. Right on target.
One caveat to note is that Splunk in its current maturity (especially with the Enterprise Security app) is a lot closer to being "ready to go" out of the box. Since time is an issue, you may want to seriously consider a Splunk deployment as a short-term plan (a year or more), then slowly work ELK in as your long-term, scalable management system.
Also consider long-term storage, i.e., duplicating streams for long-term retention into something "big data." That will likely help in the long term with queries in your "main" solution.
So far in this thread we have seen a few key factors mentioned:
- ElasticSearch is very scalable
- The purchase of any of these technologies should come with personnel to administer the technology
- ArcSight’s pricing model can be very confusing
- ArcSight is probably the most expensive of the three options
- Splunk is very close to being ready “Out of the Box”
Love this thread. These are some of the decisions we are struggling with right now. We have been using ArcSight for years at $dayjob, so if you or anyone wants to talk specifics let me know. Just some high-level observations:
ArcSight ESM is great for SOC-type work. When you need real-time correlation, automation, and workflow to support multiple people across multiple tiers (L1, L2, etc.) working cases together, it really shines. For example, we have a number of regex-based rules looking for exploit kit hits, followed by downloads of an exe to suggest successful compromise. I don't know how you would do this kind of multi-stage correlation with a "search"-type product, which is what I consider ELK and Splunk to be, but I could be wrong.
ArcSight Logger has been trying to catch up to Splunk for years, but they are always 2-3 years behind across all functionality. Right now we are suffering from being unable to scale storage/search for large volumes of data, so we are looking into ELK/Hadoop. For example, we collect about 250M-275M proxy log entries per day. We can't use Logger to search across more than 3-5 days at a time before queries start taking too long to complete or timing out. Not exactly acceptable in my book, and I am not even talking about visualization/dashboards. Those are non-existent. I've had a number of conversations with a lot of folks there and they obviously know about this and have been making some advancements, especially with recent changes in product management, but that's still at least a year from being seen in the product.
As a result of the above issues, our current plan is to stream logs into Hadoop and then run ELK on top, since ELK can read/index data from HDFS. We are doing this instead of just straight ELK because, longer term, we think we'll want to run other tools on the data in Hadoop, specifically for visualization that ELK can't do, or machine-learning-type jobs for which we'd need MapReduce functionality.
Love this thread and other recent threads on ELK, so let’s keep it going.
Can you give an example of what kind of workflows you're managing with ArcSight? Is it only search/correlation, or also business logic, reporting, collaboration, etc.?
<disclaimer> Have only worked with Splunk </disclaimer>
One area that I immediately thought of for your use cases was host-based capabilities. I know that's not part of the original equation, but I'd be curious to know whether ELK would be cost-effective enough to also allow considering a host-based (CarbonBlack, Tanium, etc.) logging solution (if one is not already present). Anyone else have any idea about that?
If you’re looking for possible context, hunting capability via a wider range of IOCs, etc. Then it would be a huge benefit to get those logs feeding in as well.
Just a tertiary thought.
In my opinion, the two platforms you are looking at can be apples and oranges depending on how you implement them. I think the answer to your question lies in the amount of resources you have to dedicate to either. ArcSight, as with any other SIEM, requires care and feeding, but the limitations of that tool also have a limiting effect on the run rate and effort required to realize value.
Splunk is a different animal. It is vastly more powerful than a SIEM, but requires a large up-front investment in people, process, and tech to provide the same level of value a SIEM will out of the box. You are then able to take it well beyond that, into some very cool stuff.
To add to the apples-and-oranges argument, I think it's worthwhile to note that, IMO, Splunk is not a SIEM, and ArcSight (or any other SIEM) is not a log manager/log searcher. Two different things, two different use cases, and two different approaches.
If you're into free-form, human-driven, what-if searches and explorations, then Splunk (or any other free-form log manipulation toolset) will fare much better. Setting up alerting using regexes and defined known-bad strings in a SIEM is not the same; your success comes from the intelligence of the human users running it.
If you want workflow-driven, automated, auditable, and repeatable first-line alerting for well-defined conditions (and running defined regexes against data in real time counts as defined conditions, IMO), then you'll fare better with a SIEM. You won't do as well if you want to drill into any of the great patterns provided on this list on the fly with most SIEMs, though; Splunk et al. will most likely outperform them there.
At the end of the day, it's your use case and your human resources that are the factors here. At $dayjob we run both approaches, as Splunk doesn't provide the strict framework we need for BAU activities, while the SIEMs we checked/deployed do not fare as well with free-range searches.
Has anyone considered some way to manage multiple analysts? I’m trying to figure out how to allow collaboration between different domain experts working on the same incident.
That's what I meant in my earlier post about workflow vs. go-nuts searches. If you do the latter, you probably need a very robust ticketing system that can be used to attach logs, data, screenshots, etc., while keeping access rights controlled. Very different problem space 🙂
As the conversation progressed, the following opinions surfaced:
- ArcSight’s Logger technology seems to be trying to catch up to Splunk and just can’t compete.
- ArcSight’s Logger tends to time out or take hours/days to pull back data based on queries
- ArcSight works well when a team needs to have multiple analysts working on an incident. (Decent case management structure designed to aid SOCs)
- Splunk and ELK don’t appear to be strict enough to assist in audit/compliance tracking
+1 to Robert's points here.
ELK seems an ideal longer-term and retroactive alerting/search platform. Between that and Splunk it's more of a religious debate, as the functionality is very similar. This may very well be all that is required, depending upon your IR capabilities and overall maturity. It "can do" nearly everything ArcSight can do, and this capability (longer-term retention with the ability to run queries and searches) is a required first step, as often you won't have proactive alerting in place from day one.
A well-tuned SIEM is worth the investment in time and energy to facilitate top-level, time-sensitive alerting and correlation rule application. Yes, the log normalization and optimization is a bit time-consuming at first, but the ability to watch it light up like a Christmas tree in real time as you detect every piece of an adversary's attack is priceless. Gap analysis of your defenses is almost instinctive in a properly tuned SIEM, as is the ability to perform very quick triage and preliminary DFIR investigation.
If you must choose ONE, I'd go Splunk/ELK (disclaimer: I don't have direct experience with ELK used in this way) or a similar solution which allows for retroactive alerting off of cron'ed searches/queries and IOC ingestion.
Ideally this will be topped by a minimal-retention (3-5 days) SIEM deployment in which your intrusion analysts can live, pivot, and triage. I love ArcSight; it quickly becomes a part of you if you harness its capabilities. I'd stay away from NITRO like the plague, or anything which attempts to tell YOU what's bad or normalize internally based on opaque criteria, but anything that allows for effective "active lists," "active channels," and boolean filter/query/alerting creation is a great front end.
TL;DR: If you are mature enough to actionably respond to real-time alerting at scale, and to constantly evolve those criteria based on threat intel, you want a SIEM on top of the required longer-term log storage/search/investigation platform (Splunk/ELK).
As others have mentioned, beyond actually “using” the technology, carefully consider who is going to operate & maintain the system(s); this is often overlooked. In my experience some platforms are easier than others, but all require it.
If you don't have a body to do maintenance and optimization, look at getting a managed support service from a vendor. Short term, you can get usability out of the system until you can get a body (or bodies) in-house to operate it.
At $dayjob we have both Splunk and ArcSight, and I agree with the difference in usage: ad hoc searches vs. workflow, automation, etc.
Great note by Nathan: both get up and running digesting standard logs quickly, giving a 75% solution in my opinion. The rest will take talent and/or vendor support.
Every shop has resource constraints that they have to operate within. $dayjob prefers to put the money toward vendor support and we still have flexibility with these tools. This abstracts the technology from the individuals building the system to help ease support and transition.
I admit my ELK experience is limited. That said, I am building out my custom platform with ELK to see how it handles.
Thanks to all for the insight. We can open-source our entire infrastructure, but we cannot reallocate those savings for additional people. So $dayjob chose to use the money to get vendor expertise, although your mileage may vary…
That's a great point, Nathan. The cap-ex vs. op-ex issue is very real.
Beyond the bare minimum (network sensors and a log storage mechanism for incoming web/email/DNS/AV/access logs), I'd take people over product all day long. Having millions in tools and watching thousands of alerts fall on the floor daily is not a good time.
We're moving forward, but right now comprehensive netsec is still a human-driven game, though the force multipliers (tools) are becoming much more effective. I think an honest accounting of your staff's core competencies is required beforehand, and I'd be careful about relying on any single points of failure on the personnel side, especially in a small team.
Replacing highly skilled folks in this environment is a time-intensive and expensive process, as the industry is not standardized/mature. There should at least be some available talent out there for ArcSight/Splunk given the install base. Some of the more cutting-edge and/or custom solutions carry a much higher risk on the personnel side; finding the right mix of sec-analyst/engineer/developer is nigh impossible.
I wanted to chime in here on some general lessons from doing a SIEM deployment, and learning along the way. I didn’t use either of the products mentioned in the thread, but I thought that this may be helpful.
Your use cases fall into two categories, which I will call online (live, streaming, logs as they are coming in), and offline (searching historical logs for newly identified indicators). I do both of these with a tool that wasn’t really designed to do both.
- Given the choice, allocate more money toward analyst salaries and less toward software.
- You need dedicated people for the care and feeding of the solution itself (admins/engineers), besides the analysts who process the data.
- Focus on a good, flexible "back-end-database-ish" solution, and build on top of it.
- Use VIPs for syslog, because syslog messages can be easily dropped.
- Over-spec your LPD/EPS. The solution will always expand; it will never contract.
The tool that we use at $dayjob now is designed to be a live alerting SIEM. It is built on top of MS SQL Server, and the architecture is great for live alerting. That is what they designed it for. The rules language is powerful and it is easy to use. The limitation is the historical search capabilities.
The tool we used at $PreviousDayJob was a great distributed database and could return searches in no time flat. We put something else on top of it to do the SIEM analysis. The direct SQLish access was great for our analyst to build his own scripts to query against.
Lesson #1: "Offline" searching of the last N days of logs is very different from alarming and alerting on logs as they come in.
Lesson #2: Not all tools do both "online" and "offline" well; some do both only mediocrely, and some do one well and the other not so much.
Lesson #1: There are limitations to VM deployments and SAN architecture that will show up in high volume log capture and SIEM deployments. Appliances are more expensive, but can be higher performance. SAN/VM deployments can be cheaper, but you have to be mindful of performance.
Lesson #2: The time it takes your people (or consultants) to tune and optimize the VM infrastructure may make it worth purchasing appliances
Lesson #3: Regardless of which you choose, you will need more of it than you think, more of it than you estimate, and more of it than what you think you know to be the proper log volumes.
Lesson #4: Some SIEM solutions just stop accepting logs when they hit their LPD/EPS/MB limits.
Lesson #5: Syslogs can be dropped (even with NG). You need to watch network volumes, especially on VM’s. You probably want to use a VIP for log collection if you have high volume syslogs.
Lesson #6: Make a list of every device you own, and get a sample of the log. Make it part of the contract that they actually can parse each of those log source types into the fields you want.
Lesson #1: Regression searches are not built into most platforms (e.g., search for one IP address, find all sources that communicated with that malicious IP, find all IPs that those source IPs communicated with, filter out known-good (e.g., Google, Yahoo, known corporate partners), etc.). You will have to build this yourself.
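A toy sketch of such a regression search, with a hypothetical whitelist standing in for the known-good filtering and invented IPs as data:

```python
# Starting from one malicious destination, iteratively pivot
# source <-> destination while filtering a known-good whitelist.
events = [
    ("10.0.0.5", "203.0.113.9"),   # (src_ip, dest_ip) pairs
    ("10.0.0.5", "198.51.100.7"),
    ("10.0.0.8", "198.51.100.7"),
    ("10.0.0.8", "192.0.2.1"),
]
WHITELIST = {"192.0.2.1"}          # e.g. Google, Yahoo, known partners

def pivot(seed_dests, events, hops=2):
    bad_dests, victims = set(seed_dests), set()
    for _ in range(hops):
        # Which sources talked to the suspect destinations?
        victims |= {s for s, d in events if d in bad_dests}
        # What else did those sources talk to (minus known-good)?
        bad_dests |= {d for s, d in events
                      if s in victims and d not in WHITELIST}
    return victims, bad_dests

victims, dests = pivot({"203.0.113.9"}, events)
print(sorted(victims))  # the second host is pulled in via the shared dest
```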
Lesson #2: A good analyst is worth more than any product. If you have your choice of allocations, more analysts is better than a more complex SIEM engine to do online alerting.
Lesson #1: You need separate teams: dedicated analysts to process the data, and engineers/admins to administer the platform and keep it running.
I concur that log management/search is a different core capability from SIEM. As the vendors in the different spaces have matured, they have added more features of the other (e.g., Splunk is becoming more SIEM-like).
For starting out, I’m in the camp of getting your arms around the security
significant logs first for IR purposes, then layering in whatever alerting
is necessary to move forward. This drives me to start with Splunk/ELK.
Each time I’ve stood this up, we found that we got to 85-90% of the total
goal of the capability (log collection + correlation + alerting), and the
additional cost required to stand up a SIEM on top wasn’t worth the
additional value. More/Better analysts has always been our choice.
Agreed that Splunk/ELK is a bit of a religious war, and I think they provide very similar capabilities. From my experience, I have seen Splunk do well for a couple of core reasons (each of these in my experience, not necessarily universal):
- It has been more stable
- Easier to onboard data (DBConnect, Universal Forwarder, scripted inputs). More user friendly to define field extractions/normalization after indexing
- More support from the vendor
- Better acceptance rate from analysts (partly due to going through training, but the search language also seems a bit easier)
- Enterprise Security (ES) app, which is a premium app, provides some of the capabilities typically found in a SIEM. It also has a reasonable threat feed framework which is easily extensible.
Nothing’s a silver bullet, but that’s my .00004BC
TL;DR – Go with ELK. Spend your money on a few servers and professional services for implementation.
I’ve been there, done that, with all aspects of what you’re asking. The “you need to spend it now” aspect is always a good time. Others have pointed out cap-ex vs op-ex, but it’s important to understand if this is a one-time pot of money or a perpetual bump. If it’s just one time, then your commercial products simply aren’t an option. Both ArcSight and Splunk have a significant annual spend aspect. Further, those costs will only increase with time/amount of events. Buy a bunch of servers, some proserv to help you get ELK all set up, and call it a day.
Setting aside the money aspect, the other thing I’ve been through is having used all tools you mention in a larger environment, so I’ll try and walk through our experiences with them.
We used to rely solely on ArcSight ESM. That might have worked eight years ago, but even six years ago it couldn’t keep up. It was barely able to keep up with the event throughput you’re asking about. While they have worked on event throughput, they have a fundamental architecture issue. ESM cannot scale beyond a single “app” server. This means all of the correlation aspects can only scale to the largest Linux capable server you can find. On top of that, the rules language is nonsense and you’ll need to send people to training to get the most out of it. We added ArcSight Logger about four years ago (pretty much when it came out) to get reasonable event retention, but even with 8 servers, it was dropping events and searching was stupid slow (24 hours to search over 14 days of data – searching for anything longer would cause the logger to crash). And even with all that we had spent on ArcSight (it was a lot), we weren’t even really using the event correlation capability. It was doing stupid things like X logins in Y minutes, but that doesn’t need to be done real time. A year and a half ago, we walked away from all things ArcSight.
Our grand design for the ArcSight replacement was a combination of Splunk and Hadoop. We dual-stream events into Splunk and Hadoop simultaneously and then set a retention in Splunk to drop events after 90 days. We shoehorned event correlation into Splunk with a combination of saved searches and a home-grown system sitting outside of Splunk that takes the results, does some aggregation, and then sends the results as alerts into our case management system. Hadoop handles long-term event retention as well as larger correlation/analysis jobs. For instance, we automatically mail all admins a list of their logins for the last 24 hours so they can spot any inconsistencies.
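The aggregation step described above (collapsing saved-search hits into alerts before they reach case management) might look something like this in miniature; all field, rule, and host names here are invented for illustration:

```python
from collections import Counter
import hashlib
import json

# Raw hits from scheduled saved searches: many duplicates per incident.
hits = [
    {"rule": "ek_landing_then_exe", "host": "ws-041"},
    {"rule": "ek_landing_then_exe", "host": "ws-041"},
    {"rule": "ek_landing_then_exe", "host": "ws-112"},
]

# Collapse duplicates and emit one alert per (rule, host) pair, with a
# stable dedup key the case-management system can use to merge tickets.
counts = Counter((h["rule"], h["host"]) for h in hits)
alerts = [
    {"rule": rule, "host": host, "hits": n,
     "dedup_key": hashlib.sha1(f"{rule}|{host}".encode()).hexdigest()[:12]}
    for (rule, host), n in sorted(counts.items())
]
print(json.dumps(alerts, indent=2))
```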
Hadoop works really well for retention and bigger jobs. About 6 months ago, we realized we had some issues with Splunk. One of them is that its scalability and high availability seems bolted on. Plus, indexers have pretty heavy IO requirements, so your servers are spendy. As our EPS kept increasing, we were teetering on a decision point, faced with having to buy another $25k server. Looking at all that, we’ve made a decision to walk away from Splunk. For what it’s worth, we’re pushing about 150k EPS today.
Our new design is ELK + Hadoop + Spark + custom event normalizer. The nice thing about this approach is that it is highly scalable (just add more cheap servers) and 100% open source. This means that our data is 100% free to do with as we please. We don’t have to worry about future flexibility as our data isn’t locked up. With the way that ELK, Hadoop, and Spark scale, we’re shooting for something in the range of 1M Events Per Second and everything tells us it’s doable. We use ElasticSearch for another open source project (moloch full packet capture) and it scales very well performance wise (we can search across 60TB of data in less than a minute, and searches for shorter time windows do come back in seconds). We’re not yet using percolators, but they have been designed for simple rules based triggers.
Hadoop sits right alongside our ELK install in the new architecture. We dual-deliver (Flume for HDFS and Logstash for ElasticSearch) to both our short-term event search system (ELK) and to our analysis and repository platform (Hadoop). We use Cloudera Enterprise for HDFS, MapReduce, Impala, and Spark. The management interface makes spinning up new cluster nodes or new products/features very trivial.
Ultimately, I can’t speak highly enough for ELK. It’s come a long way in a year, and there’s even cooler stuff on the horizon. We’re very happy with what we’ve seen thus far and it’s a critical part of our new log management system.
A few recommendations:
- Don’t use VMs for ElasticSearch. It -might- be possible to make it work, but we tried VMs and then switched to physical hardware. Lots of stability issues we had just magically disappeared.
- Invest a bit in ElasticSearch’s “developer” support for ELK. It’s essentially a PS engagement for a few months to help you get it all set up and tuned correctly. ElasticSearch also offers ongoing support, but it’s that developer support that will help you hit the ground running.
- Use many, cheap boxes. ES scales horizontally quite nicely. We spend roughly $4k per box and we don’t have to think twice if we need more capacity. This also means you don’t have to buy today for what you’re expecting a year from now. You can just buy for today and then add more tomorrow.
Once we actually have all of our new system laid out, I’d be happy to share if people are interested. We’re planning on open sourcing the event normalizer so that the whole thing is all open source.
At this point it seems that ArcSight has won the hearts of those who want to create a formalized and tiered security operations center, while ELK/Splunk-type tools provide a more adaptable, dynamic, and innovative platform. Some believe security operations centers should use SIEM technologies such as ArcSight for only a few days' worth of live/real-time event alerting, and leave log management/historical needs to ELK or Splunk:
- More talk on who will administer the technology you choose
- It seems that people use tools such as ArcSight or ELK/Splunk as both log management tools and SIEMs
- If your current security program is mostly human driven (someone retroactively searching for evil) go with an ELK or Splunk instance
- If your program is mature enough to respond to security incidents in real time at large scale, a SIEM may be the right move.
- “ElasticSearch scales horizontally quite nicely. We spend roughly $4k per box and we don’t have to think twice if we need more capacity. This also means you don’t have to buy today for what you’re expecting a year from now. You can just buy for today and then add more tomorrow.”
- I believe that this statement holds a lot of truth: "If you must choose ONE, I'd go Splunk/ELK (disclaimer: I don't have direct experience with ELK used in this way) or a similar solution which allows for retroactive alerting off of cron'ed searches/queries and IOC ingestion. Ideally this will be topped by a minimal-retention (3-5 days) SIEM deployment in which your intrusion analysts can live, pivot, and triage. I love ArcSight; it quickly becomes a part of you if you harness its capabilities. I'd stay away from NITRO like the plague, or anything which attempts to tell YOU what's bad or normalize internally based on opaque criteria"
I hope not to offend with this, but…
Temper your expectations of Splunk Professional Services, especially on the security side. They are perfect for deployment-type, operational tasks, but fall down with anything complex or advanced. We went through several before we gave up and started requesting engineers.
Having said that, Fred Wilmot at Splunk is maybe one of the smartest people I know. They have brilliant folks, you just have to ask.
Agreed, and not meant to offend either, but my experience with PS was less than stellar. They struggled with basics like field extractions, regex, and even their own application. Sadly, we’ve never had a working ES install, but I did utilize their TAs heavily. Be wary of their RegEx: it may work for most, but taking a few minutes to check the transforms you use is well worth it.
I love Splunk, and spend most of my day hunting through 180 days of data. I also invested a good bit of time field-extracting as much as I could, in a way that lets us use the CIM to search through as much data as effectively as possible.
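The advice about double-checking a TA’s extraction regexes is easy to act on before anything goes into production: run the pattern against a handful of real log lines and confirm the fields come out as expected. A minimal sketch of that sanity check (the pattern and the sample log line below are made up for illustration, not taken from any real TA):

```python
import re

# Hypothetical field-extraction regex in the style of a transforms.conf
# EXTRACT stanza: named groups become Splunk fields. Pattern and sample
# log are illustrative assumptions only.
TRANSFORM_RE = re.compile(
    r"src=(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"dst=(?P<dest_ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"action=(?P<action>\w+)"
)

sample = "2014-06-01T12:00:00Z fw01 src=10.0.0.5 dst=192.168.1.9 action=blocked"

match = TRANSFORM_RE.search(sample)
if match:
    # Fields extracted correctly; safe to trust this transform for logs
    # shaped like the sample.
    print(match.groupdict())
else:
    print("regex failed to extract fields -- fix before deploying")
```

Running a check like this over a few dozen lines from each source takes minutes and catches the “works for most, breaks on yours” cases the poster warns about.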
Next, the discussion shifts to whether a SIEM-type solution could be hosted in the cloud.
I’ve noticed server investment raised a few times, and it is indeed a hassle for me too. Has anyone considered building their SIEM solution in the cloud?
While it is possible (you can certainly do it with ArcSight, and I see no reason why you can’t with the others), there are a few considerations. One is delivering logs in a timely manner if you are pushing large volumes into the cloud, such as firewall logs. And of course you also need to be blessed by your regulators (if you are subject to the fantastic joy which is regulation).
I am guessing uploading to the cloud will be about as fast as routing to a private-cloud SIEM in a central location, especially for global orgs.
Is anyone running a cloud-based SIEM and willing to share their experience, here or privately?
We’ve played with Storm a bit (Splunk’s cloud offering), but not seriously. I have a friend who gets great use out of it as a log repo for various honeypots, but nothing more serious than that. It can be a bit delayed, depending on a number of factors.
While we’re talking about cloud here, I’d love to know how folks are getting logs from cloud providers to their store. We’ve not yet found an elegant way to do this for a couple of our providers…and they aren’t exactly helping.
I would take a look at Sumo Logic; it was built by the original developers of ArcSight, who wanted to do cloud SIEM right.
I’ve seen many attempts to run SIEM in a virtualized environment, and almost all of them failed.
I would strongly recommend against it unless you’re using a built-from-the-ground-up cloud product like Sumo Logic, Loggly, Splunk Cloud, etc.
One more product that might be of interest.
There is one company with a solution somewhat similar in structure to ELK (it uses nodes for compute and storage, distributing both the DB and CPU across commodity systems). It used to be known as “Sensage”, but now goes by Hawkeye. I received a sales call from these guys a while back; they were doing a makeover of Sensage into a full-fledged SIEM. I know the old Sensage was a good platform for log storage and queries/reporting, but I know nothing about the “new hotness” from Hexis that uses it as a back end.
Maybe the best answer is to look into using both in your infrastructure. In my experience, Splunk is a very powerful tool for initial aggregation and high-level analysis of data. I use it pretty often for gluing different data sets together and prototyping ideas, as it’s very tolerant of different log formats (check out the cefkv add-on for processing ArcSight CEF files). I then usually implement the prototypes as production analytics on open source stacks such as ElasticSearch or Hadoop.
If you’re building custom analytics or processes to search and act on data at near real-time speeds, I’d check out ElasticSearch (neat statistics capabilities and great free-text search). If you’re feeling really ambitious, look at Apache Storm or Druid.io for large-scale statistics. If you don’t have a dedicated development team but want to build interactive workflows, you can still do quite a bit with Splunk.
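The original question’s requirement to “automate searches based on IOCs” maps naturally onto Elasticsearch’s query DSL. A minimal sketch of building such a query, with grouping by source IP as in the “group all source_ip’s” use case; the index and field names here are assumptions, not anyone’s actual schema:

```python
import json

def build_ioc_query(iocs, fields=("src_ip", "dest_ip", "md5"), size=100):
    """Build an Elasticsearch query body matching any of the given IOC
    values in any of the given fields. Field names are hypothetical;
    adjust to whatever your Logstash filters actually emit."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Match if ANY field contains ANY of the IOC values
                "should": [{"terms": {field: list(iocs)}} for field in fields],
                "minimum_should_match": 1,
            }
        },
        # Group hits by source IP so one query surfaces all affected hosts
        "aggs": {"by_src": {"terms": {"field": "src_ip", "size": 50}}},
    }

body = build_ioc_query(["198.51.100.7", "d41d8cd98f00b204e9800998ecf8427e"])
print(json.dumps(body, indent=2))
# POST this body to http://<es-host>:9200/logstash-*/_search, e.g. via curl
# or the elasticsearch-py client, to run the actual search.
```

Wrapping this in a cron job that pulls fresh IOCs from a feed and alerts on nonzero hit counts gives exactly the “retroactive alerting off of cron’ed searches/queries and IOC ingestion” workflow described earlier in the thread.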
It seems like there are several different types of product functionality being discussed:
- Data aggregation:
– Splunk, ELK, NetWitness, etc.
- Log collection
- Packet collection
- Threat detection:
– Seems to be mainly open source/home-grown solutions.
- Data enrichment
- Threat intelligence
- Correlation functionality
– Seems to be ArcSight, but high overlap with Splunk… What else is in here?
- Audit trail / Work-logging
- I’m not sure what else goes here, but my gut says this is important. Ideas?
Anything else I missed?
It seems that putting all of this under the umbrella of “SIEM” doesn’t cut it anymore: there are too many different requirements for one vendor to cover well. What else do you think is missing? Do you have recommendations for each domain separately? How do you manage all the different solutions? Maybe there’s a place for some open source development?
Good call in pulling these out into different categories. We often use “SIEM” to describe all of these as one thing, but it’s clear that we should be moving away from that in both discussion and in practice. The reason ArcSight fails at its job is that it tries to solve $WORKFLOW, $THREAT-DETECTION, and $DATA-AGGREGATION all at once. I’d propose changing $WORKFLOW to the term $CASE-MANAGEMENT: workflow could mean interacting with multiple tools, but what we are really talking about is case management. Tools that I see being under the $CASE-MANAGEMENT section include things like:
- Jira | https://ucdavis.jira.com/wiki/display/PMO/Computer+Security+Incident+Response+Capability
- ArcSight | Lots of case management/ticket system functionality for incidents
- RSA Archer | Case management tool to track incidents, keep audit trail, collaborate on incidents
I’m sure many others can add to my limited list.
The day we stop buying SIEMs to solve all three problems and instead use (ELK or Splunk) + Jira + (NetWitness or Moloch) + (Snort, host-based detection, real-time sandbox detection, etc.) will be a good day.
Here’s how we look at it (I tried ascii art, but failed):
Log collection (rsyslog + custom log fetchers where needed) feeds into log aggregation (rsyslog) which feeds into log normalization (to be finalized, we have POC code now). The normalization tier is a bus that simultaneously feeds log search (ES) and log retention (Hadoop) and log correlation/alerting (Spark) and log enrichment (this is where threat feeds fit in). Enrichment does its thing and feeds back into log aggregation with the original log line added to the new enrichment (which all feeds back through the search/retention/correlation/alerting layer). Alerting then feeds into our case management system, which does include workflow, contact management, etc (this last bit is a $DayJob app, which we’ll be sharing for free soonish). The alerts will include enriched data, as well as pre-populated searches back into ES for easy investigations. Hadoop is there for long term retention and also running analysis jobs.
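The normalization tier described above is essentially a function from raw log lines to a common event schema that search, retention, correlation, and enrichment can all agree on. A toy sketch of the idea; the schema, the crude syslog pattern, and the field names are assumptions for illustration, not the poster’s actual POC code:

```python
import re
from datetime import datetime, timezone

# Crude RFC 3164-style syslog pattern. A real normalizer needs
# per-source parsers; this only illustrates the shape of the tier.
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<prog>[\w\-/]+)(?:\[\d+\])?:\s(?P<msg>.*)$"
)

def normalize(line):
    """Map a raw log line into a flat event dict for the downstream bus."""
    m = SYSLOG_RE.match(line)
    if not m:
        # Unparseable lines are kept, not dropped: the original line
        # always travels with the event, as described above.
        return {"raw": line, "parsed": False}
    return {
        "raw": line,
        "parsed": True,
        "host": m.group("host"),
        "program": m.group("prog"),
        "message": m.group("msg"),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

event = normalize("Jun  1 12:00:00 fw01 sshd[212]: Failed password for root")
print(event["host"], event["program"])
```

Because every consumer sees the same flat dict, enrichment can attach new keys (threat-feed hits, asset context) and re-inject the event through the same bus without any downstream component needing to change.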
Btw, for those interested, below is an overview of the approximate prices of Elasticsearch support.
Platinum support: 1h/4h/1d response
– per node: 9k EUR
– 10 nodes: 45k EUR
– 25 nodes: 102k EUR
Gold: 4h/1d/2d response
– per node: 6k EUR
– 10 nodes: 30k EUR
– 25 nodes: 68k EUR
Silver: 1d/2d/4d response
– per node: 3.4k EUR
– 10 nodes: 17k EUR
– 25 nodes: 38k EUR
Development support: 2d response
– 6 months: 20k EUR
– 3 months: 13k EUR
They also offer training.
This is fairly reasonable if you compare it to the total cost of their competitors (license + yearly support).
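For budgeting, the interesting number is the effective per-node price at each tier, since the quoted bundle prices come in well below list-per-node times node count. A quick arithmetic check of the figures above (this just divides the quoted bundle prices; no extra pricing data is assumed):

```python
# List prices quoted above, in thousands of EUR: (per_node, 10_nodes, 25_nodes)
tiers = {
    "Platinum": (9.0, 45.0, 102.0),
    "Gold": (6.0, 30.0, 68.0),
    "Silver": (3.4, 17.0, 38.0),
}

for name, (per_node, ten, twenty_five) in tiers.items():
    # Effective cost per node at each bundle size vs. the single-node list price
    print(f"{name}: {ten / 10:.2f}k/node at 10 nodes, "
          f"{twenty_five / 25:.2f}k/node at 25 nodes (list: {per_node}k)")
```

The 10-node bundles already work out to roughly half the single-node list price per node, which is consistent with the note below that discounts grow with node count.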
As for the support contracts, I’ll take a moment to note that those are list prices. There’s wiggle room even at those node counts. As the node counts go up, so do the discounts. Unfortunately due to NDA I can’t share what we negotiated down to on the list (but it’s much lower per node than list). For those considering support, decide if platinum is -really- worth it to you. We’ve got enough redundancy built in that even if one of our clusters explodes, we can still recover. Gold support is plenty if you prepare. Oh, and one other thing, the pricing is per node and per project. That means if you run, say, logstash on top of ElasticSearch -and- another ElasticSearch back-ended app, you’ll have to pay double. I’ve told them several times how stupid this pricing model is (we’ve been talking with them for nearly a year about support), but they’re very insistent. So be prepared.
There we have it! This conversation was one of the best I’ve seen thus far on the SIEM vs. ELK & Splunk debate. I hope that others out there who are building out their security operations program can benefit from the text above. Towards the end of the conversation, it appears that many analysts believe that buying one of these technologies to solve all of your SOC problems isn’t the best way to approach the problem. Lots of analysts believe a shift needs to happen so that SIEM technologies are responsible for alerting on real-time data within roughly a couple of weeks, while log management tools handle everything older than that. Log management tools are designed to take in massive amounts of logs and give analysts the ability to hunt retroactively for evil in their environment. Please feel free to comment on this post with your thoughts and questions. Thanks to all the analysts who took the time to participate in this discussion and share their experience with the security community!