How to scale Azure Functions for high-throughput, short-lived Event Grid events

When publishing a large number of events to a topic (where the retry and time-to-live window is in the minutes), many fail to get delivered to the subscribed functions. Does anyone know of any settings or approaches to ensure scaling reacts quickly enough that they are not all dropped?
I am creating an Azure Functions app that essentially passes events to an Event Grid topic at a high rate, and other functions subscribed to the topic handle the events. These events are meant to be short-lived and not persist longer than a set number of minutes. Ideally I want to see the app scale to handle the load without dropping events. The overall goal is that each event triggers an outbound call to an endpoint of my own API, to test its performance under load.
I have reviewed documentation on MSDN and elsewhere, but not much fits my scenario (most of it talks in terms of incoming events, not outbound HTTP calls).
For scaling, I have looked into the host.json settings for HTTP (as there are none for Event Grid events, and Event Grid triggers look to be similar to HTTP triggers), and setting those seemed to make some improvements.
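For reference, the HTTP concurrency knobs live under extensions.http in host.json (Functions runtime v2+). A sketch with illustrative values, not recommendations:

```json
{
  "version": "2.0",
  "extensions": {
    "http": {
      "maxOutstandingRequests": 200,
      "maxConcurrentRequests": 100,
      "dynamicThrottlesEnabled": true
    }
  }
}
```

maxOutstandingRequests caps the queued-plus-executing requests before the host returns 429, while dynamicThrottlesEnabled lets the host shed load based on host health counters.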
The end result I expect is that every publish to the topic endpoint gets delivered to a function and executed, with a low failed-delivery/drop rate.
What I am seeing instead is that when publishing many events to a topic (at a consistent rate), the majority of events get dead-lettered/dropped.

The Consumption plan is limited by the computing power assigned to your function. In essence, there are limits up to which it can scale, and beyond them it becomes the bottleneck.
I suggest having a look at the limitations.
And here you can find some insights about the differences in computing power.
If you want automatic scaling, or to scale out the number of VM instances, I suggest using an App Service plan. The cheapest option where scaling is supported is the Standard pricing tier.


Identifying poor performance in an application

We are in the process of building a high-performance web application.
Unfortunately, there are times when performance unexpectedly degrades and we want to be able to monitor this so that we can proactively fix the problem when it occurs, as opposed to waiting for a user to report the problem.
So far, we are putting in place system monitors for metrics such as server memory usage, CPU usage and for gathering statistics on the database.
Whilst these show the overall health of the system, they don't help us when one particular user's session is slow. We have implemented tracing into our C# application which is particularly useful when identifying issues where data is the culprit, but for performance reasons tracing will be off by default and only enabled when trying to fix a problem.
So my question is: are there any other best practices that we should be considering (WMI, for instance)? Is there anything else we should consider building into our web app that will benefit us without itself becoming a performance burden?
This depends a lot on your application, but I would always suggest adding your application's own metrics to your monitoring. For example: the number of recent picture uploads, the number of concurrent users - I think you get the idea. Seeing application-specific metrics in combination with server metrics like memory or CPU sometimes gives valuable insights.
In addition to system-health monitoring (using Nagios) of parameters such as load, disk space, etc., we have built in a REST service, called from Nagios, that provides statistics on:
transactions per second (which makes sense in our case)
number of active sessions
the number of errors in the logs per minute
in short, anything that is specific to the application(s)
the time a (dummy) round-trip transaction takes, as if a user or system were performing the business function
With all this data sent back to Nagios, we then configure alert levels and notifications.
We find that monitoring the number of ERROR entries in the logs gives excellent short-term warning of a major crash/issue on the way, for a lot of systems.
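That warning signal is cheap to compute. A minimal sketch of a sliding-window error-rate monitor (names and the one-minute window are illustrative; the clock is injectable only to keep the sketch testable):

```python
from collections import deque
import time

class ErrorRateMonitor:
    """Counts ERROR log entries in a sliding time window and alerts
    when the count crosses a threshold."""

    def __init__(self, threshold, window_seconds=60, clock=time.time):
        self.threshold = threshold
        self.window = window_seconds
        self.clock = clock              # injectable time source for tests
        self.timestamps = deque()       # arrival times of ERROR entries

    def record(self, log_line):
        if "ERROR" in log_line:
            self.timestamps.append(self.clock())

    def errors_per_window(self):
        # Drop entries that have aged out of the window.
        cutoff = self.clock() - self.window
        while self.timestamps and self.timestamps[0] < cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)

    def should_alert(self):
        return self.errors_per_window() >= self.threshold
```

Fed from a log tail, should_alert() can then drive the Nagios notification.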
Many of our customers use Systems and Application Monitor, which handles the health monitoring, along with Synthetic End User Monitor, which runs continuous synthetic transactions to show you the performance of a web application from the end-user's perspective. It works for apps outside and behind the firewall. Users often tell us that SEUM will reveal availability problems from certain locations, or at certain times of day. You can download a free trial at

Why is Kafka pull-based instead of push-based?

Why is Kafka pull-based instead of push-based? I agree that Kafka gives high throughput, as I have experienced it, but I don't see how Kafka's throughput would go down if it were push-based. Any ideas on how a push-based design can degrade performance?
Scalability was the major driving factor when designing such systems (pull vs. push). Kafka is very scalable. One of its key benefits is that it is very easy to add a large number of consumers without affecting performance and without downtime.
Kafka can handle events coming from producers at rates of 100k+ per second. Because Kafka consumers pull data from the topic, different consumers can consume the messages at a different pace. Kafka also supports different consumption models: you can have one consumer processing messages in real time and another processing them in batch mode.
The other reason could be that Kafka was designed not only for single consumers like Hadoop. Different consumers can have diverse needs and capabilities.
Pull-based systems have some deficiencies, like wasted resources due to regular polling. Kafka alleviates this drawback by supporting a 'long polling' mode that waits until real data comes through.
Refer to the Kafka documentation which details the particular design decision: Push vs pull
Major points that were in favor of pull are:
Pull is better in dealing with diversified consumers (without a broker determining the data transfer rate for all);
Consumers can more effectively control the rate of their individual consumption;
Easier and more optimal batch processing implementation.
The drawback of a pull-based system (consumers polling for data while there is none available for them) is alleviated somewhat by a 'long poll' mode that waits until data arrives.
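The mechanics of long polling can be sketched without Kafka: the consumer blocks on the first fetch up to a maximum wait, returning as soon as data arrives, instead of hammering the broker with empty fetches. A minimal simulation using Python's standard queue module (in real Kafka the analogous knob is the consumer's fetch.max.wait.ms setting):

```python
import queue
import threading

def long_poll(q, max_wait, max_records=4):
    """Block until at least one record arrives or max_wait elapses,
    then drain up to max_records without further waiting."""
    records = []
    try:
        # Blocking wait: this is what avoids a tight polling loop.
        records.append(q.get(timeout=max_wait))
    except queue.Empty:
        return records  # timed out with no data
    while len(records) < max_records:
        try:
            records.append(q.get_nowait())
        except queue.Empty:
            break
    return records

q = queue.Queue()
# Producer delivers three records 50 ms from now; the poll below
# returns shortly after they arrive, not after the full 1 s wait.
threading.Timer(0.05, lambda: [q.put(i) for i in range(3)]).start()
print(long_poll(q, max_wait=1.0))
```

The empty-queue case simply costs one bounded wait rather than a stream of failed requests.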
Others have provided answers based on Kafka's documentation, but sometimes product documentation should be taken with a grain of salt as an absolute technical reference. For example:
Numerous push-based messaging systems support consumption at different rates, usually through their session-management primitives. You establish/resume an active application-layer session when you want to consume, and suspend the session when you want to stop/pause (e.g. by simply not responding for less than the keepalive window and greater than the in-flight windows, or with an explicit message). MQTT and AMQP, for example, both provide this capability (in MQTT's case, since the late 90s). Given that no actions are required to pause consumption (by definition), and less traffic is required under steady state (no requests), it is difficult to see how Kafka's pull-based model is more efficient.
One critical advantage push messaging has vs. pull messaging is that there is no request traffic to scale as the number of potentially active topics increases. If you have a million potentially active topics, you have to issue queries for all of them. This concern becomes especially relevant at scale.
The critical advantage pull messaging has vs. push messaging is replayability. This factors a great deal into whether downstream systems can offer guarantees around processing (e.g. they might fail before doing so and have to restart, or fail to write messages recoverably).
Another critical advantage of pull messaging vs. push messaging is buffer allocation. A consuming process can explicitly request as much data as it can accommodate in a pre-allocated buffer, rather than having to allocate buffers over and over again. This gains back some of the goodput lost to query scaling vs. push messaging (but not much). The impact here is measurable, however, if your message sizes vary wildly (e.g. from a few KB to a few hundred MB).
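The buffer-allocation point in miniature: a pulling consumer owns its buffer and reads into it repeatedly, instead of having a new buffer allocated per delivered message. A sketch using Python's readinto (a stand-in for e.g. socket.recv_into; the stream and sizes are illustrative):

```python
import io

def pull_into(stream, buf):
    """Read up to len(buf) bytes into a caller-owned, reusable buffer.
    Returns a view of just the bytes received (no new allocation)."""
    n = stream.readinto(buf)
    return memoryview(buf)[:n]

buf = bytearray(8)                    # allocated once, reused per pull
stream = io.BytesIO(b"hello world!")
first = pull_into(stream, buf)        # fills buf with b"hello wo"
second = pull_into(stream, buf)       # overwrites the same buffer
```

A push receiver cannot size the buffer this way, because it does not know in advance how much the sender will deliver.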
It is a fallacy to suggest that pull messaging has structural scalability advantages over push messaging. Partitioning is what is usually used to provide scale in messaging applications, regardless of the consumption model. There are push messaging systems operating well in excess of 300M msgs/sec on hard wired local clusters...125K msgs/sec doesn't even buy admission to the show. In fact, pull messaging has inferior goodput by definition and systems like Kafka usually end up with more hardware to reach the same performance level. The benefits noted above may often make it worth the cost. I am unaware of anyone using Kafka for messaging in high frequency trading, for example, where microseconds matter.
It may be interesting to note that various push-pull messaging systems were developed in the late 1990s as a way to optimize the goodput. The results were never staggering and the system complexity and other factors often outweigh this kind of optimization. I believe this is Jay's point overall about practical performance over real data center networks, not to mention things like the open Internet.

Heartbeat monitoring system for IoT, need some suggestion with architecture [closed]

First of all, sorry for such an open-ended question, but I did not know any other platform for posting such questions.
I am working on an IoT platform where close to 2 million field devices are supposed to be connected to a few gateways. I have a requirement to monitor the (periodic) heartbeat of each device and, on the basis of some missed heartbeats, add/remove the device from the network console.
I am planning to put a Kafka queue between the devices and the gateways so that the periodic data can be queued and stored somewhere. However, my problem starts at the gateway level, where I have to monitor every heartbeat and decide which devices have missed beats within a given soak period. I can't maintain a large data structure to keep the mapping, and a DB is going to be costly for a near-real-time or real-time system. Any suggestions on how this should be designed?
My platform is Java-driven, so I would welcome suggestions for other open-source platforms that fit the bill, or design approaches.
Your use-case is a typical async ingestion + processing that happens all the time in big data systems.
Your choice of Kafka for event ingestion is perfect (but don't forget to look at ways of monitoring Kafka as well; most people I know assume Kafka is a magic pill that will solve all their problems, only to find, by the time they take their system to production, that the lack of monitoring around the Kafka cluster has bitten them hard).
Now, for the processing part at the gateway layer, you can look at systems like Spark (Streaming), Storm, or Flink. I am quite familiar with Spark, and your use case looks like Spark Streaming with windowing. It scales pretty well and also has an easy development cycle if you are already familiar with Scala (its Java APIs are also pretty straightforward).
You would not need a DB unless you want to maintain historical data of which device you took out and when. The output of the Spark Streaming job (after every window of soak time) can essentially communicate with your network console and take the device down. One thing to note is the soak time of your application: if it is big, you might have to provision more machines with RAM and disk, because Spark is fast precisely because it maintains the entire window of data in memory (flushing to disk when it can't hold it all in RAM).
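Whatever engine runs it, the per-window decision logic is small. A minimal in-memory sketch of the missed-heartbeat rule (names and thresholds are illustrative; in Spark Streaming this would become a keyed window over device IDs):

```python
class HeartbeatTracker:
    """Tracks the last heartbeat per device and flags devices that
    have been silent longer than max_missed expected intervals."""

    def __init__(self, interval_s, max_missed):
        self.interval_s = interval_s
        self.max_missed = max_missed
        self.last_seen = {}          # device_id -> last heartbeat time

    def beat(self, device_id, now):
        self.last_seen[device_id] = now

    def sweep(self, now):
        """Return device IDs to remove from the network console."""
        dead = [d for d, t in self.last_seen.items()
                if now - t > self.interval_s * self.max_missed]
        for d in dead:
            del self.last_seen[d]    # forget removed devices
        return dead
```

Note the state is only one timestamp per device, so even 2 million devices fit comfortably in memory on a single worker.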

Scalability of Azure Cloud Queue

In our current project we use 8 worker-role machines side by side that actually work a little differently than Azure may expect.
Short outline of the system:
each worker starts up to 8 processes that connect to Cloud Queue and process messages
each process accesses three different cloud queues, collecting messages for different purposes (delta recognition, backup, metadata)
each message leads to a WCF call to an ERP system to gather information, and finally the retrieved response is added to a Redis cache
This approach was chosen over many smaller machines due to cost and performance: while 24 one-core machines would perform about 400 calls/s to the ERP system, 8 four-core machines with 8 processes each do over 800 calls/s.
Now to the question: when increasing the count of machines further, to push performance to 1200 calls/s, we experienced outages of Cloud Queue. At the same moment, 80% of the machines' processes stop processing messages.
Here we have two problems:
Remote debugging is not possible for these processes, but it was possible to use dile to get some information out.
We use the GetMessages method of Cloud Queue to get up to 4 messages from the queue. Cloud Queue always answers with 0 messages. Reconnecting to the cloud queue does not help.
Restarting the workers does help, but shortly leads to the same problem.
Are we hitting the natural end of scalability of Cloud Queue and should switch to Service Bus?
I have not been able to fully understand the problem; I described it in the natural borders of Cloud Queue.
To summarize:
The count of TCP connections was impressive. Actually too impressive (multiple hundreds).
Going back to the original memory size let the system operate normally again.
In my experience I have been able to get better raw performance out of Azure Cloud Queues than Service Bus, but Service Bus has better enterprise features (reliability, topics, etc.). An Azure Cloud Queue should process up to 2K messages/second per queue.
You can also try partitioning to multiple queues if there is some natural partition key.
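A sketch of that partitioning, assuming a stable string key such as a customer or tenant ID (queue names and hash choice are illustrative; what matters is that the same key always maps to the same queue, so load spreads across queues while related messages stay together):

```python
import zlib

def queue_for(key, queue_names):
    """Deterministically map a partition key to one of several queues."""
    # crc32 is stable across runs and processes, unlike Python's
    # salted built-in hash(), so every worker agrees on the mapping.
    return queue_names[zlib.crc32(key.encode()) % len(queue_names)]

queues = ["work-0", "work-1", "work-2"]
# e.g. every message for "customer-42" lands on the same queue.
```

Each worker then only polls the queues it owns, which also keeps the per-queue request rate below the throughput ceiling.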
Make sure that your processes don't have some sort of thread deadlock that is the real culprit. You can test this by connecting to the queue when the process appears hung and trying to pull messages from it. If that works, it is your process, not the queue.
Also take a look at this to set up some other monitors:
It took some time to solve this issue:
First, a summary of how the storage account was used:
We used the blob storage once a day pretty heavily.
The "normal" diagnostics that Azure provides out of the box also used the same storage account.
Some controlling processes used small tables to store and read information once an hour, for ca. 20 minutes.
There may be up to 800 calls/s that increment a counter of calls to an ERP system.
After recognizing that the storage account was put under heavy load, we split it up.
Now there are three physical storage accounts, each having 2 queues.
The original one still takes the up-to-800/s calls for incrementing counters.
Diagnostics are still on the original one.
Controlling information has also been moved.
The system has now been running for 2 weeks, working like a charm. There are several things we learned from this:
No, the infrastructure is not "just there", and it doesn't scale endlessly.
Even if we thought we didn't use "that much", summed up we used it quite heavily and uncontrolled.
There are no "best practices" anywhere on the net that tell the complete story. Especially when starting to work with storage accounts, a guide from MS would be quite helpful.
Exception handling in storage is quite bad. Even if the storage account is overused, I would expect some kind of exception, not just zero messages returned without any surrounding information.
Read the complete story here: natural borders of cloud storage scalability
Scalability has a lot of influencing factors. You may be interested in Azure Service Bus: Massive count of listeners and senders, to be aware of some more pitfalls.

Read-only queue for distributed event throttling?

I'm looking for a way of throttling various processes on a cluster-wide basis. This requires some kind of centralised control that can cope with an arbitrary number of event consumers. The thought that I had involves a read-only queue that generates tokens at a certain rate with no backlog (so missed events are just discarded). For example, say I have some web API that needs to be throttled to 10,000 messages per hour, but that can be called from any number of servers in a cluster. I would configure a queue to generate tokens at 10k messages/hour, and all servers connect to that queue and retrieve a token before proceeding. This would introduce an element of latency (of 3600/10000 sec after the first request), but would be smooth and predictable regardless of consumer count. I don't want to have a backlog because I don't want to have a rush after a quiet period - the aim is not just to limit to a total number of events per hour, but to spread them evenly across it.
My main app is PHP and it's running on Linux. At the moment I'm very happy with beanstalkd for normal queuing, but it doesn't support this mode of operation. I've used RabbitMQ in the past, but found it heavy and fragile in comparison. It would be nice if this could be done by the queue manager itself, since it needs no external input after configuration.
In the absence of specific support for something like this, I could try using an ordinary queue with a process pushing tokens into it with very short expiry, though that seems very inelegant. Any better ideas?
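For comparison, the behaviour being asked for (tokens at a fixed rate, no backlog) is a token bucket with capacity 1, which is easy to state directly. A minimal sketch with an injectable clock (names are illustrative), independent of any particular queue manager:

```python
class ThrottleGate:
    """Grants at most one permit per interval, with no backlog:
    idle periods do not bank extra permits (bucket capacity = 1)."""

    def __init__(self, rate_per_hour, clock):
        self.interval = 3600.0 / rate_per_hour
        self.clock = clock               # injectable time source
        self.next_free = 0.0

    def try_acquire(self):
        now = self.clock()
        if now >= self.next_free:
            # Measure from *now*, not from next_free: a long quiet
            # period must not create a burst of saved-up tokens.
            self.next_free = now + self.interval
            return True
        return False
```

The catch is exactly the one in the question: to enforce this cluster-wide, the gate has to live in one shared place (the queue manager, or a small service in front of it).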
You can use one of these approaches:
Use a topic exchange with regular queues, but set the message TTL to your needs. The pro of this method is that you can have a small backlog, say for the last 5 seconds, which allows your application to recover after short-lived issues, like losing the network during some maintenance. No cons.
Publish messages to a fanout exchange and declare queues with the auto-delete flag, then consume messages from them. The biggest con of this approach is that messages get duplicated across the queues. Actually, that may be a pro if you need such behavior, but you can also achieve it easily with a topic exchange, using additional queues with the same binding.