Re: Large Infrastructure question

Author: Michael Butash
Date:  
To: plug-discuss
Subject: Re: Large Infrastructure question
Googling it leads to much more verbose descriptions than you'll want,
otherwise Bryan's suggestion suffices. :)

Let the routing protocol that runs the internet (bgp) do your
load-balancing, just make sure your app can service requests anywhere,
literally.
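
As a rough illustration of what that means with anycast: every site
announces the same prefix, and each client's network just follows the
best announcement it hears. The toy model below boils BGP best-path
selection down to "shortest AS path wins", which is a big simplification
(real BGP looks at local preference and other attributes first). The
site names, AS numbers, and the 192.0.2.0/24 prefix are all made up for
the example.

```python
# Toy model of anycast: several sites announce the SAME prefix, and a
# client's router keeps whichever announcement has the shortest AS path.
# Real BGP best-path selection has more steps; this is only an illustration.
from dataclasses import dataclass

ANYCAST_PREFIX = "192.0.2.0/24"  # documentation prefix, hypothetical service


@dataclass(frozen=True)
class Announcement:
    site: str        # where the service cluster lives (hypothetical)
    as_path: tuple   # AS numbers as seen from the client's router


def best_site(announcements):
    """Return the site whose announcement has the shortest AS path."""
    if not announcements:
        raise ValueError("prefix is not reachable from this router")
    # min() keeps the first minimum, so ties go to the earlier announcement.
    return min(announcements, key=lambda a: len(a.as_path)).site


# What a router in Phoenix might see for the same prefix (made-up paths).
seen_from_phoenix = [
    Announcement("Dallas", (64501, 64510, 64520)),
    Announcement("Los Angeles", (64501, 64530)),
    Announcement("Ashburn", (64501, 64510, 64540, 64550)),
]

if __name__ == "__main__":
    print(best_site(seen_from_phoenix))  # Los Angeles
```

If the Los Angeles site withdraws its announcement, the same router falls
back to Dallas with no change on the client side, which is why the app
has to be able to service a request at any site.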

You need to understand the networking involved to make that work, or any
*cloud* solution.

It's funny: enterprises find this "cloud" story compelling, but then
quickly realize their applications are usually highly state-unaware and
can really only work in an active/passive capacity (i.e. about anything
windoze-based). They think the cloud somehow magically fixes this, but
instead you have to accept that stateless, asynchronous transactions are
what make real cloud applications work.
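
To make "stateless, asynchronous" a bit more concrete, here is a minimal
sketch, not anyone's production design. The handler keeps nothing about
the client between requests, acknowledges right away, and leaves the
actual write to a later step; the write is keyed by a request id so a
retry that lands on a different node does no harm. The names handle and
drain, the request fields, and the dict standing in for a shared
database are all assumptions for the example.

```python
# Sketch: any node can take any request because nothing about the client is
# remembered in the process. Writes are queued and applied later, keyed by
# request id, so a replayed request just overwrites the same row.
import queue


def handle(request, outbox):
    """Validate, queue the write for later, and answer immediately."""
    if "id" not in request or "payload" not in request:
        return 400
    outbox.put(request)   # asynchronous: the caller does not wait for storage
    return 200


def drain(outbox, store):
    """Apply every queued write; idempotent because it is keyed by id."""
    while True:
        try:
            req = outbox.get_nowait()
        except queue.Empty:
            return
        store[req["id"]] = req["payload"]


if __name__ == "__main__":
    node_a, node_b = queue.Queue(), queue.Queue()   # two hypothetical sites
    shared_store = {}                               # stand-in for a database
    req = {"id": "r1", "payload": "42"}
    handle(req, node_a)
    handle(req, node_b)   # client retried and was routed to another site
    drain(node_a, shared_store)
    drain(node_b, shared_store)
    print(shared_store)   # {'r1': '42'}
```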

I know a fairly large org here in town that took legacy crap code (asp
ported horribly to something .net-ish) in their website app and pushed it
into azure because they bought the magical tale of microsoft fixing that
with fairies and pixie dust in the service. Problem is they could only
host in one availability zone: because of the legacy stateful nature of
the application, latency between the app backends was too high to
replicate between zones and actually work. Then they realized Azure goes
down, a lot. They wanted me to magically fix it with the network
somehow. I had to tell them to fix their crappy app first, and no, mssql
enterprise clusters can't do synchronous replication a thousand miles away.
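
The back-of-the-envelope math behind that last point, as a sketch: light
in fiber covers very roughly 200 km per millisecond, so distance alone
puts a floor under every synchronous commit before any router, disk, or
database overhead is counted. The constants are approximations and the
function names are just for illustration.

```python
# Back-of-the-envelope: the floor on synchronous replication latency set by
# the speed of light in fiber. Figures are approximations, not measurements.
MILES_TO_KM = 1.609344
FIBER_KM_PER_MS = 200.0   # light in glass travels at roughly 2/3 of c


def min_round_trip_ms(distance_miles):
    """Best-case round trip over a perfectly straight fiber run."""
    one_way_km = distance_miles * MILES_TO_KM
    return 2 * one_way_km / FIBER_KM_PER_MS


def max_serial_commits_per_sec(distance_miles):
    """Upper bound when each commit waits for the remote acknowledgement."""
    return 1000.0 / min_round_trip_ms(distance_miles)


if __name__ == "__main__":
    rtt = min_round_trip_ms(1000)
    rate = max_serial_commits_per_sec(1000)
    print(f"{rtt:.1f} ms round trip, at most {rate:.0f} serial commits/s")
    # 16.1 ms round trip, at most 62 serial commits/s
```

That ceiling applies to each chain of dependent commits; independent
transactions can overlap, but real fiber paths are longer than a
straight line and every hop adds delay, so actual numbers come out worse.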

The trick is meeting in between. More app devs, especially the remotely
"web-ish" ones, need to understand things like BGP routing,
anycasting, the differences between tcp and udp, synchronous/asynchronous
data flows, unicast vs. multicast, etc., as it *is* part of their
application once they get to a certain point. This is why we network guys
are putting 100G in data centers, and across WANs now, but it doesn't make
developer apps any less crappy that they actually *require* it
because they refuse to believe there are limits to network bandwidth vs.
dma or sata.

I learned unix and even active directory because I was tired of stupid
server and app people telling me to fix the network, when I found it was
generally their applications abusing it. Very few ever take the time to
learn the network side, especially in M$ land, and it shows in horribly
inefficient application infrastructures. I find it's still true 15
years later.

-mb


On 08/08/2014 09:42 AM, Bryan O'Neal wrote:
>
> I am going to send you for research, because explaining it via a phone
> keyboard would be quite time consuming.
> Short version is you get one IP that resolves to the advertised
> system with the lowest-cost route from the source. This typically
> means the closest logical cluster. It is how things like DNS are
> usually served.
>
> On Aug 8, 2014 9:26 AM, "David Schwartz" <
> <mailto:newsletters@thetoolwiz.com>> wrote:
>
>     What's anycast?

>
>     I don't care where the servers are located. I'm just thinking that
>     it'll work best to dedicate a specific server to serving
>     individual geographic areas.

>
>     It's more of a routing question, not a hosting question.

>
>     -David

>
>
>
>     On Aug 7, 2014, at 11:48 PM, Bryan O'Neal
>     <
>     <mailto:Bryan.ONeal@theonealandassociates.com>> wrote:

>
>>     Sounds perfect for anycast. Many small packets, no sessions or
>>     contracts, etc. However one cluster in LA, Seattle, Dallas,
>>     Ashburn, and Chicago will provide exquisite North American
>>     coverage. You don't put them where the people are; you put them
>>     where the network is.

>>
>>     On Aug 7, 2014 11:24 PM, "David Schwartz"
>>     < <mailto:newsletters@thetoolwiz.com>>
>>     wrote:

>>
>>         I appreciate all of the comments. Some made sense and some
>>         were a bit over my head. I've only ever had to deal with a
>>         single server that required a pair of nameserver names, so
>>         most of this is relatively new to me. (All of my sites today
>>         are on a shared reseller hosting account.)

>>
>>         A few more details might be helpful.

>>
>>         The incoming requests will all be fairly small. Aside from
>>         the headers and API keys, the data will be under 100 bytes.

>>
>>         At first, the servers will simply take the data and stuff it
>>         into a database, then send a simple 200 status response.

>>
>>         Down the line, the server processes will do some simple
>>         queries, then send a custom status response code and possibly
>>         a reply message of a dozen or so bytes. The vast majority of
>>         replies will be a simple status response. In the rare
>>         situation where we'll need to send more data, a 302/307
>>         redirect to a process running on a different server would
>>         suffice.

>>
>>         We'll need to run our own app to do this. Again, it's fairly
>>         simple. Someone suggested that launching PHP would be a lot
>>         of overhead. Perhaps a custom ISAPI module (or whatever
>>         they're called these days) would work.

>>
>>         As far as geo-locality, we're looking at major metropolitan
>>         areas, like Phoenix, Tucson, Flagstaff, Las Vegas.
>>          High-density areas like LA and San Diego, and cities on the
>>         East Coast, might get split into a few smaller areas, but
>>         that would only be done after operational tests showed it
>>         would be beneficial.

>>
>>         -David

>>
>>
>>
>>         On Aug 7, 2014, at 10:40 PM, Eric Cope <
>>         <mailto:eric.cope@gmail.com>> wrote:

>>
>>>         I'm not sure if it's what you are looking for, but I read
>>>         this on Hacker News the other day:
>>>         http://www.scalescale.com/rolling-your-own-cdn-build-a-3-continent-cdn-for-25-in-1-hour/

>>>
>>>         Eric

>>>
>>>
>>>         On Thu, Aug 7, 2014 at 8:38 PM, Joseph Sinclair
>>>         <
>>>         <mailto:plug-discussion@stcaz.net>> wrote:

>>>
>>>             In reference to your final sentence, you're looking for
>>>             the kind of services a CDN provides.
>>>             (e.g. geographic routing, and rapid scale).  Something
>>>             like one of the following combinations may offer what
>>>             you need (using the technologies others have mentioned
>>>             already):

>>>
>>>             AWS with Amazon CloudFront (if your content is static)
>>>             AWS or ComputeEngine with LimeLight Networks (for static
>>>             content it's simple, but they can do dynamic, different
>>>             for each request, as well for a higher fee).
>>>             AWS or ComputeEngine with Akamai (same as LimeLight,
>>>             simple for static or they can also do dynamic for higher
>>>             fees).

>>>
>>>             AWS or ComputeEngine without CDN. This can be very
>>>             coarse-grained in that requests from a geographic region
>>>             will (preferentially) go to the datacenter in that region.
>>>             So you could differentiate Asia, Europe(EMEA, really),
>>>             US-East, and US-West with the AWS or GCE zones.

>>>
>>>             Hopefully those suggestions help; there are many other
>>>             combinations of compute and CDN offerings, but those
>>>             above represent the top two providers in each category.

>>>
>>>             If you needed to go it yourself, you could use something
>>>             like the geoip database (there are a few providers) to
>>>             match IP to geography.  That's not hugely reliable, but
>>>             it's about as good as you'll get on a global internet
>>>             where people travel and sometimes use things like Tor to
>>>             hide their origin.
>>>             If you're on mobile, why not just tag the request with
>>>             location from the mobile device?  That would be much
>>>             more reliable than any of the other options.

>>>
>>>             If you're needing very precise control, then you could
>>>             use the mobile location information in a simple router
>>>             service (something like NGinx or similar with a basic
>>>             region-to-server mapping) to redirect the request to the
>>>             correct locality server.

>>>
>>>             If you're looking for extremely small (neighborhood or
>>>             smaller) areas and it's a mobile app, there are also
>>>             geofencing services (similar to Android's built-in
>>>             services, see
>>>             http://developer.android.com/training/location/geofencing.html)
>>>             that identify fairly precise location and help serve
>>>             different content based on that.

>>>
>>>             Hopefully one of those options helps point you in the
>>>             direction of what you need.

>>>
>>>             On 08/06/2014 11:17 PM, David Schwartz wrote:
>>>             > Here's something interesting for the infrastructure
>>>             geeks on the list ...
>>>             >
>>>             > How would you approach setting up a service that had
>>>             to sink around, oh, say, 10-20 million small HTTP POST
>>>             requests per minute throughout the day, from sources
>>>             geographically distributed around the country?

>>>             >
>>>             > To do development and get the logic working, a small
>>>             server is sufficient. But it needs to scale quickly once
>>>             it's launched.

>>>             >
>>>             > There will be a high degree of geo-locality, so
>>>             servers could be set up to handle requests from
>>>             different geographic areas.  HTTP requests from a given
>>>             area would be routed to whatever server is dedicated for
>>>             that area. I guess their IP address could be used for
>>>             that purpose?

>>>             >
>>>             > (How granular is the location data for IP addresses on
>>>             mobile devices? Are they reliable? We could add a
>>>             location geotag to the packet headers if that would help.)

>>>             >
>>>             > Note that the servers don't need to be physically
>>>             LOCATED in the area; rather, they're dedicated to
>>>             SERVING a well-defined geographic area.

>>>             >
>>>             > There's no need for cross-talk, either. That is,
>>>             there's no need for a server serving, say, the LA area
>>>             to cross-post with one in San Diego, except in a very
>>>             small overlapping area which is easy to address.

>>>             >
>>>             > Can this sort of routing be done with a DNS service?
>>>              (eg., DNSMadeEasy.com <http://dnsmadeeasy.com/> is one
>>>             I'm familiar with)

>>>             >
>>>             > Or is something more massive needed?

>>>             >
>>>             > Also note that this would be an automated service. It
>>>             has a very steady stream of small incoming packets,
>>>             peaking at various times of the day, with limited
>>>             responses. No ads, no graphics, no user interactions at all.

>>>             >
>>>             > I know there are infrastructure services in place to
>>>             handle this kind of thing, like what Amazon offers, and
>>>             others. I'm looking for any specific pointers to
>>>             services that might fit this use case profile.

>>>             >
>>>             > -David

>>>             >

>>>             >

>>>             >
>>>             > ---------------------------------------------------
>>>             > PLUG-discuss mailing list -
>>>             
>>>             <mailto:PLUG-discuss@lists.phxlinux.org>
>>>             > To subscribe, unsubscribe, or to change your mail
>>>             settings:
>>>             > http://lists.phxlinux.org/mailman/listinfo/plug-discuss
