Large Infrastructure question
Paul Mooring
paul at getchef.com
Fri Aug 8 11:13:48 MST 2014
Allow me to present an alternative point of view:
BGP Multipathing is cool technology and can definitely fit many of the
needs for an HA service. It's been a standard for a long time and works so
well that, while I couldn't prove it, I strongly suspect that Amazon,
Microsoft, Google, $cloud_provider... all use it extensively to make
technologies like ELB and "cloud HA" solutions work. The downside of BGP
is that it's expensive. You don't pay for BGP itself, but you do pay for the
networking gear to run it, the data centers (or racks, or whatever) and
equipment to make sure whatever it routes to is also HA, and the power usage
and monitoring required for ongoing maintenance. Those add up to a lot of
money, but those expenses seem lower after comparing them to the labor
costs of having someone around who can actually keep that system running
(and it's *extremely* rare that the person who can do that is also capable
of handling the full stack from dev to deploy).
Given those costs, think about the companies that actually need
distributed, highly available technical systems. Keep in mind that technology
companies are a myth: all companies sell goods and/or services, and
technology is an implementation detail. That is to say, it doesn't matter
if BGP multipathing is the best and most flexible system; the company
needing that technology only cares about knowing where traffic came from
and keeping the system up.
This is where cloud technologies come in. If you can pay $3,000/month to a
couple of companies to accomplish your goals, eliminating capital expense
and reducing operating expense (self-hosted still has monthly costs) while
still meeting your immediate needs, that is the best solution. Enterprises
and startups aren't turning to cloud technologies because they believe in
pixie dust; they're doing it because they ran the numbers and it solves
their business problems for less. It always pays to evaluate decisions with
a set goal in mind and not get caught up in what technology is "best".
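To make "ran the numbers" concrete, here is a rough sketch of that
comparison in Python. Only the $3,000/month cloud figure comes from the
paragraph above; every self-hosted number is a made-up placeholder that you
would replace with your own quotes and salaries.

```python
# Rough monthly cost comparison: cloud vs. a self-hosted BGP setup.
# Only CLOUD_MONTHLY comes from the discussion above; all other
# figures are hypothetical placeholders, not real quotes.

CLOUD_MONTHLY = 3_000  # $/month, from the text


def self_hosted_monthly(capex, amortize_months, colo, power, staff_annual):
    """Return the effective monthly cost of running it yourself."""
    hardware = capex / amortize_months   # spread gear cost over its life
    labor = staff_annual / 12            # monthly share of one salary
    return hardware + colo + power + labor


# Hypothetical inputs: $120k of gear over 36 months, $2k/month colo,
# $500/month power, one $120k/year network engineer.
monthly = self_hosted_monthly(
    capex=120_000, amortize_months=36,
    colo=2_000, power=500, staff_annual=120_000,
)

print(f"self-hosted: ${monthly:,.0f}/month")
print(f"cloud:       ${CLOUD_MONTHLY:,.0f}/month")
print("cloud is cheaper" if CLOUD_MONTHLY < monthly else "self-hosted is cheaper")
```

With these placeholder inputs the labor line dominates everything else,
which is the point made above about staffing costs. Different inputs can
easily flip the result, so treat the structure as the useful part, not the
numbers.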
On Fri, Aug 8, 2014 at 10:07 AM, Michael Butash <michael at butash.net> wrote:
> Googling it leads to much more verbose descriptions than you'll want,
> otherwise Bryan's suggestion suffices. :)
>
> Let the routing protocol that runs the internet (bgp) do your
> load-balancing, just make sure your app can service requests anywhere,
> literally.
>
> You need to understand the networking involved to make that work, or any
> *cloud* solution.
>
> It's funny, enterprises find this "cloud" story compelling, but then
> quickly realize their applications are highly state-unaware usually, and
> really can only work in active/passive capacity (ie. about anything
> windoze-based). They think somehow the cloud magically fixes this, but
> instead you have to accept that stateless, asynchronous transactions are
> what make real cloud applications work.
>
> I know a fairly large org here in town that took legacy crap code (asp
> ported horribly to something .net-ish) in their website app, pushed it into
> azure because they bought the magical tale of microsoft fixing that with
> fairies and pixie dust in the service. Problem is they could only host in
> one availability zone, because the legacy stateful nature of the
> application made latency between the app backends too much of an issue to
> replicate between zones and actually work. Then they realized Azure goes
> down, a lot. They wanted me to magically fix it with the network somehow; I
> had to tell them to fix their crappy app first, and no, mssql enterprise
> clusters can't do synchronous replication a thousand miles away.
>
> The trick is meeting in between. More app devs, especially remotely
> "web-ish" ones, need to understand things like network BGP routing,
> anycasting, differences between tcp and udp, synchronous/asynchronous data
> flows, unicast vs. multicasting, etc., as it *is* part of their application
> when they get to a point. This is why we network guys are putting 100G in
> data centers, and across WANs now, but it doesn't make developer apps any
> less crappy that they actually *require* it because they refuse to believe
> there are limits to network bandwidth vs. dma or sata.
>
> I learned unix and even active directory because I was tired of stupid
> server and app people telling me to fix the network, when I found it was
> generally their applications abusing it. Very few ever take the time to
> learn the network side, especially in M$ land, and it shows in horribly
> inefficient application infrastructures. I find it's still true 15 years
> later.
>
> -mb
>
>
>
> On 08/08/2014 09:42 AM, Bryan O'Neal wrote:
>
> I am going to send you for research, because explaining it via a phone
> keyboard would be quite time consuming.
> Short version is you get one IP that resolves to the advertised system
> with the lowest-cost route from the source. This typically means the closest
> logical cluster. It is how things like DNS are usually served.
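A toy illustration of that idea in Python: several sites advertise the same
IP, and each source lands on whichever site has the lowest route cost from
where it sits. The site names and cost numbers are invented for the example,
and real BGP best-path selection weighs more attributes than a single cost.

```python
# Toy model of anycast: every site advertises the same address, and
# traffic from a source goes to the site with the lowest route cost.
# Site names and costs are hypothetical; real BGP best-path selection
# uses several attributes (local preference, AS path length, ...).

ANYCAST_IP = "192.0.2.53"  # documentation-range address, illustrative only

# route_cost[source][site] = cost of the path from source to site
route_cost = {
    "phoenix": {"la": 2, "dallas": 3, "ashburn": 6},
    "boston": {"la": 7, "dallas": 5, "ashburn": 2},
}


def pick_site(source, down=()):
    """Return the lowest-cost site still advertising the address."""
    candidates = {
        site: cost
        for site, cost in route_cost[source].items()
        if site not in down
    }
    return min(candidates, key=candidates.get)


print(pick_site("phoenix"))               # la
print(pick_site("boston"))                # ashburn
print(pick_site("phoenix", down={"la"}))  # dallas
```

The last call is the HA part: when a site stops advertising the address,
traffic simply shifts to the next-cheapest site without the client changing
anything.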
> On Aug 8, 2014 9:26 AM, "David Schwartz" <newsletters at thetoolwiz.com>
> wrote:
>
>> What’s anycast?
>>
>> I don’t care where the servers are located. I’m just thinking that
>> it’ll work best to dedicate a specific server to serving individual
>> geographic areas.
>>
>> It’s more of a routing question, not a hosting question.
>>
>> -David
>>
>>
>>
>> On Aug 7, 2014, at 11:48 PM, Bryan O'Neal <
>> Bryan.ONeal at theonealandassociates.com> wrote:
>>
>> Sounds perfect for anycast. Many small packets, no sessions or
>> contracts, etc. However, one cluster each in LA, Seattle, Dallas, Ashburn,
>> and Chicago will provide exquisite North American coverage. You don't put
>> them where the people are; you put them where the network is.
>> On Aug 7, 2014 11:24 PM, "David Schwartz" <newsletters at thetoolwiz.com>
>> wrote:
>>
>>> I appreciate all of the comments. Some made sense and some were a bit
>>> over my head. I’ve only ever had to deal with a single server that required
>>> a pair of nameserver names, so most of this is relatively new to me. (All
>>> of my sites today are on a shared reseller hosting account.)
>>>
>>> A few more details might be helpful.
>>>
>>> The incoming requests will all be fairly small. Aside from the headers
>>> and API keys, the data will be under 100 bytes.
>>>
>>> At first, the servers will simply take the data and stuff it into a
>>> database, then send a simple 200 status response.
>>>
>>> Down the line, the server processes will do some simple queries, then
>>> send a custom status response code and possibly a reply message of a dozen
>>> or so bytes. The vast majority of replies will be a simple status response.
>>> In the rare situation where we’ll need to send more data, a 302/307
>>> redirect to a process running on a different server would suffice.
>>>
>>> We’ll need to run our own app to do this. Again, it’s fairly simple.
>>> Someone suggested that launching PHP would be a lot of overhead. Perhaps a
>>> custom ISAPI module (or whatever they’re called these days) would work.
>>>
>>> As far as geo-locality, we’re looking at major metropolitan areas,
>>> like Phoenix, Tucson, Flagstaff, Las Vegas. High-density areas like LA and
>>> San Diego, and cities on the East Coast, might get split into a few smaller
>>> areas, but that would only be done after operational tests showed it would
>>> be beneficial.
>>>
>>> -David
>>>
>>>
>>>
>>> On Aug 7, 2014, at 10:40 PM, Eric Cope <eric.cope at gmail.com> wrote:
>>>
>>> I'm not sure if it's what you are looking for, but I read this on
>>> Hacker News the other day:
>>>
>>> http://www.scalescale.com/rolling-your-own-cdn-build-a-3-continent-cdn-for-25-in-1-hour/
>>>
>>> Eric
>>>
>>>
>>> On Thu, Aug 7, 2014 at 8:38 PM, Joseph Sinclair <
>>> plug-discussion at stcaz.net> wrote:
>>>
>>>> In reference to your final sentence, you're looking for the kind of
>>>> services a CDN provides.
>>>> (e.g. geographic routing, and rapid scale). Something like one of the
>>>> following combinations may offer what you need (using the technologies
>>>> others have mentioned already):
>>>>
>>>> AWS with Amazon CloudFront (if your content is static)
>>>> AWS or ComputeEngine with LimeLight Networks (for static content it's
>>>> simple, but they can do dynamic, different for each request, as well for a
>>>> higher fee).
>>>> AWS or ComputeEngine with Akamai (same as LimeLight, simple for static
>>>> or they can also do dynamic for higher fees).
>>>>
>>>> AWS or ComputeEngine without CDN. This can be very coarse-grained in
>>>> that requests from a geographic region will (preferentially) go to the
>>>> datacenter in that region.
>>>> So you could differentiate Asia, Europe(EMEA, really), US-East, and
>>>> US-West with the AWS or GCE zones.
>>>>
>>>> Hopefully those suggestions help; there are many other combinations of
>>>> compute and CDN offerings, but those above represent the top two providers
>>>> in each category.
>>>>
>>>> If you needed to go it yourself, you could use something like the geoip
>>>> database (there are a few providers) to match IP to geography. That's not
>>>> hugely reliable, but it's about as good as you'll get on a global internet
>>>> where people travel and sometimes use things like Tor to hide their origin.
>>>> If you're on mobile, why not just tag the request with location from
>>>> the mobile device? That would be much more reliable than any of the other
>>>> options.
>>>>
>>>> If you're needing very precise control, then you could use the mobile
>>>> location information in a simple router service (something like nginx or
>>>> similar with a basic region-to-server mapping) to redirect the request to
>>>> the correct locality server.
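A minimal sketch of such a router in Python, assuming the client sends its
region in a request header. The `X-Region` header name, the region keys, and
the hostnames are all hypothetical; in practice the same mapping could live
in an nginx config instead of application code.

```python
# Sketch of a region-to-server redirect. The header name, region
# keys, and hostnames are hypothetical placeholders.

REGION_SERVERS = {
    "phoenix": "https://phx.example.com",
    "tucson": "https://tus.example.com",
    "las-vegas": "https://las.example.com",
}
DEFAULT_SERVER = "https://default.example.com"


def route(headers, path):
    """Return (status, location) redirecting to the region's server."""
    region = headers.get("X-Region", "").strip().lower()
    server = REGION_SERVERS.get(region, DEFAULT_SERVER)
    # 307 keeps the method and body, so a POST stays a POST.
    return 307, server + path


print(route({"X-Region": "Tucson"}, "/v1/report"))
print(route({}, "/v1/report"))
```

Unknown or missing regions fall through to a default server rather than
failing, which seems the safer choice when the location tag is optional.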
>>>>
>>>> If you're looking for extremely small (neighborhood or smaller) areas
>>>> and it's a mobile app, there are also geofencing services (similar to
>>>> Android's built-in services, see
>>>> http://developer.android.com/training/location/geofencing.html) that
>>>> identify fairly precise location and help serve different content based on
>>>> that.
>>>>
>>>> Hopefully one of those options helps point you in the direction of what
>>>> you need.
>>>>
>>>> On 08/06/2014 11:17 PM, David Schwartz wrote:
>>>> > Here's something interesting for the infrastructure geeks on the list
>>>> ...
>>>> >
>>>> > How would you approach setting up a service that had to sink around,
>>>> oh, say, 10-20 million small HTTP POST requests per minute throughout the
>>>> day, from sources geographically distributed around the country?
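For scale, a quick back-of-the-envelope conversion of that load in Python.
The 10-20 million requests per minute is from the question and the
under-100-byte payload is from David's follow-up message; the 500 bytes of
header overhead per request is purely an assumption, and TCP/TLS overhead is
ignored entirely.

```python
# Back-of-the-envelope load estimate. Request rates come from the
# question; HEADER_BYTES is an assumed figure, not a measurement.

PAYLOAD_BYTES = 100   # "under 100 bytes" per the follow-up message
HEADER_BYTES = 500    # assumption: HTTP headers plus API key


def load(requests_per_minute):
    """Return (requests/second, inbound megabits/second)."""
    rps = requests_per_minute / 60
    mbps = rps * (PAYLOAD_BYTES + HEADER_BYTES) * 8 / 1_000_000
    return rps, mbps


for rpm in (10_000_000, 20_000_000):
    rps, mbps = load(rpm)
    print(f"{rpm:,}/min -> {rps:,.0f} req/s, ~{mbps:,.0f} Mbit/s inbound")
```

Under these assumptions the raw bandwidth is modest; the harder part is the
request rate, on the order of a few hundred thousand requests per second at
peak.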
>>>> >
>>>> > To do development and get the logic working, a small server is
>>>> sufficient. But it needs to scale quickly once it's launched.
>>>> >
>>>> > There will be a high degree of geo-locality, so servers could be set
>>>> up to handle requests from different geographic areas. HTTP requests from
>>>> a given area would be routed to whatever server is dedicated for that area.
>>>> I guess their IP address could be used for that purpose?
>>>> >
>>>> > (How granular is the location data for IP addresses on mobile
>>>> devices? Are they reliable? We could add a location geotag to the packet
>>>> headers if that would help.)
>>>> >
>>>> > Note that the servers don't need to be physically LOCATED in the
>>>> area; rather, they're dedicated to SERVING a well-defined geographic area.
>>>> >
>>>> > There's no need for cross-talk, either. That is, there's no need for
>>>> a server serving, say, the LA area to cross-post with one in San Diego,
>>>> except in a very small overlapping area which is easy to address.
>>>> >
>>>> > Can this sort of routing be done with a DNS service? (eg.,
>>>> DNSMadeEasy.com <http://dnsmadeeasy.com/> is one I'm familiar with)
>>>> >
>>>> > Or is something more massive needed?
>>>> >
>>>> > Also note that this would be an automated service. It has a very
>>>> steady stream of small incoming packets, peaking at various times of the
>>>> day, with limited responses. No ads, no graphics, no user interactions at
>>>> all.
>>>> >
>>>> > I know there are infrastructure services in place to handle this kind
>>>> of thing, like what Amazon offers, and others. I'm looking for any specific
>>>> pointers to services that might fit this use case profile.
>>>> >
>>>> > -David
>>>> >
>>>> >
>>>> >
>>>> > ---------------------------------------------------
>>>> > PLUG-discuss mailing list - PLUG-discuss at lists.phxlinux.org
>>>> > To subscribe, unsubscribe, or to change your mail settings:
>>>> > http://lists.phxlinux.org/mailman/listinfo/plug-discuss
>>>> >
--
Paul Mooring
Operations Engineer
Chef