Anatomy of a Distributed System in PHP

Problem description

I have a problem that's giving me a hard time figuring out the ideal solution, so, to explain it better, I'm going to lay out my scenario here.

I have a server that will receive orders from several clients. Each client will submit a set of recurring tasks that should be executed at specified intervals, e.g. client A submits task AA, which should be executed every minute between 2009-12-31 and 2010-12-31. If my math is right, that's about 525,600 operations in a year; with more clients and tasks it would be infeasible to let the server process all of them, so I came up with the idea of worker machines. The server will be developed in PHP.
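
The 525,600 figure checks out: the window from 2009-12-31 to 2010-12-31 is 365 days (2010 is not a leap year), and 365 × 24 × 60 = 525,600 one-minute slots. A small Python sketch of the same arithmetic, assuming the first run fires at the start of the window and the end date is exclusive (the question doesn't say; treating the end as inclusive would add runs):

```python
from datetime import datetime, timedelta

def count_runs(start: datetime, end: datetime, interval: timedelta) -> int:
    """Number of runs in the half-open window [start, end), first run at `start`."""
    if end <= start:
        return 0
    span = end - start
    # Ceiling division on timedeltas: a partial final interval still gets a run.
    return -(-span // interval)

runs = count_runs(datetime(2009, 12, 31), datetime(2010, 12, 31), timedelta(minutes=1))
print(runs)  # 525600
```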

Worker machines are just regular, cheap Windows-based computers that I'll host at my home or at my workplace. Each worker will have a dedicated Internet connection (with a dynamic IP) and a UPS to ride out power outages. Each worker will query the server every 30 seconds or so via web service calls, fetch the next pending job, and process it. Once the job is completed, the worker will submit the output to the server and request a new job, and so on ad infinitum. If I need to scale the system, I should only have to set up a new worker and the whole thing should keep running seamlessly. The worker client will be developed in PHP or Python.
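
A minimal sketch of that worker loop in Python (one of the two languages being considered for the worker). `InMemoryServer` and its `fetch_job`/`submit_result` methods are hypothetical stand-ins for the web-service calls, and the job shape (an `id` plus a JSON-encoded `input`) is an assumption. Because the worker always initiates the connection, this pull model never requires the server to know the workers' dynamic IPs.

```python
import json
import time
from typing import Callable, Optional

class InMemoryServer:
    """Hypothetical stand-in for the web service; a real worker would make HTTP calls."""
    def __init__(self, jobs):
        self.pending = list(jobs)
        self.results = {}

    def fetch_job(self) -> Optional[dict]:
        return self.pending.pop(0) if self.pending else None

    def submit_result(self, job_id: str, output: str) -> None:
        self.results[job_id] = output

def run_worker(server, process: Callable[[dict], dict], poll_interval: float = 30.0,
               max_idle_polls: Optional[int] = None, sleep=time.sleep) -> int:
    """Fetch a job, process it, submit the JSON output; sleep only when no job is pending.

    Runs forever unless `max_idle_polls` consecutive empty polls occur. Returns jobs done.
    """
    done = 0
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        job = server.fetch_job()
        if job is None:
            idle += 1
            sleep(poll_interval)  # nothing to do: wait ~30 s before asking again
            continue
        idle = 0
        output = process(json.loads(job["input"]))
        server.submit_result(job["id"], json.dumps(output))
        done += 1  # ask for the next job immediately, without sleeping
    return done

server = InMemoryServer([{"id": "AA-1", "input": '{"n": 2}'}, {"id": "AA-2", "input": '{"n": 3}'}])
n = run_worker(server, lambda data: {"square": data["n"] ** 2}, max_idle_polls=1, sleep=lambda s: None)
print(n, server.results)  # 2 {'AA-1': '{"square": 4}', 'AA-2': '{"square": 9}'}
```

The loop only sleeps when the queue is empty; after finishing a job it asks for the next one right away, which matches the "submit the output and request a new job" cycle.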

At any given time my clients should be able to log on to the server and check the status of the tasks they ordered.

Now for the tricky part:

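  • I must be able to rebuild the tasks that have already been processed if, for some reason, the server fails.
  • Workers are not client-specific: a single worker should process jobs for any number of clients.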
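
For the first requirement, one simple option (an assumption, not something the question prescribes) is an append-only journal: the server writes each submitted result as one JSON line tagged with its client, and replays the file after a restart. Tagging every record with its client also fits the second point, since any worker can produce results for any client. The helper names below are hypothetical, and the journal should ideally also be copied to another machine, since it cannot help if the server's own disk is lost.

```python
import json
import os
import tempfile

def append_result(journal_path: str, client_id: str, job_id: str, output: dict) -> None:
    """Append one completed job as a JSON line and fsync so it survives a crash."""
    record = {"client": client_id, "job": job_id, "output": output}
    with open(journal_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())

def rebuild(journal_path: str) -> dict:
    """Replay the journal into {client_id: {job_id: output}}."""
    state: dict = {}
    if not os.path.exists(journal_path):
        return state
    with open(journal_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                break  # assume a malformed line can only be a torn final write; stop there
            state.setdefault(rec["client"], {})[rec["job"]] = rec["output"]
    return state

path = os.path.join(tempfile.mkdtemp(), "results.jsonl")
append_result(path, "A", "AA-1", {"ok": True})
append_result(path, "B", "BB-1", {"ok": False})
print(rebuild(path))  # {'A': {'AA-1': {'ok': True}}, 'B': {'BB-1': {'ok': False}}}
```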

I've some doubts regarding the general database design and which technologies to use.

Originally I thought of using several SQLite databases and joining them all on the server but I can't figure out how I would group by clients to generate the job reports.
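
The grouping itself is manageable once the per-worker files are on the server: SQLite's `ATTACH DATABASE` lets one connection query several database files, and a `UNION ALL` over the attached tables can then be grouped by client. A sketch, assuming each worker's file has a hypothetical `results(client_id, job_id, status)` table and has already been copied to the server. Note that SQLite caps the number of simultaneously attached databases (10 by default), so this does not scale to many workers; a single central database written by the server is probably simpler.

```python
import os
import sqlite3
import tempfile

def make_worker_db(path, rows):
    """Create a hypothetical per-worker file with a results(client_id, job_id, status) table."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE results (client_id TEXT, job_id TEXT, status TEXT)")
    con.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()

def report_by_client(paths):
    """Attach every worker file to one connection and group all rows by client."""
    con = sqlite3.connect(":memory:")
    selects = []
    for i, path in enumerate(paths):
        alias = f"w{i}"  # schema names can't be bound as parameters, so generate safe ones
        con.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
        selects.append(f"SELECT client_id, status FROM {alias}.results")
    union = " UNION ALL ".join(selects)
    rows = con.execute(
        f"SELECT client_id, COUNT(*) AS total, SUM(status = 'done') AS done "
        f"FROM ({union}) GROUP BY client_id ORDER BY client_id"
    ).fetchall()
    con.close()
    return rows

d = tempfile.mkdtemp()
a, b = os.path.join(d, "worker1.db"), os.path.join(d, "worker2.db")
make_worker_db(a, [("A", "AA-1", "done"), ("B", "BB-1", "failed")])
make_worker_db(b, [("A", "AA-2", "done"), ("A", "AA-3", "done")])
print(report_by_client([a, b]))  # [('A', 3, 3), ('B', 1, 0)]
```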

I've never actually worked with any of the following technologies: memcached, CouchDB, Hadoop and the like, but I would like to know whether any of them is suitable for my problem and, if so, which one you recommend for a newbie in "distributed computing" (or is this parallel computing?) like me. Please keep in mind that the workers have dynamic IPs.

Like I said before, I'm also having trouble with the general database design, partly because I still haven't chosen any particular R(D)DBMS. One issue I have, which I think is agnostic to the DBMS I choose, is related to the queuing system. Should I precalculate all the absolute timestamps for a specific job, keep that large set of timestamps, and execute them and flag them as complete in ascending order? Or should I have a more clever system like "when timestamp modulo 60 == 0 -> execute"? The problem with this "clever" system is that some jobs will not be executed in the order they should be, because some workers could be sitting idle while others are overloaded. What do you suggest?
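
One way to get the ordering of the precalculated list without storing a year of timestamps per task (525,600 rows for a once-a-minute task) is a hybrid: keep only each task's next due time in a single shared queue ordered by due time, and let any idle worker claim the earliest entry that is already due, scheduling that task's following run at the same moment. Because nothing ties a run to a particular worker or minute bucket, idle workers simply take the next job, and runs that pile up during a busy spell come out in due-time order. The sketch is in-memory Python with a hypothetical `Scheduler` class; in a relational database the rough equivalent would be an indexed `next_run` column per task, with the claim-and-advance step done inside one transaction.

```python
import heapq
from datetime import datetime, timedelta

class Scheduler:
    """Shared queue ordered by due time; stores only the NEXT run of each task."""
    def __init__(self):
        self._heap = []   # (due_time, task_id), smallest due_time first
        self._tasks = {}  # task_id -> (interval, end)

    def add_task(self, task_id, start, end, interval):
        """Register a recurring task that runs every `interval` within [start, end)."""
        self._tasks[task_id] = (interval, end)
        if start < end:
            heapq.heappush(self._heap, (start, task_id))

    def claim(self, now):
        """Called by any idle worker: pop the earliest run due at or before `now`,
        schedule that task's following run, and return (due_time, task_id)."""
        if not self._heap or self._heap[0][0] > now:
            return None
        due, task_id = heapq.heappop(self._heap)
        interval, end = self._tasks[task_id]
        if due + interval < end:
            heapq.heappush(self._heap, (due + interval, task_id))
        return due, task_id

s = Scheduler()
t0 = datetime(2010, 1, 1)
s.add_task("AA", t0, t0 + timedelta(minutes=3), timedelta(minutes=1))
s.add_task("BB", t0, t0 + timedelta(minutes=3), timedelta(minutes=2))
now = t0 + timedelta(minutes=2)
claimed = []
while (job := s.claim(now)) is not None:
    due, task_id = job
    claimed.append((int((due - t0).total_seconds() // 60), task_id))
print(claimed)  # [(0, 'AA'), (0, 'BB'), (1, 'AA'), (2, 'AA'), (2, 'BB')]
```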

PS: I'm not sure whether the title and tags of this question properly reflect my problem and what I'm trying to do; if not, please edit them accordingly.

Thanks for your input!

@timdev:

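  1. The input will be a very small JSON-encoded string, and the output will also be a JSON-encoded string, but a bit bigger (around 1-5 KB).
  2. The output will be computed using several resources available on the Web, so the main bottleneck will probably be bandwidth. Database writes may also be one, depending on the R(D)DBMS.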
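
Those sizes allow a quick back-of-the-envelope estimate. Assuming the once-a-minute task from the question (525,600 runs a year) and decimal kilobytes (1 KB = 1000 bytes), a single task's results alone come to roughly 0.5 to 2.6 GB a year before any database overhead; the traffic needed to fetch the Web resources that produce them comes on top of that.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 runs for a once-a-minute task

def yearly_output_bytes(runs: int, output_kb: float) -> float:
    """Bytes produced by `runs` results of `output_kb` decimal kilobytes each."""
    return runs * output_kb * 1000

for kb in (1, 5):
    gb = yearly_output_bytes(MINUTES_PER_YEAR, kb) / 1e9
    print(f"{kb} KB per result -> {gb:.2f} GB per task per year")
# 1 KB per result -> 0.53 GB per task per year
# 5 KB per result -> 2.63 GB per task per year
```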

Recommended answer

It looks like you're on the verge of recreating Gearman. Here's the introduction for Gearman:

Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.
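
To make the pattern concrete, here is a toy, single-process imitation of what a job server does: clients submit jobs under a function name, workers register for the names they can handle and pull jobs, and the client waits on a handle for the result. It is not Gearman and uses no Gearman library; `ToyJobServer`, `Job`, `submit` and `work` are hypothetical names. Real Gearman adds networking between machines and clients and workers written in different languages.

```python
import queue
import threading

class Job:
    """Handle returned to the client; wait() blocks until a worker has finished the job."""
    def __init__(self, payload):
        self.payload = payload
        self.result = None
        self.error = None
        self.done = threading.Event()

    def wait(self, timeout=None):
        if not self.done.wait(timeout):
            raise TimeoutError("job not finished in time")
        if self.error is not None:
            raise self.error
        return self.result

class ToyJobServer:
    """In-process imitation of the job-server pattern; this is NOT Gearman's API."""
    def __init__(self):
        self._queues = {}  # function name -> queue.Queue of Job
        self._lock = threading.Lock()

    def _queue_for(self, name):
        with self._lock:
            return self._queues.setdefault(name, queue.Queue())

    def submit(self, name, payload):
        """Client side: enqueue a job for whichever worker handles `name`."""
        job = Job(payload)
        self._queue_for(name).put(job)
        return job

    def work(self, name, fn, stop):
        """Worker side: pull jobs for `name` and run `fn` on them until `stop` is set."""
        q = self._queue_for(name)
        while not stop.is_set():
            try:
                job = q.get(timeout=0.1)
            except queue.Empty:
                continue
            try:
                job.result = fn(job.payload)
            except Exception as exc:  # report the failure to the client instead of killing the worker
                job.error = exc
            finally:
                job.done.set()

server = ToyJobServer()
stop = threading.Event()
workers = [threading.Thread(target=server.work, args=("reverse", lambda s: s[::-1], stop))
           for _ in range(2)]
for t in workers:
    t.start()
jobs = [server.submit("reverse", word) for word in ("gearman", "php")]
print([job.wait(timeout=5) for job in jobs])  # ['namraeg', 'php']
stop.set()
for t in workers:
    t.join()
```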

You can write both your client and the back-end worker code in PHP.

Re your question about a Gearman Server compiled for Windows: I don't think it's available in a neat package pre-built for Windows. Gearman is still a fairly young project and they may not have matured to the point of producing ready-to-run distributions for Windows.

Sun/MySQL employees Eric Day and Brian Aker gave a tutorial for Gearman at OSCON in July 2009, but their slides mention only Linux packages.

Here's a link to the Perl CPAN Testers project which indicates that Gearman-Server can be built on Win32 using the Microsoft C compiler (cl.exe) and that it passes its tests (http://www.nntp.perl.org/group/perl.cpan.testers/2009/10/msg5521569.html), but I'd guess you have to download the source code and build it yourself.
