
Deprecated Means “Don’t Touch This” October 27, 2015

Posted by PythonGuy in Uncategorized.

My favorite video game is Factorio.

The developers have been doing a great job building a remarkably stable game for something they still label “experimental”, essentially a pre-alpha.

However, a few versions back they committed a cardinal sin, and one they did not need to commit: they broke backwards compatibility with mod authors.

When you’re developing software, you quickly learn that if a new version of your software doesn’t work with old versions of someone else’s software, they won’t want to use your software anymore. They call it “breaking backwards compatibility”, and they call your software “broken”. After all, you had a choice, and you chose to make your software fail. You broke it, and it will remain broken until everyone else updates their code or you fix the bug you introduced in your own software.

There is a way to make a backwards-incompatible change without actually breaking anyone. To explain, I’m going to describe how I would replace an API that provides something called A with one that provides B. A and B are different, but they accomplish similar things; you want everyone who called A to now call B.

Here’s the formula:

  1. Introduce B in parallel with A. This can be tricky, but it is necessary.
  2. Tell everyone that A is now deprecated. Do not change A anymore. If someone uses A, then somehow notify them that they should be using B.
  3. Support A until no one in the entire universe is using it anymore.
  4. Even when people stop using A, keep supporting it for as long as you can.
  5. At some point, a change will come along that would leave A genuinely broken. Don’t let A fail silently or misbehave. Instead, if someone calls A, return a clear error saying that the operation can no longer work.

That’s pretty much it.
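
In Python, steps 1 and 2 might look something like this minimal sketch (the function names are mine, purely illustrative): A keeps working exactly as before, but every call tells the caller where to go next.

    import warnings

    def fetch_customer_v2(customer_id, fields=None):
        """B: the replacement API. Callers say which fields they need."""
        record = {"id": customer_id, "name": "...", "address": "..."}
        return {k: record[k] for k in (fields or record)}

    def fetch_customer(customer_id):
        """A: the old API. Still works, but nags callers toward B."""
        warnings.warn(
            "fetch_customer is deprecated; use fetch_customer_v2",
            DeprecationWarning,
            stacklevel=2,
        )
        return fetch_customer_v2(customer_id)

Run with warnings enabled (python -W default), callers of the old function see the deprecation notice, and nothing breaks.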

Notice that A is supported for as long as possible — way beyond anything reasonable. The reason for this is simple: Old software doesn’t die. You may think you have all the old software updated, but someone somewhere does not, and if you break their software unnecessarily, then they will rightfully call your code out for being broken.

As a consumer of the API, I can write my code so that it talks to the new version, the old version, or both if necessary. Had Factorio not been changed in a way that broke old mods, we could have moved to the new version gradually, kept the players happy, and been none the wiser. Instead, it’s broken for everybody, and it will be some time before we can fix it all.

As a side note, Python 3 broke backwards compatibility as well. The Python developers committed a huge sin, but one that had to be committed, because there really was no way to make features A and B exist together. For instance, they wanted to replace the print statement with a function, and there’s simply no way to do that in one language: “print” either has to be a function or a statement; it can’t be both. They piled on a bunch of other backwards-incompatible changes and told people, “Python 3 is not the next version of Python 2. It is a different language that merely bears some resemblance to its predecessor.” And see, Python 2 still works, and will keep working for a very long time.
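
For what it’s worth, Python applied the parallel-introduction formula where it could. A Python 2.6+ module could opt into the new print ahead of the breaking change, and the same file then runs unchanged on Python 3:

    # The __future__ import gives this one module the function form of
    # print, so the same file runs on both Python 2.6+ and Python 3.
    from __future__ import print_function

    print("hello", "world", sep=", ")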

And that’s what you need to do when you must break backwards compatibility: come out with something entirely new.


My Short Foray Into Node.js October 27, 2015

Posted by PythonGuy in Uncategorized.

I had the opportunity a few weeks back to develop a new web app in our company’s architecture. I was given the choice of Node.js or Ruby on Rails. I went with Ruby on Rails, and here’s why.

As I started to dive into Node.js, my first reaction was “How can a project release a kajillion major versions in a few weeks?” I’m exaggerating, of course, but not by much. Node.js is simply an unstable platform right now, and stability does not appear to be a priority for the project. As a developer, I can’t spend a month writing my app only to find out the version I wrote it for is no longer supported.

Node.js also expects that I have only one version of Node.js installed on my machine. That might work in production or test environments, but it doesn’t work in a development environment. I need multiple versions running side-by-side so I can develop on several projects simultaneously; my code might need to run on multiple versions of the platform. Granted, version-manager tools are being developed (nvm, in the spirit of Ruby’s RVM), but the fact that side-by-side versions weren’t supported from the beginning shows a certain lack of respect for the UNIX philosophy.

Or rather, it shows a level of immaturity in the developers of the project. Let me explain. In programming, as in many other fields, there are things that experienced people converge on, no matter what background they come from. Take version control: programmers start wanting it once they reach a certain level of experience, perhaps with something like CVS, and over time their needs grow until they settle on something very comparable to Git.

In the Unix world, where Linux lives, there is a lot of experience in the programs that are installed by default and how they are installed. When you run counter to this, not only are you trying to establish a new paradigm (which shows a lack of respect or even understanding of the existing paradigm), but you are showing disrespect to your potential users who now have to learn an entirely new skill-set unnecessarily. It’s like an app that decides it *really* wants to put the OK button on the right (or left) in the dialog windows and ignore the OS’s default settings. You just don’t do that.

At some point, some reasonable people may join the Node.js project and begin the process of normalizing the software. Until then, it really isn’t worth my time.

Also, to all of you who say “Node.js is just so much faster!”: let me explain why this kind of comment shows a level of immaturity in your own ability.

  • The vast majority of programs on your computer are not processor bound. (“Processor bound” means the program is limited by the speed of the CPU.)
  • The vast majority of programs on your computer are not even memory bound (limited by RAM speed or capacity).
  • The vast majority of web apps are not even disk or network I/O bound (limited by the speed of the hard drive or the network).

Thus, if you want to make your web app faster, switching runtimes addresses none of the things that actually constrain it. If you truly want to make your web app faster, you will focus on the one place where it is actually limited, and Node.js is completely irrelevant to that. You need to identify your own bottleneck and free your process from it. Node.js does not address any bottleneck I’ve encountered in the past ten years.

So please, don’t tell me “X is faster!” Just because one part of a pipeline is faster does not make the entire pipeline faster, and spending precious resources making the wrong part faster is simply throwing those resources out the window so your competitor can eat your lunch. Tempus fugit means you don’t have time to waste on making anything but your bottleneck faster.
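
Here’s a toy Amdahl’s-law calculation to make the pipeline point concrete (the percentages are made up for illustration):

    # If the part a faster runtime would speed up is only 5% of total
    # request time, even an infinite speedup there buys almost nothing.
    def overall_speedup(fraction, speedup):
        return 1.0 / ((1.0 - fraction) + fraction / speedup)

    print(overall_speedup(0.05, 10.0))  # ~1.05x from a 10x faster runtime
    print(overall_speedup(0.05, 1e12))  # still ~1.05x: the hard ceiling
    print(overall_speedup(0.80, 2.0))   # ~1.67x from merely 2x on the real bottleneck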

One day, I’m going to explain why I want to eliminate Nginx from our company web architecture as well, and why “It’s just so much faster than Apache!” isn’t going to work. Of course, I’m going to have to figure out a way to educate our team on what makes things go fast. It’s a problem at our company right now, one I hope to resolve with careful, consistent work and non-backwards compatible changes.

Why I Don’t Use REST October 16, 2015

Posted by PythonGuy in HTTP, Networking, REST, Ruby, TCP/IP.

It seems REST is pretty entrenched nowadays. Somewhere along the line, someone decided that all web developers must use REST for their API because there’s really no better way to do it.

I disagree.

I’ll begin with my criticisms of REST and end with my proposal for a replacement, a proposal that is neither new nor unique.

My first criticism is that REST, as practiced, is emphatically NOT a standard. Roy Fielding’s 2000 dissertation defines REST as an architectural style, but that is not an interoperability standard agreed upon by vendors, and nothing I can find pins down the wire-level details people actually argue about. As far as I can tell, the working definition of REST is “the way Ruby on Rails does it”, and that definition only wins because Ruby on Rails is more popular than the numerous other platforms that provide something they call REST. This criticism is often dismissed, but I believe wrongly so. Without a standard, how can one learn it, let alone comply with it? Someone somewhere should have documented by now precisely what REST means, and the fact that no one has suggests that no one can. Or, in my mind, that once they set their keyboard to documenting it, they realize how flawed and horrible it really is.

The second criticism is that REST ignores, and even violates, a critical organizing principle of the internet: network layering. I can’t blame new programmers and developers for their ignorance, but I can blame experienced people for not knowing the basics of networking. It seems that in building the internet, its architects made it so seamless and easy to use that people who have no business extending it are fully capable of doing so.

Let me explain it in a nutshell. If you are a developer, you should read the OSI model article on Wikipedia to get the bigger picture.

You may have heard of HTTP and TCP/IP. These aren’t just fancy acronyms; they are critical components that make the web possible. See, when two computers attempt to communicate with one another, a lot has to happen. At the hardware level, pulses of light or electrical potential are sent down cables, or radio waves are emitted. Those pulses are received, oftentimes distorted, by another bit of hardware. What is important is that both computers agree not only on what kind of wire or cable or radio wave is used, but on what those pulses mean and how they are to be interpreted. These pulses carry all the information you care about: a photo of your grandmother, a message from your boss, a request to load a customer record from a server. However, at this level, the Physical Layer, the architects who design the protocols and the engineers who hook it all up don’t care. All they care about is that bits of data are sent back and forth according to those protocols. You may have heard of some of these protocols, but only because you’ve had to connect the wires or buy the hardware.

Above this layer sits the Data Link Layer. Now the data sent over the Physical Layer has meaning in terms of which machine is speaking and which is supposed to listen. This layer is full of protocols (Ethernet framing, ARP, PPP) you rarely deal with directly unless you work at an ISP.

Above that layer sits the “IP” of “TCP/IP”: the Network Layer. This layer handles routing, conveying a message from one computer to another through a chain of other computers. You may have heard of IPv4 and IPv6. These are two protocols best described as envelopes with a “to” address and a bunch of data inside. It is important to note that, like the other layers, this one doesn’t care what is in the packet or how the packets fly from one machine to another. In fact, on the journey of a packet from your computer to the server, several different kinds of Physical and Data Link Layers are likely used, and IP doesn’t care. This layer is where IP addresses come from (ports actually come one layer up). Everything below this layer doesn’t even know an IP address exists.

The “TCP” in “TCP/IP” is the Transport Layer. This is how data that cannot fit into a single packet is conveyed reliably, and in order, between machines; it is also where ports come from, letting many conversations share one address. Above TCP sit a few more layers, some of which you may have heard of.

At the top of the OSI hill sits the Application Layer. Here lives HTTP, alongside SMTP and its siblings. This is where you can finally ask another computer to send you an HTML document. But this is not the end. Above HTTP sit your web browser or your web server, and ultimately your application. When users type in your website’s URL, they are interacting at a level where they don’t even know HTTP exists.
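
To see how thin each layer’s interface really is, here is a sketch that speaks HTTP by hand over a plain TCP socket. Everything below the socket (IP, the Data Link Layer, the wires) is invisible, which is exactly the point of layering:

    import socket

    # Open a Transport Layer connection; every layer below it is someone
    # else's problem, exactly as the layering intends.
    with socket.create_connection(("example.com", 80)) as sock:
        # The Application Layer protocol is just text on the stream.
        sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
        response = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk

    print(response.split(b"\r\n")[0])  # e.g. b'HTTP/1.1 200 OK'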

This was a long detour, but I took it for a purpose. See, at each of these layers there is an abstraction that bottles up all the complexity of the layers below. This frees the architects, engineers, and programmers working at higher levels to write the simplest code possible that is still robust and efficient. If, at the HTTP level, you had to worry about what kind of internet connection the client has, sending different data depending on whether they were on WiFi or an ethernet cable, a cable modem or DSL, why, your job would be impossible.

When I see people using HTTP as if it were part of the app they are writing, I want to scream. What happens when HTTP is replaced with something better that doesn’t have HTTP status codes in it? I mean, we could be using IP packet headers to send our error codes, but we don’t, because we know that IP can change (and indeed, it is changing, from IPv4 to IPv6!). Or rather, I believe REST developers don’t reach for packet headers only because the authors of the software that implements IP have packaged everything up so nicely that they don’t even know it exists.

Now, sitting atop HTTP, REST developers see the nuts and bolts and say, “Gee, I’d like to use some of that for my own purposes.” No, this is bad behavior. Just because you have knobs and levers doesn’t mean you should pull them. You should use the simplest subset of features you can get away with, and leave the knobs and levers to be pulled by those who truly understand what they are for and how they work.

The final criticism of REST is that it is object-centric. To programmers in languages like Java, this seems natural, even elegant. But to everyone else, it is horrifying and difficult. See, the lambda calculus showed long ago that the fundamental element of programming is the function: if you have functions alone, you have everything you need to write any type of program. Objects, on the other hand, are not sufficient on their own. Only when you wrap functions into objects (we call them methods) do they become powerful enough to write any type of program. You can create objects with functions, but you don’t have useful objects without them.
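
If that claim sounds abstract, here is the classic demonstration in Python: an “object” built from nothing but closures, with state and methods but no class statement anywhere.

    def make_counter(start=0):
        # The closed-over dict is the object's private state; the two
        # inner functions are its methods.
        state = {"n": start}
        def increment():
            state["n"] += 1
            return state["n"]
        def value():
            return state["n"]
        return increment, value

    increment, value = make_counter()
    increment()
    increment()
    print(value())  # 2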

I understand why REST is object-centric. HTTP was originally written to be a document store, and documents are a kind of object. We have since repurposed HTTP to be far more than a document store, and browsers to be far more than document fetchers and readers, but that is what it was built to be. As we’ve repurposed HTTP, we’ve left behind certain features that a document store might need. But a document store is not enough to build a full-fledged application. It is only part of an application.

Had we done things right, we might have created a new protocol for building apps on the internet. Indeed, you could say that internet architects are hard at work developing that protocol right now. Unfortunately, they are hamstrung because they have to build it on top of HTTP, because there are so many people who built their application not on top of HTTP, but deeply integrated with it. I can explain why this happened, but that is another story. There was a time when people were inventing new Application Layers to suit their needs, a time when SMTP and HTTP and more were brand new. If we want a new way to communicate over TCP/IP, we should be inventing new protocols, not torturing existing protocols.

Here is an important example of why an object-centric API doesn’t work. Consider how you might fetch a customer record. You would access /customers/234 for customer 234, right? Well, what should that return?

  • The customer might want to see all of their information, even the bits they wouldn’t want to share with anyone else, including the company.
  • A Customer Service Representative (CSR) would like to see the history of all communications with that customer, as well as all actions taken on that customer account. Maybe there is an additional REST URL like /customers/234/history? How would the CSR filter out which history he’d like to see? /customers/234/history?from_date=20150101&type=csr_interactions maybe.
  • The person who ships the package to the customer is only interested in their name and address. They might want to know if there are any special flags, such as holds on shipping to the customer, or maybe flags that indicate there should be an email or text message sent when the package is shipped.

As you can see, the idea of “customer” is different for different people in different roles. Some want some of the data; others want other data. How do you specify which data you need? You have to add additional parameters to the REST protocol, and things start to get really messy. What happens when you want to move all the queries for full customer information to one class of server, and the queries for customer interactions to a different cluster? Do you set up a fleet of servers just to reverse-proxy requests based on the URL, or do you set up different services entirely?

REST does not give an answer to this, and indeed, trying to make things “RESTful” only makes the complicated situations far more complicated than they would have otherwise been.
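
To make the role problem concrete, here is a sketch (the record fields are invented for illustration) of explicit, role-specific views. Each has an obvious contract, where a single /customers/234 must guess who is asking:

    CUSTOMER = {
        "id": 234, "name": "A. Customer", "address": "1 Main St",
        "ship_hold": False, "history": [], "private_notes": "",
    }

    def self_view(record):
        # What the customer themselves may see.
        return {k: record[k] for k in ("id", "name", "address", "history")}

    def csr_view(record):
        # What a Customer Service Representative needs.
        return {k: record[k] for k in ("id", "name", "history", "private_notes")}

    def shipping_view(record):
        # Name, address, and flags: all the shipper cares about.
        return {k: record[k] for k in ("name", "address", "ship_hold")}

    print(shipping_view(CUSTOMER))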

Now, my proposals.

We live in a world where HTTP is the de facto communication standard. People run their apps in browsers, and browsers only know how to speak HTTP. WebSockets exist (essentially raw TCP connections bootstrapped by an HTTP handshake), but for the same reasons people were forced into HTTP, WebSockets are unlikely to succeed. It seems we have to plug all of our applications into a single port (443), and it seems we have to stick to a single protocol, because too many people do not understand the full power of the internet. (Maybe games programmers can save us. They’re the last hope for building an internet that isn’t welded to HTTP.) So my first proposal is to stop using HTTP except as a document store. We should make “web server” synonymous with some other protocol, and leave “HTTP server” to mean a special kind of server that only stores, retrieves, and updates documents.

However, even though we cannot decouple our apps from HTTP yet, we CAN start writing apps that are HTTP-unaware. Here’s what such an app would look like.

  • The app code would be retrieved from some static source. This would contain all the instructions on how to start the app, including any libraries needed to bootstrap the app. It needs to be static so that it can be cached. As much of the app as possible must be static for this reason.
  • App data would be retrieved over an API layer.

That’s pretty much it.

Now, HTTP fits the bill for retrieving the static app data to a T. It is the perfect solution for this, provided the data is truly static. Given the demand for things like CDNs, this is rapidly coming true. If you can’t serve your app from a CDN, it’s not static enough, and you need to make it so. But keep in mind that if someone comes up with a better way to store and retrieve static documents, you should be able to port to it quickly and easily, because you only had static files in your HTTP servers! And don’t think it isn’t coming!

The benefit of having a static file as the base of your app is that it can be cached, distributed, shared, etc., at no cost to you. We’ve seen things like BitTorrent excel at exactly this. If your app can’t be distributed through BitTorrent, it’s no good to you or anyone else. When you wake up one morning and half the world wants your app, if it’s not static and distributed, you’re going to have a very bad day. This need is going to drive us away from HTTP one day. We’re already publishing our apps by uploading them to the Google and Apple stores. HTTP is going to disappear just as so many protocols before it disappeared!

The API, on the other hand, should exhibit the following features:

  • Function-based, not object-based.
  • Functions take any kind of parameters in any configuration. (Python is a good model for this; many languages are similar.)
  • Functions may return a result (a blob of any kind of data) or raise an exception (which is also any kind of data).
  • A session carries the global or dynamic context in which the function operates.

That’s pretty much all the API has to do. Anything on top of this is unnecessary complexity.
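
As a sketch of what those four bullet points could look like in practice (nothing here is a real library; the payload shape and names are invented), a server-side dispatcher needs only a registry of functions, a session, and a uniform result-or-exception envelope:

    import json

    REGISTRY = {}

    def api(fn):
        # Register a callable by name; each one takes the session first.
        REGISTRY[fn.__name__] = fn
        return fn

    @api
    def shipping_view(session, customer_id):
        return {"name": "A. Customer", "address": "1 Main St", "ship_hold": False}

    def handle_call(session, payload):
        # One call: payload is {"fn": name, "args": [...], "kwargs": {...}}.
        request = json.loads(payload)
        fn = REGISTRY[request["fn"]]
        try:
            result = fn(session, *request.get("args", []), **request.get("kwargs", {}))
            return json.dumps({"result": result})
        except Exception as exc:
            # An exception is also just a blob of data to the caller.
            return json.dumps({"error": {"type": type(exc).__name__,
                                         "message": str(exc)}})

    print(handle_call({"user": "csr-17"}, '{"fn": "shipping_view", "args": [234]}'))

Note that the envelope never mentions HTTP: it is a string in and a string out, so it can ride on any transport.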

Now, some of the API calls return static data. Those results should be treated as documents, stored behind HTTP, and retrieved not through the API but through HTTP. So don’t put static documents in your API; keep them separate and have your API point the client to them.

WebSockets can be the foundation for this API. We can fall back to communicating over HTTP, but we must be very careful not to tie our APIs to HTTP. Meaning, we should be able to switch the API to use WebSockets, raw TCP/IP, or HTTP, with the flick of a switch.
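
One way to honor that “flick of a switch” rule is to have application code depend on a tiny transport interface, with HTTP, WebSocket, or raw-TCP backends plugged in behind it. A sketch, not a real library:

    from abc import ABC, abstractmethod

    class Transport(ABC):
        @abstractmethod
        def call(self, payload: str) -> str:
            """Send one API payload, return one response."""

    class LoopbackTransport(Transport):
        # In-process backend, handy for tests; real backends would wrap
        # an HTTP client, a WebSocket, or a raw socket.
        def __init__(self, handler):
            self.handler = handler

        def call(self, payload):
            return self.handler(payload)

    transport = LoopbackTransport(lambda p: p.upper())
    print(transport.call("ping"))  # PING

The loopback handler could just as well be the handle_call dispatcher sketched above; the application never learns which wire its payloads rode on.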

So, build your application on HTTP today, but build it in such a way that it doesn’t have to run on HTTP. That way, when the future comes, you’ll be ready. And you’ll also be building something that others can build upon reliably.

In conclusion, I wanted to show you why REST is not a good solution, and what we can replace it with. I don’t give specific guidance on what the future should look like, just broad pointers on where it can, and probably should, go.

I do appreciate comments, positive or negative. Keep in mind that ad hominem, red herring and other logical fallacies should not be employed. If you don’t know what those are and why it is bad to use them, you should go study up on logic and logical fallacies before commenting on anything ever again.