A few months ago, my team was assigned a new project whose complications taught us some useful lessons. We had to radically change and adapt our way of working, both in our methods and in our technical skills. The project's initial conditions were the worst we had ever faced, yet in the end we got the best out of it, and that is the story I'm going to tell here.
Read how the story ends in our follow-up blog post, Hacking Your Telecoms Systems – Part II.
The project was unusual right down to its most basic requirements. We had to integrate an internal Intraway platform with more than 10 external systems for a potential new customer. The real problem was that none of the integrations came with a formal API (Application Programming Interface), whether HTTP, custom, or anything else. We also had no system description, no help manual, and no way to contact the technical teams that develop and maintain those systems. All we were given was access to the systems and a specification describing only the operations we would have to integrate. The final objective was clear: connect our product to those external systems non-interactively. How to achieve it, however, was entirely unclear.
Before we got involved, one proposed solution was to integrate through GUI (Graphical User Interface) macros, invoked through an API, that would silently replay a user's interaction with each system's client dialogs. Tools such as AutoHotkey (autohotkey.com) and Selenium (seleniumhq.org) provide this kind of functionality. We strongly opposed this route and started thinking about orthogonal solutions.
Talking silently to the black-box systems at the protocol level, instead of automating their interactive screens, has many advantages. One of the most important for this project, given the client's requirements, was that it consumes far fewer computing resources than screen automation. The system as a whole must support heavy concurrency. With screen automation, that means running many simulations of clicks and keyboard input in parallel, each in its own browser and operating-system instance, probably on an array of virtual machines running the automation scripts. Such a solution scales only on powerful servers because it is extremely expensive in computational power. Simulating the protocols precisely, with the right technology, promised a much lighter approach.

Hacking the protocols also let us skip needless parts of the communication that some tasks would otherwise require, which further helped performance. For example, to subscribe a user to a TV package we didn't have to go through the bundle-selection screen, because we kept an internal cache of the codes to send in the subscribe operation itself. In many cases like this, the GUI forced the user through long wizards that we could bypass to execute the required action directly.
Working at the protocol level also largely insulated us from simple GUI changes made by the third-party developers, and made us better able to interpret and react to errors between client and server. It spared us tons of synchronization problems with windows, forms, and pop-ups, especially when speeding up the process, as required. Finally, all the processing runs in a single process rather than in many screen-script processes running simultaneously under different users. That alternative would also have required yet another dedicated process to orchestrate the array of CPU-hungry screen bots.
Our team has experience with low-level languages (C, C++, assembly, etc.), so we weren't intimidated by the idea of hacking those clients. We immediately started considering connectors that would impersonate the original clients and talk over the network to the real back ends.
At this point, some of the risks were clear. Reverse engineering a single network communication system is hard, and here we had more than 10. To make matters worse, none of the systems had been developed by our customer; they were developed and maintained by third-party companies (the customer's telephony system providers). Other challenges came from the fact that we were testing the operations we had to reproduce with real users and real data from real people, without any manuals or descriptions to help us grasp the concepts we were dealing with. A lot of intuition was needed, because out of all this we were going to build a generic API. On top of everything, we not only had to do it right, we also had to do it fast, because that was the commitment Intraway needed to make to win the business.
Language was certainly another barrier. The entire project was for a Portuguese-speaking customer, the first in the company's history, which was a huge responsibility for us too. No one on our team was fluent in Portuguese. Spanish and Portuguese share many similarities, but as anyone here knows, it is much easier to read Portuguese or understand it when spoken than to write it. The real problem was that in many cases we had to communicate with the customer by email or chat, and that was a substantial challenge.
We were ready to take it on. Want to know whether we succeeded? Read Part II to find out!