Accessibility for test automation

Date: Tue Jul 19 2005

Tags: Java

This is prompted by a feature request I saw on the JDIC bulletin board.

As I mentioned earlier, I've spent more than a few brain cycles thinking about GUI test automation. So much so that at times I've been worried about my sanity - how many people do you know who daydream about the minute details of GUI interactions, looking for ways to drive the interaction by a program? At least I don't talk to myself ... too often.

But I digress.

The JDIC thread is a feature request where the author asks: "More generally, it'd be good to be able to interface a bit more with other applications running on the same machine, grabbing their windows as graphics, activating menus (that would let us write macro scripting applications)."

Yeah - I'd like those abilities too. A large issue in my team (the JSE SQE Team) is testing the Java Plugin and JWS features. Because they are tied to web browsers, many of the Plugin and JWS test scenarios involve browser interaction. And because our product (Java) is cross-platform, we have to test those browser interactions across platforms (well, Windows, Linux and Solaris anyway). It ends up costing us a lot of manual testing labor, because we lack automation tools that cover the interactions these scenarios require.

Heck, yeah, we really want to be able to "interface more" with web browsers from a Java test.

By way of offering my proposed answer, let me point to an article I read long ago: Scripting the Unscriptable in Mac OS X

The "unscriptable" in this case is any GUI application ... Since the article is about how to "script" GUI interactions on Mac OS X, the author first describes the past. In the past they would use the QuicKeys application to "hack" the system to simulate a "ghost" user who would press keys and move/click the mouse. Sounds a lot like the Robot class but with a nice GUI around it to help the user do the work.

On OS X the QuicKeys approach doesn't work because OS X is more secure. At least that's what the article says.

But, enter Uncle Sam and the Section 508 requirements, which are driving the push to Accessibility. Those requirements have turned into this scenario being played out with every desktop GUI "vendor": Given any application on Mac OS X, it needs to be possible for some other application to discover what interface elements it is displaying. The other application needs to be able to "read" these elements ("There's a button that says OK") and it needs to be able to access them (click the OK button).

What that means is that Apple developed a programming interface to support Accessibility "agent" software in making OS X accessible to the disabled. That programming interface allows an accessibility agent to query and discover the location and characteristics of other applications running on the computer screen. And, further, it was not just Apple who developed these programming interfaces ... no ... Microsoft developed an accessibility tool API ... and so has Sun. Sun has engineers working on accessibility for both GNOME and Mozilla, donating code to support accessibility to both of those projects.
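Java has shipped its own version of this idea for its own components in the javax.accessibility package. The sketch below is only an in-process illustration of the "discover, read, access" cycle, assuming a Swing panel that lives in the same JVM as the test. It is not the cross-application native API discussed here, and the class name AccessibleWalk and the demo panel are made up for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import javax.accessibility.Accessible;
import javax.accessibility.AccessibleAction;
import javax.accessibility.AccessibleContext;
import javax.accessibility.AccessibleRole;
import javax.swing.JButton;
import javax.swing.JLabel;
import javax.swing.JPanel;

public class AccessibleWalk {

    /** Depth-first search of the accessible tree for an element with the given role and name. */
    public static AccessibleContext find(Accessible root, AccessibleRole role, String name) {
        AccessibleContext ctx = root.getAccessibleContext();
        if (ctx == null) {
            return null;
        }
        // "Read" the element: compare the role and name it reports about itself.
        if (role.equals(ctx.getAccessibleRole()) && name.equals(ctx.getAccessibleName())) {
            return ctx;
        }
        for (int i = 0; i < ctx.getAccessibleChildrenCount(); i++) {
            Accessible child = ctx.getAccessibleChild(i);
            if (child != null) {
                AccessibleContext hit = find(child, role, name);
                if (hit != null) {
                    return hit;
                }
            }
        }
        return null;
    }

    /** "Access" the element: run its first accessible action (for a button, a click). */
    public static boolean press(AccessibleContext ctx) {
        AccessibleAction action = ctx.getAccessibleAction();
        return action != null
                && action.getAccessibleActionCount() > 0
                && action.doAccessibleAction(0);
    }

    /** Hypothetical demo panel: a label plus an OK button that counts its clicks. */
    public static JPanel demoPanel(AtomicInteger clicks) {
        JPanel panel = new JPanel();
        panel.add(new JLabel("Save changes?"));
        JButton ok = new JButton("OK");
        ok.addActionListener(e -> clicks.incrementAndGet());
        panel.add(ok);
        return panel;
    }

    public static void main(String[] args) {
        AtomicInteger clicks = new AtomicInteger();
        AccessibleContext ok = find(demoPanel(clicks), AccessibleRole.PUSH_BUTTON, "OK");
        boolean pressed = ok != null && press(ok);
        System.out.println("found: " + (ok != null) + ", pressed: " + pressed
                + ", clicks: " + clicks.get());
    }
}
```

Note that nothing here depends on pixel coordinates or on how the button is painted; the test asks for "the push button named OK" and acts on it. Real test code would normally touch Swing components on the event dispatch thread, which this sketch skips for brevity.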

The next step of the "unscriptable" article is to put 2+2 together. Namely, Apple provides with OS X the AppleScript tool. To "script" the "unscriptable" one merely needed a way to make calls to the accessibility support from AppleScript, and that's what the SystemEvents "application" does. It allows AppleScript programs to query the accessibility support and then interact with (a.k.a. "script") any GUI application, and do the interaction as if it's a real user with a keyboard and mouse.

Unfortunately that solution is limited to one platform: Mac OS X.

Fortunately it points to a solution for Java testers. One merely needs to build the same architecture: write an interface in JNI that exposes accessibility "agent" support to Java applications. With that support exposed to Java, we can use it with Robot, and with tools such as the ones listed here that build on Robot, to automate interactions with any GUI application. And we would be able to do so on ANY platform (due to it being in Java).
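To make the shape of that architecture concrete, here is a rough sketch. Everything named in it is hypothetical: the NativeAccessibility interface stands in for the JNI layer that would have to be written, UiElement is an invented data shape, and FakeAccessibility returns canned data so the example runs without any native code. Only the java.awt.Robot calls are real API.

```java
import java.awt.AWTException;
import java.awt.GraphicsEnvironment;
import java.awt.Point;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.util.ArrayList;
import java.util.List;

public class AccessibilityBridgeSketch {

    /** One on-screen element as a native accessibility API might report it (hypothetical shape). */
    public static final class UiElement {
        public final String role;
        public final String name;
        public final Rectangle screenBounds;

        public UiElement(String role, String name, Rectangle screenBounds) {
            this.role = role;
            this.name = name;
            this.screenBounds = screenBounds;
        }
    }

    /** Hypothetical bridge; a real one would implement this with JNI calls into the platform. */
    public interface NativeAccessibility {
        List<UiElement> listElements(String applicationName);
    }

    /** Stand-in for the JNI layer so the sketch runs without native code. */
    public static final class FakeAccessibility implements NativeAccessibility {
        @Override
        public List<UiElement> listElements(String applicationName) {
            List<UiElement> out = new ArrayList<>();
            out.add(new UiElement("button", "OK", new Rectangle(100, 200, 80, 30)));
            out.add(new UiElement("button", "Cancel", new Rectangle(200, 200, 80, 30)));
            return out;
        }
    }

    /** Look an element up by role and name and return the point a click should land on. */
    public static Point clickPoint(NativeAccessibility bridge, String app, String role, String name) {
        for (UiElement e : bridge.listElements(app)) {
            if (e.role.equals(role) && e.name.equals(name)) {
                Rectangle b = e.screenBounds;
                return new Point(b.x + b.width / 2, b.y + b.height / 2);
            }
        }
        return null; // element not present
    }

    /** Drive the real mouse with java.awt.Robot: move, press, release. */
    public static void click(Robot robot, Point p) {
        robot.mouseMove(p.x, p.y);
        robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);
    }

    public static void main(String[] args) throws AWTException {
        Point p = clickPoint(new FakeAccessibility(), "SomeBrowser", "button", "OK");
        System.out.println("would click at " + p);
        // Only touch the real mouse when explicitly asked to and a display is available.
        if (args.length > 0 && args[0].equals("--click") && !GraphicsEnvironment.isHeadless()) {
            click(new Robot(), p);
        }
    }
}
```

The division of labor is the point: the accessibility side answers "where is the OK button right now?", and Robot only has to deliver the click. The test never stores a coordinate or a screenshot of its own.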

There ARE ways of doing GUI test automation... and I don't want to get into marketing here, but consider using VNC (see Wikipedia if you haven't experienced it), building in methods of fast image comparisons and a scripting language, and you're on your way. VNC is really pretty handy, and there are VNC Servers for any platform, and any obscure platform that doesn't have one can use a hardware device hooked into keyboard/mouse/monitor, which makes even the boot process (to set parameters, change what you're booting from, etc...) observable and scriptable remotely. So yes, people HAVE thought about this in a multi-platform way, and with a lot less hassle than going through accessibility support.

Posted by: bjanzen on July 22, 2005 at 12:01 PM

Yes, I'm aware of using VNC. I also spent a while evaluating a similar approach by Test Quest.

Let me point you again to TestingGUIApplications which points to the tool you're probably talking about.

The approach is somewhat attractive. However I'm concerned on one point, and I wonder what you have to say about this concern.

Namely: GUIs can change in subtle ways for valid reasons. When you're looking for a component based on its visual rendering, changes in the visual rendering will tend to break the test. If tests break there's an overhead in maintaining the tests, e.g. recapturing the images you use for reference in order to look for components.

For example ... During the Tiger release cycle the J2SE engineers spent a long time on the font quality trying to align more closely with the native platform fonts. They're doing the same in Mustang, for that matter. This meant changes to the way text appeared on the screen, and these changes were very necessary to improve Java. The tests we had which relied on screen captures to find things broke because of these rendering changes, and we had to recapture those images. And this happened not just once, but multiple times, because the engineers made multiple iterations of font rendering.
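A small sketch of why this happens, under the assumption that the tool looks for a pixel-exact copy of a reference image (real tools are more sophisticated; the class ExactImageFinder and the synthetic "screen" are invented for the example). One anti-aliased pixel is enough to make the search come back empty.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;

public class ExactImageFinder {

    /** Return the top-left corner of the first pixel-exact match of needle in screen, or null. */
    public static Point find(BufferedImage screen, BufferedImage needle) {
        for (int y = 0; y + needle.getHeight() <= screen.getHeight(); y++) {
            for (int x = 0; x + needle.getWidth() <= screen.getWidth(); x++) {
                if (matchesAt(screen, needle, x, y)) {
                    return new Point(x, y);
                }
            }
        }
        return null;
    }

    /** Compare every needle pixel against the screen region starting at (ox, oy). */
    private static boolean matchesAt(BufferedImage screen, BufferedImage needle, int ox, int oy) {
        for (int y = 0; y < needle.getHeight(); y++) {
            for (int x = 0; x < needle.getWidth(); x++) {
                if (screen.getRGB(ox + x, oy + y) != needle.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Synthetic 40x20 "screen": white background with a 6x4 black block at (10, 5). */
    public static BufferedImage fakeScreen() {
        BufferedImage img = new BufferedImage(40, 20, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                img.setRGB(x, y, 0xFFFFFF);
            }
        }
        for (int y = 5; y < 9; y++) {
            for (int x = 10; x < 16; x++) {
                img.setRGB(x, y, 0x000000);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        // Reference image "captured" earlier: the block plus a one-pixel white margin.
        BufferedImage reference = fakeScreen().getSubimage(9, 4, 8, 6);
        BufferedImage screen = fakeScreen();
        System.out.println("before rendering change: " + find(screen, reference));

        // Simulate a rendering tweak: one corner pixel of the block becomes dark gray.
        screen.setRGB(10, 5, 0x404040);
        System.out.println("after rendering change:  " + find(screen, reference));
    }
}
```

The second search returns null even though a human would say the component is plainly still there, which is the maintenance cost in miniature: the reference image has to be recaptured.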

For the record I have thought about this (VNC for test automation) too.

Posted by: robogeek on July 22, 2005 at 12:20 PM

And I have to qualify this with "I work for Redstone Software", which is listed on that page but doesn't get listed under Image-based component discovery, which it is. So nothing more than a couple of simple technical things: You're correct, it does take some engineering to compensate for, using Mac OS X's Aqua interface as an example, pulsating buttons, anti-aliased text, etc... so there are image comparison types of "text" or "pulsing" and degrees of tolerance as well. Hopefully, we're getting better by thinking of these issues daily to try to avoid recapturing images when what you want really IS on the screen within your tolerance level.

Now if you're not concerned with the exact look of the text, but want to just snag some text from the system you're testing, VNC also contains a clipboard, so that if your text is selectable, it can be put on the clipboard and grabbed for further analysis / use.

Posted by: bjanzen on July 22, 2005 at 02:01 PM

An interesting line of thought I had about visually finding components is -- what about graphically looking for patterns that indicate components. e.g. You know what a button border looks like, so write an algorithm that finds button borders regardless of the size. Then you could write a program using that algorithm that finds all buttons on the screen. Then if you included OCR capability you could find all text on the screen regardless of the font, or details in the rendering changes.

Well, one of my colleagues prototyped some code to do that (except for the OCR part). The problems with that boiled down to the L&F issue. Swing allows multiple L&F to the point that you could switch L&F at runtime. Oh, and it was very time consuming to write the pattern matching code. He made good progress with recognizing AWT components, on one platform, but it was going to take the same long time to recognize AWT components elsewhere, not to mention recognizing Swing components. We gave up on that approach.
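For a feel of the border-finding idea, here is a toy sketch. It is not the prototype described above. It assumes a button border is a one-pixel rectangular outline drawn in a single known color, which is exactly the kind of assumption a different L&F breaks, and it does not check that the interior is empty. The class BorderFinder and its helper methods are invented for the example.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

public class BorderFinder {

    /** Find axis-aligned rectangular outlines drawn in exactly borderRgb, at least minSize on each side. */
    public static List<Rectangle> findOutlines(BufferedImage img, int borderRgb, int minSize) {
        List<Rectangle> found = new ArrayList<>();
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                // Candidate top-left corner: a border pixel with no border pixel to its left or above.
                if (!isBorder(img, x, y, borderRgb)
                        || isBorder(img, x - 1, y, borderRgb)
                        || isBorder(img, x, y - 1, borderRgb)) {
                    continue;
                }
                int w = 1;
                while (isBorder(img, x + w, y, borderRgb)) {
                    w++; // follow the top edge to the right
                }
                int h = 1;
                while (isBorder(img, x, y + h, borderRgb)) {
                    h++; // follow the left edge downward
                }
                if (w >= minSize && h >= minSize && edgesClosed(img, x, y, w, h, borderRgb)) {
                    found.add(new Rectangle(x, y, w, h));
                }
            }
        }
        return found;
    }

    /** Check that the bottom and right edges complete the outline. */
    private static boolean edgesClosed(BufferedImage img, int x, int y, int w, int h, int rgb) {
        for (int i = 0; i < w; i++) {
            if (!isBorder(img, x + i, y + h - 1, rgb)) {
                return false;
            }
        }
        for (int j = 0; j < h; j++) {
            if (!isBorder(img, x + w - 1, y + j, rgb)) {
                return false;
            }
        }
        return true;
    }

    /** True if (x, y) is inside the image and has exactly the border color. */
    private static boolean isBorder(BufferedImage img, int x, int y, int rgb) {
        return x >= 0 && y >= 0 && x < img.getWidth() && y < img.getHeight()
                && (img.getRGB(x, y) & 0xFFFFFF) == rgb;
    }

    /** Draw a one-pixel outline; used only to build synthetic test images. */
    public static void drawOutline(BufferedImage img, Rectangle r, int rgb) {
        for (int i = 0; i < r.width; i++) {
            img.setRGB(r.x + i, r.y, rgb);
            img.setRGB(r.x + i, r.y + r.height - 1, rgb);
        }
        for (int j = 0; j < r.height; j++) {
            img.setRGB(r.x, r.y + j, rgb);
            img.setRGB(r.x + r.width - 1, r.y + j, rgb);
        }
    }

    /** White image of the given size. */
    public static BufferedImage blank(int w, int h) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                img.setRGB(x, y, 0xFFFFFF);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage screen = blank(60, 40);
        drawOutline(screen, new Rectangle(5, 5, 20, 10), 0x333333);
        drawOutline(screen, new Rectangle(30, 8, 12, 25), 0x333333);
        System.out.println(findOutlines(screen, 0x333333, 3));
    }
}
```

Both outlines are found regardless of their size, which was the attraction of the idea. Change the border to a two-tone bevel, a rounded corner, or a gradient, and this matcher finds nothing; each L&F on each platform would need its own variant.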

Now if you're not concerned with the exact look of the text, but want to just snag some text from the system you're testing, VNC also contains a clipboard, so that if your text is selectable...

Hmm... text on a button is generally not selectable. Text in a tooltip is generally not selectable. Text in a listbox or combobox is generally not selectable. Text in a text area is selectable, but it's rather difficult to figure out the coordinates to move the mouse to so that you can make the selection in the first place, that is, unless you want to do SelectAll and read the whole textarea. But this ability to read the clipboard gives you an advantage over testquest, since testquest's connection with the box is solely keyboard/mouse/video.

The qualm I have about tolerance levels is - A few years ago I was reviewing support escalations against AWT in 1.2.2(?). One of them involved a text rendering in some Eastern European language where there was one pixel misplaced. But that one pixel was important enough to cause the bug filing and the subsequent escalation of the issue to the Java developers. One pixel. That one pixel would have been lost in the tolerance settings.
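Here is a minimal sketch of that concern. The comparison scheme is an assumption made for illustration, not a description of how any particular product works: a pixel counts as "different" if any color channel is off by more than a per-channel tolerance, and the images "match" if the fraction of different pixels stays under a threshold. The class TolerantCompare and its parameters are invented for the example.

```java
import java.awt.image.BufferedImage;

public class TolerantCompare {

    /** Largest per-channel (R, G, B) difference between two packed RGB pixels. */
    static int channelDiff(int p, int q) {
        int dr = Math.abs(((p >> 16) & 0xFF) - ((q >> 16) & 0xFF));
        int dg = Math.abs(((p >> 8) & 0xFF) - ((q >> 8) & 0xFF));
        int db = Math.abs((p & 0xFF) - (q & 0xFF));
        return Math.max(dr, Math.max(dg, db));
    }

    /** Fraction of pixels that differ by more than channelTolerance in some channel. */
    public static double mismatchFraction(BufferedImage a, BufferedImage b, int channelTolerance) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("images must be the same size");
        }
        int bad = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (channelDiff(a.getRGB(x, y), b.getRGB(x, y)) > channelTolerance) {
                    bad++;
                }
            }
        }
        return (double) bad / (a.getWidth() * a.getHeight());
    }

    /** Images "match" when the mismatching fraction is within maxMismatchFraction. */
    public static boolean matches(BufferedImage a, BufferedImage b,
                                  int channelTolerance, double maxMismatchFraction) {
        return mismatchFraction(a, b, channelTolerance) <= maxMismatchFraction;
    }

    /** Image of the given size filled with a single color. */
    public static BufferedImage solid(int w, int h, int rgb) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                img.setRGB(x, y, rgb);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage reference = solid(10, 10, 0x808080);
        BufferedImage onePixelOff = solid(10, 10, 0x808080);
        onePixelOff.setRGB(3, 4, 0x000000); // the single misplaced pixel

        System.out.println("strict match:   " + matches(reference, onePixelOff, 0, 0.0));
        System.out.println("tolerant match: " + matches(reference, onePixelOff, 8, 0.02));
    }
}
```

The strict comparison rejects the image and the tolerant one accepts it. The same setting that lets a test survive a font-rendering tweak also waves through the one-pixel defect, so the tolerance has to be chosen per test according to what the test is actually meant to catch.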


Posted by: robogeek on July 25, 2005 at 07:38 AM

it's rather difficult to figure out the coordinates to move the mouse to so that you can make the selection in the first place...

Not really. You can do image capture and select a hot spot outside of the image you're capturing - it could be anywhere. Our users of Eggplant do this a lot. You can almost always find something graphically to key off of (some text label, key window, etc...). Your hot spot is now where you start the mouse drag to select text, and then Ctrl-C (Linux, Windows) or Cmd-C (Mac) and it's on the clipboard.

Now the OCR part - that's not trivial. I've tried plugging in things like Omnipage or ReadIris Pro, and they simply don't do OCR at screen resolution (and I've checked directly with ReadIris Pro to confirm this). It takes more than simply plugging in an OCR engine and feeding it screen images. Stay tuned.

Posted by: bjanzen on July 25, 2005 at 09:11 AM

See ( for further development of this proposal.

Posted by: tlroche on July 31, 2005 at 07:57 AM

Better late than never, right? Here goes...

When writing Java Swing applications, I use the following tools/frameworks for testing:

JUseCase - a framework for adding scripting capabilities to Swing components.

Log4j - for creating plain-text behavior logs (screen layout can also be logged in plain-text...).

TextTest - for running tests and comparing plain-text output from my application (the behavior logs).

Posted by: xipher on June 27, 2006 at 05:33 AM

About the Author(s)

David Herron : David Herron is a writer and software engineer focusing on the wise use of technology. He is especially interested in clean energy technologies like solar power, wind power, and electric cars. David worked for nearly 30 years in Silicon Valley on software ranging from electronic mail systems, to video streaming, to the Java programming language, and has published several books on Node.js programming and electric vehicles.