Welcome.
I feel I am far enough along in the process of making visualisations of molecular chemistry to share...

[Images: A. Potato virus A isolates; B. Visualisation of sequence data; C. TMP concept]

My work is made using Nodebox (www.nodebox.org), which is open source and multi-platform - see the potted review below.
What I'm making with it are representations of:

Amino acid sequences (780 KB)... The first image above links to a PDF that compares two isolates of the Potato Virus genome. Show it to a molecular scientist - they will probably like it. It's best printed at A3 or larger.

Base pair sequences (1.4 MB)... The second image links to Visualisation of sequence data from Lane2_12E-GBS_1. The data is straight from the sequencing machine, though massively subsampled.
Newer work uses this symbol coding as a baseline for amino acid information. When printed, the baseline typically comes out as a black line, but in a PDF the individual base pairs can be seen and offer an additional layer of information.

Concept artwork for transmembrane proteins (130 KB)... One of a number of things on the wish list that get played with occasionally. It's a bit of a fail in a practical sense, but it reminds me that Nodebox can be creative as well as practical.


The most useful resource here is the dataset that I use to drive the amino acid representations. It surprised me that I could find nothing like it already available, so I've made my own.

[Image: Amino acids csv - links to the .csv download]

Visualisations driven by this dataset colour code by chemical grouping, allow quick evaluation of charge and show the individual mass of each amino acid.
Fields for RGB values have been included to offer consistency. Click on the picture to download a simple .csv spreadsheet - I hope it's helpful.
This representation works for scientists: it's much prettier than using letters, and I get lots of good feedback. A Nodebox project that uses this dataset is offered further down the page.
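For anyone who would rather drive a similar representation from code, here is a minimal Python sketch of reading such a dataset and looking up colour, charge and mass for each residue in a sequence. The column names (symbol, group, charge, mass, r, g, b), the RGB values and the short test sequence are all assumptions made for illustration - check them against the real .csv before relying on them. The masses are the usual free amino acid masses in daltons.

```python
import csv
import io

# Hypothetical stand-in for the real .csv: column names and RGB values
# are assumptions for illustration, not the contents of the actual file.
SAMPLE_CSV = """symbol,name,group,charge,mass,r,g,b
G,Glycine,nonpolar,0,75.07,200,200,200
A,Alanine,nonpolar,0,89.09,180,180,180
K,Lysine,basic,1,146.19,60,90,200
D,Aspartic acid,acidic,-1,133.10,200,60,60
"""


def load_table(text):
    """Index the rows by one-letter symbol, converting the numeric fields."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        table[row["symbol"]] = {
            "group": row["group"],
            "charge": int(row["charge"]),
            "mass": float(row["mass"]),
            "rgb": (int(row["r"]), int(row["g"]), int(row["b"])),
        }
    return table


def describe(sequence, table):
    """Return one drawing record per residue, in sequence order."""
    return [table[letter] for letter in sequence]


table = load_table(SAMPLE_CSV)
records = describe("GAKD", table)  # made-up four-residue sequence

# Quick evaluation of charge: sum the per-residue charges.
net_charge = sum(record["charge"] for record in records)
```

Each record carries everything a drawing routine would need: the fill colour from the RGB fields, the charge, and the mass (which could set the size of the symbol).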

Potted review of Nodebox
Nodebox works by linking a series of operations together to get a result. It has quite a steep learning curve, but its immense power becomes obvious quite quickly. It's the coolest piece of software I've used in ages.
Downsides: Things can become very complex very quickly. Text handling isn't great.
Upsides: Everything you make is a template that can be infinitely tweaked, shared, and re-used (with different data). Often it's immensely satisfying and occasionally it's fun.
With Nodebox, the data is usually easy... it's often the metadata (scales and legends) that adds complexity.
Hints: Figure out Zip Maps - they link your shapes and your data. If you are giving lots of nodes the same value there's probably an easier way of doing it.
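To give a feel for the Zip Maps hint, here is a rough Python analogy. As I understand it, a Zip Map takes several parallel lists and pairs them up element by element into keyed records, so each shape gets its own position, colour and value. This is not Nodebox code: the function name zip_map and the example lists are assumptions made for illustration.

```python
def zip_map(keys, *columns):
    """Combine parallel lists into one list of dictionaries, one per element."""
    return [dict(zip(keys, values)) for values in zip(*columns)]


# Three parallel lists: one value per shape to be drawn (made-up values).
x_positions = [0, 10, 20]
colours = ["grey", "blue", "red"]
masses = [75.07, 146.19, 133.10]

# One record per shape, ready to hand to whatever draws it.
shapes = zip_map(["x", "colour", "mass"], x_positions, colours, masses)
```

The point is the same as in Nodebox: the data stays in lists, and a single pairing step links it to the shapes, rather than setting a value on every node by hand.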

I'm still learning Nodebox, and I suspect many of the ways I do things are inelegant, so I am offering only one Nodebox project. The project is the basis of the amino acid representation shown above.
The (50 KB) zip file below contains Amino acids.csv (dataset), Amino acids.ndbx (project), and Amino acids.pdf (output). I've added a 'no value (X)' symbol to the dataset, which is useful for gene alignment.
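As a sketch of why a 'no value (X)' symbol helps with alignment: aligned sequences usually contain gaps, and every position still needs something to draw. The Python below assumes gaps are written as "-" in the aligned text (a common convention, but an assumption here), swaps them for X, and lists the positions where two sequences differ. The two short sequences are made up for illustration.

```python
def to_symbols(aligned, gap="-"):
    """Replace gap characters with the 'no value' symbol X."""
    return aligned.replace(gap, "X")


def differences(seq_a, seq_b):
    """List (position, symbol_a, symbol_b) wherever two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must be the same length")
    return [
        (index, a, b)
        for index, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Two made-up aligned fragments; the first has a gap at position 3.
isolate_a = to_symbols("MKT-GA")
isolate_b = to_symbols("MRTLGA")

diffs = differences(isolate_a, isolate_b)
```

Because both sequences keep the same length, each column of the drawing lines up, and the X symbol marks the gap instead of shifting everything after it.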

[Image: Amino acids - links to the zip file]

My hope is that we can make better progress in making data rewarding to the eye. My representations must try to make patterns (or differences) as clear as possible. People are very good at pattern recognition and we get better with practice - I believe we can see possibilities computers can't.
Compliments, criticism, and ambivalence have all been helpful - thank you. There are so many fields of science that are crying out for good representations. A short to-do list for me still includes: triplet encoding (codons), methylation, and some characteristic of an amino acid (I've forgotten what) that helps you assume structural shape (which is definitely not a straight line).
You can do what you like with anything on this page.