represent the settings? I do. If you have many experiments, however,
this strategy makes the experiments hard to compare, because it becomes
hard to put everything in one plot. The default color settings of plotting
programs such as gnuplot or matplotlib do not contain enough distinct
colors, and even when they do, it is hard to assign the colors in a way
that is meaningful for exploratory data analysis. The script here
might come to the rescue.
Reading and pre-processing directory names
We first transform directory names such that test-paramA_0.1 becomes
test-paramA_0000.1, which makes the names easier to compare as strings.

import os
import re

import numpy as np
import Levenshtein as L
from colormath.color_objects import LabColor


def fillfunc(mo):
    """Return the matched number, padded with leading zeros to 6 characters."""
    return mo.group(1).zfill(6)


def get_unified_names(fns):
    """Zero-pad every number that appears in each directory name."""
    # A run of digits and dots that is not preceded or followed by another digit
    return [re.sub(r'(?<!\d)([\d.]+)(?!\d)', fillfunc, s) for s in fns]
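To make the effect of the padding concrete, here is a small self-contained sketch. It reuses the regular expression from get_unified_names, pads with str.zfill (equivalent to fillfunc for these digit-and-dot runs), and compares how some made-up directory names sort before and after padding. The helper name pad_numbers and the example names are hypothetical.

```python
import re

# Same pattern as get_unified_names: a run of digits/dots not adjacent to other digits
NUMBER = re.compile(r'(?<!\d)([\d.]+)(?!\d)')


def pad_numbers(name, width=6):
    """Hypothetical helper: left-pad every numeric run in `name` with zeros."""
    return NUMBER.sub(lambda mo: mo.group(1).zfill(width), name)


# Made-up directory names in the test-paramA_<value> style
names = ["test-paramA_10.0", "test-paramA_2.5", "test-paramA_0.1"]

print(sorted(names))
# ['test-paramA_0.1', 'test-paramA_10.0', 'test-paramA_2.5']
print(sorted(pad_numbers(n) for n in names))
# ['test-paramA_0000.1', 'test-paramA_0002.5', 'test-paramA_0010.0']
```

Padding to a fixed width only restores numeric order when the numbers have the same number of decimals: 2.25 would become 002.25 and still sort after 0010.0.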
Creating the distance matrix based on the Levenshtein string distance
The Levenshtein string distance is the minimum number of single-character
substitutions, insertions, and deletions needed to transform one string
into the other. With it we automatically obtain a dissimilarity measure
for the unified directory strings:
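def get_distance_matrix(fns):
    # Function name and first two lines assumed; the listing showed only the body
    strs = get_unified_names(fns)
    n = len(strs)
    mat = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # The edit distance is symmetric, so fill both triangles at once
            mat[i, j] = L.distance(strs[i], strs[j])
            mat[j, i] = mat[i, j]
    # exp(-d) maps distances to similarities in (0, 1]; identical names give 1
    return np.exp(-mat)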
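For readers without the python-Levenshtein C extension, here is a dependency-free sketch of the same matrix. It uses the standard dynamic-programming recurrence for the edit distance and applies the same exp(-d) transform, which turns distances into similarities: identical names get 1, and names a few edits apart quickly approach 0. The names levenshtein and similarity_matrix are hypothetical and not part of the original script.

```python
import math


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions that turn string a into string b (dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]  # a[:i] -> empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def similarity_matrix(strs):
    """Pure-Python counterpart of the matrix above: exp(-edit distance)."""
    n = len(strs)
    sim = [[1.0] * n for _ in range(n)]  # exp(-0) = 1 on the diagonal
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = math.exp(-levenshtein(strs[i], strs[j]))
    return sim


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("test-paramA_0000.1", "test-paramA_0000.2"))  # 1
```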
Embedding the dissimilarity matrix using Multidimensional Scaling (MDS) in R
A distance matrix does not in general correspond exactly to points in 3D,
so we need an embedding algorithm that works from the pairwise distances
alone and finds 3D coordinates that approximately preserve them.
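The embedding used in this post is vegan's metaMDS in R, a non-metric MDS. As a point of comparison, classical (Torgerson) MDS, a different, metric technique with a closed-form solution, can be written in a few lines of NumPy. This is a sketch, not a drop-in replacement: its coordinates will not match metaMDS, and it expects dissimilarities with a zero diagonal (such as the raw Levenshtein distances) rather than the exp(-d) similarities. The function name classical_mds is hypothetical.

```python
import numpy as np


def classical_mds(dist, k=3):
    """Classical (Torgerson) MDS: k-dimensional points whose Euclidean
    distances approximate the symmetric distance matrix `dist`."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double centering: B = -1/2 * J D^2 J with J = I - 11^T / n
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ d2 @ j
    # eigh handles symmetric matrices; keep the k largest eigenvalues
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]
    vals, vecs = vals[order], vecs[:, order]
    # Negative eigenvalues mean the distances are not exactly Euclidean; clip them
    return vecs * np.sqrt(np.clip(vals, 0.0, None))
```

For exactly Euclidean input, for example the corners of a unit square, the pairwise distances of the returned points reproduce the input distances.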
For simplicity, we call R from Python, passing the matrix through a text file.
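def emb(mat):
    # Write the matrix as space-separated text for the R script
    np.savetxt("file.dat", mat)
    # Run the R script below (it must be executable); its exit status is ignored
    os.system("./analyse/emb.R")
    # The R script writes one row of 3D coordinates per directory name
    res = np.loadtxt("mds.dat")
    return res

Now to the part written in R, which only does the embedding and saves
the result in a text file: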
#!/usr/bin/Rscript
library(MASS)
library(vegan)
tab <- read.table("file.dat", header = FALSE, sep = " ")
data.m <- as.matrix(tab)
# Non-metric MDS into k = 3 dimensions, with up to 50 random starts
data.mds <- vegan::metaMDS(data.m, k = 3, trymax = 50)
write.table(data.mds$points, file = "mds.dat", quote = F, row.names = F, col.names = F)
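emb relies on os.system, so a failing R script goes unnoticed (a stale mds.dat from an earlier run would simply be loaded), and both files end up in the current directory. A more defensive variant could look like the sketch below. It assumes Rscript is on the PATH and that the script reads file.dat and writes mds.dat in its working directory, as the R script above does; the name emb_subprocess and its cmd parameter are hypothetical.

```python
import os
import subprocess
import tempfile

import numpy as np


def emb_subprocess(mat, cmd=None):
    """Hypothetical variant of emb(): run the embedding in a scratch directory.

    Assumes the command reads "file.dat" and writes "mds.dat" in its
    working directory, as the R script does.
    """
    if cmd is None:
        # Resolve the script path before switching to the scratch directory
        cmd = ["Rscript", os.path.abspath("analyse/emb.R")]
    with tempfile.TemporaryDirectory() as tmp:
        np.savetxt(os.path.join(tmp, "file.dat"), mat)
        # check=True raises CalledProcessError if the script fails,
        # instead of silently loading an old mds.dat
        subprocess.run(cmd, cwd=tmp, check=True)
        return np.loadtxt(os.path.join(tmp, "mds.dat"))
```

The cmd parameter also makes the wrapper easy to test with a stand-in command that just copies file.dat to mds.dat.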
Transforming embedded coordinates into colors
The embedded coordinates lie in an arbitrary range and therefore need to
be mapped to colors. We want names with a small edit distance to get
similar colors, so we map our coordinates directly to the Lab color space,
which is designed so that Euclidean distances roughly match perceived color
differences, and then transform the Lab color coordinates to RGB:
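def getcolor(embedded, i):
    """Return the hex RGB color for row i of the embedded coordinates."""
    # lab = LabColor(0.8, *embedded[i, :])
    lab = LabColor(*embedded[i, :])
    # convert_to() is the colormath 1.x API; colormath 2.x uses
    # colormath.color_conversions.convert_color() instead
    rgb = lab.convert_to('RGB', debug=False).get_rgb_hex()
    return rgb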
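getcolor hands the embedded coordinates to colormath unchanged. Because those coordinates come in an arbitrary range, while L* runs from 0 to 100 and displayable a* and b* values reach roughly ±100, a rescaling step can help. The following is a dependency-free sketch of the standard CIE Lab to sRGB conversion (D65 white point), plus a hypothetical rescaling. The helper names and the chosen ranges (L* from 30 to 90, a* and b* within ±60) are assumptions, not part of the original script.

```python
def _f_inv(t):
    """Inverse of the CIE Lab companding function f."""
    delta = 6 / 29
    return t ** 3 if t > delta else 3 * delta ** 2 * (t - 4 / 29)


def lab_to_hex(l_star, a_star, b_star):
    """Convert CIE Lab (D65 white) to an sRGB hex string, clipping out-of-gamut colors."""
    # Lab -> XYZ with the D65 reference white
    fy = (l_star + 16) / 116
    x = 0.95047 * _f_inv(fy + a_star / 500)
    y = 1.00000 * _f_inv(fy)
    z = 1.08883 * _f_inv(fy - b_star / 200)
    # XYZ -> linear sRGB
    lin = (
        3.2406 * x - 1.5372 * y - 0.4986 * z,
        -0.9689 * x + 1.8758 * y + 0.0415 * z,
        0.0557 * x - 0.2040 * y + 1.0570 * z,
    )
    out = []
    for c in lin:
        c = min(max(c, 0.0), 1.0)  # clip to the sRGB gamut
        # sRGB gamma encoding
        c = 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
        out.append(round(255 * c))
    return "#{:02x}{:02x}{:02x}".format(*out)


def coords_to_hex(points, l_center=60.0, l_span=30.0, ab_span=60.0):
    """Hypothetical rescaling: first axis -> L*, other two axes -> a*, b*.

    One common scale factor for all axes keeps relative distances intact.
    """
    n = len(points)
    means = [sum(p[k] for p in points) / n for k in range(3)]
    centered = [[p[k] - means[k] for k in range(3)] for p in points]
    scale = max(abs(v) for p in centered for v in p) or 1.0  # avoid division by 0
    return [
        lab_to_hex(l_center + l_span * p[0] / scale,
                   ab_span * p[1] / scale,
                   ab_span * p[2] / scale)
        for p in centered
    ]
```

Using one scale factor for all three axes, rather than stretching each axis to its full range, keeps the relative distances from the embedding, so names that are close in the embedding stay close in color.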
The hex string returned by getcolor can be used directly as the color
specification of a matplotlib plot, and it is best combined with picking
to find out more about a particularly interesting plot line.
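Here is a minimal sketch of that combination using matplotlib's pick-event mechanism. Each line gets its hex color and the experiment name as its label, picker=True makes it clickable, and a pick_event callback prints the name of the clicked line. The experiment names and colors below are made up; in practice they would come from the directory names and getcolor.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt

# Hypothetical experiments and hex colors (in practice: directory names and getcolor)
runs = {
    "test-paramA_0000.1": "#3a6fb0",
    "test-paramA_0000.2": "#4a78b5",
    "test-paramB_0000.1": "#c0504d",
}

fig, ax = plt.subplots()
for k, (name, color) in enumerate(runs.items()):
    # picker=True lets the line fire pick events when clicked
    ax.plot([0, 1, 2], [k, k + 1, k + 3], color=color, label=name, picker=True)


def on_pick(event):
    """Report which experiment the clicked line belongs to."""
    print("picked:", event.artist.get_label())


fig.canvas.mpl_connect("pick_event", on_pick)
# plt.show()  # with an interactive backend, click a line to print its name
```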