Revision history for ArrayPipe
(use the Refresh button to see the latest version)

revision 1.7:
date: 2005/04/30 07:54:43
- changes:
  * changed swissprot file to uniprot file
  * added variables in preparation for DB backend
  * changed default link for program to 'localhost' (was www.pathogenomics.ca)
  * updated weblink for BIND data
  * changed name of RATIO_DIFF_TERM to 'r.o.r.' (from RATIO_DIFF)
  * specified minimum width of 450 pixels for chip visualizations
  * added function to calculate variation between replicates (within slide, between pairs, technical and biological replicates)
  * added function to merge pairs
  * activated function for calculating values after skipping of outliers (SOV)
  * made sure that functions request at least the spot id term to be read in from the data file
  * MA plot pictures can be made available as PDFs
  * a file can be uploaded for the array annotation
  * flag markers automatically flags spots that occur more than a specified number of times (defaults to 10)
  * changed orientation option for chip visualization from 'portrait' and 'landscape' to 'original' and 'rotated 90 ccw'
  * replaced technical replicate with the idea of pairs, which allows matching up dye-swaps, for example
  * separate routine for adding uniprot information
- bug fixes:
  * flagged spots were scaled differently in Z-score output

revision 1.6.1:
date: Fri Mar 4 14:26:17 PST 2005
- bug fixes:
  * program hung when run on localhost
  * stop of action due to high system load was not reported on output page
  * if no directory for saved sessions was specified, files in /tmp were listed

revision 1.6:
date: Tue Feb 22 10:43:20 PST 2005
- changes:
  * it is now possible to specify an alias for the input file(s); this makes it possible to influence the order in which output is presented and also provides more meaningful descriptions in case of two-part data files, as in Imagene output
  * added forcing of TMEV generation for data that is only available as ratio or fold-change (the intensities are artificially
    created)
  * allowing regular expressions for flags, e.g. /^(-)/ for spot names starting with '(-)' (negative controls)
  * added more recognizers for spot name column and intensity columns (SpotName, gMeanSignal, rMeanSignal, gBGUsed, rBGUsed)
  * added one more field for list overlay
  * changed ending of archive with TMEV files from .tgz to .tar.gz for better indication of the nature of the file
  * replaced 'Probe ID' with variable $PROBE_ID_TERM
  * read probe ids from data column $PROBE_ID_TERM if available
  * added splitter for barplot/histogram (linearize multiple values in a row)
  * expanded histogram borders by 20% of range in each direction
  * increased number of filter fields to 8
  * added index update when using the spreadsheet
  * added skipping of lists
  * set default method for merging slides to median (the weighted mean will be confusing as long as the t-tests are not adjusted to take weights into account!)
  * moved location of external programs into configuration file
  * added more explanations in ANOVA output
  * stricter regulation of channel annotation (so far it didn't matter if CH1 and CH2 were swapped around)
  * added annotation support for OCI H19kv6 files
  * made link to uniprot annotation file configurable
  * made path to annotation files configurable
  * changed 'public_username' to 'default_username'
  * changed parameter configuration mechanism to a more flexible hash structure
  * added system error messages to STDERR output
  * added mechanism to display current news and information
  * cleaned up error and information messages, now displayed when 'verbose' and/or 'debug' is set
  * avoid second call in background when BATCH processing from command line
  * enabled reading of data files in self-defined ArrayPipe format
  * improved recognition of GenePix files
- bug fixes:
  * avoid multiple occurrences of the same column in output
  * avoid empty ARRAY...
    column in output
  * spot id didn't come up if no annotation was available
  * older versions didn't incorporate file-specs
  * spreadsheet output files were overwritten
  * sometimes log-taking of 0 was attempted
  * give full permissions to new user directory
  * changed invalid values for TMEV from '1' to '0'
  * skip values of '0' when duplicate spots are merged
  * flag spots as 'undefined' if one channel has zero foreground intensity
  * too many dollar signs in $$cluster_nodes
  * use new file-handle for reading of annotation file (some other file-handle must be open and this led to the first line being skipped)
  * load in new TMEV format
  * dealing with duplicate headers failed for Windows files
  * empty trailing fields in web-based spreadsheet didn't show up properly
  * unloading of lists didn't work
  * flags from lists hid flags from data file
  * report error properly if second file for an ImaGene-type data set wasn't found
  * avoided browser caching problem that led to loading of the same spreadsheet again and again
  * merging of duplicates skipped spots with negative log-ratio
  * name of merged output file wasn't reported
  * wrong values were returned for t-test where mean == pop_mean and stderr == 0
  * endless loop with certain types of GenePix input
  * number of rows and columns were changed for one array type (12,4,17,16 layout)

revision 1.4:
date: 2004/10/13 21:17:00
- changes:
  * changed RI plot to MA plot - this seems to be the more commonly used plot
  * set default for Welch t-test (within group) to work on intensities (if this was set to ratios it would act exactly like a Student t-test)
  * enable reading of data and annotation from previously saved ArrayPipe files (allows saving of normalized data and channel annotation)
  * added lines indicating 2-fold change to MA and scatter plots
  * adjust size of text boxes to channel annotation
  * made long listing of spot info for print-tip box plot optional (switched off by default)
  * set default for print-tip box plots to 'ratios' (not
    'auto' which sometimes prints both channels beside each other)
  * changed 'Set cutoffs' to 'Set cutoffs (data shift)' to make the name a bit more explanatory
  * added loess functions (printtip and global) from Bioconductor's limma package, which is faster and more robust (but yields the same results as the loess from the marrayNorm package)
  * added function that allows changing one or more flags to another
  * added loading capability of TMEV files (e.g. from TIGR's Spotfinder)
  * avoid excessive skipping of spots if one or more values are missing but there is still sufficient data for t-tests available
  * added weighted averaging (and made it default) for merging of replicates
  * added weights used in averaging as an output column
  * list merging of technical replicates before merging of all replicates in the module list (more logical order this way)
  * report the number of valid entries that go into p-value calculation (within groups only)
- bug fixes:
  * overlay list with only one column didn't work for Windows files
  * t-test between groups allowed too many NA's
  * round numbers in scientific notation as well (spreadsheet)
  * rename headers with multiple occurrences in spreadsheet output
  * set flagging information for 'a' (absent) to 'automatic' and handle multiple instances properly
  * set value of flagged/undefined entries in MEV output file to '0', which will make spot show up as a grey box if present in both channels
  * set list size for some file selection boxes
  * prevented over-flagging in normalization
  * fixed problem with colouring flagged spots when multiple MA-plots follow each other
  * the function 'Signal box plot (printTip)' calculated log-transformed ratios the wrong way: instead of subtracting the log-transformed intensities, these were accidentally divided. This resulted in wrong box plots and in some cases also in wrong ratios in the results output.
  * empty fields at end of input row were skipped and caused warning (insert empty elements instead)
  * name of merged output file wasn't reported
  * files with special characters such as brackets caused problems in spreadsheet functionality
  * y-coordinates of quartiles in MA plots were sometimes wrong
  * rounding error sometimes caused problems with calculation of standard deviation
  * fixed problem with merging of files (attempt to take the log of negative values that are already log-transformed)

revision 1.2
date: Tue Jul 27 16:15:05 PDT 2004
- changes:
  * added Student, Wilcox and Welch for tests between groups
  * distinction between files with channel label 'C|T1' and 'C|T2' (beforehand the control channels needed to be different)
  * default channel labelling is now 'C' and 'T' (instead of 'T1')
  * new 'Extra tool' in spreadsheet: add column calculated from existing ones (e.g. to calculate fold-change between two conditions)
  * removed 'Save List' feature from spreadsheet (it was a bit confusing and not that useful)
  * included counter 'i' if probe coordinates were selected
- bug fixes:
  * links to MEV archive corrected
  * inclusion of p-anova in merged files
  * corrected missing entries in spreadsheet if the first of a set of files doesn't contain values for a spot
  * fixed bug that messed up spreadsheet output when TIGR format has been selected for output as well

revision 1.0
date: 2004/07/07 19:26:20;  author: khokamp;  state: Exp;  lines: +6370 -1960
- added columns with gene names and gene descriptions to the front of the table
- add ratio column
- added spot and probe ids to hidden fields in spreadsheet output
- added significance values to simple spreadsheet output
- added inverse pattern match for first filter field (because 'BUTNOT' connector is not available for that one)
- added list upload for extra filtering in spreadsheet
- upgraded flagging of flawed duplicates: allows flagging of spots with more than x-fold standard deviation from median absolute or fold
  difference
- fixed sorting bug in spreadsheet (if values in scientific notation were encountered the sorting mode changed erroneously to alphabetical instead of numerical sorting)
- change header of file if multiple columns with the same header are found
- keep name of output files slightly simpler
- avoid ugly empty cells in the last column of a spreadsheet table
- added a box for the hidden columns in a spreadsheet
- fixed bug that didn't report file problems if all files were faulty
- add name of file to spreadsheet title if only one file is shown
- avoid long endings to file names in spreadsheet
- changed default of number of sample output lines to 0, in which case a link to a file with 40 sample lines is provided
- added defaults for p-values to the p-values
- changed default of set cutoffs from global to individual shift
- fixed size of new list in spreadsheet output
- remove temporary files if $clean_up_files is set
- exclude parameters from output that have not been used
- skip writing complete output to huge file
- remove skip_empty option in output module, because it is buggy and probably not being used
- reduced time of merging process for spreadsheet
- had to put step from ArrayPipe to Spreadsheet to the background because it could take longer than the 5 minute time-out threshold if many large files are to be dealt with
- added 'ignore sign' box to filter to work on absolute numbers
- changed expire from +1y to -1s for each header to avoid caching of pages
- fixed bug with storing modified spot id (number in brackets needs to be cut off so that flags are recognized)
- fixed bug with annotation of JB 22x22 slides (spot id was processed too early and probe id wasn't accessible anymore)
- fixed bug with extra columns in TIGR annotation file
- deletion of uploaded archives after extracting of files
- gave merged ratios priority over individual ratios in filter-by-value
- adjustments for new ProbeLynx headers
- added merging of background values when
  merging duplicated spots
- started work on hiding settings
- started work on moving output into separate pages
----------------------------
revision 0.96
date: 2004/05/26 19:29:31;  author: khokamp;  state: Exp;  lines: +150 -16
- fixed merging of replicates: so far the intensity values have been merged and ratios were taken afterwards; now the ratios of a merged file are calculated as the median of the ratios from the individual files; this should give more accurate values if files have not been normalized globally.
----------------------------
revision 0.95
date: 2004/05/26 17:37:25;  author: khokamp;  state: Exp;  lines: +65 -46
- In the previous versions the values in the fold-change column were sometimes derived from the raw foreground intensities, instead of normalized and merged values. This has been fixed now.
----------------------------
revision 0.94
date: 2004/05/21 01:17:54;  author: khokamp;  state: Exp;  lines: +23 -12
- added markers for JB 21K slide
- added gzip'ed tar archive for MEV files
----------------------------
revision 0.93
date: 2004/05/20 21:00:37;  author: khokamp;  state: Exp;  lines: +238 -53
- fold-change column has been added to the output
- annotation for JackBell 21K slides added
----------------------------
revision 0.92
date: 2004/05/20 13:32:54;  author: khokamp;  state: Exp;  lines: +832 -82
- ANOVA added
- bug fix in permutation program
- small bug fixes
----------------------------
revision 0.91
date: 2004/05/20 13:32:02;  author: khokamp;  state: Exp;
- first stored version of public release
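
Note on revision 0.96: the difference between the two ways of merging replicates can be sketched as follows. This is only an illustration in Python, not ArrayPipe's own code. The function names and sample intensities are hypothetical, and using the median to merge intensities in the old scheme is an assumption (the entry does not say how intensities were merged).

```python
from statistics import median


def ratio_of_merged_intensities(control, treatment):
    """Scheme before 0.96 (merge method assumed to be the median):
    merge the intensities of each channel first, then take one ratio."""
    return median(treatment) / median(control)


def median_of_ratios(control, treatment):
    """Scheme since 0.96: take one ratio per file,
    then report the median of those ratios."""
    return median(t / c for c, t in zip(control, treatment))


# Hypothetical intensities of one spot on three slides that were
# not normalized globally (overall brightness differs per slide).
control = [100.0, 1000.0, 400.0]
treatment = [200.0, 1500.0, 1200.0]

old_value = ratio_of_merged_intensities(control, treatment)
new_value = median_of_ratios(control, treatment)
print(old_value, new_value)
```

With these made-up numbers the per-file ratios are 2.0, 1.5 and 3.0, so the second function reports 2.0, while merging the intensities first yields 3.0, because the brightest slide dominates one channel more than the other.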