This function maps peaks based on peak maps. It reports all peaks, including missing and ambiguous peaks, by adding a set of information columns for each entry (is_identified, is_missing, is_ambiguous, n_matches, n_overlapping). For routine downstream data processing, this function is usually followed by iso_summarize_peak_mappings and iso_get_problematic_peak_mappings to inspect the problematic peaks, and by iso_remove_problematic_peak_mappings to proceed only with mapped peaks that are clearly identified. Without this filter, ambiguous peaks must be interpreted with great caution. Also note that if the compound column already exists in dt, it is overwritten with the new mappings from the peak maps and a warning is issued.

iso_map_peaks(dt, peak_maps, file_id = default(file_id),
  map_id = default(map_id), compound = default(compound),
  rt = default(rt), rt_start = default(rt_start),
  rt_end = default(rt_end), rt_prefix_divider = ":",
  quiet = default(quiet))

Arguments

dt

data frame with peak data

peak_maps

data frame with the peak map(s). At minimum, this data frame must have a compound and an rt column but may have additional information columns. If multiple peak maps are provided, the dt data frame requires a map_id column to identify which peak map should be used, and the peak maps data frame must have an rt:<map_id> column for each used value of map_id. The names of all these columns can be changed if necessary using the compound, rt and map_id parameters.
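As an illustration of the multi-map layout described above, the following base R sketch builds a small peak maps data frame and checks that it has one rt:<map_id> column for every map_id value used in dt. All compound names, retention times, file ids and map ids are made up, and dt is reduced to a stub with only the columns needed for the check.

```r
# Hypothetical peak maps for two maps, "map1" and "map2" (all values are made up).
# check.names = FALSE keeps the "rt:<map_id>" column names intact.
peak_maps <- data.frame(
  compound = c("CO2", "CH4", "N2O"),
  `rt:map1` = c(120.5, 185.0, 240.2),
  `rt:map2` = c(122.0, 187.5, 243.1),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical stub of dt: only the columns relevant to choosing a map
dt <- data.frame(
  file_id = c("run1", "run2", "run3"),
  map_id = c("map1", "map2", "map1"),
  stringsAsFactors = FALSE
)

# Column names the peak maps need, given the default "rt" prefix and ":" divider
required_cols <- paste0("rt", ":", unique(dt$map_id))

# Any required column that the peak maps do not provide
missing_cols <- setdiff(required_cols, names(peak_maps))
```

An empty missing_cols means every map_id value in dt has a matching retention time column in peak_maps.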

file_id

the column(s) in dt that uniquely identify a file/set of peaks that belong together

map_id

the column in dt that indicates which map to use for which file (only necessary if multiple peak maps are used)

compound

the column in peak_maps that holds compound information

rt

the column in dt and column prefix in peak_maps ("rt:...") that holds retention time information

rt_start

the column in dt that holds start of peak retention times

rt_end

the column in dt that holds end of peak retention times

rt_prefix_divider

the divider after the retention time column prefix in peak_maps to identify the map id values (e.g. "rt:map_id_value")
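The naming convention can be illustrated with a short base R sketch that splits map id values off the column names for a non-default divider. The column names and the "_" divider are hypothetical, and the snippet only demonstrates the convention; it is not how the function itself parses the columns.

```r
# Hypothetical peak map column names that use "_" instead of ":" as the divider
map_cols <- c("compound", "rt_mapA", "rt_mapB")

rt_prefix <- "rt"
rt_prefix_divider <- "_"
prefix <- paste0(rt_prefix, rt_prefix_divider)

# Columns that start with the prefix hold retention times for one map each
is_rt_col <- startsWith(map_cols, prefix)

# Everything after the prefix is the map id value
map_ids <- substring(map_cols[is_rt_col], nchar(prefix) + 1)
```

With column names like these, the corresponding call would pass rt_prefix_divider = "_" to iso_map_peaks.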

quiet

whether to display (quiet = FALSE) or silence (quiet = TRUE) information messages.

Value

data frame with mapped peaks and the following information columns:

  • peak_info: a label for the peak with its name and retention time. Any ambiguity in identification is indicated by a '?' in place of the compound name, or in place of the retention time for an expected peak that was not found

  • is_identified: a logical TRUE/FALSE indicating peaks that have been successfully identified (this includes missing peaks from the peak map!). This information could also be derived from !is.na(compound) but is provided for convenience

  • is_missing: a logical TRUE/FALSE indicating peaks that are in the peak map definition but have no matching peak

  • is_ambiguous: a logical TRUE/FALSE indicating peaks that are ambiguous in their definition, either because they have multiple matches or because they overlap with other peaks that were identified the same. This information could also be derived from n_overlapping > 1 | n_matches > 1 but is provided for convenience

  • n_matches: the number of matches each peak has in the peak map

  • n_overlapping: the number of overlapping peaks that match the same peak definition
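The two derivations mentioned in the list above can be sketched in base R on a toy data frame. The rows and their n_matches and n_overlapping values are invented for illustration and are not output from iso_map_peaks.

```r
# Hypothetical mapped peaks (made-up values, not real function output):
# one cleanly identified peak, one unidentified peak, and two overlapping
# peaks that were both identified as the same compound
mapped <- data.frame(
  compound = c("CO2", NA, "CH4", "CH4"),
  n_matches = c(1, 0, 1, 1),
  n_overlapping = c(0, 0, 2, 2),
  stringsAsFactors = FALSE
)

# Equivalent of the is_identified column
is_identified <- !is.na(mapped$compound)

# Equivalent of the is_ambiguous column
is_ambiguous <- mapped$n_overlapping > 1 | mapped$n_matches > 1
```

In this toy table only the two overlapping CH4 rows are flagged as ambiguous, while the row without a compound is the only one not identified.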

See also