I have an .xlsx document containing some data and measurements that is within 3 columns (ImageNumber, ObjectNumber, Intensity) The problem is, that this data is all in just one column as it is shown here:
263 2 347.92942202463746
264 2 340.47059811465442
265 2 626.37256725877523
266 2 352.60785254277289
267 2 1161.9843464940786
268 2 353.31373599730432
269 2 1164.090231411159
270 2 639.38041111640632
271 2 365.32550028897822
272 2 373.7215790450573
273 2 611.34119655750692
274 2 342.07451999932528
275 2 343.72550031356514
276 2 602.51766622252762
277 2 335.52942184358835
278 2 347.39216740056872
279 2 349.49412823654711
280 2 365.96079528704286
281 2 603.77256822399795
282 2 403.58432539924979
283 2 633.00001835078001
284 2 390.50589356571436
285 2 387.1451101154089
1 3 94.176473506726325
2 3 79.400002629496157
3 3 331.84314792603254
4 3 1152.6863025426865
5 3 1186.6627745330334
6 3 470.21962223947048
7 3 513.58432994037867
8 3 501.98040856420994
9 3 497.15687815099955
10 3 440.45099299959838
11 3 442.16471740975976
12 3 1270.5686648786068
13 3 1296.145133793354
14 3 592.69021038152277
15 3 1247.9529772102833
16 3 1304.1843515634537
17 3 1317.5176855623722
18 3 566.2706073410809
19 3 555.8470722027123
20 3 953.59217982552946
21 3 445.65883476100862
22 3 438.89020615816116
23 3 1410.3608229905367
24 3 426.01569781638682
25 3 1424.8588645160198
26 3 1416.5294532775879
27 3 1253.4470970630646
28 3 422.10197120346129
29 3 1272.7372958958149
30 3 498.68629035539925
31 3 464.75687384977937
32 3 374.47452012635767
33 3 402.48628707416356
34 3 508.00393660180271
35 3 405.66275736689568
36 3 498.54511260986328
This is just a very small extract of my measurements. However, to be able to analyse this data, I need it in the following format, where the different data sets are beside each other:
ImageNumber ObjectNumber Intensity ImageNumber ObjectNumber Intensity ImageNumber ObjectNumber Intensity ImageNumber ObjectNumber Intensity
1 1 2385.494163364172 1 2 30.200001035351306 1 3 522.71766421943903 1 4 1057.6157233268023
2 1 479.47844552993774 2 2 28.882353894878179 2 3 1007.6078772544861 2 4 461.65491861104965
3 1 391.68236282467842 3 2 27.615687115117908 3 3 907.86276851594448 3 4 416.80001404881477
4 1 25.168628226500005 4 2 762.15687701106071 4 3 360.51765584945679 4 4 745.08237412944436
5 1 32.286275403108448 5 2 735.21570462733507 5 3 370.90589284710586 5 4 95.643139901570976
6 1 29.819608668331057 6 2 680.78825412690639 6 3 357.29804827086627 6 4 91.490198460407555
7 1 63.627452164888382 7 2 746.64315531402826 7 3 441.45099106151611 7 4 131.12157137878239
8 1 57.643138359300792 8 2 391.56863641180098 8 3 550.72550706192851 8 4 805.54511855356395
9 1 54.403922646306455 9 2 386.09804813098162 9 3 339.52549867797643 9 4 506.53334950841963
10 1 485.22354453988373 10 2 574.22747141867876 10 3 1228.5804251618683 10 4 1256.5176827311516
11 1 382.1568714408204 11 2 545.84315247740597 11 3 1212.9255175394937 11 4 1363.251015804708
12 1 396.752954300493 12 2 571.52942893654108 12 3 377.77256011217833 12 4 729.58433585613966
13 1 1283.8667007293552 13 2 542.1764866374433 13 3 706.82747261226177 13 4 648.21178455650806
14 1 430.46275778114796 14 2 909.63139714486897 14 3 451.46668002009392 14 4 1028.8941485583782
15 1 602.85491912066936 15 2 347.98432378470898 15 3 396.26667900010943 15 4 406.35295270755887
16 1 421.81961948797107 16 2 837.12943513691425 16 3 478.32942511886358 16 4 1038.5725800022483
17 1 405.13334396108985 17 2 747.52551138773561 17 3 446.17256097495556 17 4 885.80394879728556
18 1 324.09020387381315 18 2 653.02354798838496 18 3 835.43531934171915 18 4 407.7647173628211
19 1 336.67843942344189 19 2 804.93727961182594 19 3 429.20393324270844 19 4 291.43530296906829
20 1 741.53335233777761 20 2 366.6039296798408 20 3 732.14904120564461 20 4 394.81569704227149
21 1 338.82745894789696 21 2 1345.1961118653417 21 3 519.3960902877152 21 4 564.89413283765316
22 1 415.46667850390077 22 2 837.20394962280989 22 3 395.91765884310007 22 4 224.29020238853991
23 1 362.44314773753285 23 2 787.94120307266712 23 3 391.5568740144372 23 4 1794.8980749752373
24 1 789.72158995270729 24 2 374.35295177251101 24 3 708.94512075185776 24 4 381.40393186733127
25 1 386.32942296564579 25 2 687.25100283324718 25 3 373.17255918681622 25 4 430.47844344004989
26 1 319.23530425131321 26 2 564.95687813684344 26 3 429.85099240392447 26 4 289.76079219020903
27 1 312.13726452738047 27 2 480.74903298169374 27 3 440.44314985722303 27 4 373.62746067903936
28 1 630.78825259953737 28 2 470.48236661218107 28 3 288.50981164351106 28 4 270.01961551234126
29 1 340.45099052786827 29 2 648.56472269445658 29 3 427.27059957757592 29 4 1008.9764956980944
As shown in this figure, all the data sets having the same "ObjectNumber" should be in a separate column together with the corresponding value for "Intensity" and "ImageNumber". As there are sometimes more than 100 data sets (with several hundred data points), copying manually one set besides the other is no option, as it takes a lot of time.
I have already solved some other problems regarding data management or alignment by the use of the R package "tidyverse" or "reshape". However, this time I have absolutely no idea how I can solve this problem.
I would be very grateful, if you could help me with this issue.
I've used a function like this in some reports:
wrap_frame <- function(x, nr, nc, rownames = NULL, byrow = FALSE, sep = "_", unique_names = TRUE) {
if (!xor(missing(nr), missing(nc))) stop("specify exactly one of 'nr' or 'nc'")
has_rownames <- isTRUE(is.character(attr(x, "row.names")))
if (is.null(rownames)) {
if (missing(rownames) && has_rownames) warning("wrap_frame: row names discarded", call. = FALSE)
} else {
x <- cbind.data.frame(list(row.names(x)), x)
colnames(x)[1] <- rownames
}
if (missing(nr)) {
nr <- ceiling(nrow(x) / nc)
ind <- c(rep(seq_len(nc), times = nrow(x) %/% nc),
head(seq_len(nc), n = nrow(x) %% nc))
} else {
nc <- ceiling(nrow(x) / nr)
ind <- c(rep(seq_len(nrow(x) %/% nr), times = nr),
rep(nc, nrow(x) %% nr))
}
if (!byrow) ind <- sort(ind)
lst <- split(x, ind)
lst <- lapply(lst, lapply, `length<-`, nrow(lst[[1]]))
cnames <-
if (unique_names) {
paste(rep(colnames(x), times = nc), rep(seq_len(nc), each = ncol(x)), sep = sep)
} else {
rep(colnames(x), times = nc)
}
out <- do.call("cbind.data.frame", lst)
colnames(out) <- cnames
out
}
Some sample data (it's faster this way, sorry I'm not using yours even though you did improve its format!):
mt <- mtcars[1:3]
And some sample calls, fixed number of rows first:
wrap_frame(mt, nr = 10)
# Warning: wrap_frame: row names discarded
# mpg_1 cyl_1 disp_1 mpg_2 cyl_2 disp_2 mpg_3 cyl_3 disp_3 mpg_4 cyl_4 disp_4
# 1 21.0 6 160.0 17.8 6 167.6 21.5 4 120.1 15.0 8 301
# 2 21.0 6 160.0 16.4 8 275.8 15.5 8 318.0 21.4 4 121
# 3 22.8 4 108.0 17.3 8 275.8 15.2 8 304.0 NA NA NA
# 4 21.4 6 258.0 15.2 8 275.8 13.3 8 350.0 NA NA NA
# 5 18.7 8 360.0 10.4 8 472.0 19.2 8 400.0 NA NA NA
# 6 18.1 6 225.0 10.4 8 460.0 27.3 4 79.0 NA NA NA
# 7 14.3 8 360.0 14.7 8 440.0 26.0 4 120.3 NA NA NA
# 8 24.4 4 146.7 32.4 4 78.7 30.4 4 95.1 NA NA NA
# 9 22.8 4 140.8 30.4 4 75.7 15.8 8 351.0 NA NA NA
# 10 19.2 6 167.6 33.9 4 71.1 19.7 6 145.0 NA NA NA
wrap_frame(mt, nr = 10, rownames = NULL) # to silence the warning
Fixed number of columns:
wrap_frame(mt, nc = 7, rownames = NULL)
# mpg_1 cyl_1 disp_1 mpg_2 cyl_2 disp_2 mpg_3 cyl_3 disp_3 mpg_4 cyl_4 disp_4 mpg_5 cyl_5 disp_5 mpg_6 cyl_6 disp_6 mpg_7 cyl_7 disp_7
# 1 21.0 6 160 18.1 6 225.0 17.8 6 167.6 10.4 8 460.0 21.5 4 120.1 19.2 8 400.0 15.8 8 351
# 2 21.0 6 160 14.3 8 360.0 16.4 8 275.8 14.7 8 440.0 15.5 8 318.0 27.3 4 79.0 19.7 6 145
# 3 22.8 4 108 24.4 4 146.7 17.3 8 275.8 32.4 4 78.7 15.2 8 304.0 26.0 4 120.3 15.0 8 301
# 4 21.4 6 258 22.8 4 140.8 15.2 8 275.8 30.4 4 75.7 13.3 8 350.0 30.4 4 95.1 21.4 4 121
# 5 18.7 8 360 19.2 6 167.6 10.4 8 472.0 33.9 4 71.1 NA NA NA NA NA NA NA NA NA
Both of those examples demonstrate some padding: in the first, the number of rows is fixed so the last column is a bit sparse, but only one column has NA
values; in the second, the number of columns is fixed, so the last row is half-empty, but only one row has NA
values. Both are "balanced" in a sense.
Same thing, but this time go by-row, meaning the first nc
rows of x
are spread across the first row of the output:
wrap_frame(mt, nc = 7, byrow = TRUE, rownames = NULL)
# mpg_1 cyl_1 disp_1 mpg_2 cyl_2 disp_2 mpg_3 cyl_3 disp_3 mpg_4 cyl_4 disp_4 mpg_5 cyl_5 disp_5 mpg_6 cyl_6 disp_6 mpg_7 cyl_7 disp_7
# 1 21.0 6 160.0 21.0 6 160.0 22.8 4 108.0 21.4 6 258.0 18.7 8 360.0 18.1 6 225.0 14.3 8 360.0
# 2 24.4 4 146.7 22.8 4 140.8 19.2 6 167.6 17.8 6 167.6 16.4 8 275.8 17.3 8 275.8 15.2 8 275.8
# 3 10.4 8 472.0 10.4 8 460.0 14.7 8 440.0 32.4 4 78.7 30.4 4 75.7 33.9 4 71.1 21.5 4 120.1
# 4 15.5 8 318.0 15.2 8 304.0 13.3 8 350.0 19.2 8 400.0 27.3 4 79.0 26.0 4 120.3 30.4 4 95.1
# 5 15.8 8 351.0 19.7 6 145.0 15.0 8 301.0 21.4 4 121.0 NA NA NA NA NA NA NA NA NA
And we can make it more aesthetically "happy" by keeping the column names unchanged:
wrap_frame(mt, nc = 3, rownames = "", unique_names = FALSE)
# mpg cyl disp mpg cyl disp mpg cyl disp
# 1 Mazda RX4 21.0 6 160.0 Merc 450SE 16.4 8 275.8 AMC Javelin 15.2 8 304.0
# 2 Mazda RX4 Wag 21.0 6 160.0 Merc 450SL 17.3 8 275.8 Camaro Z28 13.3 8 350.0
# 3 Datsun 710 22.8 4 108.0 Merc 450SLC 15.2 8 275.8 Pontiac Firebird 19.2 8 400.0
# 4 Hornet 4 Drive 21.4 6 258.0 Cadillac Fleetwood 10.4 8 472.0 Fiat X1-9 27.3 4 79.0
# 5 Hornet Sportabout 18.7 8 360.0 Lincoln Continental 10.4 8 460.0 Porsche 914-2 26.0 4 120.3
# 6 Valiant 18.1 6 225.0 Chrysler Imperial 14.7 8 440.0 Lotus Europa 30.4 4 95.1
# 7 Duster 360 14.3 8 360.0 Fiat 128 32.4 4 78.7 Ford Pantera L 15.8 8 351.0
# 8 Merc 240D 24.4 4 146.7 Honda Civic 30.4 4 75.7 Ferrari Dino 19.7 6 145.0
# 9 Merc 230 22.8 4 140.8 Toyota Corolla 33.9 4 71.1 Maserati Bora 15.0 8 301.0
# 10 Merc 280 19.2 6 167.6 Toyota Corona 21.5 4 120.1 Volvo 142E 21.4 4 121.0
# 11 Merc 280C 17.8 6 167.6 Dodge Challenger 15.5 8 318.0 <NA> NA NA NA
Note: the NA
are required by R since a data.frame
must be rectangular. If I inserted blanks, then all numbers would be converted to character
, not what you want/need (I suspect). However, when exported, you typically have the option to declare how an NA
is represented in the output, with things like:
write.table(..., na="")
readr::write_csv(..., na="")
options(knitr.kable.NA=""); knitr::kable(...)
I've added this as a gist: https://gist.github.com/r2evans/f99f77d253cfbf6431db575f0bf2a7ea