Search code examples
rgrid-layoutphylogeny

Find coalescent times of cells from the history of a simulated grid


I am writing a simulation which consists of an n x n grid of cells. At different times in the simulation, a cell is drawn at random to 'divide'. When a cell divides, it dies and creates two daughter cells. One daughter replaces the original cell and the other daughter replaces one of it's 8 neighbors on the grid at random.

The grid is encoded by a dataframe with n^2 rows in the beginning, one row for each cell (each cell has birth_time=0, death_time=50 and parent=0 at the start). As the simulation proceeds, two rows, representing the daughter cells, are added for each division event and the death times of the parent (and the neighbor precursor) are updated. The daughters get assigned birth_time!=0, death_time=50 and a parent (see below for examples).

After the simulation has run for some designated period of time (50 in the examples below) I take a sample of cells that all have the same x-coordinate. For those cells, I'd like to use the historical information coded in my grid-dataframe to find their coalescent times, that is the death time of all cells that are ancestors of two or more cells in the final sample. I am looking for a function to accomplish this in R (or help constructing an algorithm I could code in R myself).

Below are three examples which I hope will make my requirements clear:

Test1:

> grid1
   cellID x_coordinate y_coordinate onEdge parent birth_time death_time
1        1            1            1      1      0          0         50
2        2            2            1      1      0          0         50
3        3            3            1      1      0          0          2
4        4            4            1      1      0          0         50
5        5            5            1      1      0          0         50
6        6            1            2      1      0          0         50
7        7            2            2      0      0          0         50
8        8            3            2      0      0          0          2
9        9            4            2      0      0          0         50
10      10            5            2      1      0          0         50
11      11            1            3      1      0          0         50
12      12            2            3      0      0          0         50
13      13            3            3      0      0          0         12
14      14            4            3      0      0          0         50
15      15            5            3      1      0          0         50
16      16            1            4      1      0          0         50
17      17            2            4      0      0          0         50
18      18            3            4      0      0          0         21
19      19            4            4      0      0          0         50
20      20            5            4      1      0          0         50
21      21            1            5      1      0          0         50
22      22            2            5      1      0          0         50
23      23            3            5      1      0          0         50
24      24            4            5      1      0          0         50
25      25            5            5      1      0          0         50
26      26            3            2      0      8          2         12
27      27            3            1      1      8          2         50
28      28            3            2      0     26         12         33
29      29            3            3      0     26         12         21
30      30            3            3      0     29         21         33
31      31            3            4      0     29         21         45
32      32            3            3      0     30         33         45
33      33            3            2      0     30         33         50
34      34            3            4      0     31         45         50
35      35            3            3      0     31         45         50

I sample the crypts that exist at the end time (50) and have x-coordinate=3. Note that although I sample all 5 crypts in this test case, a subset will be sampled in the actual simulation.

> sample1
   cellID x_coordinate y_coordinate onEdge parent birth_time death_time
23      23            3            5      1      0          0         50
27      27            3            1      1      8          2         50
33      33            3            2      0     30         33         50
34      34            3            4      0     31         45         50
35      35            3            3      0     31         45         50

In this example, the cell at (3,5) is unrelated to the others (except by a pseudo-parent node of all cells (0). The other four cells are all related and there are 3 division events that are informative to the phylogeny. These are:

> res1
  cellID x_coordinate y_coordinate onEdge parent birth_time death_time
1       8            3            2      0      0          0          2
3      29            3            3      0     26         12         21
5      31            3            4      0     29         21         45

The tree below shows the relationship I'm trying to capture enter image description here

Here are two other examples: Test2:

> grid2
   cellID x_coordinate y_coordinate onEdge parent birth_time death_time
1        1            1            1      1      0          0         50
2        2            2            1      1      0          0          2
3        3            3            1      1      0          0         50
4        4            4            1      1      0          0         45
5        5            5            1      1      0          0         50
6        6            1            2      1      0          0         50
7        7            2            2      0      0          0          2
8        8            3            2      0      0          0         45
9        9            4            2      0      0          0         21
10      10            5            2      1      0          0         21
11      11            1            3      1      0          0         50
12      12            2            3      0      0          0         50
13      13            3            3      0      0          0         33
14      14            4            3      0      0          0         50
15      15            5            3      1      0          0         50
16      16            1            4      1      0          0         50
17      17            2            4      0      0          0         33
18      18            3            4      0      0          0         12
19      19            4            4      0      0          0         50
20      20            5            4      1      0          0         50
21      21            1            5      1      0          0         50
22      22            2            5      1      0          0         50
23      23            3            5      1      0          0         50
24      24            4            5      1      0          0         12
25      25            5            5      1      0          0         50
26      26            2            2      0      7          2         50
27      27            2            1      1      7          2         50
28      28            3            4      0     18         12         50
29      29            4            5      1     18         12         50
30      30            4            2      0      9         21         50
31      31            5            2      1      9         21         50
32      32            2            4      0     17         33         50
33      33            3            3      0     17         33         50
34      34            3            2      0      8         45         50
35      35            4            1      1      8         45         50

> sample2
   cellID x_coordinate y_coordinate onEdge parent birth_time death_time
3        3            3            1      1      0          0         50
23      23            3            5      1      0          0         50
28      28            3            4      0     18         12         50
33      33            3            3      0     17         33         50
34      34            3            2      0      8         45         50

The cells in sample2 are completely un-related (their most-recent common ancestor is the 0 pseudo-node). The function should return nothing (or just the time 0).

Test3:

> grid3
   cellID x_coordinate y_coordinate onEdge parent birth_time death_time
1        1            1            1      1      0          0         50
2        2            2            1      1      0          0         50
3        3            3            1      1      0          0         50
4        4            4            1      1      0          0         50
5        5            5            1      1      0          0         50
6        6            1            2      1      0          0         50
7        7            2            2      0      0          0         31
8        8            3            2      0      0          0         34
9        9            4            2      0      0          0         37
10      10            5            2      1      0          0         50
11      11            1            3      1      0          0         50
12      12            2            3      0      0          0         22
13      13            3            3      0      0          0          8
14      14            4            3      0      0          0          8
15      15            5            3      1      0          0          6
16      16            1            4      1      0          0         50
17      17            2            4      0      0          0          2
18      18            3            4      0      0          0          2
19      19            4            4      0      0          0          3
20      20            5            4      1      0          0         50
21      21            1            5      1      0          0         50
22      22            2            5      1      0          0         50
23      23            3            5      1      0          0         50
24      24            4            5      1      0          0         50
25      25            5            5      1      0          0         50
26      26            2            4      0     17          2         50
27      27            3            4      0     17          2          3
28      28            3            4      0     27          3         45
29      29            4            4      0     27          3          6
30      30            4            4      0     29          6         50
31      31            5            3      1     29          6         50
32      32            4            3      0     14          8         50
33      33            3            3      0     14          8         22
34      34            3            3      0     33         22         45
35      35            2            3      0     33         22         31
36      36            2            3      0     35         31         50
37      37            2            2      0     35         31         34
38      38            2            2      0     37         34         50
39      39            3            2      0     37         34         37
40      40            3            2      0     39         37         49
41      41            4            2      0     39         37         50
42      42            3            3      0     34         45         49
43      43            3            4      0     34         45         50
44      44            3            3      0     42         49         50
45      45            3            2      0     42         49         50

> sample3 <- subset(grid3, x_coordinate==3 & death_time==50)
> sample3
   cellID x_coordinate y_coordinate onEdge parent birth_time death_time
3        3            3            1      1      0          0         50
23      23            3            5      1      0          0         50
43      43            3            4      0     34         45         50
44      44            3            3      0     42         49         50
45      45            3            2      0     42         49         50

This grid has many events overlapping the x-coordinate 3, but only two are informative:

> res3
  cellID x_coordinate y_coordinate onEdge parent birth_time death_time
1      42            3            3      0     34         45         49
2      34            3            3      0     33         22         45

Should anyone find it helpful, here's my semi-crude drawing of the state of each grid at each time point (ignore the top two rows): enter image description here

Thank you very much for your help!


Solution

  • For future readers, I realize that the question is rather complicated and difficult to understand. I apologize for this.

    I ended up solving my problem by adding to the grid dataframes an attribute 'pathString' on the form 0/1, 0/1/27 etc where for each cell the cell stores all of it's ancestors as well as 'itself'.

    I could then use the as.Node() function in the data.tree package in R to convert my grid into a tree object which can subsequently be converted into phylo object with the as.phylo() function in ape. Once the sampled cells are stored as a tree, the existing functions in ape and ggtree make the rest easy.

    Please see data(acme) of the data.tree package and the #Tree example here: https://rdrr.io/cran/data.tree/man/as.Node.data.frame.html