Search code examples
mysqlnode.jstypescriptquery-performancetypeorm

TypeORM Performance problem - Loading object with many relations


TS + Node 12 + TypeORM + MySQL - AWS 2GB instance EC2 (medium I think)

The issue is simple.

For get orders API request - we load orders from the database - each order comes with more related fields from other tables - via join. Something like:

const galleries: Gallery[] = await relationshipAModel.createQueryBuilder('model')
      .leftJoinAndSelect('model.subRelationshipA', 'a')
      .leftJoinAndSelect('model.subRelationshipB', 'b')
      .whereInIds(firstResultIds)
      .getMany();

The code is from: https://github.com/typeorm/typeorm/issues/4499

Our issue is the exact same issue as linked here.

NodeJS process crashes when loading 5-10 requests - each request loads 500 orders - loading 25 / 50 / 100 orders is ok.

Memory profiling didn't help.

Order object in typescript looks like:

@Entity()
export class Order {
@PrimaryColumn({ type: 'varchar', name: 'id' })
id: string;

@OneToMany(type => OrderItem, orderItem => orderItem.order, {
    cascade: true
})
orderItems: OrderItem[];

@ManyToOne(type => ServiceLevel, serviceLevel => serviceLevel.orders, {
    cascade: true
})
serviceLevel: ServiceLevel;

@ManyToOne(type => Customer, customer => customer.orders, { cascade: true })
customer: Customer;

@JoinColumn()
@OneToOne(type => Address, { cascade: true, nullable: true })
shippingAddress: Address;

@JoinColumn()
@OneToOne(type => Address, { cascade: true, nullable: true })
billingAddress: Address;

@ManyToMany(type => OrderTag, orderTag => orderTag.orders, {
    cascade: true
})
@JoinTable()
orderTags: OrderTag[];

Left only the fields that are "suspects"

Node heap stack for 350 orders:

Nov 25 15:24:58 ip-172-31-38-186 web: {
Nov 25 15:24:58 ip-172-31-38-186 web: page: 0,
Nov 25 15:24:58 ip-172-31-38-186 web: maxResults: 350,
Nov 25 15:24:58 ip-172-31-38-186 web: filters: [ { fieldName: 'orderStatus', operator: 'NOT IN', values: [Array] } ],
Nov 25 15:24:58 ip-172-31-38-186 web: stepFilter: { fieldName: 'orderStatus', operator: 'IN', values: [ 'stepPrint' ] },
Nov 25 15:24:58 ip-172-31-38-186 web: sortFields: [ { fieldName: 'createdDate', value: 'DESC' } ]
Nov 25 15:24:58 ip-172-31-38-186 web: }
Nov 25 15:25:18 ip-172-31-38-186 web: <--- Last few GCs --->
Nov 25 15:25:18 ip-172-31-38-186 web: [20326:0x31ad300]   113205 ms: Mark-sweep 993.9 (999.1) -> 993.7 (997.4) MB, 766.6 / 0.0 ms  (+ 127.3 ms in 23 steps since start of marking, biggest step 25.9 ms, walltime since start of marking 969 ms) (average mu = 0.231, current mu = 0.082) allocation [20326:0x31ad300]   114167 ms: Mark-sweep 994.1 (997.4) -> 993.2 (997.1) MB, 881.2 / 0.0 ms  (+ 47.2 ms in 7 steps since start of marking, biggest step 26.7 ms, walltime since start of marking 963 ms) (average mu = 0.138, current mu = 0.036) allocation fa
Nov 25 15:25:18 ip-172-31-38-186 web: <--- JS stacktrace --->
Nov 25 15:25:18 ip-172-31-38-186 web: ==== JS stack trace =========================================
Nov 25 15:25:18 ip-172-31-38-186 web: 0: ExitFrame [pc: 0x13555cd]
Nov 25 15:25:18 ip-172-31-38-186 web: Security context: 0x003cbef008d1 <JSObject>
Nov 25 15:25:18 ip-172-31-38-186 web: 1: slice [0x2c5dd3dc3561] [buffer.js:~606] [pc=0xb6e02716b7c](this=0x3b080493db19 <Object map = 0x3047472a6099>,0x3b0942d12c71 <Uint8Array map = 0x3047472a55a9>,44922,45232)
Nov 25 15:25:18 ip-172-31-38-186 web: 2: _typeCast(aka typeCast) [0x20293188c3e9] [/var/app/current/node_modules/mysql/lib/protocol/packets/RowDataPacket.js:~53] [pc=0xb6e027ab692](this=0x338ac1d1b739 <RowDataPacket map =...
Nov 25 15:25:18 ip-172-31-38-186 web: FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
Nov 25 15:25:18 ip-172-31-38-186 web: 1: 0xa093f0 node::Abort() [node]
Nov 25 15:25:18 ip-172-31-38-186 web: 2: 0xa097fc node::OnFatalError(char const*, char const*) [node]
Nov 25 15:25:18 ip-172-31-38-186 web: 3: 0xb842ae v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
Nov 25 15:25:18 ip-172-31-38-186 web: 4: 0xb84629 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
Nov 25 15:25:18 ip-172-31-38-186 web: 5: 0xd30fe5  [node]
Nov 25 15:25:18 ip-172-31-38-186 web: 6: 0xd31676 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node]
Nov 25 15:25:18 ip-172-31-38-186 web: 7: 0xd3def5 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
Nov 25 15:25:18 ip-172-31-38-186 web: 8: 0xd3eda5 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
Nov 25 15:25:18 ip-172-31-38-186 web: 9: 0xd4185c v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [nod
e]

Any help will be much apperciated!


Solution

  • If id is the PRIMARY KEY for order, then the DISTINCT is unnecessary.

    A LIMIT should have an ORDER BY.

    order needs

     INDEX(shopId, orderBatchId, id)
    

    A hundred columns for 100 rows? That seems like a lot. Rethink which columns you really need and whether you need 100 rows.

    Are there nodejs or "packet" or other networking limits? (Since 100 rows, but 500 croaks.)

    5 Unique keys implies that there could be a separate table that equates 4 of them to one. It also implies that there is probably a design error; I commented on another question that had both shipping_addr and billing_addr being UNIQUE. If I try to send something to two relatives at different addresses, I will have trouble. Especially if anyone else is using your system to send to either of those addresses.

    Even 3 UNIQUEs is probably a mistake. I know of very few valid cases for 2 UNIQUEs; the common one is for 'normalization'.

    This construct makes my head spin. I need to figure out if it does what you wanted:

    FROM a  LEFT JOIN b  INNER JOIN c
    

    If that is equivalent to this, then you don't have what you wanted:

    FROM a  LEFT JOIN (b  INNER JOIN c)
    

    This, on the other hand, is probably what you wanted:

    FROM (a  LEFT JOIN b)  INNER JOIN c
    

    This would clearer, especially since c is a subquery:

    FROM (c  JOIN a)  LEFT JOIN b
    

    Please provide EXPLAIN SELECT ... so we can see how it was evaluated.

    Meanwhile, if you can get the subquery first, my quandry is avoided. And it may help performance.

    LEFT -- Are you using LEFT JOIN because each of those things may be missing data? For example, might the shippingAddress be missing (and you are expecting NULLs)? I ask because it makes a difference to performance, readability, optimization, etc. Use LEFT when needed, else don't.