Graphorall

Back

    DB course

    • Algebra

      • preliminaries: relation, attribute, schema, tuple, component, domain, current instance, key

        • operations

            • set ops: union, intersect, minus
            • filter ops: select
            • relation ops: product, joins
            • mapping ops: rename, project
            • equivalence

              • RS=πL(σC(R×S))R\Join S=\pi_L(\sigma_C(R\times S))

            • extended operators

              • deduplicate δ\delta

                • aggregation

                  • grouping γ\gamma

                    • outer join (left,right, and theta)

                      • sorting τ\tau

                      • datalog

                      • Formalization

                        • FD

                          • key -> tuple and key is minimal, superkey is superset of key

                            • rules

                            • closure

                              • BAs+    AsbB\in As^+ \iff As\to b

                                • {superkey}+={all attributes}

                                  • check whether a FD follows from a given set

                                  • minimal basis

                                    • 1.All the FD’s in B have singleton right sides.
                                    • 2.If any FD is removed from B, B is no longer a basis.
                                    • 3.If for any FD in B we remove one or more attributes from the left side of F, the result is no longer a basis (why?)
                                    • projecting

                                      • minimize

                                        • remove those follow from others

                                          • remove left side redundancy

                                        • BCNF

                                          • any FD As->Bs, As is superkey

                                            • decompose: not dependency preserving

                                              • chase test for lossless join

                                              • 3NF

                                                • dependency preserving

                                                  • any FD As->Bs, As is superkey or Bs are each in a key

                                                    • decompose: find min-basis, a relation for each FD, add a key relation

                                                    • MD

                                                      • intuitively, As->->Bs holds if As determine a set of values of Bs.

                                                        • trivial: As ->-> subset of As, As->->\As(others=\emptyset)

                                                          • FD promotion: As->Bs => As->->Bs(in tabular test, b1=b2b_1=b_2)

                                                            • transitive

                                                              • complementation: As->->Bs => As->->others

                                                                • 4NF

                                                                  • any MVD As->->Bs, As is superkey

                                                                  • chase test: initially two tuples t and u, use FDs and MVDs to produce v

                                                                    • projecting

                                                                      • simplification(not reviewed)

                                                                    • E-R Model

                                                                      • many->one(less than), many->one(exactly, rounded arrowhead), many—(degree constrains)many, roles

                                                                        • multiway relations -> relation entity&multi binary relations

                                                                          • isa(subclass relationship)

                                                                            • differs from object-oriented approach, entity may be in multi sub entity sets

                                                                              • root entity set

                                                                              • remove redundant entity set: referred once in relation, all attrs are in key

                                                                                • weak entity set (and its supporting relationship)

                                                                                  • def: key is composed of attributes , some or all of which belong to another entity set

                                                                                  • convert to relational design

                                                                                    • combining many-one relationship to many side

                                                                                      • weak entity set: don’t need supporting relation

                                                                                        • classes: null method, OO method(enumerate all subtrees)

                                                                                      • Hard Problems

                                                                                        • 设有关系 R(A,B,C,D,E,P),F={AB→C,C→D,E→A,B→C,CA→E,BD→A} 1、写出关系 R 的所有候选键,并指出关系 R 为第几范式。要求说明原因。 (4 分) 2、求出该函数依赖集的最小集 Fm。写出求解过程。 (7 分) 3、将 R 分解为具有无损连接性和依赖保持性的 3NF。写出分解过程。 (5 分)

                                                                                          • Fm={B->C,C->D,E->A,AC->E,B->A}

                                                                                      • SQL lang

                                                                                        • components

                                                                                          • DDL: data definition lang, schema and constraint etc.
                                                                                          • DML: data manipulation lang, query and transaction etc.
                                                                                          • DCL: data control lang, permission and grant etc.
                                                                                          • relations: stored, view, temp tables

                                                                                            • schema

                                                                                              • CREATE TABLE/VIEW <name> (...)

                                                                                                • inline constrains after attribute

                                                                                                  • or table-wise constrain after all attrs

                                                                                                  • DROP TABLE/VIEW <name> (...)

                                                                                                    • ALTER TABLE/VIEW <name> (...)

                                                                                                    • query

                                                                                                      • SELECT <>
                                                                                                        FROM <table>[alias],...
                                                                                                        WHERE{
                                                                                                        CONDITIONS
                                                                                                        }
                                                                                                        GROUP BY ()
                                                                                                        HAVING <condition>
                                                                                                        sql
                                                                                                        • operators

                                                                                                          • not equal <>

                                                                                                            • pattern matching: _, %, '', 'pattern' ESCAPE '<escape char>'

                                                                                                              • NULL: calc result is NULL, comparison result is UNKNOWN

                                                                                                                • EXISTS, [NOT] IN, > ALL, > ANY(existence)

                                                                                                                  • CROSS JOIN, a JOIN b ON <c>, FULL|LEFT|RIGHT OUTERJOIN

                                                                                                                    • set operators normally eliminates duplicate, use a UNION ALL b to avoid

                                                                                                                  • modification

                                                                                                                  • SQL other features

                                                                                                                    • Constraint

                                                                                                                      • kinds

                                                                                                                        • default value

                                                                                                                          • key

                                                                                                                            • primary key not nullable, unique nullable
                                                                                                                            • can’t agree on all attrs, unless one is null
                                                                                                                            • two nulls don’t agree on each other
                                                                                                                            • foreign key

                                                                                                                              • policy: CASCADE, SET NULL, DEFAULT(reject)

                                                                                                                              • not null

                                                                                                                                • trivial check

                                                                                                                                  • assertion

                                                                                                                                    • CREATE ASSERTION xxx CHECK (...)

                                                                                                                                  • naming

                                                                                                                                    • defer

                                                                                                                                      • [NOT] DEFERRABLE [INITIALLY DEFERRED/IMMEDIATE]

                                                                                                                                        • SET CONSTRAINT xxx DEFERRED

                                                                                                                                        • scope

                                                                                                                                          • attribute: checked only when target attribute changes

                                                                                                                                            • tuple: checked more frequently

                                                                                                                                              • table(assertions)

                                                                                                                                            • Trigger

                                                                                                                                              • CREATE TRIGGER XXX
                                                                                                                                                AFTER|BEFORE|INSTEAD-OF UPDATE|DELETE|INSERT ON <table>
                                                                                                                                                REFERENCING xxx
                                                                                                                                                FOR EACH ROW|STATEMENT
                                                                                                                                                [WHEN <condition>]
                                                                                                                                                [BEGIN]
                                                                                                                                                statements;
                                                                                                                                                [END]
                                                                                                                                                plaintext
                                                                                                                                              • View

                                                                                                                                                • updatable view

                                                                                                                                                  • using instead of trigger

                                                                                                                                                    • materialize view

                                                                                                                                                    • Index

                                                                                                                                                      • estimate query cost (simple)

                                                                                                                                                        • tuple locate

                                                                                                                                                          • by index, 1r per index access

                                                                                                                                                            • no index, full table scan

                                                                                                                                                            • tuple access

                                                                                                                                                              • if clustered, then may be 1r

                                                                                                                                                                • else num of pages tuple is distributed

                                                                                                                                                                • tuple insert

                                                                                                                                                                  • usually zero-cost to find a slot

                                                                                                                                                                    • 1r per index update

                                                                                                                                                              • Storage

                                                                                                                                                                • disc access model

                                                                                                                                                                  • find track -> move to track -> wait for spinning -> start transferring

                                                                                                                                                                    • move to track: head start/stop + head move(n-tracks/velocity)

                                                                                                                                                                      • TODO spinning: half?avg? :LOGBOOK: CLOCK: [2022-11-02 Wed 08:31:22]—[2023-04-16 Sun 16:27:16] => 3967:55:54 :END:

                                                                                                                                                                        • transferring: size-block(sectors+gaps)/size-track * time per rotation

                                                                                                                                                                      • physical optimize

                                                                                                                                                                        • organize data by cylinders

                                                                                                                                                                          • mirroring disks

                                                                                                                                                                            • enhance reliability

                                                                                                                                                                              • speed up reading(different part per disk) not writing

                                                                                                                                                                              • scheduling

                                                                                                                                                                                • elevator algorithm

                                                                                                                                                                                • intelligent buffering

                                                                                                                                                                                • robustness

                                                                                                                                                                                  • checksum

                                                                                                                                                                                    • redundant storage(stable storage, RAID)

                                                                                                                                                                                      • TODO RAID 4(one main disk, bottleneck) 5(symmetric backup) 6(multiple ) :LOGBOOK: CLOCK: [2022-11-02 Wed 09:25:01]—[2023-04-16 Sun 16:27:21] => 3967:02:20 :END:

                                                                                                                                                                                    • storage format

                                                                                                                                                                                      • schema

                                                                                                                                                                                        • number of fields, type, order, name

                                                                                                                                                                                        • fixed-length format

                                                                                                                                                                                          • header

                                                                                                                                                                                            • pointer to schema, length, timestamp

                                                                                                                                                                                          • variable-length format

                                                                                                                                                                                            • a pointer to other than first variable field

                                                                                                                                                                                              • header->pointers->fixed-fields->var-fields

                                                                                                                                                                                                • dedup in-record by another level pointers

                                                                                                                                                                                                  • separate variable fields

                                                                                                                                                                                                    • spanned records

                                                                                                                                                                                                    • pack into blocks

                                                                                                                                                                                                      • header

                                                                                                                                                                                                        • one or more other blocks

                                                                                                                                                                                                          • role of this block(normal data or overflow)

                                                                                                                                                                                                            • which relation the records belongs to

                                                                                                                                                                                                              • record offsets

                                                                                                                                                                                                                • real offset in-block

                                                                                                                                                                                                                  • or forwarding address to another block

                                                                                                                                                                                                                    • or tombstone indicating it’s deleted

                                                                                                                                                                                                                    • timestamp

                                                                                                                                                                                                                    • two-ends growing to center

                                                                                                                                                                                                                      • variable header and record length

                                                                                                                                                                                                                  • addressing

                                                                                                                                                                                                                    • physical disk address

                                                                                                                                                                                                                      • logical address

                                                                                                                                                                                                                        • more compact

                                                                                                                                                                                                                          • tradeoff: flexibility <-> cost

                                                                                                                                                                                                                            • e.g. move records of indirection

                                                                                                                                                                                                                            • logical->phisical

                                                                                                                                                                                                                              • logical->memory

                                                                                                                                                                                                                              • pointer swizzling

                                                                                                                                                                                                                                • avoid repeatedly translating addr

                                                                                                                                                                                                                                  • strategies

                                                                                                                                                                                                                                    • automatic

                                                                                                                                                                                                                                      • on demand

                                                                                                                                                                                                                                      • unswizzling when returning to disk

                                                                                                                                                                                                                                        • need mem->logical indexing

                                                                                                                                                                                                                                      • pinned records and blocks

                                                                                                                                                                                                                                        • pinned: object is referred to by a swizzled pointer

                                                                                                                                                                                                                                          • use spare bytes(logical-memaddr) to make a linked list

                                                                                                                                                                                                                                        • modification

                                                                                                                                                                                                                                          • insertion

                                                                                                                                                                                                                                            • deletion

                                                                                                                                                                                                                                              • immediate reclaim space

                                                                                                                                                                                                                                                • mark delete

                                                                                                                                                                                                                                                • update

                                                                                                                                                                                                                                                  • fixed-length easy

                                                                                                                                                                                                                                                    • dynamic via deletion&insertion

                                                                                                                                                                                                                                                • Index

                                                                                                                                                                                                                                                  • types

                                                                                                                                                                                                                                                    • dense: one by one key-pointer pairs

                                                                                                                                                                                                                                                      • sparse: adjacent entries determine a range corresponding to keys

                                                                                                                                                                                                                                                        • multi-leveled

                                                                                                                                                                                                                                                          • secondary

                                                                                                                                                                                                                                                            • must be dense

                                                                                                                                                                                                                                                            • indirection

                                                                                                                                                                                                                                                              • reduce replicated key entries

                                                                                                                                                                                                                                                                • key->bucket of many pointers->tuple

                                                                                                                                                                                                                                                                • revert index(document retrieval)

                                                                                                                                                                                                                                                                • data structures

                                                                                                                                                                                                                                                                  • B-trees

                                                                                                                                                                                                                                                                    • rules

                                                                                                                                                                                                                                                                      • node min item
                                                                                                                                                                                                                                                                      • leave node
                                                                                                                                                                                                                                                                      • multiset case: min new key, blank key
                                                                                                                                                                                                                                                                      • insertion and deletion

                                                                                                                                                                                                                                                                        • recursively split
                                                                                                                                                                                                                                                                        • borrow from sibling node
                                                                                                                                                                                                                                                                      • hash table

                                                                                                                                                                                                                                                                        • extensible hash table

                                                                                                                                                                                                                                                                          • bucket directory

                                                                                                                                                                                                                                                                            • initially first 1 bit
                                                                                                                                                                                                                                                                            • once bucket used up, split the bucket, use k+1 bits, other buckets still remain unchanged
                                                                                                                                                                                                                                                                            • once directory length not enough, double the directory, b0/b1 points to same bucket
                                                                                                                                                                                                                                                                          • linear hash

                                                                                                                                                                                                                                                                        • multi dimensional

                                                                                                                                                                                                                                                                          • partial match, range, nearest-neighbor

                                                                                                                                                                                                                                                                            • hash like approach

                                                                                                                                                                                                                                                                              • grid file

                                                                                                                                                                                                                                                                                • dynamic insertion: overflow block, or insert grid line

                                                                                                                                                                                                                                                                                  • hard to find optimal split

                                                                                                                                                                                                                                                                                  • partitioned hashing

                                                                                                                                                                                                                                                                                    • good at partial match

                                                                                                                                                                                                                                                                                  • tree like approach

                                                                                                                                                                                                                                                                                    • kd-tree

                                                                                                                                                                                                                                                                                      • long search path
                                                                                                                                                                                                                                                                                      • R-tree

                                                                                                                                                                                                                                                                                      • bitmap approach

                                                                                                                                                                                                                                                                                        • field -> collection of bitmaps, one bitmap per value

                                                                                                                                                                                                                                                                                          • compressed bitmap

                                                                                                                                                                                                                                                                                            • run-length encoding
                                                                                                                                                                                                                                                                                        • exercise

                                                                                                                                                                                                                                                                                          • 3.1 线性表删除时如何处理位数i减少

                                                                                                                                                                                                                                                                                            • 3.2 hard h(k)的每一位都相同无法区分,若重复数多于bucksize如何split

                                                                                                                                                                                                                                                                                              • 7.5 如何编码趋近log2i\log_2i

                                                                                                                                                                                                                                                                                            • Exec

                                                                                                                                                                                                                                                                                              • Physical Ops

                                                                                                                                                                                                                                                                                                • one pass

                                                                                                                                                                                                                                                                                                  • two pass

                                                                                                                                                                                                                                                                                                    • sort

                                                                                                                                                                                                                                                                                                      • index

                                                                                                                                                                                                                                                                                                        • hash

                                                                                                                                                                                                                                                                                                        • algebra simplification

                                                                                                                                                                                                                                                                                                          • cost model

                                                                                                                                                                                                                                                                                                            • logical & physical plan

                                                                                                                                                                                                                                                                                                              • exercise

                                                                                                                                                                                                                                                                                                                • 4.3.2

                                                                                                                                                                                                                                                                                                                  • 4.3.5

                                                                                                                                                                                                                                                                                                                    • 4.4.5、4.4.8 子表少于M如何利用空闲缓冲区

                                                                                                                                                                                                                                                                                                                      • 4.5.5 考虑磁盘IO特性,计算开销

                                                                                                                                                                                                                                                                                                                    • Schedule

                                                                                                                                                                                                                                                                                                                      • serializability

                                                                                                                                                                                                                                                                                                                        • conflicts: ignore actions from same trans conflict equiv: by transforming non-conflict ops conflict serializable: conflict equiv to some serial schedule view serializable: ignore blind write

                                                                                                                                                                                                                                                                                                                          • lock

                                                                                                                                                                                                                                                                                                                            • shared-exclusive

                                                                                                                                                                                                                                                                                                                              • upgrading lock

                                                                                                                                                                                                                                                                                                                                • shared->upgrade: may lead to deadlock

                                                                                                                                                                                                                                                                                                                                  • update lock: may later write, singleton but can coexist with shared

                                                                                                                                                                                                                                                                                                                                • increment lock

                                                                                                                                                                                                                                                                                                                                  • procedure

                                                                                                                                                                                                                                                                                                                                    • naive lock -> 2PL: conflict serializable, unlock after all locks

                                                                                                                                                                                                                                                                                                                                      • lock table

                                                                                                                                                                                                                                                                                                                                        • group mode

                                                                                                                                                                                                                                                                                                                                          • lock list: waiting and holding

                                                                                                                                                                                                                                                                                                                                            • grant policy

                                                                                                                                                                                                                                                                                                                                          • granularity

                                                                                                                                                                                                                                                                                                                                            • intention lock

                                                                                                                                                                                                                                                                                                                                              • 组模式,某种锁优于另一种

                                                                                                                                                                                                                                                                                                                                              • tree protocol

                                                                                                                                                                                                                                                                                                                                                • 观察到无需对父节点修改时unlock

                                                                                                                                                                                                                                                                                                                                            • deadlock

                                                                                                                                                                                                                                                                                                                                              • timestamp&validation

                                                                                                                                                                                                                                                                                                                                                • read/write too late

                                                                                                                                                                                                                                                                                                                                                  • uncommited TX may abort -> delay until determined

                                                                                                                                                                                                                                                                                                                                                  • exercise

                                                                                                                                                                                                                                                                                                                                                    • image.png

                                                                                                                                                                                                                                                                                                                                                      • r1a,w1a,r1b,w1b,r2b,w2b,r2a,w2a r2b,w2b,r2a,w2a,r1a,w1a,r1b,w1b 注意到其对A、B的操作有交换律,(42)\binom{4}{2}

                                                                                                                                                                                                                                                                                                                                                        • Compare with 18.2.3

                                                                                                                                                                                                                                                                                                                                                        • 2.4 wn(A)...ri(A)wi(A)ri+1(A)wi+1(A)...rn(A)w_n(A)...r_i(A)w_i(A)r_{i+1}(A)w_{i+1}(A)...r_n(A)

                                                                                                                                                                                                                                                                                                                                                          • 2.6 r3c,r1a,w1c,r2a,w2b,r3b

                                                                                                                                                                                                                                                                                                                                                            • 3.1? image.png

                                                                                                                                                                                                                                                                                                                                                              • case serial: 2 case interleave: 1A,2B->1B; 2B,1A->1A; 因此只有1A2B/1B2A两组内部可交替,共(42)2=36\binom{4}{2}^2=36

                                                                                                                                                                                                                                                                                                                                                              • 4.1 会发生什么怎么说?

                                                                                                                                                                                                                                                                                                                                                                • 4.6

                                                                                                                                                                                                                                                                                                                                                                  • 4.7 所有的?更新?如何考虑锁? a. r2c; b. i2c,i1b; c. u1b; d. w1b;

                                                                                                                                                                                                                                                                                                                                                                • Transaction

                                                                                                                                                                                                                                                                                                                                                                  • failure recovery

                                                                                                                                                                                                                                                                                                                                                                    • undo

                                                                                                                                                                                                                                                                                                                                                                      • write log->write disk->commit, commit data
                                                                                                                                                                                                                                                                                                                                                                      • redo

                                                                                                                                                                                                                                                                                                                                                                        • write log->commit->write disk, commit log
                                                                                                                                                                                                                                                                                                                                                                        • checkpoint: commit data
                                                                                                                                                                                                                                                                                                                                                                        • undo/redo *

                                                                                                                                                                                                                                                                                                                                                                          • checkpoint

                                                                                                                                                                                                                                                                                                                                                                            • 避免向前追溯
                                                                                                                                                                                                                                                                                                                                                                            • start ckpt(Ti); end ckpt,若有end则Ti无需恢复
                                                                                                                                                                                                                                                                                                                                                                            • start ckpt(Ti)中的Ti似乎可以省略?
                                                                                                                                                                                                                                                                                                                                                                            • logical log

                                                                                                                                                                                                                                                                                                                                                                              • logical equiv, compensating action

                                                                                                                                                                                                                                                                                                                                                                            • recoverability

                                                                                                                                                                                                                                                                                                                                                                              • (读)依赖的值必须先commit

                                                                                                                                                                                                                                                                                                                                                                                • 日志落盘顺序需与产生顺序一致(group commit)

                                                                                                                                                                                                                                                                                                                                                                                  • cascading rollback: dirty data, ACR调度, strict locking

                                                                                                                                                                                                                                                                                                                                                                                    • rollback policy

                                                                                                                                                                                                                                                                                                                                                                                      • block: cache refresh

                                                                                                                                                                                                                                                                                                                                                                                        • smaller: using log

                                                                                                                                                                                                                                                                                                                                                                                      • exercise

                                                                                                                                                                                                                                                                                                                                                                                        • 1.1 2PL和严格封锁的区别? strict只有在事务结束时才能统一释放所有w锁

                                                                                                                                                                                                                                                                                                                                                                                          • 1.4 “恢复时产生脏读”?

                                                                                                                                                                                                                                                                                                                                                                                            • 1.5 又是数调度!

                                                                                                                                                                                                                                                                                                                                                                                              • 2.6

                                                                                                                                                                                                                                                                                                                                                                                                • 6.2.2 如何计数 归纳递推,转换为向L1,[subproblem(n1)]L1,[subproblem(n-1)]中插入O1

                                                                                                                                                                                                                                                                                                                                                                                              • Course Project

                                                                                                                                                                                                                                                                                                                                                                                                • 杨元钊、钟祎诚、邹奕杨

                                                                                                                                                                                                                                                                                                                                                                                                  • 架构: 整体为C/S架构,Server端采用MySQL数据库,Client使用Python+QtQuick开发。其中Server部分使用一台华为云ECS,根据MySQL官网指示部署了MySQL 8.0(最初使用docker部署,但性能低于常规值3个数量级,后期推测是由于ECS的虚拟化云硬盘IO延迟高)。Client部分为Qt前端+Python后端,主程序由Python调用PySide(Qt)运行QML引擎,引擎运行QML代码实现界面显示与交互,QML代码通过引擎调用后端类获取数据;后端类使用pymysql执行编写好的SQL语句与数据库交互。

                                                                                                                                                                                                                                                                                                                                                                                                    • 分工: 杨元钊负责数据的爬取清洗及数据库Schema的设计、创建; 邹奕杨负责客户端后端类的编写、SQL语句的编写及部分语句优化; 钟祎诚负责服务端的部署、客户端整体和UI部分的编写、SQL优化。

                                                                                                                                                                                                                                                                                                                                                                                                      • DBProject.drawio_1675502686622_0.png

                                                                                                                                                                                                                                                                                                                                                                                                      • PostgreSQL

                                                                                                                                                                                                                                                                                                                                                                                                        DB course
                                                                                                                                                                                                                                                                                                                                                                                                        https://astro-pure.js.org/blog/DB%20course
                                                                                                                                                                                                                                                                                                                                                                                                        Author rubbishzyc
                                                                                                                                                                                                                                                                                                                                                                                                        Published at April 16, 2023