In this commit, we fix an issue that was recently introduced as a result
of migration #10. The new TLV format ended up modifying the
serialization functions called in `serializePaymentAttemptInfo`.
Migration #9, also used this `serializePaymentAttemptInfo` method to
serialize the _new_ (pre TLV, but new payment attempt structure) routes
into the database during its migration. However, migration #10 failed to
copy over the existing unmodified `serializePaymentAttemptInfo` method
into the legacy serialization for migration #9. As a result, once
migration #9 was run, the routes/payments were serialized using the
_new_ format, rather than the format used for v0.7.1. This then lead to
de-serialization either failing, or causing partial payment corruption
as migration #10 was expecting the "legacy" format (no TLV info).
We fix this issue by adding a new fully enclosed
`serializePaymentAttemptInfoMigration9`method that will be used for
migration #9. Note that our tests didn't catch this, as they test the
migration in isolation, rather than in series which is how users will
encounter the migrations.
Fixes#3463.
Previously the invoice registry wasn't aware of replayed htlcs. This was
dealt with by keeping the invoice accept/settle logic idempotent, so
that a replay wouldn't have an effect.
This mechanism has two limitations:
1. No accurate tracking of the total amount paid to an invoice. The total
amount couldn't just be increased with every htlc received, because it
could be a replay which would lead to counting the htlc amount multiple
times. Therefore the total amount was set to the amount of the first
htlc that was received, even though there may have been multiple htlcs
paying to the invoice.
2. Impossible to check htlc expiry consistently for hodl invoices. When
an htlc is new, its expiry needs to be checked against the invoice cltv
delta. But for a replay, that check must be skipped. The htlc was
accepted in time, the invoice was moved to the accepted state and a
replay some blocks later shouldn't lead to that htlc being cancelled.
Because the invoice registry couldn't recognize replays, it stopped
checking htlc expiry heights when the invoice reached the accepted
state. This prevents hold htlcs from being cancelled after a restart.
But unfortunately this also caused additional htlcs to be accepted on an
already accepted invoice without their expiry being checked.
In this commit, the invoice registry starts to persistently track htlcs
so that replays can be recognized. For replays, an htlc resolution
action is returned early. This fixes both limitations mentioned above.
As the logic around invoice mutations gets more complex, the friction
caused by having this logic split between invoice registry and channeldb
becomes more apparent. This commit brings a clearer separation of
concerns by centralizing the accept/settle logic in the invoice
registry.
The original AcceptOrSettle method is renamed to UpdateInvoice because
the update to perform is controlled by the callback.
This commit adds a set of htlcs to the Invoice struct and
serializes/deserializes this set to/from disk. It is a preparation for
accurate invoice accounting across restarts of lnd.
A migration is added for the invoice htlcs.
In addition to these changes, separate final cltv delta and expiry
invoice fields are created and populated. Previously it was required
to decode this from the stored payment request. The reason to create
a combined commit is to prevent multiple migrations.
This commit adds an index bucket, disabledEdgePolicyBucket, for those
ChannelEdgePolicy with disabled bit on.
The main purpose is to be able to iterate over these fast when prune is
needed without the need for iterating the whole graph.
The entry points for accessing this index are:
1. When updating ChannelEdgePolicy - insert an entry.
2. When deleting ChannelEdge - delete the associated entries.
3. When querying for disabled channels - implemented DisabledChannelIDs
function
This commit modifies the nodeWithDist struct to use a route.Vertex
instead of a *channeldb.LightningNode. This change, coupled with
the new ForEachNodeChannel function, allows the findPath Djikstra's
algorithm to cut down on database lookups since we no longer need
to call the FetchOtherNode function.
This commit specifies two bbolt options when opening the underlying
channel and watchtower databases so that there is reduced heap
pressure in case the bbolt database has a lot of free pages in the
B+ tree.
Previously the migration would fail if the source node was not set in
the database. Since we know that the source node must have been set
before making any payments, we check whether there actually are any
payments to migrate, and return early if not.
This commit makes the router use the ControlTower to drive the payment
life cycle state machine, to keep track of active payments across
restarts. This lets the router resume payments on startup, such that
their final results can be handled and stored when ready.
This commit gives a new responsibility to the control tower, letting it
populate the payment bucket structure as the payment goes through
its different stages.
The payment will transition states Grounded->InFlight->Success/Failed,
where the CreationInfo/AttemptInfo/Preimage must be set accordingly.
This will be the main driver for the router state machine.
migrateOutgoingPayments moves the OutgoingPayments into a new bucket format
where they all reside in a top-level bucket indexed by the payment hash. In
this sub-bucket we store information relevant to this payment, such as the
payment status.
To avoid that the router resend payments that have the status InFlight (we
cannot resume these payments for pre-migration payments) we delete those
statuses, so only Completed payments remain in the new bucket structure.
This commit changes the format used to store payments within the
DB. Previously this was serialized as one continuous struct
OutgoingPayment, which also contained an Invoice struct we where only
using a few fields of. We now split it up into two simpler sub-structs
CreationInfo, AttemptInfo and PaymentPreimage.
We also want to associate the payments more closely with payment
statuses, so we move to this hierarchy:
There's one top-level bucket "sentPaymentsBucket" which contains a set
of sub-buckets indexed by a payment's payment hash. Each such sub-bucket
contains several fields:
paymentStatusKey -> the payment's status
paymentCreationInfoKey -> the payment's CreationInfo.
paymentAttemptInfoKey -> the payment's AttemptInfo.
paymentSettleInfoKey -> the payment's preimage (or zeroes for
non-settled payments)
The CreationInfo is information that is static during the whole payment
lifcycle. The attempt info is set each time a new payment attempt
(route+paymentID) is sent on the network. The preimage is information
only known when a payment succeeds. It therefore makes sense to split
them.
We keep legacy serialization code for migration puproses.
This commit adds persisted status bit-field to ClientSessions, that can
be used to modify behavior of their handling in the client. Currently,
only a default CSessionActive status is defined. However, the intention
is that this could later be used to signal that a session is abandoned
without needing to perform a db migration to add the field. As we move
forward with testing, this will likely be useful if a session gets
borked and we need a simple method of the client to temporarily ignore
certain sessions.
The field may be useful in signaling other types of status changes,
though this was the primary motivation that warranted the addition.
This commit is the final step in making the link unaware of invoices. It
now purely offers the htlc to the invoice registry and follows
instructions from the invoice registry about how and when to respond to
the htlc.
The change also fixes a bug where upon restart, hodl htlcs were
subjected to the invoice minimum cltv delta requirement again. If the
block height has increased in the mean while, the htlc would be canceled
back.
Furthermore the invoice registry interaction is aligned between link and
contract resolvers.
In this commit, we refactor DeleteChannelEdge to use ChannelIDs rather
than ChannelPoints. We do this as the only use of DeleteChannelEdge is
when we are pruning zombie channels from our graph. When running under a
light client, we are unable to obtain the ChannelPoint of each edge due
to the expensive operations required to do so. As a stop-gap, we'll
resort towards using an edge's ChannelID instead, which is already
gossiped between nodes.
This commit removes the MarkEdgeZombie method from channeldb. This
method is currently not used in any live code paths in production, and
is only used in unit tests. However, incorrect usage of this method
could result in an edge being present in both the zombie and channel
indexes, which deviates from any state we would expect to see in
production. Removing the method will help mitigate the potential for
writing incorrect unit tests in the future, by forcing zombie edges to
be created via the relevant, production APIs, e.g. DeleteChannelEdge.
The existing unit tests that use this method have been modified to use
the DeleteChannelEdge instead. No regressions were discovered in the
process.
This commit modifies FetchChanInfos to skip any channels that are not in
the graph at the time of the call. Currently the entire call will fail
if the edge is not found, which stalls a gossip sync in the following
scenario:
1. Remote peer queries for a channel range
2. We return the set of channel ids in that range
3. A channel from that set is removed from the graph, e.g. via close.
4. Remote peer queries for removed edge, causing the query to fail.
To remedy this, we will now skip any edges that are not known in the
database at the time of the query. This prevents the syncer state
machines from halting, which otherwise could only be resolved by
disconnecting and reconnecting.
This commit modifies FilterKnownChanIDs to skip edges that
we ourselves have deemed zombies. This prevents us from requesting
the updates from them, as this wastes bandwidth and cpu cycles.
During the restore process, it may be possible that we have already
heard about our prior edge from a node on the network (or our channel
peers). As a result, we shouldn't exit if this happens, and instead
should continue with the rest of the restoration process.
In this commit, we modify the `AddrsForNode` method to not fail if no
graph node is found. We do this as when the backup is being
restored/created, it's possible that we don't yet have a
NodeAnnouncement for that node.
In this commit, we move the location where we restore the channel status
to within the `RestoreChannelShells` method itself. Before this commit,
we attempted to use `ApplyChanStatus` which creates a DB transaction and
relies on a fully populated channel state, which in the restoration
case, we don't yet have.
In this commit, we extend the graph's FetchChannelEdgesByID and
HasChannelEdge methods to also check the zombie index whenever the edge
to be looked up doesn't exist within the edge index. We do this to
signal to callers that the edge is known, but only as a zombie, and the
only information that we have about the edge are the node public keys of
the two parties involved in the edge.
In the event that an edge does exist within the zombie index, we make
an additional check on edge policies to ensure they are not within the
router's pruning window, indicating that it is a fresh update.
We mark the edges as zombies when pruning them to ensure we don't
attempt to reprocess them later on. This also applies to channels that
have been removed from the graph due to being stale.
In this commit, we add a zombie edge index to the database. This allows
us to quickly determine across restarts whether we're attempting to
process an edge we've previously deemed as zombie.
In this commit, we convert all the `Update` calls which are serial, to
use `Batch` calls which are optimistically batched together for
concurrent writers. This should increase performance slightly during the
initial graph sync, and also updates at tip as we can coalesce more of
these individual transactions into a single transaction.
This commit modifies the invoice registry to handle invoices for which
the preimage is not known yet (hodl invoices). In that case, the
resolution channel passed in from links and resolvers is stored until we
either learn the preimage or want to cancel the htlc.
In this commit, we update the `TestChanSyncFailure` method to pass given
the new behavior around updating borked channel states. In order to do
this, we add a new method to allow the test to clear an existing channel
state. This method may be of independent use in other areas in the
codebase in the future as well.
In this commit, we modify the WitnessCache's
AddPreimage method to accept a variadic number
of preimages. This enables callers to batch
preimage writes in performance critical areas
of the codebase, e.g. the htlcswitch.
Additionally, we lift the computation of the
witnesses' keys outside of the db transaction.
This saves us from having to do hashing inside
and blocking other callers, and limits extraneous
blocking at the call site.
In this commit, we introduce a migration for the message store
sub-bucket that will migrate all keys within it to a new key format.
This new key format is composed of the peer's public key, followed by
the short channel ID, followed by the message type. This migration is
needed in order to provide backwards-compatibility with messages that
were previously stored before the introduction of the new key format.
In this commit, we add a new type (ChannelShell) along with a new
method, RestoreChannelShells which allows a caller to insert a series of
channel shells into the database. These channel shells will allow a
restored node to initiate the DLP protocol and recover their set of
existing channels.
When we insert a channel shell, we re-create the original link node, and
also add the outgoing edge to the channel graph. This way we can be sure
that upon start up, we attempt to connect to the remote peers, and that
the normal graph query commands will operate as expected.
If the ChanStatusRestored flag is set, then we don't need to write or
read the set of commits for a channel as they won't exist. This will be
the case when we restore a channel from an SCB.
The new syncNewChannel function will allow callers to insert a new
channel given the OpenChannel struct, and set of addresses for the
channel peer. This new method will also create a new LinkNode for the
peer if one doesn't already exist.
In this commit, we modify the String() method of the ChannelStatus type
to reflect the fact that it's a flag set. With these new changes, we'll
now print the variable name of each assigned bit with a bar delimiting
them all.
In this commit, we add a prefix naming scheme to the ChannelStatus enum
variables. We do this as it enables outside callers to more easily
identify each individual enum variable as a part of the greater
enum-like type.
In this commit, we add a new AddrsForNode method. This method will allow
a wrapper sturct to implement the new chanbackup.LiveChannelSource
method which is required to implement the full SCB feature set.
If the max_htlc field is not found when fetching a ChannelEdgePolicy
from the DB, we treat this as an unknown policy.
This is done to ensure we won't propagate invalid data further. The data
will be overwritten with a valid one when we receive an update for this
channel.
It shouldn't be very common, but old data could be lingering in the DB
added before this field was validated.
Adding this field will allow us to persist an edge's
max HTLC to disk, thus preserving it between restarts.
Co-authored-by: Johan T. Halseth <johanth@gmail.com>
In this commit:
* we partition lnwire.ChanUpdateFlag into two (ChanUpdateChanFlags and
ChanUpdateMsgFlags), from a uint16 to a pair of uint8's
* we rename the ChannelUpdate.Flags to ChannelFlags and add an
additional MessageFlags field, which will be used to indicate the
presence of the optional field HtlcMaximumMsat within the ChannelUpdate.
* we partition ChannelEdgePolicy.Flags into message and channel flags.
This change corresponds to the partitioning of the ChannelUpdate's Flags
field into MessageFlags and ChannelFlags.
Co-authored-by: Johan T. Halseth <johanth@gmail.com>
This commit is a preparation for the addition of new invoice
states. A database migration is not needed because we keep
the same field length and values.
In this commit, we address an issue with the FetchWaitingCloseChannels
method where it would not properly return channels that are unconfirmed
and also have an unconfirmed closing transaction because they were
filtered out. We fix this by fetching channels that remain unconfirmed
that are also waiting for a confirmed closing transaction.
This will allow the recently added test TestFetchWaitingCloseChannels to
pass.
In this commit, we add a test case for FetchWaitingCloseChannels to
ensure it behaves as intended. Currently, this test fails due to not
fetching channels which are pending to be closed but are also pending to
be opened. This will be fixed in the following commit and should allow
the test to pass.
Instead return ErrGraphNodeNotFound directly. If the node bucket was
created it would be empty, and the call delChannelByEdge ->
fetchChanEdgePolicies -> fetchChanEdgePolicy ->
deserializeChanEdgePolicy -> fetchLightningNode would return this error
anyway.
This commit adds an optional field LastChanSyncMsg to the
CloseChannelSummary, which will be used to save the ChannelReestablish
message for the channel at the point of channel close.
This commit adds a new file legacy_serialization.go, where a copy of the
current deserializeCloseChannelSummary is made, called
deserializeCloseChannelSummaryV6.
The rationale is to keep old deserialization code around to be used
during migration, as it is hard maintaining compatibility with the old
format while changing the code in use.
In this commit, we add a method to the ChannelGraph struct that
determines whether a node is seen as public based on graph's source
node's point of view.
Previously a call to QueryInvoices with reversed=true and index_offset=1
would make the cursor point to the first available invoice (num 1) that
would be returned as part of the response. This is inconsistent with the
othre indexes, so we instead just return an empty list in this case.
A test case for this situation is also added.
In this commit, we ensure that we only create the sub-bucket for
channels once: at the time of creation. We do this as otherwise it's
possible that a method that mutates a channel's state is called after it
has already been closed on-chain, leading to the channel bucket being
recreated.
Using AbandonChannel, a channel can be abandoned. This means
removing all state without any on-chain or off-chain action.
A close summary is the only thing that is stored in the db after
abandoning.
A specific close type Abandoned is added. Abandoned channels
can be retrieved via the ClosedChannels RPC.
This commit modifies the edge update index
migration to avoid repetitive calls to
Cursor.First(). Instead, we construct a set
of all edge update keys to remove, and then do
a second pass to remove each from the bucket.
Additional log messages are included to notify
users on progress, as some have reported long
migration times.
In this commit, we fix a bug in the latest database migration when
migrating from 0.4 to 0.5. There's an issue in bolt db where if one
deletes a bucket that has a key with a nil value, it thinks that's a sub
bucket and attempts a bucket deletion. This will fail as it's not
actually a sub-bucket. We get around this by using a cursor to manually
delete items in the
bucket.
Fixes#1907.
In this commit, we fix a bug in the latest migration that could cause
the migration to end in a panic. Additionally, we modify the migration
to exit early if the bucket wasn't found, as in this case, no migration
is required.
Fixes#1874.
In this commit, we no longer assume that the bucket hierarchy has been
created properly when applying the latest DB migration. On older nodes
that never obtained a channel graph, or updated _before_ the query sync
stuff was added, then they're missing buckets that the migration expects
them to have.
We fix this by simply creating the buckets as we go, if needed.
In this commit, we fix a bug in the bucket creation code in
createChannelDB. This bug can cause migrations on older nodes to fail,
as we expect the bucket to already have been created. With this commit,
we ensure that all the buckets under the main node and edges bucket are
properly created. Otherwise, a set of the newer migrations will fail to
apply for nodes updating from 0.4.
In this commit, we move the declaration of the key for an unused bucket.
In the past, this bucket was used to store the revocation forwarding
package log. However, this has been moved under the key
`fwdPackagesKey`.
In this commit, we account for the additional case wherein the
announcement hasn't yet been written with the extra zero byte to
indicate that there aren't any remaining bytes to be read. Before this
commit, we accounted for the case where the announcement was written
with the extra byte, but now we ensure that legacy nodes that upgrade
will be able to boot properly.
In this commit, we add a new limit on the largest number of extra opaque
bytes that we'll allow to be written per vertex/edge. We do this in
order to limit the amount of disk space that we expose, as it's possible
that nodes may start to pad their announcements adding an additional
externalized cost as nodes may need to continue to store and relay these
large announcements.
In this commit, we add a mirror set of fields to the ones we recently
added to the set of gossip wire messages. With these set of fields in
place, we ensure that we'll be able to properly store and re-validate
gossip messages that contain a set of extra/optional fields.
In this commit, we ensure that we de-duplicate the set of channel edges
returned from ChanUpdatesInHorizon. Other subsystems within lnd use this
method to retrieve and send all the channels with updates within a time
series to network peers. However, since the method looks at the edge
update index, which can include up to two entries per edge for each
policy, it's possible that we'd send channel announcements and updates
twice, causing extra bandwidth.
timestamps
In this commit, we ensure policies for edges we create in
TestChanUpdatesInHorizon have different update timestamps. This ensures
that there are two entries per edge in the edge update index. Because of
this, the test will fail because ChanUpdatesInHorizon will return
duplicate channel edges due to looking at all the entries within the
edge update index. This will be addressed in a future commit to allow
the set of tests to pass once again.
In this commit, we introduce a migration to fix some of the recent
issues found w.r.t. the edge update index. The migration attempts to fix
two things:
1) Edge policies include an extra byte at the end due to reading an
extra byte for the node's public key from the serialized node info.
2) Properly prune all stale entries within the edge update index.
As a result of this migration, nodes will have a slightly smaller in
size channeldb. We will also no longer send stale edges to our peers in
response to their gossip queries, which should also fix the fetching
channel announcement for closed channels issue.
In this commit, we extend TestChannelEdgePruningUpdateIndexDeletion test
to include one more update for each edge. By doing this, we can
correctly determine whether old entries were properly pruned from the
index once a new update has arrived.
Due to entries within the edge update index having a nil value, the
tests need to be modified to account for this. Previously, we'd assume
that if we were unable to retrieve a value for a certain key that the
entry was non-existent, which is why the improper pruning bug was not
caught. Instead, we'll assert the number of entries to be the expected
value and populate a lookup map to determine whether the correct entries
exist within it.
In this commit, we fix a lingering issue within the edge update index
where entries were not being properly pruned due to an incorrect
calculation of the offset of an edge's last update time. Since the
offset is being determined from the end to the start, we need to
subtract all the fields after an edge policy's last update time from the
total amount of bytes of the serialized edge policy to determine the
correct offset. This was also slightly off as the edge policy included
an extra byte, which has been fixed in the previous commit.
Instead of continuing the slicing approach however, we'll switch to
deserializing the raw bytes of an edge's policy to ensure this doesn't
happen in the future when/if the serialization methods change or extra
data is included.
In this commit, we fix an off-by-one error when slicing the public key
from the serialized node info byte slice. This would cause us to write
an extra byte to all edge policies. Even though the values were read
correctly, when attempting to calculate the offset of an edge's update
time going backwards, we'd always be incorrect, causing us to not
properly prune the edge update index.
This commit splits FetchPaymentStatus and
UpdatePaymentStatus, such that they each invoke
helper methods that can be composed into different
db txns. This enables us to improve performance on
send/receive, as we can remove the exclusive lock
from the control tower, and allow concurrent calls
to utilize Batch more effectively.