First we need to agree on a definition of terms, and hopefully much of what you are trying to understand will become clear in the process.
There is digitization (which we need to understand before we can understand encoding):
This is the same as analog-to-digital conversion: samples of the audio or video are taken at regular intervals of time (for audio and video) or of physical distance (the spatial position of each pixel for video or still pictures) and stored as binary values. That places the media in the digital domain, where it is represented only by a series of numbers.
The idea is that this information will be enough to represent the entire image/sound/what-have-you when converted back to analog, which all sounds and images are at some point, even if that happens at the level of your retina, because as humans we can only perceive analog images and sounds.
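As a minimal sketch of that sample-and-store step (the tone, sample rate, and bit depth below are all invented purely for illustration):

```python
import math

# Hypothetical example: digitize one second of a 440 Hz analog tone.
SAMPLE_RATE = 8000   # samples per second (chosen arbitrarily here)
BIT_DEPTH = 8        # bits per sample
LEVELS = 2 ** BIT_DEPTH

def analog_signal(t):
    """Stand-in for the continuous analog waveform, amplitude -1..1."""
    return math.sin(2 * math.pi * 440 * t)

# Sample at regular intervals of time and quantize each sample to an
# integer code: the media is now just a series of numbers.
samples = []
for n in range(SAMPLE_RATE):
    t = n / SAMPLE_RATE                           # the instant we sample
    value = analog_signal(t)                      # measure the analog value
    code = round((value + 1) / 2 * (LEVELS - 1))  # map -1..1 onto 0..255
    samples.append(code)

print(samples[:8])  # the first few quantized sample codes
```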
Why digitize at all? Because it provides a lot of benefits in storage, portability, manipulation, freedom from analog artifacts, and preservation of quality that can't be eroded in a hostile transport environment the way analog signals can. In the analog domain the message is married directly to the medium, and if something happens to the medium (lightning briefly changes the modulation amplitude of an AM radio signal, for instance) it alters the message (perceived as static, in that case). In the digital domain the information lives in the realm of numbers, fully divorced from the medium; the only thing that can change it is a purposeful mathematical operation performed on it, which makes it essentially invulnerable to other change and gives us complete control over it.
HD video, for instance, has a pixel map (rows of pixels extending down the raster), and each pixel is represented by at least one binary value. Each pixel gets assigned a luminance value, and each group of four pixels (two in one line and the two directly below them) gets one chrominance value for R-Y and one for B-Y (for 4:2:0 sampling). For video, the digital words created are usually 8, 10, or 12 bits deep.
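To make the 4:2:0 bookkeeping concrete, here is a quick back-of-the-envelope calculation, assuming one common case (1920x1080 at 8 bits per sample):

```python
# Rough storage math for one uncompressed 1920x1080 frame at 8 bits, 4:2:0:
# every pixel gets a luminance (Y) sample, and each 2x2 group of pixels
# shares one R-Y and one B-Y chrominance sample.
width, height, bits = 1920, 1080, 8

y_samples = width * height                          # one luma sample per pixel
chroma_samples = 2 * (width // 2) * (height // 2)   # R-Y + B-Y per 2x2 block

total_bits = (y_samples + chroma_samples) * bits
print(total_bits / 8 / 1e6, "MB per frame")         # about 3.1 MB
```

At dozens of such frames per second, the raw numbers add up fast, which is part of why the compression discussed below exists.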
Audio is typically sampled either 48,000 or 44,100 times per second, and binary values with a bit depth of 16 to 24 bits (sometimes deeper, and sometimes at higher sample rates) are created.
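The same sort of arithmetic for audio, assuming one typical configuration (48 kHz, 24-bit stereo):

```python
# Uncompressed PCM data rate: samples per second x bits per sample x channels.
sample_rate = 48_000      # samples per second
bit_depth = 24            # bits per sample
channels = 2              # stereo

bits_per_second = sample_rate * bit_depth * channels
print(bits_per_second / 1e6, "Mbit/s")    # 2.304 Mbit/s for this case
```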
Digitization, then, basically turns an analog sound or image into a series of binary numbers, whether to facilitate transport, further manipulation, or quality preservation, or all three.
Then there is encoding:
That usually means formatting the digital information into a particular protocol, such as the 188-byte packet structure of digital TV, but it also usually means that some sort of compression is involved. Compression, whether it be JPEG or H.265, uses intelligence to figure out which parts of the digitized file can be discarded, which makes the file smaller, hopefully without creating visual or aural artifacts. It uses perceptually-based encoding, meaning that it discards only the parts of the file that human perception is not sensitive to. Of course that is done with varying levels of success.
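To give a feel for the protocol half of that, here is a minimal sketch of reading the header of one of those 188-byte digital TV packets (the sync byte 0x47 and the field layout are genuine MPEG transport-stream structure; the example packet itself is a made-up toy):

```python
TS_PACKET_SIZE = 188  # digital TV carries everything in fixed 188-byte packets

def parse_ts_header(packet: bytes) -> dict:
    """Pull the basic fields out of a transport-stream packet header."""
    assert len(packet) == TS_PACKET_SIZE
    assert packet[0] == 0x47, "every TS packet begins with sync byte 0x47"
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],  # which stream this is
        "continuity_counter": packet[3] & 0x0F,        # detects lost packets
    }

# Toy packet: sync byte, a PID of 0x0100, then padding out to 188 bytes.
example = bytes([0x47, 0x01, 0x00, 0x10]) + bytes(184)
print(parse_ts_header(example))
```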
Most compression schemes are based on the DCT (discrete cosine transform). MPEG-2 is a common codec for video, and it chains together some 30-odd different algorithms in its toolbox, often achieving compression ratios of 100:1 or better, meaning that 99% of the image data is discarded forever and completely (and hopefully reapproximated properly later in the decoder). Newer codecs are even more sophisticated and more complex.
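Here is a tiny sketch of the DCT-and-quantize idea at the heart of those codecs, leaning on SciPy for the transform; the 8x8 block contents and the quantization step size are invented for illustration:

```python
import numpy as np
from scipy.fft import dctn, idctn

# An invented 8x8 block of luminance values (a smooth gradient).
block = np.add.outer(np.arange(8), np.arange(8)) * 16.0

# The forward 2-D DCT concentrates the block's energy into a few coefficients.
coeffs = dctn(block, norm="ortho")

# Crude quantization: divide by a step size and round. Small coefficients
# (mostly high-frequency detail we barely perceive) collapse to zero and
# are discarded for good.
STEP = 20.0
quantized = np.round(coeffs / STEP)
print("nonzero coefficients kept:", np.count_nonzero(quantized), "of 64")

# The decoder reverses the process and reapproximates the original block.
restored = idctn(quantized * STEP, norm="ortho")
print("max error after round trip:", np.abs(restored - block).max())
```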
Note that "encoding" can also refer to digitization, because that process "encodes" analog information into the digital domain of binary-coded words. But in practice the word usually implies encoding into a protocol and compressing as well, so its meaning becomes a little grey and fuzzy, and usually the definition depends on the frame of reference.
And there is rendering:
Rendering may have a wide definition, but in professional video editing and in graphics programs like Photoshop it usually refers to creating a new, simpler, single version of a complex file or set of files that combine to make an image: flattening a complex file, or taking the disparate elements and effects used to make a photo, a video sequence, or (more rarely) an audio mix and creating a single file that holds all of those relevant aspects.
For instance, in non-linear editing in Final Cut Pro you can create a sequence that will play back, but no true physical editing is involved; the sequence is based on a set of pointers in an edit decision list that can be accessed instantly, so when you play the edited sequence in the editor it is simply jumping from pointer location to pointer location, referencing the original video clips in storage. It appears to be a coherent whole in a serial sequence, but until it is rendered it actually is not.
Rendering copies the sequence into a new single file, so that it can be easily moved from system to system, stored economically and simply, or played by simpler playback systems where it would be impractical to reference all of those separate files or parts. In Avid editing this is referred to as consolidation, which is probably a more accurate term. It is also analogous to an audio mixdown, where a large number of tracks are rendered down into a smaller number of tracks.
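A toy model of that pointer-based playback versus rendering; the clip data and EDL format here are invented and bear no resemblance to a real Final Cut or Avid project file:

```python
# Source clips live untouched in storage; model them as lists of frames.
clips = {
    "interview": [f"interview_frame_{i}" for i in range(100)],
    "broll":     [f"broll_frame_{i}" for i in range(100)],
}

# The edit decision list is just pointers: (clip, in-point, out-point).
# Playing the sequence means jumping from pointer to pointer.
edl = [("interview", 0, 30), ("broll", 10, 25), ("interview", 60, 90)]

def play(edl):
    """Playback without rendering: reference the originals on the fly."""
    for clip, start, end in edl:
        yield from clips[clip][start:end]

def render(edl):
    """Rendering: copy the referenced frames into one new flat file."""
    return list(play(edl))

flat = render(edl)          # a single self-contained sequence
print(len(flat), "frames")  # 30 + 15 + 30 = 75
```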
In Photoshop you may be editing an image that is built from multiple separate layers. Once created, each edited layer exists only in RAM, and only the original source image exists as a separate stored file; a copy of that source image, the new layers, and the adjustments all live as separate entities in RAM, and no single file represents the edited result. Even after you save it as a .psd document, the various aspects of it remain separate entities: individual layers and individual adjustments.
But if you have edited the image to the point where you are sure you will not need to make changes that would involve serious backtracking, you can render that image into a separate single file, or flatten it, to use the vernacular. That frees up RAM and application resources for other work, gives the finished image portability (and typically reduces the file size, an economic advantage), and locks it into its current edited form.
The .psd file is a single file if you merely save it, but it only makes sense in a Photoshop environment, where loading it loads all of those still-separate parameters and layers back into RAM. Render or flatten it, and you have a smaller, fixed, single-layer document that can be read by other programs as a single entity and can no longer be reverted to earlier versions (which is why we save copies when rendering).
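A minimal sketch of flattening, using Pillow's alpha_composite call to composite invented layers down to one fixed image:

```python
from PIL import Image

# Invented example: a background plus two semi-transparent edit layers,
# each a separate in-memory entity, like layers in a .psd.
background = Image.new("RGBA", (640, 480), (20, 20, 20, 255))
layer1 = Image.new("RGBA", (640, 480), (200, 0, 0, 128))
layer2 = Image.new("RGBA", (640, 480), (0, 0, 200, 64))

# Flattening: composite everything down into one single-layer image.
flat = background
for layer in (layer1, layer2):
    flat = Image.alpha_composite(flat, layer)

# The rendered file is a single entity other programs can read; the
# individual layers and their adjustments are gone from it for good.
flat.convert("RGB").save("flattened.jpg")
```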
Rendering mostly got its name from the days of slower computers, when it might take hours or days to create a single finished image. Think of the CSI techs spending hours trying to render the information from a fuzzy picture of a license plate into something sharp as a tack (OK, not really all that possible in real life). But once rendered into a new, usually smaller, separate file, that file can be manipulated in real time outside of the Photoshop environment, locked into its current format, copied and pasted, and transported.
These days most of what would have been considered rendering back in the day happens so fast that it approaches or exceeds real time, so it may still technically be rendering, but it can be totally transparent to the user. The faster our computers become, the closer the term "rendering" comes to extinction.