Vision runs like a red thread through the video work and gives an IRL point of reference, bringing the viewer back to a point of connection: they are being looked at while they are looking.
At the same time, this progression of the eye might suggest a humanoid AI forming, beginning to take shape.

The eye continues to change as well.
NATURAL SETTINGS / SCENES
THE ANIMAL

Praying mantis
Stick insect


VISION / INTELLIGENCE

How do they actually see?!








SET 

Flower arrangement and insects

TECH 

Use of extra macro camera lens.
THERMAL VISION AS NIGHT VISION ( GROUND LEVEL) MEETS HUMANS
THERMAL VISION DRONE
UNDERWATER VARIATIONS FROM SHALLOW TO DEEP
3D ANIMATED MODEL RENDERINGS (transition)
TREE TRAVELING
TECH 

GoPro being moved and pulled over the branches

TIME 

April / May - SPRING IN THE FOREST
FOCUS 

Movement as defining element.
How to come close to something, focus on it, or move away from it.
e.g. following a human through the night.
FOCUS 

Place/location determines which type of underwater creature:

(1) Deep underwater/ Deep lake: Lake with divers in Germany


(2) Shallow water, warmer temperature

A DIGITAL SHADOW - THE SIMULATION OF A REAL-WORLD PLACE
DIGITAL SETTINGS / SCENES
Do I really need the photogrammetry? Is it really adding anything here? I've never really seen photogrammetry of nature look 'good'; it works better for buildings or artificial things. An option would be to use a nature point cloud, but I've also seen this before and it's not really adding anything new, so let's leave it out. I don't want to throw EVERYTHING into one pot; I want to make a selection and make that my own.
DECISIONS ON SHOTLIST
The split screen as a stylistic element can sometimes show the watching eye on one side and the watched 'other visions' on the other. It is the computer vision learning.
I DO feel drawn to the idea of the video not being a cut-and-dried, finished video but a GENERATIVE AUDIOVISUAL ARTWORK in some way.

Maybe it's not by cutting with an algorithm but by letting the work come alive - letting the AI inside come alive - through
.... generative sound?
.................................?

* I also like the idea of presenting it in a way where the 'machine inside is breathing' or 'living' - I don't yet know what this can mean in presentation terms.
Instead I think it can work nicely to use a 3D model in a 'start close and zoom out to reveal' approach. With correct and smart lighting of the model it could have an epic impact (with the right sound) - very similar to Emilija Škarnulytė's t 1/2 with the architectural model.
FOCUS 

The idea is to play on visual analogy: to create isolated shapes in the dark, or 1/2 more landscape scenes, that simulate nature without actually being nature.

TECH 

Blender
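A minimal Blender Python (bpy) sketch of the 'start close and zoom out to reveal' move with a single sun light, assuming the 3D model is already in the scene; all locations, durations and light values are placeholders to be tuned on the actual model.

```python
# Minimal Blender (bpy) sketch for the 'start close and zoom out to reveal' move.
# Assumes the landscape/3D model is already in the scene; all values are placeholders.
import bpy

scene = bpy.context.scene
scene.frame_start, scene.frame_end = 1, 250  # roughly 10 s at 25 fps

# Add a sun light so the model reads clearly.
bpy.ops.object.light_add(type='SUN', location=(0, 0, 10))
sun = bpy.context.object
sun.data.energy = 3.0
sun.rotation_euler = (0.8, 0.0, 0.6)

# Add a camera and make it the active scene camera.
bpy.ops.object.camera_add(location=(0, -2, 1))
cam = bpy.context.object
scene.camera = cam

# Keyframe a slow pull-back: close-up at frame 1, wide reveal at frame 250.
cam.location = (0, -2, 1)
cam.keyframe_insert(data_path="location", frame=1)
cam.location = (0, -30, 8)
cam.keyframe_insert(data_path="location", frame=250)
```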

*
This video material will also be used for the LED Fan display
*Option: to add night vision in colour grade and not use original night vision camera
PRODUCTION

1 studio/set day of filming the flower/insect set
The attempt to understand - asking questions - what do we do with it - a playful approach, and through that already understanding the content in a way.
https://www.biorama.eu/10-europaeische-urwaelder-und-wie-du-sie-besuchen-kannst/#
Urwald Reservat Bödmeren
Spandauer Forst Rohrpfuhl
Constant Flux between the Computer generated and the real; Generator vs. Discriminator

The GAN morphing always has a sort of 'searching to settle' quality about it - so I could imagine this running through the video as the 'stream of generating', always learning, always evolving, with the other footage growing from, extending out of, ejecting from this stream. (*Maybe 'style transfer' is an option to create smooth transitions? Or camera mapping? Or layering?) And the main 'eye' materialising more and more and possibly even injecting 'itself' as a simulated human entity into the footage - some footage could be 'input data'/untouched and some could be 'tampered with' - e.g. putting in a human figure but with an effect.

The effects could be used as if the AI had learned different anatomical characteristics of all the different earthly beings and then applied them itself onto the generated footage. So they aren't literal but, e.g., extreme blurriness on a moving object, or rays coming from a figure or thing.

** The idea of the 'injected form' can already take shape in the GAN footage.

** The GAN sequence itself can have parts where it is much more 'real world' set and giving recognisable landscapes - almost simulating a reality - and then it becomes less defined again and has more indiscernible objects and shapes.
Parts that are 'faster', more 'in flux', more hectic > On the one hand it could be: the morphing GAN sequence > the 'Generator' creating/producing imagery as an actual GAN / machine-learning network + the third-party video files edited in quick succession: the 'INPUT DATA' / the Discriminator

AND on the other: the shot footage of nature that has a slower/longer rhythm/ tempo

The video might well be about patterns (in a way), about repetition in nature, about mathematical and physical phenomena in nature.
3D RENDERS
GAN
The GANs can really allow me to open up new, additional worlds.

The question is where do I want to go? Should there be stronger overlap/similarity with the shot footage?
> wooded landscape ( landscapes that look as 'real world' as possible.)
> micro insect worlds
> underwater worlds

+ > microscopic single-cell creatures? (more undefined; more in between)

+ > morphing with technology?


> the uncanny
> the unknown
> playing with perception
GAN 'imagined insertion/Injection'
'Real world insertion/Injection'
Is this consciousness of the eye fed mainly with just image data, which we too can see on the screen, or is textual data also being given? Can the video consciousness become visible in the exhibition space through how the video is installed? Or with something additional?
The GENERATOR creating versions of our world
In between forms of 'becoming'
Light such a model 'correctly' and then do a close-up camera movement through the landscape
Actual heat camera - built into a drone
FLIR Thermal camera
In the editing I can - after first editing tests - now say that I think it can only work when creating smaller 'live footage clusters', which are then divided from each other through the recurring 'eye' component and the GAN sequences that can pick up on a similarity.

I also think I need LESS rather than more DIVERSITY, because otherwise it will feel hectic and disorienting. I think the video will build a stronger flow and transmit the idea more clearly if we stay in one type of vision for longer/extended takes.
I realise only now that I come at the image-forming not only from the animal-perception side but now also from the machine side: what is computer vision? How do machines 'see' / 'learn'? I still need to delve into this myself, and this fuller information will inform the image/footage selection, the pairings, and what kind of footage to shoot.
https://www.daz3d.com/simply-grass--grass-plants-and-clumps
https://www.daz3d.com/basic-nature-pack
https://www.daz3d.com/simply-grass-the-rough-stuff
Expanded Cinema = "expanded cinema is required for a new consciousness."
"The Intermedia network of the mass media is contemporary man's environment, replacing nature."
He uses recent scientific research into cellular memory and inherited memory to support his claim that this network conditions human experience. The Noosphere (a term Youngblood borrows from Teilhard de Chardin) is the organizing intelligence of the planet—the minds of its inhabitants. "Distributed around the globe by the intermedia network, it becomes a new technology that may prove to be one of the most powerful tools in man's history"
EXPANDED CINEMA
https://en.wikipedia.org/wiki/Expanded_Cinema
Stereoscopic Camera set-up: https://manuelluebbers.com/large-format-look-alexa-65-vs-alexa-mini/
UV - Camera re-build and lens: https://www.youtube.com/watch?v=DTxMQZpez0Y&ab_channel=2veritasium
Based on the selection of the animals the next step now (08.04.22) is:

1_ DOP to make selection of Camera types and lenses.
2_ Location definition (what am I looking for)
3_ Location scouting excursions (where can I find what I am looking for)
4_ Shooting order and shooting schedule
The eyes and eye-blinking: can integrating blinking be a helpful transition technique in the editing, if not done too often? And could it even allow for a portrait of the animal's POV with cuts (instead of only one single take)?
source: https://www.sciencephoto.com/set/1420/uv%C2%ADflowers
SOUND 
Could think about how we have created special mics that mimic human ears for binaural sound recording, and about creating such an apparatus for recording with respect to an animal's hearing.
We humans are so heavily image/sight driven, but for some animals sound is a much more prominent and vital cue.
SHOOTING SCHEDULE + LOCATION SCOUTING
Frog
Working with different filters for different depths in a disparity map.
Do we need a depth-sensing camera like the Kinect for this? Maybe not, because we are working with movement in the image, not necessarily depth?
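A rough OpenCV sketch of the disparity-map idea, assuming a rectified stereo pair rather than a Kinect; the file names and the three depth bands are placeholder choices, and the "different filters per depth" step is shown with a simple blur.

```python
# Sketch: build a disparity map from a stereo pair and split it into depth bands,
# so different filters can be applied per band. File names are placeholders.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # SGBM output is fixed-point

# Normalise to 0..1 and cut into three rough depth bands (near / mid / far).
d = cv2.normalize(disparity, None, 0.0, 1.0, cv2.NORM_MINMAX)
near = d > 0.66
mid = (d > 0.33) & (d <= 0.66)
far = d <= 0.33

# Example filter per band: blur the far band heavily, keep the near band sharp.
colour = cv2.imread("left_colour.png")
blurred = cv2.GaussianBlur(colour, (31, 31), 0)
out = np.where(far[..., None], blurred, colour)
cv2.imwrite("depth_banded.png", out)
```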


"TouchDesigner will allow you to connect “everything to everything” in the digital media world. Everything that can be digitised can become an input to a TD application and everything that can be controlled digitally can be output from TD."
Carp
To communicate sound-based information in a non-auditory way, sound is often visualised in graphs or other forms. Because of this I believe there might be MORE starting points for data visualisation coming out of the sound research.
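As one possible starting point, a minimal sketch visualising a recording as a spectrogram, one common way to turn sound data into a graph; the WAV file name is a placeholder.

```python
# Minimal sketch: visualise a (placeholder) recording as a spectrogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, samples = wavfile.read("recording.wav")   # placeholder file
if samples.ndim > 1:
    samples = samples.mean(axis=1)              # fold stereo to mono

f, t, Sxx = spectrogram(samples, fs=rate, nperseg=1024)
plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-10), shading="auto")  # dB scale
plt.xlabel("time [s]")
plt.ylabel("frequency [Hz]")
plt.savefig("spectrogram.png", dpi=200)
```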
Art Direction - Set Design
Frog: possibly log for water and to be positioned in front of camera
Chameleon: large branch + plants
POST-PRODUCTION APPROACH
(1) Select Footage
(2) VFX > finish building the animals + adapt to the screening format
() Develop Dataset visualisation
(3) Edit with Dataset visualisation
PRE-SHOOT DOUBT
- DUE TO THE WIDE-ANGLE LENS, EVERYTHING IS FURTHER AWAY FROM THE CAMERA AND THE IMAGE IS GENERALLY FLATTER, THEREFORE IT IS HARDER TO GET REAL DETAIL OR FOCUS IN THE FOREGROUND. I wish we had discussed this more clearly, because had I been aware of the degree to which this is the case, I might have favoured a different image quality over the very wide FOV.

- DUE TO THE DISTORTION OF THE WIDE-ANGLE LENS, STITCHING THE IMAGE IS EVEN MORE DIFFICULT. I worry a bit about the outcome.
The only animal that blinks is the owl.
"A recent study in Animal Behavior reveals that body mass and metabolic rate determine how animals of different species perceive time.

Time perception depends on how rapidly an animal's nervous system processes sensory information. To test this ability, researchers show animals a rapidly flashing light. If the light flashes quickly enough, animals (and humans) perceive it as a solid, unblinking light. The animal's behavior or its brain activity, as measured by electrodes, reveals the highest frequency at which each species perceives the light as flashing."
This will possibly still change and get even longer when considering the CFF (time perception) of these animals.
Source: https://www.sciencedaily.com/releases/2013/09/130916102006.htm
ANIMALS WITH HIGHER CFF PERCEIVE TIME AS MOVING SLOWER (often those with a higher metabolism, since image processing takes a lot of energy, and a small body size, especially if they are prey animals)

ANIMALS WITH LOWER CFF can't process images as quickly and therefore PERCEIVE TIME AS MOVING FASTER

*ADD A BIOLOGY ATTACHMENT
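A hypothetical sketch of how the CFF values could translate into a stretch factor for the takes in the edit, using the human CFF (roughly 60 Hz) as a baseline; the per-animal values below are placeholders and would need to be replaced with the researched figures from the biology attachment.

```python
# Hypothetical sketch: derive a relative stretch/duration factor per animal from its CFF,
# relative to a human baseline (~60 Hz). The CFF values below are placeholders only and
# must be replaced with the researched figures (see biology attachment).
HUMAN_CFF = 60.0

animal_cff = {
    "frog": 30.0,      # placeholder value
    "falcon": 120.0,   # placeholder value
    "mouse": 80.0,     # placeholder value
}

for animal, cff in animal_cff.items():
    # Higher CFF -> time perceived as moving slower -> stretch the take (longer/slower).
    factor = cff / HUMAN_CFF
    print(f"{animal}: CFF {cff:.0f} Hz -> stretch takes by roughly x{factor:.2f}")
```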
For Compositing
(1) Give selected footage with preview name, raw name and timecode
(2) Compositing description and drawn sketches scanned in

For VFX
(1) Finished composited/rendered out footage
(2) VFX description document + ADD BIOLOGICAL INFOS + CFF per animal
"though much of this diversity is due to differences in the degree to which different animals in these taxa rely on vision, and as such probably doesn’t represent variation in the subjective experience of time." -Rethink priorities PDF
Notes for meeting with Tri

- Try-out: Can we make the compound-eye facets on the bee a bit more extreme in their breaking/overlap?
- Try-out: grasshopper: less purple to the polarisation effect
- Frog: create a general blur in the backdrop, not only the highlighting of movement; frogs mostly see movement because everything else becomes a blur
- Falcon: show Leon's sketches , discuss also my other approach
- Mouse: blur?


EDITING: Vis a Vis

- The positions of the dataset of animal perception vis-à-vis the processing of a neural network will become clearer if the positions are at first not mixed but established on their own:

1) the dataset ( deconstruction and interpretation of the vision and hearing of animals ( biological neural networks))

face to face with

2) the (audio + visual) processing of an artificial neural network (deconstruction and interpretation of the computer vision process in low-level vision tasks)

<< both forms of 'vision' / 'perception' and the implicit difference of 'intelligence' are dealt with in a similar way and become counterparts for one another

the act of processing the data becomes the binding/connecting element but carries less importance in the editing of the film and construction of the plot

*here: it is not important to show the network from the outside but to take a deep dive into neural network processing - not explained but experienced
This means: 


1) * the representation of the dataset in void-space belongs to the chapter that establishes the dataset of 'animal intelligence'
* this needs time to be understood - what are we looking at? - and time must be given to dive into the multiple perspectives * sound and hearing - multiple perspectives
- in the edit this can be done by having the flashing image sequence of the dataset cut just before switching into the 3D void-space visualisation



2) * this part consists of showing: HOW an ANN processes the image (and sound) information provided by the dataset
> the cleanest way would be to use a network: train it on my data and show how the network 'sees' this data

> the direct comparison could be achieved by using the same exact running order of the network but then showing it as the optical flow and convolutional layer outputs

and ultimately break this down into binary machine-language processing, as has been the plan
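A rough sketch of those two kinds of output, using common stand-ins rather than the project's actual networks (MaskNet/FlowNet/DispNet): dense optical flow between two frames via OpenCV's Farneback method, and the feature maps of the first convolutional layer of a pretrained torchvision network. File names are placeholders; a recent OpenCV/torchvision is assumed.

```python
# Sketch of two 'network view' outputs: optical flow (OpenCV Farneback) and early
# convolutional feature maps (pretrained ResNet18 as a stand-in). Placeholder file names.
import cv2
import numpy as np
import torch
from torchvision import models, transforms

# --- dense optical flow between two consecutive frames, rendered as an HSV colour wheel ---
prev = cv2.cvtColor(cv2.imread("frame_000.png"), cv2.COLOR_BGR2GRAY)
nxt = cv2.cvtColor(cv2.imread("frame_001.png"), cv2.COLOR_BGR2GRAY)
flow = cv2.calcOpticalFlowFarneback(prev, nxt, None, 0.5, 3, 15, 3, 5, 1.2, 0)
mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
hsv = np.zeros((*prev.shape, 3), dtype=np.uint8)
hsv[..., 0] = ang * 180 / np.pi / 2                                  # direction -> hue
hsv[..., 1] = 255
hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("flow_vis.png", cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))

# --- feature maps of the first convolutional layer of a pretrained network ---
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
prep = transforms.Compose([transforms.ToTensor(), transforms.Resize((224, 224))])
img = prep(cv2.cvtColor(cv2.imread("frame_000.png"), cv2.COLOR_BGR2RGB))
with torch.no_grad():
    fmaps = model.conv1(img.unsqueeze(0))[0]   # shape (64, 112, 112)
for i, fm in enumerate(fmaps[:8]):             # save the first 8 channels as images
    arr = cv2.normalize(fm.numpy(), None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    cv2.imwrite(f"conv1_map_{i}.png", arr)
```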

For these 'beings' / 'entities' / 'forms of intelligence' to 'face' one another, they need to be established separately.
The crux of the project is to use the same approach towards both the biological/animal and the artificial, and to make visible the implicit similarities and differences between the two.

Through this positioning of both 'entities' face to face the question of what is deemed / termed 'intelligent' is implicitly asked. The answer lies in the comparison and the viewer's own associations between both positions.
What if the work could be thought through differently again, i.e. making the processing palpable in the editing;
if one begins with an intact dataset that over time becomes ever more broken down, ever more simplified, ever more machine-like and thereby ever more incomprehensible to humans. In effect, making this breaking-down by the machine into its components experienceable. In effect, simulating the data-processing process over the course of the film. At the end, what remains is the line, the 'action potential'.

Dramaturgy / story arc
moving from multiple, highly complex perspectives into the relentlessly task-focused machine.




If I am working with a juxtaposition, then I actually have to work with a juxtaposition and not mush everything together. If 'vis-à-vis' is the core concept, it must also become palpable in the edit.


* with the animals I am also not showing a look from the outside - it's just straight into the perspective
2) the (audio + visual) processing of an artificial neural network (deconstruction and interpretation of the computer vision process in low-level vision tasks)

Within this I can follow two paths:
1) a convolutional network in low-level vision tasks that disassembles an image akin in some ways to what an eye does


2) to show how DIFFERENTLY artificial networks process images
> using a GAN trained on my data to show how a neural network sees my data?!



It's not about using GANs or any neural network to execute on the speculative narrative and give 'truth' to my dataset; rather it is about using the data I created to give an insight into how computer vision works - to use my data not as a continuation of the narrative but as input data for artistic research on computer vision.

The 'cleanest' way to give an insight into how the network sees my data is to actually give it to a network TRAINED on MY DATA and look at the outcome from that process.
2) the question once again is: WHICH ANNS ARE APPLICABLE TO MY USE CASE AND DATASET?

> video dataset
> about vision: ANNs involved in solving low-level vision tasks using video sequences, like MaskNet, FlowNet, DispNet as used in the Competitive Collaboration framework

> Generative Adversarial Networks << why did I not decide to go with these from the start? Because I thought they are not ANNs in the 'true' sense - they are a different form of 'machine learning' - and I felt that the data I could provide is not appropriate for teaching a GAN, because it does not offer a huge variety; but as video there are other aspects of more complex 'vision' that could be dealt with here. STYLEGAN is more about 'learning' the style of an image rather than engaging in aspects of 'low-level vision' from a fundamental perspective.

However: "GANs require a lot of data because they use neural networks to generate new images and audio."
GANs are the form of network structure used to computer-generate imagery - in this way they DO GIVE A SEEABLE SENSE OF HOW THE NETWORK 'SEES' THE PROVIDED DATA
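To make the Generator-vs-Discriminator dynamic concrete, a toy PyTorch sketch of one adversarial training step on flattened placeholder frames; this is not the project's actual training code, just the basic mechanism.

```python
# Minimal GAN sketch (PyTorch): the Generator produces candidate images, the Discriminator
# judges real vs generated, and each step pushes them against one another. Toy setup only.
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 3 * 64 * 64   # 64x64 RGB frames, flattened for simplicity

generator = nn.Sequential(
    nn.Linear(latent_dim, 512), nn.ReLU(),
    nn.Linear(512, img_dim), nn.Tanh(),          # generates candidate images
)
discriminator = nn.Sequential(
    nn.Linear(img_dim, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 1), nn.Sigmoid(),             # judges real vs generated
)
loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_batch):
    """One adversarial step: D learns to separate real from fake, G learns to fool D."""
    b = real_batch.size(0)
    real_labels, fake_labels = torch.ones(b, 1), torch.zeros(b, 1)

    # Discriminator step
    fake = generator(torch.randn(b, latent_dim))
    d_loss = loss_fn(discriminator(real_batch), real_labels) + \
             loss_fn(discriminator(fake.detach()), fake_labels)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make D call the fakes real
    fake = generator(torch.randn(b, latent_dim))
    g_loss = loss_fn(discriminator(fake), real_labels)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# usage with placeholder data: train_step(torch.randn(16, img_dim))
```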

AND WHAT DEFINES COMPUTER VISION - WHAT ARE THE CHARACTERISTICS

- lives from repetition not from 'abstraction' - 'learning' is not abstraction through meaning but fulfilling a task correctly through REPETITION
- datasets are LARGE with similarities + variety ( need for grouping while not over-fitting)
- datasets need to be homogeneous in dimensions (e.g. SQUARE 256x256; 1020 x 1020, colour space) > need to be STANDARDISED (a minimal standardisation sketch follows after this list)
- the 'bottleneck' through which the data needs to fit is very small, or the network code will throw an error
- problem-solving; task- based (task and yes/no feedback)
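A minimal standardisation sketch for the point above: centre-crop extracted video frames to a square and resize them to 256x256 so the dataset is homogeneous in dimensions. Folder names are placeholders.

```python
# Sketch: standardise raw video frames into a homogeneous dataset (square 256x256).
# Paths are placeholders.
import os
import cv2

SRC, DST, SIZE = "frames_raw", "frames_256", 256
os.makedirs(DST, exist_ok=True)

for name in sorted(os.listdir(SRC)):
    img = cv2.imread(os.path.join(SRC, name))
    if img is None:
        continue                                   # skip non-image files
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    crop = img[top:top + side, left:left + side]   # centre-crop to square
    out = cv2.resize(crop, (SIZE, SIZE), interpolation=cv2.INTER_AREA)
    cv2.imwrite(os.path.join(DST, name), out)
```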



Another indicator that it is in fact a vis-à-vis positioning of the biological and the artificial is that they are being produced largely separately from one another and then brought together for the film's narrative.
data archive - to - image processing
EDITING - TRANSITIONS
https://www.youtube.com/watch?v=PRhyiqkuww8&ab_channel=JustinOdisho
https://www.youtube.com/watch?v=L9JfZHGXpZ0&ab_channel=BryanDelimata
https://www.youtube.com/watch?v=5NIqHWwP-hY&ab_channel=BryanDelimata
https://www.youtube.com/watch?v=4wzCav5Tzpw&ab_channel=JACAPE
https://www.youtube.com/watch?v=VLGKFk3wIL4&ab_channel=CGMatter
https://www.youtube.com/watch?v=_5zqnTmoHsQ&ab_channel=ZimoFx
https://www.youtube.com/watch?v=h6N_kdSYf0k&t=241s&ab_channel=ACE